DE102013207603B4

DE102013207603B4 - Run jobs efficiently in a shared pool of resources

Info

Publication number: DE102013207603B4
Application number: DE102013207603.7A
Authority: DE
Inventors: Min Li; Prasenjit Sarkar; Dinesh K. Subhraveti
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2012-04-26
Filing date: 2013-04-25
Publication date: 2020-06-18
Anticipated expiration: 2033-04-26
Also published as: DE102013207603A1

Abstract

Embodiments of the invention relate to a shared pool of resources and to the efficient processing of one or more jobs in that shared pool. Tools are provided in the shared pool to access and organize a topology of the shared resources, including physical and virtual machines as well as storage devices. The topology is stored in a known location and is used to assign one or more jobs efficiently in response to the hierarchy.

Description

BACKGROUND

The present invention relates to an efficient approach to job processing in a shared pool of resources. More specifically, the invention relates to evaluating the virtual and physical topology of the shared resources and processing jobs in response to the combined topology.
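As a purely illustrative sketch of what a combined virtual and physical topology might look like, the following Python fragment maps each virtual machine to a host and a rack and derives a simple hierarchical distance from that mapping. The names `TOPOLOGY`, `distance` and `closest_vm`, the VM, host and rack labels, and the distance values are assumptions made for this example; they are not taken from the specification.

```python
# Hypothetical combined topology: each virtual machine is mapped to the
# physical host it runs on and the rack that host sits in.
TOPOLOGY = {
    "vm-a": ("host-1", "rack-1"),
    "vm-b": ("host-1", "rack-1"),
    "vm-c": ("host-2", "rack-1"),
    "vm-d": ("host-3", "rack-2"),
}


def distance(vm_x, vm_y, topology=TOPOLOGY):
    """Return an assumed hierarchical distance between two virtual machines.

    0 = same VM, 1 = same host, 2 = same rack, 3 = different racks.
    """
    if vm_x == vm_y:
        return 0
    host_x, rack_x = topology[vm_x]
    host_y, rack_y = topology[vm_y]
    if host_x == host_y:
        return 1
    if rack_x == rack_y:
        return 2
    return 3


def closest_vm(data_vm, candidates, topology=TOPOLOGY):
    """Pick the candidate VM that is nearest to the VM holding the data."""
    return min(candidates, key=lambda vm: distance(data_vm, vm, topology))


if __name__ == "__main__":
    print(closest_vm("vm-a", ["vm-d", "vm-c", "vm-b"]))
```

In this sketch a job that reads data held by `vm-a` would be assigned to `vm-b`, because both run on the same assumed host. A real scheduler would weigh further factors such as load and storage placement.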

MapReduce is a framework for processing highly distributable tasks across huge data sets using a large number of computer nodes. The nodes are collectively referred to as a cluster when all nodes use the same hardware, or as a grid when the nodes use different hardware. Computational processing can be performed on data stored either in a file system or in a database. Specifically, a master node receives a job input and divides the job into smaller sub-jobs, which are distributed to the other nodes in the cluster or grid. In one embodiment, the nodes in the cluster or grid are arranged in a hierarchy, and the sub-jobs can be further divided and distributed. The nodes responsible for processing the sub-jobs return processed data to the master node. The master node collects and combines the processed data to form an output. Accordingly, MapReduce is an algorithm-based technique for the distributed processing of the large amounts of data belonging to a job.
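The split, process and combine flow described above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not the claimed method: threads stand in for the cluster nodes, word counting stands in for the job, and the function names `split_job`, `map_subjob`, `combine` and `run_job` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_job(records, num_subjobs):
    """Master step: divide the job input into roughly equal sub-jobs."""
    return [records[i::num_subjobs] for i in range(num_subjobs)]


def map_subjob(records):
    """Worker step: count the words in one sub-job and return a partial result."""
    counts = defaultdict(int)
    for line in records:
        for word in line.split():
            counts[word.lower()] += 1
    return dict(counts)


def combine(partials):
    """Master step: collect the partial results and merge them into one output."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


def run_job(records, num_workers=3):
    """Run the whole job: split, process the sub-jobs in parallel, combine."""
    subjobs = split_job(records, num_workers)
    # Threads stand in for the worker nodes of a cluster in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_subjob, subjobs))
    return combine(partials)


if __name__ == "__main__":
    lines = ["the cloud runs jobs", "jobs run in the cloud", "the pool is shared"]
    print(run_job(lines))
```

The sketch ignores where the data and the workers are located, which is the aspect the remainder of the description addresses.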

As described above, MapReduce enables data processing to be distributed across a network of nodes. Although MapReduce is convenient to use, current uses of MapReduce exhibit performance problems when processing jobs.

Several documents have already been published in this technological context. The document Lee, G. et al.: Topology-Aware Resource Allocation for Data-Intensive Workloads, in Newsletter ACM SIGCOMM, Vol. 4, Jan. 2001, pp. 120-124 describes optimized resource allocation in IaaS-based cloud systems. For this purpose, a prototype of a Topology-Aware Resource Allocation (TARA) system was developed and tested on a cluster of 80 servers with two representative MapReduce-based benchmarks.

The document Diakhate, F. et al.: Efficient Shared Memory Passing for Inter-VM Communications, in Euro-Par 2008 Workshops, LNCS 5415, Springer-Verlag Berlin Heidelberg 2009, pp. 53-62 addresses inter-VM MPI communication when the VMs reside on the same physical machine. On this basis, an efficient MPI library for virtual machines can be implemented.

Despite the progress made so far in processing large amounts of data, there is still a need, especially with MapReduce technologies, for further improvements that avoid bottlenecks at individual assigned VMs.

BRIEF SUMMARY

Such improvements are described by the subject matter of the independent claims.

Further embodiments are specified by the respective dependent claims.

Further features and advantages of the present invention will become apparent from the following detailed description of the presently preferred embodiment of the invention, taken in conjunction with the accompanying drawings.

Figure list

The drawings referenced herein form a part of the specification. Features shown in the drawings are meant to illustrate only some embodiments of the invention, and not all embodiments of the invention, unless otherwise explicitly indicated.

FIG. 1 depicts a cloud computing node according to an embodiment of the present invention.
FIG. 2 depicts a cloud computing environment according to an embodiment of the present invention.
FIG. 3 depicts abstraction model layers according to an embodiment of the present invention.
FIG. 4 depicts a block diagram illustrating the architecture for employing cloud-aware MapReduce.
FIG. 5 depicts a flowchart illustrating resource allocation and the management of data and virtual machine placement.
FIG. 6 depicts a flow graph illustrating an example network topology for data placement.
FIG. 7 depicts a flow graph for an example placement of data.
FIG. 8 depicts a flow graph for the placement of a virtual machine.
FIG. 9 depicts a table of values associated with the flow graph for virtual machine placement and with node cataloging.
FIG. 10 depicts a flowchart illustrating a process of evaluating and using the topology of the physical and virtual machines in a shared pool of resources.
FIG. 11 depicts a flowchart illustrating the steps that support the use of the storage topology of the shared pool of resources.
FIG. 12 depicts a block diagram illustrating tools integrated in a computer system to support a technique for evaluating resource utilization when assigning a job in a shared pool of resources.
FIG. 13 depicts a block diagram showing a system for implementing an embodiment of the present invention.

DETAILED DESCRIPTION

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the apparatus, system, and method of the present invention, as presented in the figures, is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.

The functional unit(s) described in this specification have been labeled as tools in the form of manager(s) and director(s). A manager or director may be implemented in programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, or the like. The manager(s) or director(s) may also be implemented in software for processing by various types of processors. An identified manager or director of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions that may, for instance, be organized as an object, procedure, function, or other construct. Nevertheless, the executable code of an identified manager or director need not be physically located in one place. It may comprise separate instructions stored in different locations which, when joined logically together, make up the manager or director and achieve its stated purpose.

Indeed, a manager or director of executable code could be a single instruction or many instructions, and may even be distributed over several different code segments, among different applications, and across several storage devices. Similarly, operational data may be identified and illustrated herein within the manager or director, may be embodied in any suitable form, and may be organized within any suitable type of data structure. The operational data may be collected as a single data set or may be distributed over different locations, including over different storage devices, and may exist, at least partially, as electronic signals on a system or network.

Any reference in this specification to "a selected embodiment", "a single embodiment", or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, appearances of the phrases "a selected embodiment", "in a single embodiment", or "in an embodiment" in various places throughout this specification do not necessarily refer to the same embodiment.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of a topology manager, a hook manager, a storage topology manager, a resource utilization manager, an application manager, a director, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.

The illustrated embodiments of the invention are best understood by reference to the drawings, wherein like elements are designated by like reference numerals. The following description is intended only by way of example and simply illustrates certain selected embodiments of devices, systems, and processes that are consistent with the invention as claimed herein.

A cloud computing environment is service-oriented, with a focus on statelessness, low coupling, modularity, and semantic interoperability. At the heart of cloud computing is an infrastructure comprising a network of interconnected nodes. Reference is now made to FIG. 1, which shows a schematic of an example of a cloud computing node. The cloud computing node (10) is only one example of a suitable cloud computing node and is not intended to suggest any limitation as to the scope of use or functionality of the embodiments of the invention described herein. Regardless, the cloud computing node (10) is capable of being implemented and/or performing any of the functionality set forth above. In the cloud computing node (10) there is a computer system/server (12), which is operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server (12) include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.

The computer system/server (12) may be described in the general context of computer-system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. The computer system/server (12) may be practiced in distributed cloud computing environments where jobs are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media, including memory storage devices.

As shown in FIG. 1, the computer system/server (12) in the cloud computing node (10) is shown in the form of a general-purpose computing device. The components of the computer system/server (12) may include, but are not limited to, one or more processors or processing units (16), a system memory (28), and a bus (18) that couples various system components, including the system memory (28), to the processor (16). The bus (18) represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus. The computer system/server (12) typically includes a variety of computer-system-readable media. Such media may be any available media that are accessible by the computer system/server (12), and include both volatile and non-volatile media, and removable and non-removable media.

The system memory (28) can include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) (30) and/or cache memory (32). The computer system/server (12) may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system (34) can be provided for reading from and writing to non-removable, non-volatile magnetic media (not shown and typically called a "hard drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM, or other optical media can be provided. In such instances, each can be connected to the bus (18) by one or more data media interfaces. As will be further depicted and described below, the memory (28) may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility (40), having a set (at least one) of program modules (42), may be stored in the memory (28), by way of example and not limitation, as may an operating system, one or more application programs, other program modules, and program data. Each of the operating system, the one or more application programs, the other program modules, and the program data, or some combination thereof, may include an implementation of a networking environment. The program modules (42) generally carry out the functions and/or methodologies of embodiments of the invention as described herein.

The computer system/server (12) may also communicate with one or more external devices (14) such as a keyboard, a pointing device, a display (24), etc.; with one or more devices that enable a user to interact with the computer system/server (12); and/or with any devices (e.g., network card, modem, etc.) that enable the computer system/server (12) to communicate with one or more other computing devices. Such communication can occur via input/output (I/O) interfaces (22). Furthermore, the computer system/server (12) can communicate with one or more networks, such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet), via a network adapter (20). As depicted, the network adapter (20) communicates with the other components of the computer system/server (12) via the bus (18). It should be understood that, although not shown, other hardware and/or software components could be used in conjunction with the computer system/server (12). Examples include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems, etc.

Referring now to FIG. 2, an illustrative cloud computing environment (50) is shown. As shown, the cloud computing environment (50) comprises one or more cloud computing nodes (10) with which local computing devices used by cloud consumers, such as a personal digital assistant (PDA) or mobile phone (54A), a desktop computer (54B), a laptop computer (54C), and/or an automobile computer system (54N), can communicate. The nodes (10) can communicate with one another. They may be grouped (not shown) physically or virtually in one or more networks, such as private, community, public, or hybrid clouds as described above, or a combination thereof. This allows the cloud computing environment (50) to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices (54A-54N) shown in FIG. 2 are illustrative only and that the computing nodes (10) and the cloud computing environment (50) can communicate with any type of computerized device over any type of network and/or network-addressable connection (e.g., using a web browser).

Referring now to FIG. 3, a set of functional abstraction layers provided by the cloud computing environment (50) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 3 are illustrative only, and embodiments of the invention are not limited to them. As shown, the following layers and corresponding functions are provided: hardware and software layer (60), virtualization layer (62), management layer (64), and workload layer (66). The hardware and software layer (60) includes hardware and software components. Examples of hardware components include mainframes, in one example IBM® zSeries® systems; servers based on the RISC (Reduced Instruction Set Computer) architecture, in one example IBM pSeries® systems; IBM xSeries® systems; IBM BladeCenter® systems; storage devices; and networks and networking components. Examples of software components include network application server software, in one example IBM WebSphere® application server software; and database software, in one example IBM DB2® database software. (IBM, zSeries, pSeries, xSeries, BladeCenter, WebSphere, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide.)

The virtualization layer (62) provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients.

In one example, the management layer (64) may provide the following functions: resource provisioning, metering and pricing, user portal, service level management, and SLA planning and fulfillment. These functions are described below. Resource provisioning provides dynamic procurement of computing resources and other resources that are used to perform jobs within the cloud computing environment. Metering and pricing provide cost tracking as resources are used within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may include application software licenses. Security provides identity verification for cloud consumers and jobs, as well as protection for data and other resources. A user portal provides access to the cloud computing environment for consumers and system administrators. Service level management provides allocation and management of cloud computing resources such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.

The workload layer (66) provides examples of functionality for which the cloud computing environment may be used. Examples of workloads and functions that may be provided from this layer include, but are not limited to: mapping and navigation; software development and lifecycle management; delivery of training in virtual classroom environments; data analytics processing; job processing; and processing of one or more jobs in response to the hierarchy of virtual resources within the cloud computing environment.

A virtual machine is a software and/or hardware implementation of a computer that executes programs like a physical machine. A virtual machine supports an instance of an operating system, together with one or more applications, running in an isolated partition on the computer. In one embodiment, virtual machines enable different operating systems to run concurrently on the same computer. A physical machine can support multiple virtual machines. Multiple operating systems can run on the same physical machine, and each virtual machine can process jobs under a different operating system. Accordingly, the use of one or more virtual machines supports efficient use of hardware when processing multiple jobs.

Using a virtual machine configuration efficiently in a cloud computing system is challenging because of the distributed nature of the physical topology of the physical machines, i.e., the nodes. In particular, it is difficult to evaluate and expose the physical topology underlying the virtual topology of the cloud computing system and to leverage job processing in response to the physical and virtual topology.

A cloud platform, referred to herein as CAM, is provided to combine a cluster file system with resource scheduling. CAM takes a three-level approach to avoiding placement anomalies; the three levels are data placement, placement of the virtual machine job, and job placement. With respect to data placement, data is placed within the cluster based on offline profiling of the jobs most commonly run on that data. Job placement is influenced by CAM, which selects the best possible physical node or nodes on which to place the set of virtual machines assigned to a job. To further minimize the possibility of a placement anomaly, CAM exposes otherwise hidden compute, storage, and network topologies to the job scheduler so that it can make optimal job assignments. CAM reconciles resource allocation with a variety of other competing constraints, such as storage utilization, changing CPU load, and network link capacities, using a flow-network-based algorithm that can simultaneously reduce the specified constraints both for initial placement and for readjustment through virtual machine and data migration.

FIG. 4 is a block diagram (400) illustrating the overall architecture for using CAM. The physical resources supporting the cloud consist of a cluster of physical nodes, with local storage directly attached to the individual nodes. As shown, CAM uses a file system, which in one embodiment may be a General Parallel File System-Shared Nothing Cluster (GPFS-SNC) (410), to provide its storage layer. GPFS-SNC is designed as a cloud storage platform that supports timely and resource-efficient deployment of virtual machines in the cloud. GPFS-SNC manages the local disks directly attached to a cluster of commodity physical machines.

As shown, the physical layer (420) is illustrated with three commodity physical machines (430), (440), and (450). The number of physical machines shown here is for illustration only. In one embodiment, a smaller or larger number of commodity physical machines may be used. Each of these machines has a local disk (432), (442), and (452), respectively. The file system (410) supports co-locating all blocks of a file at one location rather than striping the file across the network. This allows a virtual machine I/O request to be served locally from the storage location instead of remotely from other physical hosts across the network. CAM uses this feature to ensure that co-located virtual machine images are stored at one location and can be accessed efficiently.

The file system (410) also supports an efficient block-level pipelined replication scheme that ensures fast distributed recovery and high I/O throughput through fast parallel reads. This feature helps CAM achieve efficient failure recovery. In addition, the file system (410) exposes a user-level application program interface (API) for querying the physical locations of files. CAM uses this API to retrieve actual block location information in order to determine storage proximity for data and virtual machine placement.

CAM uses three components to remain topology-aware. As shown, a server (460) is provided in communication with the physical layer (420). The server (460), also referred to herein as the CAM topology server, provides the topology information that a scheduler needs to make optimal job assignments. The information exposed by the server (460) consists of network and storage topologies and other dynamic node-level information, such as CPU load. As further shown, each of the physical machines (430), (440), and (450) has an agent (434), (444), and (454), respectively. Each agent runs on its physical machine and collects and transmits to the server (460) various data about that machine, such as outbound and inbound network bandwidth utilization, I/O utilization, and CPU, memory, and storage load. The server (460) consolidates the dynamic information it receives from the agents (434), (444), and (454) and processes it together with topology information about each job running in the cluster. The topology information is derived from the placement configuration of the existing virtual machines. A job scheduler (470) interfaces with the server (460) to obtain accurate and current topology information. The scheduler (470) readjusts job placement whenever a change in the configuration is observed. Accordingly, the job scheduler draws on the utilization of storage and physical host resources in CAM.
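The agent-to-server reporting described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `AgentReport` and `TopologyServer`, the field names, and the "latest report wins" consolidation rule are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AgentReport:
    """One sample sent by a per-host agent (field names are hypothetical)."""
    host: str
    net_in_mbps: float
    net_out_mbps: float
    io_util: float   # fraction of disk I/O capacity in use, 0.0 to 1.0
    cpu_load: float  # fraction of CPU capacity in use, 0.0 to 1.0


class TopologyServer:
    """Keeps the latest report per host and the VM-to-host placement."""

    def __init__(self) -> None:
        self._latest: Dict[str, AgentReport] = {}
        self._vm_host: Dict[str, str] = {}

    def place_vm(self, vm: str, host: str) -> None:
        # Record (or update) where a VM currently runs.
        self._vm_host[vm] = host

    def ingest(self, report: AgentReport) -> None:
        # Assumed rule: a newer report for the same host replaces the older one.
        self._latest[report.host] = report

    def host_report_for_vm(self, vm: str) -> Optional[AgentReport]:
        # Resolve the VM to its host, then return that host's latest report.
        host = self._vm_host.get(vm)
        return self._latest.get(host) if host is not None else None
```

A scheduler would query `host_report_for_vm` before each placement decision, so it sees the most recent load of the physical host rather than of the virtual machine alone.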

Network topology information is represented by a distance matrix that encodes the distance between each pair of virtual machines as cross-rack, cross-node, or cross-virtual-machine. When two virtual machines are placed on the same node, they are connected by a virtual network link. Because the virtual machines share the same node hardware, the virtual network provides a high-speed medium that is substantially faster than inter-node or inter-rack links. Network traffic between virtual machines on the same node does not have to traverse a hardware link. The virtual network device forwards the traffic in memory through highly optimized ring buffers.
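As an illustration of such a distance matrix, the following sketch classifies every pair of virtual machines from a placement table. The numeric codes and the `(rack, node)` placement format are assumptions made for the example; the description names the three distance classes but does not assign values to them.

```python
from typing import Dict, Tuple

# Assumed numeric codes; the text names the classes but gives no values.
SAME_VM, CROSS_VM, CROSS_NODE, CROSS_RACK = 0, 1, 2, 3


def distance_matrix(
    placement: Dict[str, Tuple[str, str]]
) -> Dict[Tuple[str, str], int]:
    """placement maps a VM name to its (rack, node) location."""
    matrix = {}
    for a, (rack_a, node_a) in placement.items():
        for b, (rack_b, node_b) in placement.items():
            if a == b:
                d = SAME_VM
            elif rack_a != rack_b:
                d = CROSS_RACK
            elif node_a != node_b:
                d = CROSS_NODE
            else:
                d = CROSS_VM  # same node: traffic stays on the virtual link
            matrix[(a, b)] = d
    return matrix
```

The rack is compared before the node so that two nodes with the same name in different racks are still treated as cross-rack.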

Storage topology information is provided as a mapping between each virtual device containing the data set and the virtual machine to which it is local. In a native hardware context, a disk attached to a node can be accessed directly over a PCI bus. In the cloud, however, the physical blocks belonging to a virtual machine image attached to a virtual machine may reside on a different node. Even though a virtual device may appear to be directly attached to the virtual machine, the image file backing the device may be located elsewhere on the network, and possibly closer to another virtual machine in the cluster than to the virtual machine it is directly attached to. The server (460) queries the physical image location through an API and provides the information to the scheduler (470).
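One way to picture this mapping is the sketch below, which derives, for each virtual device, the virtual machines running on the node that physically holds the device's backing image. The function name and both input dictionaries are hypothetical; in the system described, the image location would come from the file system's location API rather than from a dictionary.

```python
from typing import Dict, List


def local_vms_by_device(
    image_node: Dict[str, str], vm_node: Dict[str, str]
) -> Dict[str, List[str]]:
    """Map each virtual device to the VMs running on the node that holds
    the device's backing image file.

    image_node: device name -> node storing the backing image
    vm_node:    VM name     -> node hosting the VM
    """
    vms_on_node: Dict[str, List[str]] = {}
    for vm, node in vm_node.items():
        vms_on_node.setdefault(node, []).append(vm)
    return {
        device: sorted(vms_on_node.get(node, []))
        for device, node in image_node.items()
    }
```

A device whose image sits on a node with no virtual machines maps to an empty list, which signals that every access to it is remote.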

Examples of specific APIs provided by the server (460) are described in Table 1 below:

Table 1

| API | Description |
|---|---|
| int get_vm_distance(string vm1, string vm2) | Returns the distance between two virtual machines |
| struct block_location get_block_location(string src, long offset, long length) | Returns the actual location of blocks |
| int get_vm_networkinfo(string vm, struct networkinfo) | Returns network utilization information of the physical host on which the virtual machine runs |
| int get_vm_diskinfo(string vm, string device, struct diskinfo) | Returns disk utilization information |
| int get_vm_cpuinfo(string vm, struct cpuinfo) | Returns CPU utilization information of the physical host on which the virtual machine runs |

The get_vm_distance call gives the job scheduler (470) hints about the network distance between two virtual machines. The distance is estimated based on observed data transfer rates between the virtual machines, and in one embodiment it is expressed in units of bandwidth. The get_block_location call supports retrieval of the actual block location rather than the virtual machine location, which ensures locality. The get_vm_networkinfo, get_vm_diskinfo, and get_vm_cpuinfo calls make it easier for the job scheduler (470) to query information about I/O and CPU contention relating to network and disk utilization. In one embodiment, the scheduler (470) can use this additional information to make more intelligent decisions, including, in one embodiment, placing I/O-intensive jobs on physical hosts with idle I/O resources.
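The last point, steering I/O-intensive jobs toward hosts with idle I/O resources, might look like the following sketch. It is a simplified illustration: `disk_util_of` is a hypothetical stand-in for a lookup such as get_vm_diskinfo (whose return structure the text does not specify), and the `busy_threshold` cut-off is an assumption.

```python
from typing import Callable, Iterable, Optional


def pick_vm_for_io_job(
    candidate_vms: Iterable[str],
    disk_util_of: Callable[[str], float],
    busy_threshold: float = 0.8,
) -> Optional[str]:
    """Return the candidate VM whose host reports the lowest disk
    utilization, or None if every host is at or above busy_threshold."""
    best_vm, best_util = None, busy_threshold
    for vm in candidate_vms:
        util = disk_util_of(vm)  # fraction of disk I/O capacity in use
        if util < best_util:
            best_vm, best_util = vm, util
    return best_vm
```

Returning `None` when all hosts are busy leaves the fallback decision (queue the job, or place it anyway) to the caller.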

FIG. 5 is a flowchart (500) illustrating resource allocation, data management, and virtual machine placement. As described, CAM is a cloud platform with specific interfaces that supports running MapReduce jobs. A data set to be processed is initially placed on a general-purpose file system (GPFS, General Purpose Filesystem). In particular, the storage and compute resources are not isolated from each other. A MapReduce job is submitted by providing the application, e.g., the relevant Java class files, which identify a previously uploaded data set for the job, together with the number and type of virtual machines to be used for the job (502). In one embodiment, each virtual machine supports multiple MapReduce job slots, depending on the number of virtual CPUs and the amount of virtual RAM allocated to it. The more virtual machines are assigned to a job, the faster the job completes.

CAM determines an optimal placement for the set of requested new virtual machines (504) by considering factors such as, but not limited to, the current workload distribution among the cluster nodes, the distribution of the input data sets required by the job, and the physical locations of the required virtual machine master images. The virtual machine images needed to boot the virtual machines on the selected nodes are created from the respective master images (506). In one embodiment, a copy-on-write mechanism provided by the general purpose file system is used to create the virtual machine images, because it can quickly provision an instance of a virtual machine image without requiring a data copy of the master image. Following step (506), the job's class files are copied onto a cloned virtual machine image (508). In one embodiment, the copying at step (508) is performed by mounting the image as a loopback file system. Finally, the data images are attached to the virtual machine and the corresponding files are mounted inside the virtual machine (510) so that jobs can access the data they contain.

Reference is now made to FIG. 4, in which the physical machines (430), (440), and (450) are equipped with local disks (432), (442), and (452), respectively. A distributed file system (410), also referred to herein as GPFS-SNC, is installed on the physical machines (430), (440), and (450). The virtual machine images (436), (446), and (456) are stored in the distributed file system (410). In one embodiment, a cloud manager allocates resources to jobs and manages data placement and virtual machine placement.

The placement of data and virtual machines must be considered with respect to cost. In particular, placement takes into account ensuring virtual machine proximity, avoiding hotspots, and balancing physical storage utilization in view of the different job types. Virtual machine proximity expresses the cost of data access and captures how data should be placed relative to the associated virtual machines in order to minimize access times. The hotspot factor expresses the expected load on a machine and identifies machines that do not have enough computing resources to support the virtual machines assigned to them. To avoid a hotspot, the data must be placed on the least loaded machine. This machine can be determined by measuring the computing resources currently allocated on each machine and adding the expected allocation demand of the virtual machines that would work with the data to be placed on that machine. Finally, storage utilization indicates the percentage of the physical machine's total storage space that is in use.

Table 2 below shows the relevance of the three factors described above, which affect performance under different workloads.

Table 2

| Job type | Virtual machine proximity | Hotspot factor | Storage utilization |
| --- | --- | --- | --- |
| Map and reduce intensive | Yes | Yes | Yes |
| Map intensive | No | Yes | Yes |
| Reduce intensive | No | No | Yes |

For workloads that are both map intensive and reduce intensive, the associated data should be placed close together and on the least loaded machine. For map-intensive workloads, the data should be placed on the least loaded machine but need not be placed close together, since such a workload produces only light shuffle traffic. For reduce-intensive workloads, only the storage utilization of the machine on which the virtual machine is to be placed needs to be considered. For all workload types, it is desirable to distribute the data evenly across the racks in order to minimize the need to rearrange data over time to support virtual machine migration.
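The job-type rules of Table 2 can be sketched as a simple lookup. The dictionary keys and the function name are illustrative names chosen for this sketch; only the Yes/No entries come from the table.

```python
# Which placement factors matter for each job type, transcribed from Table 2.
FACTORS_BY_JOB_TYPE = {
    "map_and_reduce_intensive": {
        "vm_proximity": True, "hotspot": True, "storage_utilization": True,
    },
    "map_intensive": {
        "vm_proximity": False, "hotspot": True, "storage_utilization": True,
    },
    "reduce_intensive": {
        "vm_proximity": False, "hotspot": False, "storage_utilization": True,
    },
}


def relevant_factors(job_type):
    """Return the sorted names of the factors that apply to job_type."""
    row = FACTORS_BY_JOB_TYPE[job_type]
    return sorted(name for name, applies in row.items() if applies)
```

For a reduce-intensive job, `relevant_factors` returns only the storage utilization factor, matching the last row of the table.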

The factors listed in Table 2 are used to build a min-cost flow graph in which the factors are encoded. FIG. 6 depicts a flow graph (600) illustrating an example network topology for data placement. Specifically, the example network topology consists of six physical nodes (p₁, p₂, p₃, p₄, p₅, p₆), identified as (610), (612), (614), (616), (618), and (620). The physical nodes are organized into three racks (r₁, r₂, r₃), designated (630), (632), and (634). A master rack r₄, such as a switch, designated (650), connects the racks. In one embodiment, the approach shown here can support any topology in which the network traffic can be estimated. The unit of data placement is a virtual machine image, which ensures that a complete image is available at one location.

Based on the flow graph of FIG. 6, FIG. 7 depicts a second flow graph (700) for an example placement of data. Two data items, d₁ (702) and d₂ (704), with requests for five and two virtual machine images, respectively, are submitted to the cloud, e.g. a shared pool of resources. The number of virtual machine images requested by a data item is referred to as the supply of that data item in the flow graph. A sink node S (790) is added to the graph to support virtual machines. The number of virtual machines that a sink node can handle is assigned as its demand value. In the example shown here, the sink node (790) has a demand value of 7, and it is the only place that can receive all of the flows. Each edge of the flow graph has two associated parameters: the capacity of the edge and the cost of a flow passing through that edge. The data nodes d₁ (702) and d₂ (704) have outgoing links to each rack, with virtual machine proximity as the cost. As shown, data node d₁ (702) has a link (712) to rack r₁ (722), a link (714) to rack r₂ (724), a link (716) to rack r₃ (726), and a link (718) to rack r₄ (728); data node d₂ (704) has a link (732) to rack r₁ (722), a link (734) to rack r₂ (724), a link (736) to rack r₃ (726), and a link (738) to rack r₄ (728).

Six physical nodes are shown, namely p₁ (740), p₂ (742), p₃ (744), p₄ (746), p₅ (748), and p₆ (750). The hotspot factor is encoded in the links (750), (752), (754), (756), (758), and (760) from the racks to each physical node within their scope. Although r₄ (728) acts as a switch between the racks, it is shown as directly connected to all physical nodes to ensure that the least loaded machine can be selected for map-intensive jobs without being constrained by the network topology.

All physical nodes p₁ (740), p₂ (742), p₃ (744), p₄ (746), p₅ (748), and p₆ (750) are connected to the sink node (790), with storage utilization as the link cost. There is no direct link from a data item node d_j to the associated physical host p_i. The reason is to support scaling up of the system. Table 3 below contains the values assigned to the flow graph for data placement.

Table 3

| | Data set d_j | Rack r_k | Physical host | Sink |
| --- | --- | --- | --- | --- |
| Supply | Σ(Nd_j) | 0 | 0 | −Σ(Nd_j) |
| Incoming link from | N/A | Data set d_j | Rack r_k | Physical host |
| Outgoing link (capacity, cost) | To rack (Nd_j, a_jk) | To physical host (Cap_i, β_i) | To sink (Cap_i, γ_i) | N/A |

As shown, Nd_j is the number of virtual machine images requested by the data set d_j, and a_jk captures the virtual machine proximity. The cost a_jk of the outgoing link from the data set d_j, for placing the data on a physical host p_i in rack r_k, is estimated conservatively from the traffic in the shuffle phase as follows:

$a_{jk} = \text{size}_{\text{intermediate-data}} * \frac{\text{num}_{\text{Reducer}} - 1}{\text{num}_{\text{Reducer}}} * \text{distance}_{max}$

where distance_max is the network distance between any two nodes in rack r_k.
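A worked sketch of the proximity cost, assuming the fraction in the formula reads (num_Reducer − 1) / num_Reducer, i.e. the share of intermediate data that must travel to reducers other than the local one. The function and parameter names are illustrative.

```python
def vm_proximity_cost(intermediate_data_size, num_reducers, distance_max):
    """a_jk = size_intermediate-data * (num_Reducer - 1) / num_Reducer * distance_max."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    # Fraction of the intermediate data that is shuffled to other reducers.
    shuffled_fraction = (num_reducers - 1) / num_reducers
    return intermediate_data_size * shuffled_fraction * distance_max
```

With 100 units of intermediate data, four reducers, and distance_max = 2, the cost is 100 * 3/4 * 2 = 150. With a single reducer nothing is shuffled and the cost is 0.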

The hotspot factor is captured by β_i for the physical node p_i and is estimated from the current and expected load as follows:

$\beta_{i} = a * (load_{exp} + load_{curr} - load_{min})$

where load_curr and load_min are the current load and the minimum current load, respectively, and a is a parameter that serves as a knob for tuning the weight of the hotspot factor relative to the other costs. The expected load, load_exp, is:

$load_{exp} = \sum_{j} \frac{p_{j}}{1 - p_{j}} * CRes(d_{j})$

where p_j = λ_j / µ_j, λ_j is the number of jobs associated with d_j that arrive in a given time interval, and µ_j is the mean time for each virtual machine to process a block.
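The two load formulas can be sketched as follows. The text does not define CRes(d_j); the sketch simply takes it as a given number per data set, and the tuple layout of `datasets` is an assumption made for illustration.

```python
def expected_load(datasets):
    """load_exp = sum over j of p_j / (1 - p_j) * CRes(d_j), with p_j = lambda_j / mu_j.

    datasets is an iterable of (lambda_j, mu_j, cres_j) tuples.
    """
    total = 0.0
    for lam, mu, cres in datasets:
        p = lam / mu
        if not 0 <= p < 1:
            raise ValueError("p_j must lie in [0, 1) for the term to be finite")
        total += p / (1 - p) * cres
    return total


def hotspot_cost(a, load_exp, load_curr, load_min):
    """beta_i = a * (load_exp + load_curr - load_min)."""
    return a * (load_exp + load_curr - load_min)
```

For a single data set with λ = 1, µ = 2, and CRes = 4, p is 0.5 and the expected load is 0.5 / 0.5 * 4 = 4. With a = 2, a current load of 3, and a minimum load of 1, β_i is 2 * (4 + 3 − 1) = 12.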

The storage utilization of a physical node p_i is captured by γ_i, which is determined by the current storage utilization compared with the minimum storage utilization of all p_i as follows:

$\gamma_{i} = b * (storageUtilization_{p_{i}} - storageUtilization_{min})$

where b is a parameter for tuning the weight of storage utilization relative to the other factors. Finally, the capacity of each physical host is estimated with the following formula:

$Cap_{i} = freespace_{p_{i}} / size_{VMImg}$
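A short sketch of the storage cost and the capacity estimate. Rounding the capacity down to whole images is an assumption of this sketch; the formula itself only gives the quotient.

```python
def storage_cost(b, utilization, min_utilization):
    """gamma_i = b * (storageUtilization_p_i - storageUtilization_min)."""
    return b * (utilization - min_utilization)


def host_capacity(free_space, vm_image_size):
    """Cap_i = freespace_p_i / size_VMImg, rounded down to whole images."""
    return int(free_space // vm_image_size)
```

A host with 100 units of free space and an image size of 30 units can hold three images. With b = 2, a utilization of 0.75, and a minimum of 0.25 across hosts, γ_i is 1.0.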

To capture in the graph the correlation between the placements of the virtual machine images for a data request, a split factor parameter is provided to indicate whether the flows from a node can be split across different links. In one embodiment, the value of this parameter is defined as true or false. For example, if the split factor of all links from d₁ and d₂ is true, all flows from a data node pass through exactly one of r₁, r₂, r₃, r₄ and are not split among the racks. Whenever a new request to upload data is received, the cloud server updates the graph and computes a globally optimal solution for the newly updated data based on the previously computed solution. Accordingly, the scheduler is updated periodically based on the new solution and can account for varying loads.
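The data placement graph described above (supplies at the data nodes, rack and host edges carrying the proximity, hotspot, and storage costs, and a single sink) can be solved with any min-cost flow algorithm. The sketch below uses successive shortest paths with Bellman-Ford, which is an illustrative choice and not something the text prescribes. The node numbering, the added super-source, and all capacities and costs are made-up values, and the split factor (unsplittable flows) is not modelled.

```python
def min_cost_flow(num_nodes, edges, source, sink, required_flow):
    """Route required_flow units from source to sink at minimum total cost.

    edges is a list of (u, v, capacity, cost) tuples with u != v.
    Returns (total_cost, flows), where flows[i] is the flow on edges[i].
    """
    inf = float("inf")
    # Residual graph: graph[u] holds [to, residual_capacity, cost, reverse_index].
    graph = [[] for _ in range(num_nodes)]
    forward = []  # (u, index in graph[u], original capacity) per input edge
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
        forward.append((u, len(graph[u]) - 1, cap))

    total_cost = 0
    remaining = required_flow
    while remaining > 0:
        # Bellman-Ford finds the cheapest augmenting path in the residual graph.
        dist = [inf] * num_nodes
        prev = [None] * num_nodes
        dist[source] = 0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == inf:
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            raise ValueError("the requested flow cannot be routed")

        # Find the bottleneck capacity on the path, then push flow along it.
        push = remaining
        v = sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total_cost += push * dist[sink]
        remaining -= push

    flows = [cap - graph[u][i][1] for u, i, cap in forward]
    return total_cost, flows


# Node ids: 0 super-source, 1-2 data sets d1 and d2, 3-4 racks r1 and r2,
# 5-7 physical hosts p1 to p3, 8 sink. All numbers are illustrative.
EXAMPLE_EDGES = [
    (0, 1, 5, 0), (0, 2, 2, 0),                # supplies of 5 and 2 images
    (1, 3, 5, 1), (1, 4, 5, 3),                # d1 -> racks: (Nd_j, a_jk)
    (2, 3, 2, 2), (2, 4, 2, 1),                # d2 -> racks: (Nd_j, a_jk)
    (3, 5, 3, 1), (3, 6, 3, 2), (4, 7, 4, 1),  # rack -> host: (Cap_i, beta_i)
    (5, 8, 3, 0), (6, 8, 3, 1), (7, 8, 4, 0),  # host -> sink: (Cap_i, gamma_i)
]


def solve_example():
    """Place the seven requested images of the toy example."""
    return min_cost_flow(9, EXAMPLE_EDGES, source=0, sink=8, required_flow=7)
```

The super-source is only a device for expressing several supplies as one source. In the toy example the seven images are placed at a total cost of 18.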

The goal of virtual machine placement is to maximize global data locality and job throughput. Our model considers both virtual machine migration and delayed scheduling of a job as part of the optimal solution. Delaying a job is used to explore better data locality opportunities that may arise in the near future, while minimizing the time spent waiting. By migrating a virtual machine belonging to a job, the scheduler can make room for other suitable jobs or find better placement opportunities.

FIG. 8 depicts a flow graph (800) for virtual machine placement. Each job v_j, (802) and (804), is submitted to the system at the source node with the number of requested virtual machines, N_vj, as its supply value. The goal of the virtual machine allocator is either to keep the job unscheduled or to satisfy it in full, i.e. to assign each request either 0 or N_vj virtual machines. There is a sink node S (890) with a demand equal to the negative of the sum of the supply values. The request of each job acts as a flow that passes either through the rack nodes (810), (812), (814), and (816) or through the unscheduled nodes (820), (822), and (824), and finally to the sink node (890). If a job is unscheduled, none of its virtual machines are allocated. Otherwise, the flow passes through the physical nodes (830), (832), (834), (836), (838), and (840). An allocation scheme can be derived from the min-cost solution. If virtual machines are assigned to the top-level rack, this indicates that they may be assigned arbitrarily to any set of nodes under that rack.

The job type information is modeled as the cost of the edge from each job to the rack nodes in the flow-based graph. A higher-level rack has a higher cost in terms of reduce traffic than a lower-level rack. In one embodiment, the cost of the top-level rack is estimated by a worst-case placement of the virtual machines with respect to map and reduce traffic. The costs of the edges to the unscheduled nodes are set so that they increase over time, so that delayed jobs are allocated sooner than recently submitted jobs. These costs also control how long a job holds off and waits for better locality, and therefore provide a knob for tuning the trade-off between data locality and latency. The unscheduled aggregator node controls how many virtual machines may remain unscheduled, in order to control the trade-off between system resource utilization and data locality. The cost of the edges to the running node set increases over time and takes into account the progress of job execution.

FIG. 9 depicts a table (900) of the values assigned to the flow graph for virtual machine placement, together with the node categorization. The nodes in the graph are categorized into different types, as shown in the table. A preferred node set (pr_j) (910) is a set of graph nodes pointing to the set of physical nodes p_i on which a data set associated with job v_j is stored. An edge from a preferred node to a physical node p_i has cost 0 and a capacity equal to the number of virtual machine disk images stored on physical node p_i. A running node set (ru_j) (920) is a set of dynamically added nodes pointing to the physical nodes p_i that currently host the virtual machines of job v_j. An edge from ru_j to p_i has cost 0 and a capacity equal to the number of virtual machines running on physical node p_i. An unscheduled node set u_j (930) is a node set that provides information about currently unscheduled jobs. The unscheduled node set u_j has an outgoing edge with capacity N_vj and cost 0 to an unscheduled aggregator. An unscheduled aggregator node u (940) has an outgoing edge with cost 0 to the sink node, with a capacity defined as:

$N_{unsched} = \sum (N_{vj}) - M + M_{idle}$

where M is the total number of virtual machines that can be supported in the cluster, and M_idle is the number of idle virtual machine slots permitted in the cluster.
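A one-line sketch of the aggregator capacity. The function and parameter names are illustrative.

```python
def unscheduled_capacity(requested_vms, total_vm_slots, idle_slots_allowed):
    """N_unsched = sum(N_vj) - M + M_idle."""
    return sum(requested_vms) - total_vm_slots + idle_slots_allowed
```

With three jobs requesting 5, 2, and 4 virtual machines, M = 8 supported virtual machines, and M_idle = 1 permitted idle slot, the edge to the sink admits 11 − 8 + 1 = 4 unscheduled virtual machines.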

The rack node set r_k (950) represents a rack in the topology of the cluster. It has outgoing links with cost 0 to its subordinate racks or, if it is at the lowest level, to physical nodes. The links have capacity N_rk, which is the total number of virtual machine slots that can be served by the underlying nodes. The physical host node set p_i (960) has an outgoing link to the sink node with cost 0 and a capacity N_vm, the number of virtual machines that can reside on the physical host. The graph has a sink node (970) with a demand value represented as Σ(N_vj). The job node set v_j (980) represents each job node v_j with supply N_vj. It has multiple outgoing edges corresponding to the possible virtual machine allocation decisions for the job.

The edges include a rack node set, a preferred node set, a running node set, and an unscheduled node set. For the rack node set, r_k, job v_j has an edge to r_k, indicating that r_k can accommodate v_j. The cost of the edge is p_j, which is calculated from the map and reduce traffic costs. If the capacity of the edge is greater than N_vj, this indicates that the virtual machines of v_j are assigned to some preferred nodes in the rack. For the preferred node set, p_rj, there is an edge from job v_j to the preferred node set for the entire job, p_rj, with capacity N_vj and cost θ. This cost is estimated from the reduce-phase traffic only, since the map traffic is assumed to be 0 in this case. For the running node set, r_uj, there is a link from job v_j with capacity N_vj and cost ϕ = c * T, where T is the time the job has been running on the set of machines and c is a constant that scales this cost relative to the other costs. For the unscheduled node set, u_j, there is an edge to the job-wide unscheduled node u_j with capacity N_vj and cost ε_j, the penalty for leaving job v_j unscheduled. ε_j = d * T, where T is the time the job v_j has remained unscheduled and d is a constant that scales this cost relative to the other costs. The split factor for this link is marked as true, meaning that the assignments of all virtual machines are either satisfied or deferred to the next round.
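The four kinds of outgoing edges for a job can be written down as plain records. The sketch below is illustrative only: the `Edge` type, the `job_edges` helper, the node-name prefixes, and the choice to give rack edges capacity N_vj are assumptions made for the example, not details taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    capacity: int
    cost: float
    all_or_nothing: bool = False  # stands in for the "split factor" flag


def job_edges(job, n_vj, rack_costs, theta, run_time, wait_time, c=1.0, d=1.0):
    """Build the outgoing edges of job node v_j (hypothetical helper).

    rack_costs: {rack name: edge cost p_j for that rack}
    theta:      cost of the edge to the preferred node set
    run_time:   T for the running edge, cost phi = c * T
    wait_time:  T for the unscheduled edge, cost epsilon_j = d * T
    """
    # Assumption: each rack edge is given capacity N_vj.
    edges = [Edge(job, f"rack:{rack}", n_vj, cost)
             for rack, cost in rack_costs.items()]
    edges.append(Edge(job, f"preferred:{job}", n_vj, theta))
    edges.append(Edge(job, f"running:{job}", n_vj, c * run_time))
    edges.append(Edge(job, f"unscheduled:{job}", n_vj, d * wait_time,
                      all_or_nothing=True))
    return edges


example_edges = job_edges("v1", 3, {"r1": 2.0}, theta=1.5,
                          run_time=10, wait_time=4, c=0.5, d=2.0)
for edge in example_edges:
    print(edge)
```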

Based on the output of a minimum-cost flow solution, the virtual machine assignment can be read from the graph by determining where the flow associated with each virtual machine request of v_j leads. Flow to an unscheduled node indicates that the virtual machine request is skipped in the current round. If the flow leads to a preferred node set, the virtual machine request is scheduled on that node set. If the flow goes to a rack node, the job's virtual machines are assigned to arbitrary hosts in that rack.
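The three cases above amount to a lookup on the kind of node each unit of flow reaches. A minimal sketch, assuming flow destinations are encoded as hypothetical `kind:name` strings:

```python
def decode_assignment(flow_targets):
    """Translate flow destinations into scheduling decisions.

    flow_targets: {request id: "unscheduled:<job>" | "preferred:<set>" | "rack:<rack>"}
    The string encoding is an assumption made for this example.
    """
    decisions = {}
    for request, target in flow_targets.items():
        kind, _, name = target.partition(":")
        if kind == "unscheduled":
            decisions[request] = ("skip_this_round", None)
        elif kind == "preferred":
            decisions[request] = ("schedule_on_preferred_set", name)
        elif kind == "rack":
            decisions[request] = ("schedule_on_any_host_in_rack", name)
        else:
            raise ValueError(f"unknown flow target: {target}")
    return decisions


example_decisions = decode_assignment({
    "v1/vm0": "rack:r1",
    "v1/vm1": "preferred:v1",
    "v2/vm0": "unscheduled:v2",
})
print(example_decisions)
```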

The number of flows routed to a physical host through rack nodes or a preferred node set is less than the number of available virtual machines on that physical host. This is ensured by the specified capacity of the link from the physical host to the sink node. Accordingly, every virtual machine request that is assigned is mapped to a corresponding physical host.
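To make the flow formulation concrete, the sketch below solves a tiny instance with a textbook successive-shortest-path minimum-cost flow (Bellman-Ford on the residual graph). It is not the solver used in the source, which is not specified here. The super source, the node names, the costs, and the link from each unscheduled node to the sink are assumptions, and the all-or-nothing split factor is not modeled.

```python
from collections import defaultdict


class MinCostFlow:
    def __init__(self):
        self.graph = defaultdict(list)  # node -> indices into self.edges
        self.edges = []                 # [head, residual capacity, cost]

    def add_edge(self, u, v, cap, cost):
        # Forward edge at an even index, its reverse edge at index ^ 1.
        self.graph[u].append(len(self.edges))
        self.edges.append([v, cap, cost])
        self.graph[v].append(len(self.edges))
        self.edges.append([u, 0, -cost])

    def solve(self, source, sink):
        flow = cost = 0
        nodes = list(self.graph)
        while True:
            dist, prev = {source: 0}, {}
            for _ in range(len(nodes)):          # Bellman-Ford relaxation
                changed = False
                for u in nodes:
                    if u not in dist:
                        continue
                    for ei in self.graph[u]:
                        v, cap, c = self.edges[ei]
                        if cap > 0 and dist[u] + c < dist.get(v, float("inf")):
                            dist[v], prev[v], changed = dist[u] + c, ei, True
                if not changed:
                    break
            if sink not in dist:
                return flow, cost
            push, v = float("inf"), sink         # bottleneck along the path
            while v != source:
                push = min(push, self.edges[prev[v]][1])
                v = self.edges[prev[v] ^ 1][0]
            v = sink
            while v != source:                   # augment along the path
                self.edges[prev[v]][1] -= push
                self.edges[prev[v] ^ 1][1] += push
                v = self.edges[prev[v] ^ 1][0]
            flow += push
            cost += push * dist[sink]


# Two jobs request 2 virtual machines each; one rack with two single-slot hosts.
g = MinCostFlow()
for job, rack_cost, penalty in (("v1", 3, 10), ("v2", 1, 4)):
    g.add_edge("source", job, 2, 0)           # super source supplies N_vj
    g.add_edge(job, "r1", 2, rack_cost)       # job -> rack
    g.add_edge(job, f"u:{job}", 2, penalty)   # job -> unscheduled node
    g.add_edge(f"u:{job}", "sink", 2, 0)      # assumed link to the sink
for host in ("p1", "p2"):
    g.add_edge("r1", host, 1, 0)              # rack -> host, cost 0
    g.add_edge(host, "sink", 1, 0)            # host capacity: 1 slot

example_flow, example_cost = g.solve("source", "sink")
print(example_flow, example_cost)
```

In this instance the job with the higher unscheduled penalty (v1) ends up with both host slots, and the host-to-sink capacities cap the number of requests placed on each host.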

FIG. 10 is a flow chart (1000) illustrating a process for evaluating and using the topology of the physical and virtual machines in a shared pool of resources. The status of the virtual machines in the shared pool is collected (1002). One or more virtual machines can be assigned to a single physical host, thereby supporting virtualization of an underlying physical machine. Based on the collected data, associated topology information is gathered and communicated to a server at a root node of the hierarchical organization of the physical and virtual machines (1004).

Each virtual machine is provided with an integrated agent, and each physical machine to which the virtual machines are assigned is provided with an integrated monitor. A server machine is provided in communication with each of the physical machines, and its function is to periodically collect information from the integrated monitors of the underlying physical machines. The function of the integrated monitors is to gather local topology, disk, and network information. In one embodiment, the integrated monitors take the form of software running on the physical machine to collect status data from the associated virtual machines. Likewise, the integrated agent of each virtual machine communicates the specific topology and system usage information to the server machine. Accordingly, each virtual and physical machine contains integrated tools to gather topology-related usage information and to transmit the gathered information to the server machine.
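The agent, monitor, and server roles described above can be sketched as three small classes. Everything here is hypothetical: the class and field names are invented for illustration, and for brevity the agents report through their host's monitor rather than directly to the server.

```python
class Agent:
    """Runs inside a virtual machine and reports its usage (assumed fields)."""

    def __init__(self, vm_id, cpu_util):
        self.vm_id = vm_id
        self.cpu_util = cpu_util

    def report(self):
        return {"vm": self.vm_id, "cpu_util": self.cpu_util}


class Monitor:
    """Runs on a physical machine and gathers local disk, network and VM data."""

    def __init__(self, host_id, agents, disk_free_gb, net_util):
        self.host_id = host_id
        self.agents = agents
        self.disk_free_gb = disk_free_gb
        self.net_util = net_util

    def snapshot(self):
        return {
            "host": self.host_id,
            "disk_free_gb": self.disk_free_gb,
            "net_util": self.net_util,
            "vms": [agent.report() for agent in self.agents],
        }


class Server:
    """Root node that keeps the collected topology in a single place."""

    def __init__(self, monitors):
        self.monitors = monitors
        self.topology = {}

    def poll(self):
        # Meant to be called periodically; here it is invoked once by hand.
        for monitor in self.monitors:
            snap = monitor.snapshot()
            self.topology[snap["host"]] = snap


example_server = Server([
    Monitor("host-a", [Agent("vm-1", 0.4), Agent("vm-2", 0.1)], 120, 0.3),
    Monitor("host-b", [Agent("vm-3", 0.9)], 80, 0.7),
])
example_server.poll()
print(sorted(example_server.topology))
```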

After the topological data is gathered in step (1004), the gathered data is organized at a single location (1006). In one embodiment, the single location may be a root node representing a physical server that is in communication with each of the virtual machines and their associated physical machine. As shown in step (1004), the data communicated by the integrated agents of the virtual machines contains the specific topology and system usage information. The data gathered in step (1004) comprises a storage topology underlying a virtual topology of the shared pool of resources, together with the associated resource usage information. With knowledge of the gathered data, a job is assigned to a selected virtual machine in the shared pool, and the assignment uses the organized storage topology information (1008). In one embodiment, the job assignment in step (1008) is designed to support efficient performance of the I/O associated with the job. Accordingly, through the process of gathering and organizing the local topology information, jobs can be intelligently assigned to a selected virtual machine.

The job assignment in step (1008) can be to one virtual machine or to multiple virtual machines. Likewise, the job can be a read job or a write job. In both scenarios, a virtual topological distance is returned in response to the job assignment (1010). The virtual topological distance can be a distance between two or more virtual machines if the job is assigned to multiple machines, or a distance between a virtual machine and a data block if the job uses a single virtual machine for a read or write job. Following step (1010), it is determined whether the returned topological distance is between at least two virtual machines (1012). If the response to the determination in step (1012) is positive, a shared memory channel is created for inter-virtual machine data communication between two virtual machines that are local to the same physical machine (1014). Creating the shared memory channel supports efficient data transfer between the two virtual machines. In particular, a memory copy can be used for communication between the virtual machines, thereby avoiding communication over a virtual network stack (1016). Conversely, if the response to the determination in step (1012) is negative, the virtual network stack is used for communication between the virtual machines supporting the assigned job (1018). Accordingly, the physical proximity of the virtual machines can contribute to efficient inter-virtual machine communication.
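The channel decision in steps (1012) to (1018) can be reduced to a co-location test. The following is a minimal sketch, assuming a hypothetical `host_of` mapping from virtual machine to physical host; it only names the channel and does not create one.

```python
def choose_channel(vm_a, vm_b, host_of):
    """Pick the communication channel for two virtual machines.

    host_of: {virtual machine id: physical host id} (assumed input)
    """
    if host_of[vm_a] == host_of[vm_b]:
        # Same physical machine: a memory copy avoids the virtual network stack.
        return "shared_memory"
    return "virtual_network_stack"


example_hosts = {"vm-1": "host-a", "vm-2": "host-a", "vm-3": "host-b"}
print(choose_channel("vm-1", "vm-2", example_hosts))
print(choose_channel("vm-1", "vm-3", example_hosts))
```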

A physical machine supports the virtual machine, and data blocks support the job. The location of the data blocks in the shared pool affects the assignment of the job to the physical machine and the associated virtual machine(s). In particular, efficient use of the resources in the shared pool ensures physical proximity of the physical machine to the data blocks. In one embodiment, the job is assigned to a physical machine in the same physical data center as the dependent data blocks. Accordingly, part of the job assignment process in step (1008) is to ensure that a physical location of the data blocks in the shared pool supports the job.

In addition to the location of the blocks, the bandwidth of the underlying physical machine is critical to supporting the job. The step of using the storage topology information in step (1008) may require one or more additional steps. FIG. 11 is a flow chart (1100) illustrating the additional steps that support the storage topology utilization aspect of the shared pool of resources. As described above, in response to the storage topology, a virtual machine is identified to receive the job for processing (1102). Before the actual job processing, usage information of the physical machine that is local to the virtual machine is determined (1104). The usage information includes, but is not limited to, processing unit and network usage information. It is determined whether the underlying physical machine has the bandwidth and capability to support the job (1106). A negative response to the determination in step (1106) is followed by a return to the selection and assignment of a different virtual machine in the shared pool. In contrast, a positive response to the determination in step (1106) indicates that the selected virtual machine has both sufficient bandwidth to support the job and a close topological distance to the physical location of the data block(s) supporting the job (1108). In one embodiment, the close topological distance includes, but is not limited to, data located in the same data center as the virtual machine. Accordingly, the storage topology utilization aspect includes an evaluation of the operation of the machine together with the location of the data supporting the job.
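The loop of FIG. 11 can be sketched as a scan over candidate virtual machines that stops at the first one whose host has enough headroom. The candidate ordering, the `cpu_free` and `net_free` fields, and the thresholds are assumptions for the example; the source does not define how bandwidth and capability are measured.

```python
def select_vm(candidates, host_of, host_usage, job_cpu, job_net):
    """Return the first candidate whose physical host can support the job.

    candidates: virtual machines ordered by topological proximity to the
                job's data blocks (assumed to be pre-sorted)
    host_usage: {host id: {"cpu_free": float, "net_free": float}}
    Returns None if no candidate qualifies.
    """
    for vm in candidates:                      # step 1102: identify a VM
        usage = host_usage[host_of[vm]]        # step 1104: host usage data
        if usage["cpu_free"] >= job_cpu and usage["net_free"] >= job_net:
            return vm                          # step 1108: VM accepted
        # Step 1106 answered "no": fall through to the next candidate.
    return None


example_host_of = {"vm-1": "host-a", "vm-3": "host-b"}
example_usage = {
    "host-a": {"cpu_free": 0.1, "net_free": 0.5},
    "host-b": {"cpu_free": 0.6, "net_free": 0.4},
}
example_choice = select_vm(["vm-1", "vm-3"], example_host_of, example_usage,
                           job_cpu=0.3, job_net=0.2)
print(example_choice)
```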

As shown in FIGS. 10-11, a method is provided for using the topological organization of the machines, together with the location of the data blocks supporting the job, to intelligently assign a job. The job is assigned to a machine that has been evaluated as supporting efficient processing. FIG. 12 is a block diagram (1200) illustrating tools integrated in a computer system to support a technique for evaluating resource utilization when assigning a job in a shared pool of resources. In particular, a shared pool of configurable computer resources is shown with a first data center (1210), a second data center (1230), and a third data center (1250). Although three data centers are shown in the example, the invention should not be limited to this number of data centers in the computer system. Each of the data centers represents a computing resource. Accordingly, one or more data centers can be employed to support the efficient and intelligent assignment of jobs with respect to resource utilization and proximity to the data blocks supporting the job(s).

Each of the data centers in the system is provided with at least one server in communication with data storage. In particular, the first data center (1210) is provided with a first server (1220) having a processing unit (1222) in communication with memory (1224) across a bus (1226), and in communication with data storage (1228); the second data center (1230) is provided with a second server (1240) having a processing unit (1242) in communication with memory (1244) across a bus (1246), and in communication with second local storage (1248); and the third data center (1250) is provided with a third server (1260) having a processing unit (1262) in communication with memory (1264) across a bus (1266), and in communication with third local storage (1268). The first server (1220) is also referred to in this document as a physical host. Communication between the data centers is supported over one or more network connections (1205).

The second server (1240) contains two virtual machines (1232) and (1236). The first virtual machine (1232) has an integrated agent (1232a), and the second virtual machine (1236) has an integrated agent (1236a). In addition, the second server (1240) contains a monitor (1234) to facilitate communication with the first and second virtual machines (1232) and (1236), respectively. The third server (1260) contains two virtual machines (1252) and (1256). The first virtual machine (1252) has an integrated agent (1252a), and the second virtual machine (1256) has an integrated agent (1256a). In addition, the third server (1260) contains a monitor (1254) to facilitate communication with the first and second virtual machines (1252) and (1256), respectively. Although only two virtual machines (1232) and (1236) are shown in communication with the second server (1240), and only two virtual machines (1252) and (1256) are shown in communication with the third server (1260), the invention should not be limited to these quantities, which are used for illustration only. The number of virtual machines in communication with the second and third servers (1240) and (1260), respectively, can be increased or decreased.

As shown herein, the second and third servers (1240) and (1260) each support two virtual machines, (1232), (1236) and (1252), (1256), respectively. The monitor (1234) of the server (1240) collects status data from each of the virtual machines (1232) and (1236). The monitor (1234) communicates with the integrated agents (1232a) and (1236a) to collect the virtual machine status from the virtual machines (1232) and (1236), respectively. Likewise, the monitor (1254) collects status data from each of the virtual machines (1252) and (1256), and in particular from the integrated agents (1252a) and (1256a).

The first server (1220) is provided with a functional unit (1270) having one or more tools to support the intelligent assignment of one or more jobs in the shared pool of resources. The functional unit (1270) is shown local to the first data center (1210), and in particular in communication with the memory (1224). In one embodiment, the functional unit (1270) can be local to one of the data centers in the shared pool of resources. The tools integrated in the functional unit (1270) include, but are not limited to, a director (1272), a topology manager (1274), a hook manager (1276), a storage topology manager (1278), a resource utilization manager (1280), and an application manager (1282).

The director (1272) is provided in the shared pool to communicate periodically with the monitors (1234) and (1254) in order to organize and maintain, at a single location, a storage topology underlying a virtual topology of the shared pool of resources, together with the associated resource usage information. In particular, the communication of the director (1272) with the monitors (1234) and (1254) supports gathering and organizing the topology of the shared pool of resources. By organizing and understanding the topology data, the director (1272) can use the resource usage information to intelligently assign a job to one or more of the shared resources in the pool, in a manner that supports efficient performance of the I/O associated with the job. Accordingly, the director (1272) gathers the topology and uses it to support the efficient processing of read and write jobs in the shared pool of resources.

As described above, various managers are provided to support the functionality of the director (1272). The topology manager (1274), which is in communication with the director (1272), functions to return virtual topological distance data to the director (1272). The virtual topological distance data includes, but is not limited to, a distance between two virtual machines and a distance between a virtual machine and a data block.

For example, two virtual machines in communication with the same server are considered to be in relatively close proximity to each other. By contrast, a second virtual machine in communication with a second server and a third virtual machine in communication with a third server are considered relatively distant when compared with the two virtual machines in communication with the same server. The storage topology manager (1278), which is in communication with the director (1272), functions to return a physical location of one or more data blocks that support a job in the shared pool of resources. In one embodiment, the storage topology manager (1278) returns the physical location of the data blocks to the director (1272), which enables the director to intelligently assign a job to a virtual machine in response to the location of the dependent data block(s). Accordingly, the topology manager (1274) functions to address distances in the hierarchy with respect to efficient job processing, and the storage topology manager (1278) functions to address the location of the data block supporting the job.
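The two kinds of distance data described above can be sketched as follows. This is an illustrative assumption rather than the patented method: the function names, the two-level hierarchy (virtual machine, server), and the numeric distance values 0, 1, and 2 are hypothetical; the description only states that machines on the same server are closer than machines on different servers.

```python
def vm_distance(vm_a, vm_b, vm_to_server):
    """Return an illustrative distance between two virtual machines."""
    if vm_a == vm_b:
        return 0
    if vm_to_server[vm_a] == vm_to_server[vm_b]:
        return 1  # same server: relatively close
    return 2      # different servers: relatively distant


def vm_block_distance(vm, block, vm_to_server, block_to_server):
    """Return an illustrative distance between a VM and a data block."""
    if vm_to_server[vm] == block_to_server[block]:
        return 1  # block is stored on the server hosting the VM
    return 2      # block must be fetched from another server


# Hypothetical placement data, as the two topology managers might return it.
vm_to_server = {"vm1": "server1", "vm2": "server1", "vm3": "server2"}
block_to_server = {"blockA": "server2"}

# The VM closest to the data block is a natural candidate for the job.
closest_vm = min(
    vm_to_server,
    key=lambda vm: vm_block_distance(vm, "blockA", vm_to_server, block_to_server),
)
```

A deeper hierarchy (rack, data center) would extend the same idea by counting how many levels two items must climb before they share an ancestor.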

In addition, three further managers are provided: a resource usage manager (1280), an application manager (1282), and a hook manager (1276). The resource usage manager (1280) functions to manage the usage of one or more physical or virtual resources. Each resource has its own limitations. The resource usage manager (1280) returns processing unit usage information and network usage information associated with the underlying physical and virtual machines to the director (1272). The application manager (1282), which is in communication with the resource usage manager (1280), assigns the job to a virtual machine in response to the resource usage information. In particular, the application manager (1282) ensures that the machine in the topology to which a job is assigned has sufficient bandwidth to support the job as well as a sufficiently close topological distance to the data blocks supporting the job. Accordingly, the resource usage manager (1280) and the application manager (1282) together take both usage and bandwidth into account.
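The assignment rule described above (sufficient bandwidth first, then topological closeness to the data blocks) can be sketched as below. The function `assign_job`, the dictionary keys, the bandwidth figures, and the tie-breaking by most free bandwidth are assumptions made for illustration and do not come from the patent.

```python
def assign_job(required_bandwidth, block_server, vms):
    """Pick a VM with enough free bandwidth that is closest to the data.

    vms is a list of dicts with the hypothetical keys
    "name", "server", and "free_bandwidth".
    Returns the chosen VM name, or None if no VM has enough bandwidth.
    """
    # Step 1: discard machines that cannot carry the job's I/O.
    candidates = [vm for vm in vms if vm["free_bandwidth"] >= required_bandwidth]
    if not candidates:
        return None

    # Step 2: prefer VMs on the server holding the data blocks,
    # breaking ties in favor of the VM with the most free bandwidth.
    def rank(vm):
        distance = 0 if vm["server"] == block_server else 1
        return (distance, -vm["free_bandwidth"])

    return min(candidates, key=rank)["name"]


vms = [
    {"name": "vm1", "server": "server1", "free_bandwidth": 10},
    {"name": "vm2", "server": "server1", "free_bandwidth": 100},
    {"name": "vm3", "server": "server2", "free_bandwidth": 500},
]

local_choice = assign_job(50, "server1", vms)     # vm2: co-located and sufficient
remote_choice = assign_job(200, "server1", vms)   # vm3: only VM with enough bandwidth
no_choice = assign_job(1000, "server1", vms)      # None: nothing qualifies
```

Filtering before ranking reflects the order of concerns in the text: a nearby machine that lacks bandwidth is never chosen over a more distant one that has it.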

In addition to the managers described in detail above, the hook manager (1276) serves to facilitate communication between virtual machines. In particular, the hook manager (1276), which is in communication with the director (1272), is provided to create a shared memory channel for communication between virtual machines. The shared memory channel facilitates communication between two virtual machines residing on the same physical machine by allowing data transfer between two such virtual machines to take place within the same memory stack, e.g., via the memory channel. Accordingly, the shared memory channel created by the hook manager (1276) supports efficient data communication in the hierarchical structure of the shared pool of resources.
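The co-location check behind the shared memory channel can be sketched as follows. This is a simplified illustration under stated assumptions: the class `HookManager`, the method `open_channel`, and the use of an in-process `collections.deque` as a stand-in for a real shared memory region are all hypothetical and are not described in the patent.

```python
from collections import deque


class HookManager:
    """Creates one in-memory channel per pair of co-located VMs."""

    def __init__(self, vm_to_host):
        self.vm_to_host = vm_to_host
        self.channels = {}

    def open_channel(self, vm_a, vm_b):
        # A shared memory channel only makes sense on one physical machine.
        if self.vm_to_host[vm_a] != self.vm_to_host[vm_b]:
            return None  # caller would fall back to the network path
        key = frozenset((vm_a, vm_b))  # order of the two VMs does not matter
        return self.channels.setdefault(key, deque())


hooks = HookManager({"vm1": "host-a", "vm2": "host-a", "vm3": "host-b"})

channel = hooks.open_channel("vm1", "vm2")
channel.append(b"block-0042")          # vm1 writes into the channel
received = hooks.open_channel("vm2", "vm1").popleft()  # vm2 reads from it

remote = hooks.open_channel("vm1", "vm3")  # different hosts: no channel
```

Keying the channel on the unordered pair means both virtual machines obtain the same buffer regardless of which one asks first.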

As identified above, the director (1272), the topology manager (1274), the hook manager (1276), the storage topology manager (1278), the resource usage manager (1280), and the application manager (1282) are shown residing in the functional unit (1270) of the server (1220), local to the first data center (1210). Although in one embodiment the functional unit (1270) and its associated director and managers may reside as hardware tools external to the memory (1224) of the server (1220) of the first data center (1210), they may also be implemented as a combination of hardware and software, or may reside local to the second data center (1230) or the third data center (1250) in the shared pool of resources. Similarly, in one embodiment, the director and the managers may be combined into a single functional item that incorporates the functionality of the separate items. As shown herein, the director and each of the managers are shown local to one data center. In one embodiment, however, they may be distributed collectively or individually across the shared pool of configurable computer resources and function as a unit to evaluate the topology of the processing units and data storage in the shared pool, and to process one or more jobs in response to the hierarchy. Accordingly, the managers may be implemented as software tools, hardware tools, or a combination of software and hardware tools.

Those skilled in the art will appreciate that aspects of the present invention may be embodied as a system, method, or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects, all of which may generally be referred to herein as a "circuit", "module", or "system". Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.

Any combination of one or more computer readable media may be used. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer readable storage media include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including, but not limited to, wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, a special purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions that implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Reference is now made to FIG. 13, which is a block diagram (1300) showing a system for implementing an embodiment of the present invention. The computer system includes one or more processors, such as a processor (1302). The processor (1302) is connected to a communication infrastructure (1304) (e.g., a communication bus, a cross-over connection, or a network). The computer system may include a display interface (1306) that forwards graphics, text, and other data from the communication infrastructure (1304) (or from a frame buffer, not shown) for display on a display unit (1308). The computer system may also include a main memory (1310), preferably random access memory (RAM), and may also include a secondary memory (1312). The secondary memory (1312) may include, for example, a hard disk drive (1314) and/or a removable storage drive (1316), representing, for example, a floppy disk drive, a magnetic tape drive, or an optical disk drive. The removable storage drive (1316) reads from and/or writes to a removable storage unit (1318) in a manner well known to those skilled in the art. The removable storage unit (1318) represents, for example, a floppy disk, a compact disc, a magnetic tape, or an optical disk, etc., which is read by and written to by the removable storage drive (1316). It should be noted that the removable storage unit (1318) includes a computer readable medium having computer software and/or data stored thereon.

In alternative embodiments, the secondary memory (1312) may include other similar means for allowing computer programs or other instructions to be loaded onto the computer system. Such means may include, for example, a removable storage unit (1320) and an interface (1322). Examples of such means may include a program package and package interface (such as those found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units (1320) and interfaces (1322) that allow software and data to be transferred from the removable storage unit (1320) to the computer system.

The computer system may also include a communication interface (1324). The communication interface (1324) allows software and data to be transferred between the computer system and external devices. Examples of the communication interface (1324) may include a modem, a network interface (such as an Ethernet card), a communication port, or a PCMCIA slot and card, etc. Software and data transferred via the communication interface (1324) are in the form of signals, which may be, for example, electronic, electromagnetic, optical, or other signals capable of being received by the communication interface (1324). These signals are provided to the communication interface (1324) via a communication path (i.e., channel) (1326). This communication path (1326) carries signals and may be implemented using wire or cable, fiber optics, a telephone line, a cellular phone link, a radio frequency (RF) link, and/or other communication channels.

In this document, the terms "computer program medium", "computer usable medium", and "computer readable medium" are used to refer generally to media such as the main memory (1310) and the secondary memory (1312), the removable storage drive (1316), and a hard disk installed in the hard disk drive (1314).

Computer programs (also called computer control logic) are stored in the main memory (1310) and/or the secondary memory (1312). Computer programs may also be received via the communication interface (1324). Such computer programs, when run, enable the computer system to perform the features of the present invention as described herein. In particular, the computer programs, when run, enable the processor (1302) to perform the features of the computer system. Accordingly, such computer programs represent controllers of the computer system.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, depending on the functionality involved, two blocks shown in succession may be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

The terminology used here is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used here, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It is understood that the terms "comprises" and "comprising", when used in this description, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the following claims are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, and to enable others skilled in the art to understand the invention in various embodiments with various modifications as suited to the particular use contemplated. Accordingly, the improved cloud computing model supports flexibility in application processing and disaster recovery, including, but not limited to, support for separating the location of the data from the location of the application, and the selection of an appropriate recovery site.

As described in this document, a platform with a resource scheduler is provided to counteract performance degradation of MapReduce jobs when they are executed in a cloud environment. Cloud-aware MapReduce (CAM) uses a three-step approach to avoid placement anomalies caused by inefficient resource allocation:

placing data in the cluster that most frequently runs jobs on that data; selecting the appropriate mode for the physical nodes on which to place the set of virtual machines assigned to a job; and exposing the compute, storage, and network topologies to the scheduler. CAM uses a flow-network-based algorithm that can reconcile resource allocation with a variety of other competing constraints, such as storage usage, changes in processor load, and network link capacities.
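The flow-network formulation mentioned above can be illustrated with a minimal sketch. It is not the algorithm of the embodiment, which is not spelled out here; it assumes a simple model in which each task needs one slot, each virtual machine offers a fixed number of free slots, and the edge cost is a topological distance between the task's data and the virtual machine. The names `MinCostFlow`, `assign_tasks`, `vm_slots`, and `distance` are hypothetical, and the solver is a textbook successive-shortest-path min-cost flow.

```python
from collections import deque


class MinCostFlow:
    """Textbook successive-shortest-path min-cost flow (illustrative only)."""

    def __init__(self, n):
        self.n = n
        # Each edge is [target, residual capacity, cost, index of reverse edge].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            # Queue-based Bellman-Ford: cheapest augmenting path from s to t.
            dist = [float("inf")] * self.n
            dist[s] = 0
            prev = [None] * self.n
            queued = [False] * self.n
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, c, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[t] is None:
                return flow, cost
            # Find the bottleneck capacity along the path, then push flow.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            cost += push * dist[t]


def assign_tasks(tasks, vm_slots, distance):
    """Assign each task to a VM, minimizing the total (assumed) distance.

    tasks: list of task names; vm_slots: dict VM name -> free slots;
    distance: dict (task, VM) -> non-negative cost.
    """
    vms = list(vm_slots)
    source, first_vm = 0, 1 + len(tasks)
    sink = first_vm + len(vms)
    net = MinCostFlow(sink + 1)
    for ti, task in enumerate(tasks):
        net.add_edge(source, 1 + ti, 1, 0)
        for vi, vm in enumerate(vms):
            net.add_edge(1 + ti, first_vm + vi, 1, distance[(task, vm)])
    for vi, vm in enumerate(vms):
        net.add_edge(first_vm + vi, sink, vm_slots[vm], 0)
    flow, cost = net.solve(source, sink)
    placement = {}
    for ti, task in enumerate(tasks):
        for v, cap, _, _ in net.graph[1 + ti]:
            if v >= first_vm and cap == 0:  # saturated task-to-VM edge
                placement[task] = vms[v - first_vm]
    return flow, cost, placement
```

Additional constraints such as processor load or link capacity could be modeled as further edge costs or capacities in the same network; that extension is left out of the sketch.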

Alternative embodiment

It should be emphasized that, although specific embodiments of the invention have been described here for purposes of illustration, various modifications can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is limited only by the following claims and their equivalents.

Claims

A method comprising: collecting a virtual machine status of one or more virtual machines in a shared pool of resources, the shared pool including a physical host in communication with at least one physical machine that supports one or more virtual machines; collecting local topology information from the shared pool of resources, including periodically communicating with an integrated monitor of the physical machine; organizing the collected local topology information, including a storage topology that underlies a virtual topology, and the associated resource usage information; and using the organized topology information in response, including assigning a job to a selected virtual machine in the shared pool, the job assignment supporting efficient performance of the job's I/O, and returning a virtual topological distance associated with the job, the virtual topological distance being selected from the group consisting of: a distance between two virtual machines and a distance between a virtual machine and a data block; wherein the assignment distinguishes three job types, namely a first job type for assignment- and reduction-intensive tasks, a second job type for assignment-intensive tasks, and a third job type for reduction-intensive tasks; wherein, when a job of the second job type is executed, a machine with the lowest load is assigned; for a job of the third job type, a short distance between associated data is preferred; and for a job of the first job type, the job is executed on the machine with the lowest load; and wherein, in order to avoid hotspots, the data are placed on the machine with the lowest load by measuring the computing resources currently allocated to the machine and adding the expected assignment request of the virtual machine that would work with the data to be placed on the machine.
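A minimal sketch of the job-type distinction and the hotspot-avoiding data placement recited above, under stated assumptions: "assignment" and "reduction" are taken to correspond to the map and reduce phases of MapReduce, load is a single number per machine, and the first job type is handled by the lowest-load rule alone. The names `Machine`, `select_machine`, and `place_blocks` are hypothetical and do not come from the claim.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    allocated: float       # computing resources currently allocated
    distance_to_data: int  # assumed topological distance to the job's data
    pending: float = 0.0   # expected requests for data already placed here


def select_machine(job_type, machines):
    """Pick a machine according to the three job types.

    "first":  assignment- and reduction-intensive -> lowest load (assumed reading)
    "second": assignment-intensive                -> lowest load
    "third":  reduction-intensive                 -> shortest distance to data
    """
    if job_type in ("first", "second"):
        return min(machines, key=lambda m: m.allocated)
    if job_type == "third":
        return min(machines, key=lambda m: m.distance_to_data)
    raise ValueError(f"unknown job type: {job_type}")


def place_blocks(machines, expected_requests):
    """Place each data block on the machine with the lowest projected load.

    The projected load is the measured allocation plus the expected requests
    of the virtual machines that would work on data already placed there, so
    that consecutive placements do not pile up on one machine.
    expected_requests: dict block name -> expected resource request.
    """
    placement = {}
    for block, request in expected_requests.items():
        target = min(machines, key=lambda m: m.allocated + m.pending)
        target.pending += request
        placement[block] = target.name
    return placement
```

With three machines whose allocations are 4, 2, and 3, three blocks with expected requests 2, 2, and 1 are spread over all three machines instead of all landing on the machine that was initially least loaded.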

The method according to claim 1, further comprising creating a shared memory channel to support the job, the channel supporting inter-virtual-machine data communication between a first virtual machine and a second virtual machine located on the same physical machine.

The method according to claim 2, wherein the shared memory channel supports efficient data transmission for both the first and the second virtual machine.

The method according to claim 1, wherein the step of using the organized storage topology information includes returning a physical location of one or more data blocks in the shared pool supporting the job.

The method according to claim 4, further comprising returning usage information of a processing unit and network usage information of the physical machine that is local to the virtual machine.

The method according to claim 5, wherein the step of assigning the job to a virtual machine includes selecting a virtual machine with sufficient bandwidth and a close topological distance to the physical location of one or more data blocks supporting the job.

A system comprising: a shared pool of resources, the shared pool comprising a physical host in communication with multiple physical machines, each physical machine supporting at least one virtual machine, each physical machine having an integrated monitor to collect a virtual machine status from a local virtual machine, and each virtual machine having an integrated agent; a functional unit in communication with the physical host, the functional unit having tools to support efficient processing of a job in response to a topological architecture of the shared pool, the tools comprising: a director in communication with the host, the director periodically communicating with each integrated monitor and collecting and organizing a topology that underlies a virtual topology, the topology having a data pool storage topology and associated resource usage information, and the director using the organized topology to assign the job to one or more of the shared resources, the job assignment supporting efficient performance of the I/O associated with the job; and further comprising a topology manager in communication with the director, the topology manager returning a virtual topological distance selected from the group consisting of: a distance between two virtual machines and a distance between a virtual machine and a data block; wherein three job types are distinguished, namely a first job type for assignment-intensive and reduction-intensive tasks, a second job type for assignment-intensive tasks, and a third job type for reduction-intensive tasks; wherein, when a job of the second job type is executed, a machine with the lowest load is assigned; for a job of the third job type, a short distance between associated data is preferred; and for a job of the first job type, the job is executed on the machine with the lowest load; and wherein, in order to avoid hotspots, the data are placed on the machine with the lowest load by measuring the computing resources currently allocated to the machine and adding the expected assignment request of the virtual machine that would work with the data to be placed on the machine.

The system according to claim 7, further comprising a hook manager which is in communication with an integrated agent of one of the virtual machines and in communication with the director, the hook manager creating a shared memory channel for inter-virtual-machine data communication between a first virtual machine and a second virtual machine that are located on the same physical machine.

The system according to claim 8, wherein the shared memory channel supports efficient data transmission for both the first and the second virtual machine.

The system according to claim 7, further comprising a storage topology manager in communication with the director, the storage topology manager returning a physical location of one or more data blocks supporting the job in the shared pool.

The system according to claim 10, further comprising a resource usage manager in communication with the director, the resource usage manager returning processing unit usage information and network usage information of the physical machine local to the virtual machine.

The system according to claim 11, further comprising an application manager in communication with the director, the application manager assigning the job to a virtual machine having sufficient bandwidth to support the job and a close topological distance to the physical location of one or more data blocks supporting the job.

A computer program product comprising a computer-readable, non-volatile storage medium with computer-readable program code contained therein, the computer-readable program code, when executed on a computer, causing the computer to: collect information from the resources in a shared pool of resources, the information comprising a virtual machine status from a local virtual machine in the shared pool and at least one status of a physical machine that supports one or more virtual machines; periodically collect local topology information from the shared pool of resources and communicate the acquired information with an integrated monitor of the physical machine; organize the collected local topology information, including a storage topology that underlies a virtual topology, and the associated resource usage information; and use, in response, the organized storage topology information and assign a job to a selected virtual machine in the shared pool, the assignment supporting efficient performance of the job's I/O; the computer program product further comprising program code to return a virtual topological distance selected from the group consisting of: a distance between two virtual machines and a distance between a virtual machine and a data block; wherein three job types are distinguished, namely a first job type for assignment-intensive and reduction-intensive tasks, a second job type for assignment-intensive tasks, and a third job type for reduction-intensive tasks; wherein, when a job of the second job type is executed, a machine with the lowest load is assigned; for a job of the third job type, a short distance between associated data is preferred; and for a job of the first job type, the job is executed on the machine with the lowest load; and wherein, in order to avoid hotspots, the data are placed on the machine with the lowest load by measuring the computing resources currently allocated to the machine and adding the expected assignment request of the virtual machine that would work with the data to be placed on the machine.

The computer program product according to claim 13, further comprising program code to create a shared memory channel for inter-virtual-machine data communication between a first virtual machine and a second virtual machine that are located on the same physical machine.

The computer program product according to claim 14, wherein the shared memory channel supports efficient data transmission for both the first and the second virtual machine.

The computer program product according to claim 13, further comprising program code to return a physical location of one or more blocks of data supporting the job in the shared pool.

The computer program product according to claim 16, further comprising program code to return usage information of a processing unit and network usage information of the physical machine that is local to the virtual machine.

The computer program product according to claim 17, further comprising program code to assign the job to a virtual machine having sufficient bandwidth to support the job and a close topological distance to the physical location of one or more blocks of data supporting the job.

A computer-implemented method comprising: collecting a virtual machine status from one or more virtual machines in a shared pool of resources, the shared pool including a physical host in communication with at least one physical machine that supports one or more virtual machines; periodically collecting local topology information of a hierarchical organization of resources represented by the physical and virtual machines; organizing the topology information; accessing usage information of storage resources and virtual machines represented in the topology; and assigning a job to one or more selected virtual machines in the shared pool, the job assignment supporting efficient performance in response to the evaluation of the topology and the resource usage; further comprising determining a topological distance associated with the job, the topological distance being selected from the group consisting of: a distance between two virtual machines and a distance between a virtual machine and a data block; wherein the assignment distinguishes three job types, namely a first job type for assignment- and reduction-intensive tasks, a second job type for assignment-intensive tasks, and a third job type for reduction-intensive tasks; wherein, when a job of the second job type is executed, a machine with the lowest load is assigned; for a job of the third job type, a short distance between associated data is preferred; and for a job of the first job type, the job is executed on the machine with the lowest load; and wherein, in order to avoid hotspots, the data are placed on the machine with the lowest load by measuring the computing resources currently allocated to the machine and adding the expected assignment request of the virtual machine that would work with the data to be placed on the machine.

The method according to claim 19, further comprising creating a shared memory channel to support the job between a first virtual machine and a second virtual machine that are located on the same physical machine.