WO2013008036A2 - Modelling virtualized infrastructure - Google Patents

Modelling virtualized infrastructure

Info

Publication number
WO2013008036A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
users
group
concurrency
value
Prior art date
Application number
PCT/GB2012/051683
Other languages
French (fr)
Other versions
WO2013008036A3 (en)
Inventor
Jeremy Bullock
Original Assignee
Centrix Networking Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Centrix Networking Limited filed Critical Centrix Networking Limited
Publication of WO2013008036A2 publication Critical patent/WO2013008036A2/en
Publication of WO2013008036A3 publication Critical patent/WO2013008036A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3442Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for planning or managing the needed capacity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447Performance evaluation by modeling

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method of modelling a to-be virtualized infrastructure so as to predict the computing resources required by a group of users, comprising the steps of: performing an assessment of computing resources consumed by the group of users by monitoring their use of their non-virtualised infrastructure; deriving at least one of: a User Template Session, a User Template Component Session and a User Template Image, in order to determine latency of computer resources available to the group of Users; determining an acceptable value of latency to the group of users; deriving a value of User concurrency for users within the user group; and deriving a base value, for attributes based upon: application input/output usage and/or CPU usage and/or network bandwidth, said base value being indicative of concurrency values for a specified resource consumed in a virtualized computing environment. Also provided by the invention is an apparatus for carrying out the method, which apparatus includes a display and a man machine interface and operates under the control of software performing steps as defined in the method.

Description

Modelling Virtualized Infrastructure
Field of the Invention
This invention relates to modelling virtualized infrastructure. More particularly, the invention relates to a process intended to improve the modelling of virtualized computer resources. Virtualization of infrastructure is a technique which optimises resources (such as processing capabilities, IOPS and memory) so as to operate them at higher average capacity.
Background
Computer architectures and networks are often visualised in terms of an optimal virtualized infrastructure. This is a technique that enables designers, architects, programmers and users to visualise complex systems and applications, and so encourages load sharing.
Companies and organisations possess hundreds, sometimes thousands, of data hardware resources, such as, for example, servers. Data servers are expensive capital equipment to purchase, to manage and to maintain. However, very rarely are these hardware resources utilised to their full (or nearly full) capacity.
In recent years so-called virtualization, which enables groups of pre-defined users to pool hardware resources and so reduce capital cost, has become popular. Virtualization of infrastructure is achieved by using specific routers (switches), operating under control of software, to augment a virtual infrastructure of systems. The virtual infrastructure of systems can be accessed and shared by many users, for example by way of a wide area network (WAN) connection. The users may not be located in the same building and so share remote facilities, yet their user experience is no different from that of someone with a direct connection to their own dedicated computer. The major difference is a saving in the cost of hardware and maintenance, as well as a saving in the licence fees payable for use of software.
The virtual infrastructure can include collections of applications, computer devices such as man-machine-interfaces (MMIs) and server computer devices, such as routers and wide area networks (WANs). These applications are often visualised in block or functional diagram format. Other forms of virtualized infrastructure include: available network bandwidth and maximum and actual disk storage capacity and these too may be conveniently represented graphically in a number of differing formats so as to convey useful meaning to design and maintenance engineers, as well as end users.
In order to determine an optimal, virtualized infrastructure, it is necessary to use available data from sources such as existing file allocation tables - so-called FAT files. FAT is a computer file system architecture that is employed on many computer systems. It is not to be confused with so-called FAT memory files, which are found on many portable devices such as memory cards (in digital cameras) as well as diskettes.
FAT files are present within existing user based infrastructure. They are employed in order to model and predict the demand that is likely to be placed on new, virtualized infrastructure, and they do this by providing an indication of so-called user concurrency. User concurrency is a term defining the specific usage of applications that a population of users is accessing at any instant or during any given time period. Specific software is available to perform this measurement and operates in virtualized architectures so as to provide average concurrency as well as individual concurrency, typically for an estate of 2000 users.
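By way of an illustrative example (the figures here are assumed, not taken from this disclosure): if, at a given instant, 150 of a monitored estate of 2000 users have a particular application in process, the instantaneous user concurrency for that application is 150, or 7.5% of the estate; averaging such instantaneous values over a working day gives the average concurrency, while the largest value observed gives the peak concurrency.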
Thin client infrastructure is a term that is used to describe client resources when, for example, a client's requirements change, such as when there is a shift towards a 'cloud based' architecture. Cloud computing is a technique that enables users to work in a variety of locations, by accessing remotely stored data and using remote systems, so as to perform a task that was previously done using resources at a desk. There are many advantages in cloud computing, in particular: surety of access to data (by way of alternative data paths, as systems are networked), reduced purchase and operational costs, the ability to back up data in several locations, the capability of rapid changes from small to large requirements (and vice versa) and, of course, flexibility of access with respect to user location.
A thin client is a computer or a computer program which depends heavily on some other computer (such as a remote server) to fulfil its traditional computational roles. In contrast a fat client is one whose computational requirements are designed to take on all these roles (as well as others), usually 'in-house' and almost always by itself. The exact roles assumed by the server may vary, from providing data persistence (for example, for diskless nodes) to actual information processing on a client's behalf.
Thin clients occur as components of a broader computer infrastructure, in which many clients share their computations with the same server. As such, thin client infrastructures can be viewed as the amortization of some computing service across several user-interfaces. This is desirable in contexts where individual so-called 'fat clients' have much more functionality or power than their infrastructure either requires or uses. This can be contrasted, for example, with grid computing.
The most common type of modern thin client is a low-end computer terminal which concentrates solely on providing a graphical user interface to the end-user. The remaining functionality, in particular the operating system, is provided by the server, which may be local or remote - such as is achieved with local area networks (LANs) or wide area networks (WANs). The shift from so-called 'fat' to 'thin' client infrastructure requires that parameters such as application usage, memory usage and central processing unit (CPU) utilisation, together with the network and storage requirements they imply, are no longer measured in a synchronous mode, but rather in an asynchronous mode. Synchronous processes operate in separate, networked devices and depend either on a common source emitting clocking pulses, provided by a transmitting device (clock), or on synchronizing bits or bit patterns embedded in a set of data.
Asynchronous Transfer Mode (ATM) is a switching technique for telecommunication networks which uses asynchronous time-division multiplexing to encode data into small, fixed-sized cells. This differs from networks such as the Internet or Ethernet LANs that use variable-sized packets or frames.
ATM provides data link layer services within the Open Systems Interconnection (OSI) model, which describes a modelling technique of dividing communications into different layered levels. ATM operates over OSI layer 1 physical links. ATM has functional similarity with both circuit-switched networking and small-packet-switched networking. This makes it a good choice for a network that must handle both traditional high-throughput data traffic (such as file transfers) and real-time, low-latency content, such as voice and video.
ATM uses a connection-oriented model in which a virtual circuit must be established between two endpoints before the actual data exchange begins. ATM is a switching protocol used over synchronous optical networking (SONET) and the synchronous digital hierarchy (SDH). SONET and SDH are standardized multiplexing protocols that transfer multiple digital bit streams over optical fibres using lasers or light-emitting diodes (LEDs). Such networks and protocols permit transmission of very large amounts of data. However, use of SONET/SDH backbones of public switched telephone networks (PSTNs) and Integrated Services Digital Networks (ISDNs) has been declining in favour of the Internet Protocol (IP). Recently the success of so-called cloud computing has had an impact on both small numbers of user groups and larger collections of user groups, often referred to as an estate. In practice it is the end result that is usually modelled, rather than the impact of the sum of individual computer devices.
Cloud based systems include databases, web servers and business applications, which can be consolidated and can run side by side - so improving efficiency. However, they also have separate operating systems - for example Windows (Trade Mark) or Linux (Trade Mark).
When virtualizing these systems it is not unknown to achieve consolidation ratios in excess of 10:1 - for example 500 servers can be reduced to 50 or 60. Therefore, in addition to saving resources, users can provision the server environment far more quickly than by installing hardware, as many functions that were historically achieved in hardware can now be achieved in software.
Problems that are encountered when virtualizing systems include optimizing the available resources and ensuring that sufficient resources remain available to users during an unexpected event or under non-standard conditions.
Prior Art
An example of a facility for using images created by backup software to recreate an entire machine, as it was at the point in time in the past when the backup was taken, is described in Canadian Patent Application Number CA 2 613 419 (Syncsort Inc). The facility can be extended so as to bring up a set of machines which together serve some logical business function, as in a cluster or network of associated servers. The method may also be used to recreate an entire data centre from backup images. Although providing a useful tool for assisting the virtualization process, the aforementioned system does not assist in the modelling of a virtualisation proposal or the real-time modelling of users in a virtualized environment.
The present invention arose in an attempt to overcome the aforementioned problem and to provide an improved method of modelling virtualized infrastructure specifically for use in the design, implementation and management of cloud based systems.
Summary of the Invention
According to a first aspect of the invention there is provided a method of modelling a to-be virtualized infrastructure so as to predict the computing resources required by a group of users, comprising the steps of: performing an assessment of computing resources consumed by the group of users by monitoring their use of their non-virtualised infrastructure; deriving at least one of: a User Template Session, a User Template Component Session and a User Template Image, in order to determine latency of computer resources available to the group of Users; determining an acceptable value of latency to the group of users; deriving a value of User concurrency for users within the user group; and deriving a base value, for attributes based upon: application input/output usage and/or CPU usage and/or network bandwidth, said base value being indicative of concurrency values for a specified resource consumed in a virtualized computing environment.
Thus the invention is able to improve the design, implementation and management of cloud based systems by determining typical usage parameters based upon the resource demand of applications as exhibited by their use in the existing non-virtualised environment. Ideally the 'Base' is calculated first to generate a subset from which subsequent calculations are made. The amount of computing resources demanded by any given computer application is largely dependent on the content being loaded by the user. Thus, the resource demand of an instance of a user application which is loaded without any user content (a "Vanilla Application") will often be significantly lower than that of a user application loaded with user content.
The Base value is obtained for attributes such as application input/output usage, CPU usage, IOPS and network bandwidth by examining all measures of "Vanilla Application" instances and determining the estate mean. It is a set of "tuples" (i.e. records), where each record contains instances of the attributes used to perform subsequent calculations, from which each user's deviance from the mean of the Vanilla Applications can be measured.
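By way of illustration only, the following sketch shows one way such a Base might be represented and the estate mean of "Vanilla Application" measurements computed; the record fields, function names and attribute names are assumptions made for the example rather than details taken from this disclosure.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class BaseTuple:
    """One record ('tuple') of the Base: attribute instances for one application load."""
    executable: str
    cpu_pct: float          # CPU usage
    iops: float             # application input/output usage
    net_kbps: float         # network bandwidth
    has_user_content: bool  # False for a "Vanilla Application" instance

def estate_mean(base, attribute):
    """Mean of one attribute across all Vanilla Application instances in the estate."""
    vanilla = [getattr(t, attribute) for t in base if not t.has_user_content]
    return mean(vanilla) if vanilla else 0.0

def deviance_from_vanilla(record, base, attribute):
    """How far one loaded instance deviates from the Vanilla Application mean."""
    return getattr(record, attribute) - estate_mean(base, attribute)
```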
Optionally a filtering process is used to generate the Base value. Ideally a sequence list is derived.
Preferably calculations are performed detailing concurrency values for a predefined group of variables. The variables are selected from the group comprising: User Template Sessions, User Template Component Sessions and User Template Images.
More particularly, a computer processor may be optimised so as to allocate available resources by performing an assessment of resources available to a user group and determining typical usage parameters based upon one or more criteria; this is ideally achieved by determining a running sum of user concurrencies. Ideally a cache device is disposed on a connection path between a user computer executing a software application and a network. The application exchanges data with a further computer via the network.
The cache device advantageously includes a cache memory and a processor. The cache device is configured to measure, by the processor, a first latency between the user computer and another computer.
Further the cache device is configured to determine an acceptable latency range based on the latency and a requirement of the software application. The cache device is further configured to measure a second latency between the user computer and the further computer.
The cache device is further configured to store, in the cache memory, a set of data transmitted from the user computer to the further computer, if the second latency is not within the acceptable latency range.
Ideally the assessment of resources includes an initial preparatory or assessment step (namely calculating a subset of data used as the basis of the remainder of the modelling process) and extracting definitions of User Templates, which define the specification or the 'shape' of the virtualized infrastructure.
Preferably a set of calculations is performed detailing concurrency values for a predefined group of variables. A concurrency value is an indication of the extent to which multiple users are accessing the same hardware or software at the same time. Such predefined variables may also include: User Template Sessions, User Template Component Sessions and User Template Images.
The method for modelling this involves two preparatory steps namely: calculation of a value of a subset of data, used as the basis for the remainder of the modelling process, and extracting definitions of User Templates. These subsets of data help define the shape of the virtualized infrastructure.
Building on this initial value, a set of calculations is performed detailing concurrency values for such variables as: User Template Sessions, User Template Component Sessions and User Template Images.
A first step in this process is generating the Base, which is described below. The Base consists of a subset of usage information that is used for all subsequent User Template calculations. Usage information concerns which Executables are placed in process on a device, and generally comprises the following data: i) An Executable Name
ii) A Command Line used to run the Executable
iii) Start and Stop Times
iv) A User Account that is responsible for placing the Executable in process
v) Other details such as I/O, CPU, Memory usage
The cache device is inferred every time a usage report is sent from a device to a central collection server. From the contents of usage information collected from devices, the Base is generated by filtering attributes, directly or indirectly associated with collected Usage information. These attributes relate to User Accounts. User Accounts are responsible for placing applications in process on a device. For example they determine which Business Group a specific User Account belongs to, and which Applications are deemed suitable for use in a virtualised computing environment. It is understood that this list may in itself be further filtered to assess alternative scenarios. A period of time is determined, within which the processes listed within the usage information must have either started or stopped.
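As an illustration only (not this disclosure's own implementation), a minimal sketch of the kind of usage record listed above and of the filtering that produces the Base is given below; the field names, the allowed Business Groups, the list of suitable applications and the assessment period are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class UsageRecord:
    executable: str       # i) an Executable Name
    command_line: str     # ii) the Command Line used to run the Executable
    start: datetime       # iii) start time
    stop: datetime        #      stop time
    user_account: str     # iv) the User Account that placed the Executable in process
    cpu_pct: float        # v) other details such as CPU usage
    business_group: str   # attribute resolved indirectly from the User Account

def generate_base(records, allowed_groups, suitable_apps, period_start, period_end):
    """Filter collected usage information down to the Base subset."""
    return [
        r for r in records
        if r.business_group in allowed_groups
        and r.executable in suitable_apps
        # the process must have either started or stopped within the chosen period
        and (period_start <= r.start <= period_end
             or period_start <= r.stop <= period_end)
    ]
```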
An example of virtualisation modelling will now be described, with reference to the Figures, in which:
Brief Description of the Figures
Figure 1 illustrates two rows containing pivoted start and stop times, retaining in each new row the reference to the User Template and CPU utilisation;
Figure 2 illustrates the next step in the process, namely: amalgamation of the pivoted start/stop time data into a single list;
Figure 3 illustrates generation of concurrency of consumption of physical resources (sum as percentage of usage of CPU) by calculating the running sum over these columns;
Figure 4 illustrates determining a maximum User Template Session Concurrency;
Figure 5 illustrates the next stage, which is extraction, for each User Template/Application Component pair, of the start/stop times for each Application process associated with them in the Base;
Figure 6 illustrates how pivoted start/stop time data is amalgamated into a single list;
Figure 7 shows how a Running Sum is calculated, typically for each distinct combination of User Template and Component; Figure 8 shows how the Maximum User Template Session Concurrency is determined; and
Figure 9 is a block diagram of an apparatus for performing the method.
Detailed Description of Preferred Embodiment of the Invention
The successful implementation of a virtualized environment depends on accurate sizing and allocation of resources. Modelling existing resource demand helps to ensure that any new virtualized environment is sized according to recorded user demand, both in terms of the applications used, the content loaded into those applications and the times when those Applications and content are used.
By computing the concurrent usage of computing resources such as memory, IOPS, disk reads and writes it is possible to accurately size a to-be virtualized environment.
The outputs of a virtualization modelling exercise (either on-screen or printed) give sufficient information for the effective sizing of a virtualized environment. The key to creating these outputs is in the computation of concurrency.
An example of modelling of virtualization is described with reference to the Figures, by way of a specific example of determining concurrency, for example for a number of distributed users who are utilising a shared resource - such as 'the cloud' - to perform operations on separate documents, in a piece of application software, such as WORD (Trade Mark) using Microsoft Office (Trade Mark). Specific reference is made to Figure 9, which is a block diagram of an example of a typical distributed system that is found in many companies, organisations and offices. Steps that are performed in the method of modelling virtualized infrastructure are then described with reference to Figures 1 to 8.
Figure 9 is an overall diagrammatical view of a plurality of servers or user work stations 10, 20, 30, 40 interconnected for example by way of network cables, routers and other hardware 50. The central processor (CPU) and memory operate under control of a host operating system 70 such as LINUX and resources from the host operating system are pooled by way of virtualization software 60.
Figure 1 represents a pivot of each row containing the start and stop times into two rows, retaining in each new row the reference to the User Template and the CPU utilisation. The first row in Figure 1 indicates pairs that include the start time; an indicator marking the row as a start time; and attributes recording the consumption of physical resources by a particular process. For example, a particular application may demand more random access memory or processing capability than another process, and these resource demands are recorded.
The second row in Figure 1 indicates the stop time, an indicator marking this row as a stop time, and attributes recording the consumption of physical resources by the process. It is appreciated that these attributes are now duplicated across the two rows, with the values in the stop row accorded negative values. This data is obtained and extracted for a specific end use described below.
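A minimal sketch (with assumed dictionary keys, following on from the earlier examples) of the pivot that Figure 1 depicts is given below; negating the stop-row values so that a later running sum works is one reasonable reading of the text, not necessarily the exact scheme used here.

```python
def pivot_events(base_rows):
    """Pivot each Base row (start, stop, template, cpu) into two event rows.

    The start row carries a +1 indicator and positive resource values; the stop
    row carries a -1 indicator and negated values, so that a later running sum
    yields both instantaneous concurrency and concurrent resource consumption.
    """
    events = []
    for row in base_rows:
        events.append({"time": row["start"], "indicator": +1,
                       "template": row["template"], "cpu": row["cpu"]})
        events.append({"time": row["stop"], "indicator": -1,
                       "template": row["template"], "cpu": -row["cpu"]})
    return events
```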
Figure 2 illustrates the next step in the process, namely: amalgamation of the pivoted start/stop time data into a single list, and deriving a sequence list by using an event time column (which now contains both start and stop times). In the event that stop times occur before start times a special condition arises. This special condition may be flagged. The next stage of the process involves calculating a running sum. This is achieved by passing down all rows in the sequenced list, for each user template, and calculating the running sum of the start/stop indicators (to calculate user concurrency). This provides an indication referred to as User Template Session Concurrency: in practical terms this indicates the number of executables concurrently in use by User Accounts associated with the User Template at the point in time of each start/stop time, i.e. the User Template Session Concurrency at that instant.
It is also possible to generate concurrency of consumption of physical resources (sum as percentage of usage of CPU) by calculating the running sum over these columns in the sequenced list shown in Figure 3.
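Continuing the illustrative sketch above (same assumed event fields), the sequencing and running-sum steps of Figures 2 and 3 might look as follows; ordering stop events before start events at equal times reflects the special condition mentioned above.

```python
from collections import defaultdict

def session_concurrency(events):
    """Sequence the pivoted events and compute running sums per User Template.

    Events are ordered by time, with stop events (indicator -1) placed before
    start events (+1) at equal times. The running sum of indicators gives the
    User Template Session Concurrency at each event time; the running sum of
    the cpu column gives the concurrent CPU consumption (as in Figure 3).
    """
    ordered = sorted(events, key=lambda e: (e["time"], e["indicator"]))
    counts = defaultdict(int)     # running count of executables per template
    cpu = defaultdict(float)      # running CPU consumption per template
    timeline = []
    for e in ordered:
        t = e["template"]
        counts[t] += e["indicator"]
        cpu[t] += e["cpu"]
        timeline.append((e["time"], t, counts[t], cpu[t]))
    return timeline
```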
The next stage is to determine a maximum User Template Session Concurrency, and this is shown in Figure 4. This stage is performed for each user template, and the maximum User Session Concurrency value is calculated. For each User Template, calculating maximum concurrency values for the consumption of physical resources is required so as to be able to model actual real-time usage and forecast any usage trends.
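A short sketch, again under the same assumptions, of the maximum-concurrency determination shown in Figure 4; the resulting per-template peaks are the kind of figures that would feed the sizing of the to-be virtualized environment.

```python
def maximum_session_concurrency(timeline):
    """Peak Session Concurrency and peak concurrent CPU for each User Template."""
    peaks = {}
    for _, template, count, cpu_sum in timeline:
        best_count, best_cpu = peaks.get(template, (0, 0.0))
        peaks[template] = (max(best_count, count), max(best_cpu, cpu_sum))
    return peaks
```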
User Template Component Session Concurrency is derived by generating the Usage Event List. For each User Template a list pairing is generated that references each User Template with each Application; the Application can be considered at whatever level of granularity is appropriate and forms part of that User Template. Optionally the same Application can form part of a different User Template.
Referring now to Figure 5, the next stage is to extract, for each User Template/Application Component pair, all pairs of start/stop times for each Application process associated with them in the Base. If it is intended to calculate concurrency of consumption of physical resources, values from the Base Usage data, along with the start/stop times, are required, as seen from the example in Figure 5.
The contents of the Base therefore include all attributes mentioned above that are used in the filtering process used to generate the Base, together with additional attributes such as application input/output usage, CPU usage and network bandwidth. Hence, the modelling process enables modelling of concurrency values for any resource consumed in a virtualized computing environment.
A number of attributes helping to describe the base have an implicit hierarchical relationship. Most obviously in an information technology or informatics (IT) environment, user accounts are often grouped into Domains. Similarly, attributes describing applications contain hierarchies.
The specific example depicted in Figure 5 shows how each row containing the start and stop times is pivoted into two rows, retaining in each new row the reference to the User Template and associated Components:
The first row of these pairs includes the start time, an indicator marking this row as a start time, and attributes recording the consumption of physical resources by the process. The second row of these pairs includes the stop time, an indicator marking this row as a stop time, and attributes recording the consumption of physical resources by the process. Optionally these attributes are duplicated across the two rows, with the values in the stop row given negative values.
Referring now to Figure 6, the pivoted start/stop time data is amalgamated into a single list. Sequencing this list by the Event Time column (which now contains both start and stop times), and considering stop times before start times where they are equal, a value indicative of instantaneous user concurrency is obtained. Figure 7 shows how a Running Sum is calculated. This is achieved effectively by passing down all rows in the sequenced list, for each distinct combination of User Template and Component value, calculating the Running Sum of start/stop indicators (to calculate user concurrency) and providing a User Template Component Session Concurrency. It is also possible to generate concurrency of consumption of physical resources (such as CPU) by calculating the Running Sum over these columns in the sequenced list.
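A minimal sketch of how the component-level running sum differs from the session-level one in the earlier sketch: the running totals are simply keyed by the (User Template, Component) pair rather than by User Template alone. The additional "component" field on each event is an assumption of this example.

```python
def component_session_concurrency(events):
    """As before, but the running sum is keyed by (User Template, Component)."""
    ordered = sorted(events, key=lambda e: (e["time"], e["indicator"]))
    counts = {}
    timeline = []
    for e in ordered:
        key = (e["template"], e["component"])
        counts[key] = counts.get(key, 0) + e["indicator"]
        timeline.append((e["time"], key, counts[key]))
    return timeline
```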
Referring now to Figure 8, the Maximum User Template Session Concurrency is determined. For each User Template, firstly the maximum User Session Concurrency value is calculated. Then, for each User Template, a maximum concurrency value for the consumption of physical resources can be derived and calculated. It is apparent that the method may be applied locally or over a wide area network and that specific application software is capable of undertaking all or parts of the method. It is also understood that software supported on a recordable data medium is also within the scope of the invention.
Alternatively dedicated hardware may be provided for performing the method.
The invention has been described by way of example only and with reference to the Figures, and it will be appreciated that variations may be made to the embodiments described. It is appreciated that there are other specific techniques that may be employed, and dedicated apparatus that may be configured, for performing the various stages of the aforementioned method, an example of which is shown in Figure 9.

Claims

Claims
1. A method of modelling a to-be virtualized infrastructure so as to predict the computing resources required by a group of users, comprising the steps of: performing an assessment of computing resources consumed by the group of users by monitoring their use of their non-virtualised infrastructure; deriving at least one of: a User Template Session, a User Template Component Session and a User Template Image, in order to determine latency of computer resources available to the group of Users; determining an acceptable value of latency to the group of users; deriving a value of User concurrency for users within the user group; and deriving a base value, for attributes based upon: application input/output usage and/or CPU usage and/or network bandwidth, said base value being indicative of concurrency values for a specified resource consumed in a virtualized computing environment.
2. A method according to claim 1 wherein a running sum of user concurrencies is determined.
3. A method according to claim 1 wherein a filtering process is used to generate the Base value.
4. A method according to any preceding claim wherein at least one peak concurrency value is obtained.
5. A method according to any preceding claim wherein a sequence list is derived.
6. A method according to any preceding claim wherein calculations are performed detailing concurrency values for a predefined group of variables.
7. A method according to any preceding claim wherein user concurrency is derived by generating a Usage Event List.
8. A method according to claim 7 wherein the Usage Event List includes a list pairing that is generated for each User Template with each Application.
9. A method as claimed in any preceding claim, which involves two preparatory steps namely: calculation of a value of a subset of data, used as the basis for the remainder of the modelling process, and extracting definitions of User Templates.
10. Apparatus for carrying out the method of any of claims 1 to 9, which includes a display and a man machine interface and operates under control of software performing steps as defined in the method of any of claims 1 to 9.
PCT/GB2012/051683 2011-07-13 2012-07-13 Modelling virtualized infrastructure WO2013008036A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1111975.7A GB201111975D0 (en) 2011-07-13 2011-07-13 Modelling virtualized infrastructure
GB1111975.7 2011-07-13

Publications (2)

Publication Number Publication Date
WO2013008036A2 true WO2013008036A2 (en) 2013-01-17
WO2013008036A3 WO2013008036A3 (en) 2013-03-07

Family

ID=44586475

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2012/051683 WO2013008036A2 (en) 2011-07-13 2012-07-13 Modelling virtualized infrastructure

Country Status (2)

Country Link
GB (2) GB201111975D0 (en)
WO (1) WO2013008036A2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106685494A (en) * 2016-12-27 2017-05-17 京信通信技术(广州)有限公司 Grouping scheduling method and device of MU-MIMO (multiple-user multiple-input multiple output) system
WO2021020746A1 (en) * 2019-07-31 2021-02-04 고려대학교 산학협력단 Apparatus and method for managing virtual machine
KR20210015590A (en) * 2019-07-31 2021-02-10 고려대학교 산학협력단 Appratus and method for managing virtual machines
CN116886571A (en) * 2023-09-07 2023-10-13 武汉博易讯信息科技有限公司 Analysis method, equipment and computer readable medium for home broadband user

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9020945B1 (en) * 2013-01-25 2015-04-28 Humana Inc. User categorization system and method
EP3177943A1 (en) 2014-08-07 2017-06-14 Seabed Geosolutions B.V. Autonomous seismic nodes for the seabed

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2613419A1 (en) 2005-06-24 2007-01-04 Syncsort Incorporated System and method for virtualizing backup images

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7197431B2 (en) * 2000-08-22 2007-03-27 International Business Machines Corporation Method and system for determining the use and non-use of software programs
US7325234B2 (en) * 2001-05-25 2008-01-29 Siemens Medical Solutions Health Services Corporation System and method for monitoring computer application and resource utilization
US7716335B2 (en) * 2005-06-27 2010-05-11 Oracle America, Inc. System and method for automated workload characterization of an application server
US20070055771A1 (en) * 2005-07-25 2007-03-08 International Business Machines Corporation Controlling workload of a computer system through only external monitoring
US8250525B2 (en) * 2007-03-02 2012-08-21 Pegasystems Inc. Proactive performance management for multi-user enterprise software systems
JP4906686B2 (en) * 2007-11-19 2012-03-28 三菱電機株式会社 Virtual machine server sizing apparatus, virtual machine server sizing method, and virtual machine server sizing program
US8180604B2 (en) * 2008-09-30 2012-05-15 Hewlett-Packard Development Company, L.P. Optimizing a prediction of resource usage of multiple applications in a virtual environment
US8490087B2 (en) * 2009-12-02 2013-07-16 International Business Machines Corporation System and method for transforming legacy desktop environments to a virtualized desktop model
US8397138B2 (en) * 2009-12-08 2013-03-12 At & T Intellectual Property I, Lp Method and system for network latency virtualization in a cloud transport environment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2613419A1 (en) 2005-06-24 2007-01-04 Syncsort Incorporated System and method for virtualizing backup images

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106685494A (en) * 2016-12-27 2017-05-17 京信通信技术(广州)有限公司 Grouping scheduling method and device of MU-MIMO (multiple-user multiple-input multiple output) system
WO2021020746A1 (en) * 2019-07-31 2021-02-04 고려대학교 산학협력단 Apparatus and method for managing virtual machine
KR20210015590A (en) * 2019-07-31 2021-02-10 고려대학교 산학협력단 Appratus and method for managing virtual machines
KR102299040B1 (en) 2019-07-31 2021-09-08 고려대학교 산학협력단 Appratus and method for managing virtual machines
CN116886571A (en) * 2023-09-07 2023-10-13 武汉博易讯信息科技有限公司 Analysis method, equipment and computer readable medium for home broadband user
CN116886571B (en) * 2023-09-07 2023-11-21 武汉博易讯信息科技有限公司 Analysis method, equipment and computer readable medium for home broadband user

Also Published As

Publication number Publication date
GB201111975D0 (en) 2011-08-31
WO2013008036A3 (en) 2013-03-07
GB201212555D0 (en) 2012-08-29
GB2492899A (en) 2013-01-16

Similar Documents

Publication Publication Date Title
CN105808634B (en) Distributed mapping reduction network
US10333791B2 (en) Modeling computer network topology based on dynamic usage relationships
WO2013008036A2 (en) Modelling virtualized infrastructure
CN108776934B (en) Distributed data calculation method and device, computer equipment and readable storage medium
US8943186B2 (en) Method and apparatus for performance and policy analysis in distributed computing systems
CN108090225B (en) Database instance running method, device and system and computer readable storage medium
US7035919B1 (en) Method for calculating user weights for thin client sizing tool
CN104102543B (en) The method and apparatus of adjustment of load in a kind of cloud computing environment
TWI725744B (en) Method for establishing system resource prediction and resource management model through multi-layer correlations
JP2013513150A (en) Optimizing archive management scheduling
KR101994454B1 (en) Method for task distribution and asssessment
CN109325200B (en) Method and device for acquiring data and computer readable storage medium
CN107645410A (en) A kind of virtual machine management system and method based on OpenStack cloud platforms
JP2006048702A (en) Automatic configuration of transaction-based performance model
CN108268344A (en) A kind of data processing method and device
CN110275760A (en) Process based on fictitious host computer processor hangs up method and its relevant device
CN104536852B (en) Data recovery method and device
CN104717175B (en) The processing method and system of virtual desktop
CN112787853B (en) Automatic generation method and device of network change scheme and related equipment
CN113946491A (en) Microservice data processing method, microservice data processing device, computer equipment and storage medium
CN108234622A (en) Charging method and charge system
Ding et al. Efficient fitness function computation of genetic algorithm in virtual machine placement for greener data centers
CN108322537A (en) Method, apparatus, equipment and the storage medium in Cloud Server node resource pond
US11526784B2 (en) Real-time server capacity optimization tool using maximum predicted value of resource utilization determined based on historica data and confidence interval
CN115917510A (en) Virtual machine deployment in a streamlined cloud computing environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12754050

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12754050

Country of ref document: EP

Kind code of ref document: A2