WO2015101089A1 - Method, apparatus and system for managing large-scale clusters - Google Patents

Publication number
WO2015101089A1
Authority
WO
WIPO (PCT)
Prior art keywords
performance
management
management object
service level
determining
Prior art date
Application number
PCT/CN2014/089538
Other languages
English (en)
French (fr)
Inventor
王黎
吴晓明
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2015101089A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50: Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003: Managing SLA; Interaction between SLA and QoS
    • H04L41/5006: Creating or negotiating SLA contracts, guarantees or penalties
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061: Partitioning or combining of resources

Definitions

  • The present invention relates to the field of cloud computing and, more particularly, to a method, apparatus and system for managing large-scale clusters.
  • In the narrow sense, cloud computing refers to the delivery and usage model of Information Technology ("IT") infrastructure, in which the required resources are obtained through the network in an on-demand and easily scalable manner; the network that provides the resources is called the “cloud”.
  • From the user's point of view, the resources in the “cloud” are expandable and can be acquired at any time, expanded at any time, used on demand, and paid for by usage.
  • In the broad sense, cloud computing refers to the delivery and usage model of services, in which the required services are obtained through the network in an on-demand and easily scalable manner.
  • Such a service can be related to IT, software, or the Internet, or can be any other service.
  • The network that provides the service is called the “cloud”.
  • “Cloud” is a virtual computing resource that can be self-maintained and managed. It is usually a large server cluster, including computing servers, storage servers, and broadband resources. Cloud computing uniformly manages and schedules a large number of computing resources connected by networks to form a computing resource pool to provide on-demand services to users.
  • Cloud computing is gaining more and more attention due to its features such as hyperscale, virtualization, high reliability, versatility, high scalability, and on-demand services.
  • Cloud computing data centers integrate computing resources, storage resources, and network resources, and use virtualization and other technologies to provide them to users through the network in the form of applications.
  • Such an application may take the form of a virtual machine (Virtual Machine, "VM" for short), a storage volume, and the like.
  • By generating large numbers of virtual machines, storage volumes, and similar applications, virtualization technology gives rise to large-scale clusters. How to perform performance management and experience assurance for large-scale clusters is an issue that needs more and more attention.
  • Existing management of large-scale clusters usually takes a server (Server), a resource pool (Pool), or even the whole cluster as the unit. Where user-based performance management exists at all, it covers only the small amount of resources corresponding to a small number of VIP users. Thus, the performance of most users cannot be guaranteed, and the user experience is poor.
  • The embodiments of the invention provide a method, an apparatus and a system for managing a large-scale cluster, which can perform performance management and resource scheduling for users according to service level, thereby improving the user experience.
  • A method for managing a large-scale cluster includes: determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, wherein a management object is a resource unit in the large-scale cluster; determining the target performance of the at least one management object; acquiring the actual performance of the at least one management object; and performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
  • Before determining the at least one management object among the management objects corresponding to the first service level of the plurality of service levels, the method further includes: determining the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
  • The method further includes: determining the target performance of the first service level of the plurality of service levels. Determining the target performance of the at least one management object then includes: determining the target performance of the first service level as the target performance of the at least one management object.
  • Determining the target performance of the at least one management object includes at least one of: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually setting the target performance of the at least one management object.
  • The type of the target performance includes at least one of response delay, the number of reads and writes per second (IOPS), data transmission rate, and CPU usage.
  • Acquiring the actual performance of the at least one management object includes: periodically or continuously monitoring the actual performance of the at least one management object.
  • Performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance includes: determining whether the acquired actual performance satisfies the target performance; and, when the actual performance does not satisfy the target performance, performing the performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level among the plurality of service levels, such that the actual performance of the first service level satisfies the target performance.
  • The performance management includes at least one of the following: service migration, service restriction, traffic control, resource scheduling, and issuing an alarm.
  • In an eighth implementation manner of the first aspect, when the actual performance satisfies the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels is repeated, or the step of acquiring the actual performance of the at least one management object is repeated.
  • Determining the at least one management object among the management objects corresponding to the first service level of the plurality of service levels includes: determining, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, wherein the predetermined condition includes at least one of establishment time, location information, load condition, and history record; or determining at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, wherein the predetermined algorithm includes at least one of random selection, sequential selection, and time dynamic selection.
  • The management object includes at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth, and a server.
  • A management apparatus for a large-scale cluster includes: a determining unit, configured to determine at least one management object among the management objects corresponding to a first service level of a plurality of service levels, wherein a management object is a resource unit in the large-scale cluster, the determining unit being further configured to determine the target performance of the at least one management object; an acquiring unit, configured to acquire the actual performance of the at least one management object; and a performance management unit, configured to perform performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
  • The determining unit is further configured to: determine the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
  • The determining unit is further configured to: determine the target performance of the first service level of the plurality of service levels; and determine the target performance of the first service level as the target performance of the at least one management object.
  • The determining unit is specifically configured to: determine, according to a predetermined performance policy, the target performance corresponding to the at least one management object; or accept a manually set target performance for the at least one management object.
  • The type of the target performance determined by the determining unit includes at least one of response delay, the number of reads and writes per second (IOPS), data transmission rate, and CPU usage.
  • The acquiring unit is specifically configured to: periodically or continuously monitor the actual performance of the at least one management object.
  • The performance management unit is specifically configured to: determine whether the acquired actual performance satisfies the target performance; and, when the actual performance does not satisfy the target performance, perform the performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level among the plurality of service levels, such that the actual performance of the first service level satisfies the target performance.
  • The performance management includes at least one of the following: service migration, service restriction, traffic control, resource scheduling, and issuing an alarm.
  • When the actual performance satisfies the target performance, the determining unit repeats the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels, or the acquiring unit repeats the step of acquiring the actual performance of the at least one management object.
  • The determining unit is specifically configured to: determine, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, wherein the predetermined condition includes at least one of establishment time, location information, load condition, and history record; or determine at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, wherein the predetermined algorithm includes at least one of random selection, sequential selection, and time dynamic selection.
  • The management object includes at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth, and a server.
  • At least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and the actual performance of the at least one management object. This can ensure that the performance of most or even all users reaches the target performance, improving or guaranteeing the user experience.
  • FIG. 1 is a system block diagram of a large-scale cluster management system according to an embodiment of the present invention
  • FIG. 2 is a flow chart of a management method according to an embodiment of the present invention.
  • FIG. 3 is a flow chart of a management method according to an embodiment of the present invention.
  • FIG. 4 is a schematic block diagram of a management apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic block diagram of a management device in accordance with another embodiment of the present invention.
  • the large-scale cluster management system 100 shown in FIG. 1 includes a management object determination module 101, a target performance determination module 102, an actual performance acquisition module 103, a performance management module 104, and a large-scale cluster 105.
  • The management object determining module 101, the actual performance obtaining module 103, and the performance management module 104 are all connected to the large-scale cluster 105; the management object determining module 101 is connected to the target performance determining module 102; and the target performance determining module 102 and the actual performance obtaining module 103 are both connected to the performance management module 104.
  • the management object determining module 101 is configured to determine at least one management object among the management objects corresponding to the first service level of the plurality of service levels, wherein the management object is a resource unit in the large-scale cluster 105.
  • The resource units may be divided into computing resource units, storage resource units, network resource units, physical resource units, and the like. More specifically, a computing resource unit may be a virtual machine (VM); a storage resource unit may be a storage volume or a logical unit number (LUN); a network resource unit may be an input/output (I/O) port, network bandwidth, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, or the like; and a physical resource unit may be a server or the like.
  • The target performance determining module 102 is configured to determine the target performance of the at least one management object. Specifically, it may determine the target performance corresponding to the at least one management object according to the predetermined performance policy; or accept a manually set target performance for the at least one management object; or determine the target performance of the first service level corresponding to the at least one management object as the target performance of the at least one management object.
  • The actual performance obtaining module 103 is configured to obtain the actual performance of the at least one management object. Specifically, the actual performance of the at least one management object may be monitored and counted periodically or continuously.
  • The performance management module 104 is configured to perform performance management on the management objects corresponding to the first service level according to the target performance determined by the target performance determining module 102 and the actual performance acquired by the actual performance obtaining module 103.
  • When the actual performance does not satisfy the target performance, performance management is performed on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level satisfies the target performance.
  • The performance management methods include but are not limited to the following: service migration, service restriction, traffic control, and resource scheduling.
  • When the actual performance satisfies the target performance, the target performance determining module 102 may re-determine at least one management object, or the actual performance obtaining module 103 may continue to monitor the actual performance of the previously determined at least one management object.
  • The management system 100 of the large-scale cluster of the embodiment of the present invention determines at least one management object among the management objects corresponding to the first service level, and performs performance management on all management objects corresponding to the first service level according to the target performance and the actual performance of the at least one management object. This can ensure that the performance of most or even all users reaches the target performance, improving or guaranteeing the user experience.
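  • The cooperation of modules 101 to 104 can be sketched as a single monitoring pass. This is only an illustrative sketch, not the implementation described in the application: the function name `manage_service_level`, its callback parameters, the random sampling, and the "lower is better" comparison (as for a latency target) are all assumptions made for the example.

```python
import random
from typing import Callable, List, Optional, Sequence


def manage_service_level(
    objects: Sequence[str],
    target: float,
    measure: Callable[[str], float],
    act: Callable[[Sequence[str]], None],
    sample_size: int = 2,
    rng: Optional[random.Random] = None,
) -> bool:
    """Run one pass; return True if every sampled object meets the target."""
    rng = rng or random.Random()
    # Role of module 101: pick a few management objects of the level as samples.
    sample: List[str] = rng.sample(list(objects), min(sample_size, len(objects)))
    # Role of module 103: acquire the actual performance of each sample.
    actual = [measure(obj) for obj in sample]
    # Role of module 104: compare against the target supplied by module 102
    # (assumed lower-is-better, e.g. latency in ms).
    if all(value <= target for value in actual):
        return True
    # On a miss, performance management is applied to the whole service level,
    # not only to the sampled objects.
    act(objects)
    return False
```

  • The point of the sketch is that only the samples are measured, while the corrective action receives every management object of the service level.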
  • FIG. 2 is a flow chart of a management method of an embodiment of the present invention.
  • At least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and the actual performance of the at least one management object. This can ensure that the performance of most or even all users reaches the target performance and improves the user experience.
  • the resource units of a large-scale cluster may be divided into a computing resource unit, a storage resource unit, a network resource unit, a physical resource unit, and the like, for providing services such as calculation, storage, and transmission to the user.
  • The computing resource unit may be a virtual machine (VM) or the like; the storage resource unit may be a storage volume, a logical unit number (LUN), or the like; the network resource unit may be an input/output (I/O) port, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, network bandwidth, or the like; and the physical resource unit may be a server or the like.
  • Before determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels, the method further includes: determining the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
  • The service levels may be divided according to the SLA, or may be divided by network maintenance personnel according to certain attributes, such as the location information, service type, or service target of the management object.
  • When the object of the level division is a user, this is equivalent to dividing the at least one resource unit that provides services to that user, that is, the management object.
  • The service levels may be divided as a simple grading, or the target performance of one or more service levels may be determined at the time the service levels are divided.
  • The target performance can be understood as the quality of service (Quality of Service, QoS) to be achieved.
  • After the plurality of service levels are determined for the management objects in the large-scale cluster according to the SLA, the method further includes: determining the target performance of the first service level of the plurality of service levels. Determining the target performance of the at least one management object then includes: determining the target performance of the first service level as the target performance of the at least one management object.
  • In other words, the target performance of a service level may be determined as the target performance of the at least one management object selected as a sample within that service level.
  • Determining the target performance of the at least one management object includes at least one of: determining the target performance corresponding to the at least one management object according to the predetermined performance policy; or manually setting the target performance of the at least one management object.
  • The target performance may also be determined directly for the determined at least one management object, specifically according to a predetermined performance policy; that is, a performance policy file may be preset in the system. The policy file may contain the correspondence between information such as the service type and geographic location of the management object and the target performance.
  • the target performance of the management object can be manually set by the network maintenance personnel through the management interface.
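  • A policy-based lookup with an optional manual setting might look like the following sketch. The table contents, the key layout `(service type, location)`, the metric names, and the rule that a manual value takes precedence are assumptions for illustration; the application only states that both ways of determining the target performance are possible.

```python
from typing import Dict, Optional, Tuple

# Hypothetical policy table: (service type, geographic location) -> target performance.
POLICY: Dict[Tuple[str, str], Dict[str, float]] = {
    ("database", "dc-east"): {"response_delay_ms": 3.0},
    ("web", "dc-east"): {"cpu_usage_pct": 90.0},
}


def resolve_target(
    service_type: str,
    location: str,
    manual: Optional[Dict[str, float]] = None,
) -> Optional[Dict[str, float]]:
    """Return the target performance for a management object, or None if unknown."""
    # Assumption: a value entered by maintenance personnel overrides the policy file.
    if manual is not None:
        return manual
    # Otherwise fall back to the preset policy for this service type and location.
    return POLICY.get((service_type, location))
```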
  • The type of the target performance may include, but is not limited to, at least one of response delay, the number of reads and writes per second (IOPS), data transmission rate, and CPU usage. The target performance may be a single parameter or a combination of multiple parameters, which is not limited in the present invention.
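  • Because a target may combine several parameters, a check against it has to know, for each parameter, whether smaller or larger values are better. The sketch below shows one way to do this; the metric names and the two direction sets are hypothetical and not taken from the application.

```python
from typing import Dict

# Hypothetical metric names, grouped by the direction in which they improve.
LOWER_IS_BETTER = {"response_delay_ms", "cpu_usage_pct"}
HIGHER_IS_BETTER = {"iops", "data_rate_mbps"}


def meets_target(actual: Dict[str, float], target: Dict[str, float]) -> bool:
    """Return True only if every metric named in the target is satisfied."""
    for metric, limit in target.items():
        if metric not in LOWER_IS_BETTER and metric not in HIGHER_IS_BETTER:
            raise ValueError(f"unknown metric: {metric}")
        if metric not in actual:
            return False  # a metric that was not measured cannot be confirmed
        value = actual[metric]
        if metric in LOWER_IS_BETTER and value > limit:
            return False
        if metric in HIGHER_IS_BETTER and value < limit:
            return False
    return True
```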
  • Acquiring the actual performance of the at least one management object includes: periodically or continuously monitoring the actual performance of the at least one management object. It should be understood that the type of the actual performance may be the same as or different from the type of the target performance.
  • Performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance includes: determining whether the acquired actual performance satisfies the target performance; and, when the actual performance does not satisfy the target performance, performing performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level satisfies the target performance.
  • Performance management may include, but is not limited to, at least one of the following: service migration, service restriction, traffic control, and resource scheduling.
  • The currently detected service level, or other service levels, may be subjected to operations such as service migration, service restriction, traffic control, and resource scheduling so that the first service level meets the target performance.
  • For example, service migration may be performed on the management objects of the first service level to reduce CPU usage to 90% or less. It should be understood that other control methods can also be used to achieve the target performance, such as allocating more resources to the management objects of the first service level; the invention is not limited thereto.
  • The first service level may also be brought to the target performance by controlling or scheduling other service levels.
  • For example, a service level with lower priority can be scaled back so that the first service level meets the target performance.
  • The target performance of the first service level can also be achieved by controlling or scheduling the first service level and other service levels at the same time.
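  • As a rough illustration of the service-migration example with a 90% CPU usage target, the sketch below picks which virtual machines to move off a host. The greedy "heaviest first" rule, the function name `plan_migrations`, and the representation of each VM's load as a share of host CPU in percent are assumptions; the application does not prescribe how migration candidates are chosen.

```python
from typing import Dict, List


def plan_migrations(vm_cpu: Dict[str, float], limit_pct: float = 90.0) -> List[str]:
    """Pick VMs to migrate away until the host's total CPU usage is <= limit_pct."""
    total = sum(vm_cpu.values())
    to_migrate: List[str] = []
    # Assumption: move the heaviest VMs first so that few services are disturbed.
    for vm, usage in sorted(vm_cpu.items(), key=lambda item: item[1], reverse=True):
        if total <= limit_pct:
            break
        to_migrate.append(vm)
        total -= usage
    return to_migrate
```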
  • The step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels may be repeated, or the step of acquiring the actual performance of the at least one management object may be repeated. In other words, the management objects can be re-sampled and detected again, or monitoring can simply continue.
  • A threshold on the number of repetitions can be set so that the sampling and monitoring of the performance management system are more accurate and closer to the actual situation. For example, if the actual performance monitored in two repeated samplings still does not satisfy the target performance, it is determined that performance management is to be performed.
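  • The repetition threshold can be sketched as a small counter that only signals performance management after several consecutive misses. The class name `RepeatGuard` and the choice to reset the counter after a passing sample are assumptions made for this example.

```python
class RepeatGuard:
    """Signal performance management only after `threshold` consecutive misses."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.misses = 0

    def observe(self, meets_target: bool) -> bool:
        """Record one sampling result; return True when management should start."""
        if meets_target:
            # Assumption: a passing sample clears earlier misses.
            self.misses = 0
            return False
        self.misses += 1
        return self.misses >= self.threshold
```

  • With a threshold of 2, a single failing sample does not trigger any action; a second failing sample in a row does.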
  • When the actual performance satisfies the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels is repeated, or the step of acquiring the actual performance of the at least one management object is repeated.
  • That is, resampling may be performed, in which at least one management object is reselected in the first service level. It is also possible to continue monitoring the at least one management object that was previously sampled, so that performance management can be performed when its performance no longer meets the target performance.
  • Determining the at least one management object among the management objects corresponding to the first service level of the plurality of service levels includes: determining, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, wherein the predetermined condition includes at least one of establishment time, location information, load condition, and history record; or determining at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, wherein the predetermined algorithm includes at least one of random selection, sequential selection, and time dynamic selection.
  • the management object includes at least one of a virtual machine VM, a storage volume, an input/output I/O port, a network bandwidth, and a server.
  • At least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and the actual performance of the at least one management object. This can ensure that the performance of most or even all users reaches the target performance, improving or guaranteeing the user experience.
  • FIG. 3 is a flow chart of a management method of an embodiment of the present invention.
  • the service level of the user or management object in the large-scale cluster can be divided before the management object is selected.
  • the SLA may be used for hierarchical division, or may be classified by the network maintenance personnel according to certain attributes, such as location information of the management object, service type, service target, and the like.
  • When the object of the level division is a user, this is equivalent to dividing the at least one resource unit that provides services to that user, that is, the management object.
  • The service levels may be divided as a simple grading, or the target performance of one or more service levels may be determined at the time the service levels are divided.
  • The target performance can be understood as the quality of service (Quality of Service, QoS) to be achieved.
  • In the large-scale cluster, a small number of management objects are selected as samples.
  • the management object is a resource unit that provides services for users in a large-scale cluster.
  • the resource units of the large-scale cluster may be divided into a computing resource unit, a storage resource unit, a network resource unit, a physical resource unit, and the like, and are used to provide services such as calculation, storage, and transmission to the user.
  • The computing resource unit may be a virtual machine (VM) or the like; the storage resource unit may be a storage volume, a logical unit number (LUN), or the like; the network resource unit may be an input/output (I/O) port, network bandwidth, or the like; and the physical resource unit may be a server or the like.
  • At least one management object that satisfies a predetermined condition may be determined among the management objects corresponding to the first service level, wherein the predetermined condition includes at least one of establishment time, location information, load condition, and history record.
  • For example, the predetermined condition is that the load reaches 90% of the maximum load, or that N faults appear in the history record.
  • The selected management objects may all be of the same type or may be of different types. For example, the selection may consist only of VMs or only of storage volumes, or it may include VMs, storage volumes, and so on, as long as they meet the above predetermined condition.
  • the predetermined condition may also exist in a combined form, for example, a VM whose load condition reaches 90% of the maximum load, a server having more than N faults in the history, and the like, which is not limited by the present invention.
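  • The combined condition in this example (a VM at 90% of its maximum load, or a server with more than N faults in its history) can be expressed as a simple filter. The `ManagedObject` fields, the function name, and the default N = 3 are hypothetical; they only serve to make the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ManagedObject:
    name: str
    kind: str          # e.g. "vm", "volume", "server"
    load_ratio: float  # current load divided by maximum load
    fault_count: int   # number of faults found in the history record


def select_by_condition(
    objects: Iterable[ManagedObject],
    min_load_ratio: float = 0.9,
    max_faults: int = 3,  # stands in for the unspecified N
) -> List[ManagedObject]:
    """Return VMs at or above the load ratio and servers with more than N faults."""
    return [
        obj
        for obj in objects
        if (obj.kind == "vm" and obj.load_ratio >= min_load_ratio)
        or (obj.kind == "server" and obj.fault_count > max_faults)
    ]
```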
  • At least one management object may be determined among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes, but is not limited to, random selection, sequential selection, time dynamic selection, intelligent selection, and the like.
  • For example, when the predetermined algorithm is random selection, a certain number of management objects are randomly selected in the first service level; the number may also be pre-specified in the predetermined algorithm. As another example, time dynamic selection can select management objects dynamically in different time periods or as time changes, which keeps the sample active.
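  • The three named selection algorithms could be sketched as follows. The application does not define them in detail, so the interpretations here are assumptions: sequential selection takes a window of objects in order (wrapping around), and time dynamic selection shifts that window once per period. All function and parameter names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def select_random(objects: Sequence[str], count: int,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Pick `count` distinct objects at random (fewer if not enough exist)."""
    rng = rng or random.Random()
    return rng.sample(list(objects), min(count, len(objects)))


def select_sequential(objects: Sequence[str], count: int, start: int = 0) -> List[str]:
    """Take `count` objects in order beginning at `start`, wrapping around."""
    n = len(objects)
    return [objects[(start + i) % n] for i in range(min(count, n))]


def select_time_dynamic(objects: Sequence[str], count: int,
                        now_s: float, period_s: float = 3600.0) -> List[str]:
    """Shift the sequential window by `count` objects every `period_s` seconds."""
    window = int(now_s // period_s)
    return select_sequential(objects, count, start=window * count)
```

  • Rotating the window over time means every object of the service level is eventually sampled, which is one way to keep the sample from going stale.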
  • the managed object to be sampled can also be directly specified.
  • one or more management objects can be selected by the network maintenance personnel for a certain service level in the network topology interface as a sample of performance management.
  • The first service level in step 302 is one of the plurality of service levels divided in step 301 above. The term "first" service level is only used to indicate a certain service level, which can be any of the above plurality of service levels.
  • The service levels may also already exist in the large-scale cluster: a service level may be one determined from history records, or one agreed upon when the user subscribed to the network, which is not limited herein.
  • the service level can be understood as a group of management objects determined according to the same or similar performance requirements, performance indicators, service types, and the like.
  • the target performance of the management object can be determined.
  • the target performance corresponding to the at least one management object may be determined according to a predetermined performance policy, and the target performance of the at least one management object may also be manually set.
  • The performance policy file can be pre-configured in the system, and certain performance attributes of the management object can be combined with the performance policy file to determine a target performance that gives the management object a performance guarantee. The policy file can include the correspondence between information such as the service type and geographic location of the management object and the target performance.
  • the target performance of the management object can be manually set by the network maintenance personnel through the management interface.
  • the management object is a storage volume with multiple service levels, and the storage volume selected as a sample among one of the service levels can set its target performance to a delay of less than 3 ms. Manual setting can also be determined by the policy file.
  • the service level has previously corresponded to the target performance (QoS). For example, if the target performance of the service level has been determined when the service level is divided in the above step 301, the target performance of the service level may be determined. The target performance of at least one managed object selected as a sample for the service level.
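The three sources of target performance described above (a manual setting, a target already attached to the service level, and a performance policy file keyed by attributes such as service type and geographic location) can be sketched as below. The `POLICY` table, the attribute names, and the precedence order are illustrative assumptions; the text does not prescribe any of them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical policy table: (service type, region) -> target performance.
POLICY: Dict[Tuple[str, str], Dict[str, float]] = {
    ("database", "east"): {"latency_ms": 3.0},
    ("backup", "east"): {"latency_ms": 20.0},
}


def resolve_target(obj: Dict[str, str],
                   level_target: Optional[Dict[str, float]] = None,
                   manual_target: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Return the target performance for one sampled management object.

    Assumed precedence: manual setting, then service-level target,
    then lookup in the policy table.
    """
    if manual_target is not None:
        return manual_target
    if level_target is not None:
        return level_target
    key = (obj["service_type"], obj["region"])
    if key not in POLICY:
        raise LookupError(f"no policy entry for {key}")
    return POLICY[key]
```

A storage volume of service type `database` in region `east` would thus receive the 3 ms latency target unless a service-level or manual target overrides it.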
  • There are many types of target performance, including but not limited to response latency, read/write operations per second (IOPS), data transfer rate, and CPU usage. It is easy to understand that the target performance may be a single parameter or a combination of multiple parameters, which is not limited in the present invention.
  • the actual performance of the at least one managed object determined in step 303 is periodically or continuously monitored.
  • The type of actual performance monitored may be the same as or different from the type of the target performance. Specifically, when the target performance determined in step 303 above is a latency of less than 3 ms, the actual performance monitored may also be a latency; for example, the management object is observed to have an actual latency of 4 ms.
  • The type of actual performance monitored may also differ from the type of the target performance. For example, the target performance requires a VM creation time of less than 2 min, while the performance indicator actually monitored is bandwidth (MB/s). If the bandwidth does not reach 50 MB/s, the system considers the goal of creating a VM within 2 min unachievable and therefore performs performance policy scheduling and so on.
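The two cases above (monitored metric of the same type as the target, or a different metric used as a stand-in) can be sketched as follows. The `Target` type, the `PROXY` table, and the metric names are assumptions for illustration; only the 3 ms, 2 min, and 50 MB/s figures come from the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Target:
    metric: str   # name of the metric the target is expressed in
    limit: float  # threshold value
    kind: str     # "max": value must stay below limit; "min": value must reach limit


def meets_target(target: Target, actual: Dict[str, float]) -> bool:
    value = actual[target.metric]
    return value < target.limit if target.kind == "max" else value >= target.limit


# Hypothetical proxy rule: a VM creation time under 2 min is considered
# achievable only if the monitored bandwidth reaches 50 MB/s.
PROXY = {"vm_create_min": Target("bandwidth_mb_s", 50.0, "min")}


def evaluate(target: Target, actual: Dict[str, float]) -> bool:
    """Compare directly when the metric types match, else fall back to a proxy."""
    if target.metric in actual:
        return meets_target(target, actual)
    proxy = PROXY.get(target.metric)
    if proxy is None or proxy.metric not in actual:
        raise KeyError(f"no measurement or proxy for {target.metric}")
    return meets_target(proxy, actual)
```

In this sketch a monitored latency of 4 ms fails a 3 ms target directly, while a monitored bandwidth of 40 MB/s fails the VM creation target through the proxy rule.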
  • The system can analyze the actual performance data together with the target performance, that is, determine whether the actual performance meets the target performance. In other words, the state of the management objects or cluster resources of the entire service level can be estimated from the performance of the sampled management objects determined in step 302 above, which makes it possible to evaluate and manage the service level as a whole.
  • For example, the target performance specifies I/O latency, IOPS, and CPU usage. If the monitored CPU usage exceeds its limit, a migration policy can be specified and service migration performed, reducing the service load on the management objects of the service level so as to meet the user experience targets while balancing the load of the whole system. If the actual I/O latency exceeds its limit, resource scheduling can be performed to increase the share of resources (such as CPU and cache) allocated to this service level, or the service traffic of lower-priority service levels can be reduced, to meet the needs of this service level. Alternatively, an alarm can be issued without further control or scheduling, pending further instructions from staff or other network management equipment. In addition, the requirements of other service levels can be met by performing performance management on the first service level.
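The mapping from a violated target to a management action in the example above can be sketched as a small decision function. The metric keys and action names are hypothetical, and the choice to raise an alarm only when IOPS alone falls short is an assumption made to illustrate the "alarm without further control" option.

```python
from typing import Dict, List


def plan_actions(target: Dict[str, float], actual: Dict[str, float]) -> List[str]:
    """Choose performance management actions for one service level (illustrative)."""
    actions: List[str] = []
    if actual["cpu_usage_pct"] > target["cpu_usage_pct"]:
        # CPU over its limit: migrate services to reduce the load.
        actions.append("service_migration")
    if actual["io_latency_ms"] > target["io_latency_ms"]:
        # I/O latency over its limit: give the level more CPU/cache.
        actions.append("resource_scheduling")
    if actual["iops"] < target["iops"] and not actions:
        # Assumed fallback: only raise an alarm and wait for instructions.
        actions.append("raise_alarm")
    return actions
```

An empty result means every monitored value is within its target, in which case the method simply continues sampling or monitoring.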
  • The step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels may be performed repeatedly, or the step of acquiring the actual performance of the at least one management object may be performed repeatedly.
  • A threshold on the number of repetitions can be set so that the sampling and monitoring of the performance management system are more accurate and closer to the actual situation. For example, performance management is triggered only if the actual performance still fails to meet the target after sampling has been repeated twice.
  • When the actual performance meets the target performance, the method may return to step 302 or to step 304. That is, when the performance is satisfied and no control or scheduling is required, resampling can be performed, i.e., at least one management object is reselected in the first service level. Alternatively, the previously sampled management objects may continue to be monitored so that performance management can be performed when their performance no longer meets the target performance.
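The repetition threshold described above can be sketched as a check over successive measurements. This is a simplified illustration: the function name, the use of consecutive misses, and an upper-bound target (such as a latency limit) are assumptions, with the threshold of two taken from the example in the text.

```python
from typing import Iterable


def needs_management(samples: Iterable[float], target_max: float,
                     max_repeats: int = 2) -> bool:
    """Return True once `max_repeats` consecutive samples miss the target."""
    misses = 0
    for value in samples:
        if value < target_max:
            # Target met: reset and keep monitoring (or resample).
            misses = 0
        else:
            misses += 1
            if misses >= max_repeats:
                return True
    return False
```

A single outlier therefore does not trigger migration or scheduling; only a miss that persists across repeated sampling does.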
  • In this embodiment, at least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This makes it possible to ensure that most or even all users reach the target performance, which improves the user experience.
  • FIG. 4 is a schematic block diagram of a management device in accordance with one embodiment of the present invention.
  • the management device 400 in FIG. 4 includes a determination unit 401, an acquisition unit 402, and a performance management unit 403.
  • The determining unit 401 determines at least one management object among the management objects corresponding to the first service level of the plurality of service levels, where a management object is a resource unit in the large-scale cluster. The determining unit 401 also determines the target performance of the at least one management object, and the acquisition unit 402 acquires the actual performance of the at least one management object.
  • the performance management unit 403 performs performance management on the management object corresponding to the first service level according to the target performance and the actual performance.
  • The management apparatus 400 of this embodiment of the present invention determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster, and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This makes it possible to ensure that most or even all users reach the target performance, which improves the user experience.
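The division of labor among the determining unit 401, the acquisition unit 402, and the performance management unit 403 can be pictured as three pluggable callables driven by one pass of the method. The class below is only a structural sketch; its field names, the `run_once` method, and the "higher value is worse" comparison are assumptions rather than part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ManagementDevice:
    determine_objects: Callable[[str], List[str]]  # determining unit 401: sample a level
    determine_target: Callable[[str], float]       # determining unit 401: target per object
    acquire_actual: Callable[[str], float]         # acquisition unit 402: measured value
    manage: Callable[[str], None]                  # performance management unit 403

    def run_once(self, level: str) -> bool:
        """Run one pass; return True if performance management was triggered."""
        sample = self.determine_objects(level)
        violated = any(self.acquire_actual(obj) > self.determine_target(obj)
                       for obj in sample)
        if violated:
            # Management applies to the whole service level, not just the sample.
            self.manage(level)
        return violated
```

The point of the sketch is that only the sampled objects are measured, while the resulting action is applied to the service level as a whole.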
  • The resource units of a large-scale cluster may be divided into computing resource units, storage resource units, network resource units, physical resource units, and the like, and are used to provide services such as computation, storage, and transmission to users. More specifically, the computing resource unit may be a virtual machine (VM) or the like; the storage resource unit may be a storage volume, a logical unit number (LUN), or the like; the network resource unit may be an input/output (I/O) port, network bandwidth, or the like; and the physical resource unit may be a server or the like.
  • The determining unit 401 in this embodiment of the present invention may correspond to the management object determining module 101 and the target performance determining module 102 in the large-scale cluster management system 100 shown in FIG. 1 above; the obtaining unit 402 may correspond to the actual performance acquisition module 103 in the large-scale cluster management system 100 shown in FIG. 1; and the performance management unit 403 may correspond to the performance management module 104 in the large-scale cluster management system 100 shown in FIG. 1.
  • the determining unit 401 determines a plurality of service levels for the management objects in the large-scale cluster according to a Service Level Agreement (SLA).
  • As a preliminary step, the determining unit 401 may first divide the users or management objects in the large-scale cluster into service levels before selecting management objects.
  • Specifically, the division may be made according to the SLA, or by network maintenance personnel according to certain attributes, such as the location information, service type, or service target of the management object.
  • When the objects being divided are users, this is equivalent to dividing the at least one resource unit that provides service to each user, that is, the management objects.
  • In addition, the division may be purely a division into levels, or the target performance of one or more service levels may be determined at the time of division.
  • Here the target performance can be understood as the quality of service (QoS) to be achieved.
  • The determining unit 401 may further be configured to determine the target performance of the first service level among the plurality of service levels; determining the target performance of the at least one management object then includes determining the target performance of the first service level as the target performance of the at least one management object.
  • That is, if the target performance of a service level was already determined when the service levels were divided, it may be used as the target performance of the at least one management object selected as a sample in that service level.
  • The determining unit 401 is further configured to determine the target performance corresponding to the at least one management object according to a predetermined performance policy, or to accept a manually set target performance for the at least one management object.
  • The determining unit 401 may also determine the target performance directly for the determined at least one management object. Specifically, it may do so according to a predetermined performance policy: a performance policy file may be pre-configured in the system, and certain attributes of the management object can be combined with the performance policy file to determine the target performance under which the performance of the management object is guaranteed.
  • For example, the policy file may contain the correspondence between information such as the service type and geographic location of the management object and the target performance.
  • In addition, the target performance of the management object can be set manually by network maintenance personnel through the management interface.
  • The type of the target performance may include, but is not limited to, at least one of response latency, read/write operations per second (IOPS), data transmission rate, and CPU usage. It is easy to understand that the target performance may be a single parameter or a combination of multiple parameters, which is not limited in the present invention.
  • the obtaining unit 402 is specifically configured to periodically or continuously monitor actual performance of the at least one management object. It should be understood that the actual performance may be the same as or different from the type of target performance.
  • The performance management unit 403 is specifically configured to determine whether the acquired actual performance meets the target performance and, when it does not, to perform performance management on the management objects corresponding to the first service level and/or the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level meets the target performance.
  • Performance management may include, but is not limited to, at least one of the following: service migration; service restriction; traffic control; resource scheduling; and issuing an alarm.
  • That is, if the monitored actual performance does not meet expectations (the target performance), operations such as service migration, service restriction, traffic control, and resource scheduling may be applied to the first service level currently being monitored, or to other service levels, so that the first service level can meet the target performance.
  • For example, when the monitored actual performance of the at least one management object selected in the first service level is a CPU usage above 90% (the target performance being a CPU usage of at most 90%), service migration may be performed on the management objects of the first service level to bring CPU usage down to 90% or below. It should be understood that other control methods can also be used to achieve the target performance, such as allocating more resources to the management objects of the first service level; the present invention is not limited thereto.
  • The first service level can also be brought to its target performance by controlling or scheduling other service levels. For example, when the actual I/O latency of the first service level does not meet the target performance, the service traffic of lower-priority service levels can be reduced so that the first service level can meet the target performance.
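Reducing the traffic of lower-priority service levels to protect the first service level can be sketched as a rescaling of per-level traffic caps. Everything concrete here is hypothetical: the text does not specify how priorities are encoded, what the caps are, or by how much traffic is reduced, so the `factor` parameter and the data layout are assumptions.

```python
from typing import Dict


def throttle_lower_priority(limits: Dict[str, float], priorities: Dict[str, int],
                            protected: str, factor: float = 0.8) -> Dict[str, float]:
    """Scale the traffic cap of every level with lower priority than `protected`.

    A larger number in `priorities` means a higher priority (assumed convention).
    """
    return {
        level: cap * factor if priorities[level] < priorities[protected] else cap
        for level, cap in limits.items()
    }
```

The function returns a new mapping and leaves the input untouched, so the previous caps can be restored once the protected level meets its target again.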
  • The step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels may be performed repeatedly, or the step of acquiring the actual performance of the at least one management object may be performed repeatedly.
  • A threshold on the number of repetitions can be set so that the sampling and monitoring of the performance management system are more accurate and closer to the actual situation. For example, performance management is triggered only if the actual performance still fails to meet the target after sampling has been repeated twice.
  • When the actual performance meets the target performance, the determining unit 401 repeats the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels, or the obtaining unit 402 repeats the step of acquiring the actual performance of the at least one management object.
  • That is, resampling may be performed, i.e., at least one management object is reselected in the first service level. Alternatively, the previously sampled management objects may continue to be monitored so that performance management can be performed when their performance no longer meets the target performance.
  • The determining unit 401 is further configured to determine, among the management objects corresponding to the first service level, at least one management object that meets a predetermined condition, where the predetermined condition includes at least one of establishment time, location information, load condition, and history; or to determine at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes at least one of random selection, sequential selection, and time-dynamic selection.
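Selection by predetermined condition can be sketched as a filter over the management objects of one service level. The `ManagedObject` fields and the particular conditions (location match, minimum load, creation after a given day) are assumptions chosen to mirror the establishment time, location information, and load condition mentioned above; history is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ManagedObject:
    name: str
    created_day: int   # hypothetical: day number on which the object was created
    location: str
    load_pct: float


def select_by_condition(objects: Iterable[ManagedObject],
                        location: Optional[str] = None,
                        min_load_pct: Optional[float] = None,
                        created_after: Optional[int] = None) -> List[ManagedObject]:
    """Keep the objects that satisfy every condition that was supplied."""
    selected = []
    for obj in objects:
        if location is not None and obj.location != location:
            continue
        if min_load_pct is not None and obj.load_pct < min_load_pct:
            continue
        if created_after is not None and obj.created_day <= created_after:
            continue
        selected.append(obj)
    return selected
```

Conditions left as `None` are ignored, so the same helper covers "at least one of" the listed conditions.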
  • the management object includes at least one of a virtual machine VM, a storage volume, an input/output I/O port, a virtual switch vSwitch, a virtual local area network vLAN, a switch, a network bandwidth, and a server.
  • The management apparatus 400 of this embodiment of the present invention determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster, and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This makes it possible to ensure that most or even all users reach the target performance, which improves or guarantees the user experience.
  • FIG. 5 is a schematic block diagram of a management device in accordance with another embodiment of the present invention.
  • the management device 500 of FIG. 5 includes a processor 51 and a memory 52, and the processor 51 and the memory 52 are connected by a bus system 53.
  • The memory 52 is configured to store instructions that cause the processor 51 to: determine at least one management object among the management objects corresponding to the first service level of the plurality of service levels, where a management object is a resource unit in the large-scale cluster; determine the target performance of the at least one management object; acquire the actual performance of the at least one management object; and perform performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
  • The management apparatus 500 of this embodiment of the present invention determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster, and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This makes it possible to ensure that most or even all users reach the target performance, which improves the user experience.
  • The resource units of a large-scale cluster may be divided into computing resource units, storage resource units, network resource units, physical resource units, and the like, and are used to provide services such as computation, storage, and transmission to users. More specifically, the computing resource unit may be a virtual machine (VM) or the like; the storage resource unit may be a storage volume, a logical unit number (LUN), or the like; the network resource unit may be an input/output (I/O) port, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, network bandwidth, or the like; and the physical resource unit may be a server or the like.
  • The management device 500 may further include a transmitting circuit 54, a receiving circuit 55, and the like.
  • The processor 51, which may also be referred to as a central processing unit (CPU), controls the operation of the management device 500.
  • Memory 52 can include read only memory and random access memory and provides instructions and data to processor 51.
  • a portion of memory 52 may also include non-volatile random access memory (NVRAM).
  • The various components of the management device 500 are coupled together by a bus system 53, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, and the like. For clarity of description, however, the various buses are all labeled as the bus system 53 in the figure.
  • Processor 51 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 51 or an instruction in a form of software.
  • The processor 51 described above may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The processor 51 may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • The steps of the method disclosed with reference to the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 52, and the processor 51 reads the information in the memory 52 and performs the steps of the above method in combination with its hardware.
  • Before determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels, the method further includes: determining a plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
  • The method further includes: determining the target performance of the first service level among the plurality of service levels. Determining the target performance of the at least one management object then includes: determining the target performance of the first service level as the target performance of the at least one management object.
  • Determining the target performance of the at least one management object includes at least one of: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually setting the target performance of the at least one management object.
  • The type of target performance includes at least one of response latency, read/write operations per second (IOPS), data transmission rate, and CPU usage.
  • acquiring actual performance of the at least one management object includes: periodically or continuously monitoring actual performance of the at least one management object.
  • Performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance includes: determining whether the acquired actual performance meets the target performance; and, when the actual performance does not meet the target performance, performing performance management on the management objects corresponding to the first service level and/or the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level meets the target performance.
  • the performance management includes at least one of the following: service migration; service restriction; traffic control; resource scheduling; and issuing an alarm.
  • When the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels is performed repeatedly, or the step of acquiring the actual performance of the at least one management object is performed repeatedly.
  • Determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels includes: determining, among the management objects corresponding to the first service level, at least one management object that meets a predetermined condition, where the predetermined condition includes at least one of establishment time, location information, load condition, and history; or determining at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes at least one of random selection, sequential selection, and time-dynamic selection.
  • the management object includes at least one of a virtual machine VM, a storage volume, an input/output I/O port, a network bandwidth, a virtual switch vSwitch, a virtual local area network vLAN, a switch, and a server.
  • The management apparatus 500 of this embodiment of the present invention determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster, and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This makes it possible to ensure that most or even all users reach the target performance, which improves or guarantees the user experience.
  • The sequence numbers of the above processes do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division into units is merely a division by logical function.
  • In actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • The technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Abstract

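A management method, apparatus, and system for a large-scale cluster, capable of performing performance management and resource scheduling for users according to service levels and improving the user experience. The method includes: determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, where a management object is a resource unit in the large-scale cluster; determining a target performance of the at least one management object; acquiring an actual performance of the at least one management object; and performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance. By determining at least one management object among the management objects corresponding to the first service level of the large-scale cluster, and performing performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object, the method can ensure that most or even all users reach the target performance, which improves the user experience.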

Description

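Management Method, Apparatus, and System for a Large-Scale Cluster
Technical Field
The present invention relates to the field of cloud computing, and more specifically, to a management method, apparatus, and system for a large-scale cluster.
Background
With the further development of computer networks and the demand for the capability to compute massive amounts of data, computer hardware with large-scale computing capability keeps emerging. In addition, the World Wide Web, a global information system, has become very popular. The emergence of these software and hardware technologies and devices has made possible a new computing model called "cloud computing".
In the narrow sense, cloud computing refers to a delivery and usage model for information technology (IT) infrastructure, in which required resources are obtained over a network in an on-demand, easily scalable manner; the network that provides the resources is called the "cloud". To users, the resources in the "cloud" appear infinitely scalable and can be obtained at any time, expanded at any time, used on demand, and paid for by usage.
In the broad sense, cloud computing refers to a delivery and usage model for services, in which required services are obtained over a network in an on-demand, easily scalable manner. Such services may be related to IT, software, or the Internet, or may be other services; the network that provides the services is called the "cloud". The "cloud" consists of virtual computing resources that can maintain and manage themselves, typically large server clusters including computing servers, storage servers, bandwidth resources, and the like. Cloud computing uniformly manages and schedules a large number of network-connected computing resources to form a computing resource pool that provides on-demand services to users.
Because cloud computing features very large scale, virtualization, high reliability, generality, high scalability, and on-demand service, it is attracting increasingly wide attention.
In cloud computing applications, a cloud computing data center integrates computing, storage, and network resources and provides them to users over a network by means of technologies such as virtualization. Applications may take the form of virtual machines (VMs), storage volumes, and the like. By producing applications such as virtual machines and storage volumes on a large scale, virtualization technology forms large-scale clusters. How to perform performance management and guarantee the user experience for large-scale clusters is a problem that increasingly demands attention.
Existing management of large-scale clusters usually takes the server, the resource pool, or even the cluster as the unit. Even performance management that takes the user as the unit targets only the small amount of resources corresponding to a few VIP users. As a result, performance management cannot be guaranteed for the vast majority of users, and the user experience is poor.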
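Summary
Embodiments of the present invention provide a management method, apparatus, and system for a large-scale cluster, which can perform performance management and resource scheduling for users according to service levels and improve the user experience.
According to a first aspect, a management method for a large-scale cluster is provided, including: determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, where the management object is a resource unit in the large-scale cluster; determining a target performance of the at least one management object; acquiring an actual performance of the at least one management object; and performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
With reference to the first aspect, in a first implementation of the first aspect, before the determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, the method further includes: determining the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
With reference to the first aspect and the foregoing implementations, in a second implementation of the first aspect, after the determining a plurality of service levels for the management objects in the large-scale cluster according to the SLA, the method further includes: determining a target performance of the first service level among the plurality of service levels; and the determining a target performance of the at least one management object includes: determining the target performance of the first service level as the target performance of the at least one management object.
With reference to the first aspect and the foregoing implementations, in a third implementation of the first aspect, the determining a target performance of the at least one management object includes at least one of the following: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually setting the target performance of the at least one management object.
With reference to the first aspect and the foregoing implementations, in a fourth implementation of the first aspect, the type of the target performance includes at least one of response latency, read/write operations per second (IOPS), data transmission rate, and CPU usage.
With reference to the first aspect and the foregoing implementations, in a fifth implementation of the first aspect, the acquiring an actual performance of the at least one management object includes: periodically or continuously monitoring the actual performance of the at least one management object.
With reference to the first aspect and the foregoing implementations, in a sixth implementation of the first aspect, the performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance includes: determining whether the acquired actual performance meets the target performance; and, when the actual performance does not meet the target performance, performing the performance management on the management objects corresponding to the first service level and/or the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level meets the target performance.
With reference to the first aspect and the foregoing implementations, in a seventh implementation of the first aspect, the performance management includes at least one of the following: service migration; service restriction; traffic control; resource scheduling; and issuing an alarm.
With reference to the first aspect and the foregoing implementations, in an eighth implementation of the first aspect, when the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels is performed repeatedly, or the step of acquiring an actual performance of the at least one management object is performed repeatedly.
With reference to the first aspect and the foregoing implementations, in a ninth implementation of the first aspect, the determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels includes: determining, among the management objects corresponding to the first service level, at least one management object that meets a predetermined condition, where the predetermined condition includes at least one of establishment time, location information, load condition, and history; or determining at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes at least one of random selection, sequential selection, and time-dynamic selection.
With reference to the first aspect and the foregoing implementations, in a tenth implementation of the first aspect, the management object includes at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth, and a server.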
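According to a second aspect, a management apparatus for a large-scale cluster is provided, including: a determining unit, configured to determine at least one management object among the management objects corresponding to a first service level of a plurality of service levels, where the management object is a resource unit in the large-scale cluster, the determining unit being further configured to determine a target performance of the at least one management object; an acquiring unit, configured to acquire an actual performance of the at least one management object; and a performance management unit, configured to perform performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
With reference to the second aspect, in a first implementation of the second aspect, the determining unit is further configured to: determine the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
With reference to the second aspect and the foregoing implementations, in a second implementation of the second aspect, the determining unit is further configured to: determine a target performance of the first service level among the plurality of service levels; and determine the target performance of the first service level as the target performance of the at least one management object.
With reference to the second aspect and the foregoing implementations, in a third implementation of the second aspect, the determining unit is specifically configured to: determine the target performance corresponding to the at least one management object according to a predetermined performance policy; or accept a manually set target performance for the at least one management object.
With reference to the second aspect and the foregoing implementations, in a fourth implementation of the second aspect, the type of the target performance determined by the determining unit includes at least one of response latency, read/write operations per second (IOPS), data transmission rate, and CPU usage.
With reference to the second aspect and the foregoing implementations, in a fifth implementation of the second aspect, the acquiring unit is specifically configured to: periodically or continuously monitor the actual performance of the at least one management object.
With reference to the second aspect and the foregoing implementations, in a sixth implementation of the second aspect, the performance management unit is specifically configured to: determine, by using the determining unit, whether the acquired actual performance meets the target performance; and, when the actual performance does not meet the target performance, perform the performance management on the management objects corresponding to the first service level and/or the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level meets the target performance.
With reference to the second aspect and the foregoing implementations, in a seventh implementation of the second aspect, the performance management includes at least one of the following: service migration; service restriction; traffic control; resource scheduling; and issuing an alarm.
With reference to the second aspect and the foregoing implementations, in an eighth implementation of the second aspect, when the actual performance meets the target performance, the determining unit repeats the step of determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, or the acquiring unit repeats the step of acquiring an actual performance of the at least one management object.
With reference to the second aspect and the foregoing implementations, in a ninth implementation of the second aspect, the determining unit is specifically configured to: determine, among the management objects corresponding to the first service level, at least one management object that meets a predetermined condition, where the predetermined condition includes at least one of establishment time, location information, load condition, and history; or determine at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes at least one of random selection, sequential selection, and time-dynamic selection.
With reference to the second aspect and the foregoing implementations, in a tenth implementation of the second aspect, the management object includes at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth, and a server.
In the embodiments of the present invention, at least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This makes it possible to ensure that most or even all users reach the target performance, which improves or guarantees the user experience.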
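Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required in the embodiments. Apparently, the drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a system block diagram of a large-scale cluster management system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a management method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a management method according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a management apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a management apparatus according to another embodiment of the present invention.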
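Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
FIG. 1 is a system block diagram of a management system for a large-scale cluster according to an embodiment of the present invention. The management system 100 shown in FIG. 1 includes a management object determining module 101, a target performance determining module 102, an actual performance acquiring module 103, a performance management module 104, and a large-scale cluster 105. The management object determining module 101, the actual performance acquiring module 103, and the performance management module 104 are all connected to the large-scale cluster 105; the management object determining module 101 is connected to the target performance determining module 102; and the target performance determining module 102 and the actual performance acquiring module 103 are both connected to the performance management module 104.
The management object determining module 101 is configured to determine at least one management object among the management objects corresponding to a first service level of a plurality of service levels, where a management object is a resource unit in the large-scale cluster 105. Resource units can be divided into computing resource units, storage resource units, network resource units, physical resource units, and the like. More specifically, the computing resource unit may be a virtual machine (VM) or the like; the storage resource unit may be a storage volume, a logical unit number (LUN), or the like; the network resource unit may be an input/output (I/O) port, network bandwidth, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, or the like; and the physical resource unit may be a server or the like.
The target performance determining module 102 is configured to determine the target performance of the at least one management object. Specifically, it may determine the target performance corresponding to the at least one management object according to a predetermined performance policy; or the target performance of the at least one management object may be set manually; or the target performance of the first service level to which the at least one management object corresponds may be determined as the target performance of the at least one management object.
The actual performance acquiring module 103 is configured to acquire the actual performance of the at least one management object. Specifically, it may periodically or continuously monitor and collect statistics on the actual performance of the at least one management object.
The performance management module 104 is configured to perform performance management on the management objects corresponding to the first service level according to the target performance determined by the target performance determining module 102 and the actual performance acquired by the actual performance acquiring module 103.
Specifically, when the actual performance does not meet the target performance, performance management is performed on the management objects corresponding to the first service level and/or the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level meets the target performance. Methods of performance management include, but are not limited to, the following: service migration; service restriction; traffic control; resource scheduling; and issuing an alarm.
When the actual performance meets the target performance, the target performance determining module 102 may re-determine at least one management object, or the actual performance acquiring module 103 may continue to monitor the actual performance of the previously determined at least one management object.
The management system 100 for a large-scale cluster in this embodiment of the present invention determines at least one management object among the management objects corresponding to the first service level, and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This makes it possible to ensure that most or even all users reach the target performance, which improves or guarantees the user experience.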
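FIG. 2 is a flowchart of a management method according to an embodiment of the present invention.
201. Determine at least one management object among the management objects corresponding to a first service level of a plurality of service levels, where a management object is a resource unit in the large-scale cluster.
202. Determine the target performance of the at least one management object.
203. Obtain the actual performance of the at least one management object.
204. Perform performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
In this embodiment, at least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This helps ensure that the performance of the vast majority of users, or even all users, reaches the target performance, thereby improving user experience.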
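Steps 201 to 204 can be sketched as a single pass in Python. This is only an illustration under stated assumptions: the `ManagedObject` record, the use of CPU occupancy as the only metric, the "take the first few members" sampling rule, and the `act` callback are all hypothetical and are not specified by the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ManagedObject:
    """Hypothetical record for one resource unit (VM, volume, port, ...)."""
    name: str
    service_level: str
    cpu_usage: float  # measured CPU occupancy in percent (assumed metric)


def manage_service_level(
    objects: List[ManagedObject],
    level: str,
    sample_size: int,
    target_cpu: float,
    act: Callable[[List[ManagedObject]], None],
) -> bool:
    """Run steps 201-204 once; return True if the sample met the target."""
    members = [o for o in objects if o.service_level == level]
    sample = members[:sample_size]              # 201: pick a small sample
    if not sample:
        raise ValueError(f"no management objects in service level {level!r}")
    # 202: the target performance is passed in as target_cpu.
    actual = max(o.cpu_usage for o in sample)   # 203: observe only the sample
    if actual > target_cpu:                     # 204: act on the whole level
        act(members)
        return False
    return True
```

The point of the sketch is that only the sample is measured, while the management action receives every member of the service level.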
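It should be understood that the resource units of a large-scale cluster may be classified into computing resource units, storage resource units, network resource units, physical resource units, and so on, and are used to provide users with computing, storage, transmission, and other services. More specifically, a computing resource unit may be a virtual machine (VM) or the like; a storage resource unit may be a storage volume, a logical unit number (LUN), or the like; a network resource unit may be an input/output (I/O) port, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, network bandwidth, or the like; and a physical resource unit may be a server or the like.
Optionally, in an embodiment, before the at least one management object is determined among the management objects corresponding to the first service level of the plurality of service levels, the method further includes: determining the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (Service Level Agreement, SLA).
First, as a preliminary process, the users or management objects in the large-scale cluster may be divided into service levels before management objects are selected. Specifically, the division may be based on an SLA, or network maintenance personnel may perform it according to certain attributes, such as the location information, service type, or service goal of the management objects. When the objects being divided are users, this is equivalent to dividing the at least one resource unit that serves each user, that is, the management objects.
In addition, the division may be a pure classification, or the target performance of one or more service levels may be determined at the time of division. Here, target performance can be understood as the quality of service (Quality of Service, QoS) to be achieved.
Optionally, in an embodiment, after the plurality of service levels is determined for the management objects in the large-scale cluster according to the SLA, the method further includes: determining the target performance of the first service level among the plurality of service levels. Determining the target performance of the at least one management object then includes: determining the target performance of the first service level as the target performance of the at least one management object. In combination with the foregoing embodiment, if the target performance of a service level was already determined when the service levels were divided, that target performance may be used as the target performance of the at least one management object selected as a sample from that service level.
Optionally, in an embodiment, determining the target performance of the at least one management object includes at least one of the following: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually setting the target performance of the at least one management object.
Besides using the target performance of the service level as the target performance of the management object, the target performance may also be determined directly for the selected at least one management object. Specifically, it may be determined according to a predetermined performance policy: a performance policy file may be preset in the system, and certain attributes of the management object, combined with the policy file, determine the target performance that guarantees the performance of the management object. For example, the policy file may contain a mapping from information such as the service type and geographic location of a management object to its target performance. Alternatively, network maintenance personnel may manually set the target performance of the management object through a management interface.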
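The three sources of target performance described above (manual setting, policy file, service-level default) could be combined as in the following sketch. The `POLICY` table contents, the metric names, and in particular the precedence order (manual, then policy, then service-level default) are assumptions made for illustration; the embodiment lists the options without ranking them.

```python
from typing import Dict, Optional, Tuple

# Hypothetical policy table: (service type, location) -> target performance.
POLICY: Dict[Tuple[str, str], Dict[str, float]] = {
    ("database", "dc-east"): {"latency_ms": 3.0},
    ("web", "dc-east"): {"latency_ms": 10.0, "cpu_percent": 90.0},
}


def target_performance(
    service_type: str,
    location: str,
    manual: Optional[Dict[str, float]] = None,
    level_target: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Resolve a target: manual setting, then policy table, then level default.

    The precedence order is an assumption of this sketch.
    """
    if manual is not None:
        return manual
    policy = POLICY.get((service_type, location))
    if policy is not None:
        return policy
    if level_target is not None:
        return level_target
    raise LookupError("no target performance configured")
```

A target is represented as a dictionary so that it can hold a single parameter or a combination of parameters.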
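Optionally, in an embodiment, the types of target performance may include but are not limited to at least one of response latency, input/output operations per second (IOPS), data transmission rate, and CPU occupancy. The target performance may be a single parameter or a combination of several parameters, which is not limited in the present invention.
Optionally, in an embodiment, obtaining the actual performance of the at least one management object includes: periodically or continuously monitoring the actual performance of the at least one management object. It should be understood that the type of the actual performance may be the same as or different from the type of the target performance.
Optionally, in an embodiment, performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance includes: determining whether the obtained actual performance meets the target performance; and when the actual performance does not meet the target performance, performing performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level, so that the actual performance of the first service level meets the target performance.
Optionally, the performance management may include but is not limited to at least one of the following: service migration; service restriction; traffic control; resource scheduling; raising an alarm.
In other words, if the monitored actual performance does not meet expectations (the target performance), operations such as service migration, service restriction, traffic control, or resource scheduling may be applied to the first service level currently being monitored, or to other service levels, so that the first service level can meet the target performance. For example, if the monitored CPU occupancy of the at least one management object selected in the first service level is above 90% (the target performance being a CPU occupancy of at most 90%), service migration may be performed on the management objects of the first service level to bring the CPU occupancy down to 90% or below. It should be understood that other control methods may also be used to reach the target performance, for example allocating more resources to the management objects of the first service level, which is not limited in the present invention.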
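The comparison between actual and target performance can be sketched as follows. The metric names and the set `HIGHER_IS_BETTER` are assumptions of this sketch: the embodiment only gives the example of a CPU occupancy bound of 90%, and does not define how each metric is compared.

```python
from typing import Dict, List

# Assumed convention: for these metrics a larger value is better; for every
# other metric (latency, CPU occupancy) a smaller value is better.
HIGHER_IS_BETTER = {"iops", "throughput_mb_per_s"}


def violations(target: Dict[str, float], actual: Dict[str, float]) -> List[str]:
    """Return the names of the target metrics that the actual values miss."""
    missed = []
    for metric, bound in target.items():
        if metric not in actual:
            continue  # metric not measured; this sketch simply skips it
        value = actual[metric]
        ok = value >= bound if metric in HIGHER_IS_BETTER else value <= bound
        if not ok:
            missed.append(metric)
    return missed
```

With a target of `{"cpu_percent": 90.0}`, a measured value of 93.0 is reported as a violation, while exactly 90.0 is not, matching the "at most 90%" example in the text.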
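In addition, the first service level may be brought to its target performance by controlling or scheduling other service levels. For example, when the actual I/O latency of the first service level does not meet the target performance, the service traffic of lower-priority service levels may be reduced so that the first service level meets the target performance. The first service level and other service levels may also be controlled or scheduled at the same time to achieve this. Alternatively, an alarm may be raised without performing control or scheduling for the time being, pending further instructions from staff or other network management devices. Without loss of generality, performance management may also be performed on the first service level so that other service levels reach their expected performance.
Optionally, when the actual performance does not meet the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels may be repeated, or the step of obtaining the actual performance of the at least one management object may be repeated. That is, sampling may be performed again for a new check, or monitoring may simply continue. By setting a threshold on the number of repetitions, the sampling and monitoring of the performance management system can be made more accurate and closer to the real situation. For example, it may be preset that the performance management described above is carried out only if the actual performance observed in two repeated samplings fails to meet the target performance both times.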
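The repetition threshold can be sketched as a small counter. Treating the threshold as a number of consecutive failed checks is an interpretation made for this sketch; the text only says that two repeated samplings both fail.

```python
from typing import Iterable


def should_manage(samples_ok: Iterable[bool], threshold: int = 2) -> bool:
    """Return True once `threshold` consecutive checks miss the target.

    Each element of `samples_ok` is the result of one sampling round:
    True if the actual performance met the target, False otherwise.
    """
    misses = 0
    for ok in samples_ok:
        misses = 0 if ok else misses + 1
        if misses >= threshold:
            return True
    return False
```

A single failed check followed by a successful one resets the counter, so a transient spike does not trigger migration or scheduling.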
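Optionally, in an embodiment, when the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels is repeated, or the step of obtaining the actual performance of the at least one management object is repeated. When the performance is satisfactory and no control or scheduling is needed, resampling may be performed, that is, at least one management object is selected anew in the first service level. Alternatively, monitoring may continue on the previously sampled at least one management object, so that performance management can be performed when its performance no longer meets the target performance.
Optionally, in an embodiment, determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels includes: determining, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, where the predetermined condition includes at least one of creation time, location information, load status, and history records; or determining at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes at least one of random selection, sequential selection, and time-dynamic selection.
Optionally, in an embodiment, the management object includes at least one of a virtual machine (VM), a storage volume, an input/output (I/O) port, network bandwidth, and a server.
In this embodiment, at least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This helps ensure that the performance of the vast majority of users, or even all users, reaches the target performance, thereby improving or safeguarding user experience.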
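FIG. 3 is a flowchart of a management method according to an embodiment of the present invention.
301. Divide service levels
First, as an optional step, the users or management objects in the large-scale cluster may be divided into service levels before management objects are selected. Specifically, the division may be based on an SLA, or network maintenance personnel may perform it according to certain attributes, such as the location information, service type, or service goal of the management objects. When the objects being divided are users, this is equivalent to dividing the at least one resource unit that serves each user, that is, the management objects.
In addition, the division may be a pure classification, or the target performance of one or more service levels may be determined at the time of division. Here, target performance can be understood as the quality of service (Quality of Service, QoS) to be achieved.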
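The statement that dividing users into service levels is equivalent to dividing the resource units that serve them can be illustrated with a short sketch. The two input dictionaries (user to service level, user to resource units) are hypothetical data structures; how an SLA is actually stored is not described in the embodiment.

```python
from collections import defaultdict
from typing import Dict, List


def levels_from_users(
    user_level: Dict[str, str],
    user_objects: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Map each service level to the resource units serving its users."""
    levels: Dict[str, List[str]] = defaultdict(list)
    for user, level in user_level.items():
        # A user's level is inherited by every resource unit serving that user.
        levels[level].extend(user_objects.get(user, []))
    return dict(levels)
```

For instance, if two users at the same (hypothetical) "gold" level are served by `vm1`, `vol1`, and `vm3`, those three units form the management objects of that level.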
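302. Select management objects
A small number of management objects in the large-scale cluster are selected as samples, and at least one management object must be selected from each service level. A management object is a resource unit in the large-scale cluster that provides services to users. Specifically, the resource units of the large-scale cluster may be classified into computing resource units, storage resource units, network resource units, physical resource units, and so on, and are used to provide users with computing, storage, transmission, and other services. More specifically, a computing resource unit may be a virtual machine (VM) or the like; a storage resource unit may be a storage volume, a logical unit number (LUN), or the like; a network resource unit may be an input/output (I/O) port, network bandwidth, or the like; and a physical resource unit may be a server or the like.
For the first service level, at least one management object that satisfies a predetermined condition may be determined among the management objects corresponding to the first service level, where the predetermined condition includes at least one of creation time, location information, load status, and history records. For example, the predetermined condition may be that the load has reached 90% of the maximum load, or that N or more faults appear in the history records. It should be understood that the selected management objects may be of the same kind or of different kinds; for example, they may all be VMs, all be storage volumes, or include both VMs and storage volumes, as long as they satisfy the predetermined condition. The predetermined condition may also take a combined form, for example VMs whose load has reached 90% of the maximum load together with servers that have N or more faults in their history records, which is not limited in the present invention.
In addition, at least one management object may be determined among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes but is not limited to random selection, sequential selection, time-dynamic selection, and intelligent selection. As an example, if the predetermined algorithm is random selection, a certain number of management objects are randomly selected in the first service level, and this number may likewise be specified in advance by the algorithm. As another example, with time-dynamic selection, management objects may be selected in different time periods or selected dynamically as time passes, which keeps the sample fresh.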
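The selection options of step 302 can be sketched as small helper functions. The function names are hypothetical, and the interpretation of time-dynamic selection as a window that rotates once per time period is an assumption of this sketch; the embodiment does not define the algorithm in detail.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_by_condition(objects: Sequence[T], condition: Callable[[T], bool]) -> List[T]:
    """Keep the objects that satisfy a predetermined condition."""
    return [o for o in objects if condition(o)]


def select_random(objects: Sequence[T], count: int,
                  rng: Optional[random.Random] = None) -> List[T]:
    """Pick up to `count` distinct objects at random."""
    rng = rng or random.Random()
    return rng.sample(list(objects), min(count, len(objects)))


def select_sequential(objects: Sequence[T], count: int, start: int = 0) -> List[T]:
    """Take up to `count` objects in order from `start`, wrapping around."""
    if not objects:
        return []
    n = len(objects)
    return [objects[(start + i) % n] for i in range(min(count, n))]


def select_time_dynamic(objects: Sequence[T], count: int, period_index: int) -> List[T]:
    """Shift the sequential window every time period so the sample rotates."""
    return select_sequential(objects, count, start=period_index * count)
```

A load-based condition such as "load at least 90% of the maximum" would be passed to `select_by_condition` as a predicate, for example `lambda load: load >= 90`.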
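Without loss of generality, the management objects to be sampled may also be specified directly. For example, network maintenance personnel may select one or more management objects for a service level in a network topology interface as the samples for performance management.
It should be understood that step 301 is optional. When step 301 is performed, the first service level in step 302 is one of the service levels divided in step 301; here, the "first" service level merely denotes some service level and may be any one of them. When step 301 is not performed, service levels may still exist in the large-scale cluster; they may have been determined previously or agreed upon when the user subscribed to the network, which is not limited here. A service level can be understood as a group of management objects determined according to the same or similar performance requirements, performance indicators, service types, and the like.
303. Determine target performance
After the at least one management object serving as the performance management sample is determined, the target performance of the management object may be determined. Specifically, the target performance corresponding to the at least one management object may be determined according to a predetermined performance policy, or the target performance of the at least one management object may be set manually. That is, a performance policy file may be preset in the system, and certain attributes of the management object, combined with the policy file, determine the target performance that guarantees the performance of the management object. For example, the policy file may contain a mapping from information such as the service type and geographic location of a management object to its target performance. Alternatively, network maintenance personnel may manually set the target performance of the management object through a management interface. For example, where the management objects are storage volumes belonging to several service levels, the target performance of a storage volume selected as a sample in one of those service levels may be set to a latency of less than 3 ms; this setting may be made manually or determined from the policy file.
In addition, a service level may already have a corresponding target performance (quality of service, QoS). For example, if the target performance of a service level was already determined when the service levels were divided in step 301, that target performance may be used as the target performance of the at least one management object selected as a sample from that service level.
There are many types of target performance, including but not limited to response latency, input/output operations per second (IOPS), data transmission rate, and CPU occupancy. The target performance may be a single parameter or a combination of several parameters, which is not limited in the present invention.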
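304. Monitor actual performance
Periodically or continuously monitor the actual performance of the at least one management object determined in step 302. The type of the monitored actual performance may be the same as or different from the type of the target performance. Specifically, when the target performance determined in step 303 is a latency of less than 3 ms, the monitored actual performance may also be latency; for example, the actual latency of the management object is measured as 4 ms. The monitored actual performance may also be of a different type from the target performance. For example, the target performance may require that a VM be created within 2 min, while the monitored indicator is bandwidth in MB/s; if the measured bandwidth is below 50 MB/s, the system concludes that the target of creating the VM within 2 min cannot be achieved and therefore triggers performance policy scheduling.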
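The example of a monitored indicator that differs in type from the target can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that VM creation time is dominated by copying an image of known size; under that assumption a 6000 MB image and a 2 min (120 s) deadline give the 50 MB/s figure from the text. The image size and the function names are hypothetical and do not come from the embodiment.

```python
def required_bandwidth_mb_per_s(image_size_mb: float, deadline_s: float) -> float:
    """Minimum sustained bandwidth needed to copy the image before the deadline."""
    if deadline_s <= 0:
        raise ValueError("deadline must be positive")
    return image_size_mb / deadline_s


def creation_target_reachable(measured_mb_per_s: float,
                              image_size_mb: float,
                              deadline_s: float) -> bool:
    """Judge the creation-time target from the monitored bandwidth alone."""
    return measured_mb_per_s >= required_bandwidth_mb_per_s(image_size_mb, deadline_s)
```

Under these assumptions, a measured bandwidth of 40 MB/s would indicate that the 2 min creation target cannot be met, which is the situation in which the text triggers performance policy scheduling.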
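305. Judge
After receiving the monitored actual performance, the system may analyze the monitored data against the target performance, that is, judge whether the actual performance reaches the target performance. In other words, the performance of the sampled management objects determined in step 302 can be used to estimate and make decisions about all management objects or cluster resources of the same service level, so that the service level can be evaluated and managed as a whole.
306. Target performance not met
If it is judged that the actual performance does not meet the target performance, the kind of performance management to carry out must be determined. In general there are several kinds, for example migration, restriction, scheduling, and alarms. For example, suppose the target performance specifies I/O latency, IOPS, and CPU occupancy. If the monitored CPU occupancy exceeds its limit, a migration policy may be specified and service migration performed to reduce the service load on the management objects of this service level, which satisfies the user experience indicators and also helps balance the load of the whole system. If the monitored I/O latency exceeds its limit, resource scheduling may be performed to increase the resource allocation of this service level, such as CPU and cache; the service traffic of lower-priority service levels may also be restricted to satisfy the needs of this service level. Alternatively, an alarm may be raised without performing control or scheduling for the time being, pending further instructions from staff or other network management devices. Performance management may also be performed on the first service level so that the needs of other service levels are satisfied.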
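The mapping from the violated metric to a management action, as described for step 306, can be sketched as a simple dispatch. The metric names and action names are hypothetical labels, and returning an alarm for metrics without a listed action is an assumption of this sketch.

```python
from typing import List


def choose_actions(missed: List[str], alarm_only: bool = False) -> List[str]:
    """Pick candidate management actions for the metrics that missed the target."""
    if not missed:
        return []
    if alarm_only:
        # Raise an alarm and wait for instructions instead of acting.
        return ["raise_alarm"]
    actions: List[str] = []
    if "cpu_percent" in missed:
        actions.append("migrate_services")
    if "io_latency_ms" in missed:
        actions.append("add_resources")                   # e.g. more CPU or cache
        actions.append("throttle_lower_priority_levels")  # alternative option
    if not actions:
        actions.append("raise_alarm")  # no rule for this metric (assumed fallback)
    return actions
```

The returned list holds candidates in the order the text mentions them; which one is actually executed would depend on the deployed policy.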
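In addition, when the actual performance does not meet the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels may be repeated, or the step of obtaining the actual performance of the at least one management object may be repeated. That is, sampling may be performed again for a new check, or monitoring may simply continue. By setting a threshold on the number of repetitions, the sampling and monitoring of the performance management system can be made more accurate and closer to the real situation. For example, it may be preset that the performance management described above is carried out only if the actual performance observed in two repeated samplings fails to meet the target performance both times.
307. Target performance met
When the actual performance meets the target performance, the procedure may return to step 302 or to step 304. That is, when the performance is satisfactory and no control or scheduling is needed, resampling may be performed, meaning that at least one management object is selected anew in the first service level. Alternatively, monitoring may continue on the previously sampled at least one management object, so that performance management can be performed when its performance no longer meets the target performance.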
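The control flow of FIG. 3 from step 302 onward can be sketched as a loop with a switch between resampling (return to step 302) and continued monitoring of the same sample (return to step 304). The callback signatures and the fixed number of rounds are assumptions made so that the sketch terminates and can be tested; a real system would run continuously.

```python
from typing import Callable, List, Sequence


def monitor(
    members: Sequence[str],
    pick_sample: Callable[[Sequence[str], int], List[str]],
    meets_target: Callable[[List[str], int], bool],
    manage: Callable[[Sequence[str]], None],
    rounds: int,
    resample: bool,
) -> int:
    """Run `rounds` monitoring rounds; return how many triggered management."""
    triggered = 0
    sample = pick_sample(members, 0)              # step 302
    for r in range(rounds):
        if meets_target(sample, r):               # steps 304 and 305
            if resample:                          # step 307, back to step 302
                sample = pick_sample(members, r + 1)
            # Otherwise step 307 returns to step 304 with the same sample.
        else:
            manage(members)                       # step 306, whole service level
            triggered += 1
    return triggered
```

With `resample=True` the sample changes after every satisfactory round, which spreads observation across the service level; with `resample=False` the same objects are watched throughout.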
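In this embodiment, at least one management object is determined among the management objects corresponding to the first service level of the large-scale cluster, and performance management is performed on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This helps ensure that the performance of the vast majority of users, or even all users, reaches the target performance, thereby improving user experience.
FIG. 4 is a schematic block diagram of a management apparatus according to an embodiment of the present invention. The management apparatus 400 in FIG. 4 includes a determining unit 401, an obtaining unit 402, and a performance management unit 403.
The determining unit 401 determines at least one management object among the management objects corresponding to a first service level of a plurality of service levels, where a management object is a resource unit in the large-scale cluster; the determining unit 401 also determines the target performance of the at least one management object; the obtaining unit 402 obtains the actual performance of the at least one management object; and the performance management unit 403 performs performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
The management apparatus 400 of this embodiment determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This helps ensure that the performance of the vast majority of users, or even all users, reaches the target performance, thereby improving user experience.
It should be understood that the resource units of a large-scale cluster may be classified into computing resource units, storage resource units, network resource units, physical resource units, and so on, and are used to provide users with computing, storage, transmission, and other services. More specifically, a computing resource unit may be a virtual machine (VM) or the like; a storage resource unit may be a storage volume, a logical unit number (LUN), or the like; a network resource unit may be an input/output (I/O) port, network bandwidth, or the like; and a physical resource unit may be a server or the like.
It should also be understood that the determining unit 401 in this embodiment may correspond to the management object determining module 101 and the target performance determining module 102 in the large-scale cluster management system 100 shown in FIG. 1; the obtaining unit 402 may correspond to the actual performance obtaining module 103 in the system 100 shown in FIG. 1; and the performance management unit 403 may correspond to the performance management module 104 in the system 100 shown in FIG. 1.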
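Optionally, in an embodiment, the determining unit 401 determines the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (Service Level Agreement, SLA).
First, as a preliminary process, the determining unit 401 may divide the users or management objects in the large-scale cluster into service levels before management objects are selected. Specifically, the division may be based on an SLA, or network maintenance personnel may perform it according to certain attributes, such as the location information, service type, or service goal of the management objects. When the objects being divided are users, this is equivalent to dividing the at least one resource unit that serves each user, that is, the management objects.
In addition, the division may be a pure classification, or the target performance of one or more service levels may be determined at the time of division. Here, target performance can be understood as the quality of service (Quality of Service, QoS) to be achieved.
Optionally, in an embodiment, after the plurality of service levels is determined for the management objects in the large-scale cluster according to the SLA, the determining unit 401 may further be configured to determine the target performance of the first service level among the plurality of service levels. Determining the target performance of the at least one management object then includes: determining the target performance of the first service level as the target performance of the at least one management object. In combination with the foregoing embodiment, if the target performance of a service level was already determined when the service levels were divided, that target performance may be used as the target performance of the at least one management object selected as a sample from that service level.
Optionally, in an embodiment, the determining unit 401 may further be configured to determine the target performance corresponding to the at least one management object according to a predetermined performance policy, or the target performance of the at least one management object may be set manually.
Besides using the target performance of the service level as the target performance of the management object, the determining unit 401 may also determine the target performance directly for the selected at least one management object. Specifically, it may be determined according to a predetermined performance policy: a performance policy file may be preset in the system, and certain attributes of the management object, combined with the policy file, determine the target performance that guarantees the performance of the management object. For example, the policy file may contain a mapping from information such as the service type and geographic location of a management object to its target performance. Alternatively, network maintenance personnel may manually set the target performance of the management object through a management interface.
Optionally, in an embodiment, the types of target performance may include but are not limited to at least one of response latency, input/output operations per second (IOPS), data transmission rate, and CPU occupancy. The target performance may be a single parameter or a combination of several parameters, which is not limited in the present invention.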
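Optionally, in an embodiment, the obtaining unit 402 is specifically configured to periodically or continuously monitor the actual performance of the at least one management object. It should be understood that the type of the actual performance may be the same as or different from the type of the target performance.
Optionally, in an embodiment, the performance management unit 403 is specifically configured to determine whether the obtained actual performance meets the target performance, and, when the actual performance does not meet the target performance, to perform performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level, so that the actual performance of the first service level meets the target performance.
Optionally, the performance management may include but is not limited to at least one of the following: service migration; service restriction; traffic control; resource scheduling; raising an alarm.
In other words, if the monitored actual performance does not meet expectations (the target performance), operations such as service migration, service restriction, traffic control, or resource scheduling may be applied to the first service level currently being monitored, or to other service levels, so that the first service level can meet the target performance. For example, if the monitored CPU occupancy of the at least one management object selected in the first service level is above 90% (the target performance being a CPU occupancy of at most 90%), service migration may be performed on the management objects of the first service level to bring the CPU occupancy down to 90% or below. It should be understood that other control methods may also be used to reach the target performance, for example allocating more resources to the management objects of the first service level, which is not limited in the present invention.
In addition, the first service level may be brought to its target performance by controlling or scheduling other service levels. For example, when the actual I/O latency of the first service level does not meet the target performance, the service traffic of lower-priority service levels may be reduced so that the first service level meets the target performance. The first service level and other service levels may also be controlled or scheduled at the same time to achieve this. Alternatively, an alarm may be raised without performing control or scheduling for the time being, pending further instructions from staff or other network management devices.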
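In addition, when the actual performance does not meet the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels may be repeated, or the step of obtaining the actual performance of the at least one management object may be repeated. That is, sampling may be performed again for a new check, or monitoring may simply continue. By setting a threshold on the number of repetitions, the sampling and monitoring of the performance management system can be made more accurate and closer to the real situation. For example, it may be preset that the performance management described above is carried out only if the actual performance observed in two repeated samplings fails to meet the target performance both times.
Optionally, in an embodiment, when the actual performance meets the target performance, the determining unit 401 repeats the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels, or the obtaining unit 402 repeats the step of obtaining the actual performance of the at least one management object. When the performance is satisfactory and no control or scheduling is needed, resampling may be performed, that is, at least one management object is selected anew in the first service level. Alternatively, monitoring may continue on the previously sampled at least one management object, so that performance management can be performed when its performance no longer meets the target performance.
Optionally, in an embodiment, the determining unit 401 is further configured to determine, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, where the predetermined condition includes at least one of creation time, location information, load status, and history records; or to determine at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes at least one of random selection, sequential selection, and time-dynamic selection.
Optionally, in an embodiment, the management object includes at least one of a virtual machine (VM), a storage volume, an input/output (I/O) port, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, network bandwidth, and a server.
The management apparatus 400 of this embodiment determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This helps ensure that the performance of the vast majority of users, or even all users, reaches the target performance, thereby improving or safeguarding user experience.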
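FIG. 5 is a schematic block diagram of a management apparatus according to another embodiment of the present invention. The management apparatus 500 in FIG. 5 includes a processor 51 and a memory 52, which are connected through a bus system 53.
The memory 52 is configured to store instructions that cause the processor 51 to perform the following operations: determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, where a management object is a resource unit in the large-scale cluster; determining the target performance of the at least one management object; obtaining the actual performance of the at least one management object; and performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
The management apparatus 500 of this embodiment determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This helps ensure that the performance of the vast majority of users, or even all users, reaches the target performance, thereby improving user experience.
It should be understood that the resource units of a large-scale cluster may be classified into computing resource units, storage resource units, network resource units, physical resource units, and so on, and are used to provide users with computing, storage, transmission, and other services. More specifically, a computing resource unit may be a virtual machine (VM) or the like; a storage resource unit may be a storage volume, a logical unit number (LUN), or the like; a network resource unit may be an input/output (I/O) port, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, network bandwidth, or the like; and a physical resource unit may be a server or the like.
In addition, the management apparatus 500 may further include a transmitting circuit 54, a receiving circuit 55, and the like. The processor 51 controls the operation of the management apparatus 500 and may also be called a CPU (Central Processing Unit). The memory 52 may include read-only memory and random access memory, and provides instructions and data to the processor 51. A part of the memory 52 may also include non-volatile random access memory (NVRAM). The components of the management apparatus 500 are coupled together through the bus system 53, which, besides a data bus, may also include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all of these buses are labeled as the bus system 53 in the figure.
The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 51. The processor 51 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 51 or by instructions in the form of software. The processor 51 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. A software module may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 52; the processor 51 reads information from the memory 52 and completes the steps of the foregoing methods in combination with its hardware.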
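Optionally, in an embodiment, before the at least one management object is determined among the management objects corresponding to the first service level of the plurality of service levels, the operations further include: determining the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
Optionally, in an embodiment, after the plurality of service levels is determined for the management objects in the large-scale cluster according to the SLA, the operations further include: determining the target performance of the first service level among the plurality of service levels. Determining the target performance of the at least one management object then includes: determining the target performance of the first service level as the target performance of the at least one management object.
Optionally, in an embodiment, determining the target performance of the at least one management object includes at least one of the following: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually setting the target performance of the at least one management object.
Optionally, in an embodiment, the types of target performance include at least one of response latency, input/output operations per second (IOPS), data transmission rate, and CPU occupancy.
Optionally, in an embodiment, obtaining the actual performance of the at least one management object includes: periodically or continuously monitoring the actual performance of the at least one management object.
Optionally, in an embodiment, performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance includes: determining whether the obtained actual performance meets the target performance; and when the actual performance does not meet the target performance, performing performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level, so that the actual performance of the first service level meets the target performance.
Optionally, in an embodiment, the performance management includes at least one of the following: service migration; service restriction; traffic control; resource scheduling; raising an alarm.
Optionally, in an embodiment, when the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels is repeated, or the step of obtaining the actual performance of the at least one management object is repeated.
Optionally, in an embodiment, determining at least one management object among the management objects corresponding to the first service level of the plurality of service levels includes: determining, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, where the predetermined condition includes at least one of creation time, location information, load status, and history records; or determining at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, where the predetermined algorithm includes at least one of random selection, sequential selection, and time-dynamic selection.
Optionally, in an embodiment, the management object includes at least one of a virtual machine (VM), a storage volume, an input/output (I/O) port, network bandwidth, a virtual switch (vSwitch), a virtual local area network (vLAN), a switch, and a server.
The management apparatus 500 of this embodiment determines at least one management object among the management objects corresponding to the first service level of the large-scale cluster and performs performance management on all management objects corresponding to the first service level according to the target performance and actual performance of the at least one management object. This helps ensure that the performance of the vast majority of users, or even all users, reaches the target performance, thereby improving or safeguarding user experience.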
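It should be understood that the term "and/or" in this document merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the numbering should not limit the implementation of the embodiments of the present invention in any way.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The foregoing is merely a description of specific embodiments of the present invention, and the protection scope of the present invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.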

Claims (22)

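  1. A method for managing a large-scale cluster, comprising:
    determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, wherein the management object is a resource unit in the large-scale cluster;
    determining a target performance of the at least one management object;
    obtaining an actual performance of the at least one management object; and
    performing performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
  2. The method according to claim 1, wherein before the determining of at least one management object among the management objects corresponding to a first service level of a plurality of service levels, the method further comprises: determining the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
  3. The method according to claim 2, wherein after the determining of the plurality of service levels for the management objects in the large-scale cluster according to the SLA, the method further comprises: determining a target performance of the first service level among the plurality of service levels;
    and the determining of the target performance of the at least one management object comprises: determining the target performance of the first service level as the target performance of the at least one management object.
  4. The method according to claim 2 or 3, wherein the determining of the target performance of the at least one management object comprises at least one of the following: determining the target performance corresponding to the at least one management object according to a predetermined performance policy; or manually setting the target performance of the at least one management object.
  5. The method according to any one of claims 1 to 4, wherein the type of the target performance comprises at least one of response latency, input/output operations per second (IOPS), data transmission rate, and CPU occupancy.
  6. The method according to claim 5, wherein the obtaining of the actual performance of the at least one management object comprises: periodically or continuously monitoring the actual performance of the at least one management object.
  7. The method according to claim 1, wherein the performing of performance management on the management objects corresponding to the first service level according to the target performance and the actual performance comprises:
    determining whether the obtained actual performance meets the target performance; and
    when the actual performance does not meet the target performance, performing the performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level meets the target performance.
  8. The method according to claim 7, wherein the performance management comprises at least one of the following: service migration; service restriction; traffic control; resource scheduling; raising an alarm.
  9. The method according to claim 7, wherein when the actual performance meets the target performance, the step of determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels is repeated, or the step of obtaining the actual performance of the at least one management object is repeated.
  10. The method according to any one of claims 1 to 9, wherein the determining of at least one management object among the management objects corresponding to a first service level of a plurality of service levels comprises:
    determining, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, wherein the predetermined condition comprises at least one of creation time, location information, load status, and history records; or
    determining at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, wherein the predetermined algorithm comprises at least one of random selection, sequential selection, and time-dynamic selection.
  11. The method according to any one of claims 1 to 10, wherein the management object comprises at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, network bandwidth, a switch, and a server.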
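  12. An apparatus for managing a large-scale cluster, comprising:
    a determining unit, configured to determine at least one management object among the management objects corresponding to a first service level of a plurality of service levels, wherein the management object is a resource unit in the large-scale cluster;
    the determining unit being further configured to determine a target performance of the at least one management object;
    an obtaining unit, configured to obtain an actual performance of the at least one management object; and
    a performance management unit, configured to perform performance management on the management objects corresponding to the first service level according to the target performance and the actual performance.
  13. The apparatus according to claim 12, wherein the determining unit is further configured to: determine the plurality of service levels for the management objects in the large-scale cluster according to a service level agreement (SLA).
  14. The apparatus according to claim 13, wherein the determining unit is further configured to:
    determine a target performance of the first service level among the plurality of service levels; and
    determine the target performance of the first service level as the target performance of the at least one management object.
  15. The apparatus according to claim 13 or 14, wherein the determining unit is specifically configured to: determine the target performance corresponding to the at least one management object according to a predetermined performance policy; or the target performance of the at least one management object is set manually.
  16. The apparatus according to any one of claims 12 to 15, wherein the type of the target performance determined by the determining unit comprises at least one of response latency, input/output operations per second (IOPS), data transmission rate, and CPU occupancy.
  17. The apparatus according to claim 16, wherein the obtaining unit is specifically configured to: periodically or continuously monitor the actual performance of the at least one management object.
  18. The apparatus according to claim 12, wherein the performance management unit is specifically configured to:
    determine, through the determining unit, whether the obtained actual performance meets the target performance; and
    when the actual performance does not meet the target performance, perform the performance management on the management objects corresponding to the first service level and/or on the management objects corresponding to service levels other than the first service level among the plurality of service levels, so that the actual performance of the first service level meets the target performance.
  19. The apparatus according to claim 18, wherein the performance management comprises at least one of the following: service migration; service restriction; traffic control; resource scheduling; raising an alarm.
  20. The apparatus according to claim 18, wherein when the actual performance meets the target performance, the determining unit repeats the step of determining at least one management object among the management objects corresponding to a first service level of a plurality of service levels, or the obtaining unit repeats the step of obtaining the actual performance of the at least one management object.
  21. The apparatus according to any one of claims 12 to 20, wherein the determining unit is specifically configured to:
    determine, among the management objects corresponding to the first service level, at least one management object that satisfies a predetermined condition, wherein the predetermined condition comprises at least one of creation time, location information, load status, and history records; or
    determine at least one management object among the management objects corresponding to the first service level according to a predetermined algorithm, wherein the predetermined algorithm comprises at least one of random selection, sequential selection, and time-dynamic selection.
  22. The apparatus according to any one of claims 12 to 21, wherein the management object comprises at least one of a virtual machine (VM), a storage volume, a virtual switch (vSwitch), a virtual local area network (vLAN), an input/output (I/O) port, a switch, network bandwidth, and a server.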
PCT/CN2014/089538 2013-12-31 2014-10-27 大规模集群的管理方法、装置和系统 WO2015101089A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310752189.5A CN103763130B (zh) 2013-12-31 2013-12-31 大规模集群的管理方法、装置和系统
CN201310752189.5 2013-12-31

Publications (1)

Publication Number Publication Date
WO2015101089A1 true WO2015101089A1 (zh) 2015-07-09

Family

ID=50530293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/089538 WO2015101089A1 (zh) 2013-12-31 2014-10-27 大规模集群的管理方法、装置和系统

Country Status (2)

Country Link
CN (1) CN103763130B (zh)
WO (1) WO2015101089A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103763130B (zh) * 2013-12-31 2018-06-19 华为数字技术(苏州)有限公司 大规模集群的管理方法、装置和系统
CN104199741A (zh) * 2014-08-29 2014-12-10 曙光信息产业(北京)有限公司 一种用于云计算环境的虚拟数据管理方法
CN105515817A (zh) * 2015-01-21 2016-04-20 上海北塔软件股份有限公司 一种将管理对象进行等级化运维的方法及系统
CN107251007B (zh) * 2015-03-25 2021-10-01 英特尔公司 集群计算服务确保装置和方法
CN106878042A (zh) * 2015-12-18 2017-06-20 北京奇虎科技有限公司 基于sla的容器资源调度方法和系统
CN106921512B (zh) * 2015-12-28 2020-08-04 中移(苏州)软件技术有限公司 一种大数据集群租户带宽控制方法及装置
CN105975343B (zh) * 2016-05-10 2019-10-15 广东睿江云计算股份有限公司 一种云主机系统中服务质量的控制方法及装置
CN106020973A (zh) * 2016-05-10 2016-10-12 广东睿江云计算股份有限公司 云主机系统中的cpu调度方法及装置
CN107704213B (zh) * 2017-11-02 2021-08-31 郑州云海信息技术有限公司 一种存储阵列的自动化服务质量管理方法及装置
CN107800574B (zh) * 2017-11-03 2021-05-28 郑州云海信息技术有限公司 存储qos调节方法、系统、设备及计算机可读存储器
CN114697210B (zh) 2017-11-22 2023-11-03 华为技术有限公司 一种网络性能保障方法及装置
CN109992424B (zh) * 2017-12-29 2024-04-02 北京华胜天成科技股份有限公司 本地网络的业务关联关系的确定方法及装置
CN108494588A (zh) * 2018-03-12 2018-09-04 深圳市瑞驰信息技术有限公司 一种集群块设备动态QoS配置的系统及方法
CN108958648A (zh) * 2018-05-08 2018-12-07 广东睿江云计算股份有限公司 一种云磁盘存放优化的方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004671A (zh) * 2010-11-15 2011-04-06 北京航空航天大学 一种云计算环境下数据中心基于统计模型的资源管理方法
US20120331113A1 (en) * 2011-06-27 2012-12-27 Microsoft Corporation Resource management for cloud computing platforms
WO2013003031A2 (en) * 2011-06-27 2013-01-03 Microsoft Corporation Resource management for cloud computing platforms
CN103763130A (zh) * 2013-12-31 2014-04-30 华为数字技术(苏州)有限公司 大规模集群的管理方法、装置和系统

Also Published As

Publication number Publication date
CN103763130A (zh) 2014-04-30
CN103763130B (zh) 2018-06-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14877425

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14877425

Country of ref document: EP

Kind code of ref document: A1