WO2014019119A1 - Resource fault management method, apparatus and system - Google Patents


Info

Publication number
WO2014019119A1
Authority
WO
WIPO (PCT)
Prior art keywords
resource
virtual
physical
correspondence
virtual resource
Application number
PCT/CN2012/079333
Other languages
English (en)
Chinese (zh)
Inventor
郑力
许利霞
张羽
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN201280003070.1A (patent CN103403689B)
Priority to PCT/CN2012/079333 (patent WO2014019119A1)
Publication of WO2014019119A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications

Definitions

  • The present invention relates to the field of server monitoring, and in particular to a resource fault management method, apparatus and system.
Background
  • Virtualization technology introduces a virtual layer between the operating system and the physical resources: it virtualizes physical resources into logical resources, builds multiple virtual resources, also called virtual machines (VMs), on top of the logical resources, and schedules the logical resources so that the physical resources are multiplexed.
  • By virtualizing physical resources in this way, virtual machine farms consisting of multiple virtual machines have become common on servers, increasing server utilization and reducing the cost of purchasing servers.
  • As multi-core servers and large virtual machine clusters have emerged, managing the system as a whole faces more and more challenges, one of which is virtual machine fault warning and management.
  • Virtualization vendors provide overall fault monitoring and management technology for virtual machines, and management software vendors also offer various management solutions for virtual servers.
  • Embodiments of the invention provide a resource fault management method, apparatus and system that enable mutual warning between physical resources and virtual resources, thereby preventing a failure of a physical resource or of a virtual resource from affecting services through power outages, migrations, and the like.
  • An embodiment of the present invention provides a resource fault management method, including: when a monitoring alert is detected on a first physical resource, acquiring, according to the resource correspondence between physical resources and virtual resources, a first virtual resource corresponding to the first physical resource; and
  • issuing a monitoring alert for the first virtual resource.
  • An embodiment of the present invention further provides a resource fault management method, including: when a monitoring alert is detected on a second virtual resource, acquiring, according to the resource correspondence between virtual resources and physical resources, a second physical resource corresponding to the second virtual resource; and
  • issuing a monitoring alert for the second physical resource.
  • An embodiment of the present invention further provides a resource fault management apparatus, including: a resource correspondence search module, configured to acquire, when a monitoring alert occurs on a first physical resource, a first virtual resource corresponding to the first physical resource according to the resource correspondence between physical resources and virtual resources; and
  • a corresponding virtual resource monitoring and warning module, configured to issue a monitoring alert for the first virtual resource.
  • An embodiment of the present invention further provides a resource fault management apparatus, including: a resource correspondence search module, configured to acquire, when a monitoring alert occurs on a second virtual resource, a second physical resource corresponding to the second virtual resource according to the resource correspondence between virtual resources and physical resources; and
  • a corresponding physical resource monitoring and warning module, configured to issue a monitoring alert for the second physical resource.
  • An embodiment of the present invention further provides a resource fault management system, including a server system and the resource fault management apparatus according to the third aspect or the fourth aspect, where the resource fault management system is used to monitor the physical resources and virtual resources in the server system and to issue early warnings for them.
  • FIG. 1 is a schematic structural diagram of a resource fault management system according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a first embodiment of a resource fault management apparatus according to the present invention.
  • FIG. 3 is a schematic structural diagram of a second embodiment of a resource fault management apparatus according to the present invention.
  • FIG. 4 is a schematic structural diagram of a third embodiment of a resource fault management apparatus according to the present invention.
  • FIG. 5 is a flowchart of a fourth embodiment of a resource fault management method according to the present invention.
  • FIG. 6 is a schematic diagram of the resource correspondence before and after fault processing in the fourth embodiment of the present invention.
  • FIG. 7 is a flowchart of a fifth embodiment of a resource fault management method according to the present invention.
  • FIG. 8 is a schematic diagram of the resource correspondence before and after fault processing in the fifth embodiment of the present invention.
  • As shown in FIG. 1, the resource fault management system provided by the present invention includes a server system 10 and a resource fault management device 20, where:
  • The server system 10 may include a server cluster composed of one or more servers. The physical resources on all servers may be virtualized into logical resources, virtual resources (including virtual clusters) are constructed on the logical resources, and, based on these construction results, a resource correspondence between the physical resources and the virtual resources and/or a resource correspondence between the virtual resources and the physical resources is established.
  • In the embodiments of the present invention, a physical resource refers to an actual physical resource in the physical resource pool that constitutes the server system 10.
  • The server system 10 can apply pooled management to the physical resources of its computing nodes, storage nodes, and IO (input/output) nodes, forming a physical resource pool.
  • The physical resource pool can be subdivided into, for example, a CPU (central processing unit) pool, a memory pool, an IO resource pool, and an HBA (host bus adapter) pool.
  • A physical resource may be, for example, a CPU, a memory card, or a hard disk. A logical resource refers to a resource used to form a logical partition. For example, if a logical partition consists of 4 CPUs (32 cores), 64 GB of memory, 1 TB of storage, a vNIC (virtual network interface card), and a vHBA (virtual host bus adapter), then any one of those cores, or the vNIC, is a logical resource.
  • Logical resource partitioning software can be used to build a logical resource pool from the above physical devices, so as to manage and use the logical resources in it.
  • The virtual resources in the present invention are the hardware resources obtained by converting logical resources through virtualization software, that is, each item of virtual hardware constituting a virtual machine (such as a vCPU, i.e., a virtual central processing unit or virtual processor), as well as the virtual machine itself.
  • The resource correspondence between physical resources and virtual resources may be established in a variety of ways. Taking the correspondence between physical CPU resources and virtual CPU resources as an example: the BMC (baseboard management controller) in the server obtains the slot information of the physical CPU resources, the BIOS (basic input/output system) acquires attribute information such as the cores of the CPU resources, and a link is established between the slot information and the attribute information.
  • The server management system receives the information reported by the BMC and the BIOS and forms a logical resource pool. After the virtualization OS (operating system) starts, logical resources are selected from the logical resource pool in the server management system to form a virtual resource pool.
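The linking of BMC slot information with BIOS attribute information, and the later selection of logical resources into a virtual resource pool, can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the record shapes `BmcSlotInfo` and `BiosCpuInfo`, the `socket_id` join key, and the first-free allocation order are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BmcSlotInfo:
    """Hypothetical shape of the slot information reported by the BMC."""
    host: str
    slot: str        # physical slot name, e.g. "CPU1"
    socket_id: int   # assumed key shared with the BIOS report


@dataclass(frozen=True)
class BiosCpuInfo:
    """Hypothetical shape of the CPU attribute information reported by the BIOS."""
    host: str
    socket_id: int
    cores: tuple     # core names, e.g. ("Core1", ..., "Core8")


def build_logical_pool(slots, bios_infos):
    """Link slot info with attribute info: logical core -> (host, slot)."""
    attrs = {(b.host, b.socket_id): b for b in bios_infos}
    pool = {}
    for s in slots:
        info = attrs.get((s.host, s.socket_id))
        if info is None:
            continue  # no attribute report for this slot, so nothing to pool
        for core in info.cores:
            pool[core] = (s.host, s.slot)
    return pool


def form_virtual_pool(pool, requests):
    """Select free logical cores for each vCPU (first-free order is an assumption).

    requests: list of (vm, vcpu, number_of_cores).
    Returns logical core -> (vm, vcpu).
    """
    free = list(pool)  # dict order = order in which cores were pooled
    allocation = {}
    for vm, vcpu, n in requests:
        if n > len(free):
            raise RuntimeError(f"not enough free logical cores for {vm}/{vcpu}")
        for core in free[:n]:
            allocation[core] = (vm, vcpu)
        free = free[n:]
    return allocation


slots = [BmcSlotInfo("host1", "CPU1", 0), BmcSlotInfo("host1", "CPU2", 1)]
bios = [
    BiosCpuInfo("host1", 0, tuple(f"Core{i}" for i in range(1, 9))),
    BiosCpuInfo("host1", 1, tuple(f"Core{i}" for i in range(9, 17))),
]
pool = build_logical_pool(slots, bios)
allocation = form_virtual_pool(pool, [("VM0", "vCPU1", 8)])
```

With these hypothetical inputs, Core1 to Core8 are pooled from CPU1 in host 1 and all eight are handed to vCPU1 of VM0.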
  • On this basis, the resource correspondence between physical resources and virtual resources and/or the resource correspondence between virtual resources and physical resources may be established.
  • The resource correspondence may be recorded in multiple ways, for example as a list of physical-resource/virtual-resource relationships in data mode, or as associated physical resource objects and virtual resource objects in object mode.
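The two recording modes can be illustrated with a short sketch: a flat relationship list (data mode) and resource objects that reference each other (object mode). The identifiers such as "host1/CPU1" and the helper names `virtual_for` and `link` are hypothetical.

```python
from dataclasses import dataclass, field

# Data mode: a flat list of (physical resource, virtual resource) pairs.
relation_list = [
    ("host1/CPU1", "VM0/vCPU1"),
    ("host1/CPU2", "VM0/vCPU2"),
]


def virtual_for(physical, relations):
    """Data-mode lookup: scan the list for pairs whose physical side matches."""
    return [v for p, v in relations if p == physical]


# Object mode: each resource object holds references to its counterparts.
# repr=False on the reference lists avoids printing the mutual references.
@dataclass(eq=False)
class PhysicalResource:
    name: str
    virtual: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class VirtualResource:
    name: str
    physical: list = field(default_factory=list, repr=False)


def link(p, v):
    """Object-mode recording: store the correspondence on both objects."""
    p.virtual.append(v)
    v.physical.append(p)


cpu1 = PhysicalResource("host1/CPU1")
vcpu1 = VirtualResource("VM0/vCPU1")
link(cpu1, vcpu1)
```

The data mode is simple to persist and scan, while the object mode turns a lookup in either direction into a reference hop.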
  • The following table uses CPU resources as an example.
[Table: CPU resource correspondence between physical CPUs, logical resource pool cores, and VM vCPUs]
  • The resource correspondence from virtual resources to physical resources and the resource correspondence from physical resources to virtual resources may be exact reverses of each other (symmetric).
  • For example, CPU1 in host 1 among the physical resources corresponds to logical resources Core1 to Core8 in the logical resource pool, which are in turn associated with vCPU1 in VM0.
  • If CPU1 in host 1 raises a failure warning, vCPU1 in VM0 is directly affected; conversely, when vCPU1 in VM0 raises a warning, CPU1 in host 1 may also be affected, for example through insufficient resources or excessive temperature.
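The chain in this example (CPU1 in host 1, then Core1 to Core8, then vCPU1 in VM0) suggests how a physical-to-virtual correspondence can be derived by composing two mappings through the logical layer. A minimal sketch, assuming tuple identifiers and dictionary-based mappings chosen purely for illustration:

```python
# Hypothetical data following the example: CPU1 in host 1 backs logical cores
# Core1..Core8, which are allocated to vCPU1 of VM0.
physical_to_logical = {("host1", "CPU1"): [f"Core{i}" for i in range(1, 9)]}
logical_to_virtual = {f"Core{i}": ("VM0", "vCPU1") for i in range(1, 9)}


def compose(p2l, l2v):
    """Derive physical -> virtual by following each physical resource
    through its logical cores; cores not yet allocated are skipped."""
    p2v = {}
    for phys, cores in p2l.items():
        targets = {l2v[c] for c in cores if c in l2v}
        if targets:
            p2v[phys] = targets
    return p2v


p2v = compose(physical_to_logical, logical_to_virtual)
```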
  • In the correspondence from physical resources to virtual resources, CPU1 in host 1 corresponds to vCPU1 in VM0; at the same time, in the correspondence from virtual resources to physical resources, vCPU1 in VM0 corresponds to CPU1 in host 1. In some cases, however, the correspondence may be asymmetric. For example, when vCPU1 in virtual machine VM0 is faulty, CPU1 in host 1 may be affected, and because the physical resources in host 1 are interrelated, CPU2, CPU3, and so on may also be affected indirectly.
  • In that case, the physical resources corresponding to vCPU1 in VM0 in the correspondence from virtual resources to physical resources may include CPU1, CPU2, and CPU3 of host 1, or even all the physical resources in host 1, so the virtual-to-physical correspondence and the physical-to-virtual correspondence are asymmetric.
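Whether the two directions are symmetric can be checked mechanically: the virtual-to-physical map is symmetric exactly when it equals the inverse of the physical-to-virtual map. A minimal sketch with hypothetical resource names, reproducing the vCPU1/CPU1 to CPU3 case above:

```python
def invert(mapping):
    """Invert {a: {b, ...}} into {b: {a, ...}}."""
    inverse = {}
    for a, targets in mapping.items():
        for b in targets:
            inverse.setdefault(b, set()).add(a)
    return inverse


def is_symmetric(p2v, v2p):
    """True when the virtual->physical map is exactly the inverse of p2v."""
    return invert(p2v) == v2p


p2v = {"host1/CPU1": {"VM0/vCPU1"}}
v2p_symmetric = {"VM0/vCPU1": {"host1/CPU1"}}
# A fault on vCPU1 may also reach CPU2 and CPU3 through shared host resources.
v2p_asymmetric = {"VM0/vCPU1": {"host1/CPU1", "host1/CPU2", "host1/CPU3"}}
```

Because the two directions can legitimately differ, a sketch like this keeps them as two separate maps rather than deriving one from the other.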
  • The resource fault management device 20 can monitor the physical resources and the virtual resources of the server system 10 and issue early warnings for them. When a monitoring alert occurs on a physical resource, the device 20 can look up, according to the resource correspondence between physical resources and virtual resources, the virtual resource corresponding to the alerting physical resource, and issue a monitoring alert for that virtual resource.
  • When a monitoring alert occurs on a virtual resource, the device 20 can look up, according to the resource correspondence between virtual resources and physical resources, the physical resource corresponding to the alerting virtual resource, and issue a monitoring alert for that physical resource.
  • The device 20 can further perform fault processing on the alerting virtual resource and/or physical resource based on the resource correspondence between physical resources and virtual resources, and trigger a real-time update of the resource correspondence of the server system according to the result of the fault processing.
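The two-way alert propagation performed by device 20 can be sketched as follows. This is a minimal illustration, assuming dictionaries of sets for the two correspondences and a hypothetical `Alert` record; the class and method names are not from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    resource: str  # resource identifier, e.g. "host1/CPU1" (hypothetical naming)
    reason: str


def _propagate(alert, correspondence, origin):
    """Issue one alert for every resource mapped from alert.resource."""
    return [
        Alert(target, f"{origin} {alert.resource} alerted: {alert.reason}")
        for target in sorted(correspondence.get(alert.resource, ()))
    ]


class ResourceFaultManager:
    """Alert propagation in both directions, as described for device 20."""

    def __init__(self, p2v, v2p):
        self.p2v = p2v  # physical -> set of virtual resources
        self.v2p = v2p  # virtual -> set of physical resources (need not be the inverse)

    def on_physical_alert(self, alert):
        return _propagate(alert, self.p2v, "physical resource")

    def on_virtual_alert(self, alert):
        return _propagate(alert, self.v2p, "virtual resource")


manager = ResourceFaultManager(
    p2v={"host1/CPU1": {"VM0/vCPU1"}},
    v2p={"VM0/vCPU1": {"host1/CPU1", "host1/CPU2"}},
)
```

Keeping `p2v` and `v2p` as independent inputs lets the sketch represent the asymmetric case, where a virtual alert fans out to more physical resources than a physical alert reaches.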
  • FIG. 2 is a schematic structural diagram of a first embodiment of a resource fault management apparatus according to the present invention. As shown in FIG. 2, the resource fault management apparatus in this embodiment may include:
  • The resource correspondence search module 220 is configured to acquire, when a monitoring alert occurs on the first physical resource, the first virtual resource corresponding to the first physical resource according to the resource correspondence between physical resources and virtual resources.
  • The resource correspondence between physical resources and virtual resources may be generated by the server system after it creates the virtual machine group, saved in the server system, and updated in real time; in that case, the resource correspondence search module 220 obtains it from the server system when needed.
  • Alternatively, after the server system creates the virtual machine group, the resource fault management device may itself generate the resource correspondence between physical resources and virtual resources from the virtual machine group created by the server system, track changes between the physical resources and the virtual resources in the server system in real time, and update the correspondence accordingly; the resource correspondence search module 220 then acquires the locally stored correspondence when needed.
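The two ways module 220 can obtain the correspondence (ask the server system, or keep a locally tracked copy) can be sketched behind one small interface. This is a minimal sketch: `CorrespondenceSource`, `ServerSystemSource`, `LocalTracker`, and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever query interface the server system actually exposes.

```python
from typing import Protocol


class CorrespondenceSource(Protocol):
    """Anything that can return the current physical -> virtual correspondence."""

    def physical_to_virtual(self) -> dict: ...


class ServerSystemSource:
    """Option 1: the server system keeps the correspondence up to date.

    `fetch` is a hypothetical callable, not a real API.
    """

    def __init__(self, fetch):
        self._fetch = fetch

    def physical_to_virtual(self) -> dict:
        return self._fetch()


class LocalTracker:
    """Option 2: the fault management device keeps its own copy and
    applies change events observed in the server system."""

    def __init__(self, initial: dict):
        self._p2v = {p: set(vs) for p, vs in initial.items()}

    def apply_change(self, physical: str, virtual: str, added: bool) -> None:
        targets = self._p2v.setdefault(physical, set())
        if added:
            targets.add(virtual)
        else:
            targets.discard(virtual)

    def physical_to_virtual(self) -> dict:
        # Return a copy so callers cannot mutate the tracked state.
        return {p: set(vs) for p, vs in self._p2v.items() if vs}


def find_virtual(source: CorrespondenceSource, physical: str) -> set:
    """What search module 220 does once it has a source: a plain lookup."""
    return source.physical_to_virtual().get(physical, set())
```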
  • The resource correspondence search module 220 looks up the entry for the first physical resource in the resource correspondence between physical resources and virtual resources and thereby acquires the first virtual resource corresponding to the first physical resource.
  • the corresponding virtual resource monitoring and warning module 230 is configured to issue a monitoring alert for the first virtual resource.
  • the corresponding virtual resource monitoring and warning module 230 can obtain the first virtual resource corresponding to the first physical resource that generates the monitoring early warning from the resource corresponding relationship searching module 220, so as to accurately warn the corresponding virtual resource failure.
  • In this way, a fault of the corresponding virtual resource can be warned of accurately when a monitoring alert occurs on a physical resource.
  • the resource fault management apparatus in this embodiment may further include: a physical resource monitoring and warning module 210, configured to perform real-time monitoring on the physical resources.
  • The physical resource monitoring and warning module 210 can monitor each physical resource in the server in real time, for example its temperature, voltage, and register fault values, and generate corresponding monitoring information from these states. When it detects that a physical resource is in an abnormal state, it can issue a monitoring alert for that physical resource.
  • The physical resource monitoring and warning module 210 may notify the resource correspondence search module 220 that a monitoring alert has occurred on the first physical resource, so that the resource correspondence search module 220 acquires the first virtual resource corresponding to the first physical resource.
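A threshold-based version of module 210 can be sketched as below. The monitored quantities (temperature, voltage, register fault value) come from the text; the limits `MAX_TEMP_C` and `VOLTAGE_RANGE`, the "non-zero register means fault" rule, and the `notify` callback standing in for the hand-off to module 220 are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical limits; the source names the quantities but gives no thresholds.
MAX_TEMP_C = 85.0
VOLTAGE_RANGE = (0.9, 1.3)


@dataclass(frozen=True)
class Reading:
    resource: str
    temperature_c: float
    voltage_v: float
    fault_register: int  # assumption: non-zero means a latched hardware error


def check(reading):
    """Return the abnormal conditions found in one reading."""
    problems = []
    if reading.temperature_c > MAX_TEMP_C:
        problems.append("over temperature")
    lo, hi = VOLTAGE_RANGE
    if not lo <= reading.voltage_v <= hi:
        problems.append("voltage out of range")
    if reading.fault_register:
        problems.append(f"fault register 0x{reading.fault_register:x}")
    return problems


def monitor(readings, notify):
    """Raise an alert for each abnormal resource by calling notify(resource, problems)."""
    for r in readings:
        problems = check(r)
        if problems:
            notify(r.resource, problems)
```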
  • The resource fault management apparatus in this embodiment may further include a fault processing module 240, configured to perform fault processing, according to the resource correspondence between physical resources and virtual resources, on the first virtual resource and/or the first physical resource on which the monitoring alert occurred.
  • The fault processing module 240 can establish a rule base for monitoring-alert fault processing. When a monitoring alert occurs, the fault processing module 240 matches the alert against this rule base, determines the fault handling policy, and invokes other related modules or systems to carry out the countermeasures, without acting on physical resources unrelated to the virtual resource on which the monitoring alert occurred.
  • The fault processing module 240 may match the monitoring alert of the first physical resource and/or of the first virtual resource against a preset rule base for monitoring-alert fault processing, determine a fault handling policy, and, based on the resource correspondence between physical resources and virtual resources, perform fault processing on the first physical resource and/or the first virtual resource. This processing includes:
  • determining fault handling countermeasures for the alerting first virtual resource, such as replacing resources, adding or removing resources, backing up the virtual machine, shutting it down, or migrating it, where for resource replacement, resource addition or removal, and virtual machine migration, available physical resources can be found according to the resource correspondence between physical resources and virtual resources;
  • determining countermeasures for the alerting first physical resource, such as isolation, taking it offline, reset, and heat dissipation; and
  • adjusting load balancing according to the resource correspondence between physical resources and virtual resources, for example migrating the virtual resources corresponding to the first physical resource elsewhere because its load is too high, or migrating other virtual resources onto the first physical resource because its load is low.
  • In this way, the physical resource and/or virtual resource on which the monitoring alert occurred can be fault-processed effectively and accurately.
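Matching an alert against a rule base and expanding it into concrete actions might look like the sketch below. The countermeasure names echo the text (isolation, heat dissipation, adding resources, backup, migration), but the alert categories, the pairing of alerts to countermeasures, and the special action `migrate_mapped` are hypothetical.

```python
# Hypothetical rule base: (alert origin, alert reason) -> countermeasure names.
RULES = {
    ("physical", "hardware error"): ["isolate", "migrate_mapped"],
    ("physical", "over temperature"): ["heat_dissipation"],
    ("virtual", "resource shortage"): ["add_resources"],
    ("virtual", "abnormal state"): ["backup", "migrate"],
}


def plan(origin, reason, resource, p2v):
    """Match an alert against RULES and expand it into concrete actions.

    Only the alerting resource and the virtual resources mapped to it are
    touched; physical resources unrelated to them never appear in the plan.
    """
    measures = RULES.get((origin, reason))
    if measures is None:
        return [("report", resource)]  # no rule matched: report only
    actions = []
    for m in measures:
        if m == "migrate_mapped":
            # Move every virtual resource backed by the faulty physical resource.
            actions.extend(("migrate", v) for v in sorted(p2v.get(resource, ())))
        else:
            actions.append((m, resource))
    return actions
```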
  • The resource fault management apparatus in this embodiment may further include a resource correspondence update module, configured to update the resource correspondence between physical resources and virtual resources according to the result of the fault processing performed by the fault processing module 240 on the first virtual resource and/or the first physical resource.
  • Whenever the fault processing module 240 changes the resource correspondence between physical resources and virtual resources while processing the alerting virtual resource and/or physical resource, an update of the changed correspondence is triggered.
  • This keeps the resource correspondence between physical resources and virtual resources updated promptly and effectively.
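The trigger-on-change behaviour of the update module can be sketched as a small observable store: every change made during fault processing updates the map and notifies subscribers. `CorrespondenceStore`, `subscribe`, and `move` are hypothetical names chosen for this illustration.

```python
class CorrespondenceStore:
    """Holds the physical -> virtual map and notifies listeners on every change."""

    def __init__(self, p2v):
        self.p2v = {p: set(vs) for p, vs in p2v.items()}
        self._listeners = []

    def subscribe(self, callback):
        """Register callback(virtual, src, dst), called after each move."""
        self._listeners.append(callback)

    def move(self, virtual, src, dst):
        """Record that `virtual` moved from physical `src` to physical `dst`."""
        self.p2v.get(src, set()).discard(virtual)
        self.p2v.setdefault(dst, set()).add(virtual)
        for callback in self._listeners:
            callback(virtual, src, dst)
```

Usage: a migration decided by the fault processing step calls `move`, and any component holding its own copy of the correspondence learns about the change through its callback.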
  • The resource fault management apparatus in this embodiment may further include: a virtual resource management module, configured to construct logical resources on the physical resources and construct the virtual resources on the logical resources; and a resource correspondence module, configured to establish the resource correspondence between the physical resources and the virtual resources.
  • For example, suppose the resource correspondence between physical resources and virtual resources shown in the figure has been established. The physical resource monitoring and warning module 210 first issues a monitoring alert for CPU4 in host 1. The resource correspondence search module 220 then looks up, according to the pre-established correspondence, the virtual resource corresponding to the alerting CPU4, and the corresponding virtual resource monitoring and warning module 230 issues a monitoring alert for the vCPU in VM2.
  • Finally, the fault processing module 240 formulates the following countermeasures according to the resource correspondence between physical resources and virtual resources: isolate CPU4 of host 1, which raised the monitoring alert, and migrate the vCPU in VM2. Because the correspondence shows that host 1 has no idle CPU, VM2, which raised the monitoring alert, is migrated to host 2 to use its idle CPU1.
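The CPU4/VM2 scenario can be replayed in a short sketch: isolate the faulty CPU, look for an idle CPU on the same host first, and fall back to another host. The initial assignment of VM0, VM1, and VM2 to host 1's CPUs is hypothetical, and the sketch simplifies by treating each VM as occupying a single CPU.

```python
def find_idle_cpu(p2v, excluded, prefer_host):
    """Pick an idle CPU that is not excluded, preferring `prefer_host`."""
    idle = sorted(p for p, vms in p2v.items() if not vms and p not in excluded)
    same_host = [p for p in idle if p.startswith(prefer_host + "/")]
    candidates = same_host or idle
    return candidates[0] if candidates else None


def handle_cpu_fault(p2v, cpu):
    """Isolate `cpu` and migrate each VM it backs to an idle CPU (mutates p2v).

    The isolated CPU is passed as `excluded` so it is never chosen as a target.
    """
    actions = [("isolate", cpu)]
    host = cpu.split("/")[0]
    for vm in sorted(p2v[cpu]):
        target = find_idle_cpu(p2v, {cpu}, host)
        if target is None:
            actions.append(("no idle CPU", vm))
            continue
        p2v[cpu].discard(vm)
        p2v[target].add(vm)
        actions.append(("migrate", vm, target))
    return actions


# Hypothetical state: every CPU in host 1 is busy, CPU1 in host 2 is idle.
p2v = {
    "host1/CPU1": {"VM0"},
    "host1/CPU2": {"VM0"},
    "host1/CPU3": {"VM1"},
    "host1/CPU4": {"VM2"},
    "host2/CPU1": set(),
}
actions = handle_cpu_fault(p2v, "host1/CPU4")
```

Because host 1 has no idle CPU, the sketch migrates VM2 to CPU1 of host 2, matching the outcome described above.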
  • FIG. 3 is a schematic structural diagram of a second embodiment of a resource fault management apparatus according to the present invention. As shown in FIG. 3, the resource fault management apparatus in this embodiment may include:
  • The resource correspondence search module 320 is configured to acquire, when a monitoring alert occurs on the second virtual resource, the second physical resource corresponding to the second virtual resource according to the resource correspondence between virtual resources and physical resources.
  • The resource correspondence between virtual resources and physical resources may be generated by the server system after it creates the virtual machine group, saved in the server system, and updated in real time; in that case, the resource correspondence search module 320 obtains it from the server system when needed.
  • Alternatively, after the server system creates the virtual machine group, the resource fault management device may itself generate the resource correspondence between virtual resources and physical resources from the virtual machine group created by the server system, track changes to that correspondence in the server system in real time, and update it accordingly; the resource correspondence search module 320 then acquires the locally stored correspondence when needed.
  • The resource correspondence search module 320 looks up the entry for the second virtual resource in the resource correspondence between virtual resources and physical resources and thereby acquires the second physical resource corresponding to the second virtual resource.
  • The corresponding physical resource monitoring and warning module 330 is configured to issue a monitoring alert for the second physical resource.
  • The corresponding physical resource monitoring and warning module 330 can obtain, from the resource correspondence search module 320, the second physical resource corresponding to the second virtual resource, so as to accurately warn of a fault of the corresponding physical resource.
  • In this way, a fault of the corresponding physical resource can be warned of accurately when a monitoring alert occurs on a virtual resource.
  • the resource fault management apparatus in this embodiment may further include: a virtual resource monitoring and warning module 310, configured to perform real-time monitoring and early warning on the virtual resources.
  • The virtual resource monitoring and warning module 310 can monitor each virtual resource, for example the performance parameters of each virtual machine, the resource state information of each virtual machine, and the load status of the virtual machine group. When it detects that a virtual resource is in an abnormal state, it issues a monitoring alert for that virtual resource.
  • The virtual resource monitoring and warning module 310 may notify the resource correspondence search module 320 that a monitoring alert has occurred on the second virtual resource, so that the resource correspondence search module 320 acquires the second physical resource corresponding to the second virtual resource.
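A minimal sketch of module 310's checks, covering a per-VM performance parameter, the VM's resource state, and the load of the VM group. The sample fields, the "running" state convention, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class VmSample:
    vm: str
    cpu_util: float  # 0.0-1.0
    state: str       # assumption: anything other than "running" is abnormal


CPU_UTIL_LIMIT = 0.9     # hypothetical per-VM threshold
GROUP_LOAD_LIMIT = 0.75  # hypothetical threshold on the VM group's average load


def check_virtual(samples):
    """Return (resource, reason) alerts for individual VMs and for the VM group."""
    alerts = []
    for s in samples:
        if s.state != "running":
            alerts.append((s.vm, f"abnormal state: {s.state}"))
        elif s.cpu_util > CPU_UTIL_LIMIT:
            alerts.append((s.vm, "computing resource shortage"))
    if samples and mean(s.cpu_util for s in samples) > GROUP_LOAD_LIMIT:
        alerts.append(("vm-group", "group load too high"))
    return alerts
```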
  • The resource fault management apparatus in this embodiment may further include a fault processing module 340, configured to perform fault processing, according to the resource correspondence between virtual resources and physical resources, on the second virtual resource and/or the second physical resource on which the monitoring alert occurred.
  • The fault processing module 340 can establish a rule base for monitoring-alert fault processing. When a monitoring alert occurs, the fault processing module 340 matches the alert against this rule base, determines the fault handling policy, and invokes other related modules or systems to carry out the countermeasures, without acting on physical resources unrelated to the virtual resource on which the monitoring alert occurred.
  • The fault processing module 340 may match the monitoring alert of the second physical resource and/or of the second virtual resource against a preset rule base for monitoring-alert fault processing, determine a fault handling policy, and perform fault processing on the second physical resource and/or the second virtual resource. This processing includes:
  • determining fault handling countermeasures for the alerting second virtual resource, such as replacing resources, adding or removing resources, backing up the virtual machine, shutting it down, or migrating it, where for resource replacement, resource addition or removal, and virtual machine migration, available physical resources can be found according to the resource correspondence between virtual resources and physical resources; and
  • adjusting load balancing according to the resource correspondence between virtual resources and physical resources, for example migrating the virtual resources corresponding to the second physical resource elsewhere because its load is too high, or migrating other virtual resources onto the second physical resource because its load is low.
  • In this way, the physical resource and/or virtual resource on which the monitoring alert occurred can be fault-processed effectively and accurately according to the resource correspondence between virtual resources and physical resources.
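The load-balancing adjustment (move a virtual resource off an overloaded physical resource, onto a lightly loaded one) can be sketched as a function that only proposes migrations. The `high`/`low` thresholds and the rule of moving one virtual resource per overloaded physical resource are assumptions made for illustration.

```python
def rebalance(load, p2v, high=0.85, low=0.30):
    """Suggest migrations between physical resources based on load.

    load: physical -> utilisation (0..1); p2v: physical -> set of virtual resources.
    For each overloaded physical resource (load > high) that hosts something,
    propose moving one virtual resource to the least-loaded physical resource
    under `low`. Thresholds are hypothetical; p2v is not modified.
    """
    moves = []
    targets = sorted((u, p) for p, u in load.items() if u < low)
    for p, u in sorted(load.items()):
        if u > high and p2v.get(p) and targets:
            _, dst = targets.pop(0)
            vm = sorted(p2v[p])[0]
            moves.append((vm, p, dst))
    return moves
```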
  • The resource fault management apparatus in this embodiment may further include a resource correspondence update module, configured to update the resource correspondence between virtual resources and physical resources according to the result of the fault processing performed by the fault processing module 340 on the second virtual resource and/or the second physical resource.
  • Whenever the fault processing module 340 changes the resource correspondence between virtual resources and physical resources while processing the alerting virtual resource and/or physical resource, an update of the changed correspondence is triggered.
  • This keeps the resource correspondence between virtual resources and physical resources updated promptly and effectively.
  • The resource fault management apparatus in this embodiment may further include: a virtual resource management module, configured to construct logical resources on the physical resources and construct the virtual resources on the logical resources; and a resource correspondence module, configured to establish the resource correspondence between the virtual resources and the physical resources.
  • For example, suppose the resource correspondence between virtual resources and physical resources shown in the figure has been established. The virtual resource monitoring and warning module 310 first issues a monitoring alert indicating that VM0 is short of computing resources. The resource correspondence search module 320 finds that the physical resources corresponding to VM0 are CPU1 and CPU2 in host 1, and the corresponding physical resource monitoring and warning module 330 can then issue monitoring alerts for CPU1 and CPU2 in host 1.
  • The fault processing module 340 finds, according to the resource correspondence between virtual resources and physical resources, that host 1 has no idle CPU. It therefore assigns CPU3 in host 1 to VM0 and migrates VM1 to host 2, which resolves the alert condition of VM0 and of CPU1 and CPU2 in host 1.
  • In addition, when VM0 performs large volumes of high-precision computation, it may affect all the physical resources of host 1, such as CPUs other than CPU1 and CPU2, or memory, for example by overloading them.
  • In that case, the physical resources corresponding to VM0 in the resource correspondence between virtual resources and physical resources include the other physical resources in host 1, and monitoring alerts need to be issued for those other physical resources as well.
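Widening the virtual-to-physical correspondence while VM0 runs heavy computation can be sketched as follows. The `heavy` flag, the `host_resources` inventory, and the "host/resource" naming convention are hypothetical inputs; the text does not say how heavy computation is detected.

```python
def physical_for(vm, v2p, host_resources, heavy):
    """Physical resources to alert for `vm`.

    Normally this is the direct mapping; during heavy computation the set is
    widened to every physical resource of the host(s) the VM runs on.
    """
    direct = set(v2p.get(vm, ()))
    if not heavy:
        return direct
    hosts = {p.split("/")[0] for p in direct}
    widened = set(direct)
    for h in hosts:
        widened |= host_resources.get(h, set())
    return widened


v2p = {"VM0": {"host1/CPU1", "host1/CPU2"}}
host_resources = {"host1": {"host1/CPU1", "host1/CPU2", "host1/CPU3", "host1/MEM1"}}
```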
  • FIG. 4 is a schematic structural diagram of a third embodiment of a resource fault management apparatus according to the present invention. As shown in FIG. 4, the resource fault management apparatus in this embodiment may include:
  • The resource correspondence search module 430 is configured to acquire, when a monitoring alert occurs on the first physical resource, the first virtual resource corresponding to the first physical resource according to the resource correspondence between physical resources and virtual resources; and, when a monitoring alert occurs on the second virtual resource, to acquire the second physical resource corresponding to the second virtual resource according to the resource correspondence between virtual resources and physical resources.
  • Here, the terms first physical resource, first virtual resource, second physical resource, and second virtual resource do not refer to any specific resources.
  • The first physical resource on which the monitoring alert occurs in this embodiment may be any one actual physical device in the physical resource pool, or may include multiple actual physical devices.
  • When the resource correspondence search module 430 searches for the first virtual resource corresponding to the first physical resource, note that in the resource correspondence between physical resources and virtual resources, each physical resource may correspond to one or more virtual resources, and multiple physical resources may also correspond to the same virtual resource.
  • The resource correspondence between virtual resources and physical resources is similar and is not described again.
  • Specifically, the resource correspondence search module 430 may look up, in the resource correspondence between physical resources and virtual resources, the entry linking the first physical resource to the first virtual resource, and thereby acquire the first virtual resource corresponding to the first physical resource; likewise, it may look up, in the resource correspondence between virtual resources and physical resources, the entry linking the second virtual resource to the second physical resource, and thereby acquire the second physical resource corresponding to the second virtual resource.
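Because one physical resource may map to several virtual resources and vice versa, the two lookups performed by module 430 amount to a many-to-many index kept in both directions. The following is a minimal Python sketch under that assumption; the class name `ResourceCorrespondence`, its methods, and the `host/CPU` naming scheme are illustrative and are not taken from the patent.

```python
from collections import defaultdict


class ResourceCorrespondence:
    """Hypothetical many-to-many index between physical and virtual resources.

    Each physical resource may map to several virtual resources and each
    virtual resource to several physical ones, so both directions are kept
    as sets and always updated together.
    """

    def __init__(self) -> None:
        self._phys_to_virt: dict[str, set[str]] = defaultdict(set)
        self._virt_to_phys: dict[str, set[str]] = defaultdict(set)

    def bind(self, physical: str, virtual: str) -> None:
        # Record the pair in both directions so either lookup is a dict access.
        self._phys_to_virt[physical].add(virtual)
        self._virt_to_phys[virtual].add(physical)

    def unbind(self, physical: str, virtual: str) -> None:
        self._phys_to_virt[physical].discard(virtual)
        self._virt_to_phys[virtual].discard(physical)

    def virtual_for(self, physical: str) -> set[str]:
        """First lookup: physical resource -> corresponding virtual resources."""
        return set(self._phys_to_virt.get(physical, ()))

    def physical_for(self, virtual: str) -> set[str]:
        """Second lookup: virtual resource -> corresponding physical resources."""
        return set(self._virt_to_phys.get(virtual, ()))
```

Keeping both directions in one object means a single `bind` or `unbind` call updates the physical-to-virtual and virtual-to-physical views together, so the two correspondences cannot drift apart.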
  • A corresponding virtual resource monitoring and warning module 440, configured to issue a monitoring alert for the first virtual resource.
  • The corresponding virtual resource monitoring and warning module 440 can obtain from the resource correspondence search module 430 the first virtual resource corresponding to the first physical resource for which the monitoring alert occurred, so as to warn accurately of a failure of the corresponding virtual resource.
  • A corresponding physical resource monitoring and warning module 450, configured to issue a monitoring alert for the second physical resource.
  • The corresponding physical resource monitoring and warning module 450 can obtain from the resource correspondence search module 430 the second physical resource corresponding to the second virtual resource for which the monitoring alert occurred, so as to warn accurately of a failure of the corresponding physical resource.
  • The resource fault management apparatus may further include a physical resource monitoring and warning module 410, configured to monitor the physical resources and issue early warnings in real time.
  • The physical resource monitoring and warning module 410 monitors every physical resource in real time, including the physical server as a whole: for example, the temperature, voltage, and register fault values of each physical resource in the server. It generates monitoring information from these states and, when it detects that a physical resource is in an abnormal state, issues a monitoring alert for that physical resource.
  • The physical resource monitoring and warning module 410 may notify the resource correspondence search module 430 of the monitoring alert that occurred for the first physical resource, so that the resource correspondence search module 430 can acquire the first physical resource.
  • The physical resource monitoring and warning module 410 and the corresponding physical resource monitoring and warning module 450 may be implemented in the same module.
  • The resource fault management apparatus may further include a virtual resource monitoring and warning module 420, configured to monitor the virtual resources and issue early warnings in real time.
  • The virtual resource monitoring and warning module 420 can monitor each virtual resource, for example the performance parameters of each virtual machine, the resource status information of each virtual machine, and the load status of virtual machine groups, and issues a monitoring alert for a virtual resource when it detects that the virtual resource is in an abnormal state.
  • The virtual resource monitoring and warning module 420 may notify the resource correspondence search module 430 of the monitoring alert that occurred for the second virtual resource, so that the resource correspondence search module 430 can acquire the second virtual resource.
  • The virtual resource monitoring and warning module 420 and the corresponding virtual resource monitoring and warning module 440 may be implemented in the same module.
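The interplay of the monitoring modules 410/420 (detection) with modules 430/440/450 (lookup and alerting on the counterpart resource) can be sketched as below. The thresholds, the `Alert` type, and all function names are hypothetical; the text names temperature, voltage, and virtual machine performance parameters as monitored quantities but gives no limits.

```python
from dataclasses import dataclass

# Hypothetical limits: the description names temperature, voltage and VM
# performance parameters as monitored quantities but gives no thresholds.
MAX_TEMP_C = 85.0
VOLTAGE_RANGE = (11.4, 12.6)
MAX_VM_CPU_UTIL = 0.95


@dataclass(frozen=True)
class Alert:
    resource: str
    reason: str


def check_physical(name: str, temp_c: float, volts: float) -> list[Alert]:
    """Module-410-style check: flag abnormal temperature or voltage."""
    alerts: list[Alert] = []
    if temp_c > MAX_TEMP_C:
        alerts.append(Alert(name, f"temperature {temp_c} C"))
    low, high = VOLTAGE_RANGE
    if not low <= volts <= high:
        alerts.append(Alert(name, f"voltage {volts} V"))
    return alerts


def check_virtual(name: str, cpu_util: float) -> list[Alert]:
    """Module-420-style check: flag a VM whose CPU utilization is too high."""
    if cpu_util > MAX_VM_CPU_UTIL:
        return [Alert(name, f"cpu utilization {cpu_util:.0%}")]
    return []


def propagate(alert: Alert, counterpart: dict[str, set[str]]) -> list[Alert]:
    """Module-430/440/450-style step: alert every counterpart resource.

    `counterpart` is the physical->virtual map for physical alerts, or the
    virtual->physical map for virtual alerts.
    """
    return [
        Alert(other, f"linked to {alert.resource}: {alert.reason}")
        for other in sorted(counterpart.get(alert.resource, set()))
    ]
```

The same `propagate` function serves both directions of mutual warning; only the map passed in changes.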
  • The resource fault management apparatus may further include a fault processing module 460, configured to perform fault processing, based on the resource correspondence between physical resources and virtual resources, on the virtual resource and/or physical resource for which a monitoring alert occurs.
  • The fault processing module 460 can establish a rule base for monitoring-alert fault handling. When a monitoring alert is generated, the fault processing module 460 matches the alert against this rule base, determines the fault handling strategy, and invokes other related modules or systems to carry out the countermeasure; it does not act on physical resources that are unrelated to the virtual resource for which the monitoring alert occurred.
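A rule base of this kind can be modeled as an ordered list of predicates, each paired with a countermeasure. The sketch below assumes first-match semantics and invents three example rules; the patent does not specify the rule format, the matching order, or concrete thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Alert:
    resource: str   # e.g. "host1/CPU4" or "VM0"
    kind: str       # "physical" or "virtual"
    metric: str     # e.g. "temperature", "cpu_bottleneck"
    value: float


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Alert], bool]
    countermeasure: str  # action to be carried out by other modules/systems


# Hypothetical rule base; the description does not enumerate its rules.
RULES: list[Rule] = [
    Rule("overheat",
         lambda a: a.kind == "physical" and a.metric == "temperature" and a.value > 85,
         "isolate"),
    Rule("cpu-bottleneck",
         lambda a: a.kind == "virtual" and a.metric == "cpu_bottleneck",
         "add_resources"),
    Rule("register-fault",
         lambda a: a.kind == "physical" and a.metric == "register_fault",
         "offline"),
]


def match(alert: Alert, rules: list[Rule] = RULES) -> Optional[str]:
    """Return the countermeasure of the first matching rule, if any."""
    for rule in rules:
        if rule.matches(alert):
            return rule.countermeasure
    return None  # no rule matched: leave the resource alone
```

Returning `None` when nothing matches reflects the stated behavior of not acting on resources unrelated to the alert.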
  • Specifically, the fault processing module 460 matches the monitoring alert of the physical resource and/or of the virtual resource against the preset rule base for monitoring-alert fault handling, determines the fault handling strategy, and, based on the resource correspondence between physical resources and virtual resources and/or between virtual resources and physical resources, performs fault processing on the physical resource and/or virtual resource for which the monitoring alert occurred. For the virtual resource, countermeasures include specifying replacement resources, adding or removing resources, backing up the virtual machine, shutting it down, or migrating it; for replacement, adding or removing resources, and virtual machine migration, available physical resources can be found according to the resource correspondence between physical resources and virtual resources. For the physical resource, countermeasures include isolation, taking it offline, resetting, adjusting heat dissipation, and repair. Fault processing further includes load balancing adjustment according to the resource correspondence between physical resources and virtual resources: for example, when the load is too high, other virtual resources corresponding to the physical resource for which the monitoring alert occurred are migrated out, or, when the load is low, other virtual resources are migrated onto that physical resource.
  • In this way, effective and accurate fault processing is performed on the physical resource and/or virtual resource for which a monitoring alert occurs.
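The load balancing adjustment mentioned above can be illustrated with a simple greedy pass over a VM-to-host placement. This is a sketch under assumed inputs (per-VM load expressed as a fraction of host capacity, and a hypothetical 0.8 threshold); the patent does not prescribe a balancing algorithm.

```python
HIGH_LOAD = 0.8  # hypothetical upper load threshold per host


def rebalance(placement: dict[str, str], vm_load: dict[str, float],
              hosts: list[str]) -> list[tuple[str, str, str]]:
    """Greedy sketch: move VMs off hosts whose load exceeds HIGH_LOAD.

    placement maps VM -> host (the virtual-to-physical correspondence at
    host granularity); vm_load gives each VM's share of a host's capacity.
    Returns (vm, source, target) moves and updates placement in place.
    """
    load = {h: 0.0 for h in hosts}
    for vm, host in placement.items():
        load[host] += vm_load[vm]

    moves: list[tuple[str, str, str]] = []
    for src in hosts:
        # Simple heuristic: try the lightest VMs on the host first.
        for vm in sorted((v for v, h in placement.items() if h == src),
                         key=lambda v: vm_load[v]):
            if load[src] <= HIGH_LOAD:
                break  # host is no longer overloaded
            candidates = [h for h in hosts
                          if h != src and load[h] + vm_load[vm] <= HIGH_LOAD]
            if not candidates:
                continue  # nowhere to put this VM; try the next one
            dst = min(candidates, key=lambda h: load[h])
            placement[vm] = dst
            load[src] -= vm_load[vm]
            load[dst] += vm_load[vm]
            moves.append((vm, src, dst))
    return moves
```

A production scheduler would weigh migration cost and affinity as well; the greedy pass only shows how the correspondence drives the choice of source and target.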
  • The resource fault management apparatus may further include a resource correspondence update module 470, configured to update the resource correspondence in real time, including the resource correspondence between physical resources and virtual resources and/or the resource correspondence between virtual resources and physical resources.
  • The resource correspondence update module 470 can update the resource correspondence as triggered by the fault processing module 460: when the fault processing module 460 changes a resource correspondence while processing a fault on the virtual resource and/or physical resource for which a monitoring alert occurred, it triggers an update of the changed correspondence, so that the correspondence between virtual resources and physical resources is updated in a timely and effective manner.
  • The resource correspondence update module 470 can also track, in real time, changes to the resource correspondence of the server system caused by other reasons, and update the resource correspondence between physical resources and virtual resources and/or between virtual resources and physical resources accordingly. If the resource correspondence is recorded in the resource fault management apparatus, the resource correspondence update module 470 can update it directly; if it is recorded in the server system, the resource correspondence update module 470 can trigger the server system to update it.
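When fault processing migrates a virtual machine, both directions of the correspondence must change together. The helper below is a hypothetical sketch of what module 470 might do locally; the function name and the `notify` callback (standing in for triggering an update in the server system) are assumptions, not an API from the source.

```python
from typing import Callable, Optional

Corr = dict[str, set[str]]  # resource -> set of counterpart resources


def apply_migration(p2v: Corr, v2p: Corr, vm: str, new_phys: set[str],
                    notify: Optional[Callable[[str, set[str]], None]] = None) -> None:
    """Rebind `vm` to `new_phys`, keeping both directions consistent."""
    # Remove the VM from every physical resource it used to occupy.
    for phys in v2p.get(vm, set()):
        members = p2v.get(phys)
        if members is not None:
            members.discard(vm)
            if not members:
                del p2v[phys]  # drop empty entries (resource is now idle)
    # Record the new binding in both directions.
    v2p[vm] = set(new_phys)
    for phys in new_phys:
        p2v.setdefault(phys, set()).add(vm)
    # If the authoritative copy lives in the server system, the hypothetical
    # notify hook hands the change over so the server can update it too.
    if notify is not None:
        notify(vm, set(new_phys))
```

Usage: migrating VM2 from CPU4 of host 1 to CPU1 of host 2 removes the old pair from both maps and adds the new one in a single call.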
  • The resource fault management apparatus may further include:
  • a virtual resource management module 480, configured to build logical resources on physical resources and to construct virtual resources on the logical resources; and
  • a resource correspondence module 490, configured to establish the resource correspondence between physical resources and virtual resources and/or the resource correspondence between virtual resources and physical resources.
  • The virtual resource management module 480 and the resource correspondence module 490 may be implemented in the resource fault management apparatus, in which case the server system may be used only to provide physical resources.
  • Alternatively, the virtual resource management module may be implemented in the server system.
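Since virtual resources are built on logical resources, which are in turn built on physical resources, one plausible way for module 490 to establish the correspondence is to compose the two layer mappings and then invert the result. The function names and logical-resource identifiers below are illustrative assumptions.

```python
def compose(virt_to_logical: dict[str, set[str]],
            logical_to_phys: dict[str, set[str]]) -> dict[str, set[str]]:
    """Derive virtual->physical pairs by going through the logical layer."""
    return {
        vm: {p for lr in lrs for p in logical_to_phys.get(lr, set())}
        for vm, lrs in virt_to_logical.items()
    }


def invert(mapping: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build the physical->virtual direction from virtual->physical."""
    out: dict[str, set[str]] = {}
    for key, values in mapping.items():
        for value in values:
            out.setdefault(value, set()).add(key)
    return out
```

Deriving one direction and inverting it guarantees that the physical-to-virtual and virtual-to-physical correspondences describe the same set of pairs.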
  • FIG. 5 is a flowchart of a fourth embodiment of the resource fault management method according to the present invention.
  • As shown in the figure, the method of this embodiment includes the following steps:
  • Step S501: Monitor physical resources in real time.
  • The resource fault management device may monitor each physical resource in the server system in real time and issue early warnings: for example, it tracks the temperature, voltage, and register fault values of each physical resource and generates corresponding monitoring information from these states. When it detects that a physical resource is in an abnormal state, it can issue a monitoring alert for that physical resource.
  • While monitoring the physical resources of the server system, the resource fault management device can also monitor the virtual resources of the server system, for example the performance parameters and resource status information of each virtual machine, and issue a monitoring alert for a virtual resource when that virtual resource is in an abnormal state.
  • Step S502: When a monitoring alert is detected for a first physical resource, obtain the first virtual resource corresponding to the first physical resource according to the resource correspondence between physical resources and virtual resources.
  • Specifically, the entry linking the first physical resource to the first virtual resource is looked up, and the first virtual resource corresponding to the first physical resource is acquired.
  • In this embodiment, the resource correspondence between physical resources and virtual resources is as shown in FIG. 6.
  • The established correspondence is as follows: in host 1, CPU1 and CPU2 correspond to the virtual resource VM0, CPU3 corresponds to VM1, and CPU4 corresponds to VM2; in host 2, CPU1, CPU2, and CPU4 are idle, and CPU3 corresponds to VM3 and VM4.
  • A monitoring alert is detected for CPU4 in host 1, and the virtual resource corresponding to CPU4 in host 1 is VM2.
  • Step S503: Issue a monitoring alert for the first virtual resource; that is, a monitoring alert is issued for VM2.
  • Step S504: Perform fault processing on the first physical resource and/or the first virtual resource based on the resource correspondence between physical resources and virtual resources.
  • The resource fault management device can establish a rule base for monitoring-alert fault handling. When a monitoring alert occurs, the device matches the alert against this rule base, determines the fault handling strategy, and invokes other related modules or systems to carry out the countermeasure; it does not act on physical resources that are unrelated to the virtual resource for which the monitoring alert occurred.
  • Specifically, the resource fault management device may match the monitoring alert of the first physical resource and/or of the first virtual resource against the preset rule base, determine the fault handling strategy, and perform fault processing on the first physical resource and/or the first virtual resource based on the resource correspondence between physical resources and virtual resources. For the first virtual resource for which the monitoring alert occurred, countermeasures include replacing resources, adding or removing resources, backing up the virtual machine, shutting it down, or migrating it; for replacement, adding or removing resources, and virtual machine migration, available physical resources can be found according to the resource correspondence between physical resources and virtual resources. For the first physical resource for which the monitoring alert occurred, countermeasures include isolation, taking it offline, resetting, adjusting heat dissipation, or repair, as well as load balancing adjustment according to the resource correspondence: for example, the virtual resources corresponding to the first physical resource are migrated out because of overload, or, when the load is low, other virtual resources are migrated onto the first physical resource.
  • The resource fault management device analyzes the resource correspondence between physical resources and virtual resources shown in FIG. 6 and finds that no other CPU resource is available in host 1, whereas the adjacent host 2 has an idle CPU1 that meets the requirement of VM2, and the load of host 2 is at a level that can accept VM2. It therefore decides to isolate CPU4 of host 1, for which the monitoring alert occurred, and to migrate VM2 to host 2 to use its CPU1, and then invokes the other relevant modules or systems in the server system to carry out this decision. VM0 and VM1, which are built on host 1 and are not affected by the CPU4 alert, require no additional processing.
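The FIG. 6 decision can be expressed as: isolate the faulty CPU, then re-home only the VMs bound to it, preferring an idle CPU on the same host and otherwise an idle CPU on a sufficiently lightly loaded host. In this sketch the first qualifying CPU in dictionary order stands in for the "adjacent" host, and the function name, host load figures, and `max_load` limit are assumptions.

```python
from typing import Optional

Cpu = tuple[str, str]  # (host, cpu), e.g. ("host1", "CPU4")


def plan_isolation(faulty: Cpu, owners: dict[Cpu, list[str]],
                   host_load: dict[str, float],
                   max_load: float = 0.7) -> list[tuple[str, Optional[Cpu]]]:
    """Isolate `faulty` and re-home only the VMs bound to it.

    owners maps each CPU to the VMs using it (an empty list means idle).
    An idle CPU on the same host is preferred; otherwise an idle CPU on
    another host whose load is at most max_load is used. A target of None
    means no suitable CPU was found.
    """
    affected = owners.pop(faulty, [])  # isolation: the CPU leaves the pool
    plan: list[tuple[str, Optional[Cpu]]] = []
    for vm in affected:
        idle = [cpu for cpu, vms in owners.items() if not vms]
        same_host = [cpu for cpu in idle if cpu[0] == faulty[0]]
        other_host = [cpu for cpu in idle
                      if cpu[0] != faulty[0] and host_load.get(cpu[0], 1.0) <= max_load]
        candidates = same_host or other_host
        target = candidates[0] if candidates else None
        if target is not None:
            owners[target] = [vm]  # bind the VM to its new CPU
        plan.append((vm, target))
    return plan
```

Because the search starts from the faulty CPU's own VMs, VM0 and VM1 on host 1 never enter the plan, mirroring the point that unaffected virtual machines need no additional processing.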
  • The resource fault management device can track, in real time, changes to the resource correspondence of the server system caused by fault processing or other reasons, and update the resource correspondence between physical resources and virtual resources. If the resource correspondence is recorded in the resource fault management device, the device updates it itself; if it is recorded in the server system, the device can trigger the server system to update it.
  • In this embodiment, when a physical resource fails, an early warning can be issued for the virtual resources corresponding to that physical resource, so the affected virtual resources are warned early and accurately, and only those virtual resources that may be affected are warned and processed. This improves the accuracy of virtual machine fault warning and processing, reduces processing complexity, and minimizes the impact on the entire service system.
  • FIG. 7 is a flowchart of a fifth embodiment of the resource fault management method according to the present invention.
  • As shown in the figure, the method of this embodiment includes the following steps:
  • Step S701: Monitor virtual resources in real time.
  • The resource fault management device can monitor each virtual resource in the server system, for example the performance parameters of each virtual machine, the resource status information of each virtual machine, and the load status of virtual machine groups, and issues a monitoring alert for a virtual resource when it detects that the virtual resource is in an abnormal state.
  • While monitoring the virtual resources of the server system in real time, the resource fault management device can also monitor, in real time, the physical resources of the server system, including the physical server as a whole: for example, the heat dissipation, fan, and power supply status of the physical machine, and the temperature, voltage, and register fault values of each physical resource, and generate corresponding monitoring information from these states.
  • When a physical resource is detected to be in an abnormal state, a monitoring alert can be issued for that physical resource.
  • Step S702: When a monitoring alert is detected for a second virtual resource, obtain the second physical resource corresponding to the second virtual resource according to the resource correspondence between virtual resources and physical resources. Specifically, the entry linking the second virtual resource to the second physical resource is looked up in the resource correspondence between virtual resources and physical resources, and the second physical resource corresponding to the second virtual resource is acquired.
  • In this embodiment, the resource correspondence between virtual resources and physical resources is as shown in FIG. 8:
  • VM0 corresponds to CPU1 and CPU2 in host 1;
  • VM1 corresponds to CPU3 and CPU4 in host 1;
  • VM2 and VM3 correspond to CPU4 in host 2.
  • VM0 needs to perform high-precision, large-scale computation, so its computing resources become a bottleneck, and a monitoring alert therefore occurs for VM0.
  • VM0 corresponds to CPU1 and CPU2 in host 1; since both CPUs belong to host 1, the physical host 1 is likely to be affected when VM0 performs high-precision, large-scale computation.
  • Therefore, according to the resource correspondence between virtual resources and physical resources in this embodiment, the physical resource corresponding to VM0 may be taken to be the entire host 1.
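How far a virtual-resource alert spreads on the physical side depends on the alert type. The sketch below assumes a boolean `host_wide` flag: for a compute-bottleneck alert such as VM0's, every host that owns one of the VM's CPUs is alerted as a whole, matching the conclusion that the physical resource corresponding to VM0 is the entire host 1. The function name and data layout are illustrative.

```python
Cpu = tuple[str, str]  # (host, cpu), e.g. ("host1", "CPU1")


def alert_scope(vm: str, v2p: dict[str, set[Cpu]], host_wide: bool) -> set[str]:
    """Return the physical resources to alert for an alert on `vm`."""
    cpus = v2p.get(vm, set())
    if host_wide:
        # e.g. a compute bottleneck that may overload the whole machine
        return {host for host, _ in cpus}
    # otherwise only the CPUs the VM is actually bound to
    return {f"{host}/{cpu}" for host, cpu in cpus}
```

With the FIG. 8 correspondence, a host-wide alert for VM0 yields only host 1, while a narrow alert yields CPU1 and CPU2 of host 1.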
  • Step S703: Issue a monitoring alert for the second physical resource.
  • Because the high-precision, large-scale computation of VM0 may overload host 1, and the physical resource corresponding to VM0 is the entire host 1, a monitoring alert is issued for the physical resource host 1.
  • Step S704: Perform fault processing on the second physical resource and/or the second virtual resource based on the resource correspondence between virtual resources and physical resources.
  • The resource fault management device can establish a rule base for monitoring-alert fault handling. When a monitoring alert occurs, the device matches the alert against this rule base, determines the fault handling strategy, and invokes other related modules or systems to carry out the countermeasure; it does not act on physical resources that are unrelated to the virtual resource for which the monitoring alert occurred.
  • Specifically, the resource fault management device may match the monitoring alert of the second physical resource and/or of the second virtual resource against the preset rule base, determine the fault handling strategy, and perform fault processing on the second physical resource and/or the second virtual resource based on the resource correspondence between virtual resources and physical resources. For the second virtual resource for which the monitoring alert occurred, countermeasures include replacing resources, adding or removing resources, backing up the virtual machine, shutting it down, or migrating it; for replacement, adding or removing resources, and virtual machine migration, available physical resources can be found according to the resource correspondence between virtual resources and physical resources. For the second physical resource for which the monitoring alert occurred, countermeasures include isolation, taking it offline, resetting, adjusting heat dissipation, or repair, as well as load balancing adjustment according to the resource correspondence between virtual resources and physical resources: for example, when the load is too high, other virtual resources corresponding to the second physical resource are migrated out, or, when the load is low, other virtual resources are migrated onto the second physical resource. In this embodiment, two problems must be handled at the same time: the computing resources of VM0 are insufficient, and host 1 may become overloaded.
  • The resource fault management device analyzes the resource correspondence between virtual resources and physical resources and finds that CPU3, which is adjacent to CPU1 and CPU2, is in use by VM1. According to the virtual machine level and the task level, it assigns CPU3 to VM0, which resolves the shortage of computing resources of VM0.
  • To address the possibility that host 1 becomes overloaded, the decision is to temporarily take CPU4 offline and reserve it for VM0 as load adjustment headroom.
  • The decision is also to migrate VM1 to host 2, which has idle, contiguous CPU resources, and to allocate CPU1 and CPU2 of host 2 to VM1.
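Migrating VM1 requires finding a host with enough idle, contiguous CPUs. A minimal search over a per-host CPU layout is sketched below; the layout format (a list of owners per host, with `None` for idle) and the function name are assumptions, and the example layout reflects host 1 after CPU3 has been given to VM0 and CPU4 has been taken offline.

```python
from typing import Optional


def find_contiguous_idle(layout: dict[str, list[Optional[str]]], need: int,
                         exclude: str = "") -> Optional[tuple[str, list[int]]]:
    """Return (host, cpu_numbers) for `need` adjacent idle CPUs, or None.

    layout maps host -> per-CPU owner (None means idle); CPU numbers are
    1-based to match the CPU1..CPU4 naming used in the figures.
    """
    for host, slots in layout.items():
        if host == exclude:
            continue
        run = 0  # length of the current run of idle CPUs
        for i, owner in enumerate(slots):
            run = run + 1 if owner is None else 0
            if run == need:
                start = i - need + 1
                return host, list(range(start + 1, i + 2))
    return None
```

For the FIG. 8 case, searching for two contiguous idle CPUs outside host 1 returns CPU1 and CPU2 of host 2, the allocation chosen for VM1.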
  • The resource fault management device can track, in real time, changes to the resource correspondence of the server system caused by fault processing or other reasons, and update the resource correspondence between virtual resources and physical resources. If the resource correspondence is recorded in the resource fault management device, the device updates it itself; if it is recorded in the server system, the device can trigger the server system to update it.
  • Based on the resource correspondence between virtual resources and physical resources, this embodiment can, upon an early warning for a virtual resource, issue an early warning for the physical resource corresponding to that virtual resource, thereby realizing mutual warning between virtual resources and physical resources so that both associated resources are warned in time. Fault processing is then performed on the virtual resource and/or physical resource for which the monitoring alert occurred, which reduces processing complexity and minimizes the impact on the entire service system. The embodiment can also update the resource correspondence between virtual resources and physical resources, providing the information needed to optimize the performance of virtual resources and to adjust the load of physical resources.
  • The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).

Abstract

The invention relates to a resource fault management method, device, and system. The resource fault management method proceeds as follows: when a monitoring early warning is detected for a first physical resource, a first virtual resource corresponding to the first physical resource is acquired according to a resource correspondence between physical resources and virtual resources, and a monitoring early warning is issued for the first virtual resource; when a monitoring early warning is detected for a second virtual resource, a second physical resource corresponding to the second virtual resource is acquired according to the resource correspondence between virtual resources and physical resources, and a monitoring early warning is issued for the second physical resource. With the present invention, mutual early warning between physical resources and virtual resources can be implemented according to the resource correspondence between them.
PCT/CN2012/079333 2012-07-30 2012-07-30 Resource fault management method, device and system WO2014019119A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280003070.1A CN103403689B (zh) 2012-07-30 2012-07-30 Resource fault management method, apparatus and system
PCT/CN2012/079333 WO2014019119A1 (fr) 2012-07-30 2012-07-30 Resource fault management method, device and system


Publications (1)

Publication Number Publication Date
WO2014019119A1 (fr) 2014-02-06

Family

ID=49565846

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/079333 WO2014019119A1 (fr) 2012-07-30 2012-07-30 Resource fault management method, device and system

Country Status (2)

Country Link
CN (1) CN103403689B (fr)
WO (1) WO2014019119A1 (fr)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
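CN103812699A (zh) * 2014-02-17 2014-05-21 无锡华云数据技术服务有限公司 Cloud computing-based monitoring and management system
CN105812170B (zh) * 2014-12-31 2019-01-18 华为技术有限公司 Data center-based fault analysis method and apparatus
CA3028993A1 * 2016-06-24 2017-12-28 Schneider Electric Systems Usa, Inc. Methods, systems and apparatus designed to dynamically facilitate the management of a boundaryless, high-availability system
CN107979479A (zh) * 2016-10-25 2018-05-01 中兴通讯股份有限公司 Fault management method and system for virtualized network elements
CN106330576B (zh) * 2016-11-18 2019-10-25 北京红马传媒文化发展有限公司 Method, system and device for automatic scaling and migration scheduling of containerized microservices
CN107066334A (zh) * 2017-03-17 2017-08-18 联想(北京)有限公司 Information processing method and processing system
CN107273188B (zh) * 2017-07-19 2020-08-18 苏州浪潮智能科技有限公司 Method and apparatus for binding virtual machine central processing units (CPUs)
CN107729219B (zh) * 2017-11-17 2021-07-16 北京联想超融合科技有限公司 Resource monitoring method, apparatus and terminal based on a hyper-converged storage system
CN113254324B (zh) * 2021-07-14 2021-11-30 睿至科技集团有限公司 LPAR performance collection method and system
CN114531287A (zh) * 2022-02-17 2022-05-24 恒安嘉新(北京)科技股份公司 Method, apparatus, device and medium for detecting virtual resource acquisition behavior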

Citations (4)

Publication number Priority date Publication date Assignee Title
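CN101938416A (zh) * 2010-09-01 2011-01-05 华南理工大学 Cloud computing resource scheduling method based on dynamic reconfiguration of virtual resources
CN102035662A (zh) * 2009-09-27 2011-04-27 中国移动通信集团公司 Virtual server management system, method and apparatus
CN102053873A (zh) * 2011-01-13 2011-05-11 浙江大学 Cache-aware fault isolation guarantee method for virtual machines on multi-core processors
CN102096461A (zh) * 2011-01-13 2011-06-15 浙江大学 Energy-saving method for cloud data centers based on virtual machine migration and load-aware consolidation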

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
WO2009110111A1 (fr) * 2008-03-04 2009-09-11 三菱電機株式会社 Server device, server device fault detection method, and server device fault detection program
CN102184145B (zh) * 2011-05-13 2013-04-17 杭州华三通信技术有限公司 Method and apparatus for preventing data loss on restart

Also Published As

Publication number Publication date
CN103403689A (zh) 2013-11-20
CN103403689B (zh) 2016-09-28

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 12882336

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 12882336

Country of ref document: EP

Kind code of ref document: A1