CN116501460A - Cloud host dynamic migration monitoring and early warning method - Google Patents

Publication number
CN116501460A
Authority
CN
China
Prior art keywords
host
cloud host
cloud
monitoring
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310320979.XA
Other languages
Chinese (zh)
Inventor
林德生
郑生华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Youke Communication Technology Co ltd
Original Assignee
China Youke Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Youke Communication Technology Co ltd
Priority to CN202310320979.XA
Publication of CN116501460A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856 Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5019 Workload prediction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a cloud host dynamic migration monitoring and early warning method, which comprises the following steps: S1: modeling the relationship between the cloud host and the host machine; S2: collecting cloud resource pool relationship data, deploying integrated acquisition services, and establishing a cross monitoring matrix; S3: monitoring the running state of the host machine and determining whether it is faulty; when the host machine is abnormal, proceeding to step S4; S4: performing online migration of the cloud host and starting the cloud host migration state monitoring function; S5: monitoring the current host machine information of the cloud host, the running state of the cloud host and the state of its key services, and performing cloud host state fault determination according to the monitoring results; S6: performing fault handling; S7: switching the integrated acquisition service to monitor the new host machine, and repeating steps S3 to S6. The method monitors the dynamic migration process of the cloud host, so that faults arising during dynamic migration are discovered promptly and accurately and an early warning is issued.

Description

Cloud host dynamic migration monitoring and early warning method
Technical Field
The invention relates to the technical field of cloud computing, in particular to a cloud host dynamic migration monitoring and early warning method.
Background
With the rapid development of cloud computing, telecom operators have moved many of their services to the cloud. A cloud computing environment aggregates large amounts of physical and virtual resources. To keep the services it carries running stably when those resources fail, platforms such as VMware and OpenStack provide cloud host dynamic migration, which allows a cloud host to be migrated automatically before its host machine fails or when the host machine performs poorly. Dynamic migration (Live Migration), also called online migration (Online Migration), moves a cloud host from one physical host machine to another while the services on the cloud host keep running normally, so that a physical server can be taken offline for maintenance or upgrade without affecting tenants. In some special cases, however, migration does not succeed: the cloud host is not migrated at all, or after migration the cloud host does not start running automatically or its business service processes stop. In addition, the cloud host on which a monitoring collection node runs is itself dynamic; it may lose its external network link or exhaust its own resources, which leads to misjudged faults and false alarms and reduces the reliability of alerts. Therefore, to keep services running and improve customer satisfaction, when a host machine fails it is necessary to monitor accurately whether the cloud host has been migrated successfully; if it has not, the relevant operations and maintenance personnel must be notified promptly to migrate and handle it manually, so that migration faults are responded to and handled quickly.
Disclosure of Invention
The invention aims to provide a cloud host dynamic migration monitoring and early warning method that monitors the dynamic migration process of the cloud host, so that faults occurring during dynamic migration are discovered promptly and accurately and an early warning is issued.
In order to achieve the above purpose, the invention adopts the following technical scheme: a cloud host dynamic migration monitoring and early warning method comprises the following steps:
Step S1: modeling the relationship between the cloud host and the host machine;
Step S2: collecting cloud resource pool relationship data, deploying integrated acquisition services, and establishing a cross monitoring matrix;
Step S3: monitoring the running state of the host machine and determining whether it is faulty; when the host machine is abnormal, proceeding to step S4;
Step S4: performing online migration of the cloud host, and starting the cloud host migration state monitoring function;
Step S5: monitoring the current host machine information of the cloud host, the running state of the cloud host, and the state of the key services of the cloud host, and performing cloud host state fault determination according to the monitoring results;
Step S6: performing fault handling;
Step S7: switching the integrated acquisition service to monitor the new host machine, and repeating steps S3 to S6.
Further, in step S1, the modeling process defines the entity tables and attribute tables for the cloud host and the host machine, and registers system metadata.
Further, the step S2 specifically includes the following steps:
Step S201: connecting to the cloud resource pool API and collecting relationship data for the computing resources that carry the target cloud host, including the current host machine, the resources of the current host machine, and the relationships of all host machines in the resource pool;
Step S202: deploying integrated acquisition services on cloud hosts in different network domains or network segments, and establishing the cross monitoring matrix.
Further, the step S3 specifically includes the following steps:
Step S301: monitoring the network quality of the host machine; the integrated acquisition service probes the target host machine via ICMP, monitors its network quality in real time, and determines whether any index exceeds its threshold; the network monitoring indexes are network connectivity, network delay and packet loss rate; the monitored data comprise the index value, a timestamp and an abnormal state value, where the state value is set to 1 when abnormal and 0 when normal;
Step S302: monitoring the hardware state of the host machine; the integrated acquisition service collects the hardware health state of the target host machine via IPMI and monitors whether it has a hardware fault; the hardware monitoring indexes are the power supply state, current state, voltage state, fan state, processor state, memory state, temperature state and event log state; each is judged abnormal or not, with the state value set to 1 when abnormal and 0 when normal;
Step S303: performing host machine state fault determination; the index data collected through the cross monitoring matrix in step S301 and step S302 are combined by a logical AND operation according to their time correlation to obtain the final value of each index and thereby determine whether the host machine is abnormal, and the fault determination result is sent to the control center; the control center automatically starts cloud host migration according to the host machine state fault determination result: when the result is abnormal, step S4 is entered.
Further, in step S4, online migration of the cloud host is performed using the cloud host automatic migration technology, and the cloud host migration state monitoring function is started, that is, step S5 is entered.
Further, the step S5 specifically includes the following steps:
Step S501: if the cloud host has been migrated successfully, its current host machine is a new host machine, so whether the migration completed can be determined from the current host machine; the integrated acquisition service queries the current host machine of the cloud host through the cloud resource pool API; if the current host machine is still the faulty host machine, the cloud host has not been migrated successfully, and a timestamped abnormal data record indicating that the cloud host was not migrated is generated with the state value set to 1; otherwise the state value is set to 0;
Step S502: if the cloud host has been migrated successfully, it remains powered on and running; when step S501 finds that the cloud host has moved to a new host machine, the integrated acquisition service performs an ICMP probe of the cloud host at its new location; if the probe fails, the cloud host may not have started, and a timestamped abnormal data record is generated with the state value set to 1; otherwise the state value is set to 0;
Step S503: if the cloud host has been migrated successfully, its business services are unaffected; the ports of the key services are probed; if a probe is abnormal, a timestamped abnormal data record indicating that the service has not started is generated with the state value set to 1; otherwise the state value is set to 0;
Step S504: the monitoring results of the cloud host migration state from step S501, step S502 and step S503 are combined by a logical AND operation and the result is sent to the control center; the control center locates the specific link that failed according to the cloud host state fault determination result, and when that result is abnormal, fault handling is performed, that is, step S6 is entered.
Further, the step S6 specifically includes the following steps:
Step S601: handling the fault that the cloud host was not migrated; when step S504 determines that the cloud host was not migrated or not started, the fault is identified as the responsibility of the cloud resource pool administrator, who is notified to carry out follow-up fault handling, including manual migration and manual start; the tenant of the cloud host is notified at the same time and informed of the fault cause, so that the fault is discovered before the customer finds it;
Step S602: handling the fault that a key process did not start after the cloud host was migrated; when step S504 determines that the cloud host was migrated but the service process did not start, the tenant is notified to start it manually or to invoke an automated operations script.
Compared with the prior art, the invention has the following beneficial effects: the invention can monitor the host state and the cloud host state in the cloud host dynamic migration process, thereby analyzing and diagnosing faults generated during the cloud host dynamic migration timely and accurately, enhancing the efficiency of service fault positioning and processing in the cloud computing environment, improving the service quality of the service and improving the customer satisfaction.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention.
Fig. 2 is a schematic diagram of a cross-monitoring matrix according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in fig. 1, this embodiment provides a cloud host dynamic migration monitoring and early warning method, which includes the following steps:
Step S1: modeling the relationship between the cloud host and the host machine, defining the entity tables and attribute tables for the cloud host and the host machine, and registering system metadata.
Step S2: collecting cloud resource pool relationship data, deploying integrated acquisition services, and establishing a cross monitoring matrix, as shown in fig. 2.
In this embodiment, the step S2 specifically includes the following steps:
Step S201: collecting cloud resource pool relationship data. The cloud resource pool API is connected to, and relationship data are collected for the computing resources that carry the target cloud host, including its current host machine, the resources of that host machine, and the relationships of all host machines in the resource pool.
Step S202: establishing a cross monitoring matrix. The integrated acquisition service itself runs on cloud hosts, and because of its own dynamic nature such a cloud host may lose external network connectivity and so misjudge a fault, which causes interference and reduces the reliability of alerts. The cross monitoring matrix is therefore established by deploying integrated acquisition services on cloud hosts in different network domains or network segments.
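One way to picture the cross monitoring matrix of step S202 is as a mapping from each monitored host machine to a set of collectors drawn from different network segments, so that no single segment's outage can produce an alarm on its own. The following is a minimal sketch under that assumption; the class name `CrossMonitoringMatrix`, the segment labels and the one-collector-per-segment rule are hypothetical and do not come from the original.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every target host machine is probed by one collector per network segment.
final class CrossMonitoringMatrix {

    /**
     * @param collectorSegments collector name -> network segment it is deployed in
     * @param targets           host machines to monitor
     * @return target host machine -> collectors assigned to it (one per distinct segment)
     */
    static Map<String, List<String>> build(Map<String, String> collectorSegments,
                                           List<String> targets) {
        // Keep the first collector seen for each segment.
        Map<String, String> onePerSegment = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : collectorSegments.entrySet()) {
            onePerSegment.putIfAbsent(e.getValue(), e.getKey());
        }
        // Every target gets the same cross-segment set of collectors.
        Map<String, List<String>> matrix = new LinkedHashMap<>();
        for (String target : targets) {
            matrix.put(target, new ArrayList<>(onePerSegment.values()));
        }
        return matrix;
    }
}
```

With collectors `collector-a` and `collector-c` in one segment and `collector-b` in another, each target is assigned `collector-a` and `collector-b`, which gives two independent vantage points per host machine.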
Step S3: monitoring and judging the running state of the host machine; when the host is abnormal, the process proceeds to step S4.
In this embodiment, the step S3 specifically includes the following steps:
Step S301: monitoring the network quality of the host machine.
The integrated acquisition service probes the target host machine via ICMP, monitors its network quality in real time, and determines whether any index exceeds its threshold. The network monitoring indexes are network connectivity, network delay and packet loss rate; the monitored data comprise the index value, a timestamp and an abnormal state value, where the state value is set to 1 when abnormal and 0 when normal.
Step S302: monitoring the hardware state of the host machine.
The integrated acquisition service collects the hardware health state of the target host machine via IPMI and monitors whether it has a hardware fault. The hardware monitoring indexes cover eight hardware states: power supply, current, voltage, fan, processor, memory, temperature and event log. Each is judged abnormal or not, with the state value set to 1 when abnormal and 0 when normal.
Step S303: performing host machine state fault determination.
The index data collected through the cross monitoring matrix in step S301 and step S302 are combined by a logical AND operation according to their time correlation to obtain the final value of each index and thereby determine whether the host machine is abnormal, and the fault determination result is sent to the control center. The control center automatically starts cloud host migration according to the host machine state fault determination result: when the result is abnormal, step S4 is entered.
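The fault determination of step S303 can be illustrated with a small sketch. It assumes that the combining operation is a logical AND over the 0/1 abnormal flags reported by different collectors, and that "time correlation" means the samples fall inside the same fixed window; both are readings of the text, not details it states. The names `HostFaultJudge`, `Sample` and `windowMillis` are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the AND-based fault determination in step S303.
final class HostFaultJudge {

    /** One reading of one index from one collector: 1 = abnormal, 0 = normal. */
    static final class Sample {
        final String collector;
        final long timestampMillis;
        final int abnormal;

        Sample(String collector, long timestampMillis, int abnormal) {
            this.collector = collector;
            this.timestampMillis = timestampMillis;
            this.abnormal = abnormal;
        }
    }

    /**
     * ANDs the abnormal flags of all samples inside the window
     * [windowStart, windowStart + windowMillis). Returns 1 only if every
     * time-correlated sample is abnormal; returns 0 if any sample is normal
     * or if no sample falls inside the window.
     */
    static int finalValue(List<Sample> samples, long windowStart, long windowMillis) {
        int result = 1;
        boolean seen = false;
        for (Sample s : samples) {
            long offset = s.timestampMillis - windowStart;
            if (offset < 0 || offset >= windowMillis) {
                continue; // not time-correlated with this window
            }
            seen = true;
            result &= s.abnormal;
        }
        return seen ? result : 0;
    }
}
```

Under this reading, a single collector that loses its own network link reports 1 while the others report 0, so the final value stays 0 and no false alarm is raised.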
Step S4: performing online migration of the cloud host.
Online migration of the cloud host is performed using the cloud host automatic migration technology, and the cloud host migration state monitoring function is started, that is, step S5 is entered.
Step S5: monitoring current host information of the cloud host, monitoring the running state of the cloud host, monitoring the key service state of the cloud host, and performing cloud host state fault judgment according to the monitoring result.
In this embodiment, the step S5 specifically includes the following steps:
Step S501: monitoring the current host machine information of the cloud host.
If the cloud host has been migrated successfully, its current host machine is a new host machine, so whether the migration completed can be determined from the current host machine. The integrated acquisition service queries the current host machine of the cloud host through the cloud resource pool API; if the current host machine is still the faulty host machine, the cloud host has not been migrated successfully, and a timestamped abnormal data record indicating that the cloud host was not migrated is generated with the state value set to 1; otherwise the state value is set to 0.
Step S502: monitoring the running state of the cloud host.
If the cloud host has been migrated successfully, it remains powered on and running. When step S501 finds that the cloud host has moved to a new host machine, the integrated acquisition service performs an ICMP probe of the cloud host at its new location; if the probe fails, the cloud host may not have started, and a timestamped abnormal data record is generated with the state value set to 1; otherwise the state value is set to 0.
Step S503: monitoring the key service state of the cloud host.
If the cloud host has been migrated successfully, its business services are unaffected. The ports of the key services are probed; if a probe is abnormal, a timestamped abnormal data record indicating that the service has not started is generated with the state value set to 1; otherwise the state value is set to 0.
Step S504: cloud host cross matrix fault determination.
The monitoring results of the cloud host migration state from step S501, step S502 and step S503 are combined by a logical AND operation and the result is sent to the control center. The control center locates the specific link that failed according to the cloud host state fault determination result, and when that result is abnormal, fault handling is performed, that is, step S6 is entered.
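To make the "locate the specific link that failed" part of step S504 concrete, the sketch below takes the three 0/1 state values from steps S501 to S503 and returns the first check that failed, treating the migration as healthy only when all three checks are normal. The class `MigrationFaultLocator`, the `Link` enum and the ordering of the checks are hypothetical illustrations, not names or rules given in the original.

```java
// Hypothetical sketch of step S504: combine the three state values and name the failed link.
final class MigrationFaultLocator {

    enum Link { NONE, NOT_MIGRATED, NOT_RUNNING, SERVICE_DOWN }

    /** Each argument is a state value from S501, S502 or S503: 1 = abnormal, 0 = normal. */
    static Link locate(int notMigrated, int notRunning, int serviceDown) {
        if (notMigrated == 1) {
            return Link.NOT_MIGRATED;   // S501 failed; the later checks are moot
        }
        if (notRunning == 1) {
            return Link.NOT_RUNNING;    // S502 failed
        }
        if (serviceDown == 1) {
            return Link.SERVICE_DOWN;   // S503 failed
        }
        return Link.NONE;
    }

    /** The overall result is abnormal unless all three checks are normal. */
    static boolean isAbnormal(int notMigrated, int notRunning, int serviceDown) {
        return locate(notMigrated, notRunning, serviceDown) != Link.NONE;
    }
}
```

Checking the stages in order means a cloud host that never left the faulty host machine is reported as `NOT_MIGRATED` even though its ping and port probes would also fail.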
Step S6: performing fault handling.
In this embodiment, the step S6 specifically includes the following steps:
Step S601: handling the fault that the cloud host was not migrated.
When step S504 determines that the cloud host was not migrated or not started, the fault is identified as the responsibility of the cloud resource pool administrator, who is notified to carry out follow-up fault handling, including manual migration and manual start. The tenant of the cloud host is notified at the same time and informed of the fault cause, so that the fault is discovered before the customer finds it.
Step S602: handling the fault that a key process did not start after the cloud host was migrated.
When step S504 determines that the cloud host was migrated but the service process did not start, the tenant is notified to start it manually or to invoke an automated operations script.
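The routing in steps S601 and S602 amounts to a small decision table: who is told what for each kind of fault. The sketch below expresses that table in code. The class `FaultRouter`, the `Fault` enum, the `hasRecoveryScript` flag and the action strings are all hypothetical placeholders; the original does not say how notifications are delivered or how the choice between manual start and the script is made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of step S6: decide which actions follow each kind of fault.
final class FaultRouter {

    enum Fault { NOT_MIGRATED, NOT_RUNNING, SERVICE_NOT_STARTED }

    static List<String> actions(Fault fault, boolean hasRecoveryScript) {
        List<String> out = new ArrayList<>();
        switch (fault) {
            case NOT_MIGRATED:
                // S601: the resource pool administrator owns the fault; the tenant is informed.
                out.add("notify administrator: migrate manually");
                out.add("notify tenant: fault cause");
                break;
            case NOT_RUNNING:
                out.add("notify administrator: power on manually");
                out.add("notify tenant: fault cause");
                break;
            case SERVICE_NOT_STARTED:
                // S602: the tenant restarts the service, or an automated script is invoked.
                out.add(hasRecoveryScript
                        ? "run automated operations script"
                        : "notify tenant: start service manually");
                break;
        }
        return out;
    }
}
```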
Step S7: the integrated acquisition service switches to monitoring the new host machine, and steps S3 to S6 are repeated.
The process according to the invention is further illustrated by the following example.
In this embodiment, the cloud host dynamic migration monitoring method includes:
Step S1: modeling the relationship between the cloud host and the host machine, defining the entity tables and attribute tables for the cloud host and the host machine, and registering system metadata.
Entity attribute data:
Table 1 Metadata entity example data
Entity ID | Entity Java class name | Entity name
cm_01_37_03_01 | HostConfEntity | Host machine entity
cm_01_37_05_01 | VmConfEntity | Cloud host entity
Table 2 Host machine entity attribute example data
No. | Owning entity ID | Attribute Java member | Data type | Attribute name | Attribute table field
1 | cm_01_37_03_01 | cloudCenterId | Integer | Cloud center number | cm_01_37_01_01_01
2 | cm_01_37_03_01 | cloudPoolId | Integer | Cloud resource pool number | cm_01_37_02_01_01
3 | cm_01_37_03_01 | hostId | Integer | Host machine number | cm_01_37_03_01_01
4 | cm_01_37_03_01 | resourceType | String | Resource category | cm_01_37_03_01_02
5 | cm_01_37_03_01 | hostName | String | Host machine name | cm_01_37_03_01_03
6 | cm_01_37_03_01 | resourceRemark | String | Resource description | cm_01_37_03_01_04
7 | cm_01_37_03_01 | uuId | String | UUID | cm_01_37_03_01_05
8 | cm_01_37_03_01 | hostIp | Integer | Management IP | cm_01_37_03_01_06
9 | cm_01_37_03_01 | softwareVersion | String | Virtualization software version | cm_01_37_03_01_07
10 | cm_01_37_03_01 | totalMemory | Integer | Total memory | cm_01_37_03_01_08
11 | cm_01_37_03_01 | cpuModel | String | CPU model | cm_01_37_03_01_09
12 | cm_01_37_03_01 | cpuNums | Integer | Number of CPUs | cm_01_37_03_01_10
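As an illustration of how the host machine entity and its attribute registration might look in code, the sketch below mirrors the host machine rows of the two tables. Only the class name `HostConfEntity`, the member names, the data types and the field identifiers come from the tables; the static `ATTRIBUTE_FIELDS` map and the overall layout are assumptions made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the host machine entity; members follow the "Attribute Java member" column.
final class HostConfEntity {

    static final String ENTITY_ID = "cm_01_37_03_01";

    /** Attribute Java member -> attribute table field, as listed in the host machine attribute table. */
    static final Map<String, String> ATTRIBUTE_FIELDS;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("cloudCenterId", "cm_01_37_01_01_01");
        m.put("cloudPoolId", "cm_01_37_02_01_01");
        m.put("hostId", "cm_01_37_03_01_01");
        m.put("resourceType", "cm_01_37_03_01_02");
        m.put("hostName", "cm_01_37_03_01_03");
        m.put("resourceRemark", "cm_01_37_03_01_04");
        m.put("uuId", "cm_01_37_03_01_05");
        m.put("hostIp", "cm_01_37_03_01_06");
        m.put("softwareVersion", "cm_01_37_03_01_07");
        m.put("totalMemory", "cm_01_37_03_01_08");
        m.put("cpuModel", "cm_01_37_03_01_09");
        m.put("cpuNums", "cm_01_37_03_01_10");
        ATTRIBUTE_FIELDS = Collections.unmodifiableMap(m);
    }

    Integer cloudCenterId;
    Integer cloudPoolId;
    Integer hostId;
    String resourceType;
    String hostName;
    String resourceRemark;
    String uuId;
    Integer hostIp;          // the table lists the management IP as Integer
    String softwareVersion;
    Integer totalMemory;
    String cpuModel;
    Integer cpuNums;
}
```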
Table 3 cloud host entity attribute example data
Step S2: deploying an integrated acquisition service
Step S201: collecting cloud resource pool relationship data. The service connects to the VMware vSphere API of the VMware cloud resource pool (vSphere SDK for Java), and collects and stores the host machine of the monitored cloud host and the other host machines in the cluster. The implementation is as follows:
1) Connect to vCenter and instantiate the vCenter service:
ServiceInstance si = VCenterConnectService.getServiceInstance(vcenterId);
2) Get the data center:
Datacenter datacenter = (Datacenter) new InventoryNavigator(si.getRootFolder()).searchManagedEntity("Datacenter", datacenterName);
3) Get the project (folder) of the cloud host:
Folder project = (Folder) new InventoryNavigator(datacenter.getVmFolder()).searchManagedEntity("Folder", projectName);
4) Get the cloud host object by cloud host name:
VirtualMachine vm = (VirtualMachine) new InventoryNavigator(project).searchManagedEntity("VirtualMachine", vmName);
5) Get the real-time running state of the cloud host and, from it, the current host machine:
ManagedObjectReference hostInfo = vm.getRuntime().getHost();
HostSystem host = new HostSystem(si.getServerConnection(), hostInfo);
String hostname = host.getName();
Step S3: Host machine running state monitoring
Step S301: monitoring the network quality of the host machine. An ICMP probe is run against the current host machine to monitor whether it has a network fault. The monitoring indexes are network connectivity, network delay and packet loss rate; the data comprise the index value, a timestamp and an abnormal state value, where the state value is set to 1 when abnormal and 0 when normal.
The ICMP probe returns results such as the following:
10 packets transmitted, 10 received, 0% packet loss, time 9007ms
rtt min/avg/max/mdev = 0.545/0.639/0.762/0.062 ms
When the packet loss rate is 100% or the average delay exceeds 0.3, the state is judged abnormal.
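The ICMP result above can be reduced to the 0/1 abnormal state value with a few lines of parsing. The sketch below is one possible way to do it, assuming Linux `ping` summary lines in the format shown; the class `PingSummary` and the `maxAvgRttMillis` parameter are hypothetical, and because the text gives the delay threshold without a unit, the threshold is left to the caller rather than hard-coded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn a ping summary into the 0/1 abnormal state value of step S301.
final class PingSummary {

    private static final Pattern LOSS =
            Pattern.compile("(\\d+(?:\\.\\d+)?)%\\s*packet loss");
    private static final Pattern RTT =
            Pattern.compile("min/avg/max/mdev\\s*=\\s*[\\d.]+/([\\d.]+)/");

    /** Packet loss in percent, taken from the "N% packet loss" field. */
    static double lossPercent(String output) {
        Matcher m = LOSS.matcher(output);
        if (!m.find()) {
            throw new IllegalArgumentException("no packet-loss figure found");
        }
        return Double.parseDouble(m.group(1));
    }

    /** Average round-trip time in ms, or NaN when the rtt line is absent (as at 100% loss). */
    static double avgRttMillis(String output) {
        Matcher m = RTT.matcher(output);
        return m.find() ? Double.parseDouble(m.group(1)) : Double.NaN;
    }

    /** 1 if loss is 100% or the average delay exceeds the caller's threshold, else 0. */
    static int abnormal(String output, double maxAvgRttMillis) {
        double loss = lossPercent(output);
        double avg = avgRttMillis(output);
        boolean bad = loss >= 100.0 || (!Double.isNaN(avg) && avg > maxAvgRttMillis);
        return bad ? 1 : 0;
    }
}
```

The regular expressions tolerate missing spaces around `%` and `=`, so both the compact and the spaced form of the summary lines parse the same way.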
Step S302: monitoring the hardware state of the host machine. IPMI commands are polled against the current host machine to collect its hardware health state. The collected hardware states include temperature, fan, voltage, current, processor, memory and power supply; whether a hardware fault has occurred is judged from them, with the state value set to 1 when abnormal and 0 when normal.
An example temperature collection result follows; whether a fault has occurred is judged from the state value in the third column (any value other than ns, lnc, unc or ok is abnormal).
CPU 1 Temp | 01h | ns | 3.1 | Disabled
CPU 2 Temp | 02h | ns | 3.2 | Disabled
IOH 2 Temp | 0Dh | ns | 7.1 | Disabled
Ambient Temp | 0Eh | ok | 7.1 | 17 degrees C
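The third-column rule can be sketched as a small parser for one pipe-separated sensor line. The set of normal codes (ns, lnc, unc, ok) is taken from the text; the class name `IpmiSensorLine` and the decision to reject lines with fewer than three columns are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: judge one IPMI sensor line by its third (status) column.
final class IpmiSensorLine {

    // Status codes the text treats as normal; anything else counts as abnormal.
    private static final Set<String> NORMAL =
            new HashSet<>(Arrays.asList("ns", "lnc", "unc", "ok"));

    /** Returns the trimmed third column of a pipe-separated sensor line. */
    static String status(String line) {
        String[] cols = line.split("\\|");
        if (cols.length < 3) {
            throw new IllegalArgumentException("expected at least 3 columns: " + line);
        }
        return cols[2].trim();
    }

    /** 0 when the status is one of the normal codes, 1 otherwise. */
    static int abnormal(String line) {
        return NORMAL.contains(status(line)) ? 0 : 1;
    }
}
```

For the sample lines, `ns` and `ok` both yield 0, while a line whose third column is `cr` would yield 1.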
Step S303: host machine fault determination.
The host machine state data collected by the matrix are evaluated. The result data of each index collected by the matrix in step S301 and step S302 are combined by a logical AND operation according to their time correlation to determine the final value of the index and confirm whether the host machine is abnormal, and the result is sent to the control center. When the host machine fault determination result is abnormal, the control center automatically starts the cloud host migration module.
Step S4: Cloud host automatic migration
The cloud host is migrated online using the cloud host automatic migration technology, and the cloud host migration state monitoring function is started. Automatic migration of the cloud host is started through the vMotion interface of the VMware vSphere API of the connected VMware cloud resource pool, and the cloud host migration state monitoring task is started.
Step S5: monitoring cloud host migration status
Step S501: monitoring the current host machine information of the cloud host. If the VMware vMotion live migration succeeds, the current host machine of the cloud host should be the new host machine, so whether the migration completed can be determined from the current host machine. The integrated acquisition service queries the current host machine of the cloud host through the cloud resource pool API; if the current host machine is still the faulty host machine, the cloud host has not been migrated automatically, and a timestamped abnormal data record indicating that the cloud host was not migrated is generated with the state value set to 1; otherwise the state value is set to 0.
Step S502: monitoring the running state of the cloud host. When migration succeeds, the cloud host remains powered on and running. When step S501 finds that the cloud host has moved to a new host machine, the integrated acquisition service performs an ICMP probe of the cloud host at its new location; if the probe fails, the cloud host may not be powered on, and a timestamped abnormal data record is generated with the state value set to 1; otherwise the state value is set to 0.
Step S503: monitoring the key service state of the cloud host. If the cloud host is migrated seamlessly, its business services are unaffected. The ports of the key services are therefore probed; if a probe is abnormal, a timestamped abnormal data record indicating that the service has not started is generated with the state value set to 1; otherwise the state value is set to 0.
The specific implementation is as follows: the connection state of the target address and service port (HTTP service port or UDP service port) is collected with the ncat command.
An example of an abnormal result:
curl: (7) Failed connect to IP:port; Connection timed out
If an anomaly is found, a timestamped abnormal data record indicating that the service has not started is generated with the state value set to 1; otherwise the state value is set to 0.
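The embodiment collects the port state with command-line tools. As an alternative illustration only, the same check for a TCP service port can be sketched with the standard `java.net.Socket` class; this swaps in a different technique from the one named in the text and does not cover UDP ports. The class `PortProbe`, its method names and the timeout parameter are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch: TCP connect probe of a key service port (step S503).
final class PortProbe {

    /** True if a TCP connection to host:port succeeds within the timeout. */
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, unresolved, or timed out
        }
    }

    /** State value as used in the text: 1 = service not reachable, 0 = reachable. */
    static int stateValue(String host, int port, int timeoutMillis) {
        return isOpen(host, port, timeoutMillis) ? 0 : 1;
    }
}
```

A caller would pair the returned state value with a timestamp to form the abnormal data record described above.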
Step S504: cloud host cross matrix fault determination. The collection results of the cloud host migration state from steps S501, S502 and S503 are combined by a logical AND operation according to their time correlation, and the result is sent to the control center. When the cloud host state fault determination result is abnormal, the control center locates the specific link that failed and calls the fault handling module.
Step S6: alert notification and handling
Step S601: handling the fault that the cloud host was not migrated. When step S504 determines that the cloud host was not migrated or not powered on, the fault is identified as the responsibility of the cloud resource pool administrator, who is notified to carry out follow-up fault handling, such as manual migration or manual power-on. The tenant of the cloud host is notified at the same time and informed of the fault cause, so that the fault is discovered before the customer finds it.
Step S602: handling the fault that a key process did not start after the cloud host was migrated. When step S504 determines that the cloud host was migrated but the service process did not start, the tenant is notified to start it manually or to invoke an automated operations script.
Step S7: the integrated acquisition service switches to monitoring the new host machine, and steps S3 to S6 are repeated.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and does not limit the invention in any form. Any person skilled in the art may use the disclosed technical content to make changes or modifications that result in equivalent embodiments. However, any simple modification, equivalent change or variation made to the above embodiments according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (7)

1. A cloud host dynamic migration monitoring and early warning method, characterized by comprising the following steps:
step S1: modeling a relationship between a cloud host and a host machine;
step S2: collecting cloud resource pool relation data, deploying integrated acquisition service, and establishing a cross monitoring matrix;
step S3: monitoring and judging the running state of the host machine; when the host machine is abnormal, the step S4 is entered;
step S4: performing online migration of the cloud host, and starting a cloud host migration state monitoring function;
step S5: monitoring current host information of the cloud host, monitoring the running state of the cloud host, monitoring the key service state of the cloud host, and performing cloud host state fault judgment according to the monitoring result;
step S6: performing fault treatment;
step S7: the integrated acquisition service monitors the new host instead, and repeats the steps S3-S6.
2. The method for monitoring and early warning of cloud host dynamic migration according to claim 1, wherein in the step S1, in the modeling process, a cloud host, a host entity table and an attribute table are defined, and system metadata is registered.
3. The cloud host dynamic migration monitoring and early warning method according to claim 1, wherein the step S2 specifically includes the following steps:
step S201: connecting to the cloud resource pool API and collecting computing resource relation data for the resources bearing the target cloud host, wherein the computing resource relation data comprise the current host, the resources of the current host and relation data of all hosts under the resource pool;
step S202: deploying integrated acquisition services on cloud hosts of different network domains or network segments, and establishing a cross monitoring matrix.
4. The cloud host dynamic migration monitoring and early warning method according to claim 1, wherein the step S3 specifically includes the following steps:
step S301: monitoring the network quality of the host machine; the integrated acquisition service probes the target host via ICMP, monitors its network quality in real time, and judges whether any indicator abnormally exceeds its threshold; the network monitoring indicators comprise network connectivity, network delay and packet loss rate; the monitored data comprise an indicator value, a timestamp and an abnormal state value, the state value being set to 1 when abnormal and 0 when normal;
step S302: monitoring the hardware state of the host machine; the integrated acquisition service acquires the hardware health state of the target host via IPMI and monitors whether the target host has a hardware fault; the hardware monitoring indicators comprise the power supply state, current state, voltage state, fan state, processor state, memory state, temperature state and event log state; each is judged to be abnormal or not, the state value being set to 1 when abnormal and 0 when normal;
step S303: performing host state fault judgment; an AND operation, correlated by time, is performed on the indicator data collected by the cross monitoring matrix in step S301 and step S302 to determine the final value of each indicator and thereby whether the host is abnormal, and the fault judgment result is then sent to the control center; the control center automatically starts migration of the cloud host according to the host state fault judgment result, and when that result is abnormal, step S4 is entered.
5. The method for monitoring and early warning of cloud host dynamic migration according to claim 1, wherein in step S4, cloud host online migration is performed by using a cloud host automatic migration technology, and a cloud host migration status monitoring function is started, namely step S5 is entered.
6. The cloud host dynamic migration monitoring and early warning method according to claim 1, wherein the step S5 specifically includes the following steps:
step S501: if the cloud host has migrated successfully, its current host is a new host, so whether migration completed is judged from the current host; the integrated acquisition service queries the current host of the cloud host through the cloud resource pool API; if the result is the original host or the faulty host, the cloud host has not migrated successfully, a timestamped abnormal record indicating that the cloud host has not migrated is generated and the state value is set to 1; otherwise the state value is set to 0;
step S502: if the cloud host has migrated successfully, it should remain powered on and running; when step S501 finds that the cloud host has migrated to a new host, the integrated acquisition service performs an ICMP probe of the target cloud host; if the probe fails, the cloud host may not have started, a timestamped abnormal record is generated and the state value is set to 1; otherwise the state value is set to 0;
step S503: if the cloud host has migrated successfully, its business services should be unaffected; the port of the key service is probed; if the probe is abnormal, the service has not started, a timestamped abnormal record is generated and the state value is set to 1; otherwise the state value is set to 0;
step S504: performing an AND operation on the cloud host migration state monitoring results of step S501, step S502 and step S503 and sending the result to the control center; the control center locates the specific failed link according to the cloud host state fault judgment result and, when that result is abnormal, performs fault handling, that is, step S6 is entered.
7. The cloud host dynamic migration monitoring and early warning method according to claim 6, wherein the step S6 specifically includes the following steps:
step S601: handling the fault of a cloud host that has not migrated; when step S504 determines that the cloud host has not migrated or has not started, the fault is attributed to the cloud resource pool administrator, who is notified to perform follow-up fault handling, including manual migration and manual start-up; the tenant of the cloud host is notified at the same time and told the cause of the fault, so that the fault is discovered before the customer notices it;
step S602: handling the fault of a key process that has not started after cloud host migration; when step S504 determines that the cloud host has migrated but the service process has not started, the tenant is notified to start it manually or to invoke an automated operations script.
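The host state judgment recited in claim 4 (steps S301 and S303) can be illustrated with a minimal sketch. It rests on several assumptions that the source does not state: the threshold values are invented for illustration, each collector contributes one ICMP sample for the same time window, each indicator is ANDed across collectors, and the host is treated as abnormal when any indicator remains 1 after the AND. The names `PingSample`, `network_flags` and `host_abnormal` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PingSample:
    reachable: bool
    latency_ms: float
    loss_rate: float  # fraction between 0.0 and 1.0


# Hypothetical thresholds; the source gives no concrete values.
MAX_LATENCY_MS = 200.0
MAX_LOSS_RATE = 0.2


def network_flags(sample):
    """S301: turn one ICMP sample into 0/1 abnormal flags per indicator."""
    return {
        "connectivity": int(not sample.reachable),
        "latency": int(sample.latency_ms > MAX_LATENCY_MS),
        "loss": int(sample.loss_rate > MAX_LOSS_RATE),
    }


def host_abnormal(flags_per_collector):
    """S303: AND each indicator across collectors, then judge the host.

    Expects at least one collector's flags for the same time window.
    """
    names = flags_per_collector[0].keys()
    final = {n: int(all(f[n] for f in flags_per_collector)) for n in names}
    # Assumption: any indicator that stays abnormal marks the host abnormal.
    return int(any(final.values())), final


down = network_flags(PingSample(reachable=False, latency_ms=0.0, loss_rate=1.0))
slow = network_flags(PingSample(reachable=True, latency_ms=500.0, loss_rate=0.0))
disagree, _ = host_abnormal([down, slow])
agree, final = host_abnormal([down, down])
```

Hardware flags obtained via IPMI in step S302 could be merged into the same per-collector dictionaries, since they use the same 0/1 convention.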
CN202310320979.XA 2023-03-29 2023-03-29 Cloud host dynamic migration monitoring and early warning method Pending CN116501460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310320979.XA CN116501460A (en) 2023-03-29 2023-03-29 Cloud host dynamic migration monitoring and early warning method

Publications (1)

Publication Number Publication Date
CN116501460A true CN116501460A (en) 2023-07-28

Family

ID=87315772

Country Status (1)

Country Link
CN (1) CN116501460A (en)

Similar Documents

Publication Publication Date Title
US10037237B2 (en) Method and arrangement for fault management in infrastructure as a service clouds
TWI746512B (en) Physical machine fault classification processing method and device, and virtual machine recovery method and system
CN107612787B (en) Cloud host fault detection method based on Openstack open source cloud platform
US20180176088A1 (en) Virtualized network function monitoring
US20120174112A1 (en) Application resource switchover systems and methods
WO2018095414A1 (en) Method and apparatus for detecting and recovering fault of virtual machine
CN110716842B (en) Cluster fault detection method and device
US10489232B1 (en) Data center diagnostic information
WO2017107656A1 (en) Virtualized network element failure self-healing method and device
CN112506702B (en) Disaster recovery method, device, equipment and storage medium for data center
US20190379576A1 (en) Providing dynamic serviceability for software-defined data centers
WO2017075989A1 (en) Method, device and system for virtual machines migration
CN106130763A (en) Server cluster and be applicable to the database resource group method for handover control of this cluster
CN108199901B (en) Hardware repair reporting method, system, device, hardware management server and storage medium
JP2018029344A (en) Fault management method, virtualization network function manager (vnfm), and program
JP2017532682A (en) Virtual machine failure detection and recovery management system
JP5425720B2 (en) Virtualization environment monitoring apparatus and monitoring method and program thereof
WO2013111317A1 (en) Information processing method, device and program
CN106411643B (en) BMC detection method and device
CN116501460A (en) Cloud host dynamic migration monitoring and early warning method
WO2013097176A1 (en) User experience index monitoring method and monitoring virtual machine
CN110618884A (en) Fault monitoring method, virtualized network function module manager and storage medium
JP2013134658A (en) Computer network system, configuration management method, configuration management program and storage medium
US9405605B1 (en) Correction of dependency issues in network-based service remedial workflows
JP6984119B2 (en) Monitoring equipment, monitoring programs, and monitoring methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination