CN113032106A - Automatic detection method and device for IO suspension abnormality of computing node - Google Patents

Automatic detection method and device for IO suspension abnormality of computing node

Publication number
CN113032106A
Authority
CN
China
Prior art keywords
computing node
computing
node
state
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110477121.5A
Other languages
Chinese (zh)
Other versions
CN113032106B (en)
Inventor
张志雄
魏亮
杨晓峰
许振峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110477121.5A
Publication of CN113032106A
Application granted
Publication of CN113032106B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45575Starting, stopping, suspending or resuming virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45591Monitoring or debugging support

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a method and a device for automatically detecting IO suspension abnormality of a computing node, and relates to the technical field of cloud computing. The method comprises: collecting, in real time, the IO states of all virtual machines on a computing node, wherein the IO states comprise a return state and a suspension state; counting, at fixed intervals, the number of IOs in the suspension state and the total number of IOs, and determining the ratio of the number of IOs in the suspension state to the total number of IOs; and determining whether the IO of the computing node is in an abnormal state according to the magnitude relationship between the ratio and a preset threshold. The method and the device allow IO abnormality of a computing node to be discovered promptly, so that effective handling measures can be taken on the abnormal computing node in time, improving response speed and accelerating fault recovery.

Description

Automatic detection method and device for IO suspension abnormality of computing node
Technical Field
The invention relates to the technical field of cloud computing, and in particular to a method and a device for automatically detecting IO suspension abnormality of a computing node.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
In recent years, with the rapid development of cloud computing technology, its applications have become increasingly widespread. A typical cloud platform uses distributed storage as the storage resource supplied to virtual machines. Because distributed storage can only detect faults inside its own cluster, when communication between the computing resources and the storage is interrupted, the distributed storage cluster cannot identify the abnormal computing node. The IO of the virtual machines on the abnormal computing node then remains in a suspension state for a long time, and a virtual machine in the IO suspension state has a normal heartbeat but cannot provide services externally.
At present, cloud vendors generally lack a good mechanism for handling scenarios such as IO suspension on a computing node. The abnormality is typically discovered through alarms, after which operation and maintenance personnel manually take the computing node down and evacuate its virtual machines to restore the environment. This approach suffers from slow response, difficult fault recovery, and low efficiency.
Disclosure of Invention
An embodiment of the invention provides an automatic detection method for IO suspension abnormality of a computing node, which is used to discover IO abnormality of a computing node promptly so that effective handling measures can be taken on the abnormal computing node in time, improving response speed and accelerating fault recovery. The method comprises:
collecting, in real time, the IO states of all virtual machines on a computing node, wherein the IO states comprise a return state and a suspension state;
counting, at fixed intervals, the number of IOs in the suspension state and the total number of IOs, and determining the ratio of the number of IOs in the suspension state to the total number of IOs;
and determining whether the IO of the computing node is in an abnormal state according to the magnitude relationship between the ratio and a preset threshold.
An embodiment of the invention further provides an automatic detection device for IO suspension abnormality of a computing node, which is used to discover IO abnormality of a computing node promptly so that effective handling measures can be taken on the abnormal computing node in time, improving response speed and accelerating fault recovery. The device comprises:
an acquisition module, configured to collect, in real time, the IO states of all virtual machines on a computing node, wherein the IO states comprise a return state and a suspension state;
a counting module, configured to count, at fixed intervals, the number of IOs in the suspension state and the total number of IOs, and to determine the ratio of the number of IOs in the suspension state to the total number of IOs;
and a determining module, configured to determine whether the IO of the computing node is in an abnormal state according to the magnitude relationship between the ratio and a preset threshold.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the computer program to realize the automatic detection method for IO suspension abnormality of the computing node.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the above method for automatically detecting the IO suspension abnormality of the computing node.
In the embodiment of the invention, collecting the IO states of the virtual machines on a computing node in real time makes it possible to know promptly whether each IO is suspended. At fixed intervals, the number of IOs in the suspension state and the total number of IOs of the computing node are counted, and whether the computing node is in an abnormal state with a large number of suspended IOs is determined from the magnitude relationship between the ratio of the two and a preset threshold. The cloud platform can thus automatically detect and promptly discover IO suspension on a computing node. Computing nodes in an abnormal state can therefore be handled effectively and in time, which improves response speed and, compared with manual operation and maintenance, greatly improves fault recovery efficiency.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a flowchart of an automatic detection method for IO suspension abnormality of a computing node in an embodiment of the present invention;
FIG. 2 is a flowchart of another method for automatically detecting IO suspension abnormality of a computing node in an embodiment of the present invention;
FIG. 3 is a flowchart of another method for automatically detecting IO suspension abnormality of a computing node in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an automatic detection device for IO suspension abnormality of a computing node in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
To address the slow discovery of IO anomalies and the delayed handling response, an embodiment of the invention provides an automatic detection method for IO suspension abnormality of a computing node, applied to a cloud platform. As shown in FIG. 1, the method comprises the following steps 101 to 103:
Step 101: collect, in real time, the IO states of all virtual machines on the computing node, wherein the IO states comprise a return state and a suspension state.
The suspension state means that an IO request sent by the computing node has been processed by the next layer (for example, the storage side) for a long time without any return being received. Under normal conditions an IO has a return, which indicates success (for example, 0 is returned) or failure (for example, a non-zero value is returned, with different values corresponding to different abnormal states). An IO that has received a return code and responds normally is in the return state; an IO that is still interacting with the next layer and has no return code is in the suspension state.
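The distinction between the two IO states can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `IOState`, `IORequest`, and `classify` are hypothetical and do not come from the patent, and the sketch simplifies "no return for a long time" to "no return code yet".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IOState(Enum):
    RETURNED = "returned"    # a return code was received (success or failure)
    SUSPENDED = "suspended"  # no return code has been received


@dataclass
class IORequest:
    """Hypothetical record of one IO request issued by a virtual machine."""
    vm_id: str
    return_code: Optional[int] = None  # None: the next layer has not answered


def classify(io: IORequest) -> IOState:
    # Any return code, zero (success) or non-zero (failure), counts as returned.
    if io.return_code is None:
        return IOState.SUSPENDED
    return IOState.RETURNED
```

Note that a failed IO (non-zero return code) still counts as returned; only an IO with no return at all is treated as suspended.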
In another implementation of the embodiment of the invention, the IO return time may also be recorded, so that the length of the return time can assist in determining whether an IO is abnormal.
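One possible reading of this auxiliary check is sketched below. The function name `is_slow_return` and the 5-second limit are assumptions made for illustration; the description gives no concrete limit for the return time.

```python
from typing import Optional


def is_slow_return(issued_at: float, returned_at: Optional[float],
                   now: float, max_return_seconds: float = 5.0) -> bool:
    """Return True if the IO took, or has so far taken, longer than the limit.

    max_return_seconds is an assumed value; the source gives no figure.
    """
    # For an IO that has not returned yet, measure the time elapsed so far.
    end = returned_at if returned_at is not None else now
    return (end - issued_at) > max_return_seconds
```

Timestamps are plain seconds here, so the same function works with `time.monotonic()` values or with test data.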
Step 102: count, at fixed intervals, the number of IOs in the suspension state and the total number of IOs, and determine the ratio of the number of IOs in the suspension state to the total number of IOs.
It should be noted that if a large number of IOs stay in the suspension state for a certain period without any return, the virtual machine has a normal heartbeat but cannot provide services normally.
To discover promptly the abnormal situation in which a virtual machine cannot provide external services, in the embodiment of the invention the number of IOs in the suspension state and the total number of IOs are counted at fixed intervals, to confirm whether a large number of IOs are currently suspended and whether the virtual machine is providing external services normally.
In the embodiment of the invention, the fixed interval is set by operation and maintenance personnel. A shorter interval allows the abnormal state to be discovered and handled earlier, but misjudgment becomes more likely when load surges suddenly or the network oscillates. A longer interval makes handling of the IO suspension scenario slower but more reliable. The interval may therefore be set by balancing handling speed against reliability; for example, it may be set to 30 s.
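The counting in step 102 can be sketched as follows. The function `suspension_ratio` and the convention that `None` marks an IO without a return are assumptions for illustration, as is the choice to report a ratio of 0 when no IO was observed in the interval.

```python
from typing import Iterable, Optional, Tuple

FIXED_INTERVAL_SECONDS = 30  # example value given in the description


def suspension_ratio(return_codes: Iterable[Optional[int]]) -> Tuple[int, int, float]:
    """Count suspended and total IOs and return (suspended, total, ratio).

    Each element is the return code of one IO; None marks an IO with no return.
    """
    suspended = 0
    total = 0
    for code in return_codes:
        total += 1
        if code is None:
            suspended += 1
    # Assumption: an interval with no IO at all yields a ratio of 0.
    ratio = suspended / total if total else 0.0
    return suspended, total, ratio
```

A scheduler would call this once every `FIXED_INTERVAL_SECONDS` with the IO records gathered for all virtual machines on the node.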
Step 103: determine whether the IO of the computing node is in an abnormal state according to the magnitude relationship between the ratio and the preset threshold.
Specifically, as shown in FIG. 2, step 103 may be executed as the following step 1031 or step 1032:
Step 1031: if the ratio is greater than or equal to the preset threshold, determine that the IO of the computing node is in an abnormal state.
Step 1032: if the ratio is smaller than the preset threshold, determine that the IO of the computing node is in a normal state.
The preset threshold may be determined from experience of how large a proportion of suspended IOs prevents the virtual machine from providing external services; in general, the preset threshold may be set to 80%.
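Steps 1031 and 1032 amount to a single comparison, sketched below with the 80% example threshold. The function name `is_io_abnormal` is hypothetical, and treating a node with no IO in the interval as normal is an assumption the description does not address.

```python
PRESET_THRESHOLD = 0.8  # the 80% example from the description


def is_io_abnormal(suspended: int, total: int,
                   threshold: float = PRESET_THRESHOLD) -> bool:
    """Abnormal if suspended / total >= threshold (step 1031), else normal (step 1032)."""
    if total == 0:
        return False  # assumption: no IO in the interval is treated as normal
    return suspended / total >= threshold
```

For example, 8 suspended IOs out of 10 give a ratio of exactly 80% and are judged abnormal, while 79 out of 100 are judged normal.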
In one implementation, when the IO of a computing node is determined to be in an abnormal state, the cloud platform that manages the computing nodes handles it centrally.
Specifically, if the IO of the computing node is determined to be in an abnormal state, the cloud platform checks whether other computing nodes of the current cloud platform have redundant computing resources, determines a processing mode for the IO-abnormal computing node according to whether such resources exist, and processes the IO-abnormal computing node according to that processing mode.
Redundant computing resources are resources configured in duplicate on computing nodes; when a component of a computing node fails, they take over the work of the failed component.
Therefore, if other computing nodes of the cloud platform have redundant computing resources, the IO-abnormal computing node can be powered off and its virtual machines evacuated to the computing nodes that have redundant computing resources; once a virtual machine has been evacuated to such a node, its service is restored. In this way the redundant computing resources provide an emergency fallback for the abnormal computing node, virtual machine service can be restored within seconds, and service recovery is far more efficient than with manual operation and maintenance.
In another implementation, if the other computing nodes have no redundant computing resources, the IO-abnormal computing node is powered off and the virtual machine service is stopped.
Both processing modes power off the computing node, so that the high-availability mechanism of the computing node can restore the service of the computing node, achieving the purpose of emergency handling.
Restoring the service does not necessarily mean that the cause of the computing node's failure has been resolved, and the processing mode adopted may not be appropriate in some cases. A node processing notification, which includes the processing mode applied to the IO-abnormal computing node, is therefore sent to operation and maintenance personnel. After receiving the notification, they investigate the cause of the IO abnormality and determine whether the processing mode was appropriate, so as to reduce the probability of the fault recurring.
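The choice between the two processing modes, together with the notification to operation and maintenance personnel, can be sketched as a pure decision function. All names here (`Action`, `Notification`, `choose_action`, `handle_node`) are hypothetical; the sketch only records the decision and does not power off or migrate anything.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    POWER_OFF_AND_EVACUATE = "power off node, evacuate VMs to redundant resources"
    POWER_OFF_AND_STOP = "power off node, stop VM service"


@dataclass
class Notification:
    """Hypothetical node processing notification for operations staff."""
    node_id: str
    action: Action


def choose_action(has_redundant_resources: bool) -> Action:
    if has_redundant_resources:
        return Action.POWER_OFF_AND_EVACUATE
    return Action.POWER_OFF_AND_STOP


def handle_node(node_id: str, io_abnormal: bool,
                has_redundant_resources: bool) -> Optional[Notification]:
    if not io_abnormal:
        return None  # a normal node needs no handling and no notification
    action = choose_action(has_redundant_resources)
    # A real platform would power off the node and evacuate or stop VMs here.
    return Notification(node_id=node_id, action=action)
```

Keeping the decision separate from the side effects makes the policy easy to test before it is wired to a real platform.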
FIG. 3 is a flowchart of another method for automatically detecting IO suspension abnormality of a computing node in the embodiment of the invention. As FIG. 3 shows, the computing node and its virtual machines are handled differently depending on how the ratio of suspended IOs to total IOs compares with the preset threshold and on whether redundant computing resources exist, thereby achieving automatic detection of, and recovery from, the IO suspension scenario.
In the embodiment of the invention, collecting the IO states of the virtual machines on a computing node in real time makes it possible to know promptly whether each IO is suspended. At fixed intervals, the number of IOs in the suspension state and the total number of IOs of the computing node are counted, and whether the computing node is in an abnormal state with a large number of suspended IOs is determined from the magnitude relationship between the ratio of the two and a preset threshold. The cloud platform can thus automatically detect and promptly discover IO suspension on a computing node. Computing nodes in an abnormal state can therefore be handled effectively and in time, which improves response speed and, compared with manual operation and maintenance, greatly improves fault recovery efficiency.
An embodiment of the invention also provides an automatic detection device for IO suspension abnormality of a computing node, as described in the following embodiment. Because the device solves the problem on a principle similar to that of the automatic detection method, its implementation may refer to the implementation of the method, and repeated parts are not described again.
As shown in FIG. 4, the device 400 includes an acquisition module 401, a counting module 402, and a determining module 403.
The acquisition module 401 is configured to collect, in real time, the IO states of all virtual machines on a computing node, wherein the IO states comprise a return state and a suspension state;
the counting module 402 is configured to count, at fixed intervals, the number of IOs in the suspension state and the total number of IOs, and to determine the ratio of the number of IOs in the suspension state to the total number of IOs;
the determining module 403 is configured to determine whether the IO of the computing node is in an abnormal state according to the magnitude relationship between the ratio and a preset threshold.
In an implementation of the embodiment of the invention, the determining module 403 is configured to:
determine that the IO of the computing node is in an abnormal state when the ratio is greater than or equal to the preset threshold;
and determine that the IO of the computing node is in a normal state when the ratio is smaller than the preset threshold.
In one implementation manner of the embodiment of the present invention, the apparatus 400 further includes:
the checking module 404 is configured to, when it is determined that the computing node IO is in an abnormal state, check whether redundant computing resources exist in other computing nodes of the current cloud platform;
the determining module 403 is further configured to determine a processing mode of the computing node with IO exception according to whether there is redundant computing resource in other computing nodes;
and the processing module 405 is configured to process the computing node with the IO exception according to a processing mode.
In an implementation of the embodiment of the invention, when other computing nodes have redundant computing resources, the processing module 405 is configured to:
power off the IO-abnormal computing node and evacuate its virtual machines to the computing nodes that have redundant computing resources;
and restore the service of a virtual machine once it has been evacuated to a computing node that has redundant computing resources.
In an implementation of the embodiment of the invention, when other computing nodes have no redundant computing resources, the processing module 405 is configured to:
power off the IO-abnormal computing node and stop the virtual machine service.
In one implementation manner of the embodiment of the present invention, the apparatus 400 further includes:
the communication module 406 is configured to send a node processing notification to the operation and maintenance staff, where the node processing notification includes a processing mode of a computing node with an IO exception.
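How modules 401 to 406 might fit together can be sketched with a small class whose collaborators are injected as callables. This is a hypothetical wiring, not the patent's implementation: `DetectionDevice`, its field names, and the strings "evacuate" and "stop" are all assumptions, and the power-off and evacuation work of module 405 is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DetectionDevice:
    """Hypothetical wiring of modules 401 to 406; callables are stand-ins."""
    collect: Callable[[], List[Optional[int]]]  # 401: return codes, None = suspended
    has_redundancy: Callable[[], bool]          # 404: checks other computing nodes
    notify: Callable[[str], None]               # 406: informs operations staff
    threshold: float = 0.8                      # example threshold from the description

    def run_once(self) -> Optional[str]:
        codes = self.collect()
        total = len(codes)
        suspended = sum(1 for c in codes if c is None)          # 402: counting
        if total == 0 or suspended / total < self.threshold:    # 403: normal
            return None
        mode = "evacuate" if self.has_redundancy() else "stop"  # 403: pick the mode
        # 405 would power off the node and evacuate or stop the VMs; omitted here.
        self.notify(mode)
        return mode
```

A usage sketch: `DetectionDevice(collect=read_io_codes, has_redundancy=check_cluster, notify=send_alert).run_once()`, where the three functions are supplied by the surrounding platform.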
In the embodiment of the invention, collecting the IO states of the virtual machines on a computing node in real time makes it possible to know promptly whether each IO is suspended. At fixed intervals, the number of IOs in the suspension state and the total number of IOs of the computing node are counted, and whether the computing node is in an abnormal state with a large number of suspended IOs is determined from the magnitude relationship between the ratio of the two and a preset threshold. The cloud platform can thus automatically detect and promptly discover IO suspension on a computing node. Computing nodes in an abnormal state can therefore be handled effectively and in time, which improves response speed and, compared with manual operation and maintenance, greatly improves fault recovery efficiency.
An embodiment of the invention further provides a computer device. FIG. 5 is a schematic diagram of the computer device in the embodiment of the invention. The computer device can implement all steps of the automatic detection method for IO suspension abnormality of a computing node in the foregoing embodiment, and specifically includes:
a processor 501, a memory 502, a communication interface 503, and a communication bus 504;
the processor 501, the memory 502, and the communication interface 503 communicate with one another through the communication bus 504; the communication interface 503 is used for transmitting information between related devices;
the processor 501 is configured to call a computer program in the memory 502; when the processor executes the computer program, the automatic detection method for IO suspension abnormality of a computing node in the foregoing embodiment is implemented.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program for executing the above method for automatically detecting the IO suspension abnormality of the computing node.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for automatically detecting IO suspension abnormality of a computing node, characterized by comprising:
collecting, in real time, the IO states of all virtual machines on a computing node, wherein the IO states comprise a return state and a suspension state;
counting, at fixed intervals, the number of IOs in the suspension state and the total number of IOs, and determining the ratio of the number of IOs in the suspension state to the total number of IOs;
and determining whether the IO of the computing node is in an abnormal state according to the magnitude relationship between the ratio and a preset threshold.
2. The method according to claim 1, wherein determining whether a compute node IO is in an abnormal state according to a magnitude relation between the ratio and a preset threshold comprises:
if the ratio is larger than or equal to a preset threshold value, determining that the IO of the computing node is in an abnormal state;
and if the ratio is smaller than a preset threshold value, determining that the computing node IO is in a normal state.
3. The method according to claim 2, wherein after determining whether the computing node IO is in an abnormal state according to a magnitude relation between the ratio and a preset threshold, the method further comprises:
if the computing node IO is determined to be in an abnormal state, checking whether redundant computing resources exist in other computing nodes of the current cloud platform;
determining a processing mode of the computing node with IO abnormality according to whether other computing nodes have redundant computing resources;
and processing the IO abnormal computing node according to the processing mode.
4. The method of claim 3, wherein when the other compute nodes have redundant compute resources, processing the compute node with the IO exception according to the processing method includes:
powering off the computing node with the abnormal IO, and evacuating the virtual machine to the computing node with the redundant computing resource;
and when the virtual machine is evacuated to the computing nodes with redundant computing resources, the service of the virtual machine is recovered.
5. The method according to claim 3, wherein when there are no redundant computing resources in other computing nodes, processing the computing node with the IO exception according to the processing method includes:
powering off the computing node and stopping the virtual machine service.
6. The method according to any one of claims 3 to 5, further comprising:
and sending a node processing notification to operation and maintenance personnel, wherein the node processing notification comprises a processing mode of the computing node with IO exception.
7. An automatic detection device for IO suspension abnormality of a computing node, characterized by comprising:
an acquisition module, configured to collect, in real time, the IO states of all virtual machines on a computing node, wherein the IO states comprise a return state and a suspension state;
a counting module, configured to count, at fixed intervals, the number of IOs in the suspension state and the total number of IOs, and to determine the ratio of the number of IOs in the suspension state to the total number of IOs;
and a determining module, configured to determine whether the IO of the computing node is in an abnormal state according to the magnitude relationship between the ratio and a preset threshold.
8. The device of claim 7, further comprising:
a checking module, configured to check, when the IO of the computing node is determined to be in an abnormal state, whether other computing nodes of the current cloud platform have redundant computing resources;
wherein the determining module is further configured to determine a processing mode for the computing node with the IO abnormality according to whether the other computing nodes have redundant computing resources;
and a processing module, configured to process the computing node with the IO abnormality according to the processing mode.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 6.
CN202110477121.5A 2021-04-29 2021-04-29 Automatic detection method and device for IO suspension abnormality of computing node Active CN113032106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110477121.5A CN113032106B (en) 2021-04-29 2021-04-29 Automatic detection method and device for IO suspension abnormality of computing node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110477121.5A CN113032106B (en) 2021-04-29 2021-04-29 Automatic detection method and device for IO suspension abnormality of computing node

Publications (2)

Publication Number Publication Date
CN113032106A true CN113032106A (en) 2021-06-25
CN113032106B CN113032106B (en) 2024-07-09

Family

ID=76455606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110477121.5A Active CN113032106B (en) 2021-04-29 2021-04-29 Automatic detection method and device for IO suspension abnormality of computing node

Country Status (1)

Country Link
CN (1) CN113032106B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254166A (en) * 2021-07-13 2021-08-13 云宏信息科技股份有限公司 Method for processing IO request, storage medium and virtualization simulator

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407083A (en) * 2016-10-26 2017-02-15 华为技术有限公司 Fault detection method and device
CN106650434A (en) * 2016-12-27 2017-05-10 四川大学 IO sequence-based virtual machine abnormal behavior detection method and system
CN107679398A (en) * 2017-09-30 2018-02-09 北京奇虎科技有限公司 Virtual machine I/O data stream detection method and device, computing device, storage medium
CN110018793A (en) * 2019-04-11 2019-07-16 苏州浪潮智能科技有限公司 Host IO process control method, device, terminal and readable storage medium
CN110535692A (en) * 2019-08-12 2019-12-03 华为技术有限公司 Fault handling method, device, computer equipment, storage medium and storage system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
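Wu Linping et al.: "Proactive fault management method for large-scale computing systems", Journal of Huazhong University of Science and Technology (Natural Science Edition), vol. 38, no. 1, 15 June 2010 (2010-06-15), pages 25 - 59 *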

Also Published As

Publication number Publication date
CN113032106B (en) 2024-07-09

Similar Documents

Publication Publication Date Title
TWI746512B (en) Physical machine fault classification processing method and device, and virtual machine recovery method and system
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
CN110581852A (en) Efficient mimicry defense system and method
CN111953566B (en) Distributed fault monitoring-based method and virtual machine high-availability system
CN109286529B (en) Method and system for recovering RabbitMQ network partition
EP3142011A1 (en) Anomaly recovery method for virtual machine in distributed environment
CN104506392B (en) Downtime detection method and device
CN104065526B (en) Method and apparatus for server failure alarm
CN103812675A (en) Method and system for realizing allopatric disaster recovery switching of service delivery platform
CN211321337U (en) Monitoring system for communication system
CN109391691A (en) Method and related apparatus for recovering NAS service under single-node failure
WO2024082471A1 (en) Inter-node link status monitoring method and apparatus
CN110912755A (en) System and method for network card fault monitoring and automatic recovery in cloud environment
CN114826962A (en) Link fault detection method, device, equipment and machine readable storage medium
CN113032106B (en) Automatic detection method and device for IO suspension abnormality of computing node
CN114064217A (en) Node virtual machine migration method and device based on OpenStack
CN111880947B (en) Data transmission method and device
CN109684130A (en) Method and device for data backup between computer rooms
CN117880254A (en) Reconnection method for real-time communication
CN110224872B (en) Communication method, device and storage medium
CN115220937A (en) Method, electronic device and program product for storage management
CN107656845A (en) Virtual machine high availability method
CN112953792B (en) Network traffic monitoring method and device
JP6984119B2 (en) Monitoring equipment, monitoring programs, and monitoring methods
CN104639890A (en) Facility monitoring control system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant