CN104268061B - A storage state monitoring method suitable for virtual machines

Info

Publication number: CN104268061B
Application number: CN201410464913.9A
Authority: CN
Inventors: 熊梦; 杨松; 莫展鹏; 季统凯
Original assignee: G Cloud Technology Co Ltd
Current assignee: G Cloud Technology Co Ltd
Priority date: 2014-09-12
Filing date: 2014-09-12
Publication date: 2017-03-15
Anticipated expiration: 2034-09-12
Also published as: CN104268061A

Abstract

The present invention relates to the field of cloud computing, and in particular to a storage state monitoring method suitable for virtual machines. The method is implemented by four parts: the host, the network storage device, the monitoring service end, and the monitoring management end. The host is responsible for reading the state of the network storage devices mounted on it; the monitoring service end is responsible for detecting the state of the network storage devices on the host and for sending an alarm to the monitoring management end when a failure occurs; the monitoring management end is responsible for receiving and processing fault alarms and, finally, for sending fault-handling instructions to the host. The invention can be used to monitor the storage state of virtual machines.

Description

A storage state monitoring method suitable for virtual machines

Technical field

The present invention relates to the field of cloud computing, and in particular to a storage state monitoring method suitable for virtual machines.

Background art

When a cloud platform is deployed in a large data center, centralized storage is often mounted over the network onto host physical machines, which in turn use the mounted storage space to create virtual machines. In this situation, when the centralized storage fails or recovers from a power interruption, a virtual machine can end up in a "read-only" state: a manager such as the hypervisor still reports the virtual machine as up, but in fact no write operation can be performed on it, so the virtual machine is effectively in a failed state. This kind of failure must be detected promptly and reported to the administrator; otherwise it will inevitably affect the normal operation of the virtual machine users' business. In general, two means are used to monitor this failure:

1. The business system running on the virtual machine raises an alarm, and the user of the virtual machine, after receiving it, notifies the cloud platform administrator;

2. A tool provided by the network storage vendor performs state detection and raises an alarm when the detected state is read-only.

However, these methods have the following drawbacks:

1. On a virtual machine in the read-only state, the alarm function of the business system may not run normally, so this kind of notification is also not timely.

2. Notification through the business system can only locate the virtual machine and cannot accurately locate the network storage device; the tool provided by the network storage vendor can only locate the device and cannot accurately locate the virtual machine. Neither can provide the complete information that would allow the failure to be handled promptly, or even automatically.

Summary of the invention

The technical problem solved by the present invention is to provide a storage state monitoring method suitable for virtual machines that can monitor the state of network storage, detect failures promptly, and at the same time provide complete information on the correspondence between virtual machines and network storage devices, thereby supporting the timely, or even automatic, recovery of the network storage.

The technical solution of the present invention to the above problem is as follows. The method is implemented by four parts: the host, the network storage device, the monitoring service end, and the monitoring management end. The host is responsible for reading the state of the network storage devices mounted on it; the monitoring service end is responsible for detecting the state of the network storage devices on the host and for sending an alarm to the monitoring management end when a failure occurs; the monitoring management end is responsible for receiving and processing fault alarms and, finally, for sending fault-handling instructions to the host;

Realize that step is included：

Step 1：Cloud platform keeper passes through monitoring management end configuration monitoring parameter, including assay intervals, number of retries；

Step 2：Monitoring service end read monitoring parameter, start timed thread monitoring network storage device status and with Network storage equipment state of the logic roll form carry in host；

Step 3：If monitoring service end monitors network storage equipment failure, logical volume carry abnormal state, to monitoring Management end sends alarm event；

Step 4：Monitoring management termination by event and is processed, if network storage equipment failure, is then sent out to keeper Send alarm；If logical volume carry abnormal state, then the instruction that carry storage volume and virtual machine are restarted again is sent to host To recover from failure；

The detection interval is the length of time between two successive checks of the network storage state by the monitoring service end;

The number of retries is the number of consecutive repeated checks that must be made before a network storage fault event is generated, to ensure that the failure is credible;

The network storage device is defined in contrast to the host's local storage, and refers to centralized storage that provides storage resources to the host over the network;

The host is a server node on which a virtual machine management program is installed and on which multiple virtual machines can be created;

The monitoring management end comprises a user interaction module, a fault event message module, and a host interaction module;

The user interaction module is responsible for providing an external application interface, receiving the monitoring parameters set by the cloud platform administrator, and saving the parameter information to the database;

The fault event message module further comprises a monitoring service end interaction submodule and a fault event alarm submodule. The monitoring service end interaction submodule sends the monitoring parameter information set by the administrator to the monitoring service end and receives the alarm events that the monitoring service end actively reports. The fault event alarm submodule is responsible for processing alarm events and informs the cloud platform administrator or the user who owns the virtual machine of the fault event by SMS or email;

The host interaction module is responsible for sending fault-handling instructions to the host and for other related logic. A fault-handling instruction means remounting the network storage and restarting the virtual machines built on the network storage;

The monitoring service end is a service process deployed on the host node. It is responsible for collecting real-time status information on the host, on the virtual machines on the host, and on the network storage connected to the host, and for returning this information to the monitoring management end.
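The branching in Step 4 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not code from the patent: the names `FaultKind`, `AlarmEvent`, and `handle_event` are hypothetical, and the returned strings stand in for whatever alarm and instruction messages a real monitoring management end would send.

```python
from dataclasses import dataclass
from enum import Enum


class FaultKind(Enum):
    """The two fault types distinguished in Step 3 (names are hypothetical)."""
    DEVICE_FAILURE = "device_failure"   # the network storage device itself failed
    MOUNT_ABNORMAL = "mount_abnormal"   # the logical volume mount on the host is abnormal


@dataclass
class AlarmEvent:
    host: str
    volume: str
    kind: FaultKind


def handle_event(event: AlarmEvent) -> list[str]:
    """Return the actions taken for one alarm event, following Step 4."""
    if event.kind is FaultKind.DEVICE_FAILURE:
        # A device failure cannot be repaired from the host, so only alert.
        return [f"alert administrator: storage device behind {event.volume} failed"]
    # An abnormal mount can be recovered through the host.
    return [
        f"instruct {event.host}: remount {event.volume}",
        f"instruct {event.host}: restart virtual machines on {event.volume}",
    ]
```

Because each event carries both the host and the volume, the resulting message identifies the fault source on both sides, which is the correspondence the method relies on.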

The beneficial effects of the solution are as follows:

1. The method offers better timeliness: because detecting the fault event does not depend on detecting the running state of the business system, a failure can be detected at an early stage;

2. The method works in conjunction with the host and can query the correspondence between network storage and virtual machines at any time, so the fault events it generates are more readable and the fault source can be located accurately;

3. The method can automatically recover from some failures: because it works in conjunction with the host, the common failure of an abnormal mount can be recovered through the host.

Brief description of the drawings

The present invention is further described below with reference to the accompanying drawing:

Fig. 1 is a module diagram of the present invention.

Detailed description of the embodiments

The cloud platform administrator first logs in to the cloud platform web interface and configures the monitoring parameters for the network storage devices used by virtual machines, including the detection interval (that is, the frequency), the number of fault-detection retries, and the recipients to notify when an alarm is sent. The parameters are then passed to the monitoring management end through the cloud platform's messaging mechanism, and the monitoring management end saves them to the corresponding database table. Part of the implementation code is as follows:
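The patent's own listing does not appear in this text. As a stand-in, the following is a minimal Python sketch of saving and reloading the parameters, not the original code. The table name `monitor_params`, its columns, and the function names are hypothetical, and SQLite is assumed only to keep the example self-contained.

```python
import sqlite3


def save_monitor_params(conn, interval_seconds, retry_count, recipients):
    """Store the single row of monitoring parameters (hypothetical schema)."""
    if interval_seconds <= 0 or retry_count < 1:
        raise ValueError("interval must be positive and retry count at least 1")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monitor_params ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "interval_seconds INTEGER NOT NULL, "
        "retry_count INTEGER NOT NULL, "
        "recipients TEXT NOT NULL)"
    )
    # The fixed id makes each save overwrite the previous configuration.
    conn.execute(
        "INSERT OR REPLACE INTO monitor_params VALUES (1, ?, ?, ?)",
        (interval_seconds, retry_count, ",".join(recipients)),
    )
    conn.commit()


def load_monitor_params(conn):
    """Read the parameters back as a dictionary."""
    row = conn.execute(
        "SELECT interval_seconds, retry_count, recipients "
        "FROM monitor_params WHERE id = 1"
    ).fetchone()
    return {
        "interval_seconds": row[0],
        "retry_count": row[1],
        "recipients": row[2].split(",") if row[2] else [],
    }
```

Keeping one row with a fixed key means the monitoring management end always reads the latest configuration without having to pick among versions.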

When the monitoring management end starts, it reads the corresponding database table to obtain parameters such as the monitoring frequency and the number of retries, and sends a request to all registered monitoring implementation ends, notifying them to start the network storage timed scanning thread and to report storage information to the storage service end and the fault event message service end. Part of the implementation code is as follows:
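The patent's own listing does not appear in this text. The Python sketch below only illustrates the fan-out described in this paragraph and is not the original code. The function `notify_monitors`, the `start_scan` action name, and the idea of representing each registered monitoring implementation end as a callable are all assumptions; the cloud platform's real communication mechanism is not specified in the source.

```python
from typing import Callable


def notify_monitors(
    params: dict, monitors: dict[str, Callable[[dict], None]]
) -> list[str]:
    """Ask every registered monitor to start scanning; return the names that failed.

    `monitors` maps a monitor name to a hypothetical send function that
    delivers one request message to that monitor.
    """
    failed = []
    for name, send in monitors.items():
        try:
            send({"action": "start_scan", **params})
        except Exception:
            # One unreachable monitor must not block the others.
            failed.append(name)
    return failed
```

Returning the list of unreachable monitors lets the caller retry or report them, rather than aborting the whole startup on the first error.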

When a monitoring implementation end starts, it first checks, through the cloud platform's internal communication mechanism, whether the monitoring management end exists. If it does not exist, the timed thread that monitors the network storage devices is not started. If it exists, the monitoring implementation end scans to obtain information on the network storage devices and reports the storage information to the monitoring management end; at the same time, it requests the monitoring frequency and the number of retries from the monitoring management end in order to start the timed thread. Part of the code is as follows:
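The patent's own listing does not appear in this text. The following Python sketch illustrates the startup sequence under stated assumptions and is not the original code: `start_monitor` and its five callable parameters are hypothetical placeholders for the cloud platform's communication mechanism and for the actual storage scan.

```python
import threading
from typing import Callable, Optional


def start_monitor(
    manager_alive: Callable[[], bool],
    scan_storage: Callable[[], list],
    report: Callable[[list], None],
    fetch_params: Callable[[], dict],
    check_once: Callable[[], None],
) -> Optional[tuple[threading.Thread, threading.Event]]:
    """Start the timed monitoring thread only if the management end is reachable."""
    if not manager_alive():
        return None  # no management end: do not start the timed thread

    report(scan_storage())      # report the discovered storage devices first
    params = fetch_params()     # then ask for the interval and retry count
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, so the loop runs one check
        # per interval until stop is set.
        while not stop.wait(params["interval_seconds"]):
            check_once()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop
```

Using `Event.wait` as the sleep lets the thread be stopped promptly by setting the event, instead of waiting out a full interval.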

Through the timed thread, the monitoring implementation end periodically checks the state of the network storage using commands such as iscsi and iscsiadm. If the storage is found to be in an abnormal state several consecutive times, it generates a network storage abnormality alarm event and sends it to the fault event message module of the monitoring service end. Part of the code is as follows:
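The patent's own listing does not appear in this text. The Python sketch below shows one way the consecutive-failure rule could work and is not the original code. The class `ConsecutiveFailureDetector` and the function `iscsi_sessions_present` are hypothetical; treating a zero exit status of `iscsiadm -m session` as "healthy" is an assumption made for illustration, since the source does not say which iscsiadm invocation is used or how its result is interpreted.

```python
import subprocess


def iscsi_sessions_present() -> bool:
    """Assumed probe: a zero exit status of `iscsiadm -m session` counts as healthy."""
    try:
        result = subprocess.run(
            ["iscsiadm", "-m", "session"],
            capture_output=True, text=True, timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


class ConsecutiveFailureDetector:
    """Signal a fault only after retry_count abnormal checks in a row."""

    def __init__(self, retry_count: int):
        self.retry_count = retry_count
        self.streak = 0

    def observe(self, healthy: bool) -> bool:
        """Record one check; return True exactly when the streak reaches retry_count."""
        if healthy:
            self.streak = 0  # any healthy check resets the count
            return False
        self.streak += 1
        return self.streak == self.retry_count
```

In the timed thread, each tick would call something like `detector.observe(iscsi_sessions_present())` and send an alarm event when it returns `True`. Firing only when the streak equals the retry count avoids repeating the same alarm on every later tick of one outage.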

The fault event message module receives the alarm and promptly raises an alarm to the cloud platform management end or the virtual machine user; at the same time, it generates a fault recovery record and saves it to the database.

The host interaction module of the management service end reads the database to obtain the fault recovery record and sends a fault recovery command to the host, which remounts the network storage and restarts the virtual machines built on the network storage.
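As an illustration of what such a recovery command might expand to on the host, the Python sketch below builds an ordered command plan. It is an assumption-laden sketch, not the patent's procedure: the function `build_recovery_plan` is hypothetical, libvirt's `virsh` is assumed as the virtual machine manager, and stopping the virtual machines before unmounting is a design choice made here so that the volume is not busy. The commands are only constructed, never executed.

```python
def build_recovery_plan(device: str, mount_point: str, vms: list[str]) -> list[list[str]]:
    """Return the commands, in order, for remounting storage and restarting VMs."""
    plan = []
    # Forcibly stop the affected virtual machines so the volume can be unmounted.
    for vm in vms:
        plan.append(["virsh", "destroy", vm])
    # Remount the network storage volume.
    plan.append(["umount", mount_point])
    plan.append(["mount", device, mount_point])
    # Start the virtual machines again on the freshly mounted volume.
    for vm in vms:
        plan.append(["virsh", "start", vm])
    return plan
```

A caller could pass each entry to `subprocess.run` and stop at the first failure, recording the outcome in the fault recovery record.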

In the present invention, a logical volume is a concept from LVM (Logical Volume Manager). LVM is a mechanism for managing disk partitions in the Linux environment: it can combine multiple physical block devices into one large logical device, thereby improving the flexibility of disk partition management. Failure and abnormality refer to situations, caused for example by a power or network outage of the network storage device, in which the host cannot successfully connect to the network storage, or in which the network storage mounted locally is in a read-only state.
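The read-only condition can be checked on a Linux host from the mount table. The Python sketch below is a minimal illustration and not part of the patent: the function `read_only_mounts` is hypothetical, and it assumes input in the `/proc/mounts` format, where the fourth whitespace-separated field lists the mount options.

```python
def read_only_mounts(mounts_text: str) -> list[tuple[str, str]]:
    """Return (device, mount point) pairs whose mount options include `ro`."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Compare whole options so that `errors=remount-ro` is not a match.
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found
```

On a real host the input would come from reading `/proc/mounts`; a caller would typically filter the result down to the volumes that back virtual machines.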

Claims

1. A storage state monitoring method suitable for virtual machines, characterized in that:

The method is implemented by four parts: the host, the network storage device, the monitoring service end, and the monitoring management end; the host is responsible for reading the state of the network storage devices mounted on it; the monitoring service end is responsible for detecting the state of the network storage devices on the host and for sending an alarm to the monitoring management end when a failure occurs; the monitoring management end is responsible for receiving and processing fault alarms and, finally, for sending fault-handling instructions to the host;

The implementation steps include:

Step 2: The monitoring service end reads the monitoring parameters and starts a timed thread that monitors the state of the network storage devices and the state of the network storage devices mounted on the host in the form of logical volumes;

Step 4: The monitoring management end receives the event and processes it; if the network storage device has failed, it sends an alarm to the administrator; if the logical volume mount state is abnormal, it sends the host an instruction to remount the storage volume and restart the virtual machines, so as to recover from the failure;

The number of retries is the number of consecutive repeated checks that must be made before a network storage fault event is generated, to ensure that the failure is credible;

The network storage device is defined in contrast to the host's local storage, and refers to centralized storage that provides storage resources to the host over the network;

The host is a server node on which a virtual machine management program is installed and on which multiple virtual machines can be created;

The user interaction module is responsible for providing an external application interface, receiving the monitoring parameters set by the cloud platform administrator, and saving the parameter information to the database;

The fault event message module further comprises a monitoring service end interaction submodule and a fault event alarm submodule; the monitoring service end interaction submodule sends the monitoring parameter information set by the administrator to the monitoring service end and receives the alarm events that the monitoring service end actively reports; the fault event alarm submodule is responsible for processing alarm events and informs the cloud platform administrator or the user who owns the virtual machine of the fault event by SMS or email;

The host interaction module is responsible for sending fault-handling instructions to the host and for other related logic; a fault-handling instruction means remounting the network storage and restarting the virtual machines built on the network storage;

The monitoring service end is a service process deployed on the host node; it is responsible for collecting real-time status information on the host, on the virtual machines on the host, and on the network storage connected to the host, and for returning this information to the monitoring management end.