CN104268061B - A storage state monitoring method suitable for virtual machines - Google Patents

Info

Publication number
CN104268061B
Authority
CN
China
Prior art keywords
monitoring
host
network storage
failure
virtual machine
Legal status
Active
Application number
CN201410464913.9A
Other languages
Chinese (zh)
Other versions
CN104268061A
Inventor
熊梦
杨松
莫展鹏
季统凯
Current Assignee
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date
2014-09-12
Filing date
2014-09-12
Publication date
2017-03-15
2014-09-12 Application filed by G Cloud Technology Co Ltd
2014-09-12 Priority to CN201410464913.9A
2015-01-07 Publication of CN104268061A
Application granted
2017-03-15 Publication of CN104268061B
Legal status: Active
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to the field of cloud computing, and in particular to a storage state monitoring method suitable for virtual machines. The method is implemented by four parts: the host, the network storage device, the monitoring server and the monitoring manager. The host is responsible for reading the state of the network storage devices mounted on it; the monitoring server is responsible for detecting the state of the network storage devices on the host and for sending an alarm to the monitoring manager when a failure occurs; the monitoring manager is responsible for receiving and processing fault alarms and finally sending fault-handling instructions to the host. The invention can be used to monitor the storage state of virtual machines.

Description

A storage state monitoring method suitable for virtual machines
Technical field
The present invention relates to the field of cloud computing, and in particular to a storage state monitoring method suitable for virtual machines.
Background
When a cloud platform is deployed in a large-scale data center, centralized storage is usually mounted on the host physical machines over the network, and the host physical machines then use the mounted storage space to create virtual machines. In this setting, after the centralized storage fails or recovers from a power outage, the virtual machines can end up in a "read-only" state: a management program such as the hypervisor still reports them as up, but no data can actually be written to them, so in practice they have failed. This kind of failure must be detected promptly and reported to the administrator; otherwise it will inevitably disrupt the virtual machine users' services. In general, two approaches are used to monitor for this failure:
1. The business system running in the virtual machine raises an alarm, and the virtual machine user notifies the cloud platform administrator after receiving it;
2. A tool provided by the network storage vendor checks the storage state and raises an alarm when it detects a read-only state.
However, these approaches have the following drawbacks:
1. In a virtual machine that is in the read-only state, the business system's alarm function may not run properly, and this notification path is not timely either.
2. Notification through the business system can only identify the virtual machine, not the exact network storage device; the vendor's tool can only identify the device, not the exact virtual machines. Neither provides the complete information needed to handle the failure promptly, let alone automatically.
Summary of the invention
The technical problem solved by the present invention is to provide a storage state monitoring method suitable for virtual machines that monitors the state of the network storage, detects failures promptly, and provides complete information on both the affected virtual machines and the network storage device, thereby supporting timely, and even automatic, recovery of the network storage.
The technical solution adopted by the present invention to solve the above problem is as follows. The method is implemented by four parts: the host, the network storage device, the monitoring server and the monitoring manager. The host is responsible for reading the state of the network storage devices mounted on it; the monitoring server is responsible for detecting the state of the network storage devices on the host and for sending an alarm to the monitoring manager when a failure occurs; the monitoring manager is responsible for receiving and processing fault alarms and finally sending fault-handling instructions to the host.
The implementation comprises the following steps:
Step 1: The cloud platform administrator configures the monitoring parameters, including the check interval and the number of retries, through the monitoring manager.
Step 2: The monitoring server reads the monitoring parameters and starts a timed thread that monitors the state of the network storage devices and of the network storage devices mounted on the host as logical volumes.
Step 3: If the monitoring server detects a network storage device failure or an abnormal logical volume mount state, it sends an alarm event to the monitoring manager.
Step 4: The monitoring manager receives and processes the event. For a network storage device failure, it sends an alarm to the administrator; for an abnormal logical volume mount state, it sends the host an instruction to remount the storage volume and restart the virtual machines, so as to recover from the failure.
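Step 4 is essentially a dispatch on the fault type. The sketch below shows that decision in Python; the `FaultType`, `AlarmEvent` and `handle_alarm` names and the instruction strings are hypothetical, and the actual delivery of alarms and host instructions is left out.

```python
from dataclasses import dataclass
from enum import Enum


class FaultType(Enum):
    # The two fault classes distinguished in step 4 (names are illustrative).
    STORAGE_DEVICE_FAILURE = "storage_device_failure"
    MOUNT_STATE_ABNORMAL = "mount_state_abnormal"


@dataclass(frozen=True)
class AlarmEvent:
    host: str        # host whose monitoring server reported the fault
    volume: str      # network storage device or logical volume involved
    fault: FaultType


def handle_alarm(event: AlarmEvent) -> list[tuple[str, str]]:
    """Return (recipient, instruction) pairs for one alarm event."""
    if event.fault is FaultType.STORAGE_DEVICE_FAILURE:
        # The device itself has failed: the host cannot repair it, so alert the administrator.
        return [("admin", f"storage device {event.volume} on {event.host} has failed")]
    # Abnormal mount state: ask the host to remount the volume, then restart the VMs on it.
    return [
        (event.host, f"remount {event.volume}"),
        (event.host, f"restart virtual machines on {event.volume}"),
    ]


actions = handle_alarm(AlarmEvent("host-01", "vg_san/lv_vm", FaultType.MOUNT_STATE_ABNORMAL))
for recipient, instruction in actions:
    print(recipient, "->", instruction)
```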
The check interval is the time between two consecutive checks of the network storage state by the monitoring server.
The number of retries is the number of consecutive checks that must fail before a network storage failure event is generated, which ensures that a reported failure is genuine.
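The interaction between the check interval and the number of retries can be illustrated with a small state machine. The `RetryDetector` class below is a hypothetical sketch, not taken from the patent. It reports a fault only after `retries` consecutive failed checks and resets on any successful check. Raising the alarm only once per fault episode is an added assumption.

```python
class RetryDetector:
    """Report a fault only after `retries` consecutive failed checks (one instance per device)."""

    def __init__(self, retries: int) -> None:
        if retries < 1:
            raise ValueError("retries must be at least 1")
        self.retries = retries
        self.consecutive_failures = 0
        self.alarm_raised = False

    def record(self, check_ok: bool) -> bool:
        """Feed one check result; return True exactly when an alarm should be sent."""
        if check_ok:
            # A successful check ends the fault episode and re-arms the alarm.
            self.consecutive_failures = 0
            self.alarm_raised = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.retries and not self.alarm_raised:
            self.alarm_raised = True
            return True
        return False


detector = RetryDetector(retries=3)
results = [detector.record(ok) for ok in [False, False, True, False, False, False, False]]
print(results)  # [False, False, False, False, False, True, False]
```

With a check interval of T seconds and N retries, a persistent fault is reported at most about N·T seconds after it starts. The first failing check comes within T, and N−1 further checks follow at intervals of T. This bound ignores the time each check itself takes.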
The network storage device, as opposed to the host's local storage, is a centralized storage device that provides storage resources to the host over the network.
The host is a server node on which a virtual machine management program (hypervisor) is installed and on which multiple virtual machines can be created.
The monitoring manager comprises a user interaction module, a failure event message module and a host interaction module.
The user interaction module provides the external application interface, receives the monitoring parameters configured by the cloud platform administrator, and saves the parameter information to the database.
The failure event message module further comprises a monitoring-server interaction submodule and a failure event alarm submodule. The monitoring-server interaction submodule sends the monitoring parameters configured by the administrator to the monitoring server and receives the alarm events that the monitoring server reports on its own initiative. The failure event alarm submodule processes the alarm events and notifies the cloud platform administrator or the owner of the affected virtual machine by SMS or e-mail.
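The failure event alarm submodule's e-mail path might build its notification with Python's standard `email.message.EmailMessage`, as in the sketch below. The function name, subject format and body layout are assumptions. Sending, for example with `smtplib.SMTP.send_message`, is omitted. The SMS path is also omitted, because SMS gateways are vendor-specific.

```python
from email.message import EmailMessage


def build_fault_email(sender: str, recipient: str, host: str,
                      device: str, vms: list[str]) -> EmailMessage:
    """Build an alarm e-mail naming the host, the storage device and the affected VMs."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[storage alarm] {device} on {host}"
    lines = [
        f"Host: {host}",
        f"Network storage device: {device}",
        # Naming the VMs lets the recipient see the impact without further lookups.
        "Affected virtual machines: " + (", ".join(vms) if vms else "none known"),
    ]
    msg.set_content("\n".join(lines))
    return msg


msg = build_fault_email("monitor@example.com", "admin@example.com",
                        "host-01", "iqn.2014-09.com.example:san1", ["web", "db"])
print(msg["Subject"])
```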
The host interaction module sends fault-handling instructions to the host and handles other related logic. A fault-handling instruction means remounting the network storage and restarting the virtual machines created on that network storage.
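The fault-handling instruction can be expressed as a sequence of host commands. The sketch below assumes an iSCSI-attached LVM volume and libvirt-managed virtual machines. The patent names neither the LVM tooling nor the hypervisor at this point, so the command sequence and the `recovery_commands` helper are assumptions. The helper only builds the argument lists, and `run_all` shows how they could be executed with `subprocess.run`.

```python
import subprocess


def recovery_commands(target_iqn: str, portal: str, vg: str, lv: str,
                      mount_point: str, vms: list[str]) -> list[list[str]]:
    """Build commands that remount an iSCSI-backed logical volume and restart its VMs."""
    cmds: list[list[str]] = []
    # 1. Force off the VMs whose disks live on the broken (e.g. read-only) mount.
    cmds += [["virsh", "destroy", vm] for vm in vms]
    # 2. Release the stale mount and deactivate the volume group.
    cmds.append(["umount", mount_point])
    cmds.append(["vgchange", "-an", vg])
    # 3. Re-establish the iSCSI session to the storage target.
    cmds.append(["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--logout"])
    cmds.append(["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"])
    # 4. Reactivate the volume group and mount the logical volume again.
    cmds.append(["vgchange", "-ay", vg])
    cmds.append(["mount", f"/dev/{vg}/{lv}", mount_point])
    # 5. Start the VMs again.
    cmds += [["virsh", "start", vm] for vm in vms]
    return cmds


def run_all(cmds: list[list[str]]) -> None:
    # check=True stops at the first failing step instead of continuing blindly.
    for cmd in cmds:
        subprocess.run(cmd, check=True)


plan = recovery_commands("iqn.2014-09.com.example:san1", "192.0.2.10:3260",
                         "vg_san", "lv_vm", "/var/lib/vm", ["web", "db"])
```

In practice some steps would need tolerant error handling. For example, `--logout` fails when no session exists, and `run_all` as written would stop at that point.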
The monitoring server is a service process deployed on the host node. It collects real-time state information about the host, the virtual machines on the host and the network storage connected to the host, and returns this information to the monitoring manager.
The solution of the present invention has the following beneficial effects:
1. The method responds faster: failure events do not have to be detected through the running state of the business system, so a failure can be detected at an early stage.
2. Because the method works together with the host, the mapping between network storage and virtual machines can be queried at any time. The generated failure events are therefore more informative, and the source of the failure can be located accurately.
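One way to obtain the storage-to-VM mapping is to compare each VM's disk image paths with the mount point of the network storage. The sketch below assumes the disk paths have already been collected. On a libvirt host, for example, `virsh domblklist <domain>` lists them. The function name and input shape are hypothetical.

```python
import os


def vms_on_mount(vm_disks: dict[str, list[str]], mount_point: str) -> list[str]:
    """Return, sorted by name, the VMs with at least one disk under mount_point.

    Paths are assumed to be absolute POSIX paths.
    """
    root = os.path.normpath(mount_point)
    affected = []
    for vm, disks in sorted(vm_disks.items()):
        # commonpath compares whole path components, so /mnt/san10 does not
        # match /mnt/san1 the way a plain string prefix test would.
        if any(os.path.commonpath([root, os.path.normpath(d)]) == root for d in disks):
            affected.append(vm)
    return affected


vm_disks = {
    "web": ["/mnt/san1/web.qcow2"],
    "db": ["/mnt/san2/db.qcow2"],
    "app": ["/mnt/san10/app.qcow2", "/mnt/san1/app-data.qcow2"],
}
print(vms_on_mount(vm_disks, "/mnt/san1"))  # ['app', 'web']
```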
3. The method can recover from some failures automatically: because it works together with the host, the common failure of an abnormal mount can be repaired through the host.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawing:
Fig. 1 is a module diagram of the present invention.
Detailed description of the embodiments
The cloud platform administrator first logs in to the cloud platform web interface and configures the monitoring parameters for the network storage devices used by the virtual machines. These include the check interval (the monitoring frequency), the number of fault-detection retries and the alarm recipients. The parameters are then passed to the monitoring manager through the cloud platform's messaging mechanism, and the monitoring manager saves them to the corresponding database table. Part of the implementation code is as follows:
When the monitoring manager starts, it reads the corresponding database table to obtain parameters such as the monitoring frequency and the number of retries. It then sends a request to all registered monitoring agents (the monitoring server processes running on the hosts), instructing them to start the network storage timed-scan thread and to report storage information to the storage service and failure events to the failure event message service. Part of the implementation code is as follows:
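The start-up broadcast can be sketched as follows. This is a hypothetical stand-in, not the patent's listing. The transport is abstracted as a `send` callable. The request fields and the choice to skip unreachable agents are assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StartScanRequest:
    interval_s: int   # monitoring frequency read from the database
    retries: int      # consecutive failures required before an alarm


def start_registered_agents(agents: list[str], params: dict,
                            send: Callable[[str, StartScanRequest], None]) -> list[str]:
    """Ask every registered agent to start its timed storage scan; return those reached."""
    request = StartScanRequest(params["interval_s"], params["retries"])
    reached = []
    for agent in agents:
        try:
            send(agent, request)
        except ConnectionError:
            # An unreachable agent is skipped here; it can still fetch the
            # parameters itself when it next starts.
            continue
        reached.append(agent)
    return reached


def fake_send(agent: str, request: StartScanRequest) -> None:
    # Stand-in transport for the example: host-02 is down.
    if agent == "host-02":
        raise ConnectionError("host-02 unreachable")


reached = start_registered_agents(["host-01", "host-02", "host-03"],
                                  {"interval_s": 60, "retries": 3}, fake_send)
print(reached)  # ['host-01', 'host-03']
```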
When a monitoring agent starts, it first checks through the cloud platform's internal communication mechanism whether the monitoring manager is present. If it is not, the agent does not start the timed thread that monitors the network storage devices. If it is, the agent scans for network storage device information, reports the collected storage information to the monitoring manager, and requests the monitoring frequency and number of retries from the monitoring manager in order to start the timed thread. Part of the code is as follows:
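Below is a hypothetical sketch of this start-up logic, not the patent's code. Every interaction with the platform is passed in as a callable, because the patent does not specify the communication mechanism. These interactions are: checking that the manager exists, scanning, reporting, fetching parameters, and running one check. The timed thread is a `threading.Thread` that waits on a `threading.Event` between checks, so it can be stopped cleanly.

```python
import threading
from typing import Callable, Optional


class MonitoringAgent:
    def __init__(self, manager_alive: Callable[[], bool],
                 scan_storage: Callable[[], list[str]],
                 report: Callable[[list[str]], None],
                 fetch_params: Callable[[], dict],
                 check: Callable[[], None]) -> None:
        self.manager_alive = manager_alive
        self.scan_storage = scan_storage
        self.report = report
        self.fetch_params = fetch_params
        self.check = check
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> bool:
        """Start the timed check thread; return False if the manager is absent."""
        if not self.manager_alive():
            return False  # no manager: do not start monitoring
        self.report(self.scan_storage())  # report the storage devices found
        params = self.fetch_params()
        self._thread = threading.Thread(target=self._loop,
                                        args=(params["interval_s"],), daemon=True)
        self._thread.start()
        return True

    def _loop(self, interval_s: float) -> None:
        # wait() returns True as soon as stop() sets the event, ending the loop;
        # otherwise it times out after interval_s and one check is run.
        while not self._stop.wait(interval_s):
            self.check()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


checked = threading.Event()
reports: list[list[str]] = []
agent = MonitoringAgent(lambda: True, lambda: ["iqn.2014-09.com.example:san1"],
                        reports.append, lambda: {"interval_s": 0.05}, checked.set)
started = agent.start()
checked.wait(timeout=2)
agent.stop()
```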
Through the timed thread, the monitoring agent periodically checks the state of the network storage using commands such as iscsi and iscsiadm. If the storage is found to be in an abnormal state in several consecutive checks, the agent generates a network storage abnormality alarm event and sends it to the failure event message module of the monitoring manager. Part of the code is as follows:
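The read-only symptom described in this document can also be detected from the host's mount table. The sketch below is not the patent's code, and it swaps in a different technique from the iscsiadm checks the text mentions. It parses text in the `/proc/mounts` format (device, mount point, file system type, options, ...) and classifies each watched mount point as `ok`, `read-only` or `missing`. The function name and labels are hypothetical. It does not check iSCSI sessions. It also ignores the octal escaping (`\040` for a space) that `/proc/mounts` uses in paths.

```python
def check_mounts(proc_mounts_text: str, watched: set[str]) -> dict[str, str]:
    """Classify each watched mount point from /proc/mounts-style text."""
    status = {mp: "missing" for mp in watched}  # not listed at all: not mounted
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if mount_point in watched:
            status[mount_point] = "read-only" if "ro" in options else "ok"
    return status


sample = (
    "/dev/mapper/vg_san-lv_vm /var/lib/vm ext4 ro,relatime 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)
print(check_mounts(sample, {"/var/lib/vm", "/mnt/san2"}))
# On a real host the text would come from: open("/proc/mounts").read()
```

Each result other than `ok` would count as one failed check toward the configured number of retries.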
The failure event message module receives the alarm and promptly notifies the cloud platform administrator or the virtual machine user. At the same time it generates a fault recovery record and saves it to the database.
The host interaction module of the monitoring manager reads the fault recovery records from the database and sends fault recovery commands to the host. These commands remount the network storage and restart the virtual machines created on it.
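The record-driven recovery loop can be sketched with a small `sqlite3` table. The schema and status values are hypothetical. So is the `send_recover` callable, which stands in for delivering the remount-and-restart command to the host. Records whose command could not be delivered stay pending for the next pass.

```python
import sqlite3
from typing import Callable

SCHEMA = """CREATE TABLE IF NOT EXISTS recovery_records (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    host   TEXT NOT NULL,
    volume TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)"""


def add_record(conn: sqlite3.Connection, host: str, volume: str) -> int:
    """Called by the failure event message module when it raises an alarm."""
    conn.execute(SCHEMA)
    cur = conn.execute("INSERT INTO recovery_records (host, volume) VALUES (?, ?)",
                       (host, volume))
    conn.commit()
    return cur.lastrowid


def process_pending(conn: sqlite3.Connection,
                    send_recover: Callable[[str, str], bool]) -> int:
    """Send a recovery command for each pending record; return how many succeeded."""
    conn.execute(SCHEMA)
    rows = conn.execute(
        "SELECT id, host, volume FROM recovery_records WHERE status = 'pending' ORDER BY id"
    ).fetchall()
    done = 0
    for rec_id, host, volume in rows:
        if send_recover(host, volume):
            conn.execute("UPDATE recovery_records SET status = 'done' WHERE id = ?", (rec_id,))
            done += 1
    conn.commit()
    return done


conn = sqlite3.connect(":memory:")
add_record(conn, "host-01", "vg_san/lv_vm")
add_record(conn, "host-02", "vg_san/lv_db")
# Pretend host-02 is unreachable on this pass.
first_pass = process_pending(conn, lambda host, volume: host != "host-02")
print(first_pass)  # 1
```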
In the present invention, a logical volume is a concept from LVM (Logical Volume Manager) partition management. LVM is a Linux mechanism for managing disk partitions. It can combine multiple physical block devices into one large logical device, which makes disk partition management more flexible. A failure or abnormality means either that the host cannot connect to the network storage or that the locally mounted network storage is in a read-only state. Such conditions arise from a power outage, a network disconnection or a similar event at the network storage device.
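For logical volumes, LVM itself reports state in the `lv_attr` field printed by `lvs`, for example with `lvs --noheadings -o vg_name,lv_name,lv_attr`. Per the lvs(8) manual, the second character of `lv_attr` is the permission: `w` for writeable, `r` for read-only, `R` for read-only activation of a writeable volume. The fifth character is the state, with `a` for active. The sketch below classifies volumes from that output. Using `lv_attr` as a health signal is an assumption, not something the patent specifies. A file system can also be mounted read-only on a writeable volume, so this check complements rather than replaces a mount-table check.

```python
def classify_lv(lv_attr: str) -> str:
    """Classify a logical volume from its lv_attr string, e.g. '-wi-ao----'."""
    if len(lv_attr) < 5:
        raise ValueError(f"unexpected lv_attr: {lv_attr!r}")
    permission, state = lv_attr[1], lv_attr[4]
    if state != "a":
        return "not-active"
    if permission != "w":
        return "read-only"  # 'r' or 'R'
    return "ok"


def parse_lvs(output: str) -> dict[str, str]:
    """Parse `lvs --noheadings -o vg_name,lv_name,lv_attr` output into {vg/lv: status}."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        vg, lv, attr = fields
        result[f"{vg}/{lv}"] = classify_lv(attr)
    return result


sample = (
    "  vg_san lv_vm  -wi-ao----\n"
    "  vg_san lv_ro  -ri-ao----\n"
    "  vg_san lv_off -wi-------\n"
)
print(parse_lvs(sample))
# {'vg_san/lv_vm': 'ok', 'vg_san/lv_ro': 'read-only', 'vg_san/lv_off': 'not-active'}
```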

Claims (1)

1. A storage state monitoring method suitable for virtual machines, characterized in that:
the method is implemented by four parts: the host, the network storage device, the monitoring server and the monitoring manager; the host is responsible for reading the state of the network storage devices mounted on it; the monitoring server is responsible for detecting the state of the network storage devices on the host and for sending an alarm to the monitoring manager when a failure occurs; the monitoring manager is responsible for receiving and processing fault alarms and finally sending fault-handling instructions to the host;
the implementation comprises the following steps:
Step 1: the cloud platform administrator configures the monitoring parameters, including the check interval and the number of retries, through the monitoring manager;
Step 2: the monitoring server reads the monitoring parameters and starts a timed thread that monitors the state of the network storage devices and of the network storage devices mounted on the host as logical volumes;
Step 3: if the monitoring server detects a network storage device failure or an abnormal logical volume mount state, it sends an alarm event to the monitoring manager;
Step 4: the monitoring manager receives and processes the event: for a network storage device failure, it sends an alarm to the administrator; for an abnormal logical volume mount state, it sends the host an instruction to remount the storage volume and restart the virtual machines, so as to recover from the failure;
the check interval is the time between two consecutive checks of the network storage state by the monitoring server;
the number of retries is the number of consecutive checks that must fail before a network storage failure event is generated, which ensures that a reported failure is genuine;
the network storage device, as opposed to the host's local storage, is a centralized storage device that provides storage resources to the host over the network;
the host is a server node on which a virtual machine management program is installed and on which multiple virtual machines can be created;
the monitoring manager comprises a user interaction module, a failure event message module and a host interaction module;
the user interaction module provides the external application interface, receives the monitoring parameters configured by the cloud platform administrator, and saves the parameter information to the database;
the failure event message module further comprises a monitoring-server interaction submodule and a failure event alarm submodule; the monitoring-server interaction submodule sends the monitoring parameters configured by the administrator to the monitoring server and receives the alarm events that the monitoring server reports on its own initiative; the failure event alarm submodule processes the alarm events and notifies the cloud platform administrator or the owner of the affected virtual machine by SMS or e-mail;
the host interaction module sends fault-handling instructions to the host and handles other related logic; a fault-handling instruction means remounting the network storage and restarting the virtual machines created on that network storage;
the monitoring server is a service process deployed on the host node that collects real-time state information about the host, the virtual machines on the host and the network storage connected to the host, and returns this information to the monitoring manager.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410464913.9A CN104268061B 2014-09-12 2014-09-12 A storage state monitoring method suitable for virtual machines

Publications (2)

Publication Number Publication Date
CN104268061A 2015-01-07
CN104268061B 2017-03-15

Family

ID=52159584

Country Status (1)

Country Link
CN (1) CN104268061B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550012A (en) * 2015-12-07 2016-05-04 国云科技股份有限公司 Method for custom recovery of malfunctioning virtual machine
WO2017190339A1 (en) * 2016-05-06 2017-11-09 华为技术有限公司 Fault processing method and device
CN106339297B (en) * 2016-09-14 2020-10-02 郑州云海信息技术有限公司 Method and system for real-time alarming of storage system fault
CN108108255A (en) * 2016-11-25 2018-06-01 中兴通讯股份有限公司 The detection of virtual-machine fail and restoration methods and device
CN107656845A (en) * 2017-09-18 2018-02-02 国云科技股份有限公司 A kind of virtual machine high availability method
CN107888689B (en) * 2017-11-16 2019-04-30 无锡地铁集团有限公司 Locking resource allocation method based on shared storage
CN107957930A (en) * 2017-11-22 2018-04-24 国云科技股份有限公司 A kind of monitoring method of host node storage space
CN109144412A (en) * 2018-07-26 2019-01-04 郑州云海信息技术有限公司 A kind of iSCSI adapter batch scanning method and system
CN109560963A (en) * 2018-11-23 2019-04-02 北京车和家信息技术有限公司 Monitoring alarm method, system and computer readable storage medium
CN110442497B (en) * 2019-07-03 2023-01-06 苏州浪潮智能科技有限公司 Method, device and readable medium for alarming storage state of virtualization system
CN111124275B (en) * 2019-11-15 2022-10-18 苏州浪潮智能科技有限公司 Monitoring service optimization method and device of distributed block storage system
CN111488321A (en) * 2020-03-05 2020-08-04 北京联创信安科技股份有限公司 Management system for storage volume
CN113220409A (en) * 2021-02-01 2021-08-06 浪潮云信息技术股份公司 Virtual machine monitoring system and method
CN115442269A (en) * 2022-09-01 2022-12-06 中国银行股份有限公司 Block chain-based network connectivity monitoring method and device
CN115766382A (en) * 2022-10-21 2023-03-07 济南浪潮数据技术有限公司 Cloud computing platform-based inspection method, system, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833334A (en) * 2012-08-23 2012-12-19 广东电子工业研究院有限公司 Logical volume management method
JP2012252703A (en) * 2011-06-01 2012-12-20 Hon Hai Precision Industry Co Ltd Virtual machine monitoring system and monitoring method thereof
CN103699474A (en) * 2012-09-27 2014-04-02 鸿富锦精密工业(深圳)有限公司 Storage equipment monitoring system and method
CN103729280A (en) * 2013-12-23 2014-04-16 国云科技股份有限公司 High availability mechanism for virtual machine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
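"Research on the Application of Resource Monitoring in Cloud Computing Environments" (云计算环境下的资源监控应用研究); 张仲妹; China Master's Theses Full-text Database, Information Science and Technology; 2013-08-15 (No. 08, 2013); I140-344; main text, p. 22 *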

Also Published As

Publication number Publication date
CN104268061A (en) 2015-01-07

Similar Documents

Publication Publication Date Title
CN104268061B (en) A kind of storage state monitoring method suitable for virtual machine
TWI746512B (en) Physical machine fault classification processing method and device, and virtual machine recovery method and system
CN106789306B (en) Method and system for detecting, collecting and recovering software fault of communication equipment
CN102937930B (en) Application program monitoring system and method
CN102111310B (en) Method and system for monitoring content delivery network (CDN) equipment status
CN102662821B (en) Method, device and system for auxiliary diagnosis of virtual machine failure
CN105095001B (en) Virtual machine abnormal restoring method under distributed environment
CN103607297A (en) Fault processing method of computer cluster system
CN104731580A (en) Automation operation and maintenance system based on Karaf and ActiveMQ and implement method thereof
CN103812699A (en) Monitoring management system based on cloud computing
CN103605722A (en) Method, device and equipment for database monitoring
CN103067209B (en) A kind of heartbeat module self-sensing method
CN106301823A (en) The fault alarming method of a kind of key component, device and big data management system
CN106021070A (en) Method and device for server cluster monitoring
CN112601216B (en) Zigbee-based trusted platform alarm method and system
CN114356499A (en) Kubernetes cluster alarm root cause analysis method and device
CN102609350A (en) Server memory failure alarm method
CN112799909A (en) Automatic management system and method for server
CN105574590A (en) Adaptive general control disaster recovery switching device and system, and signal generation method
CN111143167A (en) Alarm merging method, device, equipment and storage medium for multiple platforms
CN104794041A (en) Method for monitoring active state of array card for Linux server and device of method
CN103605592A (en) Mechanism of detecting malfunctions of distributed computer system
CN105607973A (en) Method, device and system for processing equipment failures in virtual machine system
CN108241565A (en) A kind of system and method for being used to implement application system automation O&M
CN109460311A (en) The management method and device of firmware abnormality

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 523808 19th Floor, Cloud Computing Center, Chinese Academy of Sciences, No. 1 Kehui Road, Songshan Lake Hi-tech Industrial Development Zone, Dongguan City, Guangdong Province

Patentee after: G-Cloud Technology Co., Ltd.

Address before: 523808 No. 14 Building, Songke Garden, Songshan Lake Science and Technology Industrial Park, Dongguan City, Guangdong Province

Patentee before: G-Cloud Technology Co., Ltd.
