CN104268061B - A kind of storage state monitoring method suitable for virtual machine - Google Patents
- Publication number: CN104268061B
- Application number: CN201410464913.9A
- Authority: CN (China)
- Prior art keywords: monitoring, host, network storage, failure, virtual machine
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Debugging And Monitoring (AREA)
Abstract
The invention relates to the field of cloud computing, and in particular to a storage state monitoring method suitable for virtual machines. The method is implemented by four parts: the host, the network storage device, the monitoring service end and the monitoring management end. The host reads the state of the network storage devices mounted on it; the monitoring service end detects the state of the network storage devices on the host and sends an alarm to the monitoring management end when a failure occurs; the monitoring management end receives and processes the fault alarm and finally sends a fault-handling instruction to the host. The invention can be used for monitoring the storage state of virtual machines.
Description
Technical field
The invention relates to the field of cloud computing, and in particular to a storage state monitoring method suitable for virtual machines.
Background art
When a cloud platform is deployed in a large data center, centralized storage is usually mounted to the host physical machines over the network, and each host physical machine then uses the mounted storage space to create virtual machines. In this arrangement, after the centralized storage fails or recovers from a power outage, a virtual machine can enter a "read-only" state: a manager such as the hypervisor still reports the virtual machine as up, but no write operation can actually be performed on it, so the virtual machine is in effect in a failed state. Such a failure must be detected promptly and reported to the administrator, otherwise it will inevitably affect the normal operation of the virtual machine users' services. Two approaches are generally used to monitor this kind of failure:
1. The business system running on the virtual machine raises an alarm, and the virtual machine user notifies the cloud platform administrator after receiving it;
2. A tool provided by the network storage vendor performs state detection and raises an alarm when the detected state is read-only.
However, these approaches have the following drawbacks:
1. On a virtual machine in the read-only state, the alarm function of the business system may itself fail to run normally, and this way of notification is also not timely.
2. Notification through the business system can only locate the virtual machine and cannot accurately locate the network storage device; the tool provided by the network storage vendor can only locate the device and cannot accurately locate the virtual machine. Neither provides complete information that would allow the failure to be handled promptly, let alone automatically.
Summary of the invention
The technical problem solved by the invention is to provide a storage state monitoring method suitable for virtual machines that monitors the state of the network storage, detects failures promptly, and provides complete information on the corresponding virtual machines and network storage devices, thereby supporting timely or even automatic recovery of the network storage.
The technical scheme by which the invention solves the above technical problem is as follows. The method is implemented by four parts: the host, the network storage device, the monitoring service end and the monitoring management end. The host is responsible for reading the state of the network storage devices mounted on it; the monitoring service end is responsible for detecting the state of the network storage devices on the host and sends an alarm to the monitoring management end when a failure occurs; the monitoring management end is responsible for receiving and processing the fault alarm and finally sends a fault-handling instruction to the host.
The implementation steps are:
Step 1: The cloud platform administrator configures the monitoring parameters through the monitoring management end, including the detection interval and the number of retries;
Step 2: The monitoring service end reads the monitoring parameters and starts a timed thread that monitors the state of the network storage devices and the state of the network storage devices mounted on the host in the form of logical volumes;
Step 3: If the monitoring service end detects a network storage device failure or an abnormal logical volume mount state, it sends an alarm event to the monitoring management end;
Step 4: The monitoring management end receives the event and processes it. If the network storage device has failed, it sends an alarm to the administrator; if the logical volume mount state is abnormal, it sends the host an instruction to remount the storage volume and restart the virtual machines so as to recover from the failure.
The detection interval is the length of time between two successive checks that the monitoring service end makes on the state of the network storage.
The number of retries is the number of consecutive repeated checks that must be made before a network storage fault event is generated, to ensure that the failure is credible.
The network storage device is defined relative to the host's local storage: it is centralized storage that provides storage resources to the host over the network.
The host is a server node on which a virtual machine management program is installed and on which multiple virtual machines can be created.
The monitoring management end comprises a user interaction module, a fault event message module and a host interaction module.
The user interaction module provides the external application interface, receives the monitoring parameters set by the cloud platform administrator, and saves the parameter information to the database.
The fault event message module further comprises a monitoring service end interaction submodule and a fault event alarm submodule. The monitoring service end interaction submodule sends the monitoring parameters set by the administrator to the monitoring service end and receives the alarm events that the monitoring service end actively reports. The fault event alarm submodule processes alarm events and informs the cloud platform administrator or the user who owns the virtual machine of the fault event by SMS or email.
The host interaction module is responsible for sending fault-handling instructions to the host and for other related logic.
The fault-handling instruction means remounting the network storage and restarting the virtual machines built on the network storage.
The monitoring service end is a service process deployed on the host node. It collects real-time state information on the host, the virtual machines on the host, and the network storage connected to the host, and returns this information to the monitoring management end.
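The division of labour in step 4 can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `FaultKind`, `AlarmEvent` and `handle_event`, and the action labels, are hypothetical and chosen only to show how the monitoring management end might branch on the two fault types described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class FaultKind(Enum):
    DEVICE_FAILURE = "device_failure"    # the network storage device itself has failed
    MOUNT_ABNORMAL = "mount_abnormal"    # the logical volume mount on the host is abnormal


@dataclass
class AlarmEvent:
    """Hypothetical alarm event reported by the monitoring service end."""
    host: str
    storage: str
    kind: FaultKind
    vms: List[str]


def handle_event(event: AlarmEvent) -> List[Tuple[str, str]]:
    """Return the (action, target) pairs the management end would issue."""
    if event.kind is FaultKind.DEVICE_FAILURE:
        # A device failure cannot be repaired from the host: alert the administrator.
        return [("notify_admin", event.storage)]
    # An abnormal mount can be recovered through the host: remount, then restart VMs.
    actions = [("remount", event.storage)]
    actions += [("restart_vm", vm) for vm in event.vms]
    return actions
```

Keeping the decision in one function makes the asymmetry explicit: only the mount abnormality leads to automatic recovery, while a device failure is escalated to a person.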
The beneficial effects of the invention are as follows:
1. The method is more timely. Because fault events do not depend on detecting the running state of the business system, a failure can be detected at an early stage.
2. The method works together with the host and can query the correspondence between network storage and virtual machines at any time, so the fault events it generates are more readable and the fault source can be located accurately.
3. The method can recover from some failures automatically. Because it works together with the host, the common failure of an abnormal mount can be recovered through the host.
Brief description of the drawings
The invention is further described below with reference to the accompanying drawing:
Fig. 1 is a module diagram of the invention.
Detailed description
The cloud platform administrator first logs in to the cloud platform web interface and configures monitoring parameters for the network storage devices used by the virtual machines, including the detection interval (that is, the frequency), the number of fault-detection retries, and the recipients of alarms. The parameters are then passed through the cloud platform's message communication mechanism to the monitoring management end, which saves them to the corresponding database table. The original publication gives part of the implementation code at this point.
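As an illustration only, saving and reading back these parameters might look like the following Python sketch. It is not the patent's code: the table name `monitor_params`, its columns, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3


def save_monitor_params(conn, storage_id, interval_s, retries, recipient):
    """Create the (hypothetical) parameter table if needed and upsert one row."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS monitor_params (
               storage_id TEXT PRIMARY KEY,
               interval_s INTEGER NOT NULL,
               retries    INTEGER NOT NULL,
               recipient  TEXT NOT NULL)"""
    )
    # INSERT OR REPLACE overwrites an earlier configuration for the same device.
    conn.execute(
        "INSERT OR REPLACE INTO monitor_params VALUES (?, ?, ?, ?)",
        (storage_id, interval_s, retries, recipient),
    )
    conn.commit()


def load_monitor_params(conn, storage_id):
    """Return (interval_s, retries, recipient), or None if nothing is configured."""
    return conn.execute(
        "SELECT interval_s, retries, recipient FROM monitor_params WHERE storage_id = ?",
        (storage_id,),
    ).fetchone()
```

For example, `save_monitor_params(sqlite3.connect(":memory:"), "vol1", 60, 3, "admin@example.com")` stores a 60-second interval with three retries for a device labelled `vol1`.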
When the monitoring management end starts, it reads the corresponding database table to obtain parameters such as the monitoring frequency and the number of retries, and sends a request to all registered monitoring implementation ends, instructing them to start their network storage timed scanning threads and to report storage information to the storage service end and the fault event message service end. The original publication gives part of the implementation code at this point.
When a monitoring implementation end starts, it first checks through the cloud platform's internal communication mechanism whether the monitoring management end exists. If it does not exist, the timed thread that monitors the network storage devices is not started. If it exists, the monitoring implementation end scans for the network storage devices, reports the storage information it obtains to the monitoring management end, and at the same time requests the monitoring frequency and the number of retries from the monitoring management end in order to start the timed thread. The original publication gives partial code at this point.
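The start-up sequence just described can be sketched in Python as follows. This is a hypothetical illustration, not the patent's code: `InMemoryManagementEnd` stands in for the real monitoring management end, and its method names are invented for the example.

```python
from typing import Callable, List, Optional, Tuple


class InMemoryManagementEnd:
    """Hypothetical stand-in for the monitoring management end."""

    def __init__(self, alive: bool, interval_s: int, retries: int) -> None:
        self.alive = alive
        self.params = (interval_s, retries)
        self.reported: List[str] = []

    def is_alive(self) -> bool:
        return self.alive

    def report_storage(self, devices: List[str]) -> None:
        self.reported = list(devices)

    def get_monitor_params(self) -> Tuple[int, int]:
        return self.params


def bootstrap(mgmt: InMemoryManagementEnd,
              scan_storage: Callable[[], List[str]]) -> Optional[Tuple[int, int]]:
    """Return (interval_s, retries) for the timed thread, or None if it must not start."""
    if not mgmt.is_alive():
        return None                      # no management end: do not start monitoring
    devices = scan_storage()             # discover the network storage devices
    mgmt.report_storage(devices)         # report the storage information upward
    return mgmt.get_monitor_params()     # fetch monitoring frequency and retry count
```

Returning `None` when the management end is unreachable keeps the rule "do not start the timed thread" visible to the caller instead of hiding it inside the monitoring loop.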
Through the timed thread, the monitoring implementation end periodically checks the state of the network storage using commands such as iscsi and iscsiadm. If the storage is found to be in an abnormal state several consecutive times, it generates a network storage abnormality alarm event and sends it to the fault event message module of the monitoring service end. The original publication gives partial code at this point.
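A minimal Python sketch of such a timed check follows. It is not the patent's code. The class `StorageMonitor` and its method names are hypothetical, and the probe rests on the assumption that `iscsiadm -m session` exits with status 0 only when at least one iSCSI session is logged in; a real deployment would likely need a more specific check per target.

```python
import subprocess
import threading
from typing import Callable


def iscsi_session_present() -> bool:
    """Assumed health probe: treat a zero exit status of `iscsiadm -m session` as healthy."""
    try:
        result = subprocess.run(
            ["iscsiadm", "-m", "session"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False                     # command missing or hung: count as unhealthy
    return result.returncode == 0


class StorageMonitor:
    """Runs `probe` every `interval_s` seconds; reports after `retries` failures in a row."""

    def __init__(self, probe: Callable[[], bool], interval_s: float,
                 retries: int, on_fault: Callable[[], None]) -> None:
        self.probe = probe
        self.interval_s = interval_s
        self.retries = retries
        self.on_fault = on_fault
        self._failures = 0
        self._stop = threading.Event()

    def tick(self) -> None:
        """One scheduled check."""
        if self.probe():
            self._failures = 0           # a healthy result resets the failure streak
            return
        self._failures += 1
        if self._failures == self.retries:
            self.on_fault()              # fire once when the streak reaches the limit

    def run(self) -> None:
        # Event.wait returns False on timeout, so this loops once per interval until stop().
        while not self._stop.wait(self.interval_s):
            self.tick()

    def start(self) -> threading.Thread:
        thread = threading.Thread(target=self.run, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

Passing the probe in as a callable keeps the scheduling and retry logic independent of the storage protocol, so `iscsi_session_present` could be swapped for another check.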
The fault event message module receives the alarm event and promptly issues an alarm to the cloud platform management end or the virtual machine user, and at the same time generates a fault recovery record and saves it to the database.
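As a rough illustration, this step might be sketched in Python as below. The function `process_alarm`, the event fields, and the use of an in-memory list in place of the database are assumptions made for the example and do not come from the patent.

```python
from typing import Any, Dict, List


def process_alarm(event: Dict[str, Any], recovery_log: List[Dict[str, Any]]) -> str:
    """Queue a fault recovery record and return the notification text for the alarm."""
    # The record is what a later recovery step would read; a list stands in for the database.
    recovery_log.append({
        "host": event["host"],
        "storage": event["storage"],
        "vms": list(event["vms"]),
        "status": "pending",
    })
    # The message names both the storage and the affected virtual machines.
    return (
        "Storage alarm on host {host}: {storage} is abnormal; "
        "affected virtual machines: {vms}"
    ).format(host=event["host"], storage=event["storage"], vms=", ".join(event["vms"]))
```

Including the host, the storage device and the virtual machines in one message reflects the stated goal of giving the recipient complete information about the fault source.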
The host interaction module of the management service end reads the database to obtain the fault recovery record and sends a fault recovery command to the host, which remounts the network storage and restarts the virtual machines built on the network storage.
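The patent does not list the concrete recovery commands. The Python sketch below shows one plausible command sequence under stated assumptions: the storage is an iSCSI target, and the virtual machines are managed through libvirt's `virsh`, which the patent does not name. The function only builds the command lists and does not execute them; all argument values are hypothetical.

```python
from typing import List


def build_recovery_commands(target_iqn: str, portal: str, device: str,
                            mount_point: str, vms: List[str]) -> List[List[str]]:
    """Return the commands a host might run to remount storage and restart its VMs."""
    commands = [
        ["umount", "-l", mount_point],   # lazily detach the stale mount
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--logout"],
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
        ["mount", device, mount_point],  # mount the volume again
    ]
    for vm in vms:
        # Assumption: a hard stop followed by a start stands in for "restart".
        commands.append(["virsh", "destroy", vm])
        commands.append(["virsh", "start", vm])
    return commands
```

Separating command construction from execution lets the plan be logged or reviewed before anything is run on the host.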
In the invention, the logical volume is a concept from LVM (Logical Volume Manager). LVM is a mechanism for managing disk partitions in the Linux environment; it can combine multiple physical block devices into one large logical device, which improves the flexibility of disk partition management. Failure and abnormality refer to situations in which, because the network storage device has lost power or network connectivity, the host cannot connect to the network storage, or the network storage mounted locally is in a read-only state.
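One way a host-side process could recognise the read-only condition is to inspect the mount table. The Python sketch below parses text in the format of Linux `/proc/mounts` (device, mount point, file system type, options) and returns the mounts whose options include `ro`. It is an illustration under that assumption, not the detection method given in the patent, and the function name is hypothetical.

```python
from typing import List, Tuple


def read_only_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (device, mount_point) pairs that are mounted read-only."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                     # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Compare whole options so that "errors=remount-ro" is not mistaken for "ro".
        if "ro" in options.split(","):
            result.append((device, mount_point))
    return result
```

On a Linux host the input could come from `open("/proc/mounts").read()`; filtering the result by the mount points used for virtual machine storage is left to the caller.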
Claims (1)
1. A storage state monitoring method suitable for virtual machines, characterized in that:
The method is implemented by four parts: the host, the network storage device, the monitoring service end and the monitoring management end. The host is responsible for reading the state of the network storage devices mounted on it; the monitoring service end is responsible for detecting the state of the network storage devices on the host and sends an alarm to the monitoring management end when a failure occurs; the monitoring management end is responsible for receiving and processing the fault alarm and finally sends a fault-handling instruction to the host;
The implementation steps are:
Step 1: The cloud platform administrator configures the monitoring parameters through the monitoring management end, including the detection interval and the number of retries;
Step 2: The monitoring service end reads the monitoring parameters and starts a timed thread that monitors the state of the network storage devices and the state of the network storage devices mounted on the host in the form of logical volumes;
Step 3: If the monitoring service end detects a network storage device failure or an abnormal logical volume mount state, it sends an alarm event to the monitoring management end;
Step 4: The monitoring management end receives the event and processes it; if the network storage device has failed, it sends an alarm to the administrator; if the logical volume mount state is abnormal, it sends the host an instruction to remount the storage volume and restart the virtual machines so as to recover from the failure;
The detection interval is the length of time between two successive checks that the monitoring service end makes on the state of the network storage;
The number of retries is the number of consecutive repeated checks that must be made before a network storage fault event is generated, to ensure that the failure is credible;
The network storage device is defined relative to the host's local storage and is centralized storage that provides storage resources to the host over the network;
The host is a server node on which a virtual machine management program is installed and on which multiple virtual machines can be created;
The monitoring management end comprises a user interaction module, a fault event message module and a host interaction module;
The user interaction module provides the external application interface, receives the monitoring parameters set by the cloud platform administrator, and saves the parameter information to the database;
The fault event message module further comprises a monitoring service end interaction submodule and a fault event alarm submodule; the monitoring service end interaction submodule sends the monitoring parameters set by the administrator to the monitoring service end and receives the alarm events that the monitoring service end actively reports; the fault event alarm submodule processes alarm events and informs the cloud platform administrator or the user who owns the virtual machine of the fault event by SMS or email;
The host interaction module is responsible for sending fault-handling instructions to the host and for other related logic; the fault-handling instruction means remounting the network storage and restarting the virtual machines built on the network storage;
The monitoring service end is a service process deployed on the host node; it collects real-time state information on the host, the virtual machines on the host, and the network storage connected to the host, and returns this information to the monitoring management end.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410464913.9A CN104268061B (en) | 2014-09-12 | 2014-09-12 | A kind of storage state monitoring method suitable for virtual machine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410464913.9A CN104268061B (en) | 2014-09-12 | 2014-09-12 | A kind of storage state monitoring method suitable for virtual machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104268061A (en) | 2015-01-07 |
CN104268061B (en) | 2017-03-15 |
Family ID: 52159584
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201410464913.9A | Active, CN104268061B (en) | 2014-09-12 | 2014-09-12 | A kind of storage state monitoring method suitable for virtual machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104268061B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550012A (en) * | 2015-12-07 | 2016-05-04 | 国云科技股份有限公司 | Method for custom recovery of malfunctioning virtual machine |
WO2017190339A1 (en) * | 2016-05-06 | 2017-11-09 | 华为技术有限公司 | Fault processing method and device |
CN106339297B (en) * | 2016-09-14 | 2020-10-02 | 郑州云海信息技术有限公司 | Method and system for real-time alarming of storage system fault |
CN108108255A (en) * | 2016-11-25 | 2018-06-01 | 中兴通讯股份有限公司 | The detection of virtual-machine fail and restoration methods and device |
CN107656845A (en) * | 2017-09-18 | 2018-02-02 | 国云科技股份有限公司 | A kind of virtual machine high availability method |
CN107888689B (en) * | 2017-11-16 | 2019-04-30 | 无锡地铁集团有限公司 | Locking resource allocation method based on shared storage |
CN107957930A (en) * | 2017-11-22 | 2018-04-24 | 国云科技股份有限公司 | A kind of monitoring method of host node storage space |
CN109144412A (en) * | 2018-07-26 | 2019-01-04 | 郑州云海信息技术有限公司 | A kind of iSCSI adapter batch scanning method and system |
CN109560963A (en) * | 2018-11-23 | 2019-04-02 | 北京车和家信息技术有限公司 | Monitoring alarm method, system and computer readable storage medium |
CN110442497B (en) * | 2019-07-03 | 2023-01-06 | 苏州浪潮智能科技有限公司 | Method, device and readable medium for alarming storage state of virtualization system |
CN111124275B (en) * | 2019-11-15 | 2022-10-18 | 苏州浪潮智能科技有限公司 | Monitoring service optimization method and device of distributed block storage system |
CN111488321A (en) * | 2020-03-05 | 2020-08-04 | 北京联创信安科技股份有限公司 | Management system for storage volume |
CN113220409A (en) * | 2021-02-01 | 2021-08-06 | 浪潮云信息技术股份公司 | Virtual machine monitoring system and method |
CN115442269A (en) * | 2022-09-01 | 2022-12-06 | 中国银行股份有限公司 | Block chain-based network connectivity monitoring method and device |
CN115766382A (en) * | 2022-10-21 | 2023-03-07 | 济南浪潮数据技术有限公司 | Cloud computing platform-based inspection method, system, equipment and medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102833334A (en) * | 2012-08-23 | 2012-12-19 | 广东电子工业研究院有限公司 | Logical volume management method |
JP2012252703A (en) * | 2011-06-01 | 2012-12-20 | Hon Hai Precision Industry Co Ltd | Virtual machine monitoring system and monitoring method thereof |
CN103699474A (en) * | 2012-09-27 | 2014-04-02 | 鸿富锦精密工业(深圳)有限公司 | Storage equipment monitoring system and method |
CN103729280A (en) * | 2013-12-23 | 2014-04-16 | 国云科技股份有限公司 | High availability mechanism for virtual machine |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012252703A (en) * | 2011-06-01 | 2012-12-20 | Hon Hai Precision Industry Co Ltd | Virtual machine monitoring system and monitoring method thereof |
CN102833334A (en) * | 2012-08-23 | 2012-12-19 | 广东电子工业研究院有限公司 | Logical volume management method |
CN103699474A (en) * | 2012-09-27 | 2014-04-02 | 鸿富锦精密工业(深圳)有限公司 | Storage equipment monitoring system and method |
CN103729280A (en) * | 2013-12-23 | 2014-04-16 | 国云科技股份有限公司 | High availability mechanism for virtual machine |
Non-Patent Citations (1)
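Title |
---|
"云计算环境下的资源监控应用研究" [Research on the Application of Resource Monitoring in Cloud Computing Environments]; 张仲妹 (Zhang Zhongmei); 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology]; 2013-08-15 (No. 08, 2013); pp. I140-344, main text p. 22 * |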
Also Published As
Publication number | Publication date |
---|---|
CN104268061A (en) | 2015-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104268061B (en) | A kind of storage state monitoring method suitable for virtual machine | |
TWI746512B (en) | Physical machine fault classification processing method and device, and virtual machine recovery method and system | |
CN106789306B (en) | Method and system for detecting, collecting and recovering software fault of communication equipment | |
CN102937930B (en) | Application program monitoring system and method | |
CN102111310B (en) | Method and system for monitoring content delivery network (CDN) equipment status | |
CN102662821B (en) | Method, device and system for auxiliary diagnosis of virtual machine failure | |
CN105095001B (en) | Virtual machine abnormal restoring method under distributed environment | |
CN103607297A (en) | Fault processing method of computer cluster system | |
CN104731580A (en) | Automation operation and maintenance system based on Karaf and ActiveMQ and implement method thereof | |
CN103812699A (en) | Monitoring management system based on cloud computing | |
CN103605722A (en) | Method, device and equipment for database monitoring | |
CN103067209B (en) | A kind of heartbeat module self-sensing method | |
CN106301823A (en) | The fault alarming method of a kind of key component, device and big data management system | |
CN106021070A (en) | Method and device for server cluster monitoring | |
CN112601216B (en) | Zigbee-based trusted platform alarm method and system | |
CN114356499A (en) | Kubernetes cluster alarm root cause analysis method and device | |
CN102609350A (en) | Server memory failure alarm method | |
CN112799909A (en) | Automatic management system and method for server | |
CN105574590A (en) | Adaptive general control disaster recovery switching device and system, and signal generation method | |
CN111143167A (en) | Alarm merging method, device, equipment and storage medium for multiple platforms | |
CN104794041A (en) | Method for monitoring active state of array card for Linux server and device of method | |
CN103605592A (en) | Mechanism of detecting malfunctions of distributed computer system | |
CN105607973A (en) | Method, device and system for processing equipment failures in virtual machine system | |
CN108241565A (en) | A kind of system and method for being used to implement application system automation O&M | |
CN109460311A (en) | The management method and device of firmware abnormality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder | Address after: 523808 19th Floor, Cloud Computing Center, Chinese Academy of Sciences, No. 1 Kehui Road, Songshan Lake Hi-tech Industrial Development Zone, Dongguan City, Guangdong Province; Patentee after: G-Cloud Technology Co., Ltd.; Address before: 523808 No. 14 Building, Songke Garden, Songshan Lake Science and Technology Industrial Park, Dongguan City, Guangdong Province; Patentee before: G-Cloud Technology Co., Ltd. |