CN104394194A - Cloud system operation and maintenance monitoring method and system based on platform-as-a-service (PaaS) platform - Google Patents

Info

Publication number
CN104394194A
Authority
CN
China
Prior art keywords
platform
paas platform
fault
cloud system
paas
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410602814.2A
Other languages
Chinese (zh)
Inventor
杨辰业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Si Tech Information Technology Co Ltd
Original Assignee
Beijing Si Tech Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Si Tech Information Technology Co Ltd
Priority to CN201410602814.2A
Publication of CN104394194A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks

Abstract

The invention relates to a cloud system operation and maintenance monitoring method and system based on a platform-as-a-service (PaaS) platform. The method comprises: a first step of establishing a PaaS platform based on a cloud system architecture; a second step of predefining, on the established PaaS platform, multiple emergency channels for handling server faults; a third step of collecting operating data of the cloud system through the PaaS platform, obtaining server fault information from the collected operating data, and evaluating the fault type of the server fault information; and a fourth step in which the PaaS platform selects a predefined emergency channel to handle the server fault according to its type. The method overcomes the disadvantages of conventional cloud system monitoring systems, such as complex structure, high cost, and inability to handle faults efficiently.

Description

Cloud system operation and maintenance (O&M) monitoring method and system based on a PaaS platform
Technical field
The present invention relates to the field of computer science and technology, and in particular to a cloud system O&M monitoring method and system based on a PaaS platform.
Background art
A traditional IT system O&M monitoring method works as follows:
1) A monitoring system in the background continuously monitors indicators of the monitored system, such as CPU, memory, and storage, and raises an alarm when an indicator reaches its threshold. Monitoring staff respond when they see the alarm, for example by telephoning an engineer or by completing part of the work themselves.
2) When a problem occurs, a traditional maintenance and monitoring platform can only detect it; it cannot resolve it.
Under this traditional O&M monitoring mode, a separate monitoring system must be built, comprising hardware such as hosts, network, and storage as well as software such as network management and monitoring systems, and dedicated monitoring staff must be on duty. The traditional approach therefore brings several unavoidable problems:
1) the cost of building a separate monitoring system;
2) the cost of on-duty staff;
3) the problem of recovering the operated system after a fault occurs.
Summary of the invention
The technical problem to be solved by the present invention is to provide a cloud system O&M monitoring method and system based on a PaaS platform, addressing the problems that conventional cloud system monitoring systems are complex to build, costly, and unable to handle faults effectively.
The technical solution of the present invention to the above technical problem is as follows: a cloud system O&M monitoring method based on a PaaS platform, comprising:
Step 1, building a PaaS platform based on a cloud system architecture;
Step 2, predefining, in the built PaaS platform, multiple emergency channels for handling server faults;
Step 3, collecting operating data of the cloud system through the PaaS platform, obtaining server fault information from the collected operating data, and evaluating the fault type of the server fault information;
Step 4, the PaaS platform selecting, according to the type of server fault, a predefined emergency channel to handle the server fault.
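The flow of steps 2 to 4 can be illustrated with a small dispatch loop. This is only a minimal sketch under stated assumptions, not the implementation described in the patent: the names `Channel`, `Fault`, `make_channels`, `classify`, and `dispatch`, the fault labels, and the mapping from fault label to channel are all hypothetical, and the channel handlers merely record text instead of sending messages or calling a platform API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Channel(Enum):
    SMS_ALERT = "sms_alert"          # fault the platform cannot handle
    PLATFORM_AUTO = "platform_auto"  # fault the platform handles via its API
    TERMINAL_PUSH = "terminal_push"  # fault pushed to a processing terminal


@dataclass
class Fault:
    server: str
    kind: str  # hypothetical fault label, e.g. "load_threshold"


# Step 2: predefine the emergency channels (handlers here only record text).
def make_channels(log: List[str]) -> Dict[Channel, Callable[[Fault], None]]:
    return {
        Channel.SMS_ALERT: lambda f: log.append(f"SMS to maintainer: {f.kind} on {f.server}"),
        Channel.PLATFORM_AUTO: lambda f: log.append(f"auto-handle {f.kind} on {f.server}"),
        Channel.TERMINAL_PUSH: lambda f: log.append(f"push {f.kind} on {f.server} to terminal"),
    }


# Step 3 (evaluation part): decide the fault type. The mapping is an assumption.
def classify(fault: Fault) -> Channel:
    if fault.kind == "load_threshold":
        return Channel.PLATFORM_AUTO
    if fault.kind in {"config_error", "service_hang"}:
        return Channel.TERMINAL_PUSH
    return Channel.SMS_ALERT


# Step 4: select the predefined channel for each fault and invoke it.
def dispatch(faults: List[Fault], channels: Dict[Channel, Callable[[Fault], None]]) -> None:
    for fault in faults:
        channels[classify(fault)](fault)


log: List[str] = []
dispatch([Fault("web-1", "load_threshold"), Fault("db-1", "disk_failure")], make_channels(log))
```

Keeping the channels in a dictionary keyed by fault class means a new channel can be added without touching the dispatch loop, which matches the idea of channels being predefined before any fault occurs.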
On the basis of the above technical solution, the present invention can be further improved as follows.
Further, the emergency channels predefined in the PaaS platform comprise: an SMS alarm channel, a platform automatic processing channel, and a push-to-processing-terminal channel.
Further, in step 4, if the server fault is of a type that cannot be handled, the PaaS platform selects the SMS alarm channel and sends the fault information to maintenance personnel.
Further, in step 4, if the server fault is of a type that the platform can handle automatically, the PaaS platform selects the platform automatic processing channel and calls the corresponding API of the PaaS platform to handle the fault.
Further, faults that the platform can handle automatically include faults in which the load reaches a threshold; for such a fault, the PaaS platform selects the platform automatic processing channel and calls the corresponding API to dynamically adjust or migrate the load.
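As an illustration of what handling a load-threshold fault by migration might look like, the following sketch moves applications off an overloaded server. It is a hypothetical model only: the patent does not specify the PaaS API, so `rebalance`, the `THRESHOLD` value, and the server and application names are assumptions, and the "migration" is just a move between in-memory dictionaries.

```python
from typing import Dict, List, Tuple

THRESHOLD = 0.8  # assumed load threshold (fraction of server capacity)


def rebalance(loads: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, str]]:
    """Migrate applications off overloaded servers.

    `loads` maps server -> {application -> load share}. Returns the
    migrations performed as (application, source, target) tuples.
    """
    migrations: List[Tuple[str, str, str]] = []
    for source in list(loads):
        while sum(loads[source].values()) > THRESHOLD and len(loads[source]) > 1:
            # Pick the lightest application and the least-loaded other server.
            app = min(loads[source], key=loads[source].get)
            target = min((s for s in loads if s != source),
                         key=lambda s: sum(loads[s].values()), default=None)
            if target is None or sum(loads[target].values()) + loads[source][app] > THRESHOLD:
                break  # nothing can absorb the load; leave it to another channel
            loads[target][app] = loads[source].pop(app)
            migrations.append((app, source, target))
    return migrations


cluster = {
    "server-a": {"shop": 0.5, "blog": 0.2, "wiki": 0.2},
    "server-b": {"mail": 0.3},
}
moves = rebalance(cluster)
```

When no target can take the load, the sketch stops rather than forcing a move; in the scheme described here such a fault would then fall to one of the other channels.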
Further, for the push-to-processing-terminal channel, a connection is established in advance between the PaaS platform and a processing terminal for handling a particular type of server fault; when the corresponding server fault occurs, the PaaS platform selects the push-to-processing-terminal channel and pushes the server fault to the corresponding processing terminal.
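The idea of connections that are set up in advance and keyed by fault type can be sketched as a small registry. Everything here is an assumption made for illustration: `TerminalRegistry`, `connect`, and `push` are hypothetical names, and a plain Python list stands in for a live connection to a phone, notebook, or desktop.

```python
from collections import defaultdict
from typing import DefaultDict, List


class TerminalRegistry:
    """Keeps pre-established terminal connections, keyed by fault type."""

    def __init__(self) -> None:
        self._terminals: DefaultDict[str, List[List[str]]] = defaultdict(list)

    def connect(self, fault_type: str, inbox: List[str]) -> None:
        # A plain list stands in for a live connection to a terminal.
        self._terminals[fault_type].append(inbox)

    def push(self, fault_type: str, message: str) -> int:
        """Push the fault to every terminal registered for its type.

        Returns the number of terminals that received the message.
        """
        inboxes = self._terminals.get(fault_type, [])
        for inbox in inboxes:
            inbox.append(message)
        return len(inboxes)


registry = TerminalRegistry()
phone: List[str] = []
registry.connect("service_hang", phone)  # done ahead of time
delivered = registry.push("service_hang", "service_hang on web-2")
missed = registry.push("disk_failure", "disk_failure on db-1")
```

A return value of zero signals that no terminal is registered for that fault type, so the caller could fall back to another channel such as the SMS alarm.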
Further, the cloud system architecture uses the Sina SAE cloud platform or Google's GAE platform.
The technical solution of the present invention also includes a cloud system O&M monitoring system based on a PaaS platform, comprising:
a PaaS platform construction module, for building a PaaS platform based on a cloud system architecture;
an emergency channel construction module, for predefining, in the built PaaS platform, multiple emergency channels for handling server faults;
a fault classification module, for collecting operating data of the cloud system through the PaaS platform, obtaining server fault information from the collected operating data, and evaluating the fault type of the server fault information;
a fault processing module, for invoking the PaaS platform to select, according to the type of server fault, a predefined emergency channel to handle the server fault.
Further, the emergency channels predefined in the PaaS platform comprise: an SMS alarm channel, a platform automatic processing channel, and a push-to-processing-terminal channel;
the SMS alarm channel is used to send the fault information to maintenance personnel when the server fault is of a type that cannot be handled;
the platform automatic processing channel is used to call the corresponding API of the PaaS platform to handle the fault when the server fault is of a type that the platform can handle automatically;
the push-to-processing-terminal channel is used to establish in advance a connection between the PaaS platform and a processing terminal that handles a particular type of server fault, and to push the server fault to the corresponding processing terminal when that fault occurs.
Further, the processing terminal comprises a smartphone, a notebook computer, and/or a desktop computer.
The beneficial effects of the present invention are as follows. The invention reduces the cost of monitoring and maintaining a cloud system: monitoring a cloud system normally requires dedicated staff on site, whereas this method achieves the effect of on-site maintenance without anyone being on site. The invention also shortens fault handling time by classifying faults and handling them promptly, and it uses the characteristics of cloud computing to adjust software and hardware configuration dynamically and automatically, for example by elastically adding or removing hardware resources, which saves power, energy, and cooling. Overall, it helps improve the availability and stability of the business cloud system, strengthens the company's competitiveness, and lowers the workload of, and the entry barrier for, monitoring and maintenance personnel.
Brief description of the drawings
Fig. 1 is a flow chart of the cloud system O&M monitoring method based on a PaaS platform according to the present invention;
Fig. 2 is a structural diagram of the cloud system O&M monitoring system based on a PaaS platform according to the present invention;
Fig. 3 is a logic diagram of how the PaaS platform processes data in the embodiment;
Fig. 4 is a physical equipment deployment diagram for the case in which a processing terminal is used in the embodiment.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, this embodiment provides a cloud system O&M monitoring method based on a PaaS platform, comprising:
Step 1, building a PaaS platform based on a cloud system architecture;
Step 2, predefining, in the built PaaS platform, multiple emergency channels for handling server faults;
Step 3, collecting operating data of the cloud system through the PaaS platform, obtaining server fault information from the collected operating data, and evaluating the fault type of the server fault information;
Step 4, the PaaS platform selecting, according to the type of server fault, a predefined emergency channel to handle the server fault.
In this embodiment, the cloud system architecture uses the Sina SAE cloud platform or Google's GAE platform, a cloud-computing-based system that can run around the clock (7×24 hours); the PaaS platform that is built is referred to as AppEngine.
This embodiment proposes a new O&M mode for cloud system monitoring that handles server faults intelligently: the fault handling process evaluates the type of each server fault and carries out emergency handling by category. The emergency channels predefined in the PaaS platform are mainly of three kinds: an SMS alarm channel, a platform automatic processing channel, and a push-to-processing-terminal channel.
The SMS alarm channel is used to send the fault information to maintenance personnel when the server fault is of a type that cannot be handled.
The platform automatic processing channel is used to call the corresponding API of the PaaS platform to handle the fault when the server fault is of a type that the platform can handle automatically.
The push-to-processing-terminal channel is used to establish in advance a connection between the PaaS platform and a processing terminal that handles a particular type of server fault, and to push the server fault to the corresponding processing terminal when that fault occurs.
Specifically, if the server fault is of a type that cannot be handled, the PaaS platform selects the SMS alarm channel and sends the fault information to maintenance personnel.
If the server fault is of a type that the platform can handle automatically, the PaaS platform selects the platform automatic processing channel and calls the corresponding API of the PaaS platform to handle the fault.
If the server fault is of a type that a configured processing terminal can handle, the push-to-processing-terminal channel is used: a connection between the PaaS platform and the processing terminal for that particular type of server fault is established in advance, and when the fault occurs, the PaaS platform selects the push-to-processing-terminal channel and pushes the server fault to the corresponding processing terminal.
In this embodiment, the processing terminal comprises smart devices such as a smartphone, a notebook computer, and/or a desktop computer, so that maintenance personnel can handle server faults promptly in a variety of circumstances.
Correspondingly, as shown in Fig. 2, this embodiment provides a cloud system O&M monitoring system based on a PaaS platform, comprising:
a PaaS platform construction module, for building a PaaS platform based on a cloud system architecture;
an emergency channel construction module, for predefining, in the built PaaS platform, multiple emergency channels for handling server faults;
a fault classification module, for collecting operating data of the cloud system through the PaaS platform, obtaining server fault information from the collected operating data, and evaluating the fault type of the server fault information;
a fault processing module, for invoking the PaaS platform to select, according to the type of server fault, a predefined emergency channel to handle the server fault.
The specific implementation process of this cloud system O&M monitoring system is consistent with that of the cloud system O&M monitoring method above and is not repeated here.
As shown in Fig. 3, O&M monitoring in this embodiment consists of three main parts, namely data collection, data classification, and channel matching:
1) Through the API of the PaaS platform, collect the usage of the cloud platform, such as CPU, disk, network, and volume size.
2) Analyze the collected data, divide it into three classes, and select the corresponding emergency channel:
First class: the fault cannot be handled. The alarm procedure is entered, and the system sends a message through the dedicated SMS alarm channel to the contact details of the designated maintenance and development personnel.
Second class: analysis shows that the fault can be handled automatically, such as a load reaching its threshold. The PaaS API is called automatically to dynamically adjust or migrate the load of the affected application.
Third class: the system pushes the system troubleshooting desktop directly to the processing terminal of the person responsible, who then handles the fault from a mobile device, desktop computer, or conventional notebook.
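The collection and classification parts can be sketched as a function that maps one server's collected usage figures to one of the three classes. This is a minimal sketch under loud assumptions: the patent does not say which measurements lead to which class, so the thresholds, the `reachable` flag, the rule that a full disk needs a person, and the names `FaultClass` and `classify` are all hypothetical.

```python
from enum import Enum
from typing import Dict, Optional

LOAD_THRESHOLD = 0.9   # assumed CPU load threshold
DISK_THRESHOLD = 0.95  # assumed disk usage threshold


class FaultClass(Enum):
    UNHANDLEABLE = "sms_alarm"   # first class: alert maintainers by SMS
    AUTO = "platform_auto"       # second class: platform API adjusts or migrates load
    TERMINAL = "terminal_push"   # third class: push to the responsible person's terminal


def classify(usage: Dict[str, float], reachable: bool = True) -> Optional[FaultClass]:
    """Map one server's collected usage data to a fault class, or None if healthy."""
    if not reachable:
        # Assumption: a server that returns no data cannot be fixed by the platform.
        return FaultClass.UNHANDLEABLE
    if usage.get("cpu", 0.0) >= LOAD_THRESHOLD:
        # Load at threshold: the platform can adjust or migrate the load itself.
        return FaultClass.AUTO
    if usage.get("disk", 0.0) >= DISK_THRESHOLD:
        # Assumption: a full disk needs a person, so push it to their terminal.
        return FaultClass.TERMINAL
    return None


# Hypothetical collected data: server -> (usage fractions, reachable flag).
collected = {
    "web-1": ({"cpu": 0.97, "disk": 0.40, "network": 0.10, "volume": 0.55}, True),
    "db-1": ({"cpu": 0.35, "disk": 0.96, "network": 0.20, "volume": 0.92}, True),
    "cache-1": ({}, False),
    "web-2": ({"cpu": 0.20, "disk": 0.30, "network": 0.05, "volume": 0.10}, True),
}
result = {name: classify(usage, ok) for name, (usage, ok) in collected.items()}
```

Each enum value names the channel it corresponds to, so the matching step reduces to a lookup once the class is known.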
As shown in Fig. 4, for the third class, the physical deployment connects the three platforms in series for monitoring and maintenance.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A cloud system O&M monitoring method based on a PaaS platform, characterized by comprising:
Step 1, building a PaaS platform based on a cloud system architecture;
Step 2, predefining, in the built PaaS platform, multiple emergency channels for handling server faults;
Step 3, collecting operating data of the cloud system through the PaaS platform, obtaining server fault information from the collected operating data, and evaluating the fault type of the server fault information;
Step 4, the PaaS platform selecting, according to the type of server fault, a predefined emergency channel to handle the server fault.
2. The cloud system O&M monitoring method based on a PaaS platform according to claim 1, characterized in that the emergency channels predefined in the PaaS platform comprise: an SMS alarm channel, a platform automatic processing channel, and a push-to-processing-terminal channel.
3. The cloud system O&M monitoring method based on a PaaS platform according to claim 2, characterized in that, in step 4, if the server fault is of a type that cannot be handled, the PaaS platform selects the SMS alarm channel and sends the fault information to maintenance personnel.
4. The cloud system O&M monitoring method based on a PaaS platform according to claim 2, characterized in that, in step 4, if the server fault is of a type that the platform can handle automatically, the PaaS platform selects the platform automatic processing channel and calls the corresponding API of the PaaS platform to handle the fault.
5. The cloud system O&M monitoring method based on a PaaS platform according to claim 4, characterized in that faults that the platform can handle automatically include faults in which the load reaches a threshold; for such a fault, the PaaS platform selects the platform automatic processing channel and calls the corresponding API to dynamically adjust or migrate the load.
6. The cloud system O&M monitoring method based on a PaaS platform according to claim 2, characterized in that, for the push-to-processing-terminal channel, a connection is established in advance between the PaaS platform and a processing terminal for handling a particular type of server fault; when the corresponding server fault occurs, the PaaS platform selects the push-to-processing-terminal channel and pushes the server fault to the corresponding processing terminal.
7. The cloud system O&M monitoring method based on a PaaS platform according to claim 1, characterized in that the cloud system architecture uses the Sina SAE cloud platform or Google's GAE platform.
8. A cloud system O&M monitoring system based on a PaaS platform, characterized by comprising:
a PaaS platform construction module, for building a PaaS platform based on a cloud system architecture;
an emergency channel construction module, for predefining, in the built PaaS platform, multiple emergency channels for handling server faults;
a fault classification module, for collecting operating data of the cloud system through the PaaS platform, obtaining server fault information from the collected operating data, and evaluating the fault type of the server fault information;
a fault processing module, for invoking the PaaS platform to select, according to the type of server fault, a predefined emergency channel to handle the server fault.
9. The cloud system O&M monitoring system based on a PaaS platform according to claim 8, characterized in that the emergency channels predefined in the PaaS platform comprise: an SMS alarm channel, a platform automatic processing channel, and a push-to-processing-terminal channel;
the SMS alarm channel is used to send the fault information to maintenance personnel when the server fault is of a type that cannot be handled;
the platform automatic processing channel is used to call the corresponding API of the PaaS platform to handle the fault when the server fault is of a type that the platform can handle automatically;
the push-to-processing-terminal channel is used to establish in advance a connection between the PaaS platform and a processing terminal that handles a particular type of server fault, and to push the server fault to the corresponding processing terminal when that fault occurs.
10. The cloud system O&M monitoring system based on a PaaS platform according to claim 9, characterized in that the processing terminal comprises a smartphone, a notebook computer, and/or a desktop computer.
CN201410602814.2A 2014-10-31 2014-10-31 Cloud system operation and maintenance monitoring method and system based on platform-as-a-service (PaaS) platform Pending CN104394194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410602814.2A CN104394194A (en) 2014-10-31 2014-10-31 Cloud system operation and maintenance monitoring method and system based on platform-as-a-service (PaaS) platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410602814.2A CN104394194A (en) 2014-10-31 2014-10-31 Cloud system operation and maintenance monitoring method and system based on platform-as-a-service (PaaS) platform

Publications (1)

Publication Number Publication Date
CN104394194A true CN104394194A (en) 2015-03-04

Family

ID=52612029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410602814.2A Pending CN104394194A (en) 2014-10-31 2014-10-31 Cloud system operation and maintenance monitoring method and system based on platform-as-a-service (PaaS) platform

Country Status (1)

Country Link
CN (1) CN104394194A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106209920A (en) * 2016-09-19 2016-12-07 贵州白山云科技有限公司 The safety protecting method of a kind of dns server and device
CN106502857A (en) * 2015-09-07 2017-03-15 上海隆通网络系统有限公司 A kind of intellectual analysis interference method and system in IT operation management system
CN106897193A (en) * 2017-02-28 2017-06-27 郑州云海信息技术有限公司 A kind of monitoring and operation managing system of the cloud data center based on ITIL
CN107295311A (en) * 2017-08-03 2017-10-24 国网新疆电力公司疆南供电公司 The operation management system of video monitoring platform
CN107454613A (en) * 2017-09-06 2017-12-08 上海斐讯数据通信技术有限公司 The optimization method and system of a kind of wireless network
CN107659422A (en) * 2016-07-25 2018-02-02 中兴通讯股份有限公司 A kind of fault message querying method and device
CN107968816A (en) * 2017-11-13 2018-04-27 国云科技股份有限公司 A kind of method that cloud platform is built using mobile terminal
CN108632057A (en) * 2017-03-17 2018-10-09 华为技术有限公司 A kind of fault recovery method of cloud computing server, device and management system
CN112783625A (en) * 2021-01-18 2021-05-11 北京思特奇信息技术股份有限公司 Emergency multi-task short message processing group sending system and method
CN114866546A (en) * 2022-04-20 2022-08-05 北京红山信息科技研究院有限公司 PaaS-based one-stop management system for monitoring platform
CN115061839A (en) * 2022-04-12 2022-09-16 南京信易达计算技术有限公司 High-performance platform monitoring operation and maintenance system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167004A (en) * 2011-12-15 2013-06-19 中国移动通信集团上海有限公司 Cloud platform host system fault correcting method and cloud platform front control server
CN103778031A (en) * 2014-01-15 2014-05-07 华中科技大学 Distributed system multilevel fault tolerance method under cloud environment
CN103986623A (en) * 2014-05-28 2014-08-13 山东超越数控电子有限公司 Automatic hardware equipment monitoring system based on domestic operating system

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502857A (en) * 2015-09-07 2017-03-15 上海隆通网络系统有限公司 A kind of intellectual analysis interference method and system in IT operation management system
CN107659422A (en) * 2016-07-25 2018-02-02 中兴通讯股份有限公司 A kind of fault message querying method and device
CN107659422B (en) * 2016-07-25 2021-06-15 中兴通讯股份有限公司 Fault information query method and device
CN106209920B (en) * 2016-09-19 2019-11-22 贵州白山云科技股份有限公司 A kind of safety protecting method and device of dns server
CN106209920A (en) * 2016-09-19 2016-12-07 贵州白山云科技有限公司 The safety protecting method of a kind of dns server and device
CN106897193A (en) * 2017-02-28 2017-06-27 郑州云海信息技术有限公司 A kind of monitoring and operation managing system of the cloud data center based on ITIL
CN108632057A (en) * 2017-03-17 2018-10-09 华为技术有限公司 A kind of fault recovery method of cloud computing server, device and management system
CN107295311A (en) * 2017-08-03 2017-10-24 国网新疆电力公司疆南供电公司 The operation management system of video monitoring platform
CN107454613A (en) * 2017-09-06 2017-12-08 上海斐讯数据通信技术有限公司 The optimization method and system of a kind of wireless network
CN107968816B (en) * 2017-11-13 2020-10-27 国云科技股份有限公司 Method for building cloud platform by using mobile terminal
CN107968816A (en) * 2017-11-13 2018-04-27 国云科技股份有限公司 A kind of method that cloud platform is built using mobile terminal
CN112783625A (en) * 2021-01-18 2021-05-11 北京思特奇信息技术股份有限公司 Emergency multi-task short message processing group sending system and method
CN112783625B (en) * 2021-01-18 2023-10-24 北京思特奇信息技术股份有限公司 Emergency multitasking short message group sending system and method
CN115061839A (en) * 2022-04-12 2022-09-16 南京信易达计算技术有限公司 High-performance platform monitoring operation and maintenance system and method
CN114866546A (en) * 2022-04-20 2022-08-05 北京红山信息科技研究院有限公司 PaaS-based one-stop management system for monitoring platform
CN114866546B (en) * 2022-04-20 2024-03-22 北京红山信息科技研究院有限公司 PaaS-based one-stop management system for monitoring platform

Similar Documents

Publication Publication Date Title
CN104394194A (en) Cloud system operation and maintenance monitoring method and system based on platform-as-a-service (PaaS) platform
CN104469305B (en) The fault detection method and device of power network video monitoring device
CN104038373B (en) information early warning and self-repairing system and method
CN104410163B (en) A kind of safety in production based on electric energy management system and power-economizing method
CN103986604A (en) Method and device for locating network fault
CN103812675A (en) Method and system for realizing allopatric disaster recovery switching of service delivery platform
CN110891283A (en) Small base station monitoring device and method based on edge calculation model
CN104021438A (en) Method for monitoring physical equipment in business system based on business model and device thereof
CN103490919A (en) Fault management system and fault management method
CN105871581A (en) Method and device for processing of alarm information in cloud calculation
CN106357469A (en) Dynamic adjustment method and device of resource monitoring mode
CN112711493A (en) Scenario root cause analysis application
CN103366245B (en) Electric network fault information issuing method based on OSB bus and system
CN103763143A (en) Method and system for equipment abnormality alarming based on storage server
CN103824017A (en) Method and platform for monitoring rogue programs
CN105553766A (en) Monitoring method of abnormal node dynamic tracking cluster node state
CN105025179A (en) Method and system for monitoring service agents of call center
KR101433045B1 (en) System and method for detecting error beforehand
CN110209497B (en) Method and system for dynamically expanding and shrinking host resource
CN107820051A (en) Monitoring system and its monitoring method and device
CN115102838B (en) Emergency processing method and device for server downtime risk and electronic equipment
CN105656700B (en) A kind of distributing computer room comprehensively monitoring and automatic emergency decision-making treatment method and device
CN108021463B (en) GPU fault management method based on finite-state machine
CN103457792A (en) Fault detection method and fault detection device
CN104658053A (en) Automatic inspection auxiliary system for running state of dispatching control system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150304
