CN104391777B - Cloud platform based on Linux operating system, and operation monitoring method and device thereof - Google Patents
- Publication number
- CN104391777B (application CN201410635137.4A)
- Authority
- CN
- China
- Prior art keywords
- service
- monitoring target
- upstart
- cloud platform
- target service
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention provides a cloud platform based on a Linux operating system, together with an operation monitoring method and an operation monitoring device for it. The operation monitoring method includes: obtaining the monitored target services running on the Linux-based cloud platform; starting the upstart process corresponding to each monitored target service; and, after the upstart process captures a crash event of a monitored target service, executing the upstart job that recovers that service, so as to repair the crashed service. With this scheme, a service that crashes abnormally can be restarted and repaired automatically without the existing hot-standby approach, which improves the stability and flexibility of the cloud platform.
Description
Technical field
The present invention relates to the computer field, and more particularly to a cloud platform based on a Linux operating system and to an operation monitoring method and an operation monitoring device for it.
Background
Cloud computing is an Internet-based mode of computing in which shared software and hardware resources and information can be provided to computers and other devices on demand. A cloud platform is the architecture on which applications run in cloud computing; as the name suggests, such a platform lets developers either run the programs they have written "in the cloud" or use services provided "in the cloud".
At present, many information technology companies have entered the cloud computing field one after another and launched their own cloud platforms, but problems with these platforms have also been continually exposed. For example, Amazon's cloud platform suffered widespread failures between 2007 and 2008, and in 2009 Microsoft's cloud platform crashed with serious consequences. How to safeguard the stability of a cloud platform has therefore become an important topic.
In a traditional cloud environment, cloud platform services run on servers, and once a service crashes, the cloud platform can no longer operate normally. Although a cloud platform can be protected by a high-availability mechanism, such mechanisms are cumbersome to install and configure, expensive to maintain, and when a fault does occur its cause is difficult to locate. Fig. 1 is a schematic diagram of a prior-art protection mechanism for cloud platform services. As shown in Fig. 1, while the cloud platform is running normally, once a process crashes, the high-availability mechanism automatically enables a standby host and then repairs the fault; after the fault has been handled, the cloud platform is restored to operation. Although this mechanism safeguards the stability of the cloud platform, it requires a standby host, which is costly and lengthens operation and maintenance time, and if the standby host performs poorly, normal use is also affected.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a cloud platform based on a Linux operating system, and an operation monitoring method and device for it, that overcome the above problems or at least partially solve them.
One object of the invention is to provide a method for improving the operational reliability of a cloud platform based on a Linux operating system.
A further object of the invention is to avoid the cost of a standby host by repairing crashed services automatically.
According to one aspect of the invention, an operation monitoring method for a cloud platform based on a Linux operating system is provided. The method includes: obtaining the monitored target services running on the cloud platform; starting the upstart process corresponding to each monitored target service; and, after the upstart process captures a crash event of a monitored target service, executing the upstart job that recovers that service, so as to repair the crashed service.
Optionally, starting the upstart process corresponding to the monitored target service includes: executing an upstart script written in advance according to the operating characteristics of the monitored target service, where the upstart script defines the start condition and the stop condition of the upstart process, as well as the job that responds to a crash event of the monitored target service.
Optionally, executing the upstart job that recovers the monitored target service includes: restarting the monitored target service and/or restoring its configuration.
Optionally, after the upstart job that recovers the monitored target service has been executed, the method further includes: generating a crash log for the monitored target service.
Optionally, generating the crash log includes: obtaining crash information of the monitored target service and writing it into the system log, where the crash information includes any one or more of the following: the crash time of the monitored target service, the processor state when the service crashed, and the memory state when the service crashed.
Optionally, multiple monitored target services run on the cloud platform, each corresponding to one preset upstart process, and the monitored target services include: a cloud storage service, a cloud platform back-end service, and a cloud platform web service.
According to another aspect of the invention, an operation monitoring device for a cloud platform based on a Linux operating system is provided. The device includes: a target acquisition module, configured to obtain the monitored target services running on the cloud platform; a process start module, configured to start the upstart process corresponding to each monitored target service; and a process recovery module, configured to, after the upstart process captures a crash event of a monitored target service, execute the upstart job that recovers that service, so as to repair the crashed service.
Optionally, the process start module is further configured to execute an upstart script written in advance according to the operating characteristics of the monitored target service, where the upstart script defines the start condition and the stop condition of the upstart process and the job that responds to a crash event of the monitored target service; the job includes restarting the monitored target service and/or restoring its configuration.
Optionally, the operation monitoring device further includes a log generation module configured to obtain crash information of the monitored target service and write it into the system log, where the crash information includes any one or more of the following: the crash time of the monitored target service, the processor state when it crashed, and the memory state when it crashed.
According to yet another aspect of the invention, a cloud platform based on a Linux operating system is provided, which includes any of the operation monitoring devices described above.
The operation monitoring method of the invention uses the upstart process set up for each monitored target service to capture crashes of that service, and uses the corresponding upstart job to recover it. Thus, when a service crashes abnormally, it can be restarted and repaired automatically without the existing hot-standby approach, which improves the stability and flexibility of the cloud platform.
Further, the method can automatically record service crash logs, which helps locate the cause and the point of failure of a service crash and reduces maintenance cost.
Further, the method does not require modifying the code of the cloud platform itself: it applies crash protection directly to cloud platform services at process granularity, with broad coverage that comprehensively protects the platform's storage, computing, and management processes.
Further, because the method works by executing upstart scripts written in advance, the protection is flexible: the monitored target services it protects can be dynamically added, removed, enabled, and disabled.
Other features and advantages of the invention will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.
The above is only an overview of the technical solution of the invention. Embodiments of the invention are given below so that its technical means can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the invention become more apparent.
From the following detailed description of specific embodiments, read together with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the invention.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same components are denoted by the same reference numerals. In the drawings:
Fig. 1 is the schematic diagram of the protection mechanism of cloud platform service in the prior art;
Fig. 2 is a schematic diagram of an operation monitoring device for a cloud platform based on a Linux operating system according to an embodiment of the invention;
Fig. 3 is a schematic diagram of an operation monitoring method for a cloud platform based on a Linux operating system according to an embodiment of the invention;
Fig. 4 is a flowchart of a specific implementation of the operation monitoring method for a cloud platform based on a Linux operating system according to an embodiment of the invention; and
Fig. 5 is a flowchart of an optional flow in which the operation monitoring method for a cloud platform based on a Linux operating system according to an embodiment of the invention monitors the Nkscloud service.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings and examples, so that the way the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the invention and the features within each embodiment may be combined with one another, and the resulting technical solutions all fall within the scope of protection of the invention.
In addition, the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Fig. 2 is a schematic diagram of the operation monitoring device for a cloud platform based on a Linux operating system according to an embodiment of the invention. As shown, the operation monitoring device 100 may generally include a target acquisition module 110, a process start module 120, and a process recovery module 130. In some optional embodiments, a log generation module 140 may also be provided.
Linux is a free, freely distributed Unix-like operating system. It is a multi-user, multitasking operating system based on POSIX (Portable Operating System Interface) and UNIX that supports multithreading and multiple CPUs. Linux is known for its efficiency and flexibility; its modular design allows it to run even on expensive workstations, and its multitasking and multi-user capabilities have made it widely used in cloud platforms.
Among the components of the operation monitoring device 100 of this embodiment, the target acquisition module 110 is configured to obtain the monitored target services running on the cloud platform. A cloud platform running on a Linux operating system is usually responsible for many services, so the monitored target services obtained by the target acquisition module 110 are preferably those that directly affect the operational reliability of the cloud platform, such as the cloud storage service, the cloud platform back-end service, and the cloud platform web service.
The process start module 120 is configured to start the upstart process corresponding to each monitored target service. Upstart is an event-based init daemon intended to replace the traditional init of Linux systems. Because upstart is event-based, both starting and stopping are driven by the communication of events. Each monitored target service can correspond to one preset upstart process.
Upstart uses events to start and stop system services, and it works in parallel: as soon as an event occurs, the upstart services waiting on it can start concurrently. In an embodiment of the invention, a corresponding upstart script can be written in advance according to the operating characteristics of each monitored target service. The script takes the process of each service as its object and, when that process crashes abnormally, automatically restarts and repairs it. The process start module 120 then executes the upstart script written in advance according to the operating characteristics of the monitored target service, where the script defines the start condition and the stop condition of the upstart process and the job that responds to a crash event of the monitored target service.
In upstart, jobs and events are two important concepts. A job performs a piece of work, such as starting a background service or running a configuration command. Each job waits for one or more events, and once such an event occurs, upstart triggers the job to carry out its work. In upstart, an event takes the concrete form of a notification message: whenever an event occurs, upstart broadcasts a message to the whole system. In other words, once an event occurs, all jobs and other events in the system are notified.
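The job/event relationship described above can be illustrated with a deliberately simplified Python model. This is not upstart's implementation or API: events are plain strings, the `nkscloud-crashed` event name is made up, and only the start side of a job is modeled. Every emitted event is offered to all registered jobs, matching jobs run, and each run announces a follow-up `started <job>` event (loosely mirroring upstart's `started` event) that other jobs can chain on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    start_on: str                               # event name that triggers this job
    work: Callable[[Dict[str, object]], None]   # what the job does when triggered


@dataclass
class EventBus:
    jobs: List[Job] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def register(self, job: Job) -> None:
        self.jobs.append(job)

    def emit(self, event: str, **env: object) -> List[str]:
        """Broadcast an event to all jobs and run those waiting for it.

        Returns the names of every job triggered, including jobs chained
        through the follow-up "started <job>" events.
        """
        self.history.append(event)
        triggered: List[str] = []
        for job in list(self.jobs):        # every job is notified ...
            if job.start_on == event:      # ... but only matching jobs run
                job.work(env)
                triggered.append(job.name)
                # Announce the state change as a new event so that other
                # jobs can chain on it.
                triggered += self.emit(f"started {job.name}")
        return triggered


if __name__ == "__main__":
    bus = EventBus()
    actions: List[str] = []
    bus.register(Job("nkscloud", "nkscloud-crashed",
                     lambda env: actions.append(f"restart (old pid {env['pid']})")))
    bus.register(Job("crashlog", "started nkscloud",
                     lambda env: actions.append("write crash log")))
    print(bus.emit("nkscloud-crashed", pid=4242))  # ['nkscloud', 'crashlog']
    print(actions)  # ['restart (old pid 4242)', 'write crash log']
```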
In this embodiment, the operation monitoring device 100 defines a separate upstart job for each monitored target service, and each job can include functions such as service start, service configuration, service repair, and log generation. Once upstart is running, whenever its corresponding monitored target service crashes abnormally, upstart sends a message to the system; the system receives the message, triggers the corresponding job, and carries out all or part of the work of service restart, configuration, repair, and log generation.
The process recovery module 130 is configured to, after the upstart process captures a crash event of a monitored target service, execute the upstart job that recovers that service, so as to repair the crashed service. This upstart job can include restarting the monitored target service and/or restoring its configuration.
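A hedged Python sketch of such a recovery job ("restart the service and/or restore its configuration"): if the live configuration file is missing or differs byte-for-byte from a known-good copy, the copy is written back, and then the service is restarted. The file names are hypothetical, and the restart action is passed in as a callable because the patent does not say how the restart is issued; on an upstart system it could, for example, run upstart's `restart <job>` command through `subprocess`.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def recover_service(config: Path, known_good: Path, restart: Callable[[], None]) -> bool:
    """Restore a service's configuration if needed, then restart the service.

    If `config` is missing or its bytes differ from `known_good`, the
    known-good copy is written back first. `restart` performs the actual
    restart. Returns True if the configuration had to be restored.
    """
    restored = False
    if not config.exists() or config.read_bytes() != known_good.read_bytes():
        shutil.copy2(known_good, config)   # put the known-good configuration back
        restored = True
    restart()                              # restart in every case
    return restored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "nkscloud.conf.good"   # hypothetical known-good copy
        live = Path(tmp) / "nkscloud.conf"        # hypothetical live config
        good.write_text("port=8080\n")
        live.write_text("port=\n")                # simulate a damaged config
        print(recover_service(live, good, lambda: print("restarting nkscloud")))
        print(live.read_text(), end="")
```

Keeping the restart step outside the function lets the same restore logic be reused whether the restart comes from upstart's own `respawn` or from an explicit command.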
Preferably, while the process recovery module 130 is working, the log generation module 140 can obtain crash information of the monitored target service and write it into the system log, where the crash information includes any one or more of the following: the crash time of the monitored target service, the processor state when it crashed, and the memory state when it crashed. The log written by the log generation module 140 records key information about the Linux system after the monitored target service crashes, which makes it easier to analyze the cause of the crash, locate the problem, and shorten the time needed to repair the platform.
An embodiment of the invention further provides a cloud platform based on a Linux operating system, which includes the operation monitoring device of any of the embodiments above. The cloud platform of this embodiment is simple to deploy in hardware, can significantly reduce the probability of platform crashes, and improves product stability. At the same time, the automatic service repair function reduces maintenance cost, and the crash-information logging function improves operation and maintenance efficiency, providing strong protection for the cloud platform.
This embodiment further provides an operation monitoring method for a cloud platform based on a Linux operating system, which can be executed by the operation monitoring device 100 of any of the embodiments above. The method uses upstart to monitor target services and repairs them automatically after a crash, improving the operation and maintenance efficiency of the cloud platform. Fig. 3 is a schematic diagram of the operation monitoring method according to an embodiment of the invention. As shown, the method includes:
Step S302: obtain the monitored target services running on the cloud platform based on the Linux operating system;
Step S304: start the upstart process corresponding to each monitored target service;
Step S306: after the upstart process captures a crash event of a monitored target service, execute the upstart job that recovers that service, so as to repair the crashed service.
Multiple monitored target services run on the cloud platform. Services that strongly affect the normal operation of the cloud platform can be selected as monitored target services, each corresponding to one preset upstart process; they can include a cloud storage service, a cloud platform back-end service, and a cloud platform web service. Step S302 obtains the targets to be monitored from the services running on the cloud platform.
One optional flow for step S304 is: execute an upstart script written in advance according to the operating characteristics of the monitored target service, where the script defines the start condition and the stop condition of the upstart process and the job that responds to a crash event of the monitored target service.
The job executed in step S306 can include restarting the monitored target service and/or restoring its configuration.
To make it easier to locate the point of failure, the method can further include, after step S306, generating a crash log for the monitored target service. Specifically, crash information of the monitored target service can be obtained and written into the system log, where the crash information includes any one or more of the following: the crash time of the service, the processor state when it crashed, and the memory state when it crashed. One optional way to write the system log is a shell logging script whose function is to record the crash information.
The shell is an interface through which the user interacts with the kernel: it receives the commands the user enters and passes them to the kernel for execution, and shell programming means writing scripts that execute a batch of user commands. In the operation monitoring method of this embodiment, a shell logging script can record the crash log of the monitored target service, facilitating later analysis and fault location.
The operation monitoring method of this embodiment is built on Linux upstart: it monitors the service processes of the cloud platform, recovers them automatically when a service crashes, and records a log of each crash, improving operation and maintenance efficiency. In addition, the method can be configured flexibly for different cloud platforms to monitor different target services.
Taking the NeoKylin secure cloud operating system as an example, an implementation of the operation monitoring method for a cloud platform based on a Linux operating system according to an embodiment of the invention is described below.
The NeoKylin secure cloud operating system involves protecting multiple processes. The crash protection mechanism is illustrated below using three main services: the cloud storage service (Gluster Manager Console, GMC), the cloud platform back-end service (NeoKylin Security Cloud, Nkscloud), and the cloud platform web service (NeoKylin Security Cloud-Web, Nkscloud_web).
Fig. 4 is a flowchart of a specific implementation of the operation monitoring method for a cloud platform based on a Linux operating system according to an embodiment of the invention. As shown in Fig. 4, the GMC, Nkscloud, and Nkscloud_web services run simultaneously on the Linux-based cloud platform provided in the embodiments above. After the operation monitoring method of the above embodiments is executed, each of these three services, as a monitored target service, is placed under the monitoring of an upstart process: it is restarted and repaired automatically after a crash, and a log is recorded after each crash. The monitoring flow can be as follows.
The operation monitoring device of this embodiment is installed and configured in advance on the Linux operating system to execute the operation monitoring method described above. The GMC, Nkscloud, and Nkscloud_web services are restarted automatically through the upstart mechanism, and after restarting, each service process is again monitored by the operation monitoring method.
When a process of the GMC, Nkscloud, or Nkscloud_web service crashes, upstart executes the corresponding job in response to the crash event, automatically restarts the service, and records its crash log. For faults that cannot be repaired automatically, operation and maintenance staff can quickly locate the problem from the generated crash log and deal with it promptly.
Fig. 5 is a flowchart of an optional flow in which the operation monitoring method for a cloud platform based on a Linux operating system according to an embodiment of the invention monitors the Nkscloud service. As shown in Fig. 5, the flow includes the following steps:
Step S502: capture the abnormal crash event of the Nkscloud service;
Step S504: upstart sends a message to the Linux system;
Step S506: execute the upstart job corresponding to the Nkscloud service;
Step S508: restore the Nkscloud service to normal operation.
Before the Nkscloud service is monitored, the start and stop dependency conditions of its upstart monitoring process can be defined first, for example:
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
In the code above, "start on" gives the start dependency condition of the monitoring: monitoring of the nkscloud service starts when the rc job has stopped (that is, has finished running) with RUNLEVEL equal to 2, 3, 4, or 5.
"stop on" gives the stop dependency condition: monitoring of the nkscloud service is stopped when the system switches to any runlevel other than 2, 3, 4, or 5.
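Upstart compares event values such as `RUNLEVEL` against glob patterns, so `[2345]` matches a single character 2, 3, 4 or 5, and `[!2345]` matches any single character other than those. The Python sketch below uses the standard `fnmatch` module, whose bracket syntax treats these two patterns the same way, to tabulate for each SysV runlevel whether the two stanzas above would start or stop the nkscloud monitoring job. It models only these two conditions, not upstart's full event matching.

```python
from fnmatch import fnmatchcase

START_PATTERN = "[2345]"   # from: start on stopped rc RUNLEVEL=[2345]
STOP_PATTERN = "[!2345]"   # from: stop on runlevel [!2345]


def should_start(rc_runlevel: str) -> bool:
    """True if the job would start when rc stops with this RUNLEVEL value."""
    # fnmatchcase: case-sensitive glob match, no OS-specific normalisation.
    return fnmatchcase(rc_runlevel, START_PATTERN)


def should_stop(new_runlevel: str) -> bool:
    """True if the job would stop when the system enters this runlevel."""
    return fnmatchcase(new_runlevel, STOP_PATTERN)


if __name__ == "__main__":
    # 0 = halt, 1/S = single-user, 2-5 = multi-user, 6 = reboot
    for level in "0123456S":
        print(level,
              "start" if should_start(level) else "-",
              "stop" if should_stop(level) else "-")
```

The output makes the complementarity of the two stanzas visible: every runlevel either permits the job to run or stops it, never both.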
After step S502 captures the abnormal crash event of the Nkscloud service, the defined response mechanism includes:
respawn: when the nkscloud service crashes abnormally, it is restarted automatically and immediately.
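Upstart's `respawn` stanza restarts a job's main process when it exits unexpectedly, and the companion `respawn limit COUNT INTERVAL` stanza stops respawning if the process dies too often within INTERVAL seconds (upstart's default is 10 times in 5 seconds). The Python sketch below imitates that policy for an arbitrary command using `subprocess`, as an aid to understanding the behavior rather than a replacement for upstart. The function and parameter names are hypothetical, and treating exit status 0 as a normal stop is an assumption of this sketch.

```python
import subprocess
import sys
import time
from collections import deque
from typing import Deque, List, Tuple


def supervise(cmd: List[str], limit_count: int = 10,
              limit_interval: float = 5.0) -> Tuple[bool, int]:
    """Run cmd and respawn it whenever it exits with a non-zero status.

    Mimics `respawn` plus `respawn limit COUNT INTERVAL`: if the process
    would need more than limit_count respawns within limit_interval
    seconds, stop trying. Exit status 0 counts as a normal stop.
    Returns (clean_exit, number_of_runs).
    """
    recent: Deque[float] = deque()   # times of recent crashes
    runs = 0
    while True:
        runs += 1
        status = subprocess.run(cmd).returncode   # negative if killed by a signal
        if status == 0:
            return True, runs
        now = time.monotonic()
        recent.append(now)
        # Forget crashes that fall outside the sliding window.
        while now - recent[0] > limit_interval:
            recent.popleft()
        if len(recent) > limit_count:
            print(f"{cmd[0]}: respawning too fast, giving up", file=sys.stderr)
            return False, runs
        print(f"{cmd[0]}: exited with status {status}, respawning", file=sys.stderr)


if __name__ == "__main__":
    crasher = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(crasher, limit_count=2, limit_interval=60))  # (False, 3)
```

The sliding window is what keeps a service that crashes immediately on every start from being restarted forever.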
The upstart job for the Nkscloud service executed in step S506 includes functions such as Nkscloud service configuration, service start, service repair, and logging; its job code can be:
With the upstart job code above for the Nkscloud service, after the Nkscloud service crashes, the steps of service configuration, service start, service repair, and logging are executed in sequence to restart the Nkscloud service as quickly as possible, thereby ensuring the operational reliability of the whole cloud platform.
The above uses only the Nkscloud service as an example; corresponding upstart jobs can be written for the other services running on the cloud platform, so that the upstart mechanism monitors their running state and performs service repair and other corresponding functions when they crash.
With the operation monitoring method of the above embodiments, the upstart process set up for each monitored target service captures crashes of that service, and the corresponding upstart job recovers it. Thus, when a service crashes abnormally, it can be restarted and repaired automatically without the existing hot-standby approach, improving the stability and flexibility of the cloud platform. Further, service crash logs can be recorded automatically, which helps locate the cause and point of failure of service crashes and reduces maintenance cost. Moreover, because the method is implemented by executing upstart scripts written in advance, the monitored target services it protects can be flexibly and dynamically added, removed, enabled, and disabled.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
Thus, those skilled in the art will appreciate that although multiple exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can still be determined or derived directly from the disclosure without departing from its spirit and scope. The scope of the invention should therefore be understood and deemed to cover all such other variations or modifications.
Claims (9)
- 1. An operation monitoring method for a cloud platform based on a Linux operating system, characterized by comprising: obtaining a monitored target service running in the cloud platform based on the Linux operating system; starting an upstart process corresponding to the monitored target service, including: executing an upstart script written in advance according to operating characteristics of the monitored target service, wherein the upstart script defines a start condition and a stop condition of the upstart process and a job responding to a crash event of the monitored target service, a plurality of the monitored target services run in the cloud platform, and each monitored target service corresponds to one preset upstart process; and after the upstart process captures the crash event of the monitored target service, executing the job of the upstart process for recovering the monitored target service, so as to repair the crashed monitored target service.
- 2. The method according to claim 1, characterized in that executing the job of the upstart process for recovering the monitored target service comprises: restarting the monitored target service and/or restoring the configuration of the monitored target service.
- 3. The method according to claim 1, characterized in that, after executing the job of the upstart process for recovering the monitored target service, the method further comprises: generating a crash log of the monitored target service.
- 4. The method according to claim 3, characterized in that generating the crash log of the monitored target service comprises: obtaining crash information of the monitored target service and writing it into a system log, wherein the crash information includes any one or more of the following: a crash time of the monitored target service, a processor state when the monitored target service crashes, and a memory state when the monitored target service crashes.
- 5. method according to any one of claim 1 to 4, it is characterised in thatThe monitoring objective service includes:Cloud storage service, cloud platform back-end services, cloud platform web service.
- A kind of 6. running monitor device of the cloud platform based on (SuSE) Linux OS, it is characterised in that including:Target Acquisition module, it is configured to obtain the monitoring objective service run in the cloud platform based on (SuSE) Linux OS;Process initiation module, it is configured to start upstart processes corresponding to the monitoring objective service, including:Perform according to institute The upstart scripts that monitoring objective service operation feature is write in advance are stated, wherein having defined in the upstart scripts described The entry condition and closedown condition of upstart processes, and the work of the crash event of the response monitoring objective service, it is described Operation has multiple monitoring objective services in cloud platform, each monitoring objective service be corresponding with one it is default Upstart processes;Process resumption module, it is configured to after the crash event that the upstart processes get the monitoring objective service, holds The work of the recovery monitoring objective service of the row upstart processes, to repair the monitoring objective service of collapse.
- 7. running monitor device according to claim 6, it is characterised in thatThe work of the crash event of the monitoring objective service is responded, the work includes:Restart the monitoring objective service and/ Or recover the configuration of the monitoring objective service.
- 8. running monitor device according to claim 6, it is characterised in that also include:Daily record generation module, is configured to:The crash info of the monitoring objective service, and the running log of writing system are obtained, WhereinThe crash info includes following any one or more:The collapse time of the monitoring objective service, the monitoring mesh Mark service crashes when processor running status, the monitoring objective service crashes when internal memory running status.
- 9. a kind of cloud platform based on (SuSE) Linux OS, it is characterised in that including any one of claim 6 to 8 The running monitor device of cloud platform based on (SuSE) Linux OS.
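The restart-and-log behaviour described in claims 1 to 4 can be illustrated with a small user-space supervisor. This is a minimal sketch under stated assumptions, not Upstart and not the patentee's implementation: `supervise`, `CrashRecord`, and the `restore_config` hook are hypothetical names; a non-zero exit status stands in for the "crash event"; and the processor and memory running state are approximated by the cumulative CPU time and peak resident set size that `resource.getrusage(resource.RUSAGE_CHILDREN)` reports for terminated child processes (Unix only). The sketch records the crash details at the moment it observes the crash, then restores configuration and restarts.

```python
import resource
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class CrashRecord:
    """Hypothetical crash-log entry: crash time, CPU state, memory state."""
    service: str
    crash_time: float   # wall-clock time the crash was observed (time.time())
    exit_code: int      # negative value means the child was killed by that signal
    cpu_seconds: float  # cumulative user+system CPU time of all reaped children
    max_rss: int        # peak RSS of reaped children (KiB on Linux, bytes on macOS)


def supervise(
    service: str,
    argv: Sequence[str],
    max_restarts: int = 3,
    restore_config: Optional[Callable[[], None]] = None,
) -> List[CrashRecord]:
    """Run `argv`; on each abnormal exit, log the crash and restart the service.

    Assumption of this sketch: exit status 0 is a deliberate stop and is not
    restarted; any other status is treated as a crash event.
    """
    crash_log: List[CrashRecord] = []
    restarts = 0
    while True:
        result = subprocess.run(argv)
        if result.returncode == 0:
            return crash_log
        usage = resource.getrusage(resource.RUSAGE_CHILDREN)
        crash_log.append(CrashRecord(
            service=service,
            crash_time=time.time(),
            exit_code=result.returncode,
            cpu_seconds=usage.ru_utime + usage.ru_stime,
            max_rss=usage.ru_maxrss,
        ))
        if restarts >= max_restarts:
            # Stop retrying once the restart budget is spent.
            return crash_log
        if restore_config is not None:
            restore_config()  # optional configuration restore before restarting
        restarts += 1


if __name__ == "__main__":
    # A child that always exits with status 1: one initial run plus two restarts.
    crashes = supervise("demo", [sys.executable, "-c", "import sys; sys.exit(1)"],
                        max_restarts=2)
    print(len(crashes))  # 3
```

Bounding the number of restarts keeps a service that crashes immediately on every start from being relaunched forever; a real deployment would also write each `CrashRecord` to the system log rather than keep it in memory.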
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410635137.4A CN104391777B (en) | 2014-11-12 | 2014-11-12 | Cloud platform and its operation and monitoring method and device based on (SuSE) Linux OS |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410635137.4A CN104391777B (en) | 2014-11-12 | 2014-11-12 | Cloud platform and its operation and monitoring method and device based on (SuSE) Linux OS |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104391777A CN104391777A (en) | 2015-03-04 |
CN104391777B CN104391777B (en) | 2018-01-23 |
Family ID: 52609685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410635137.4A Active CN104391777B (en) | 2014-11-12 | 2014-11-12 | Cloud platform and its operation and monitoring method and device based on (SuSE) Linux OS |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391777B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106844137B (en) * | 2016-12-08 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Server monitoring method and device |
CN107423620B (en) * | 2017-03-12 | 2020-11-24 | 苏州浪潮智能科技有限公司 | Management method and device for storage server service process |
CN108427627A (en) * | 2018-02-05 | 2018-08-21 | 阿里巴巴集团控股有限公司 | The method and device and electronic equipment of statistical system stability |
CN110032487A (en) * | 2018-11-09 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Keep Alive supervision method, apparatus and electronic equipment |
CN110572292B (en) * | 2019-10-30 | 2022-04-15 | 北京永亚普信科技有限责任公司 | High availability system and method based on unidirectional transmission link |
CN111104226B (en) * | 2019-12-25 | 2024-01-26 | 东北大学 | Intelligent management system and method for multi-tenant service resources |
CN111400138A (en) * | 2020-03-17 | 2020-07-10 | 中国建设银行股份有限公司 | Client monitoring method, device and system based on double-layer daemon mechanism |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8595556B2 (en) * | 2010-10-14 | 2013-11-26 | International Business Machines Corporation | Soft failure detection |
CN103167004A (en) * | 2011-12-15 | 2013-06-19 | 中国移动通信集团上海有限公司 | Cloud platform host system fault correcting method and cloud platform front control server |
EP2672668B1 (en) * | 2012-06-06 | 2018-09-26 | Juniper Networks, Inc. | Facilitating the operation of a virtual network by predicting a failure |
CN103297264B (en) * | 2013-04-19 | 2017-04-12 | 无锡成电科大科技发展有限公司 | Cloud platform failure recovery method and system |
CN103716182B (en) * | 2013-12-12 | 2016-08-31 | 中国科学院信息工程研究所 | A kind of fault detect towards real-time cloud platform and fault-tolerance approach and system |
- 2014-11-12: application CN201410635137.4A filed in CN (CN104391777B, status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN104391777A (en) | 2015-03-04 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||