CN104391777B - Cloud platform based on the Linux operating system, and operation monitoring method and device therefor


Info

Publication number
CN104391777B
Authority
CN
China
Prior art keywords
service
monitoring objective
upstart
cloud platform
objective service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410635137.4A
Other languages
Chinese (zh)
Other versions
CN104391777A (en
Inventor
侯健
刘彬
罗飞
宋潇豫
张永军
赵峰
乔咏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Standard Software Co Ltd
Original Assignee
China Standard Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Standard Software Co Ltd filed Critical China Standard Software Co Ltd
Priority to CN201410635137.4A priority Critical patent/CN104391777B/en
Publication of CN104391777A publication Critical patent/CN104391777A/en
Publication of CN104391777B publication Critical patent/CN104391777B/en


Abstract

The invention provides a cloud platform based on the Linux operating system, together with an operation monitoring method and an operation monitoring device for it. The operation monitoring method includes: obtaining a monitored target service running in the Linux-based cloud platform; starting the upstart process corresponding to the monitored target service; and, after the upstart process obtains a crash event of the monitored target service, executing the upstart job that recovers the monitored target service, so as to repair the crashed service. With this scheme, a service that crashes abnormally can be restarted and repaired automatically, without resorting to the existing hot-standby approach, which improves the stability and flexibility of the cloud platform.

Description

Cloud platform based on the Linux operating system, and operation monitoring method and device therefor
Technical field
The present invention relates to the computer field, and in particular to a cloud platform based on the Linux operating system, an operation monitoring method for it, and an operation monitoring device.
Background art
Cloud computing is an Internet-based mode of computing in which shared software and hardware resources and information are provided to computers and other devices on demand. Cloud platforms are the infrastructure on which applications run in the cloud computing field. As the name suggests, such a platform lets developers either run the programs they have written in the "cloud" or use services provided in the "cloud".
At present, many information technology companies have entered the cloud computing field and launched their own cloud platforms, and the problems of these platforms have been exposed along the way. For example, the Amazon cloud platform suffered large-scale failures between 2007 and 2008, and in 2009 the Microsoft cloud platform crashed, with serious consequences. How to protect the stability of a cloud platform has therefore become an important topic.
In a traditional cloud environment, cloud platform services run on servers, and once a service crashes the cloud platform cannot operate normally. A high-availability mechanism can protect the cloud platform, but such mechanisms are cumbersome to install and configure and costly to maintain, and when a failure does occur its cause is hard to locate. Fig. 1 is a schematic diagram of a prior-art protection mechanism for cloud platform services. As shown in Fig. 1, when a process crashes during normal operation, the high-availability mechanism automatically brings a standby host online; the fault is then repaired, and once it is cleared the cloud platform resumes operation. This mechanism keeps the cloud platform stable, but it requires a standby host, which is costly and takes a long time to operate and maintain. If the standby host performs poorly, normal use is also affected.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a cloud platform based on the Linux operating system, and an operation monitoring method and device for such a platform, that overcome or at least partly solve those problems.
One object of the invention is to provide a method for improving the operational reliability of a cloud platform based on the Linux operating system.
A further object of the invention is to reduce the cost of using a standby host by repairing crashed services automatically.
According to one aspect of the invention, an operation monitoring method for a cloud platform based on the Linux operating system is provided. The method includes: obtaining a monitored target service running in the Linux-based cloud platform; starting the upstart process corresponding to the monitored target service; and, after the upstart process obtains a crash event of the monitored target service, executing the upstart job that recovers the monitored target service, so as to repair the crashed service.
Optionally, starting the upstart process corresponding to the monitored target service includes: executing an upstart script written in advance according to the operating characteristics of the monitored target service, where the upstart script defines the start condition and stop condition of the upstart process and the job that responds to a crash event of the monitored target service.
Optionally, executing the upstart job that recovers the monitored target service includes: restarting the monitored target service and/or restoring its configuration.
Optionally, after the upstart job that recovers the monitored target service is executed, the method further includes: generating a crash log for the monitored target service.
Optionally, generating the crash log includes: obtaining crash information for the monitored target service and writing it to the system's running log, where the crash information includes any one or more of the following: the time at which the monitored target service crashed, the processor state at the time of the crash, and the memory state at the time of the crash.
Optionally, multiple monitored target services run in the cloud platform, and each corresponds to one preset upstart process. The monitored target services include: a cloud storage service, a cloud platform back-end service, and a cloud platform web service.
According to another aspect of the invention, an operation monitoring device for a cloud platform based on the Linux operating system is provided. The device includes: a target acquisition module, configured to obtain a monitored target service running in the cloud platform; a process start module, configured to start the upstart process corresponding to the monitored target service in the Linux-based cloud platform; and a process recovery module, configured to execute, after the upstart process obtains a crash event of the monitored target service, the upstart job that recovers the monitored target service, so as to repair the crashed service.
Optionally, the process start module is further configured to execute an upstart script written in advance according to the operating characteristics of the monitored target service, where the upstart script defines the start condition and stop condition of the upstart process and the job that responds to a crash event of the monitored target service. The job includes: restarting the monitored target service and/or restoring its configuration.
Optionally, the operation monitoring device further includes a log generation module, configured to obtain crash information for the monitored target service and write it to the system's running log, where the crash information includes any one or more of the following: the time at which the monitored target service crashed, the processor state at the time of the crash, and the memory state at the time of the crash.
According to another aspect of the invention, a cloud platform based on the Linux operating system is also provided. The cloud platform includes any of the operation monitoring devices described above.
The operation monitoring method of the invention uses the upstart process set up for each monitored target service to obtain the crash time of that service, and uses the corresponding upstart job to recover it. When a service crashes abnormally it can therefore be restarted and repaired automatically, without the existing hot-standby approach, which improves the stability and flexibility of the cloud platform.
Further, the method can record service crash logs automatically, which helps locate the cause and point of failure and reduces maintenance cost.
Further, the method does not require changes to the code of the cloud platform itself. It applies crash protection directly to cloud platform services, with the process as the unit of protection, and its coverage is broad: processes related to the cloud platform's storage, computation, and management are all protected.
Further, the method is implemented by executing upstart scripts written in advance, so protection is flexible: monitored target services can be dynamically added, deleted, enabled, and disabled.
Other features and advantages of the invention are set out in the description that follows, and in part become apparent from the description or are understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and practiced according to the description, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments, read with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the invention.
Brief description of the drawings
By reading the detailed description of the preferred embodiments below, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a schematic diagram of a prior-art protection mechanism for cloud platform services;
Fig. 2 is a schematic diagram of an operation monitoring device for a cloud platform based on the Linux operating system according to an embodiment of the invention;
Fig. 3 is a schematic diagram of an operation monitoring method for a cloud platform based on the Linux operating system according to an embodiment of the invention;
Fig. 4 is a flow chart of one specific implementation of the operation monitoring method for a cloud platform based on the Linux operating system according to an embodiment of the invention; and
Fig. 5 is an optional flow chart in which the operation monitoring method for a cloud platform based on the Linux operating system according to an embodiment of the invention monitors the Nkscloud service.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented. It should be noted that, provided they do not conflict, the embodiments of the invention and the features of those embodiments may be combined with one another, and the resulting technical solutions all fall within the protection scope of the invention.
In addition, the steps shown in the flow charts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flow charts, in some cases the steps shown or described can be executed in a different order.
Fig. 2 is a schematic diagram of an operation monitoring device for a cloud platform based on the Linux operating system according to an embodiment of the invention. As shown, the operation monitoring device 100 may in general include a target acquisition module 110, a process start module 120, and a process recovery module 130. In some optional embodiments, a log generation module 140 may also be added.
Linux is a Unix-like operating system that is free to use and freely distributed. It is a multi-user, multitasking operating system based on POSIX (Portable Operating System Interface of Unix) and UNIX that supports multithreading and multiple CPUs. Linux is known for its efficiency and flexibility. Its modular design lets it run on expensive workstations with multitasking, multi-user capability, and it is therefore widely used in cloud platforms.
In the operation monitoring device 100 of this embodiment, the target acquisition module 110 is configured to obtain the monitored target services running in the cloud platform. A cloud platform running on Linux is usually responsible for a large number of services, so the target acquisition module 110 preferentially selects services that directly affect the platform's operational reliability, such as the cloud storage service, the cloud platform back-end service, and the cloud platform web service.
The process start module 120 is configured to start the upstart process corresponding to the monitored target service. Upstart is an event-based initialization daemon that replaces init in traditional Linux systems. Because upstart is event-based, its starting and stopping of services are all driven by event communication. One preset upstart process can be assigned to each monitored target service.
Upstart uses events to start and stop system services. Upstart is parallel: as soon as the relevant events occur, services can be started concurrently. In embodiments of the invention, an upstart script can be written in advance according to the operating characteristics of each monitored target service. The script treats the process of each service as its object and, when that process crashes abnormally, automatically restarts and repairs it. The process start module 120 can then execute this pre-written upstart script, which defines the start condition and stop condition of the upstart process and the job that responds to a crash event of the monitored target service.
In upstart, job and event are two important concepts. A job carries out a piece of work, such as starting a background service or running a configuration command. Each job waits for one or more events, and as soon as such an event occurs upstart triggers the job. An event exists in upstart as a notification message: when an event occurs, upstart sends a message to the whole system, so that all jobs and other events in the system are notified.
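To illustrate how a job waits for an event, the following is a minimal sketch of an upstart job file. The job names, the file path, and the binary path are assumptions made for illustration, as is the idea that the web service should start after the back-end service; none of them comes from the patent text. The `started` event used here is the one upstart emits when another job is running.

```
# /etc/init/nkscloud_web.conf  (hypothetical job name and path)
description "cloud platform web service (illustrative)"

# Wait for the event that upstart emits once the job named
# "nkscloud" (assumed to be the back-end service job) is running.
start on started nkscloud

# Stop when the system leaves runlevels 2-5.
stop on runlevel [!2345]

# Hypothetical binary, assumed to stay in the foreground.
exec /usr/sbin/nkscloud-web
```

With a job file like this in place, nothing polls for the back-end service: the web service job is simply triggered when the corresponding event arrives.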
In this embodiment, the operation monitoring device 100 defines a different upstart job for each monitored target service. Each job can include functions such as service start, service configuration, service repair, and log generation. Once upstart is running, when a monitored target service crashes abnormally upstart sends a message to the system. The system receives the message and triggers the corresponding job, which completes all or part of the work of restarting, configuring, and repairing the service and generating the log.
The process recovery module 130 is configured to execute, after the upstart process obtains a crash event of the monitored target service, the upstart job that recovers the monitored target service, so as to repair the crashed service. These upstart jobs can include: restarting the monitored target service and/or restoring its configuration.
Preferably, while the process recovery module 130 is working, the log generation module 140 can obtain crash information for the monitored target service and write it to the system's running log, where the crash information includes any one or more of the following: the time at which the monitored target service crashed, the processor state at the time of the crash, and the memory state at the time of the crash. With the running log produced by the log generation module 140, the Linux system can record key information after a monitored target service crashes, which makes it easier to analyze the cause of the crash and locate the problem, and shortens the platform's repair time.
An embodiment of the invention also provides a cloud platform based on the Linux operating system. The cloud platform includes the operation monitoring device of any of the embodiments described above. The cloud platform of this embodiment is simple to deploy in hardware terms, can significantly reduce the probability of a platform crash, and improves product stability. In addition, automatic service repair reduces maintenance cost, and logging of crash information improves operation and maintenance efficiency, so the cloud platform is well protected.
This embodiment also provides an operation monitoring method for a cloud platform based on the Linux operating system. The method can be executed by any of the operation monitoring devices 100 described in the embodiments above. It uses upstart to monitor target services and repairs them automatically after a crash, improving the platform's operation and maintenance efficiency. Fig. 3 is a schematic diagram of the method according to an embodiment of the invention. As shown, the method includes:
Step S302: obtain the monitored target service running in the cloud platform based on the Linux operating system;
Step S304: start the upstart process corresponding to the monitored target service;
Step S306: after the upstart process obtains a crash event of the monitored target service, execute the upstart job that recovers the monitored target service, so as to repair the crashed service.
Multiple monitored target services run in the cloud platform. They can be chosen from the services that most affect the platform's normal operation, and each corresponds to one preset upstart process. The monitored target services can include a cloud storage service, a cloud platform back-end service, and a cloud platform web service. Step S302 obtains the targets to be monitored from among the services running in the cloud platform.
One optional flow for step S304 is: execute an upstart script written in advance according to the operating characteristics of the monitored target service, where the upstart script defines the start condition and stop condition of the upstart process and the job that responds to a crash event of the monitored target service.
The job executed in step S306 can include: restarting the monitored target service and/or restoring its configuration.
To make the point of failure easier to locate, the method can further include, after step S306: generating a crash log for the monitored target service. Specifically, crash information for the monitored target service can be obtained and written to the system's running log, where the crash information includes any one or more of the following: the time at which the monitored target service crashed, the processor state at the time of the crash, and the memory state at the time of the crash. One optional way to write to the running log is to write a shell logging script whose function is to record the crash information.
A shell provides an interface through which the user interacts with the kernel: it receives the commands the user enters and passes them to the kernel for execution. Shell programming means writing scripts that execute a large number of such commands. In the operation monitoring method of this embodiment, a shell logging script can record the crash log of the monitored target service for later analysis and fault location.
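As a hedged sketch of such a logging script, the fragment below places the shell code in the `post-stop` stanza of an upstart job file, which is expected to run after the job's main process has ended. The job file path is hypothetical, and the choice of load averages and `/proc/meminfo` lines as the "processor state" and "memory state" is an assumption; the patent does not say which values are recorded. Note that `post-stop` also runs after a deliberate stop, so a real script would need its own way of telling a crash from a normal shutdown.

```
# Fragment of a job file such as /etc/init/nkscloud.conf (hypothetical).
post-stop script
    # Time at which the main process was found to have ended.
    NOW=$(date '+%Y-%m-%d %H:%M:%S')

    # Processor state (assumed): 1-, 5- and 15-minute load averages.
    LOAD=$(cut -d ' ' -f 1-3 /proc/loadavg)

    # Memory state (assumed): MemTotal and MemFree from /proc/meminfo,
    # squeezed onto one line.
    MEM=$(grep -E '^(MemTotal|MemFree):' /proc/meminfo | tr -s ' ' | tr '\n' ' ')

    # Write one record to the system log through syslog.
    # UPSTART_JOB is set by upstart to the name of the job.
    logger -t "$UPSTART_JOB" "process ended at $NOW; load: $LOAD; memory: $MEM"
end script
```

Writing through `logger` keeps the crash record in the same system log that operation and maintenance staff already consult, rather than in a separate file.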
The operation monitoring method of this embodiment is based on Linux upstart technology. It monitors the service processes of the cloud platform, recovers them automatically when a service crashes, and records a log of the crash, improving operation and maintenance efficiency. In addition, this embodiment can be configured flexibly for different cloud platforms to monitor different target services.
Below, taking the NeoKylin Security Cloud operating system as an example, one implementation of the operation monitoring method of the embodiments of the invention is introduced.
The NeoKylin Security Cloud operating system involves the protection of multiple processes. The crash protection mechanism is illustrated below using three main services: the cloud storage service (Gluster Manager Console, GMC), the cloud platform back-end service (NeoKylin Security Cloud, Nkscloud), and the cloud platform web service (NeoKylin Security Cloud-Web, Nkscloud_web).
Fig. 4 is a flow chart of one specific implementation of the operation monitoring method for a cloud platform based on the Linux operating system according to an embodiment of the invention. As shown in Fig. 4, the GMC service, the Nkscloud service, and the Nkscloud_web service run simultaneously on the Linux-based cloud platform provided in the embodiments above. After the operation monitoring method of the above embodiments is carried out, these three services, as monitored target services, are each restarted and repaired automatically after a crash under the monitoring of their upstart processes, and a log is recorded after the crash. The flow of the monitoring method can be:
The operation monitoring device of this embodiment is installed and configured in advance on the Linux operating system so that it can carry out the operation monitoring method above. The GMC service, the Nkscloud service, and the Nkscloud_web service are restarted under the upstart mechanism, and after the restart every service process is subject to the monitoring of the operation monitoring method.
When a process of any of the GMC, Nkscloud, or Nkscloud_web services crashes, the upstart job corresponding to the crash event is executed, the service is restarted automatically, and a crash log for the service is recorded. For failures that cannot be repaired automatically, operation and maintenance staff can quickly locate the problem from the generated crash log and deal with it as soon as possible.
Fig. 5 is an optional flow chart in which the operation monitoring method for a cloud platform based on the Linux operating system according to an embodiment of the invention monitors the Nkscloud service. As shown in Fig. 5, the flow includes the following steps:
Step S502: obtain the abnormal crash event of the Nkscloud service;
Step S504: upstart sends a message to the Linux system;
Step S506: execute the upstart job corresponding to the Nkscloud service;
Step S508: restore normal operation of the Nkscloud service.
Before monitoring of the Nkscloud service is carried out, the start and stop dependency conditions of the upstart monitoring process for the Nkscloud service can first be defined, for example:
start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]
In the above code, start on gives the start dependency condition of the monitoring: monitoring of the nkscloud service starts once the rc job has stopped in runlevel 2, 3, 4, or 5.
stop on gives the stop dependency condition of the monitoring: monitoring of the nkscloud service is not stopped in runlevels 2, 3, 4, and 5, and stops only when the system switches to another runlevel.
After step S502 obtains the abnormal crash event of the Nkscloud service, the defined response mechanism includes respawn, which means that when the nkscloud service crashes abnormally it is restarted automatically and immediately.
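A minimal sketch of this response mechanism in a job file follows. The `respawn limit` line is an addition for illustration and is not mentioned in the patent; the values shown are only an example, and the job file path is hypothetical.

```
# Fragment of /etc/init/nkscloud.conf (hypothetical path).

# Restart the main process whenever it ends unexpectedly.
respawn

# Optional safeguard (not from the patent): if the process is
# restarted more than 10 times within 5 seconds, upstart stops
# trying, so a service that cannot start does not loop forever.
respawn limit 10 5
```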
The upstart job for the Nkscloud service executed in step S506 includes functions such as Nkscloud service configuration, service start, service repair, and logging. Its function code can be:
With the above upstart job code for the Nkscloud service, after the Nkscloud service crashes the steps of service configuration, service start, service repair, and logging can be executed in turn, so that the Nkscloud service is restarted as quickly as possible and the operational reliability of the whole cloud platform is ensured.
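The job code referred to above is not reproduced in this text. Purely as an illustration, and not as the patent's own listing, a job that combines the four functions might be sketched as follows. Every path, file name, and binary name is hypothetical, and treating "service repair" as restoring a default configuration file is an assumption.

```
# /etc/init/nkscloud.conf  (illustrative only; all paths hypothetical)
description "Nkscloud back-end service"

start on stopped rc RUNLEVEL=[2345]
stop on runlevel [!2345]

# Service repair by restart: bring the process back after a crash.
respawn

# Service configuration / repair (assumed behaviour): before each
# start, restore a known-good configuration file if the live one is
# missing or empty.
pre-start script
    CONF=/etc/nkscloud/nkscloud.conf
    [ -s "$CONF" ] || cp "$CONF.default" "$CONF"
end script

# Service start: the binary is assumed to run in the foreground.
exec /usr/sbin/nkscloud

# Logging: note in the system log that the main process has ended.
post-stop exec logger -t nkscloud "main process ended"
```

In this arrangement the stanzas map one to one onto the functions the description names, so a job for another service can be derived by changing only the paths and the start condition.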
The above takes only the Nkscloud service as an example. Corresponding upstart jobs can be formulated for the other services running in the cloud platform, so that the upstart mechanism monitors their running state and, on a crash, completes service repair and the other corresponding functions.
With the operation monitoring method of the above embodiments, the upstart process set up for each monitored target service obtains the crash time of that service, and the corresponding upstart job recovers it. When a service crashes abnormally it can therefore be restarted and repaired automatically, without the existing hot-standby approach, which improves the stability and flexibility of the cloud platform. Further, service crash logs can be recorded automatically, which helps locate the cause and point of failure and reduces maintenance cost. Moreover, because the method is implemented by executing pre-written upstart scripts, the monitored target services it protects can be dynamically added, deleted, enabled, and disabled.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
Thus, those skilled in the art will recognize that although multiple exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can still be determined or derived directly from the disclosure without departing from its spirit and scope. The scope of the invention should therefore be understood and deemed to cover all such other variations or modifications.

Claims (9)

  1. An operation monitoring method for a cloud platform based on a Linux operating system, characterized by comprising:
    obtaining a monitored target service running on the cloud platform based on the Linux operating system;
    starting an upstart process corresponding to the monitored target service, including: executing an upstart script written in advance according to the running characteristics of the monitored target service, wherein the upstart script defines the start condition and the stop condition of the upstart process as well as the job that responds to a crash event of the monitored target service, a plurality of the monitored target services run on the cloud platform, and each monitored target service corresponds to one preset upstart process;
    after the upstart process captures a crash event of the monitored target service, executing the job of the upstart process that recovers the monitored target service, so as to repair the crashed monitored target service.
  2. The method according to claim 1, characterized in that executing the job of the upstart process that recovers the monitored target service comprises:
    restarting the monitored target service and/or restoring the configuration of the monitored target service.
  3. The method according to claim 1, characterized in that, after executing the job of the upstart process that recovers the monitored target service, the method further comprises:
    generating a crash log of the monitored target service.
  4. The method according to claim 3, characterized in that generating the crash log of the monitored target service comprises:
    obtaining crash information of the monitored target service and writing it to the running log of the system, wherein
    the crash information includes any one or more of the following: the crash time of the monitored target service, the processor running state when the monitored target service crashed, and the memory running state when the monitored target service crashed.
  5. The method according to any one of claims 1 to 4, characterized in that
    the monitored target service includes: a cloud storage service, a cloud platform back-end service, and a cloud platform web service.
  6. An operation monitoring device for a cloud platform based on a Linux operating system, characterized by comprising:
    a target acquisition module, configured to obtain a monitored target service running on the cloud platform based on the Linux operating system;
    a process start module, configured to start an upstart process corresponding to the monitored target service, including: executing an upstart script written in advance according to the running characteristics of the monitored target service, wherein the upstart script defines the start condition and the stop condition of the upstart process as well as the job that responds to a crash event of the monitored target service, a plurality of the monitored target services run on the cloud platform, and each monitored target service corresponds to one preset upstart process;
    a process recovery module, configured to, after the upstart process captures a crash event of the monitored target service, execute the job of the upstart process that recovers the monitored target service, so as to repair the crashed monitored target service.
  7. The operation monitoring device according to claim 6, characterized in that
    the job that responds to the crash event of the monitored target service includes: restarting the monitored target service and/or restoring the configuration of the monitored target service.
  8. The operation monitoring device according to claim 6, characterized by further comprising:
    a log generation module, configured to: obtain crash information of the monitored target service and write it to the running log of the system, wherein
    the crash information includes any one or more of the following: the crash time of the monitored target service, the processor running state when the monitored target service crashed, and the memory running state when the monitored target service crashed.
  9. A cloud platform based on a Linux operating system, characterized by comprising the operation monitoring device for a cloud platform based on a Linux operating system according to any one of claims 6 to 8.
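The crash information named in claims 4 and 8 (crash time, processor running state, memory running state) could be collected and formatted roughly as follows. This is a sketch under stated assumptions, not the log format of the invention: using the system load average as the processor state and `/proc/meminfo`-style text as the memory state are illustrative choices, and the helper names (`parse_meminfo`, `build_crash_record`, `format_log_line`) and the `service-crash` prefix are hypothetical.

```python
import json
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def build_crash_record(service, crash_time, loadavg, meminfo_text):
    """Assemble crash time, processor state, and memory state into one record."""
    mem = parse_meminfo(meminfo_text)
    return {
        "service": service,
        # crash time as a UTC timestamp built from seconds since the epoch
        "crash_time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(crash_time)),
        # processor state, approximated here by the 1-minute load average
        "loadavg_1m": loadavg[0],
        # memory state, taken from two common meminfo fields (kB)
        "mem_total_kb": mem.get("MemTotal"),
        "mem_free_kb": mem.get("MemFree"),
    }


def format_log_line(record: dict) -> str:
    """Serialize the record as a single line suitable for a system log."""
    return "service-crash " + json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    sample_meminfo = "MemTotal:  8000000 kB\nMemFree:  2000000 kB\n"
    rec = build_crash_record("cloud-web", time.time(), (0.42, 0.40, 0.35), sample_meminfo)
    print(format_log_line(rec))
```

On a Linux host the inputs could come from `os.getloadavg()` and the contents of `/proc/meminfo`, and the resulting line could be written to the system log with `syslog.syslog()`.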
CN201410635137.4A 2014-11-12 2014-11-12 Cloud platform and its operation and monitoring method and device based on (SuSE) Linux OS Active CN104391777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410635137.4A CN104391777B (en) 2014-11-12 2014-11-12 Cloud platform and its operation and monitoring method and device based on (SuSE) Linux OS

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410635137.4A CN104391777B (en) 2014-11-12 2014-11-12 Cloud platform and its operation and monitoring method and device based on (SuSE) Linux OS

Publications (2)

Publication Number Publication Date
CN104391777A CN104391777A (en) 2015-03-04
CN104391777B true CN104391777B (en) 2018-01-23

Family

ID=52609685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410635137.4A Active CN104391777B (en) 2014-11-12 2014-11-12 Cloud platform and its operation and monitoring method and device based on (SuSE) Linux OS

Country Status (1)

Country Link
CN (1) CN104391777B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844137B (en) * 2016-12-08 2020-05-19 腾讯科技(深圳)有限公司 Server monitoring method and device
CN107423620B (en) * 2017-03-12 2020-11-24 苏州浪潮智能科技有限公司 Management method and device for storage server service process
CN108427627A (en) * 2018-02-05 2018-08-21 阿里巴巴集团控股有限公司 The method and device and electronic equipment of statistical system stability
CN110032487A (en) * 2018-11-09 2019-07-19 阿里巴巴集团控股有限公司 Keep Alive supervision method, apparatus and electronic equipment
CN110572292B (en) * 2019-10-30 2022-04-15 北京永亚普信科技有限责任公司 High availability system and method based on unidirectional transmission link
CN111104226B (en) * 2019-12-25 2024-01-26 东北大学 Intelligent management system and method for multi-tenant service resources
CN111400138A (en) * 2020-03-17 2020-07-10 中国建设银行股份有限公司 Client monitoring method, device and system based on double-layer daemon mechanism

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8595556B2 (en) * 2010-10-14 2013-11-26 International Business Machines Corporation Soft failure detection
CN103167004A (en) * 2011-12-15 2013-06-19 中国移动通信集团上海有限公司 Cloud platform host system fault correcting method and cloud platform front control server
EP2672668B1 (en) * 2012-06-06 2018-09-26 Juniper Networks, Inc. Facilitating the operation of a virtual network by predicting a failure
CN103297264B (en) * 2013-04-19 2017-04-12 无锡成电科大科技发展有限公司 Cloud platform failure recovery method and system
CN103716182B (en) * 2013-12-12 2016-08-31 中国科学院信息工程研究所 A kind of fault detect towards real-time cloud platform and fault-tolerance approach and system

Also Published As

Publication number Publication date
CN104391777A (en) 2015-03-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant