CN110262917A - Host self-healing method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110262917A
Authority
CN
China
Prior art keywords
host
healing
self
component
exception
Legal status
Pending
Application number
CN201910406740.8A
Other languages
Chinese (zh)
Inventor
黄桂钦
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910406740.8A
Publication of CN110262917A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the present application provide a host self-healing method, device, computer equipment and storage medium, which relate to the field of cloud computing technology and can be applied to PaaS platforms. The method includes: if it is detected that an exception has occurred on a host and the self-healing condition is met, performing self-healing on the host according to the type of the exception, where different exception types correspond to different self-healing modes and to different self-healing conditions; and, if the same exception is detected on the host again within a preset time after self-healing, issuing an alarm. The embodiments first attempt self-healing when an exception occurs on the host, which reduces the probability of exceptions on the host. Because self-healing starts as soon as an exception is detected and the self-healing condition is met, exceptions are handled in real time, so they do not disrupt the containers running on the host, and the host runs more stably. Self-healing the host also reduces the labor cost of handling exceptions.

Description

Host self-healing method, device, computer equipment and storage medium
Technical field
This application relates to the field of cloud computing technology, and in particular to a host self-healing method, device, computer equipment and storage medium.
Background
A cloud platform, such as a PaaS (Platform-as-a-Service) platform, has a large number of hosts. Because hosts run for long periods, they are prone to exceptions: for example, the storage capacity of a data volume on a host runs low; a component on a host hangs, such as the docker component; or the image registry that stores images is nearly full. If every exception only triggers a reminder and is then handled by operations staff, a large fleet of hosts requires a large number of staff, which means high labor and time costs. Moreover, when exceptions occur on many hosts, staff can only handle them one by one, so some exceptions inevitably remain unhandled for a long time, which affects the operation of the cloud platform.
Summary of the invention
The embodiments of the present application provide a host self-healing method, device, computer equipment and storage medium that can respond to exceptions in real time, reduce the probability of exceptions on hosts, and improve the stability of host operation.
In a first aspect, an embodiment of the present application provides a host self-healing method, including:
if it is detected that an exception has occurred on a host and the self-healing condition is met, performing self-healing on the host according to the type of the exception, where different exception types correspond to different self-healing modes and to different self-healing conditions; and, if the same exception is detected on the host again within a preset time after self-healing, issuing an alarm.
In a second aspect, an embodiment of the present invention provides a host self-healing device, which includes units for performing the method described in the first aspect.
In a third aspect, an embodiment of the present invention provides a computer device, which includes a memory and a processor connected to the memory;
the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory to perform the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in the first aspect.
When the embodiments of the present application detect that an exception has occurred on a host and the self-healing condition is met, they perform self-healing on the host according to the type of the exception. Self-healing is thus attempted first whenever an exception occurs, which reduces the probability of exceptions on the host. Because self-healing starts as soon as an exception is detected and the self-healing condition is met, exceptions are handled in real time, so they do not disrupt the containers running on the host, and the host runs more stably. Self-healing the host also reduces the labor cost of handling exceptions.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the host self-healing method provided by an embodiment of the present application;
Fig. 2 is a schematic sub-flowchart of the host self-healing method provided by an embodiment of the present application;
Fig. 3 is a schematic sub-flowchart of the host self-healing method provided by an embodiment of the present application;
Fig. 4 is a schematic sub-flowchart of the host self-healing method provided by an embodiment of the present application;
Fig. 5 is a schematic sub-flowchart of the host self-healing method provided by an embodiment of the present application;
Fig. 6 is a schematic block diagram of the host self-healing device provided by an embodiment of the present application;
Fig. 7 is a schematic block diagram of the self-healing unit provided by an embodiment of the present application;
Fig. 8 is a schematic block diagram of the volume self-healing unit provided by an embodiment of the present application;
Fig. 9 is a schematic block diagram of another self-healing unit provided by an embodiment of the present application;
Fig. 10 is a schematic block diagram of the component self-healing unit provided by an embodiment of the present application;
Fig. 11 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Fig. 1 is a schematic flowchart of the host self-healing method provided by an embodiment of the present application. As shown in Fig. 1, the method includes steps S101-S102.
S101: if it is detected that an exception has occurred on the host and the self-healing condition is met, perform self-healing on the host according to the type of the exception, where different exception types correspond to different self-healing modes and to different self-healing conditions.
There are several exception types, such as data volume exceptions on the host, exceptions of the local application images on the host, component exceptions on the host, and resource exceptions on the host. Each exception type has its own self-healing mode and its own self-healing condition.
A host contains data volumes. In one embodiment, the exception types include data volume exceptions on the host. Based on how data volumes are used on the hosts of the platform, data volumes are divided by purpose into three classes: (1) log volumes of container applications, also called container log directories, i.e., data volumes that store container application logs, such as NAS (Network Attached Storage) volumes holding container application logs; (2) host-local volumes, including the docker data volume (datavolume, also called data volumes), the docker metadata volume (metadata volume, also called metadata volumes), the host root volume, and so on; (3) application volumes of container applications. Note that data volumes may also be classified in other ways.
As shown in Fig. 2, when the exception type is a data volume exception on the host, performing self-healing on the host includes performing self-healing on the data volume on the host. In this case, step S101 includes the following steps S201-S203.
S201: if it is detected that the used storage capacity of a data volume on the host has reached a first preset capacity, determine that an exception has been detected on the data volume of the host and that the self-healing condition is met.
The size of each data volume on the host can be checked at a preset time interval, for example with docker info, which reports the total size of the data volume (Space Total), its used size (Space Used), and its available size (Space Available):
Space Used:xxx MB
Space Total:xxxxx MB
Space Available:xxxxx MB
When the proportion of Space Used to Space Total exceeds the first preset capacity of the data volume, for example 90% or another value, it is determined that an exception has been detected on the data volume of the host and that the self-healing condition is met.
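The threshold test in S201 can be sketched as follows. This is a minimal illustration, assuming the docker info output contains "... Space Used/Total: <number> <unit>" lines like the excerpt above; the real field names and units depend on the storage driver, and the function names are hypothetical.

```python
import re

# Matches lines such as "Space Used: 950 MB" or "Data Space Total: 100 GB".
_LINE = re.compile(
    r"^\s*(?P<key>[A-Za-z ]*Space (?:Used|Total|Available))\s*:\s*"
    r"(?P<num>[\d.]+)\s*(?P<unit>[kKMGT]?B)\s*$"
)
_UNITS = {"B": 1, "kB": 10**3, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_space(text):
    """Return {field name: size in bytes} for every '... Space ...' line."""
    sizes = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            sizes[m.group("key").strip()] = float(m.group("num")) * _UNITS[m.group("unit")]
    return sizes


def volume_over_threshold(text, threshold=0.90, prefix=""):
    """True if Used/Total has reached the preset capacity (a fraction, e.g. 0.90).

    prefix selects a field family, e.g. "Data " for "Data Space Used".
    """
    sizes = parse_space(text)
    used = sizes[f"{prefix}Space Used"]
    total = sizes[f"{prefix}Space Total"]
    return used / total >= threshold
```

Converting to bytes first keeps the ratio correct even if Used and Total are reported in different units. The same helper with prefix="Data " and a second threshold would serve the image-volume check described later.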
S202: detect the type of the data volume on the host on which the exception occurred.
That is, identify the data volume that caused the exception, i.e., the data volume from which the exception originated, and determine its type from the relationship between the data volume and the data it stores. This relationship can be stored in a database or elsewhere, for example in a file on the host.
S203: perform self-healing on the data volume on the host according to the type of the data volume on which the exception occurred.
Data volume types include log volumes of container applications, host-local volumes, and application volumes of container applications. Each data volume type has its own self-healing mode.
In one embodiment, as shown in Fig. 3, step S203 includes the following steps S301-S309.
S301: if the data volume on the host on which the exception occurred is the log volume of a container application, traverse the large files in the data volume.
The log volume of a container application stores the application's various logs, such as standard output logs, data logs, and access logs. A large file in the data volume can be understood as a file whose size exceeds a configured capacity, or a file that ranks within a preset number of positions when the files in the data volume are sorted by size in descending order, for example the top 10.
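Both definitions of a "large file" can be sketched in a few lines. This is an illustrative sketch only; walk_sizes and select_big_files are hypothetical helper names, and symbolic links are skipped as an assumption so that files outside the volume are not counted.

```python
import heapq
import os


def walk_sizes(root):
    """Yield (path, size_in_bytes) for every regular file under root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path, os.path.getsize(path)


def select_big_files(files, min_bytes=None, top_n=10):
    """Return the large files among (path, size) pairs.

    With min_bytes set, keep files above that size (first definition);
    otherwise keep the top_n largest files (second definition).
    """
    files = list(files)
    if min_bytes is not None:
        return [(p, s) for p, s in files if s > min_bytes]
    return heapq.nlargest(top_n, files, key=lambda item: item[1])
```

For example, select_big_files(walk_sizes("/path/to/log/volume")) would return the ten largest files under that (placeholder) directory.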
S302: determine whether the large file is a standard output log file.
A standard output log file is the output log produced while a container application starts up; it has a specific log file format, such as the .out format.
If the large file is a standard output log file, step S303 is performed; otherwise, step S304 is performed.
S303: execute a deletion script to delete the large files that are standard output log files.
It can be understood that the standard output log of an application has very little value: only the current day's log needs to be kept, so such logs can be deleted directly without affecting subsequent analysis of the application's business.
S304: compress and archive the large file.
It can be understood that if the large file is not a standard output log file, for example if it is a data log or an access log, it is valuable and needs to be compressed first and then archived. The cloud platform has a separate archive server, and files can be archived according to the content of the data they hold.
Steps S301-S304 above perform self-healing on the log volume of a container application: large standard output log files in the data volume are deleted, and large files of other log types are compressed and archived.
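The delete-or-archive decision of S302-S304 might look like the sketch below. It assumes the .out suffix alone identifies a standard output log, and it writes gzip archives to a local directory as a stand-in for the platform's archive server, whose interface the description does not specify; heal_log_file is a hypothetical name.

```python
import gzip
import os
import shutil


def heal_log_file(path, archive_dir):
    """Delete a standard output log (.out); gzip any other large log.

    Returns "deleted", or the path of the archive that was written.
    archive_dir is a local stand-in for the archive server (an assumption).
    """
    if path.endswith(".out"):
        os.remove(path)  # S303: low-value stdout log, delete directly
        return "deleted"
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(path) + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # S304: compress first ...
    os.remove(path)  # ... and free space only after the archive exists
    return target
```

Removing the original only after the archive is fully written means a failure during compression leaves the log intact.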
S305: if the data volume on the host on which the exception occurred is a host-local volume, detect whether it is the docker data volume, the docker metadata volume, or the host root volume.
The docker data volume stores the application package information of container applications; the docker metadata volume stores information such as tag, name, and status.
S306: if the data volume is the docker data volume, execute a first cleanup script to remove leftover container application package information from the host.
S307: if the data volume is the docker metadata volume, execute a second cleanup script to remove the data of containers on the host that have exited abnormally.
The metadata volume stores information such as tag, name, and status, which grows with the number of containers run, so the metadata volume fills up. In addition, for containers that have exited abnormally, information such as tag and status cannot be cleared, so junk data remains in the metadata volume. The second cleanup script can be, for example: docker ps -aqf status=exited | xargs docker rm. Executing this second cleanup script clears the junk data of the containers on the host that have exited abnormally.
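A Python equivalent of the second cleanup script is sketched below, assuming the docker CLI is on PATH. Like the shell command, it selects every container in the exited state, including ones that exited cleanly. The run parameter is a hypothetical seam that lets the logic be exercised without Docker.

```python
import subprocess


def remove_exited_containers(run=subprocess.run):
    """Equivalent of: docker ps -aqf status=exited | xargs docker rm"""
    listed = run(["docker", "ps", "-aq", "-f", "status=exited"],
                 capture_output=True, text=True, check=True)
    ids = listed.stdout.split()  # one container ID per line
    if ids:
        # GNU xargs without -r would still run "docker rm" with no IDs,
        # which fails; skipping the call avoids that error path.
        run(["docker", "rm", *ids], check=True)
    return ids
```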
S308: if the data volume is the host root volume, detect the application environment the host currently belongs to, and process the preset files in the host root volume accordingly.
The application environment of a host refers to the environment the host belongs to, such as a production, test, or development environment. In a production environment, the resources on the host (such as the containers deployed on it and its CPU resources) serve external environments or are accessed by external users; in a test environment, the resources on the host are used for testing; in a development environment, they are used for development. Processing the preset files in the host root volume according to the current application environment includes: if the host is in a production environment, compressing the preset files in the host root volume; if not, deleting them. The preset files are designated in advance, for example journal log files. Understandably, the resources on a production host are accessed by external users, so the files stored in its root volume are more valuable than those on hosts in other environments.
Steps S305-S308 above perform self-healing on host-local volumes, applying different self-healing processing to different types of host-local volume.
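The branching of S305-S308 can be summarized as a dispatch function. This sketch only returns a description of the action; the volume-type labels ("docker-data", "docker-metadata", "root") and environment names are hypothetical identifiers, not values defined in the description.

```python
def plan_local_volume_heal(volume_kind, environment=None):
    """Map a host-local volume type (S305) to the action of S306-S308."""
    if volume_kind == "docker-data":  # S306
        return "run first cleanup script: remove leftover application packages"
    if volume_kind == "docker-metadata":  # S307
        return "run second cleanup script: remove exited containers"
    if volume_kind == "root":  # S308: action depends on the environment
        if environment == "production":
            return "compress preset files (e.g. journal logs)"
        return "delete preset files (e.g. journal logs)"
    raise ValueError(f"unknown host-local volume type: {volume_kind!r}")
```

Raising on an unknown type, instead of silently doing nothing, lets a monitoring loop surface unclassified volumes to an operator.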
S309: if the data volume on the host on which the exception occurred is the application volume of a container application, traverse the data volume to find the container applications occupying the most space, up to a predetermined number (for example the top 30), to obtain a list of container application directories, and send an email containing the container application directories and the container application information to preset users, so that they can take further action based on the email. Further actions include assessing whether the container application's application volume should be expanded or whether the files uploaded by users should be standardized. The preset users include the project team of the container application and the application administrators of the container application.
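Preparing the S309 notification could look like this sketch, which uses the standard library's email.message.EmailMessage. It assumes per-application sizes have already been measured (for example with a directory walk); the subject wording is illustrative, and actually sending the message (for example with smtplib) is left out.

```python
from email.message import EmailMessage


def build_top_apps_mail(app_sizes, recipients, top_n=30):
    """app_sizes maps container application directory -> bytes used.

    Returns the message and the directories of the top_n largest applications.
    """
    top = sorted(app_sizes.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    msg = EmailMessage()
    msg["Subject"] = f"Application volume nearly full: top {len(top)} applications"
    msg["To"] = ", ".join(recipients)
    lines = [f"{size / 1024**3:.1f} GiB  {path}" for path, size in top]
    msg.set_content("\n".join(lines))
    return msg, [path for path, _ in top]
```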
The embodiments shown in Figs. 2 and 3 monitor the data volumes on the host and perform self-healing on a data volume when its used storage capacity reaches the first preset capacity. Exceptions are thus detected and anticipated in advance and self-healed early. Self-healing of data volumes is performed automatically, which improves its efficiency and reduces the probability of data volume exceptions. Moreover, self-healing starts as soon as a data volume exception is detected and the self-healing condition is met, so data volume exceptions are handled in real time without affecting the operation of the host, which improves its stability. Self-healing the host also reduces the labor cost of handling exceptions.
Hosts also involve application images, in two situations. First, the local application images on a host: before a container starts, the image must be present locally on the host before the container can be started from it. Second, hosts that belong to the image registry. Accordingly, host self-healing includes self-healing of the host's image volume and availability checks of the image registry.
In one embodiment, the exception type includes a local application image exception on the host, and step S101 includes: if it is detected that the used storage capacity of the image volume that stores local application images on the host has reached a second preset capacity, determining that an exception has been detected on this image volume and that the self-healing condition is met; and executing a third cleanup script to remove leftover application image information from the host.
It can be understood that before a container of a container application starts, the image must be present locally on the host so that the container can be started from it, and a local image is not deleted after the container goes offline, because other containers may be using it. The storage occupied on the host by container application images (also called application images) therefore keeps growing, and once it exceeds the size of the volume, new containers cannot pull their images and therefore cannot start. Therefore, if it is detected that the used storage capacity of the image volume reaches the second preset capacity, the third cleanup script is executed to clean up leftover container image information on the host.
For example, docker info is run at regular intervals to obtain: Data Space Used: xxx GB; Data Space Total: xxxxx GB; Data Space Available: xxxxx GB. When the proportion of Data Space Used exceeds the second preset capacity, such as 90%, the third cleanup script is triggered automatically; it can be, for example: docker images | awk '{print $3}' | xargs docker rmi.
Availability of the image registry is also checked. Specifically, step S101 includes: when the time to check the image registry arrives, detecting the image registry address that the docker component connects to and executing a script to check whether that address is available; if the image registry address is unavailable, determining that an image registry exception has occurred on the host and that the self-healing condition is met; and sending an email to preset users, so that they can investigate why the address is unavailable, for example an abnormal registry service, a port that is not open, or a poor network. For example, a ping request is sent to the host corresponding to the image registry address: if a reply is received, the address is available; if no reply is received within a certain time, the address is unavailable.
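The availability check can be sketched as below. Note that this swaps the ping request described above for a TCP connection attempt to the registry port. That swap is an assumption: ICMP ping typically needs raw sockets or an external binary, and a ping reply proves only that the machine answers, not that the registry service is listening.

```python
import socket


def registry_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```

A False result corresponds to "address unavailable" and would trigger the email to preset users.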
A host has many resources, such as CPU and memory (mem). The exception types include resource exceptions on the host, so host self-healing includes self-healing of host resources.
Specifically, step S101 includes: checking whether the CPU usage of the host reaches a preset CPU usage (such as 80% or another value), or whether the memory usage of the host reaches a preset memory usage (such as 80% or another value); if either threshold is reached, determining that an exception has occurred on the host and that the self-healing condition is met; obtaining the processes that occupy the most CPU or memory resources, up to a preset number, the container applications corresponding to those processes, and the resource configurations of those container applications; and sending these processes and the resource configurations of the corresponding container applications to the relevant personnel, for example by email, so that they can investigate the exception based on this information. If the host is found to lack resources, new hosts are added to the corresponding host cluster, so that containers on the heavily loaded host drift to the new hosts and the load on the host is shared.
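The threshold test and the "top processes" selection can be sketched as a pure function. It assumes usage figures and a process list have already been collected (for example from /proc or a monitoring agent); mapping processes to container applications and fetching their resource configurations is not shown, and the dictionary keys are hypothetical.

```python
def resource_alarm(cpu_pct, mem_pct, processes, cpu_limit=80.0, mem_limit=80.0, top_n=5):
    """Return the top_n heaviest processes if either threshold is reached, else [].

    processes: list of dicts with 'pid', 'cpu' and 'mem' usage percentages.
    """
    if cpu_pct < cpu_limit and mem_pct < mem_limit:
        return []  # self-healing condition not met
    # Rank by the resource that crossed its limit (CPU first if both did).
    key = "cpu" if cpu_pct >= cpu_limit else "mem"
    return sorted(processes, key=lambda p: p[key], reverse=True)[:top_n]
```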
A host also contains components, so the exception types include component exceptions on the host, and host self-healing includes self-healing of the components on the host. Note that the components of a cloud platform are determined by its architecture, requirements, and so on. For example, the components of a cloud platform may include the docker, mesos, marathon, and zookeeper components.
In one embodiment, as shown in Fig. 4, step S101 includes the following steps S401-S402.
S401: when the time to monitor components arrives, determine the types of the components to be monitored.
It can be understood that the component monitoring scripts are executed periodically; they can be configured and distributed through the configuration page of the cloud platform.
S402: according to the type of the component to be monitored, detect whether an exception has occurred on the component on the host and whether the self-healing condition is met; if so, perform self-healing on the component. If no such exception is detected, no action is taken.
Each component type on the host has its own self-healing mode.
In one embodiment, as shown in Fig. 5, step S402 includes the following steps S501-S504.
S501: if the component to be monitored is the docker component, execute a component information check script to check the docker component information; if no data is returned by the docker component within a predetermined time, determine that an exception has been detected on the docker component of the host and that the self-healing condition is met; then execute a docker restart script to restart the docker component on the host.
For example, a docker info check script is executed to check the docker component information, including the numbers of images and containers. If no data is returned within 5 seconds, the docker component has hung, i.e., it is stuck. In this case, the docker restart script is executed to restart the docker component; it can be, for example: systemctl restart docker. The predetermined time can also be set to other values.
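The S501 hang check maps naturally onto subprocess.run with a timeout, which kills the child process if it does not finish in time. In this sketch a non-zero exit code is also treated as "unresponsive", which is an assumption beyond the description; the check and run parameters are hypothetical seams for testing.

```python
import subprocess


def responds_within(cmd, timeout=5.0):
    """True if cmd exits successfully within timeout seconds."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False


def heal_docker_if_hung(check=responds_within, run=subprocess.run, timeout=5.0):
    """Restart docker when `docker info` does not answer in time (S501)."""
    if check(["docker", "info"], timeout):
        return False  # healthy, nothing to do
    run(["systemctl", "restart", "docker"], check=True)
    return True
```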
S502: if the component to be monitored is the docker component, execute a component state check script to check the current running state of the docker component; if the docker component is currently stopped, determine that an exception has been detected on the docker component of the host and that the self-healing condition is met; then execute a docker start script to start the docker component on the host.
It can be understood that the component information check script and the component state check script can be executed at the same time or separately, for example alternately: one script is executed when the component monitoring time arrives, and the other the next time it arrives.
For example, systemctl status docker can be used to check whether the docker component is currently in the stopped state, i.e., whether it has gone down. If it is stopped, the docker start script is executed to start the docker component; it can be, for example: systemctl start docker.
S503: if the component to be monitored is the mesos component or the marathon component, execute a process check script to detect whether the process corresponding to the component exists; if the process does not exist, determine that an exception has been detected on the component of the host and that the self-healing condition is met; then execute the corresponding start script to start that component on the host.
Specifically, if the component to be monitored is the mesos component, a mesos process check script is executed to detect whether the mesos process exists; if it does not, it is determined that an exception has been detected on the component of the host and that the self-healing condition is met, and the mesos component start script is executed to start the mesos component. For example, ps -ef | grep mesos-slave can be used to check whether the mesos process exists; if it does not, the start script of the mesos component is executed automatically.
Similarly, if the component to be monitored is the marathon component, a marathon process check script is executed to detect whether the marathon process exists; if it does not, it is determined that an exception has been detected on the component of the host and that the self-healing condition is met, and the marathon component start script is executed to start the marathon component. For example, ps -ef | grep marathon can be used to check whether the marathon process exists; if it does not, the start script of the marathon component is executed.
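The S503 process check can be sketched by parsing `ps -ef` output directly. A detail worth handling: in `ps -ef | grep mesos-slave` the grep command line itself usually appears in the listing and matches, so a naive check can report the process as present when it is not. This sketch skips such lines; process_running is a hypothetical name.

```python
def process_running(ps_output, name):
    """Check `ps -ef` output for a process whose command contains name."""
    for line in ps_output.splitlines()[1:]:  # skip the "UID PID PPID ..." header
        fields = line.split(None, 7)  # 8 columns; CMD may contain spaces
        if len(fields) < 8:
            continue
        cmd = fields[7]
        if name in cmd and not cmd.startswith("grep"):  # ignore the grep itself
            return True
    return False
```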
S504 executes port and checks script, to detect if the type for the component to be monitored is zookeeper component Whether the listening port for stating zookeeper component is opened;If the listening port of zookeeper component is not opened, determine to detect There is exception and meets the condition of self-healing in the zookeeper component on to host;It executes port and starts script, with starting The listening port of the zookeeper component.
Netstat-anp can such as be passed through | grep zk_port detects whether the listening port of zookeeper component is opened It opens, wherein zk_port refers to the listening port of zookeeper, if it does not exist, then automatic execute starting zookeeper group The port of part starts script, to start the listening port of zookeeper component.
It should be noted that the component on host further includes other components.
Fig. 4-embodiment shown in fig. 5, which is realized, carries out self-healing to the component on host.It is detected on host by timing Component whether there is exception, if the abnormal component on host occur carries out self-healing.Wherein, the self-healing of book is automatic It executes, improves the efficiency of book self-healing, reduce book and abnormal probability occur, while self-healing is to detect book When exception occur and meeting the condition of self-healing, it can both start self-healing, in this way, the exception of response data volume in real time, Bu Huiying The operation for ringing host improves the stability of host operation;Self-healing is carried out to host to reduce the abnormal manpower of processing Cost.
S102: Within a preset time after self-healing, if the same exception is detected on the host again, an alarm prompt is issued.
It can be understood that if the same exception still occurs within the preset time after self-healing, the self-healing is considered unable to resolve the exception, and an alarm prompt is issued. It should be noted that if the same exception is not detected within the preset time after self-healing, but is detected again after the preset time has elapsed, it is still handled as in step S101.
If the exception type includes a data volume exception on the host: after self-healing is performed on the data volume, if the used capacity of the data volume still reaches the first preset capacity within a first preset time, an alarm prompt is issued and an alarm email is sent to a preset user, so that the preset user can assess the situation and decide, for example, to expand the data volume of the host or to regulate the files uploaded by users. The first preset time may be, for example, 10 minutes, or another duration. If the exception type includes a local application image exception on the host: after self-healing is performed on the image volume that stores local application images on the host, if the used capacity of that image volume still reaches the second preset capacity within a second preset time, an alarm prompt is issued and an alarm email is sent to the preset user. The second preset time may be, for example, half a day, or another duration.
If the exception type includes an availability detection exception of the image registry: after availability detection is performed on the image registry address, if the image registry address is still unavailable within a third preset time, an alarm prompt is issued and an alarm email is sent to the preset user. The third preset time may be, for example, 1 hour, or another duration.
If the exception type includes a resource exception on the host: if the resource exception still occurs within a fourth preset time after self-healing is performed on the resources of the host, an alarm prompt is issued and an alarm email is sent to the preset user. The fourth preset time may be, for example, 5 minutes, or another duration.
If the exception type includes a component exception on the host: if the component exception still occurs within a fifth preset time after self-healing is performed on the component, an alarm prompt is issued and an alarm email is sent to the preset user. The fifth preset time may be, for example, half an hour, or another duration.
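The recurrence rule of step S102 can be illustrated with a small tracker. This is a sketch under stated assumptions: the exception-type keys are hypothetical, the windows reuse the example values given above (half a day is taken as 12 hours), and the caller is assumed to issue the alarm prompt and send the alarm email whenever "alarm" is returned.

```python
from dataclasses import dataclass, field
from typing import Dict

# Example preset times from the embodiment, in seconds (all configurable).
PRESET_WINDOWS: Dict[str, float] = {
    "data_volume": 10 * 60,        # first preset time: 10 minutes
    "image_volume": 12 * 60 * 60,  # second preset time: half a day
    "registry": 60 * 60,           # third preset time: 1 hour
    "resource": 5 * 60,            # fourth preset time: 5 minutes
    "component": 30 * 60,          # fifth preset time: half an hour
}


@dataclass
class RecurrenceTracker:
    windows: Dict[str, float]
    last_heal: Dict[str, float] = field(default_factory=dict)

    def on_exception(self, kind: str, now: float) -> str:
        """Return "alarm" if `kind` recurs within its window, else "heal".

        Returning "heal" records `now` as the start of a new window, which
        matches handling a late recurrence again as in step S101.
        """
        healed_at = self.last_heal.get(kind)
        if healed_at is not None and now - healed_at <= self.windows[kind]:
            return "alarm"
        self.last_heal[kind] = now
        return "heal"
```

Each exception type keeps its own window, so a recurring resource exception does not affect the alarm decision for components.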
Fig. 6 is a schematic block diagram of a host self-healing device provided by an embodiment of the present application. The device includes units for performing the host self-healing method described above. As shown in Fig. 6, the host self-healing device 100 includes a self-healing unit 101 and a prompt unit 102.
The self-healing unit 101 is configured to, if it is detected that an exception occurs on the host and the self-healing condition is met, perform self-healing on the host according to the type of the exception, where different exception types correspond to different self-healing modes and to different self-healing conditions.
The prompt unit 102 is configured to issue an alarm prompt if the same exception is detected on the host again within the preset time after self-healing.
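The behavior of the self-healing unit 101 maps naturally onto a rule table, because each exception type has its own self-healing condition and its own self-healing mode. The sketch below assumes hypothetical metric names, thresholds and action labels. It only shows the dispatch structure, not the embodiment's actual rules.

```python
from typing import Callable, Dict, List, NamedTuple

Metrics = Dict[str, float]


class HealRule(NamedTuple):
    condition: Callable[[Metrics], bool]  # self-healing condition for one exception type
    heal: Callable[[Metrics], str]        # self-healing mode for that type


def run_self_healing(rules: Dict[str, HealRule], metrics: Metrics) -> List[str]:
    """Apply the self-healing mode of every exception type whose condition holds."""
    actions = []
    for kind, rule in rules.items():
        if rule.condition(metrics):
            actions.append(f"{kind}: {rule.heal(metrics)}")
    return actions


# Hypothetical rules: metric names, thresholds and actions are illustrative only.
RULES: Dict[str, HealRule] = {
    "data_volume": HealRule(
        condition=lambda m: m["volume_usage"] >= 0.9,
        heal=lambda m: "clean log volume",
    ),
    "resource": HealRule(
        condition=lambda m: m["cpu"] >= 0.95 or m["mem"] >= 0.95,
        heal=lambda m: "report top processes",
    ),
}
```

New exception types can be supported by adding entries to the table without changing the dispatch loop.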
In one embodiment, the exception type includes a data volume exception on the host. As shown in Fig. 7, the self-healing unit 101 includes a volume self-healing determination unit 201, a volume type detection unit 202 and a volume self-healing unit 203.
The volume self-healing determination unit 201 is configured to determine, if it is detected that the used capacity of a data volume on the host reaches the first preset capacity, that an exception has been detected in the data volume on the host and the self-healing condition is met.
The volume type detection unit 202 is configured to detect the type of the abnormal data volume on the host.
The volume self-healing unit 203 is configured to perform self-healing on the data volume on the host according to the type of the abnormal data volume.
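The volume self-healing condition can be checked with the Python standard library. This sketch assumes that the "first preset capacity" is expressed as a fraction of the total size of the volume. The embodiment may instead use an absolute capacity, and the default threshold of 0.9 is hypothetical.

```python
import shutil


def usage_ratio(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.used / usage.total


def volume_needs_healing(path: str, first_preset_ratio: float = 0.9) -> bool:
    """Self-healing condition: usage has reached the (hypothetical) preset ratio."""
    return usage_ratio(path) >= first_preset_ratio
```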
In one embodiment, as shown in Fig. 8, the volume self-healing unit 203 includes a traversal unit 301, a file judgment unit 302, a file deletion unit 303, a compression archiving unit 304, a local volume detection unit 305, a first clearing unit 306, a second clearing unit 307, a root volume processing unit 308 and an application volume processing unit 309.
The traversal unit 301 is configured to traverse the large files in the data volume if the abnormal data volume on the host is the log volume of a container application.
The file judgment unit 302 is configured to judge whether a large file is a standard output log file.
The file deletion unit 303 is configured to execute a deletion script if the large file is a standard output log file, so as to delete that standard output log file.
The compression archiving unit 304 is configured to compress and archive the large file if it is not a standard output log file.
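The log volume handling of units 301 to 304 might look like the following Python sketch. The size threshold and the rule that identifies a standard output log file (is_stdout_log) are hypothetical. The "-json.log" suffix is the one used by Docker's json-file logging driver for container log files and is included only as a plausible example. Archiving uses gzip from the standard library.

```python
import gzip
import os
import shutil
from typing import Callable, List, Tuple


def is_stdout_log(name: str) -> bool:
    """Hypothetical naming rule for standard output log files."""
    return name == "stdout.log" or name.endswith("-json.log")


def heal_log_volume(
    root: str,
    big_bytes: int,
    is_stdout: Callable[[str], bool] = is_stdout_log,
) -> Tuple[List[str], List[str]]:
    """Delete big stdout logs and gzip-archive other big files under `root`."""
    deleted: List[str] = []
    archived: List[str] = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if name.endswith(".gz") or os.path.getsize(path) < big_bytes:
                continue  # not a "big file", or already archived
            if is_stdout(name):
                os.remove(path)
                deleted.append(path)
            else:
                with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                    shutil.copyfileobj(src, dst)
                os.remove(path)
                archived.append(path)
    return deleted, archived
```

On Linux, deleting a file that a process still holds open does not release its space until the file is closed. Truncating a live standard output log may therefore free space more reliably than deleting it.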
The local volume detection unit 305 is configured to detect, if the abnormal data volume on the host is a host local volume, whether that data volume is the docker data volume, the docker metadata volume or the root volume of the host.
The first clearing unit 306 is configured to execute a first clearing script if the data volume is the docker data volume, so as to clear information about leftover container application packages on the host.
The second clearing unit 307 is configured to execute a second clearing script if the data volume is the docker metadata volume, so as to clear the data of containers that exited abnormally on the host.
The root volume processing unit 308 is configured to detect, if the data volume is the root volume of the host, the application environment the host is currently in, and to process the preset files in the root volume of the host according to that application environment.
The application volume processing unit 309 is configured to, if the abnormal data volume on the host is the application volume of a container application, traverse the data volume to find the predetermined number of container applications occupying the most space (for example, the top 30) and obtain their container application directories, and then send an email containing the container application directories and container application information to the preset user, so that the preset user can take further action based on the email.
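The traversal performed by the application volume processing unit 309 can be sketched as follows. The sketch assumes, hypothetically, that each immediate subdirectory of the application volume belongs to one container application. It returns the largest entries (30 by default, matching the example above) and leaves composing and sending the email to the caller.

```python
import heapq
import os
from typing import List, Tuple


def dir_size(path: str) -> int:
    """Total size in bytes of regular files under `path` (symlinks not followed)."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def top_space_consumers(volume: str, n: int = 30) -> List[Tuple[int, str]]:
    """Return the n largest immediate entries of `volume` as (bytes, name) pairs."""
    sizes: List[Tuple[int, str]] = []
    with os.scandir(volume) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                size = dir_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
            sizes.append((size, entry.name))
    return heapq.nlargest(n, sizes)
```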
In one embodiment, the exception type includes a local application image exception on the host. The self-healing unit 101 includes an image volume self-healing determination unit and an image volume clearing unit. The image volume self-healing determination unit is configured to determine, if it is detected that the used capacity of the image volume that stores local application images on the host reaches the second preset capacity, that an exception has been detected in that image volume and the self-healing condition is met. The image volume clearing unit is configured to execute a third clearing script to clear leftover application image information on the host.
In one embodiment, the exception type includes an availability detection exception of the image registry. The self-healing unit 101 includes an availability probing unit, an availability self-healing determination unit and a sending unit. The availability probing unit is configured to, when the time for probing the image registry is reached, execute a script to probe the image registry address used by the docker component, so as to detect whether that address is available. The availability self-healing determination unit is configured to determine, if the image registry address is unavailable, that an exception has occurred in the image registry used by the host and the self-healing condition is met. The sending unit is configured to send an email to the preset user.
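The availability probe of the image registry address could be sketched in Python as below. It assumes the registry implements the Docker Registry HTTP API V2, whose base endpoint /v2/ answers 200, or 401 when authentication is required. Both answers are treated here as "available". The timeout value is a hypothetical default.

```python
import urllib.error
import urllib.request


def registry_available(base_url: str, timeout: float = 5.0) -> bool:
    """Probe the Registry HTTP API V2 base endpoint `<base_url>/v2/`.

    200 means reachable without auth; 401 means reachable but auth is
    required. Network errors and any other status count as unavailable.
    """
    url = base_url.rstrip("/") + "/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:  # non-2xx answers arrive as HTTPError
        return err.code == 401
    except OSError:  # URLError, refused connection, timeout
        return False
```

Treating 401 as available reflects that the registry responded; credential problems would surface separately when images are pulled.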
In one embodiment, the exception type includes a resource exception on the host. The self-healing unit 101 includes a resource detection unit, a resource self-healing determination unit and an acquisition and sending unit. The resource detection unit is configured to detect whether the CPU usage on the host reaches a preset CPU usage, or whether the memory (mem) usage on the host reaches a preset mem usage. The resource self-healing determination unit is configured to determine that an exception has occurred on the host and the self-healing condition is met if the CPU usage on the host reaches the preset CPU usage or the mem usage on the host reaches the preset mem usage. The acquisition and sending unit is configured to obtain the top preset number of processes occupying CPU or mem resources, obtain the container applications corresponding to those processes and the resource configurations of those container applications, and send the processes together with the resource configurations of the corresponding container applications to the relevant personnel.
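The decision made by the resource detection and resource self-healing determination units, followed by the selection of the heaviest processes, is sketched below as a pure function. The 90 percent limits and the default of five processes are hypothetical. On a real host, the inputs could come from the third-party psutil package (for example psutil.cpu_percent, psutil.virtual_memory and psutil.process_iter). Mapping processes to container applications is not shown.

```python
from operator import attrgetter
from typing import List, NamedTuple, Optional


class Proc(NamedTuple):
    pid: int
    name: str
    cpu: float  # percent of CPU
    mem: float  # percent of memory


def top_resource_processes(
    cpu_usage: float,
    mem_usage: float,
    procs: List[Proc],
    cpu_limit: float = 90.0,
    mem_limit: float = 90.0,
    n: int = 5,
) -> Optional[List[Proc]]:
    """Return the n heaviest processes if CPU or mem usage hits its limit, else None."""
    if cpu_usage >= cpu_limit:
        key = attrgetter("cpu")  # rank by the resource that tripped the limit
    elif mem_usage >= mem_limit:
        key = attrgetter("mem")
    else:
        return None  # self-healing condition not met
    return sorted(procs, key=key, reverse=True)[:n]
```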
In one embodiment, the exception type includes a component exception on the host. As shown in Fig. 9, the self-healing unit 101 includes a component type determination unit 401 and a component self-healing unit 402.
The component type determination unit 401 is configured to determine the type of the component to be monitored when it is detected that the time for monitoring components has been reached.
The component self-healing unit 402 is configured to detect, according to the type of the component to be monitored, whether a component on the host is abnormal and meets the self-healing condition, and if so, to perform self-healing on that component.
In one embodiment, as shown in Fig. 10, the component self-healing unit 402 includes a docker component self-healing unit 501, a mesos component self-healing unit 502, a marathon component self-healing unit 503 and a zookeeper component self-healing unit 504.
The docker component self-healing unit 501 is configured to, if the type of the component to be monitored is the docker component, execute a component information check script to check the docker component information; if no data is returned by the docker component within a predetermined time, determine that an exception has been detected in the docker component on the host and the self-healing condition is met; and execute a docker restart script to restart the docker component on the host.
The docker component self-healing unit 501 is further configured to, if the type of the component to be monitored is the docker component, execute a component state check script to check the current running state of the docker component; if the docker component is currently in a stopped state, determine that an exception has been detected in the docker component on the host and the self-healing condition is met; and execute a docker start script to start the docker component on the host.
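The "no data within a predetermined time" check of unit 501 can be sketched with subprocess and a timeout. This is a minimal sketch: using docker info as the information check and systemctl restart docker as the restart script are assumptions about a systemd-based host, not the embodiment's actual scripts. The state check could be handled the same way, for instance with systemctl is-active docker, which exits with status 0 only when the unit is active.

```python
import subprocess
from typing import Sequence


def responds_in_time(cmd: Sequence[str], timeout: float) -> bool:
    """True if `cmd` exits with status 0 within `timeout` seconds.

    A hang (no data within the predetermined time), a non-zero exit or a
    missing binary all count as "no response", i.e. the condition is met.
    """
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # subprocess.run kills the child on timeout
    return result.returncode == 0


def heal_docker(timeout: float = 10.0) -> bool:
    """Restart docker if `docker info` does not answer in time (assumed commands)."""
    if responds_in_time(["docker", "info"], timeout):
        return False
    subprocess.run(["systemctl", "restart", "docker"], check=True)
    return True
```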
The mesos component self-healing unit 502 is configured to, if the type of the component to be monitored is the mesos component, execute a mesos process check script to detect whether the mesos process exists; if the mesos process does not exist, determine that an exception has been detected in the component on the host and the self-healing condition is met; and execute a mesos component start script to start the mesos component.
The marathon component self-healing unit 503 is configured to, if the type of the component to be monitored is the marathon component, execute a marathon process check script to detect whether the marathon process exists; if the marathon process does not exist, determine that an exception has been detected in the component on the host and the self-healing condition is met; and execute a marathon component start script to start the marathon component.
The zookeeper component self-healing unit 504 is configured to, if the type of the component to be monitored is the zookeeper component, execute a port check script to detect whether the listening port of the zookeeper component is open; if the listening port is not open, determine that an exception has been detected in the zookeeper component on the host and the self-healing condition is met; and execute a port start script to open the listening port of the zookeeper component.
It should be noted that, as those skilled in the art can clearly understand, the specific implementation of the above device and each of its units can be found in the corresponding description of the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above device may be implemented in the form of a computer program, which can run on a computer device as shown in Fig. 11.
Fig. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application. The device is a terminal or similar device, such as a host in a PaaS platform. The device 100 includes a processor 102, a memory and a network interface 103 connected through a system bus 101, where the memory may include a non-volatile storage medium 104 and an internal memory 105.
The non-volatile storage medium 104 can store an operating system 1041 and a computer program 1042. When the computer program 1042 stored in the non-volatile storage medium is executed by the processor 102, the host self-healing method described above can be implemented. The processor 102 provides computing and control capabilities to support the operation of the whole device 100. The internal memory 105 provides an environment for running the computer program stored in the non-volatile storage medium; when that computer program is executed by the processor 102, it causes the processor 102 to perform the host self-healing method described above. The network interface 103 is used for network communication. Those skilled in the art will understand that the structure shown in the figure is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the device to which the solution is applied; a specific device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The processor 102 is configured to run the computer program stored in the memory to implement any embodiment of the host self-healing method described above.
It should be understood that, in the embodiments of the present application, the processor 102 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, which may be a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the steps of the above method embodiments.
Accordingly, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium, including a non-volatile computer-readable storage medium. The storage medium stores a computer program that, when executed by a processor, implements any embodiment of the host self-healing method described above.
The storage medium may be any computer-readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk or an optical disc.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a logical functional division, and other divisions are possible in an actual implementation. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus, devices and units described above can be found in the corresponding processes of the foregoing method embodiments, and details are not repeated here. The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any equivalent modification or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A host self-healing method, characterized in that the method comprises:
if it is detected that an exception occurs on a host and a self-healing condition is met, performing self-healing on the host according to the type of the exception, wherein different exception types correspond to different self-healing modes and to different self-healing conditions;
within a preset time after the self-healing, if the same exception is detected on the host again, issuing an alarm prompt.
2. The method according to claim 1, characterized in that the exception type comprises a data volume exception on the host, and the step of performing self-healing on the host according to the type of the exception if it is detected that an exception occurs on the host and the self-healing condition is met comprises:
if it is detected that the used capacity of a data volume on the host reaches a first preset capacity, determining that an exception has been detected in the data volume on the host and the self-healing condition is met;
detecting the type of the abnormal data volume on the host;
performing self-healing on the data volume on the host according to the type of the abnormal data volume.
3. The method according to claim 2, characterized in that the type of the data volume comprises a log volume of a container application, and performing self-healing on the data volume on the host according to the type of the abnormal data volume comprises:
if the abnormal data volume on the host is the log volume of a container application, traversing the large files in the data volume;
judging whether a large file is a standard output log file;
if the large file is a standard output log file, executing a deletion script to delete the standard output log file;
if the large file is not a standard output log file, compressing and archiving the large file.
4. The method according to claim 2, characterized in that the type of the data volume comprises a host local volume, and performing self-healing on the data volume on the host according to the type of the abnormal data volume comprises:
if the abnormal data volume on the host is a host local volume, detecting whether the data volume is, among the host local volumes, the docker data volume, the docker metadata volume or the root volume of the host;
if the data volume is the docker data volume, executing a first clearing script to clear information about leftover container application packages on the host;
if the data volume is the docker metadata volume, executing a second clearing script to clear the data of containers that exited abnormally on the host;
if the data volume is the root volume of the host, detecting the application environment the host is currently in, and processing preset files in the root volume of the host according to that application environment.
5. The method according to claim 1, characterized in that the exception type comprises a local application image exception on the host, and the step of performing self-healing on the host according to the type of the exception if it is detected that an exception occurs on the host and the self-healing condition is met comprises:
if it is detected that the used capacity of the image volume for storing local application images on the host reaches a second preset capacity, determining that an exception has been detected in the image volume for storing local application images on the host and the self-healing condition is met;
executing a third clearing script to clear leftover application image information on the host.
6. The method according to claim 1, characterized in that the exception type comprises a component exception on the host, and the step of performing self-healing on the host according to the type of the exception if it is detected that an exception occurs on the host and the self-healing condition is met comprises:
if it is detected that the time for monitoring components has been reached, determining the type of the component to be monitored;
detecting, according to the type of the component to be monitored, whether a component on the host is abnormal and meets the self-healing condition, and if so, performing self-healing on the component on the host.
7. The method according to claim 6, characterized in that detecting, according to the type of the component to be monitored, whether a component on the host is abnormal and meets the self-healing condition, and performing self-healing on the component on the host if so, comprises:
if the type of the component to be monitored is the docker component, executing a component information check script to check the docker component information; if no data is returned by the docker component within a predetermined time, determining that an exception has been detected in the docker component on the host and the self-healing condition is met; and executing a docker restart script to restart the docker component on the host;
if the type of the component to be monitored is the docker component, executing a component state check script to check the current running state of the docker component; if the docker component is currently in a stopped state, determining that an exception has been detected in the docker component on the host and the self-healing condition is met; and executing a docker start script to start the docker component on the host;
if the type of the component to be monitored is the mesos component or the marathon component, executing a process check script to detect whether the process corresponding to the component exists; if the process does not exist, determining that an exception has been detected in the component on the host and the self-healing condition is met; and executing the corresponding start script to start the corresponding component on the host;
if the type of the component to be monitored is the zookeeper component, executing a port check script to detect whether the listening port of the zookeeper component is open; if the listening port is not open, determining that an exception has been detected in the zookeeper component on the host and the self-healing condition is met; and executing a port start script to open the listening port of the zookeeper component.
8. A host self-healing device, characterized in that the host self-healing device comprises:
a self-healing unit, configured to, if it is detected that an exception occurs on a host and a self-healing condition is met, perform self-healing on the host according to the type of the exception, wherein different exception types correspond to different self-healing modes and to different self-healing conditions;
a prompt unit, configured to issue an alarm prompt if the same exception is detected on the host again within a preset time after the self-healing.
9. A computer device, characterized in that the computer device comprises a memory and a processor connected to the memory;
the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN201910406740.8A 2019-05-15 2019-05-15 Host self-healing method, device, computer equipment and storage medium Pending CN110262917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910406740.8A CN110262917A (en) 2019-05-15 2019-05-15 Host self-healing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110262917A true CN110262917A (en) 2019-09-20

Family

ID=67913256

Country Status (1)

Country Link
CN (1) CN110262917A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080282104A1 (en) * 2007-05-11 2008-11-13 Microsoft Corporation Self Healing Software
CN105808394A (en) * 2014-12-31 2016-07-27 中兴通讯股份有限公司 Server self-healing method and device
CN107179957A (en) * 2016-03-10 2017-09-19 阿里巴巴集团控股有限公司 Physical machine failure modes processing method, device and virtual machine restoration methods, system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111796959A (en) * 2020-06-30 2020-10-20 中国工商银行股份有限公司 Host machine container self-healing method, device and system
CN111796959B (en) * 2020-06-30 2023-08-08 中国工商银行股份有限公司 Self-healing method, device and system for host container
CN112346874A (en) * 2020-11-27 2021-02-09 中国工商银行股份有限公司 Abnormal volume processing method and device based on cloud platform
CN112346874B (en) * 2020-11-27 2023-08-25 中国工商银行股份有限公司 Abnormal volume processing method and device based on cloud platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination