CN110262917A - Host self-healing method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN110262917A, CN201910406740.8A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
Abstract
The embodiments of the present application provide a host self-healing method, device, computer equipment and storage medium, relating to the field of cloud computing and applicable to PaaS platforms. The method includes: if it is detected that an anomaly has occurred on a host and the self-healing condition is met, performing self-healing on the host according to the type of the anomaly, where different anomaly types correspond to different self-healing modes and different self-healing conditions; and, within a preset time after self-healing, if the same anomaly is detected on the host, raising an alarm. By performing self-healing first when a host becomes abnormal, the embodiments reduce the probability of host anomalies. Because self-healing starts as soon as an anomaly is detected and the self-healing condition is met, anomalies are handled in real time, so the containers running on the host are not affected and the host runs more stably. Self-healing the host also reduces the labor cost of handling anomalies.
Description
Technical field
The present application relates to the field of cloud computing, and in particular to a host self-healing method, device, computer equipment and storage medium.
Background art
A cloud platform, for example a PaaS (Platform-as-a-Service) platform, contains a large number of hosts. Because hosts may be in continuous use, they are prone to anomalies: for example, the storage capacity of a data volume on a host runs low; a component on the host, such as the docker component, goes down; or the image registry that stores images is nearly full. If every anomaly merely triggers a reminder and is then handled by staff, a large number of hosts requires a large number of staff, which incurs high labor and time costs. Moreover, when staff handle host anomalies one by one, some anomalies inevitably remain unresolved for a long time, which affects the operation of the cloud platform.
Summary of the invention
The embodiments of the present application provide a host self-healing method, device, computer equipment and storage medium that can respond to anomalies in real time, reduce the probability of host anomalies, and improve the stability of host operation.
In a first aspect, an embodiment of the present application provides a host self-healing method, including:
if it is detected that an anomaly has occurred on a host and the self-healing condition is met, performing self-healing on the host according to the type of the anomaly, where different anomaly types correspond to different self-healing modes and different self-healing conditions; and, within a preset time after self-healing, if the same anomaly is detected on the host, raising an alarm.
In a second aspect, an embodiment of the present invention provides a host self-healing device, which includes units for performing the method described in the first aspect.
In a third aspect, an embodiment of the present invention provides computer equipment, which includes a memory and a processor connected to the memory;
the memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory so as to perform the method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the method described in the first aspect is implemented.
In the embodiments of the present application, when it is detected that an anomaly has occurred on a host and the self-healing condition is met, self-healing is performed on the host according to the type of the anomaly. Performing self-healing first when a host becomes abnormal reduces the probability of host anomalies. Because self-healing starts as soon as an anomaly is detected and the self-healing condition is met, anomalies are handled in real time, so the containers running on the host are not affected and the host runs more stably. Self-healing the host also reduces the labor cost of handling anomalies.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the host self-healing method provided by an embodiment of the present application;
Fig. 2 is a sub-flow diagram of the host self-healing method provided by an embodiment of the present application;
Fig. 3 is a sub-flow diagram of the host self-healing method provided by an embodiment of the present application;
Fig. 4 is a sub-flow diagram of the host self-healing method provided by an embodiment of the present application;
Fig. 5 is a sub-flow diagram of the host self-healing method provided by an embodiment of the present application;
Fig. 6 is a schematic block diagram of the host self-healing device provided by an embodiment of the present application;
Fig. 7 is a schematic block diagram of the self-healing unit provided by an embodiment of the present application;
Fig. 8 is a schematic block diagram of the volume self-healing unit provided by an embodiment of the present application;
Fig. 9 is a schematic block diagram of another self-healing unit provided by an embodiment of the present application;
Fig. 10 is a schematic block diagram of the component self-healing unit provided by an embodiment of the present application;
Fig. 11 is a schematic block diagram of the computer equipment provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Fig. 1 is a flow diagram of the host self-healing method provided by an embodiment of the present application. As shown in Fig. 1, the method includes S101-S102.
S101: if it is detected that an anomaly has occurred on the host and the self-healing condition is met, perform self-healing on the host according to the type of the anomaly, where different anomaly types correspond to different self-healing modes and different self-healing conditions.
There are multiple anomaly types, such as a data volume anomaly on the host, a local application image anomaly on the host, a component anomaly on the host, and a resource anomaly on the host. Different anomaly types correspond to different self-healing modes and different self-healing conditions.
A host contains data volumes. In one embodiment, the anomaly type includes a data volume anomaly on the host. Based on the data volumes present on hosts in the platform and on their purposes, data volumes are divided into three classes: 1. the log volume of a container application, also called the container log directory, that is, the data volume that stores container application logs, such as a NAS (Network Attached Storage) volume holding container application logs; 2. host local volumes, including the docker data volume, the docker metadata volume, and the root volume of the host; 3. the application volume of a container application. It should be noted that data volumes may also be classified in other ways.
As shown in Fig. 2, when the anomaly type includes a data volume anomaly on the host, performing self-healing on the host includes performing self-healing on the data volume on the host. Step S101 includes the following steps S201-S203.
S201: if it is detected that the used storage capacity of a data volume on the host reaches a first preset capacity, determine that the data volume on the host is abnormal and meets the self-healing condition.
The size of the data volumes on the host can be checked at a preset time interval, for example with docker info. The check can report the total size of the data volume (Space Total), the used size (Space Used), and the available size (Space Available).
Space Used:xxx MB
Space Total:xxxxx MB
Space Available:xxxxx MB
When the proportion of Space Used exceeds the first preset capacity of the data volume (the first preset capacity may be, for example, 90% or another value), it is determined that the data volume on the host is abnormal and meets the self-healing condition.
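The capacity check in S201 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function names, the regular expression, the unit table, and the sample report are hypothetical; only the three field labels and the 90% example threshold come from the description above.

```python
import re

# Hypothetical helpers; the field labels follow the sample report in the text.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
_FIELD = re.compile(
    r"^\s*(Space (?:Used|Total|Available))\s*:\s*([\d.]+)\s*([KMGT]?B)\s*$"
)


def parse_space_fields(report: str) -> dict:
    """Return {'Space Used': bytes, ...} parsed from a docker-info-like report."""
    fields = {}
    for line in report.splitlines():
        match = _FIELD.match(line)
        if match:
            name, value, unit = match.groups()
            fields[name] = float(value) * _UNITS[unit]
    return fields


def volume_needs_healing(report: str, threshold: float = 0.90) -> bool:
    """True when used/total reaches the first preset capacity (assumed 90%)."""
    fields = parse_space_fields(report)
    return fields["Space Used"] / fields["Space Total"] >= threshold


sample = "Space Used: 920 MB\nSpace Total: 1000 MB\nSpace Available: 80 MB"
print(volume_needs_healing(sample))
```

Converting every value to bytes before dividing keeps the ratio correct even when the used and total sizes are reported in different units.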
S202: detect the type of the abnormal data volume on the host.
For example, detect which data volume caused the anomaly, that is, which data volume the anomaly originated from, and determine the type of the abnormal data volume from the relationship between the data volume and the data it stores. This relationship may be stored in a database or elsewhere, for example in a file on the host.
S203: perform self-healing on the data volume on the host according to the type of the abnormal data volume.
The types of data volume include the log volume of a container application, the host local volume, and the application volume of a container application. Different data volume types correspond to different self-healing modes.
In one embodiment, as shown in Fig. 3, step S203 includes the following steps S301-S309.
S301: if the abnormal data volume on the host is the log volume of a container application, traverse the large files in the data volume.
The log volume of a container application stores the application's various logs, such as standard output logs, data logs, and access logs. A large file in a data volume can be understood as a file whose size exceeds a set capacity, or as a file that falls within a preset top rank when the files in the data volume are sorted by size in descending order, for example the 10 largest files.
S302: determine whether the large file is a standard output log file.
A standard output log file is the log file output while a container application starts up. It has a specific log file format, such as the .out file format.
If the large file is a standard output log file, perform step S303; if the large file is not a standard output log file, perform step S304.
S303: execute a deletion script to delete the standard output log files among the large files.
Understandably, the standard output log of an application has the least value: only the current day's log needs to be retained. Such logs can be deleted directly without affecting subsequent analysis of the application's business.
S304: compress and archive the large file.
Understandably, if the large file is not a standard output log file, for example a data log or an access log, it is valuable and needs to be compressed and archived, that is, compressed first and then archived. The cloud platform has a separate archiving server, and files can be archived according to the content of the data stored in them.
Steps S301-S304 above perform self-healing on the log volume of a container application: large standard output log files in the data volume are deleted, and large files of other log types are compressed and archived.
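Steps S301-S304 can be sketched as a planning function that only decides what to do with each large file. This is an illustrative sketch: the function names and the sample paths are hypothetical, file sizes are passed in rather than read from disk, and the .out suffix test is the only classification rule taken from the description.

```python
from pathlib import Path
from typing import Optional


def pick_large_files(sizes: dict, min_bytes: Optional[int] = None,
                     top_n: int = 10) -> list:
    """Select large files by an absolute size limit or, failing that, by top-N rank."""
    ranked = sorted(sizes, key=lambda path: sizes[path], reverse=True)
    if min_bytes is not None:
        return [path for path in ranked if sizes[path] > min_bytes]
    return ranked[:top_n]


def plan_log_volume_healing(sizes: dict, **selection) -> dict:
    """Map each large file to 'delete' (.out stdout logs) or 'archive' (other logs)."""
    plan = {}
    for path in pick_large_files(sizes, **selection):
        is_stdout_log = Path(path).suffix == ".out"
        plan[path] = "delete" if is_stdout_log else "archive"
    return plan


# Hypothetical sizes in bytes for three files in a container log volume.
sizes = {"/logs/app.out": 5_000, "/logs/access.log": 4_000, "/logs/tiny.log": 10}
print(plan_log_volume_healing(sizes, top_n=2))
```

Separating the plan from its execution makes it possible to review or log the intended deletions before any deletion or archiving script is run.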
S305: if the abnormal data volume on the host is a host local volume, detect whether it is the docker data volume, the docker metadata volume, or the root volume of the host.
The docker data volume stores the application package information of container applications; the docker metadata volume stores information such as tag, name, and status.
S306: if the data volume is the docker data volume, execute a first cleanup script to remove the residual container application package information on the host.
S307: if the data volume is the docker metadata volume, execute a second cleanup script to remove the data of containers on the host that exited abnormally.
Because the metadata volume stores information such as tag, name, and status, this information fills the metadata volume as the number of executed containers grows. In addition, when a container exits abnormally, its tag, status, and similar information cannot be removed, so junk data accumulates in the metadata volume. The second cleanup script may be, for example: docker ps -aqf status=exited | xargs docker rm. Executing the second cleanup script removes the junk data of containers on the host that exited abnormally.
S308: if the data volume is the root volume of the host, detect the application environment the host currently belongs to, and handle the preset files in the root volume of the host accordingly.
The application environment the host currently belongs to refers to which application environment the host is part of. Application environments include the production environment, the test environment, the development environment, and so on. A production environment means that the resources on the host (such as the containers mounted on the host and the host's CPU resources) are connected to the external environment or accessed by external users; a test environment means that the resources on the host are used for testing; a development environment means that the resources on the host are used for development. Handling the preset files in the root volume according to the application environment includes: if the host is currently in the production environment, compressing the preset files in the root volume of the host; if the host is not in the production environment, deleting the preset files in the root volume of the host. A preset file is a file specified in advance, such as a journal log file. Understandably, because resources on a production host are accessed by external users, the files stored in its root volume are more valuable than files in other environments.
Steps S305-S308 above perform self-healing on host local volumes, applying different self-healing handling to different types of host local volume.
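The branching in steps S305-S308 can be summarized as a small dispatch function. The sketch below is illustrative only: the volume type labels, environment labels, and returned action names are hypothetical identifiers chosen for this example, while the branching itself follows the steps described above.

```python
def plan_local_volume_healing(volume_type: str, environment: str = "") -> str:
    """Return the healing action for a host local volume (labels are hypothetical)."""
    if volume_type == "docker_data":
        # S306: remove residual container application package information.
        return "run_first_cleanup_script"
    if volume_type == "docker_metadata":
        # S307: remove data left behind by abnormally exited containers.
        return "run_second_cleanup_script"
    if volume_type == "root":
        # S308: production files are kept (compressed); others are deleted.
        if environment == "production":
            return "compress_preset_files"
        return "delete_preset_files"
    raise ValueError(f"unknown host local volume type: {volume_type!r}")


print(plan_local_volume_healing("root", "production"))
print(plan_local_volume_healing("root", "test"))
```

Raising an error for an unknown volume type is a design choice of this sketch; the description does not say how an unclassified volume is handled.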
S309: if the abnormal data volume on the host is the application volume of a container application, traverse the data volume to find the preset number of container applications that occupy the most space (for example, the top 30) and obtain their container application directories; then send an email to preset users based on the container application directories and container application information, so that the preset users can take further action based on the email. Further action includes evaluating the container application's application volume to decide whether to expand it or to standardize the files uploaded by users. The preset users include the project team of the container application, the application administrator of the container application, and so on.
The embodiments shown in Fig. 2 and Fig. 3 monitor the data volumes on the host and perform self-healing on a data volume when its used storage capacity reaches the first preset capacity. Anomalies are thus detected and anticipated in advance, and the anomalies detected in advance are self-healed. Data volume self-healing is executed automatically, which improves its efficiency and reduces the probability of data volume anomalies. Moreover, self-healing starts as soon as a data volume is detected to be abnormal and to meet the self-healing condition, so data volume anomalies are handled in real time, the operation of the host is not affected, and the host runs more stably. Self-healing the host also reduces the labor cost of handling anomalies.
Application images on hosts involve two situations. First, the host's local application images: before a container starts, the host must have the image locally so that the container can be started from it. Second, the host belongs to the image registry. The host self-healing method therefore includes a self-healing method for the host's image volume and availability detection for the image registry.
In one embodiment, the anomaly type includes a local application image anomaly on the host, and step S101 includes: if it is detected that the used storage capacity of the image volume that stores local application images on the host reaches a second preset capacity, determining that the image volume is abnormal and meets the self-healing condition; and executing a third cleanup script to remove the residual application image information on the host.
Understandably, before the container of a container application starts, the host must have the image locally so that the container can be started from it. The local image is not deleted after a container goes offline, because other containers may still be using it. As a result, the storage occupied by container application images (also called application images) on the host keeps growing. Once it exceeds the size of the volume, new containers cannot pull their images and therefore cannot start. Therefore, if it is detected that the used storage capacity of the image volume that stores local application images on the host reaches the second preset capacity, the third cleanup script is executed to remove the residual container image information on the host.
For example, docker info is run at regular intervals to obtain: Data Space Used: xxx GB; Data Space Total: xxxxx GB; Data Space Available: xxxxx GB. When the proportion of Data Space Used exceeds the second preset capacity, for example 90%, the third cleanup script is triggered automatically. The third cleanup script may be, for example: docker images | awk '{print $3}' | xargs docker rmi.
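The column extraction performed by the awk stage of the third cleanup script can be sketched in Python. This is an illustration under assumptions: the function name and the sample listing are hypothetical. With the default docker images table, the third whitespace-separated token of the header row is the word IMAGE, so this sketch skips the header and also removes duplicate IDs that appear when one image carries several tags.

```python
def extract_image_ids(docker_images_output: str) -> list:
    """Return unique image IDs from a default `docker images` table, in order."""
    lines = docker_images_output.strip().splitlines()
    ids = []
    for line in lines[1:]:  # skip the header row (REPOSITORY TAG IMAGE ID ...)
        columns = line.split()
        if len(columns) >= 3:
            ids.append(columns[2])  # third column holds the image ID
    return list(dict.fromkeys(ids))  # de-duplicate while keeping order


# Hypothetical listing; repository names and IDs are made up for the example.
listing = """\
REPOSITORY   TAG      IMAGE ID       CREATED       SIZE
app/web      1.0      1a2b3c4d5e6f   2 weeks ago   120MB
app/web      latest   1a2b3c4d5e6f   2 weeks ago   120MB
app/job      2.1      0f9e8d7c6b5a   3 days ago    98MB
"""
print(extract_image_ids(listing))
```

The resulting list is what would be handed to the image removal command; the sketch stops short of removing anything.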
Anomalies in the availability of the image registry are also detected. Specifically, step S101 includes: when the time for checking the image registry arrives, detecting the image registry address associated with the docker component, and executing a script to check the availability of that address; if the image registry address is unavailable, determining that the image registry on the host is abnormal and meets the self-healing condition; and sending an email to preset users, so that the preset users can investigate why the image registry address is unavailable, for example a registry service failure, a port that is not open, or a poor network. For example, a ping request is sent to the host corresponding to the image registry address: if data returned by that host is received, the image registry address is available; if no data is received from that host within a certain time, the image registry address is unavailable.
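An availability probe of this kind can be sketched as follows. Note the substitution: the description uses a ping request, whereas this sketch uses a TCP connection attempt, which needs no special privileges and also reveals whether the registry port is open. The function name, the default timeout, and the loopback demonstration are assumptions made for the example.

```python
import socket


def registry_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


# Demonstration against a throwaway local listener instead of a real registry.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(registry_reachable("127.0.0.1", demo_port))
listener.close()
```

A failed probe would then trigger the email to the preset users rather than any automatic repair, as described above.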
A host has many resources, such as CPU and memory (also abbreviated as mem). The anomaly type includes a resource anomaly on the host, so the host self-healing method includes a self-healing method for host resources.
Specifically, step S101 includes: detecting whether the CPU usage on the host reaches a preset CPU usage (for example 80% or another value), or whether the mem usage on the host reaches a preset mem usage (for example 80% or another value); if the CPU usage reaches the preset CPU usage or the mem usage reaches the preset mem usage, determining that the host is abnormal and meets the self-healing condition; obtaining the preset number of processes that occupy the most CPU or mem resources, the container applications those processes belong to, and the resource configurations of those container applications; and sending these processes and the resource configurations of the corresponding container applications to the relevant personnel, for example by email, so that they can investigate the anomaly. If the host's resources are found to be insufficient, new hosts are added to the corresponding host cluster so that containers on the heavily loaded host migrate to the new hosts, which relieves the pressure on the host.
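The threshold test and the selection of the top resource consumers can be sketched as below. The data class, field names, and sample figures are hypothetical; the 80% limits are the example values given in the description, and how usage figures are collected is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ProcessUsage:
    pid: int
    container_app: str  # container application the process belongs to
    cpu_percent: float
    mem_percent: float


def resource_anomaly(cpu_percent: float, mem_percent: float,
                     cpu_limit: float = 80.0, mem_limit: float = 80.0) -> bool:
    """True when CPU or mem usage reaches its preset limit (assumed 80%)."""
    return cpu_percent >= cpu_limit or mem_percent >= mem_limit


def top_consumers(processes: list, key: str = "cpu_percent", count: int = 3) -> list:
    """Return the `count` processes with the highest value of the given field."""
    return sorted(processes, key=lambda p: getattr(p, key), reverse=True)[:count]


procs = [
    ProcessUsage(101, "billing", 10.0, 5.0),
    ProcessUsage(102, "search", 70.0, 20.0),
    ProcessUsage(103, "report", 30.0, 60.0),
]
if resource_anomaly(cpu_percent=85.0, mem_percent=40.0):
    for proc in top_consumers(procs, "cpu_percent", count=2):
        print(proc.pid, proc.container_app, proc.cpu_percent)
```

The selected processes, together with the resource configuration of their container applications, form the content that is sent to the relevant personnel.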
A host contains components, so the anomaly type includes a component anomaly on the host, and the host self-healing method includes a self-healing method for components on the host. It should be noted that the components of a cloud platform are determined by the platform's architecture, requirements, and so on. For example, the components of the cloud platform include the docker component, the mesos component, the marathon component, and the zookeeper component.
In one embodiment, as shown in figure 4, step S101 includes the following steps S401-S402.
S401: when the time for monitoring components arrives, determine the type of the component to be monitored.
Understandably, a monitoring script for monitoring components is executed periodically; the monitoring script can be configured and distributed through the configuration page of the cloud platform.
S402: according to the type of the component to be monitored, detect whether the component on the host is abnormal and meets the self-healing condition. If so, perform self-healing on the component on the host; if not, do nothing.
Different component types on the host correspond to different self-healing modes.
In one embodiment, as shown in figure 5, step S402 includes the following steps S501-S504.
S501: if the type of the component to be monitored is the docker component, execute a component information check script to check the docker component information; if no data is returned by the docker component within a predetermined time, determine that the docker component on the host is abnormal and meets the self-healing condition; and execute a docker restart script to restart the docker component on the host.
The check script runs docker info to view the docker component information, including the numbers of images and containers. If no data is returned within 5 seconds, the docker component is hung, that is, stuck. In this case, the docker restart script is executed to restart the docker component. The docker restart script may be: systemctl restart docker. The predetermined time may also be set to another value.
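The "no data within 5 seconds" rule in S501 can be sketched with a command timeout. This is a minimal sketch under assumptions: the function names and the injectable run parameter are hypothetical, and the demonstration uses a sleeping Python process as a stand-in for a hung docker info so that it runs without Docker installed.

```python
import subprocess
import sys


def command_hangs(command: list, timeout_s: float = 5.0) -> bool:
    """True if the command produces no result within timeout_s seconds."""
    try:
        subprocess.run(command, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return True
    return False


def heal_if_hung(check_cmd: list, restart_cmd: list, timeout_s: float = 5.0,
                 run=subprocess.run) -> bool:
    """Run restart_cmd when check_cmd hangs; return whether a restart was issued."""
    if command_hangs(check_cmd, timeout_s):
        run(restart_cmd, check=False)
        return True
    return False


# Stand-in for a hung `docker info`; the restart is recorded, not executed.
slow_check = [sys.executable, "-c", "import time; time.sleep(3)"]
issued = []
healed = heal_if_hung(slow_check, ["systemctl", "restart", "docker"],
                      timeout_s=0.5, run=lambda cmd, **kw: issued.append(cmd))
print(healed, issued)
```

In a real deployment the check command would be the docker info invocation and the default run function would execute the restart; a missing executable raises FileNotFoundError, which this sketch does not handle.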
S502: if the type of the component to be monitored is the docker component, execute a component state check script to check the current running state of the docker component; if the current running state of the docker component is stopped, determine that the docker component on the host is abnormal and meets the self-healing condition; and execute a docker start script to start the docker component on the host.
Understandably, the component information check script and the component state check script may be executed at the same time or separately. For example, one of the scripts is executed when the time for monitoring components arrives, and the other is executed the next time the monitoring time arrives.
systemctl status docker can be used to check whether the current running state of the docker component is stopped, that is, whether the docker component is down. If so, the docker start script is executed to start the docker component. The docker start script may be: systemctl start docker.
S503: if the type of the component to be monitored is the mesos component or the marathon component, execute a process check script to detect whether the process corresponding to the component exists; if the process does not exist, determine that the component on the host is abnormal and meets the self-healing condition; and execute the corresponding start script to start the component on the host.
If the type of the component to be monitored is the mesos component, a mesos process check script is executed to detect whether the mesos process exists; if it does not, it is determined that the component on the host is abnormal and meets the self-healing condition, and the mesos component start script is executed to start the mesos component. For example, ps -ef | grep mesos-slave can be used to check whether the mesos process exists; if it does not, the mesos component start script is executed automatically.
If the type of the component to be monitored is the marathon component, a marathon process check script is executed to detect whether the marathon process exists; if it does not, it is determined that the component on the host is abnormal and meets the self-healing condition, and the marathon component start script is executed to start the marathon component. For example, ps -ef | grep marathon can be used to check whether the marathon process exists; if it does not, the marathon component start script is executed.
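The process check in S503 can be sketched as a parser over ps -ef output. The function name and sample listing are hypothetical. One practical detail motivates the sketch: when ps -ef is piped into grep, the grep process itself can appear in the listing and match the pattern, so the sketch ignores command lines that start with grep.

```python
def process_running(ps_output: str, name: str) -> bool:
    """True if a non-grep process whose command line contains `name` is listed."""
    for line in ps_output.splitlines():
        # `ps -ef` prints 8 columns; the last one (CMD) may contain spaces.
        columns = line.split(None, 7)
        if len(columns) < 8:
            continue
        command = columns[7]
        if name in command and not command.startswith("grep"):
            return True
    return False


# Hypothetical `ps -ef` listing with one mesos agent and the grep itself.
listing = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      1201     1  0 09:12 ?        00:00:07 /usr/sbin/mesos-slave --work_dir=/var/lib/mesos
root      4410  4302  0 10:01 pts/0    00:00:00 grep mesos-slave
"""
print(process_running(listing, "mesos-slave"))
print(process_running(listing, "marathon"))
```

When the function reports that the process is absent, the corresponding component start script would be executed.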
S504: if the type of the component to be monitored is the zookeeper component, execute a port check script to detect whether the listening port of the zookeeper component is open; if it is not, determine that the zookeeper component on the host is abnormal and meets the self-healing condition; and execute a port start script to open the listening port of the zookeeper component.
For example, netstat -anp | grep zk_port can be used to detect whether the listening port of the zookeeper component is open, where zk_port is the listening port of zookeeper. If the port is not open, the port start script is executed automatically to open the listening port of the zookeeper component.
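The port check in S504 can be sketched as a parser over netstat -anp output. The function name and the sample lines are hypothetical, and port 2181 is used only as an example value for zk_port. Unlike a plain grep on the port number, the sketch requires the socket to be in the LISTEN state and matches the port only at the end of the local address.

```python
def port_listening(netstat_output: str, port: int) -> bool:
    """True if a TCP socket is in LISTEN state on the given local port."""
    suffix = f":{port}"
    for line in netstat_output.splitlines():
        columns = line.split()
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
        if len(columns) >= 6 and columns[0].startswith("tcp"):
            if columns[3].endswith(suffix) and columns[5] == "LISTEN":
                return True
    return False


# Hypothetical `netstat -anp` excerpt.
netstat_sample = """\
tcp        0      0 0.0.0.0:2181            0.0.0.0:*               LISTEN      1234/java
tcp        0      0 10.0.0.5:2181           10.0.0.9:51234          ESTABLISHED 1234/java
tcp        0      0 0.0.0.0:12181           0.0.0.0:*               LISTEN      2222/other
"""
print(port_listening(netstat_sample, 2181))
```

Matching on the full ":port" suffix prevents a different port such as 12181 from being mistaken for 2181.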
It should be noted that the components on the host may also include other components.
The embodiments shown in Fig. 4 and Fig. 5 perform self-healing on the components on the host. Components on the host are checked for anomalies at regular intervals, and a component found to be abnormal is self-healed. Component self-healing is executed automatically, which improves its efficiency and reduces the probability of component anomalies. Moreover, self-healing starts as soon as a component is detected to be abnormal and to meet the self-healing condition, so component anomalies are handled in real time, the operation of the host is not affected, and the host runs more stably. Self-healing the host also reduces the labor cost of handling anomalies.
S102: within a preset time after self-healing, if the same anomaly is detected on the host, raise an alarm.
Understandably, if the same anomaly still occurs within the preset time after self-healing, the self-healing is considered not to have resolved the anomaly, and an alarm is raised. It should be noted that if the same anomaly is not detected within the preset time after self-healing but is detected again after the preset time has elapsed, it is still handled as described in step S101.
If the exception type includes a data volume exception on the host, then after self-healing is performed on the data volume of the host, if the storage capacity of the data volume still reaches the first preset capacity within a first preset time, an alarm prompt is issued and an alarm email is sent to a preset user, so that the preset user can assess whether to expand the data volume of the host or to impose rules on the files uploaded by users. The first preset time may be 10 minutes or another duration. If the exception type includes a local application image exception on the host, then after self-healing is performed on the image volume used to store local application images on the host, if the storage capacity of that image volume still reaches the second preset capacity within a second preset time, an alarm prompt is issued and an alarm email is sent to the preset user. The second preset time may be half a day or another duration.
If the exception type includes an availability detection exception of the image registry, then after availability detection is performed on the image registry address, if the image registry address is still unavailable within a third preset time, an alarm prompt is issued and an alarm email is sent to the preset user. The third preset time may be 1 hour or another duration.
If the exception type includes a resource exception on the host, then after self-healing is performed on the resources of the host, if the resource exception still occurs on the host within a fourth preset time, an alarm prompt is issued and an alarm email is sent to the preset user. The fourth preset time may be 5 minutes or another duration.
If the exception type includes a component exception on the host, then after self-healing is performed on the component on the host, if the component exception still occurs on the host within a fifth preset time, an alarm prompt is issued and an alarm email is sent to the preset user. The fifth preset time may be half an hour or another duration.
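The five preset times can be collected in one configuration table, as in the following sketch. The durations are the example values given above (each of which "may be another duration"); the key names, the reading of "half a day" as 12 hours, and the function name are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Example re-check windows per exception type (key names are hypothetical).
RECHECK_WINDOWS = {
    "data_volume": timedelta(minutes=10),         # first preset time
    "local_image_volume": timedelta(hours=12),    # second preset time, "half a day" assumed to be 12 h
    "image_registry": timedelta(hours=1),         # third preset time
    "host_resource": timedelta(minutes=5),        # fourth preset time
    "component": timedelta(minutes=30),           # fifth preset time
}


def within_recheck_window(exception_type: str, healed_at: datetime, seen_at: datetime) -> bool:
    """Return True when the exception reappeared inside the window that follows self-healing."""
    elapsed = seen_at - healed_at
    return timedelta(0) <= elapsed <= RECHECK_WINDOWS[exception_type]
```

A return value of True would correspond to the alarm branch, and False to handling the exception again by self-healing.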
Fig. 6 is a schematic block diagram of the host self-healing device provided by an embodiment of the present application. The device includes units for executing the above host self-healing method. As shown in Fig. 6, the host self-healing device 100 includes a self-healing unit 101 and a prompt unit 102.
The self-healing unit 101 is configured to, if it is detected that the host is abnormal and the self-healing condition is met, perform self-healing on the host according to the type of the exception that has occurred, where different exception types correspond to different self-healing modes and different self-healing conditions.
The prompt unit 102 is configured to issue an alarm prompt if the same exception is detected on the host within a preset time after self-healing.
In one embodiment, the exception type includes a data volume exception on the host. As shown in Fig. 7, the self-healing unit 101 includes a volume self-healing judging unit 201, a volume type detection unit 202, and a volume self-healing unit 203.
The volume self-healing judging unit 201 is configured to determine, if it is detected that the storage capacity of a data volume on the host reaches the first preset capacity, that the data volume on the host is abnormal and meets the self-healing condition.
The volume type detection unit 202 is configured to detect the type of the abnormal data volume on the host.
The volume self-healing unit 203 is configured to perform self-healing on the data volume on the host according to the type of the abnormal data volume.
In one embodiment, as shown in Fig. 8, the volume self-healing unit 203 includes a traversal unit 301, a file judging unit 302, a file deletion unit 303, a compression and archiving unit 304, a local volume detection unit 305, a first clearing unit 306, a second clearing unit 307, a root volume processing unit 308, and an application volume processing unit 309.
The traversal unit 301 is configured to traverse the big files in the data volume if the abnormal data volume on the host is the log volume of a container application.
The file judging unit 302 is configured to judge whether a big file is a standard output log file.
The file deletion unit 303 is configured to, if the big file is a standard output log file, execute a deletion script to delete the standard output log file among the big files.
The compression and archiving unit 304 is configured to, if the big file is not a standard output log file, compress and archive the big file.
The local volume detection unit 305 is configured to, if the abnormal data volume on the host is a host local volume, detect whether the data volume is the Docker data volume, the Docker metadata volume, or the host root volume among the host local volumes.
The first clearing unit 306 is configured to, if the data volume is the Docker data volume, execute a first clearing script to clear the information of residual container application packages on the host.
The second clearing unit 307 is configured to, if the data volume is the Docker metadata volume, execute a second clearing script to clear the data of containers on the host that exited abnormally.
The root volume processing unit 308 is configured to, if the data volume is the host root volume, detect the application environment the host is currently in, and process the preset files in the root volume of the host according to that application environment.
The application volume processing unit 309 is configured to, if the abnormal data volume on the host is the application volume of a container application, traverse the data volume to find a preset number (for example, the top 30) of the container applications that occupy the most space, so as to obtain the container application directories, and send an email to the preset user based on the container application directories and the container application information, so that the preset user can take further action according to the information in the email.
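The handling of a container application log volume by units 301 to 304 might be sketched as follows. The size threshold, the file naming rule used to recognise a standard output log, and the choice to remove the original file after compression are all assumptions; the description only states that standard output logs are deleted and other big files are compressed and archived.

```python
import gzip
import shutil
from pathlib import Path

BIG_FILE_BYTES = 100 * 1024 * 1024  # hypothetical threshold for a "big file"


def is_stdout_log(path: Path) -> bool:
    # Hypothetical naming rule; the description does not define how
    # standard output log files are recognised.
    return path.name.endswith("stdout.log")


def heal_log_volume(root: Path, big_file_bytes: int = BIG_FILE_BYTES) -> dict:
    """Delete big standard output logs and gzip every other big file under root."""
    deleted, archived = [], []
    # Materialise the listing first so that files created below are not revisited.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.suffix == ".gz" or path.stat().st_size < big_file_bytes:
            continue
        if is_stdout_log(path):
            path.unlink()
            deleted.append(path.name)
        else:
            target = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()  # assumption: the uncompressed original is removed
            archived.append(path.name)
    return {"deleted": deleted, "archived": archived}
```

Returning the lists of deleted and archived files keeps the sketch easy to audit; a real deployment would presumably log these actions instead.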
In one embodiment, the exception type includes a local application image exception on the host. The self-healing unit 101 includes an image volume self-healing judging unit and an image volume clearing unit. The image volume self-healing judging unit is configured to determine, if it is detected that the storage capacity of the image volume used to store local application images on the host reaches the second preset capacity, that the image volume used to store local application images on the host is abnormal and meets the self-healing condition. The image volume clearing unit is configured to execute a third clearing script to clear the residual application image information on the host.
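The capacity check performed by the judging unit could look like the sketch below. Expressing the preset capacity as a fraction of the total size, and the function names, are assumptions; `shutil.disk_usage` is part of the Python standard library.

```python
import shutil
from typing import Callable


def usage_ratio(path: str) -> float:
    """Fraction of the file system holding `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def volume_needs_healing(path: str, preset_capacity: float,
                         ratio: Callable[[str], float] = usage_ratio) -> bool:
    """True when the used fraction reaches the preset capacity (e.g. 0.8 for 80%)."""
    return ratio(path) >= preset_capacity
```

The `ratio` parameter exists only so that the check can be exercised without filling a real disk.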
In one embodiment, the exception type includes an availability detection exception of the image registry. The self-healing unit 101 includes an availability probe unit, an availability self-healing judging unit, and a sending unit. The availability probe unit is configured to, when the time for probing the image registry is reached, execute a script to probe the image registry address associated with the Docker component, so as to detect the availability of that image registry address. The availability self-healing judging unit is configured to determine, if the image registry address is unavailable, that the image registry on the host is abnormal and meets the self-healing condition. The sending unit is configured to send an email to the preset user.
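One possible form of the availability probe is a plain TCP connection attempt, sketched below. The description only says that a script probes the registry address, so treating TCP reachability as "available" is an assumption, as is the function name.

```python
import socket


def registry_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the registry address can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A stricter probe could issue an HTTP request against the registry, but that would depend on details of the registry that the description does not give.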
In one embodiment, the exception type includes a resource exception on the host. The self-healing unit 101 includes a resource detection unit, a resource self-healing judging unit, and an acquisition and sending unit. The resource detection unit is configured to detect whether the CPU usage of the host reaches a preset CPU usage, or whether the memory usage of the host reaches a preset memory usage. The resource self-healing judging unit is configured to determine, if the CPU usage of the host reaches the preset CPU usage or the memory usage of the host reaches the preset memory usage, that the host is abnormal and meets the self-healing condition. The acquisition and sending unit is configured to obtain a preset number of the processes that occupy the most CPU or memory resources, obtain the container applications corresponding to those processes, and obtain the resource configuration of those container applications; and to send the processes that occupy the most CPU or memory resources, together with the resource configuration of the corresponding container applications, to the relevant personnel.
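The threshold check and the selection of the top consumers can be sketched on already collected samples. The `ProcSample` record and its field names are hypothetical; how the samples are gathered on the host is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProcSample:
    pid: int
    cpu_percent: float
    mem_percent: float
    container_app: Optional[str]  # container application owning the process, if known


def resource_exceeded(cpu: float, mem: float, cpu_limit: float, mem_limit: float) -> bool:
    """True when CPU usage or memory usage reaches its preset value."""
    return cpu >= cpu_limit or mem >= mem_limit


def top_consumers(samples: List[ProcSample], key: str, n: int) -> List[ProcSample]:
    """Return the n samples with the highest value of `key` ('cpu_percent' or 'mem_percent')."""
    return sorted(samples, key=lambda s: getattr(s, key), reverse=True)[:n]
```

The returned samples carry the container application name, which is what would be looked up to attach the resource configuration before notifying the relevant personnel.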
In one embodiment, the exception type includes a component exception on the host. As shown in Fig. 9, the self-healing unit 101 includes a component type determination unit 401 and a component self-healing unit 402.
The component type determination unit 401 is configured to determine the type of the component to be monitored when the time for monitoring components is reached.
The component self-healing unit 402 is configured to detect, according to the type of the component to be monitored, whether the component on the host is abnormal and meets the self-healing condition, and, if so, to perform self-healing on the component on the host.
In one embodiment, as shown in Fig. 10, the component self-healing unit 402 includes a Docker component self-healing unit 501, a Mesos component self-healing unit 502, a Marathon component self-healing unit 503, and a ZooKeeper component self-healing unit 504.
The Docker component self-healing unit 501 is configured to, if the type of the component to be monitored is the Docker component, execute a component information check script to check the Docker component information; if no data returned by the Docker component is received within a predetermined time, determine that the Docker component on the host is abnormal and meets the self-healing condition; and execute a Docker restart script to restart the Docker component on the host.
The Docker component self-healing unit 501 is further configured to, if the type of the component to be monitored is the Docker component, execute a component state check script to check the current running state of the Docker component; if the current running state of the Docker component is stopped, determine that the Docker component on the host is abnormal and meets the self-healing condition; and execute a Docker start script to start the Docker component on the host.
The Mesos component self-healing unit 502 is configured to, if the type of the component to be monitored is the Mesos component, execute a Mesos process check script to detect whether the Mesos process exists; if the Mesos process does not exist, determine that the component on the host is abnormal and meets the self-healing condition; and execute a Mesos component start script to start the Mesos component.
The Marathon component self-healing unit 503 is configured to, if the type of the component to be monitored is the Marathon component, execute a Marathon process check script to detect whether the Marathon process exists; if the Marathon process does not exist, determine that the component on the host is abnormal and meets the self-healing condition; and execute a Marathon component start script to start the Marathon component.
The ZooKeeper component self-healing unit 504 is configured to, if the type of the component to be monitored is the ZooKeeper component, execute a port check script to detect whether the listening port of the ZooKeeper component is open; if the listening port of the ZooKeeper component is not open, determine that the ZooKeeper component on the host is abnormal and meets the self-healing condition; and execute a port start script to start the listening port of the ZooKeeper component.
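The per-component check and heal pairs described above can be sketched as a small dispatch table. `command_responds` mirrors the "no data returned within a predetermined time" test by running a check command with a timeout; the `Rule` type, the function names, and the idea of passing the check and heal actions as callables are assumptions for illustration, and no actual check or start scripts from the application are reproduced.

```python
import subprocess
from typing import Callable, Dict, List, NamedTuple, Sequence


def command_responds(cmd: Sequence[str], timeout: float) -> bool:
    """True if `cmd` finishes within `timeout` seconds and prints some output."""
    try:
        done = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the command could not be started at all.
        return False
    return bool(done.stdout.strip())


class Rule(NamedTuple):
    is_healthy: Callable[[], bool]   # e.g. process exists, port open, info returned
    heal: Callable[[], None]         # e.g. run the start or restart script


def monitor_once(rules: Dict[str, Rule]) -> List[str]:
    """Check every monitored component once and heal the unhealthy ones."""
    healed = []
    for name, rule in rules.items():
        if not rule.is_healthy():
            rule.heal()
            healed.append(name)
    return healed
```

Keeping the checks and heal actions as injected callables means that each component type (Docker, Mesos, Marathon, ZooKeeper) only differs in the pair registered for it, which matches the structure of units 501 to 504.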
It should be noted that, as those skilled in the art can clearly understand, for the specific implementation of the above device and of each unit, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above device may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in Fig. 11.
Fig. 11 is a schematic block diagram of a computer device provided by an embodiment of the present application. The device is a device such as a terminal, for example a host in a PaaS platform. The device 100 includes a processor 102, a memory, and a network interface 103 connected through a system bus 101, where the memory may include a non-volatile storage medium 104 and an internal memory 105.
The non-volatile storage medium 104 can store an operating system 1041 and a computer program 1042. When the computer program 1042 stored in the non-volatile storage medium is executed by the processor 102, the host self-healing method described above can be implemented. The processor 102 provides computing and control capabilities and supports the operation of the entire device 100. The internal memory 105 provides an environment for running the computer program stored in the non-volatile storage medium; when the computer program is executed by the processor 102, the processor 102 can be caused to execute the host self-healing method described above. The network interface 103 is used for network communication. Those skilled in the art can understand that the structure shown in the figure is only a block diagram of the part of the structure related to the solution of the present application and does not limit the device to which the solution is applied; a specific device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 102 is configured to run the computer program stored in the memory, so as to implement any embodiment of the above host self-healing method.
It should be understood that, in the embodiments of the present application, the processor 102 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, and the storage medium may be a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.
Therefore, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium, and the computer-readable storage medium includes a non-volatile computer-readable storage medium. The storage medium stores a computer program, and when the computer program is executed by a processor, any embodiment of the above host self-healing method is implemented.
The storage medium may be any computer-readable storage medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disc.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation. Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus, device, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here. The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A host self-healing method, characterized in that the method comprises:
if it is detected that a host is abnormal and a self-healing condition is met, performing self-healing on the host according to the type of the exception that has occurred, wherein different exception types correspond to different self-healing modes, and different exception types correspond to different self-healing conditions;
within a preset time after the self-healing, if it is detected that the same exception occurs on the host, issuing an alarm prompt.
2. The method according to claim 1, characterized in that the exception type includes a data volume exception on the host, and the performing self-healing on the host according to the type of the exception that has occurred, if it is detected that the host is abnormal and the self-healing condition is met, comprises:
if it is detected that the storage capacity of a data volume on the host reaches a first preset capacity, determining that the data volume on the host is abnormal and meets the self-healing condition;
detecting the type of the abnormal data volume on the host;
performing self-healing on the data volume on the host according to the type of the abnormal data volume.
3. The method according to claim 2, characterized in that the type of the data volume includes a log volume of a container application, and the performing self-healing on the data volume on the host according to the type of the abnormal data volume comprises:
if the abnormal data volume on the host is the log volume of a container application, traversing the big files in the data volume;
judging whether a big file is a standard output log file;
if the big file is a standard output log file, executing a deletion script to delete the standard output log file among the big files;
if the big file is not a standard output log file, compressing and archiving the big file.
4. The method according to claim 2, characterized in that the type of the data volume includes a host local volume, and the performing self-healing on the data volume on the host according to the type of the abnormal data volume comprises:
if the abnormal data volume on the host is a host local volume, detecting whether the data volume is the Docker data volume, the Docker metadata volume, or the host root volume among the host local volumes;
if the data volume is the Docker data volume, executing a first clearing script to clear the information of residual container application packages on the host;
if the data volume is the Docker metadata volume, executing a second clearing script to clear the data of containers on the host that exited abnormally;
if the data volume is the host root volume, detecting the application environment the host is currently in, and processing the preset files in the root volume of the host according to that application environment.
5. The method according to claim 1, characterized in that the exception type includes a local application image exception on the host, and the performing self-healing on the host according to the type of the exception that has occurred, if it is detected that the host is abnormal and the self-healing condition is met, comprises:
if it is detected that the storage capacity of the image volume used to store local application images on the host reaches a second preset capacity, determining that the image volume used to store local application images on the host is abnormal and meets the self-healing condition;
executing a third clearing script to clear the residual application image information on the host.
6. The method according to claim 1, characterized in that the exception type includes a component exception on the host, and the performing self-healing on the host according to the type of the exception that has occurred, if it is detected that the host is abnormal and the self-healing condition is met, comprises:
when the time for monitoring components is reached, determining the type of the component to be monitored;
detecting, according to the type of the component to be monitored, whether the component on the host is abnormal and meets the self-healing condition, and, if it is detected that the component on the host is abnormal and meets the self-healing condition, performing self-healing on the component on the host.
7. The method according to claim 6, characterized in that the detecting, according to the type of the component to be monitored, whether the component on the host is abnormal and meets the self-healing condition, and, if it is detected that the component on the host is abnormal and meets the self-healing condition, performing self-healing on the component on the host, comprises:
if the type of the component to be monitored is the Docker component, executing a component information check script to check the Docker component information; if no data returned by the Docker component is received within a predetermined time, determining that the Docker component on the host is abnormal and meets the self-healing condition; executing a Docker restart script to restart the Docker component on the host;
if the type of the component to be monitored is the Docker component, executing a component state check script to check the current running state of the Docker component; if the current running state of the Docker component is stopped, determining that the Docker component on the host is abnormal and meets the self-healing condition; executing a Docker start script to start the Docker component on the host;
if the type of the component to be monitored is the Mesos component or the Marathon component, executing a process check script to detect whether the process corresponding to the component exists; if the process corresponding to the component does not exist, determining that the component on the host is abnormal and meets the self-healing condition; executing the corresponding start script to start the corresponding component on the host;
if the type of the component to be monitored is the ZooKeeper component, executing a port check script to detect whether the listening port of the ZooKeeper component is open; if the listening port of the ZooKeeper component is not open, determining that the ZooKeeper component on the host is abnormal and meets the self-healing condition; executing a port start script to start the listening port of the ZooKeeper component.
8. A host self-healing device, characterized in that the host self-healing device comprises:
a self-healing unit, configured to, if it is detected that a host is abnormal and a self-healing condition is met, perform self-healing on the host according to the type of the exception that has occurred, wherein different exception types correspond to different self-healing modes, and different exception types correspond to different self-healing conditions;
a prompt unit, configured to issue an alarm prompt if it is detected that the same exception occurs on the host within a preset time after the self-healing.
9. A computer device, characterized in that the computer device comprises a memory and a processor connected to the memory;
the memory is configured to store a computer program; the processor is configured to run the computer program stored in the memory, so as to execute the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910406740.8A CN110262917A (en) | 2019-05-15 | 2019-05-15 | Host self-healing method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110262917A (en) | 2019-09-20 |
Family
ID=67913256
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910406740.8A Pending CN110262917A (en) | 2019-05-15 | 2019-05-15 | Host self-healing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110262917A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080282104A1 (en) * | 2007-05-11 | 2008-11-13 | Microsoft Corporation | Self Healing Software |
CN105808394A (en) * | 2014-12-31 | 2016-07-27 | 中兴通讯股份有限公司 | Server self-healing method and device |
CN107179957A (en) * | 2016-03-10 | 2017-09-19 | 阿里巴巴集团控股有限公司 | Physical machine failure modes processing method, device and virtual machine restoration methods, system |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111796959A (en) * | 2020-06-30 | 2020-10-20 | 中国工商银行股份有限公司 | Host machine container self-healing method, device and system |
CN111796959B (en) * | 2020-06-30 | 2023-08-08 | 中国工商银行股份有限公司 | Self-healing method, device and system for host container |
CN112346874A (en) * | 2020-11-27 | 2021-02-09 | 中国工商银行股份有限公司 | Abnormal volume processing method and device based on cloud platform |
CN112346874B (en) * | 2020-11-27 | 2023-08-25 | 中国工商银行股份有限公司 | Abnormal volume processing method and device based on cloud platform |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11429506B2 (en) | Systems and methods for collecting, tracking, and storing system performance and event data for computing devices | |
CN106548402B (en) | Resource transfer monitoring method and device | |
CN108829560A (en) | Data monitoring method, device, computer equipment and storage medium | |
CN109660426B (en) | Monitoring method and system, computer readable medium and electronic device | |
CN105610648B (en) | A kind of acquisition method and server of O&M monitoring data | |
CN105302697B (en) | A kind of running state monitoring method and system of density data model database | |
CN111221591A (en) | Method, system and medium for detecting availability of micro-service deployed based on Kubernetes | |
CN107181821A (en) | A kind of information push method and device based on SSE specifications | |
CN112230847B (en) | Method, system, terminal and storage medium for monitoring K8s storage volume | |
CN110266544B (en) | Device and method for positioning reason of cloud platform micro-service failure | |
CN110262917A (en) | Host self-healing method, device, computer equipment and storage medium | |
CN107004169A (en) | The automation tenant upgrading serviced for multi-tenant | |
CN103490978A (en) | Terminal, server and message monitoring method | |
CN114356499A (en) | Kubernetes cluster alarm root cause analysis method and device | |
CN114513400B (en) | Log aggregation system and method for improving usability of log aggregation system | |
CN110018932B (en) | Method and device for monitoring container magnetic disk | |
CN109165147A (en) | Log print control program, device, system, back-end server and headend equipment | |
CN109558299A (en) | Business monitoring and the method, apparatus of early warning, equipment and storage medium | |
CN106446158B (en) | Application data sharing method, sharing device and terminal | |
CN106650281B (en) | A kind of data processing method, system, server and client side | |
CN115525392A (en) | Container monitoring method and device, electronic equipment and storage medium | |
CN106933718B (en) | Method for monitoring performance and device | |
CN110990237B (en) | Information collection system, method and storage medium | |
CN109508356B (en) | Data abnormality early warning method, device, computer equipment and storage medium | |
CN108023741A (en) | One kind monitoring resource using method and server |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||