CN110515820B - Server fault maintenance method and device, server and storage medium - Google Patents

Server fault maintenance method and device, server and storage medium

Info

Publication number
CN110515820B
Authority
CN
China
Prior art keywords
server
current service
service process
state
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910809361.3A
Other languages
Chinese (zh)
Other versions
CN110515820A (en)
Inventor
张帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Inspur Data Technology Co Ltd
Original Assignee
Beijing Inspur Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Inspur Data Technology Co Ltd
Priority to CN201910809361.3A
Publication of CN110515820A
Application granted
Publication of CN110515820B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1438 Restarting or rejuvenating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a server fault maintenance method, comprising: when a fault detection instruction is obtained, performing state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process; judging whether the execution state of the current service process is a normal state; if not, starting a log collection process and collecting log information by using the log collection process; judging whether a core file corresponding to the current service process exists in the log information; and if so, closing the current service process and restarting the server. The server fault maintenance method can effectively handle server faults such as a service process crash or a service request hang, avoid client service interruption, and improve user experience. The application also discloses a server fault maintenance device, a server, and a computer-readable storage medium, which have the same beneficial effects.

Description

Server fault maintenance method and device, server and storage medium
Technical Field
The present application relates to the field of server technologies, and in particular, to a server fault maintenance method, and further, to a server fault maintenance apparatus, a server, and a computer-readable storage medium.
Background
The server is a device for providing computing services, has the capability of bearing and guaranteeing the services, can provide highly reliable services, and has high service performance.
When a service process is started in a server and handles various service requests, a core file is generated once a service request hangs or the service process crashes (a core file is a file produced when an application crashes after receiving a system signal; it records information such as the cause of the crash, the call stack, and the memory and variable values at the time of the crash). However, because the core file contains a large amount of information, generating it takes a long time, generally more than half an hour. During this time the server cannot provide normal service, that is, it cannot process client service requests, so client services are seriously affected and the client experience is degraded.
Therefore, how to effectively handle server faults such as a service process crash or a service request hang, and avoid client service interruption so as to improve user experience, is a problem to be solved urgently by those skilled in the art.
Disclosure of Invention
An object of the present application is to provide a server fault maintenance method that can effectively handle server faults such as a service process crash or a service request hang, avoid client service interruption, and improve user experience; another object of the present application is to provide a server fault maintenance apparatus, a server, and a computer-readable storage medium, which also have the above-mentioned advantages.
In order to solve the above technical problem, the present application provides a server fault maintenance method, where the server fault maintenance method includes:
when a fault detection instruction is obtained, carrying out state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process;
judging whether the execution state of the current service process is in a normal state or not;
if not, starting a log collection process, and collecting log information by using the log collection process;
judging whether a core file corresponding to the current service process exists in the log information;
if so, closing the current service process and restarting the server.
Preferably, obtaining the fault detection instruction includes:
responding to the fault detection instruction according to a preset time interval.
Preferably, performing state detection on the current service process according to the fault detection instruction includes:
performing state detection on the current service process according to the showmount -e command.
Preferably, the performing state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process includes:
acquiring the number of service requests and the change state of the number of processed service requests in the current service process according to the fault detection instruction;
and determining the execution state of the current service process according to the change state.
Preferably, before collecting the log information by using the log collection process, the method further includes:
adjusting the log level of the log collection process according to a preset rule.
Preferably, after collecting the log information by using the log collection process, the method further includes:
acquiring the current execution state of the current service process;
judging whether the current execution state is the normal state;
and if not, executing the step of judging whether the core file corresponding to the current service process exists in the log information.
Preferably, the server failure maintenance method further includes:
and restarting the server when the core file does not exist in the log information.
In order to solve the above technical problem, the present application further provides a server failure maintenance device, where the server failure maintenance device includes:
the state detection module is used for carrying out state detection on the current service process according to the fault detection instruction when the fault detection instruction is obtained, and obtaining the execution state of the current service process;
the state judgment module is used for judging whether the execution state of the current service process is in a normal state or not;
the log collection module is used for starting a log collection process when the execution state of the current service process is not in the normal state, and collecting log information by using the log collection process;
the file judging module is used for judging whether a core file corresponding to the current service process exists in the log information;
and the process closing module is used for closing the current service process and restarting the server when the core file exists in the log information.
In order to solve the above technical problem, the present application further provides a server, including:
a memory for storing a computer program;
and a processor for implementing the steps of any one of the above server fault maintenance methods when executing the computer program.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of any one of the above server fault maintenance methods.
The server fault maintenance method provided by the present application comprises: when a fault detection instruction is obtained, performing state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process; judging whether the execution state of the current service process is a normal state; if not, starting a log collection process and collecting log information by using the log collection process; judging whether a core file corresponding to the current service process exists in the log information; and if so, closing the current service process and restarting the server.
It can be seen that the server fault maintenance method provided by the present application monitors the current service process while the server is running. When the current service process is found to be in an abnormal state, a log collection process is started immediately to obtain the log information corresponding to the service process, and whether the server is hung or has crashed is determined by judging whether a core file has been generated in the log information. Once a core file is found, the current service process is killed immediately and the server is restarted, which avoids the client service interruption caused by an excessively long core file generation time and thereby safeguards user experience.
The server fault maintenance device, the server and the computer readable storage medium provided by the application all have the beneficial effects, and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flowchart of a server fault maintenance method provided in the present application;
Fig. 2 is a schematic flowchart of another server fault maintenance method provided in the present application;
Fig. 3 is a schematic flowchart of yet another server fault maintenance method provided in the present application;
Fig. 4 is a schematic structural diagram of a server fault maintenance apparatus provided in the present application;
Fig. 5 is a schematic structural diagram of a server provided in the present application.
Detailed Description
The core of the present application is to provide a server fault maintenance method that can effectively handle server faults such as a service process crash or a service request hang, avoid client service interruption, and improve user experience; another core of the present application is to provide a server fault maintenance apparatus, a server, and a computer-readable storage medium, which also have the above-mentioned advantages.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
At present, in a server-based service processing flow, a hung service request cannot recover automatically, and generating a core file after a service process crashes takes too long. During this time the server cannot provide normal service, that is, it cannot process client service requests, so client services are seriously affected and the client experience is degraded.
To solve the above problems, the present application provides a server fault maintenance method that monitors the current service process while the server is running. When the current service process is found to be in an abnormal state, a log collection process is started immediately to obtain the log information corresponding to the service process, and whether the server is hung or has crashed is determined by judging whether a core file has been generated in the log information. Once a core file is found, the current service process is killed immediately and the server is restarted, which avoids the client service interruption caused by an excessively long core file generation time and thereby safeguards user experience.
Referring to fig. 1, fig. 1 is a schematic flowchart of a server fault maintenance method provided in the present application, where the server fault maintenance method may include:
S101: when a fault detection instruction is obtained, carrying out state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process;
This step detects the service process to obtain the execution state of the current service process, which is the service process currently running in the server. Specifically, when the server obtains the fault detection instruction, it enters the fault detection flow and performs state detection on the current service process according to the instruction, thereby obtaining the execution state of the current service process. The execution state is either a normal state or an abnormal state. An abnormal state indicates a server fault, so when the execution state is abnormal, the subsequent fault maintenance flow is entered.
In addition, the fault detection instruction may be obtained in more than one way: it may be entered by a technician at the front end of the server as required, or triggered automatically by a preset response instruction such as a timing instruction. The specific type of the fault detection instruction likewise does not affect the implementation of the technical solution.
Preferably, performing state detection on the current service process according to the fault detection instruction may include: performing state detection on the current service process according to the showmount -e command.
This embodiment provides a specific type of fault state detection instruction, namely the showmount -e command. The showmount command is a query command for an NFS (Network File System) server and is used to query information about the NFS server.
It should be noted that the technical solution provided in this application is not limited by the type of server and is applicable to various types of servers. In this embodiment an NFS server is used, the corresponding service process may be the ganesha service, and the corresponding fault state detection instruction may be the showmount -e command. When the solution is applied to another type of server, the corresponding service process is detected with the corresponding fault detection instruction.
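As a minimal sketch of such a probe, the following Python function runs `showmount -e` against a host and treats a timeout or a non-zero exit code as an abnormal state. The function name, the default host, the 10-second timeout, and the injectable `run` parameter are illustrative assumptions; the source does not specify how the command's result is interpreted.

```python
import subprocess


def probe_exports(host="localhost", timeout_s=10.0, run=subprocess.run):
    """Return "normal" if `showmount -e host` answers in time, else "abnormal".

    host, timeout_s and the injectable `run` are assumptions for illustration.
    """
    try:
        result = run(
            ["showmount", "-e", host],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "abnormal"  # no answer within the timeout: treat as a hang
    except OSError:
        return "abnormal"  # the command could not be started at all
    return "normal" if result.returncode == 0 else "abnormal"
```

Passing `run` as a parameter keeps the probe testable on machines where the NFS client tools are not installed.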
Preferably, the performing state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process may include: acquiring the number of service requests and the change state of the number of processed service requests in the current service process according to the fault detection instruction; and determining the execution state of the current service process according to the change state.
This embodiment provides a more specific method for obtaining the execution state of a service process. After the fault detection instruction is obtained, the number of service requests (the total number of service requests received by the server) and the number of processed service requests (the total number of service requests the server has processed) in the current service process are obtained according to the instruction. By analyzing how these two numbers change, it can be determined whether service requests are hung and which ones, and the execution state of the current service process is determined accordingly.
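One possible way to turn the change of the two counters into an execution state is sketched below. The rule used here (requests are outstanding but the processed count did not advance between two samples) is an assumed heuristic, not a rule stated in the source, and the names `RequestCounters` and `classify_change` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCounters:
    """One sample of the two counters (hypothetical container)."""
    received: int   # total service requests received so far
    processed: int  # total service requests processed so far


def classify_change(previous: RequestCounters, current: RequestCounters) -> str:
    """Return "normal" or "abnormal" from the change between two samples."""
    backlog_now = current.received - current.processed
    made_progress = current.processed > previous.processed
    # Assumed rule: outstanding requests with no progress means a hang.
    if backlog_now > 0 and not made_progress:
        return "abnormal"
    return "normal"
```

An idle server (no new requests and no backlog) is classified as normal under this rule, which avoids false alarms during quiet periods.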
S102: judging whether the execution state of the current service process is in a normal state; if not, executing S103;
This step judges the state of the current service process, that is, whether its execution state is normal. If the execution state is normal, the server has no fault, no further handling is needed, and normal service processing continues; if the execution state is abnormal, the server has a fault and the subsequent fault maintenance flow is required.
S103: starting a log collection process, and collecting log information by using the log collection process;
This step collects log information, which is mainly used for fault determination, such as identifying the specific fault type and fault location. Specifically, when the current service process is determined to be in an abnormal state, the log collection process is started immediately and collects the various types of log information generated while the current service process was running. The specific content of the log information is not limited in this application; for an NFS server, for example, the log information may include the current storage cluster state information, the ganesha log information, and the like.
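A minimal sketch of the collection step, assuming the log information consists of files that are copied into one destination directory. The function name `collect_logs` and both parameters are hypothetical; which files count as log information is left to the caller.

```python
import shutil
from pathlib import Path


def collect_logs(sources, dest_dir):
    """Copy every existing source file into dest_dir; return the copies."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, sources):
        if src.is_file():  # silently skip logs that do not exist
            copied.append(Path(shutil.copy2(src, dest / src.name)))
    return copied
```

`shutil.copy2` is used so that file timestamps are preserved, which can help when correlating log files with the time of the fault.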
S104: judging whether a core file corresponding to the current service process exists in the log information; if yes, executing S105, otherwise executing S106;
S105: closing the current service process and restarting the server;
These steps determine whether a core file has been generated in the log information. Because a core file is produced when an application crashes after receiving a system signal, finding one indicates that the server has crashed and normal service processing has terminated. However, because the core file contains a large amount of information, its generation takes too long. To avoid degrading user experience through service interruption, the current service process is killed directly, that is, closed, so that core file generation does not continue; the server is then restarted and a new service process is started, which is equivalent to returning to the initial state and resuming service processing. Whether a core file exists can be confirmed by querying the core file directory in the log information.
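The query of the core file directory can be sketched as follows. The naming rule used here (file name starts with `core` and contains the process name) is an assumption for illustration; the actual naming of core files depends on system configuration and is not given in the source.

```python
from pathlib import Path


def find_core_files(core_dir, process_name):
    """Return core files in core_dir whose name mentions process_name.

    The naming rule is an assumption, not taken from the source.
    """
    directory = Path(core_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name.startswith("core") and process_name in p.name
    )
```

A non-empty result corresponds to the "yes" branch of S104, and an empty list to the "no" branch.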
S106: and restarting the server.
Specifically, when no core file has been generated in the log information, the fault is not a crash. In this case the server is restarted so that the current service process is restarted and the original service processing continues; the specific fault is analyzed afterwards by a technician based on the log information, without interrupting normal service processing. For example, for the ganesha service process in an NFS server, the restart may be performed with the "systemctl restart ganesha" command.
According to the server fault maintenance method provided by this embodiment, the current service process is monitored while the server is running. When the current service process is found to be in an abnormal state, the log collection process is started immediately to obtain the log information corresponding to the service process, and whether the server is hung or has crashed is determined by judging whether a core file has been generated in the log information. Once a core file is found, the current service process is killed immediately and the server is restarted, which avoids the client service interruption caused by an excessively long core file generation time and thereby safeguards user experience.
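The two recovery branches (S105 and S106) can be sketched as below. The "systemctl restart ganesha" command comes from the text; the use of SIGKILL to kill the process, the function name `recover`, and the injectable `kill` and `run` parameters are assumptions for illustration.

```python
import os
import signal
import subprocess


def recover(pid, core_file_exists, service="ganesha",
            kill=os.kill, run=subprocess.run):
    """Apply S105 or S106 and return a list describing the actions taken."""
    actions = []
    if core_file_exists:
        # S105: do not wait for the core file to finish; kill the process.
        try:
            kill(pid, signal.SIGKILL)  # SIGKILL is an assumed choice
            actions.append(f"SIGKILL {pid}")
        except ProcessLookupError:
            actions.append("process already gone")
    # Both branches end by restarting the service.
    run(["systemctl", "restart", service], check=True)
    actions.append(f"systemctl restart {service}")
    return actions
```

Running this for real requires sufficient privileges to signal the process and to restart the service unit.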
On the basis of the foregoing embodiments, an embodiment of the present application provides a more specific server fault maintenance method, please refer to fig. 2, where fig. 2 is a schematic flow diagram of another server fault maintenance method provided by the present application, where the server fault maintenance method may include:
S201: responding to a fault detection instruction according to a preset time interval;
S202: performing state detection on the current service process according to the fault detection instruction to obtain an execution state of the current service process;
S203: judging whether the execution state of the current service process is in a normal state; if not, executing S204, if yes, returning to S201;
S204: starting a log collection process, and collecting log information by using the log collection process;
S205: judging whether a core file corresponding to the current service process exists in the log information; if yes, executing S206, otherwise executing S207;
S206: closing the current service process and restarting the server;
S207: restarting the server.
For the fault detection instruction, this embodiment provides a specific acquisition mode, namely a timed response. Because a server generally runs for long periods, manually entering the fault detection instruction cannot ensure that the server is always being monitored, and therefore cannot ensure its normal operation. A timing instruction can therefore be preset so that the fault detection instruction is responded to automatically at a preset time interval. This achieves around-the-clock detection, overcomes the inability of manual inspection to run in real time, keeps the current service process in the server under continuous detection, and effectively safeguards normal operation of the server.
The timing instruction can be implemented with the crontab command, which is mainly used to set up instructions that are executed periodically: it reads instructions from standard input and stores them in a crontab file for later reading and execution. Alternatively, loop detection can be implemented by directly setting a waiting time. The specific value of the preset time interval may be set by a technician according to the actual situation and is not limited in the present application.
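The waiting-time variant of loop detection can be sketched as a simple patrol loop. The function name `patrol`, its parameters, and the `rounds` limit (added so the loop can be exercised in a test) are hypothetical; `check` and `on_abnormal` stand in for the state detection and fault maintenance steps.

```python
import time


def patrol(check, on_abnormal, interval_s, rounds=None, sleep=time.sleep):
    """Run check() every interval_s seconds; call on_abnormal() on a failure.

    rounds=None loops forever; a number limits the loop (useful for testing).
    """
    done = 0
    while rounds is None or done < rounds:
        if check() != "normal":
            on_abnormal()
        done += 1
        sleep(interval_s)  # preset time interval between detections
```

With crontab instead, the scheduler provides the interval and the script would perform a single check per invocation rather than looping.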
For the specific implementation process of the steps S202 to S207, reference may be made to the content of the foregoing embodiment, which is not described herein again.
Therefore, the embodiment of the application realizes the timing detection of the server by setting the timing instruction, ensures that the server can be always in a detected state, effectively ensures the normal operation of the server, and further ensures the normal operation of the client service.
On the basis of the foregoing embodiments, an embodiment of the present application provides another specific server fault maintenance method. Referring to Fig. 3, which is a schematic flowchart of yet another server fault maintenance method provided by the present application, the server fault maintenance method may include:
S301: responding to a fault detection instruction according to a preset time interval;
S302: performing state detection on the current service process according to the fault detection instruction to obtain an execution state of the current service process;
S303: judging whether the execution state of the current service process is in a normal state; if not, executing S304, if yes, returning to S301;
S304: starting a log collection process, adjusting the log level of the log collection process according to a preset rule, and collecting log information by using the adjusted log collection process;
S305: acquiring a current execution state of the current service process, and judging whether the current execution state is in a normal state; if not, executing S306, and if so, returning to S301;
S306: judging whether a core file corresponding to the current service process exists in the log information; if yes, executing S307, otherwise executing S308;
S307: closing the current service process and restarting the server;
S308: restarting the server.
This embodiment introduces the technical solution by taking fault detection of the ganesha service process in an NFS server as an example. Specifically, a timed program script is set up to check the state of the ganesha service of the server cluster in a loop around the clock. If the service is normal, the detection result is output and the service is checked again once the set time interval has elapsed. If a service abnormality is detected, the fault information collection (log collection process) and recovery programs are started: the ganesha service log level is first dynamically adjusted to collect more detailed information, including the log information for a specified time period and the cluster state, and the log level is then dynamically restored. The ganesha service state is then checked again to confirm whether it is still abnormal; if so, the fault recovery program is started and the ganesha service is closed, and if the service is normal, the program waits for the next patrol.
Specifically, for the state detection program of the ganesha service, the program first judges whether the ganesha service process has been started. If so, the showmount -e command is used to check whether the service is hung; at the same time, the latency statistics function of the ganesha service is used to check the changes in the number of requests and the number of processed requests, so as to determine which requests are hung and thereby obtain the execution state of the ganesha service.
Specifically, for the fault information collection program, when the state detection flow determines that the ganesha service is in a normal state, log information indicating that the service check passed is output. Conversely, when the ganesha service is abnormal, the log level of the ganesha service is first dynamically adjusted so that more detailed log information is collected, and the adjusted log level is dynamically turned off after a certain period of time. Next, the state detection flow is run again to confirm whether the ganesha service is still faulty: if the service has recovered from the abnormal state, the patrol continues according to the normal flow; if it is still in an abnormal state, all log information is collected into a specified directory for saving and output.
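Temporarily raising the log level and restoring it afterwards can be sketched with a context manager. How the ganesha log level is actually read and changed is not described in the source, so `get_level` and `set_level` are hypothetical callables supplied by the caller, and the level names in the usage example are placeholders.

```python
from contextlib import contextmanager


@contextmanager
def raised_log_level(get_level, set_level, detailed_level):
    """Switch to detailed_level, then restore the previous level on exit."""
    previous = get_level()
    set_level(detailed_level)
    try:
        yield previous
    finally:
        set_level(previous)  # restore even if log collection fails


# Usage with an in-memory stand-in for the real log level setting.
state = {"level": "info"}
with raised_log_level(lambda: state["level"],
                      lambda lvl: state.update(level=lvl),
                      "debug"):
    pass  # collect the more detailed log information here
```

The `finally` clause ensures the service does not stay at the more verbose level if the collection step raises an error.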
Specifically, for the fault recovery program, it is first determined from the log information whether the ganesha service failed because of a crash, that is, the core file directory is queried for a core file of the ganesha process. If no core file has been generated, the ganesha service can be restarted with the "systemctl restart ganesha" command. If a core file has been generated, then because the core file is very large and its generation may take more than half an hour, there is no need to wait for generation to complete: the ganesha service process is killed directly and the ganesha service is then restarted, completing the fault recovery.
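Steps S302 to S308 can be tied together in one pass as sketched below. All five parameters are hypothetical callables standing in for the state detection, log collection, core file query, process kill, and service restart described above; the returned labels are illustrative only.

```python
def maintenance_pass(detect, collect_logs, core_file_exists,
                     kill_process, restart_service):
    """One pass of S302 to S308; returns a label for the path taken."""
    if detect() == "normal":          # S302, S303
        return "healthy"
    logs = collect_logs()             # S304 (log level handled inside)
    if detect() == "normal":          # S305: fault cleared by itself
        return "recovered"
    if core_file_exists(logs):        # S306
        kill_process()                # S307: do not wait for the core file
        restart_service()
        return "killed-and-restarted"
    restart_service()                 # S308
    return "restarted"
```

The second call to `detect` is what distinguishes this flow from the simpler S201 to S207 flow: a service that recovers while logs are being collected is not restarted.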
Therefore, the server fault maintenance method provided by this embodiment monitors the current service process while the server is running. When the current service process is found to be in an abnormal state, the log collection process is started immediately to obtain the log information corresponding to the service process, and whether the server is hung or has crashed is determined by judging whether a core file has been generated in the log information. Once a core file is found, the current service process is killed immediately and the server is restarted, which avoids the client service interruption caused by an excessively long core file generation time and thereby safeguards user experience.
To solve the above problem, the present application further provides a server fault maintenance apparatus. Referring to Fig. 4, which is a schematic structural diagram of the server fault maintenance apparatus provided in the present application, the server fault maintenance apparatus may include:
the state detection module 100 is configured to, when a fault detection instruction is obtained, perform state detection on a current service process according to the fault detection instruction, and obtain an execution state of the current service process;
a state determination module 200, configured to determine whether an execution state of a current service process is in a normal state;
a log collection module 300, configured to start a log collection process when the execution state of the current service process is not in a normal state, and collect log information by using the log collection process;
the file judgment module 400 is configured to judge whether a core file corresponding to the current service process exists in the log information;
and a process closing module 500, configured to close the current service process and restart the server when the core file exists in the log information.
It can be seen that the server fault maintenance apparatus provided in the embodiment of the present application effectively monitors the current service process while the server is running. When the current service process is found to be in an abnormal state, the log collection process is started immediately to obtain the log information corresponding to the service process, and whether the server is hung or has crashed is then determined by judging whether a core file has been generated in the log information. Once a core file is found, the current service process is killed immediately and the server is restarted, which effectively avoids the client service interruption caused by an overly long core file generation time and thereby safeguards the user experience.
As a preferred embodiment, the server failure maintenance apparatus may further include an instruction obtaining module, configured to respond to the failure detection instruction according to a preset time interval.
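Triggering detection at a preset time interval could look like the following Python sketch. The function name `run_inspection`, the bounded `rounds` parameter, and the injectable `sleep` are assumptions made for illustration; a real deployment would more likely loop indefinitely or rely on a system scheduler.

```python
import time
from typing import Callable


def run_inspection(
    detect: Callable[[], None],
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call detect() `rounds` times, pausing interval_s seconds between calls."""
    completed = 0
    for i in range(rounds):
        detect()
        completed += 1
        # No pause is needed after the final round.
        if i < rounds - 1:
            sleep(interval_s)
    return completed
```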
As a preferred embodiment, the state detection module 100 may be specifically configured to perform state detection on the current service process according to a showmount -e command, and obtain an execution state of the current service process.
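One way to turn this command into a health probe is sketched below in Python. The application does not say how the command's result is interpreted, so the criteria used here (a zero exit status within a timeout counts as normal) are assumptions, as are the function name `export_list_responds`, the default host, and the 10-second timeout.

```python
import subprocess
from typing import Callable


def export_list_responds(
    host: str = "localhost",
    timeout_s: float = 10.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `showmount -e <host>` answers successfully in time."""
    try:
        result = run(
            ["showmount", "-e", host],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the command could not be started.
        return False
    return result.returncode == 0
```

The `run` parameter exists only so that the probe can be exercised without a live NFS server.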
As a preferred embodiment, the state detection module 100 may be specifically configured to acquire, according to the fault detection instruction, the change state of the number of service requests and of the number of processed service requests in the current service process, and to determine the execution state of the current service process according to the change state.
As a preferred embodiment, the server failure maintenance apparatus may further include a log level adjustment module configured to adjust a log level of the log collection process according to a preset rule before the log information is collected by using the log collection process.
As a preferred embodiment, the server failure maintenance apparatus may further include a secondary state determination module, configured to obtain the current execution state of the current service process after the log information is collected, and to judge whether the current execution state is in a normal state; if not, the file judgment module 400 is executed.
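The counter-based check can be illustrated with the Python sketch below. The decision rule is an assumption, since this passage does not spell it out: the process is treated as abnormal when work was available during the sampling interval but the processed count did not advance. The `RequestCounters` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestCounters:
    received: int   # total service requests seen so far
    processed: int  # total service requests completed so far


def is_process_normal(prev: RequestCounters, curr: RequestCounters) -> bool:
    """Compare two successive samples of the request counters."""
    # Work available in the interval: old backlog plus newly arrived requests.
    outstanding = curr.received - prev.processed
    if outstanding <= 0:
        return True  # idle: nothing to process, so no progress is expected
    # Work was pending: the processed counter must have moved forward.
    return curr.processed > prev.processed
```

For example, two samples of `(100, 100)` followed by `(150, 100)` would be reported as abnormal, because 50 requests arrived and none were completed.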
As a preferred embodiment, the server failure maintenance apparatus may further include a server restart module, configured to restart the server when the core file does not exist in the log information.
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above problem, the present application further provides a server. Referring to fig. 5, which is a schematic structural diagram of the server provided in the present application, the server may include:
a memory 10 for storing a computer program;
a processor 20, configured to implement the following steps when executing the computer program:
when a fault detection instruction is obtained, carrying out state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process; judging whether the execution state of the current service process is in a normal state; if not, starting a log collection process, and collecting log information by using the log collection process; judging whether a core file corresponding to the current service process exists in the log information; if so, closing the current service process and restarting the server.
For the introduction of the server provided in the present application, please refer to the above method embodiment, which is not described herein again.
To solve the above problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, can implement the following steps:
when a fault detection instruction is obtained, carrying out state detection on the current service process according to the fault detection instruction to obtain the execution state of the current service process; judging whether the execution state of the current service process is in a normal state; if not, starting a log collection process, and collecting log information by using the log collection process; judging whether a core file corresponding to the current service process exists in the log information; if so, closing the current service process and restarting the server.
The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The server fault maintenance method, apparatus, server and computer-readable storage medium provided in the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (9)

1. A server fault maintenance method is characterized by comprising the following steps:
when a fault detection instruction is acquired, acquiring the number of service requests and the change state of the number of processed service requests in the current service process according to the fault detection instruction;
determining the execution state of the current service process according to the change state;
judging whether the execution state of the current service process is in a normal state or not;
if not, starting a log collection process, and collecting log information by using the log collection process;
judging whether a core file corresponding to the current service process exists in the log information or not;
if so, closing the current service process and restarting the server;
if not, restarting the server.
2. The server failure maintenance method according to claim 1, wherein obtaining the failure detection instruction comprises:
and responding to the fault detection instruction according to a preset time interval.
3. The method for maintaining server failure according to claim 1, wherein the performing state detection on the current service process according to the failure detection instruction comprises:
and carrying out state detection on the current service process according to the showmount -e command.
4. The server failure maintenance method according to any one of claims 1 to 3, wherein before collecting log information by using the log process, the method further comprises:
and adjusting the log level of the log collection process according to a preset rule.
5. The server failure maintenance method of claim 4, wherein after collecting log information using the log process, further comprising:
acquiring the current execution state of the current service process;
judging whether the current execution state is in the normal state or not;
if not, executing the step of judging whether the core file corresponding to the current service process exists in the log information.
6. The server failure maintenance method of claim 5, further comprising:
restarting a server when the core file does not exist in the log information;
and when the core file exists in the log information, closing the current service process and restarting the server.
7. A server failure maintenance apparatus, comprising:
the state detection module is used for acquiring the change states of the service request quantity and the service request processing quantity in the current service process according to the fault detection instruction when the fault detection instruction is acquired, and determining the execution state of the current service process according to the change states;
the state judgment module is used for judging whether the execution state of the current service process is in a normal state or not;
the log collection module is used for starting a log collection process when the execution state of the current service process is not in the normal state, and collecting log information by using the log collection process;
the file judging module is used for judging whether a core file corresponding to the current service process exists in the log information;
and the process closing module is used for closing the current service process and restarting the server when the core file exists in the log information, and restarting the server when the core file does not exist in the log information.
8. A server, characterized by further comprising:
a memory for storing a computer program;
a processor for implementing the steps of the server failure maintenance method according to any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the server failure maintenance method according to any one of claims 1 to 6.
CN201910809361.3A 2019-08-29 2019-08-29 Server fault maintenance method and device, server and storage medium Active CN110515820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910809361.3A CN110515820B (en) 2019-08-29 2019-08-29 Server fault maintenance method and device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910809361.3A CN110515820B (en) 2019-08-29 2019-08-29 Server fault maintenance method and device, server and storage medium

Publications (2)

Publication Number Publication Date
CN110515820A CN110515820A (en) 2019-11-29
CN110515820B true CN110515820B (en) 2022-07-08

Family

ID=68629134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910809361.3A Active CN110515820B (en) 2019-08-29 2019-08-29 Server fault maintenance method and device, server and storage medium

Country Status (1)

Country Link
CN (1) CN110515820B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176946B (en) * 2019-12-29 2022-04-22 山东英信计算机技术有限公司 SEL log recording method, device, equipment and storage medium
WO2021189315A1 (en) * 2020-03-25 2021-09-30 Beijing Didi Infinity Technology And Development Co., Ltd. Proxy server crash recovery in object storage system using enhanced meta structure
CN113535506A (en) * 2020-04-21 2021-10-22 上海际链网络科技有限公司 Service system monitoring method and device, storage medium and computer equipment
CN111625383B (en) * 2020-05-22 2023-11-14 北京达佳互联信息技术有限公司 Process exception event processing method and device, electronic equipment and storage medium
CN111898158B (en) * 2020-07-23 2023-09-26 百望股份有限公司 Encryption method of OFD (optical frequency division) document
CN111949009B (en) * 2020-08-14 2022-04-08 深圳市中物互联技术发展有限公司 Self-diagnosis and self-maintenance method and device for embedded controller and storage medium
CN112559057B (en) * 2020-11-17 2022-05-27 新华三技术有限公司成都分公司 Shutdown processing method and device
CN112417245A (en) * 2020-11-18 2021-02-26 掌阅科技股份有限公司 Application log capturing method, computing device and computer storage medium
CN112769652B (en) * 2021-01-14 2022-12-16 苏州浪潮智能科技有限公司 Node service monitoring method, device, equipment and medium
CN112954035B (en) * 2021-02-02 2022-03-18 深圳市禅游科技股份有限公司 Server restarting method, device, equipment and storage medium
CN112925691B (en) * 2021-02-20 2024-05-24 中通天鸿(北京)通信科技股份有限公司 System monitoring method and device
CN113238913B (en) * 2021-05-12 2023-10-24 康键信息技术(深圳)有限公司 Intelligent pushing method, device, equipment and storage medium for server faults
CN113687971B (en) * 2021-08-24 2023-06-27 杭州迪普科技股份有限公司 Method and device for generating memory map file
CN113467407B (en) * 2021-09-06 2021-11-16 西安热工研究院有限公司 Fault information collection method, system and equipment for distributed control system
CN113850490A (en) * 2021-09-17 2021-12-28 深圳追一科技有限公司 Customer service message timing quality inspection method and device, electronic equipment and storage medium
CN114020356B (en) * 2021-11-02 2023-11-28 北京天融信网络安全技术有限公司 Method and device for safely closing service

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630659A (en) * 2015-12-23 2016-06-01 北京奇虎科技有限公司 Application crash log acquisition method and apparatus
CN109324834A (en) * 2018-09-19 2019-02-12 郑州云海信息技术有限公司 A kind of system and method that distributed storage server is restarted automatically
CN109976959A (en) * 2019-03-27 2019-07-05 苏州浪潮智能科技有限公司 A kind of portable device and method for server failure detection
CN110011854A (en) * 2019-04-12 2019-07-12 苏州浪潮智能科技有限公司 MDS fault handling method, device, storage system and computer readable storage medium
CN110555009A (en) * 2019-08-09 2019-12-10 苏州浪潮智能科技有限公司 processing method and device for Network File System (NFS) service

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9693178B2 (en) * 2015-03-18 2017-06-27 Intel IP Corporation Procedures to provision and attach a cellular internet of things device to a cloud service provider

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"FT-NFS: an efficient fault-tolerant NFS server designed for off-the-shelf workstations";N. Peyrouze et al.;《Proceedings of Annual Symposium on Fault Tolerant Computing》;20020806;第64-73页 *
"基于CloudStack云平台的研究与自助系统的实现";余志涛;《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》;20150715;第I137-10页 *

Also Published As

Publication number Publication date
CN110515820A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN110515820B (en) Server fault maintenance method and device, server and storage medium
CN111949368A (en) Application program control method and device
CN111694710A (en) Method, device and equipment for monitoring faults of substrate management controller and storage medium
CN108958965B (en) Method, device and equipment for monitoring recoverable ECC errors by BMC
CN110011854B (en) MDS fault processing method, device, storage system and computer readable storage medium
CN111124761B (en) Equipment restarting method, device, equipment and medium
CN114528350B (en) Cluster brain fracture processing method, device, equipment and readable storage medium
CN111800432A (en) Anti-brute force cracking method and device based on log analysis
CN111090593A (en) Method, device, electronic equipment and storage medium for determining crash attribution
CN114756406A (en) Processing method and device for application program crash and electronic equipment
CN113076213B (en) Method and system for optimizing system management interrupt handling hardware error time
CN113688021B (en) Load balancing service processing method, device, equipment and readable storage medium
CN113127245B (en) Method, system and device for processing system management interrupt
CN114860292A (en) Terminal equipment firmware upgrading control method and device, computer equipment and medium
CN113836043A (en) Test case based self-maintenance method and device for middlebox and storage medium
CN113918407A (en) Method and device for managing service process and readable storage medium
CN111400094A (en) Method, device, equipment and medium for restoring factory settings of server system
CN107861842B (en) Metadata damage detection method, system, equipment and storage medium
CN111475339A (en) BIOS firmware updating method, device, equipment and storage medium
CN111984844A (en) Method and system for automatically supplementing graph based on big data
CN111400113A (en) Complete machine self-checking method, device and system of computer system
CN111953544B (en) Fault detection method, device, equipment and storage medium of server
CN107679161B (en) File processing method of electronic terminal and electronic terminal
CN115794883A (en) Data stream alignment method and device, electronic equipment and storage medium
CN117009149A (en) Message middleware fault switching method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant