CN115904771A - Server fault early warning method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115904771A
Authority
CN
China
Prior art keywords
server
value
parameter
fault
fault value
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211234086.5A
Other languages
Chinese (zh)
Inventor
倪磊俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202211234086.5A
Publication of CN115904771A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The embodiment of the invention provides a server fault early warning method and device, electronic equipment and a storage medium, and relates to the technical field of computers. The method comprises: collecting parameter values of dynamic parameters of a server to be subjected to early warning analysis and collecting index content of specified auxiliary indexes; calculating a first type fault value of the server by using the parameter values of the dynamic parameters, and calculating a second type fault value of the server by using the index content of the specified auxiliary indexes; fusing the first type fault value and the second type fault value to obtain a target fault value; and if the target fault value meets a predetermined fault condition, outputting fault early warning information for the server. Fault early warning for the server can be realized through this scheme.

Description

Server fault early warning method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a server fault early warning method and device, electronic equipment and a storage medium.
Background
While a server is running, a fault may occur and cause the server to go down. A sudden outage may leave no time to perform service switching, for example no time to switch to a redundant server, which makes service stability fluctuate. Here, downtime means that the server is unavailable, for example because the server has crashed, been decommissioned, or been shut down. Faults that cause the server to go down may include, but are not limited to, memory faults.
In the related art, the parameter information of a specified parameter is monitored against a fault condition, and if the parameter information meets the fault condition, an alarm is raised for the fault of the server. The parameter information of the specified parameter can characterize whether the server has failed.
In the related art, by the time the alarm is raised the server fault has usually already occurred, so it is difficult to prompt staff effectively to switch servers in time. Therefore, how to give early warning of a server fault is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention aims to provide a server fault early warning method, a server fault early warning device, electronic equipment and a storage medium, so as to realize early warning of a fault of a server. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides a server failure early warning method, where the method includes:
collecting parameter values of dynamic parameters of a server to be subjected to early warning analysis and collecting index contents of specified auxiliary indexes; wherein the dynamic parameters include performance parameters of the server; the specified auxiliary index comprises a static parameter of the server and/or a server log; the static parameters comprise hardware parameters of the server;
calculating a first type fault value of the server by using the parameter value of the dynamic parameter, and calculating a second type fault value of the server by using the index content of the specified auxiliary index; any fault value in the first type of fault value and the second type of fault value is used for representing the probability of the server about to fail;
fusing the first type fault value and the second type fault value to obtain a target fault value;
if the target fault value meets a preset fault condition, outputting fault early warning information aiming at the server; wherein the predetermined failure condition is a condition for characterizing an impending failure of the server.
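The four steps above can be read as a small pipeline. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `early_warning_pipeline`, the four pluggable callables, the message format, and all numeric thresholds in the usage example are hypothetical and chosen only for illustration.

```python
from typing import Callable, Mapping, Optional, Sequence


def early_warning_pipeline(
    dynamic_values: Mapping[str, Sequence[float]],
    auxiliary_content: Mapping[str, object],
    first_type_fn: Callable[[Mapping[str, Sequence[float]]], float],
    second_type_fn: Callable[[Mapping[str, object]], float],
    fuse_fn: Callable[[float, float], float],
    meets_condition_fn: Callable[[float], bool],
) -> Optional[str]:
    """Collect -> calculate two fault values -> fuse -> check the condition."""
    first = first_type_fn(dynamic_values)       # first type fault value
    second = second_type_fn(auxiliary_content)  # second type fault value
    target = fuse_fn(first, second)             # target fault value
    if meets_condition_fn(target):
        return f"fault early warning for server: target fault value {target}"
    return None


# Toy usage: every rule below is an illustrative assumption.
message = early_warning_pipeline(
    {"memory_temperature": [95.0]},
    {"server_log": "uncorrectable ecc error"},
    first_type_fn=lambda d: 1.0 if max(d["memory_temperature"]) > 90.0 else 0.0,
    second_type_fn=lambda a: 1.0 if "ecc" in str(a["server_log"]) else 0.0,
    fuse_fn=lambda f, s: 2.0 * f + 1.0 * s,
    meets_condition_fn=lambda t: t >= 1.0,
)
```

Passing the calculation, fusion, and condition steps in as callables keeps the sketch neutral about how each step is actually realized.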
Optionally, the calculating a first type fault value of the server by using the parameter value of the dynamic parameter includes:
determining a fault value corresponding to the dynamic parameter according to a preset fault value calculation mode by using the parameter value of the dynamic parameter; the fault value corresponding to the dynamic parameter is used for representing the probability that the server is about to fault under the condition that the dynamic parameter has the acquired parameter value;
determining a first type fault value of the server by using the fault value corresponding to the dynamic parameter;
wherein, the calculation mode of the preset fault value comprises the following steps: if the parameter value of the dynamic parameter is in the target parameter value interval corresponding to the dynamic parameter, determining a fault value corresponding to the parameter value of the dynamic parameter by using the corresponding relation between the parameter value in the target parameter value interval and the fault value; otherwise, determining a default fault value set for the dynamic parameter as a fault value corresponding to the parameter value of the dynamic parameter; determining a fault value corresponding to the dynamic parameter based on a fault value corresponding to a parameter value of the dynamic parameter;
the parameter value in the target parameter value interval corresponding to the dynamic parameter is the parameter value of the dynamic parameter when the server is about to break down; in the corresponding relation, the fault value corresponding to any parameter value is the probability that the server is about to fail when the dynamic parameter has the parameter value, and the fault value corresponding to any parameter value in the target parameter value interval is larger than the default fault value.
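The preset fault value calculation described above amounts to an interval lookup with a fallback. A minimal sketch follows, assuming a closed interval and a linear correspondence inside it. The class `IntervalRule`, the temperature bounds, and the linear mapping are hypothetical. The text only requires that fault values inside the target interval exceed the default fault value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IntervalRule:
    low: float                         # inclusive lower bound of the target interval
    high: float                        # inclusive upper bound of the target interval
    mapping: Callable[[float], float]  # parameter value -> fault value inside the interval
    default: float = 0.0               # default fault value outside the interval


def fault_value_for_reading(value: float, rule: IntervalRule) -> float:
    """Return the mapped fault value inside the interval, else the default."""
    if rule.low <= value <= rule.high:
        return rule.mapping(value)
    return rule.default


# Hypothetical rule for a memory temperature parameter (degrees Celsius).
# The mapping yields values from 1.0 to 2.0, all above the default of 0.0.
temperature_rule = IntervalRule(
    low=85.0,
    high=110.0,
    mapping=lambda t: 1.0 + (t - 85.0) / 25.0,
    default=0.0,
)
```

For instance, `fault_value_for_reading(60.0, temperature_rule)` falls outside the interval and yields the default, while a reading of 110.0 maps to the top of the assumed range.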
Optionally, the number of the dynamic parameters is multiple;
the determining a first type fault value of the server by using the fault value corresponding to the dynamic parameter includes:
and performing first specified operation processing on the fault value corresponding to each dynamic parameter to obtain a first type of fault value of the server.
Optionally, the number of the parameter values of the dynamic parameter is multiple, and the multiple parameter values are parameter values acquired within a preset time range;
the determining the fault value corresponding to the dynamic parameter based on the fault value corresponding to the parameter value of the dynamic parameter includes:
and performing second specified operation processing on the fault values corresponding to the parameter values of the dynamic parameters to obtain the fault values corresponding to the dynamic parameters.
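The text names a "first specified operation" across dynamic parameters and a "second specified operation" across the readings of one parameter without fixing either at this point. As a sketch only, the example below assumes the arithmetic mean for the second operation and the maximum for the first. Both choices, and the parameter names, are assumptions made for illustration.

```python
from statistics import fmean
from typing import Mapping, Sequence


def parameter_fault_value(reading_fault_values: Sequence[float]) -> float:
    """Second specified operation (assumed here: mean over the time window)."""
    return fmean(reading_fault_values)


def first_type_fault_value(per_parameter: Mapping[str, float]) -> float:
    """First specified operation (assumed here: maximum across parameters)."""
    return max(per_parameter.values())


# Fault values already computed for each reading collected in the preset time range.
window = {
    "memory_temperature": [0.0, 1.0, 2.0],
    "memory_bandwidth_utilization": [0.0, 0.0, 0.0],
}
per_parameter = {name: parameter_fault_value(vals) for name, vals in window.items()}
first_type = first_type_fault_value(per_parameter)
```

Other operations (a sum, a weighted average, a percentile) would slot into the same two functions without changing the surrounding flow.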
Optionally, the calculating, by using the index content of the specified auxiliary index, a second type fault value of the server includes:
calculating a second type fault value of the server by utilizing a calculation mode corresponding to each appointed auxiliary index based on the index content of the appointed auxiliary index;
the calculation mode corresponding to the indexes of the static parameters comprises the following steps:
generating a first character string based on a parameter value of a static parameter acquired at an acquisition moment, calculating a target information abstract value of the first character string, comparing the target information abstract value with a standard information abstract value, and determining a second type fault value of the server by using a comparison result; the standard information abstract value is set by a character string generated by parameter values of static parameters when the server does not break down;
the calculation mode corresponding to the index of the server log comprises the following steps:
and generating a second character string based on log information of the server log acquired at a collecting moment, matching the second character string with a preset key character string, and determining a second type fault value of the server according to a matching result.
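Both calculation modes above can be sketched briefly. The example reads the "information abstract value" as a message digest and uses MD5 from Python's `hashlib`. The digest algorithm, the string layout, the key strings, and the returned fault values are all assumptions, since the text does not specify them.

```python
import hashlib
from typing import Iterable, Mapping


def static_digest(static_params: Mapping[str, object]) -> str:
    """Build the first character string in a stable key order and hash it."""
    first_string = "|".join(f"{k}={static_params[k]}" for k in sorted(static_params))
    return hashlib.md5(first_string.encode("utf-8")).hexdigest()


def static_fault_value(static_params: Mapping[str, object], standard_digest: str) -> float:
    """Assumed convention: 0.0 if the digest matches the standard, 1.0 otherwise."""
    return 0.0 if static_digest(static_params) == standard_digest else 1.0


def log_fault_value(log_lines: Iterable[str], key_strings: Iterable[str]) -> float:
    """Assumed convention: one point per preset key string found in the log."""
    second_string = "\n".join(log_lines).lower()
    return float(sum(1 for key in key_strings if key.lower() in second_string))


# Standard digest recorded while the server is healthy (hypothetical values).
healthy = {"memory_model": "X", "memory_frequency_mhz": 3200, "memory_capacity_gb": 32}
standard = static_digest(healthy)

# A later collection reports less capacity, so the digest no longer matches.
degraded = dict(healthy, memory_capacity_gb=16)
```

Sorting the keys before joining keeps the digest independent of collection order, so only a real change in a static parameter alters it.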
Optionally, the fusing the first type fault value and the second type fault value to obtain a target fault value includes:
fusing the first type fault value and the second type fault value by using a preset weighting coefficient of the first type fault value and a preset weighting coefficient of the second type fault value to obtain a target fault value; wherein the weighting coefficient of the first type of fault value is greater than the weighting coefficient of the second type of fault value.
Optionally, if the target failure value meets a predetermined failure condition, outputting failure early warning information for the server, including:
and if the target fault value meets the target fault condition, determining the probability of the impending fault of the server represented by the target fault value, and outputting fault early warning information containing the probability of the impending fault of the server.
Optionally, the dynamic parameter includes a performance parameter of the server memory, and the specified auxiliary indicator includes a static parameter of the server memory and/or a server log.
Optionally, the number of the dynamic parameters is at least one, and the at least one dynamic parameter includes: at least one of a memory power consumption parameter, a memory bandwidth utilization parameter, a memory temperature parameter, a memory space utilization parameter, and a CPU utilization parameter;
the static parameters include: at least one of a memory model, a memory frequency, a memory capacity, and a memory type; the server log includes: baseboard management controller log information and/or system log information.
In a second aspect, an embodiment of the present invention provides a server failure early-warning apparatus, where the apparatus includes:
the acquisition module is used for acquiring parameter values of dynamic parameters of a server to be subjected to early warning analysis and acquiring index contents of specified auxiliary indexes; wherein the dynamic parameters include performance parameters of the server; the specified auxiliary index comprises a static parameter of the server and/or a server log; the static parameters comprise hardware parameters of the server;
the calculation module is used for calculating a first type fault value of the server by using the parameter value of the dynamic parameter and calculating a second type fault value of the server by using the index content of the specified auxiliary index; any fault value in the first type fault value and the second type fault value is used for representing the probability of impending failure of the server;
the fusion module is used for fusing the first type fault value and the second type fault value to obtain a target fault value;
the output module is used for outputting fault early warning information aiming at the server if the target fault value meets a preset fault condition; wherein the predetermined failure condition is a condition for characterizing an impending failure of the server.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any server fault early warning method when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements any one of the server failure early warning methods.
In a fifth aspect, an embodiment of the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any one of the above-mentioned server failure early warning methods.
The embodiment of the invention has the following beneficial effects:
the server fault early warning method provided by the embodiment of the invention collects the parameter values of the dynamic parameters of the server to be subjected to early warning analysis and the index content of the specified auxiliary indexes. Because both the parameter values of the dynamic parameters and the index content of the specified auxiliary indexes are affected when the server is about to fail, they can be used to calculate, respectively, a first type fault value and a second type fault value, each representing the probability that the server is about to fail. The first type fault value and the second type fault value can then be fused to obtain a target fault value. When the target fault value meets the predetermined fault condition, the server is about to fail, and fault early warning information for the server can be output. In this way, the scheme calculates fault values along multiple dimensions (the parameter values of the dynamic parameters and the index content of the specified auxiliary indexes) and outputs the fault early warning information for the server based on the fused fault value. Server fault early warning can therefore be realized through the scheme.
Of course, it is not necessary for any product or method to achieve all of the above-described advantages at the same time for practicing the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flowchart of a server fault early warning method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an embodiment of a server failure early warning method according to an embodiment of the present invention;
Fig. 3 is another schematic flow chart of a server failure early warning method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a server failure early warning apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
While a server is running, a fault may cause the server to go down, and a sudden outage may leave the service no time to switch to a redundant server, so that service stability fluctuates and users cannot use the service normally.
The related art raises an alarm for a server fault only when the parameter information of the specified parameter satisfies the fault condition. By the time the alarm is raised, however, the server fault has usually already occurred, so it is difficult to prompt staff effectively to switch servers in time. The alarm therefore serves only as supplementary information for root-cause analysis after the server has failed.
Based on this, the embodiment of the invention provides a server fault early warning method and device, an electronic device and a storage medium, so as to early warn a fault of a server.
First, a server failure early warning method provided by the present invention is described below.
The server fault early warning method provided by the embodiment of the invention can be applied to an electronic device, which may be a server or a terminal device. The method can be applied to any scenario that requires server fault early warning, for example a scenario in which early warning is given for a server that provides a game service, or a scenario in which early warning is given for a server that provides a data storage service, and the like.
Specifically, the execution subject of the server failure early warning method may be a server failure early warning device. For example, when the method is applied to a terminal device, the server failure early warning device may be functional software running on the terminal device, or a plug-in of an existing client, for example a plug-in in the client that is used for monitoring the running state of the server. When the method is applied to a server, the server failure early warning device may be a functional module in the server program that corresponds to a client running on the server and monitors the running state of the server.
The server fault early warning method provided by the embodiment of the invention can comprise the following steps:
collecting parameter values of dynamic parameters of a server to be subjected to early warning analysis and collecting index contents of specified auxiliary indexes; wherein the dynamic parameters include performance parameters of the server; the specified auxiliary index comprises a static parameter of the server and/or a server log; the static parameters comprise hardware parameters of the server;
calculating a first type fault value of the server by using the parameter value of the dynamic parameter, and calculating a second type fault value of the server by using the index content of the specified auxiliary index; any fault value in the first type of fault value and the second type of fault value is used for representing the probability of the server about to fail;
fusing the first type fault value and the second type fault value to obtain a target fault value;
if the target fault value meets a preset fault condition, outputting fault early warning information aiming at the server; wherein the predetermined failure condition is a condition for characterizing an impending failure of the server.
The server fault early warning method provided by the embodiment of the invention collects the parameter values of the dynamic parameters of the server to be subjected to early warning analysis and the index content of the specified auxiliary indexes. Because both the parameter values of the dynamic parameters and the index content of the specified auxiliary indexes are affected when the server is about to fail, they can be used to calculate, respectively, a first type fault value and a second type fault value, each representing the probability that the server is about to fail. The first type fault value and the second type fault value can then be fused to obtain a target fault value. When the target fault value meets the predetermined fault condition, the server is about to fail, and fault early warning information for the server can be output. In this way, the scheme calculates fault values along multiple dimensions (the parameter values of the dynamic parameters and the index content of the specified auxiliary indexes) and outputs the fault early warning information for the server based on the fused fault value. Server fault early warning can therefore be realized through the scheme.
The following describes a server failure early warning method provided by an embodiment of the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, a server failure early warning method provided in an embodiment of the present invention may include the following steps:
S101: Collecting parameter values of dynamic parameters of a server to be subjected to early warning analysis and collecting index contents of specified auxiliary indexes;
wherein the dynamic parameters include performance parameters of the server; the specified auxiliary index comprises a static parameter of the server and/or a server log; the static parameters comprise hardware parameters of the server;
The inventor found through research that when a server is about to fail, certain parameters or indexes of the server are affected and change, so the parameters or indexes that are affected in this way can be used to give early warning of the fault. Accordingly, the parameter values of the dynamic parameters of the server to be subjected to early warning analysis and the index content of the specified auxiliary indexes are collected first, and early warning of the server fault is then given by executing the subsequent steps.
It should be noted that the parameter values of the dynamic parameters and the index content of the specified auxiliary indexes can be collected in various ways, and the collection method is not limited here. For example, collection may be completed by sending the server a request for the parameter values of the dynamic parameters and the index content of the specified auxiliary indexes, or by monitoring the server in real time. In addition, the parameter values and the index content can be collected periodically from the server to be subjected to early warning analysis, for example once every 5 min.
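Periodic collection can be sketched as a simple polling loop. This is an illustration only: the function `collect_periodically`, the `collect` callable standing in for whatever request or monitoring mechanism is used, and the sample payload are hypothetical. The `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List


def collect_periodically(
    collect: Callable[[], Dict[str, object]],
    interval_s: float = 300.0,  # 5 min, matching the example period in the text
    rounds: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, object]]:
    """Call `collect` once per period and return every sample gathered."""
    samples = []
    for i in range(rounds):
        samples.append(collect())
        if i < rounds - 1:  # no need to wait after the final round
            sleep(interval_s)
    return samples


# Dry run with a stand-in collector and a no-op sleep.
samples = collect_periodically(
    collect=lambda: {"memory_temperature": 70.0, "server_log": []},
    rounds=2,
    sleep=lambda seconds: None,
)
```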
Specifically, the dynamic parameters and the specified auxiliary indexes may be parameters and indexes of the server as a whole, or parameters and indexes of the memory in the server; in the latter case, early warning can be given for a fault of the server memory. For example, in one implementation, the dynamic parameter includes a performance parameter of the server memory, and the specified auxiliary index includes a static parameter of the server memory and/or a server log.
Taking early warning of a server memory fault as an example, the number of the dynamic parameters is at least one, and the at least one dynamic parameter includes: at least one of a memory power consumption parameter, a memory bandwidth utilization parameter, a memory temperature parameter, a memory space utilization parameter, and a CPU utilization parameter;
the static parameters include: at least one of a memory model, a memory frequency, a memory capacity, and a memory type; the server log comprises: baseboard management controller log information and/or system log information.
In addition, the dynamic parameter and the specified auxiliary index may also be parameters and indexes of other components in the server, and are not limited herein.
S102: calculating a first type fault value of the server by using the parameter value of the dynamic parameter, and calculating a second type fault value of the server by using the index content of the specified auxiliary index;
any fault value in the first type of fault value and the second type of fault value is used for representing the probability of the server about to fail;
After the parameter values of the dynamic parameters and the index content of the specified auxiliary indexes are collected, each of them can characterize a fault value of the server. Because the dynamic parameters and the specified auxiliary indexes are different dimensions whose fault values are calculated in different ways, the first type fault value of the server is calculated from the parameter values of the dynamic parameters, and the second type fault value is calculated from the index content of the specified auxiliary indexes. Early warning of the server fault is then realized through the subsequent steps based on the calculated first type and second type fault values.
In addition, there may be one or more specified auxiliary indexes, and correspondingly one or more second type fault values: each specified auxiliary index may correspond to its own second type fault value. Alternatively, there may be a single second type fault value: with one specified auxiliary index, its fault value is the second type fault value; with multiple specified auxiliary indexes, the second type fault value may be obtained by fusing the fault values corresponding to those indexes.
It should be noted that the manner of calculating the first type fault value of the server from the parameter values of the dynamic parameters, and of calculating the second type fault value from the index content of the specified auxiliary indexes, is described in detail later and is not elaborated here.
S103: fusing the first type fault value and the second type fault value to obtain a target fault value;
After the first type fault value and the second type fault value are obtained, they can be fused to obtain a fault value representing the overall probability that the server is about to fail, so that early warning of the server fault can be given through the subsequent steps.
For example, in an implementation manner, fusing the first type fault value and the second type fault value to obtain a target fault value includes: fusing the first type fault value and the second type fault value by using a preset weighting coefficient of the first type fault value and a preset weighting coefficient of the second type fault value to obtain a target fault value; wherein the weighting coefficient of the first type fault value is greater than the weighting coefficient of the second type fault value.
It should be noted that the first type fault value and the second type fault value may each have a preset weighting coefficient, so the two values can be fused according to these coefficients to obtain the target fault value. In addition, the weighting coefficient of the first type fault value may be greater than that of the second type fault value, for example 2 for the first type fault value and 1 for the second type fault value; other values may of course be used, and the invention is not limited in this respect.
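A worked sketch of the weighted fusion, using the example coefficients 2 and 1 from the text. The text does not say whether the result is normalized, so a plain weighted sum is assumed here; the function name `fuse_fault_values` is hypothetical.

```python
def fuse_fault_values(
    first_type: float,
    second_type: float,
    w_first: float = 2.0,   # example coefficient for the first type fault value
    w_second: float = 1.0,  # example coefficient for the second type fault value
) -> float:
    """Weighted sum of the two fault values (normalization is not assumed)."""
    if w_first <= w_second:
        raise ValueError("the first type coefficient is expected to be the larger one")
    return w_first * first_type + w_second * second_type


target_fault_value = fuse_fault_values(first_type=1.0, second_type=0.5)
```

With these inputs the target fault value is 2 × 1.0 + 1 × 0.5 = 2.5, so the dynamic-parameter dimension contributes four times as much as the auxiliary-index dimension in this case.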
S104: if the target fault value meets a preset fault condition, outputting fault early warning information aiming at the server;
wherein the predetermined failure condition is a condition for characterizing an impending failure of the server.
After the target fault value is obtained, whether the target fault value meets a preset fault condition or not can be identified, and when the preset fault condition is met, fault early warning information aiming at the server can be output.
For example, in one implementation, if the target failure value satisfies a predetermined failure condition, outputting failure warning information for the server includes: and if the target fault value meets the target fault condition, determining the probability of the impending fault of the server represented by the target fault value, and outputting fault early warning information containing the probability of the impending fault of the server.
Because any fault value can represent the probability of the server about to fail, before the fault early warning information is output, the probability of the server about to fail represented by the target fault value can be determined, and therefore the fault early warning information containing the probability of the server about to fail can be output.
Additionally, in one implementation, the larger a fault value of the server, the greater the probability that the server is about to fail. In this case the predetermined fault condition may be that the target fault value lies in a predetermined fault interval, where the fault values in the interval are those observed when the server is about to fail; the larger the target fault value, the greater the probability of impending failure contained in the output fault early warning information. For example, the predetermined fault interval may be [1, +∞): a target fault value of 1 may characterize a 50% probability that the server is about to fail, and a target fault value of 2 may characterize an 80% probability. Alternatively, the convention may be reversed, so that the smaller a fault value of the server, the greater the probability that the server is about to fail. In this case the predetermined fault interval may be (−∞, 1]: a target fault value of 1 may characterize a 50% probability that the server is about to fail, and a target fault value of 0 may characterize an 80% probability.
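The interval check and the probability lookup for the "larger value, higher probability" convention can be sketched as follows. The text gives only two example points (1 → 50%, 2 → 80%), so the step-wise lookup between and beyond them, the function name `early_warning`, and the message wording are assumptions.

```python
from bisect import bisect_right
from typing import Optional

FAULT_INTERVAL_LOW = 1.0                         # predetermined fault interval [1, +inf)
PROBABILITY_STEPS = [(1.0, 0.50), (2.0, 0.80)]   # example points from the text


def early_warning(target_fault_value: float) -> Optional[str]:
    """Return a warning message if the value lies in the fault interval, else None."""
    if target_fault_value < FAULT_INTERVAL_LOW:
        return None
    thresholds = [t for t, _ in PROBABILITY_STEPS]
    # Pick the highest threshold that does not exceed the target fault value.
    idx = bisect_right(thresholds, target_fault_value) - 1
    probability = PROBABILITY_STEPS[idx][1]
    return f"server fault early warning: probability of impending fault {probability:.0%}"
```

A finer-grained or interpolated correspondence could replace `PROBABILITY_STEPS` without changing the interval check.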
The server fault early warning method provided by the embodiment of the invention can collect the parameter value of the dynamic parameter of the server to be analyzed and early warned and the index content of the designated auxiliary index, and the parameter value of the dynamic parameter and the corresponding parameter value and index content of the designated auxiliary index can be influenced when the server is about to break down, so that the first type fault value and the second type fault value which represent the probability of the server about to break down can be respectively calculated by using the parameter value of the dynamic parameter and the index content of the designated auxiliary index, and then the first type fault value and the second type fault value can be fused to obtain the target fault value. Therefore, according to the scheme provided by the invention, the fault values corresponding to multiple dimensions can be calculated according to the parameter values of the dynamic parameters and the multiple dimensions of the index content of the specified auxiliary index, and the fault early warning information for the server is output by utilizing the fault values after the multiple dimensions are combined. Therefore, the server fault early warning can be realized through the scheme.
Optionally, in another implementation manner, the calculating a first type fault value of the server by using the parameter value of the dynamic parameter includes steps A1-A2:
step A1: determining a fault value corresponding to the dynamic parameter according to a preset fault value calculation mode by using the parameter value of the dynamic parameter;
the fault value corresponding to the dynamic parameter is used for representing the probability that the server is about to fault under the condition that the dynamic parameter has the acquired parameter value;
wherein, the calculation mode of the preset fault value comprises the following steps: if the parameter value of the dynamic parameter is in the target parameter value interval corresponding to the dynamic parameter, determining a fault value corresponding to the parameter value of the dynamic parameter by using the corresponding relation between the parameter value in the target parameter value interval and the fault value; otherwise, determining a default fault value set for the dynamic parameter as a fault value corresponding to the parameter value of the dynamic parameter; determining a fault value corresponding to the dynamic parameter based on a fault value corresponding to a parameter value of the dynamic parameter;
the parameter value in the target parameter value interval corresponding to the dynamic parameter is the parameter value of the dynamic parameter when the server is about to break down; in the corresponding relation, the fault value corresponding to any parameter value is the probability that the server is about to fail when the dynamic parameter has the parameter value, and the fault value corresponding to any parameter value in the target parameter value interval is larger than the default fault value.
When the first-class fault value of the server is calculated by using the parameter values of the dynamic parameters, the parameter value of any dynamic parameter corresponds to a fault value, so that the fault value corresponding to the dynamic parameter can be determined according to a preset fault value calculation mode, subsequent steps are continuously executed, and the first-class fault value of the server is calculated.
When the parameter value of a dynamic parameter is located in the target parameter value interval corresponding to the dynamic parameter, the server can be characterized to be about to fail, and the probability of failure is higher when the parameter value is larger, so that when the parameter value of any dynamic parameter is located in the target parameter value interval corresponding to the dynamic parameter, the failure value corresponding to the parameter value of the dynamic parameter in the target parameter value interval can be used as the failure value corresponding to the parameter value of the dynamic parameter. When the parameter value of the dynamic parameter is not in the target parameter value interval corresponding to the dynamic parameter, it may be characterized that the server does not fail, and therefore, when the parameter value of any dynamic parameter is not in the target parameter value interval corresponding to the dynamic parameter, a default failure value smaller than the failure value corresponding to any parameter value in the target parameter value interval may be used as the failure value corresponding to the parameter value of the dynamic parameter. It should be noted that, at this time, the larger the fault value is, the larger the probability of representing that the server is about to fail is, and the fault value corresponding to any parameter value in the target parameter value interval is greater than the default fault value, for example: the default fault value may be 0, and the fault value corresponding to any parameter value in the target parameter value interval is greater than 0.
After the fault value corresponding to each parameter value of the dynamic parameter is determined, the fault value corresponding to the dynamic parameter itself can then be determined from it.
When any dynamic parameter only corresponds to one parameter value, the fault value corresponding to the parameter value can be directly used as the fault value corresponding to the dynamic parameter.
Illustratively, in one implementation, the dynamic parameter has a plurality of parameter values, the plurality of parameter values being acquired within a predetermined time range;
the determining the fault value corresponding to the dynamic parameter based on the fault value corresponding to the parameter value of the dynamic parameter includes:
and performing second specified operation processing on the fault values corresponding to the parameter values of the dynamic parameters to obtain the fault values corresponding to the dynamic parameters.
The parameter value of the dynamic parameter may be a plurality of parameter values collected within a predetermined time range, and at this time, the second specified operation processing may be performed on the fault value corresponding to each parameter value of the determined dynamic parameter, so as to obtain the fault value corresponding to the dynamic parameter.
It is to be understood that the second specified operation may be an averaging operation, a weighting operation, and the like, and is not limited herein.
Step A2: determining a first type fault value of the server by using the fault value corresponding to the dynamic parameter;
after the fault value corresponding to the dynamic parameter is obtained, the fault value corresponding to the dynamic parameter can be continuously utilized to determine the first-class fault value of the server.
When the number of the dynamic parameters is one, the fault value corresponding to the dynamic parameter can be directly used as the first-class fault value of the server.
Illustratively, in one implementation, the number of the dynamic parameters is multiple;
the determining a first type fault value of the server by using the fault value corresponding to the dynamic parameter includes:
and performing first specified operation processing on the fault value corresponding to each dynamic parameter to obtain a first type of fault value of the server.
The number of the dynamic parameters may be multiple, and at this time, the first specified operation processing may be performed on the fault value corresponding to each dynamic parameter obtained by calculation in step A1, so as to obtain the first type of fault value of the server.
Additionally, the first specified operation may be an averaging operation, a weighting operation, or the like, and the first specified operation may be the same as the second specified operation (for example, both may be averaging operations) or may be different from it, which is not limited herein.
It should be noted that the plurality of acquired parameter values of a dynamic parameter may all lie in the target parameter value interval throughout a predetermined time range, and that, among the plurality of dynamic parameters, there may be a specified number of dynamic parameters whose parameter values all lie in their target parameter value intervals. For example, the number of dynamic parameters is 5, the specified number is 3, and the predetermined time range is 30 min; among the 5 dynamic parameters there may be 3 whose parameter values all lie in the target parameter value interval throughout the 30 min. In this case, the calculated first type fault value represents a higher probability that the server is about to fail, so that the early warning effect is more pronounced when fault early warning is subsequently performed for the server.
According to a preset fault value calculation mode, a fault value corresponding to a parameter value corresponding to a dynamic parameter can be accurately determined according to whether the parameter value of the dynamic parameter is located in a target parameter value interval, the fault value corresponding to the dynamic parameter can also be determined according to the fault value corresponding to the parameter value of the dynamic parameter, and finally the first-class fault value of the server can be determined according to the fault value corresponding to the dynamic parameter. When the parameter value of the dynamic parameter is one, the fault value corresponding to the parameter value of the dynamic parameter can be quickly determined, and when the number of the dynamic parameters is one, the fault value corresponding to the dynamic parameter can be quickly determined, so that the efficiency of follow-up server fault early warning is improved. Through the calculation mode of the fault values corresponding to the parameter values of the plurality of quantities and the fault values corresponding to the dynamic parameters, the fault values corresponding to the dynamic parameters can be accurately determined, so that more accurate first-class fault values are obtained, and the accuracy of subsequent server fault early warning is improved.
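For illustration only, steps A1 and A2 may be sketched as follows, assuming that both the first and the second specified operation are averaging operations. The function names and the correspondence `value / 100` used in the usage lines are hypothetical; the text leaves the concrete correspondence and operations open.

```python
from statistics import mean


def reading_fault_value(value, interval, to_fault_value, default=0.0):
    """Step A1, per reading: inside the target interval use the
    correspondence; otherwise use the default fault value."""
    low, high = interval
    if low <= value <= high:
        return to_fault_value(value)
    return default


def parameter_fault_value(readings, interval, to_fault_value, default=0.0):
    """Step A1, per dynamic parameter: second specified operation
    (assumed here to be an average) over all readings in the time range."""
    return mean(
        reading_fault_value(v, interval, to_fault_value, default) for v in readings
    )


def first_type_fault_value(per_parameter_fault_values):
    """Step A2: first specified operation (assumed here to be an average)
    across the fault values of all dynamic parameters."""
    return mean(per_parameter_fault_values)


if __name__ == "__main__":
    # Hypothetical correspondence: fault value = reading / 100 inside [80, 100].
    to_fv = lambda v: v / 100
    p1 = parameter_fault_value([90, 50], (80, 100), to_fv)  # one reading outside
    p2 = parameter_fault_value([40, 45], (80, 100), to_fv)  # all readings outside
    print(p1, p2, first_type_fault_value([p1, p2]))
```

When a dynamic parameter has a single reading, or there is a single dynamic parameter, the averages reduce to the value itself, which matches the direct-use cases described above.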
Optionally, in another implementation manner, the calculating, by using the index content of the specified auxiliary index, a second type fault value of the server includes:
calculating a second type fault value of the server by utilizing a calculation mode corresponding to each appointed auxiliary index based on the index content of the appointed auxiliary index;
the calculation mode corresponding to the indexes of the static parameters comprises the following steps:
generating a first character string based on a parameter value of a static parameter acquired at an acquisition moment, calculating a target information abstract value of the first character string, comparing the target information abstract value with a standard information abstract value, and determining a second type fault value of the server by using a comparison result; the standard information abstract value is set by a character string generated by parameter values of static parameters when the server does not break down;
when the second type fault value of the server is determined from the comparison result, if the two digest values are the same, the parameter values of the static parameters are the same as the parameter values when the server has not failed, indicating that the probability of impending failure is very small or that no failure will occur, and the determined second type fault value may be a smaller fault value, for example, 0; if the two digest values differ, the parameter values of the static parameters differ from the parameter values when the server has not failed, indicating that the server is likely to fail, and the determined second type fault value may be a larger fault value, for example, 1. Here, a larger fault value represents a larger probability that the server is about to fail.
It is to be understood that the designated auxiliary index may include only one index, and when the designated auxiliary index includes only a static parameter, the second-type fault value of the server may be calculated by using a calculation manner corresponding to the index of the static parameter based on a parameter value of the static parameter as the index content. That is, the failure value corresponding to the static parameter is a second type failure value of the server.
In addition, it should be noted that the number of parameter values of any static parameter may be one, the number of static parameters may be one or more, and when the number of static parameters is multiple, the parameter values of multiple static parameters may be concatenated to obtain a first character string, and the second-class failure value of the server is calculated by using the calculation method corresponding to the index of the static parameter.
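For illustration only, the calculation mode for the static parameter index may be sketched with Python's standard `hashlib` module. The separator, the sorted key order used to build the first character string, and the sample parameter names and values are assumptions of this sketch, not details given in the text.

```python
import hashlib


def static_digest(static_params):
    """Build the first character string from the static parameter values
    and return its MD5 digest (hex). A fixed key order keeps it reproducible."""
    first_string = "|".join(f"{k}={static_params[k]}" for k in sorted(static_params))
    return hashlib.md5(first_string.encode("utf-8")).hexdigest()


def static_fault_value(static_params, standard_digest):
    """Return 0 if the digest equals the standard digest, otherwise 1."""
    return 0 if static_digest(static_params) == standard_digest else 1


if __name__ == "__main__":
    # Hypothetical static parameters recorded while the server was healthy.
    healthy = {"model": "X1", "frequency": "3200MHz", "capacity": "32GB", "slot": "A1"}
    standard = static_digest(healthy)

    changed = dict(healthy, capacity="16GB")  # e.g. part of the memory is no longer seen
    print(static_fault_value(healthy, standard), static_fault_value(changed, standard))
```

Comparing digests rather than the raw strings keeps the stored baseline small; any change in any static parameter changes the digest.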
It should be noted that the parameter value of the static parameter may also be a plurality of parameter values acquired within a predetermined time range. In this case, a second type fault value corresponding to the static parameter index may be calculated for each moment in the predetermined time range, and the fault value for the predetermined time range is determined from the per-moment values as follows: if the second type fault value at any moment within the predetermined time range indicates that the server may be about to fail, the second type fault value for the predetermined time range is the value at that moment, namely 1; if the second type fault values at all moments within the predetermined time range indicate that the server will not fail, the second type fault value for the predetermined time range is the value indicating no failure, namely 0. Furthermore, once a fault value indicating that the server may be about to fail has first been calculated at some moment, the fault value at any later moment may be taken to be the value indicating possible impending failure, namely 1. For example, if the predetermined time range is 10 min and the second type fault value calculated at the 5th minute is 1 (the server may be about to fail), the second type fault values at the other moments after the 5th minute can be directly determined as 1, and the second type fault value in the static parameter dimension for the 10 min can also be determined as 1. Here, a larger fault value represents a larger probability that the server is about to fail; that is, 1 indicates that the server may be about to fail, and 0 indicates that the server will not fail.
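For illustration only, the handling of per-moment second type fault values over a predetermined time range may be sketched as follows. The function names `latch` and `window_fault_value` are hypothetical, and one value per minute is an assumed sampling rate.

```python
def latch(per_moment_values):
    """Once a value of 1 appears, treat every later moment as 1."""
    latched, seen = [], False
    for v in per_moment_values:
        seen = seen or v == 1
        latched.append(1 if seen else 0)
    return latched


def window_fault_value(per_moment_values):
    """Fault value for the whole time range: 1 if any moment is 1, else 0."""
    return 1 if any(v == 1 for v in per_moment_values) else 0


if __name__ == "__main__":
    # 10 min range, one value per minute; the 5th minute reports 1.
    minutes = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    print(latch(minutes))
    print(window_fault_value(minutes))
```

The same aggregation applies unchanged to per-moment values computed from server logs.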
The calculation mode corresponding to the index of the server log comprises the following steps:
and generating a second character string based on log information of the server log acquired at a collecting moment, matching the second character string with a preset key character string, and determining a second type fault value of the server according to a matching result.
It should be noted that there may be multiple server logs, each contributing one piece of log information. When there are multiple server logs and the second type fault value is calculated using the calculation mode corresponding to the server log index, it may be detected whether the second character string, formed by splicing the log information of the multiple server logs collected at one acquisition moment, contains content matching the preset key character string, and the second type fault value of the server is determined from the matching result. Of course, key characters may also be extracted from the log information of the server log and matched against the preset key character string, and the second type fault value of the server is determined from the matching result. When the second type fault value of the server is determined from the matching result, if the matching fails, the log information of the server log does not match the log information produced when the server is about to fail, indicating that the probability of impending failure is very small or that no failure will occur, and the determined second type fault value may be a value representing that the server will not fail, for example, 0; if the matching succeeds, the log information of the server log matches the log information produced when the server is about to fail, indicating that the server is likely to fail, and the determined second type fault value may be a value representing that the server may be about to fail, for example, 1. Here, a larger fault value represents a larger probability that the server is about to fail.
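For illustration only, the calculation mode for the server log index may be sketched as a substring match over the spliced log information. The key character strings and log lines below are hypothetical examples; the text does not specify which key strings are preset, and case-insensitive matching is an assumption.

```python
# Hypothetical preset key character strings; the actual strings are not given in the text.
PRESET_KEY_STRINGS = ("uncorrectable ecc", "memory error")


def log_fault_value(log_entries, key_strings=PRESET_KEY_STRINGS):
    """Splice the log information into the second character string and
    return 1 if any preset key string occurs in it, otherwise 0."""
    second_string = "\n".join(log_entries).lower()
    return 1 if any(k.lower() in second_string for k in key_strings) else 0


if __name__ == "__main__":
    bmc_log = "BMC: Uncorrectable ECC detected on DIMM A1"
    os_log = "kernel: boot completed"
    print(log_fault_value([bmc_log, os_log]))  # a key string is present
    print(log_fault_value([os_log]))           # no key string is present
```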
It is to be understood that the specified auxiliary index may include only one index, and when the specified auxiliary index includes only the server log, the second-type fault value may be calculated in a calculation manner corresponding to the index of the server log based on log information of the server log as the content of the index. That is, the failure value corresponding to the server log is a second type failure value of the server.
It should be noted that the log information of the server log may also be a plurality of pieces of log information collected within a predetermined time range, and the log information at each moment may be the log information generated by the running server before that moment. In this case, a second type fault value corresponding to the server log index may be calculated for each moment in the predetermined time range, and the fault value for the predetermined time range is determined from the per-moment values as follows: if the second type fault value at any moment within the predetermined time range indicates that the server is about to fail, the second type fault value for the predetermined time range is the value at that moment, namely 1; if the second type fault values at all moments within the predetermined time range indicate that the server will not fail, the second type fault value for the predetermined time range is the value indicating no failure, namely 0. Furthermore, once a fault value indicating that the server may be about to fail has first been calculated at some moment, the fault value at any later moment may be taken to be the value indicating possible impending failure, namely 1. For example, if the predetermined time range is 10 min and the second type fault value calculated at the 5th minute is 1 (the server may be about to fail), the second type fault values at the other moments after the 5th minute can be directly determined as 1, and the second type fault value in the server log dimension for the 10 min can also be determined as 1.
At this time, the larger the failure value is, the larger the probability that the characterization server will fail is, that is, 1 characterizes that the server may be about to fail, and 0 characterizes that the server will not fail.
It will be appreciated that each of the specified auxiliary indicators may yield a value for a second type of fault, and then at least one of the auxiliary indicators may yield at least one value for the second type of fault. For example: when the specified auxiliary indexes include indexes related to static parameters and indexes related to server logs, the number of the second-class fault values obtained through calculation is two, and at this time, when the first-class fault value and the second-class fault value are fused, the weighting coefficients of the second-class fault values of the two different dimensions may be the same, for example, the weighting coefficients of the two second-class fault values are both 1, and the weighting coefficient of the first-class fault value is 2, which is reasonable.
By distinguishing the calculation of the second type fault value according to the number of indexes included in the specified auxiliary index, the second type fault value can be calculated quickly when the specified auxiliary index includes only one index, which improves the efficiency of subsequent server fault early warning. When the specified auxiliary index includes a plurality of indexes, the plurality of second type fault values represented by these indexes can be considered together, which improves the accuracy of subsequent server fault early warning.
In addition, when calculating the first type fault value and the second type fault value of the server, since the specified auxiliary index includes the static parameter of the server and/or the server log, the second type fault value may be a fault value calculated by an index of the static parameter or a fault value calculated by an index of the server log, or may be a plurality of second type fault values calculated by using an index of the static parameter and an index of the server log.
When the specified auxiliary index is a static parameter, the first type fault value of the server is the fault value corresponding to the dynamic parameter, and the second type fault value of the server is the fault value corresponding to the static parameter;
when the specified auxiliary index is a server log, the first type fault value of the server is the fault value corresponding to the dynamic parameter, and the second type fault value of the server is the fault value corresponding to the server log;
when the specified auxiliary index is a static parameter and the server log, the first type fault value of the server is a fault value corresponding to the dynamic parameter, and the second type fault value of the server is a fault value corresponding to the static parameter and a fault value corresponding to the server log.
That is, the target failure value representing the probability of the server about to fail may be a failure value calculated based on the dynamic parameter and the static parameter, a failure value calculated based on the dynamic parameter and the server log, or a failure value calculated based on the dynamic parameter, the static parameter, and the server log.
The server failure early warning method provided by the embodiment of the invention is exemplarily described below by taking the failure early warning of the server memory as an example.
As shown in fig. 2, in the server fault early warning method provided in the embodiment of the present invention, the acquired dynamic parameters may be combination parameter 2, and the specified auxiliary indexes may include combination parameter 1 and combination parameter 3, each of which is one specified auxiliary index. Combination parameter 1 may be server memory static information (corresponding to the static parameters included in the specified auxiliary index), and the static parameters it includes may be: memory model, memory frequency, memory capacity, memory type, memory RANK, and installation slot position. The memory RANK is a set of memory granules grouped to match the CPU interface; for example, the interface between the CPU and the memory is 64 bits wide while each memory granule is only 4, 8, or 16 bits wide, so multiple memory granules must be combined to reach 64 bits (for example, 8 granules of 8 bits each) before the memory can interact with the CPU. Combination parameter 2 may be server memory dynamic information, and the dynamic parameters it includes may be memory real-time power consumption, memory bandwidth utilization rate, memory real-time temperature, memory space utilization rate, and CPU utilization rate. Combination parameter 3 may be server memory fault information (corresponding to the server log included in the specified auxiliary index), and the information it includes may be BMC log information (corresponding to the BMC log information described above) and operating system log information (corresponding to the system log information described above).
When fault early warning is performed for the server, the server memory static information (which may include the above static parameters, such as memory hardware information, memory quantity, memory model, and memory frequency), the server memory log information (corresponding to the above server memory fault information, including the BMC log, the operating system log, and the like), and the server memory dynamic information (including parameters such as the power consumption, temperature, and bandwidth utilization rate of the memory) may be collected first. This corresponds to acquiring the parameter values of the dynamic parameters of the server to be analyzed for early warning, and acquiring the index content of the specified auxiliary index.
The server memory static information can be combined into a character string, and its MD5 (Message-Digest Algorithm 5) value (corresponding to the target information digest value of the first character string) is calculated; when the MD5 value changes, the fault value of the server memory represented by the static information is 1, otherwise it is 0. For the server memory log information, a key character string can be preset (corresponding to the preset key character string matched against the second character string), and the log information is matched against it; when the preset key character string is matched (that is, the matching succeeds), the fault value of the server memory represented by the log information is 1, otherwise it is 0. When there are five items of dynamic information (memory real-time power consumption, memory bandwidth utilization rate, memory real-time temperature, memory space utilization rate, and CPU utilization rate), if the parameter values of three or more of them enter the parameter value intervals representing that the server is about to fail and remain there for more than 30 min, the fault value corresponding to the dynamic information is calculated as follows: for each of the five items of dynamic information, calculate the average of the fault values corresponding to its multiple parameter values, which gives the fault value corresponding to that item; then calculate the average of these five per-item fault values to obtain the fault value of the server represented by the dynamic information.
Of course, the fault values represented by the parameter values of the five items of dynamic information can also be calculated in real time to obtain the fault value represented by the dynamic information. That is, the fault values represented by the parameter values of the dynamic information can be calculated in real time, so that the fault value of the server memory is calculated in real time and whether the server memory is about to fail is monitored in real time; when a specified number of items of dynamic information have parameter values that enter the target parameter value interval, the calculated fault value of the server memory can represent that the server is about to fail, so that an accurate early warning of the server memory fault is given. This corresponds to calculating a first type fault value of the server from the parameter values of the dynamic parameters, and calculating a second type fault value of the server from the index content of the specified auxiliary index.
When the combination parameter 1, that is, the static information of the server memory, is combined into a character string, the information such as the memory model, the memory frequency, the memory capacity, the memory type, the memory RANK, and the installation slot position may be directly combined, and the MD5 value of the combined character string set is calculated, where the fault value of the static parameter is 0 when the MD5 value is unchanged, and the fault value of the static parameter is 1 when the MD5 value is changed.
Combination parameter 2 is server memory dynamic information. Each item of information corresponds to a parameter value interval representing that the server is about to fail (corresponding to the target parameter value interval above): the memory real-time power consumption enters its interval when it exceeds 80%, the memory bandwidth utilization rate when it exceeds 90%, the memory real-time temperature when it exceeds 80%, the memory space utilization rate when it exceeds 80%, and the CPU utilization rate when it exceeds 60%. When the parameter values of three or more of the five items of dynamic information enter their intervals and remain there for more than 30 min, the fault value corresponding to the dynamic information is calculated as follows: calculate the average of the fault values corresponding to the parameter values of each of the five items, and then calculate the average of the fault values corresponding to the five items. Because the parameter values of the dynamic information change in real time, a parameter value that has entered its interval may later leave it; for the sake of accuracy of the calculated fault value, the average of the fault values corresponding to each item of dynamic information may also be calculated when the parameter values of any two of the three items enter their parameter value intervals while the parameter value of the remaining item is in its corresponding interval. For example, the preset early warning interval (that is, the parameter value interval above) of each parameter has values (that is, fault values) as follows: the value at the lowest end of the interval (where the parameter value has just entered the interval) is 0, and the value at the highest end (beyond which a larger parameter value no longer belongs to the interval) is 1. For example, for the memory real-time power consumption, a reading of 80% (the lowest end) corresponds to a fault value of 0, a reading of 90% corresponds to a fault value of 0.5, and a reading of 100% corresponds to a fault value of 1. When the dynamic time-series data (that is, the parameter values) of a parameter reach the lowest end of the interval, superposition begins, and when several superposed parameters (three or more) remain in their intervals throughout a time period (30 min), the average is calculated to obtain the fault value in the dynamic time-series parameter dimension. Of course, the fault value corresponding to the dynamic information may be calculated in real time, or the fault value corresponding to the dynamic information within a predetermined time range may be calculated.
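For illustration only, the dynamic-information calculation in this memory example may be sketched as follows. The linear correspondence between the lower bound and 100% follows the power consumption example (80% → 0, 90% → 0.5, 100% → 1) and is assumed here for the other four parameters as well. Returning `None` when fewer than three parameters stay in their intervals is also an assumption, since the text does not state the result in that case; the names are hypothetical.

```python
from statistics import mean

# Lower bounds (in percent) at which each parameter enters its interval, as listed in the text.
LOWER_BOUNDS = {
    "power": 80.0,
    "bandwidth": 90.0,
    "temperature": 80.0,
    "space": 80.0,
    "cpu": 60.0,
}


def linear_fault_value(value, low, high=100.0):
    """0 at the lower bound, 1 at the upper bound, 0 below the interval."""
    if value < low:
        return 0.0
    return min((value - low) / (high - low), 1.0)


def dynamic_fault_value(window, min_params=3):
    """window maps parameter name -> readings covering the 30 min period.
    Returns the averaged fault value when at least min_params parameters
    stayed in their intervals for the whole period, else None (assumption)."""
    sustained = [
        name for name, readings in window.items()
        if all(r >= LOWER_BOUNDS[name] for r in readings)
    ]
    if len(sustained) < min_params:
        return None
    per_param = [
        mean(linear_fault_value(r, LOWER_BOUNDS[name]) for r in readings)
        for name, readings in window.items()
    ]
    return mean(per_param)


if __name__ == "__main__":
    window = {
        "power": [90, 90], "bandwidth": [95, 95], "temperature": [90, 90],
        "space": [50, 50], "cpu": [40, 40],
    }
    print(dynamic_fault_value(window))
```

In this usage example three parameters remain in their intervals, each contributing a fault value of 0.5, while the other two contribute 0, so the average over the five parameters is 0.3.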
Combination parameter 3 is the log information of the server memory, namely the BMC log information and the operating system log information. This log information may be matched against preset key strings that, according to historical fault information, are generated when the server is about to fail. If no preset key string matches, the fault value is 0; if a preset key string matches, the fault value is 1.
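The key string matching for combination parameter 3 can be sketched as a plain substring search. The key strings and log lines used in the example are hypothetical; the text does not list the actual strings derived from historical fault information.

```python
def log_fault_value(log_lines, key_strings):
    """Return 1 if any preset key string occurs in any log line, otherwise 0."""
    matched = any(key in line for line in log_lines for key in key_strings)
    return 1 if matched else 0


# Hypothetical key strings; real ones would come from historical fault records.
EXAMPLE_KEYS = ["Uncorrectable ECC", "Memory device disabled"]
```

For instance, `log_fault_value(["DIMM A1: Uncorrectable ECC error"], EXAMPLE_KEYS)` yields 1, while a log without any of the key strings yields 0.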
After the fault values corresponding to combination parameter 1, combination parameter 2 and combination parameter 3 are obtained, they may be fused according to preset weight coefficients, for example a weight coefficient of 1 for combination parameter 1, 2 for combination parameter 2, and 1 for combination parameter 3, to obtain the fault value of the server memory. This corresponds to fusing the first type fault value and the second type fault value to obtain the target fault value.
When the final value (the fault value of the server memory) is greater than or equal to 1, a fault early warning for the server memory may be generated; the higher the fault value, the higher the probability that the server memory will fail.
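The fusion and threshold check can be sketched as below. The text only says that the fault values are multiplied by coefficients and combined, so treating the fusion as a weighted sum is an assumption, as are the function and argument names.

```python
# Example weight coefficients from the text for combination parameters 1, 2, 3.
WEIGHTS = (1.0, 2.0, 1.0)
WARNING_THRESHOLD = 1.0  # a final value >= 1 triggers the early warning


def fuse(value_1: float, value_2: float, value_3: float, weights=WEIGHTS) -> float:
    """Weighted sum of the three per-combination fault values (assumed fusion rule)."""
    return weights[0] * value_1 + weights[1] * value_2 + weights[2] * value_3


def should_warn(target_fault_value: float) -> bool:
    """True when the fused value reaches the warning threshold."""
    return target_fault_value >= WARNING_THRESHOLD
```

Under these weights a dynamic fault value of 0.5 alone reaches the threshold (2 × 0.5 = 1), whereas a dynamic fault value of 0.3 with no other signal gives 0.6 and produces no warning.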
It should be noted that the fault values and weight coefficients above are merely examples and may be adjusted flexibly for different servers, requirements and the like; they are not limited here. In addition, other parameters that can indicate an impending server failure may also be collected or monitored, which gives finer granularity and more dimensions and therefore a more accurate judgment of server failure.
With the three ways of calculating the fault value of the server memory from the combination parameters described above, the data to be judged can be combined, and, within the predicted interval range in which the server is about to fail, the trend of the probability of impending failure can be predicted by using the preset coefficients of parameter value change. The probability that the server is about to fail can thus be determined from the magnitude of the fault value, which realizes server fault early warning.
A fault warning method provided in an embodiment of the present invention is described below with reference to another embodiment.
As shown in fig. 3, the server failure early warning method provided in the embodiment of the present invention may include the following steps:
S301: Acquire static memory information, log memory information, and dynamic memory information. This corresponds to collecting parameter values of the dynamic parameters of the server to be subjected to early warning analysis and collecting index contents of the specified auxiliary indexes.
S302: Classify the related information and assign different preset values (fault values). This corresponds to calculating a first type fault value of the server from the parameter values of the dynamic parameters and calculating a second type fault value of the server from the index contents of the specified auxiliary indexes.
S303: Multiply the preset values by different coefficients and combine them to obtain a final value (the target fault value). This corresponds to fusing the first type fault value and the second type fault value to obtain the target fault value.
S304: Start generating a fault early warning once the calculated value reaches a limited interval; the larger the value, the higher the fault probability. This corresponds to outputting fault early warning information for the server if the target fault value meets the predetermined fault condition.
It should be noted that the steps of the server fault early warning method of this embodiment are similar to those described above and are not repeated here.
The server fault early warning method provided by the embodiment of the present invention collects the parameter values of the dynamic parameters of the server to be subjected to early warning analysis and the index contents of the specified auxiliary indexes. Because both the parameter values of the dynamic parameters and the index contents of the specified auxiliary indexes are affected when the server is about to fail, they can be used to calculate, respectively, a first type fault value and a second type fault value, each representing the probability that the server is about to fail. The two values are then fused to obtain the target fault value. The scheme therefore calculates fault values in multiple dimensions, namely the parameter values of the dynamic parameters and the index contents of the specified auxiliary indexes, and outputs fault early warning information for the server based on the combined fault value, thereby realizing server fault early warning.
Based on the server fault early warning method, an embodiment of the present invention further provides a server fault early warning apparatus, as shown in fig. 4, including:
the acquisition module 410 is used for acquiring parameter values of dynamic parameters of a server to be subjected to early warning analysis and acquiring index contents of specified auxiliary indexes; wherein the dynamic parameters include performance parameters of the server; the specified auxiliary index comprises a static parameter of the server and/or a server log; the static parameters comprise hardware parameters of the server;
a calculating module 420, configured to calculate a first type fault value of the server by using the parameter value of the dynamic parameter, and calculate a second type fault value of the server by using the index content of the specified auxiliary index; any fault value in the first type fault value and the second type fault value is used for representing the probability of impending failure of the server;
the fusion module 430 is configured to fuse the first-type fault value and the second-type fault value to obtain a target fault value;
an output module 440, configured to output fault early warning information for the server if the target fault value meets a predetermined fault condition; wherein the predetermined failure condition is a condition for characterizing an impending failure of the server.
The server fault early warning apparatus provided by the embodiment of the present invention collects the parameter values of the dynamic parameters of the server to be subjected to early warning analysis and the index contents of the specified auxiliary indexes. Because both the parameter values of the dynamic parameters and the index contents of the specified auxiliary indexes are affected when the server is about to fail, they can be used to calculate, respectively, a first type fault value and a second type fault value, each representing the probability that the server is about to fail. The two values are then fused to obtain the target fault value. When the target fault value meets the predetermined fault condition, the server is about to fail, and fault early warning information for the server can be output. The scheme therefore calculates fault values in multiple dimensions, namely the parameter values of the dynamic parameters and the index contents of the specified auxiliary indexes, and outputs fault early warning information for the server based on the combined fault value, thereby realizing server fault early warning.
Optionally, the calculation module includes:
the first determining submodule is used for determining a fault value corresponding to the dynamic parameter according to a predetermined fault value calculation mode by using the parameter value of the dynamic parameter; the fault value corresponding to the dynamic parameter is used for representing the probability that the server is about to fail under the condition that the dynamic parameter has the acquired parameter value;
the second determining submodule is used for determining a first type fault value of the server by using the fault value corresponding to the dynamic parameter;
wherein the predetermined failure value is calculated in a manner that: if the parameter value of the dynamic parameter is in the target parameter value interval corresponding to the dynamic parameter, determining a fault value corresponding to the parameter value of the dynamic parameter by using the corresponding relation between the parameter value in the target parameter value interval and the fault value; otherwise, determining a default fault value set for the dynamic parameter as a fault value corresponding to the parameter value of the dynamic parameter; determining a fault value corresponding to the dynamic parameter based on the fault value corresponding to the parameter value of the dynamic parameter;
the parameter value in the target parameter value interval corresponding to the dynamic parameter is the parameter value of the dynamic parameter when the server is about to break down; in the corresponding relation, the fault value corresponding to any parameter value is the probability that the server is about to fail when the dynamic parameter has the parameter value, and the fault value corresponding to any parameter value in the target parameter value interval is greater than the default fault value.
Optionally, the number of the dynamic parameters is multiple;
the second determining submodule is specifically configured to:
and performing first specified operation processing on the fault value corresponding to each dynamic parameter to obtain a first type of fault value of the server.
Optionally, the number of the parameter values of the dynamic parameter is multiple, and the multiple parameter values are parameter values acquired within a preset time range;
the first determining submodule is specifically configured to:
and performing second specified operation processing on the fault values corresponding to the parameter values of the dynamic parameters to obtain the fault values corresponding to the dynamic parameters.
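The structure described above (a target interval with a correspondence relation, a default fault value outside it, a second specified operation over the values collected in the time range, and a first specified operation across parameters) can be sketched as follows. The text does not fix either operation or the correspondence relation; averaging and a linear mapping are used here only because the earlier example uses them, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ParameterRule:
    low: float             # start of the target parameter value interval
    high: float            # end of the target parameter value interval
    default: float = 0.0   # default fault value outside the interval

    def value_to_fault(self, value: float) -> float:
        # Strict lower bound, so every in-interval fault value exceeds the default.
        if self.low < value <= self.high:
            # Assumed correspondence relation: linear between the bounds.
            return (value - self.low) / (self.high - self.low)
        return self.default


def parameter_fault(rule: ParameterRule, values, second_op=mean) -> float:
    """Second specified operation: combine the fault values of one parameter's samples."""
    return second_op([rule.value_to_fault(v) for v in values])


def first_type_fault(per_parameter_faults, first_op=mean) -> float:
    """First specified operation: combine the fault values of all dynamic parameters."""
    return first_op(list(per_parameter_faults))
```

Passing `max` instead of `mean` as either operation would make the result track the worst sample or the worst parameter; which choice is appropriate depends on the deployment and is not stated in the text.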
Optionally, the computing module is further configured to:
calculating a second type fault value of the server by using the calculation mode corresponding to each specified auxiliary index, based on the index content of the specified auxiliary index;
the calculation mode corresponding to the index of the static parameters comprises the following steps:
generating a first character string based on the parameter values of the static parameters acquired at an acquisition moment, calculating a target message digest value of the first character string, comparing the target message digest value with a standard message digest value, and determining a second type fault value of the server by using the comparison result; the standard message digest value is set from a character string generated from the parameter values of the static parameters when the server has not failed;
the calculation mode corresponding to the index of the server log comprises the following steps:
and generating a second character string based on the log information of the server log acquired at an acquisition moment, matching the second character string with a preset key character string, and determining a second type fault value of the server according to the matching result.
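The digest comparison for static parameters can be sketched as below. The choice of SHA-256, the way the parameter values are serialized into the first character string, and the mapping of a mismatch to a fault value of 1 are assumptions made for illustration; this section does not name the digest algorithm or the string format.

```python
import hashlib


def static_digest(params: dict) -> str:
    """Build the first character string from static parameters and hash it."""
    # Sort the keys so the same hardware always produces the same string.
    first_string = "|".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.sha256(first_string.encode("utf-8")).hexdigest()


def static_fault_value(params: dict, standard_digest: str) -> int:
    """Return 0 if the digest matches the healthy baseline, otherwise 1 (assumed mapping)."""
    return 0 if static_digest(params) == standard_digest else 1
```

A baseline digest would be recorded once while the server is healthy, for example from a hypothetical inventory such as `{"model": "M1", "frequency": "3200", "capacity": "32GB", "type": "DDR4"}`. If a later collection reports a smaller capacity, the digest changes and the function returns 1.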
Optionally, the fusion module is specifically configured to:
fusing the first type fault value and the second type fault value by using a preset weighting coefficient of the first type fault value and a preset weighting coefficient of the second type fault value to obtain a target fault value; wherein the weighting coefficient of the first type of fault value is greater than the weighting coefficient of the second type of fault value.
Optionally, the output module is specifically configured to:
and if the target fault value meets the predetermined fault condition, determining the probability of impending failure of the server represented by the target fault value, and outputting fault early warning information containing the probability of impending failure of the server.
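A warning that carries a probability can be sketched as follows. The text does not say how the target fault value is converted into a probability; dividing by the largest attainable value (4 under the example weights 1, 2 and 1, with each fault value at most 1) is purely an assumption, and the record fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultWarning:
    server_id: str
    target_fault_value: float
    probability: float  # assumed normalisation of the target fault value


def build_warning(
    server_id: str,
    target_fault_value: float,
    threshold: float = 1.0,   # predetermined fault condition: value >= threshold
    max_value: float = 4.0,   # assumed maximum of the fused value
) -> Optional[FaultWarning]:
    """Return a warning record if the condition is met, otherwise None."""
    if target_fault_value < threshold:
        return None
    probability = min(target_fault_value / max_value, 1.0)
    return FaultWarning(server_id, target_fault_value, probability)
```

A calibrated mapping learned from historical failures could replace the simple ratio without changing the surrounding logic.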
Optionally, the dynamic parameter includes a performance parameter of the server memory, and the specified auxiliary index includes a static parameter of the server memory and/or a server log.
Optionally, the number of the dynamic parameters is at least one, and the at least one dynamic parameter includes: at least one of a memory power consumption parameter, a memory bandwidth utilization parameter, a memory temperature parameter, a memory space utilization parameter, and a CPU utilization parameter;
the static parameters include: at least one of a memory model, a memory frequency, a memory capacity, and a memory type; the metrics on the server log include: baseboard management controller log information and/or system log information.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, which includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with one another through the communication bus 504;
a memory 503 for storing a computer program;
the processor 501 is configured to implement any server failure warning method when executing the program stored in the memory 503.
The communication bus of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include a Random Access Memory (RAM), and may also include a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the server failure warning method in any of the above embodiments.
In yet another embodiment of the present invention, a computer program product containing instructions is further provided, which when run on a computer, causes the computer to execute the server failure warning method of any of the above embodiments.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A server fault early warning method is characterized by comprising the following steps:
collecting parameter values of dynamic parameters of a server to be subjected to early warning analysis and collecting index contents of specified auxiliary indexes; wherein the dynamic parameters include performance parameters of the server; the specified auxiliary index comprises a static parameter of the server and/or a server log; the static parameters comprise hardware parameters of the server;
calculating a first type fault value of the server by using the parameter value of the dynamic parameter, and calculating a second type fault value of the server by using the index content of the specified auxiliary index; any fault value in the first type of fault value and the second type of fault value is used for representing the probability of the server about to fail;
fusing the first type fault value and the second type fault value to obtain a target fault value;
if the target fault value meets a preset fault condition, outputting fault early warning information aiming at the server; wherein the predetermined failure condition is a condition for characterizing an impending failure of the server.
2. The method according to claim 1, wherein said calculating a first type failure value of said server using parameter values of said dynamic parameters comprises:
determining a fault value corresponding to the dynamic parameter according to a predetermined fault value calculation mode by using the parameter value of the dynamic parameter; the fault value corresponding to the dynamic parameter is used for representing the probability that the server is about to fail under the condition that the dynamic parameter has the acquired parameter value;
determining a first type fault value of the server by using the fault value corresponding to the dynamic parameter;
wherein the predetermined failure value is calculated in a manner that: if the parameter value of the dynamic parameter is in the target parameter value interval corresponding to the dynamic parameter, determining a fault value corresponding to the parameter value of the dynamic parameter by using the corresponding relation between the parameter value in the target parameter value interval and the fault value; otherwise, determining a default fault value set for the dynamic parameter as a fault value corresponding to the parameter value of the dynamic parameter; determining a fault value corresponding to the dynamic parameter based on the fault value corresponding to the parameter value of the dynamic parameter;
the parameter value in the target parameter value interval corresponding to the dynamic parameter is the parameter value of the dynamic parameter when the server is about to break down; in the corresponding relation, the fault value corresponding to any parameter value is the probability that the server is about to fail when the dynamic parameter has the parameter value, and the fault value corresponding to any parameter value in the target parameter value interval is greater than the default fault value.
3. The method of claim 2, wherein the number of dynamic parameters is plural;
the determining the first type fault value of the server by using the fault value corresponding to the dynamic parameter includes:
and performing first specified operation processing on the fault value corresponding to each dynamic parameter to obtain a first type of fault value of the server.
4. The method according to claim 2, wherein the dynamic parameter has a plurality of parameter values, and the plurality of parameter values are parameter values collected within a predetermined time length range;
the determining the fault value corresponding to the dynamic parameter based on the fault value corresponding to the parameter value of the dynamic parameter includes:
and performing second specified operation processing on the fault values corresponding to the parameter values of the dynamic parameters to obtain the fault values corresponding to the dynamic parameters.
5. The method according to claim 1, wherein the calculating the second type fault value of the server by using the index content of the specified auxiliary index comprises:
calculating a second type fault value of the server by using the calculation mode corresponding to each specified auxiliary index, based on the index content of the specified auxiliary index;
the calculation mode corresponding to the index of the static parameters comprises the following steps:
generating a first character string based on the parameter values of the static parameters acquired at an acquisition moment, calculating a target message digest value of the first character string, comparing the target message digest value with a standard message digest value, and determining a second type fault value of the server by using the comparison result; the standard message digest value is set from a character string generated from the parameter values of the static parameters when the server has not failed;
the calculation mode corresponding to the index of the server log comprises the following steps:
and generating a second character string based on the log information of the server log acquired at an acquisition moment, matching the second character string with a preset key character string, and determining a second type fault value of the server according to the matching result.
6. The method according to claim 1, wherein fusing the first type of fault value and the second type of fault value to obtain a target fault value comprises:
fusing the first type fault value and the second type fault value by using a preset weighting coefficient of the first type fault value and a preset weighting coefficient of the second type fault value to obtain a target fault value; wherein the weighting coefficient of the first type of fault value is greater than the weighting coefficient of the second type of fault value.
7. The method of claim 1, wherein outputting fault warning information for the server if the target fault value satisfies a predetermined fault condition comprises:
and if the target fault value meets the predetermined fault condition, determining the probability of impending failure of the server represented by the target fault value, and outputting fault early warning information containing the probability of impending failure of the server.
8. The method according to any of claims 1-7, wherein the dynamic parameters comprise performance parameters of a server memory, and the specified auxiliary metrics comprise static parameters of the server memory and/or a server log.
9. The method of claim 8, wherein the number of dynamic parameters is at least one, and wherein the at least one dynamic parameter comprises: at least one of a memory power consumption parameter, a memory bandwidth utilization parameter, a memory temperature parameter, a memory space utilization parameter, and a CPU utilization parameter;
the static parameters include: at least one of a memory model, a memory frequency, a memory capacity, and a memory type; the server log comprises: baseboard management controller log information and/or system log information.
10. A server failure early warning apparatus, the apparatus comprising:
the acquisition module is used for acquiring parameter values of dynamic parameters of a server to be subjected to early warning analysis and acquiring index contents of specified auxiliary indexes; wherein the dynamic parameters include performance parameters of the server; the specified auxiliary index comprises a static parameter of the server and/or a server log; the static parameters comprise hardware parameters of the server;
the calculation module is used for calculating a first type fault value of the server by using the parameter value of the dynamic parameter and calculating a second type fault value of the server by using the index content of the specified auxiliary index; any fault value in the first type of fault value and the second type of fault value is used for representing the probability of the server about to fail;
the fusion module is used for fusing the first type fault value and the second type fault value to obtain a target fault value;
the output module is used for outputting fault early warning information aiming at the server if the target fault value meets a preset fault condition; wherein the predetermined failure condition is a condition for characterizing an impending failure of the server.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of any one of claims 1 to 9 when executing a program stored in a memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method of any one of claims 1 to 9.
CN202211234086.5A 2022-10-10 2022-10-10 Server fault early warning method and device, electronic equipment and storage medium Pending CN115904771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211234086.5A CN115904771A (en) 2022-10-10 2022-10-10 Server fault early warning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211234086.5A CN115904771A (en) 2022-10-10 2022-10-10 Server fault early warning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115904771A true CN115904771A (en) 2023-04-04

Family

ID=86477204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211234086.5A Pending CN115904771A (en) 2022-10-10 2022-10-10 Server fault early warning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115904771A (en)

Similar Documents

Publication Publication Date Title
CN110661659B (en) Alarm method, device and system and electronic equipment
CN112162878B (en) Database fault discovery method and device, electronic equipment and storage medium
US9672085B2 (en) Adaptive fault diagnosis
CN112631913B (en) Method, device, equipment and storage medium for monitoring operation faults of application program
US20070271219A1 (en) Performance degradation root cause prediction in a distributed computing system
US9524223B2 (en) Performance metrics of a computer system
KR20180108446A (en) System and method for management of ict infra
CN112380089A (en) Data center monitoring and early warning method and system
CN113505044B (en) Database warning method, device, equipment and storage medium
CN112699007A (en) Method, system, network device and storage medium for monitoring machine performance
CN115529595A (en) Method, device, equipment and medium for detecting abnormity of log data
CN117041029A (en) Network equipment fault processing method and device, electronic equipment and storage medium
CN114443441B (en) Storage system management method, device and equipment and readable storage medium
CN113992602B (en) Cable monitoring data uploading method, device, equipment and storage medium
CN115327299A (en) Method for identifying cascading failure of power system and related equipment
CN116594840A (en) Log fault acquisition and analysis method, system, equipment and medium based on ELK
CN114138617B (en) Self-learning frequency conversion monitoring method and system, electronic equipment and storage medium
CN116414608A (en) Abnormality detection method, abnormality detection device, abnormality detection apparatus, and storage medium
CN115904771A (en) Server fault early warning method and device, electronic equipment and storage medium
CN114443437A (en) Alarm root cause output method, apparatus, device, medium, and program product
US20230336409A1 (en) Combination rules creation device, method and program
CN114706893A (en) Fault detection method, device, equipment and storage medium
CN114743703A (en) Reliability analysis method, device, equipment and storage medium for nuclear power station unit
AU2014200806B1 (en) Adaptive fault diagnosis
CN113760856A (en) Database management method and device, computer readable storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination