CN112860527A - Fault monitoring method and device of application server - Google Patents


Publication number
CN112860527A
Authority
CN
China
Prior art keywords
application server
data
log
fault
log file
Prior art date
Legal status
Pending
Application number
CN202110352583.4A
Other languages
Chinese (zh)
Inventor
聂艳平
杨晓
王飞
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Abstract

The invention provides a fault monitoring method and device of an application server, and relates to the technical field of artificial intelligence. The fault monitoring method comprises the following steps: acquiring a log file and an operating system log corresponding to calling information on an application server; extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log; and determining a fault monitoring result of the application server based on the transaction data, the time consumption data, the hardware operation data and a preset decision tree model, wherein the decision tree model is used for determining whether the application server is currently faulty according to its transaction data, time consumption data and hardware operation data. The invention can effectively improve the accuracy of fault monitoring of the application server and the operation and maintenance efficiency.

Description

Fault monitoring method and device of application server
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a fault monitoring method and device of an application server.
Background
With the continuous development of internet technology, the number of application servers keeps increasing, while individual faults of an application server are often short-lived. Operation and maintenance personnel find it difficult to detect such faults as soon as they occur, so their ability to monitor the working condition of application servers gradually weakens. Operation and maintenance personnel therefore urgently need a way to monitor the fault status of application servers during operation.
Existing approaches to monitoring whether an application server has failed mainly monitor application transactions to compute the transaction failure rate, and at the same time monitor the running state of the server (CPU usage, memory usage and the like). When the transaction failure rate, or the CPU or memory usage, exceeds a certain threshold, an early warning is sent by short message and mail. Operation and maintenance personnel then query and analyze the related log content according to the alarm and determine the cause of the fault. However, threshold-based early warning on the transaction failure rate, CPU and memory is often strongly affected by human factors. If the threshold is set too low, alarm storms consisting largely of false alarms easily occur and waste operation and maintenance manpower; if it is set too high, the risk of missed alarms increases. Moreover, searching large volumes of logs is repetitive work for operation and maintenance personnel and reduces work efficiency to some extent.
Therefore, providing a fault monitoring method for an application server that improves the accuracy of fault monitoring and the operation and maintenance efficiency is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a fault monitoring method and device of an application server, which can improve the accuracy and operation and maintenance efficiency of fault monitoring of the application server.
To achieve this purpose, the invention provides the following technical solutions:
In a first aspect, the present invention provides a method for monitoring a failure of an application server, including:
acquiring a log file and an operating system log corresponding to calling information on an application server;
extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log;
determining a fault monitoring result of an application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server.
Further, after obtaining the log file corresponding to the call information on the application server, the method further includes:
cleaning the data in the log file to obtain a cleaned log file;
correspondingly, the step of extracting the transaction data and the time consumption data on the application server according to the log file comprises the following steps:
and extracting transaction data and time-consuming data on the application server according to the cleaned log file.
The cleaning of the data in the log file to obtain the cleaned log file includes:
extracting the log records containing target data from the log file, and determining those log records as the cleaned log file; wherein the target data comprises: log recording time, log identification, call time consumption, upstream node IP address, method name, method input parameters, method output parameters, method return code, return code type and full-process flow number.
Further, after determining a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operating data and a preset decision tree model, the method further includes:
verifying the fault monitoring result to obtain a fault verification result;
and training the decision tree model according to the fault verification result to obtain an optimized decision tree model.
Further, after determining a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operating data and a preset decision tree model, the method further includes:
and if the fault monitoring result is determined to be the fault of the application server, sending a fault early warning request to the server side so that the server side can acquire the method parameter data from the log file of the application server according to the fault early warning request.
The acquiring the log file and the operating system log corresponding to the call information on the application server includes:
and deploying the log collection script to an application server, and acquiring a log file and an operating system log through the log collection script.
In a second aspect, the present invention provides a fault monitoring apparatus for an application server, including:
the acquisition module is used for acquiring a log file and an operating system log corresponding to the calling information on the application server;
the extraction module is used for extracting transaction data and time consumption data on the application server according to the log file and extracting hardware operation data of the application server according to the operating system log;
the monitoring module is used for determining a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server.
Further, the apparatus also includes:
a filtering module, configured to clean the data in the log file to obtain a cleaned log file;
correspondingly, the extraction module comprises:
an extraction submodule, configured to extract the transaction data and the time-consuming data on the application server according to the cleaned log file.
Wherein the filtering module comprises:
a filtering submodule, configured to extract the log records containing the target data from the log file and determine those log records as the cleaned log file; wherein the target data comprises: log recording time, log identification, call time consumption, upstream node IP address, method name, method input parameters, method output parameters, method return code, return code type and full-process flow number.
Further, the apparatus also includes:
a checking module, configured to verify the fault monitoring result to obtain a fault verification result;
and an optimization module, configured to train the decision tree model according to the fault verification result to obtain an optimized decision tree model.
Further, the apparatus also includes:
a fault early warning module, configured to, if the fault monitoring result is determined to be a fault of the application server, send a fault early warning request to the server side so that the server side acquires the method parameter data from the log file of the application server according to the fault early warning request.
Wherein the acquisition module comprises:
an acquisition submodule, configured to deploy the log collection script to the application server and acquire the log file and the operating system log through the log collection script.
In a third aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for monitoring the failure of the application server when executing the program.
In a fourth aspect, the present invention provides a computer readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for fault monitoring of an application server as described.
According to the technical scheme, the invention provides the fault monitoring method and the fault monitoring device for the application server, wherein the log file and the operating system log corresponding to the calling information on the application server are obtained; extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log; determining a fault monitoring result of an application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server. The accuracy and the operation and maintenance efficiency of the fault monitoring of the application server can be effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first flowchart of a fault monitoring method for an application server according to an embodiment of the present invention.
Fig. 2 is a second flowchart of a fault monitoring method for an application server according to an embodiment of the present invention.
Fig. 3 is a third flowchart of a fault monitoring method for an application server in an embodiment of the present invention.
Fig. 4 is a fourth flowchart illustrating a method for monitoring a failure of an application server according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a fault monitoring apparatus of an application server in an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an embodiment of a fault monitoring method of an application server, and referring to fig. 1, the fault monitoring method of the application server specifically comprises the following contents:
S101: Acquiring a log file and an operating system log corresponding to calling information on an application server;
In this step, when the application calls a service, calls a database or interacts with an external system, the specific call information of the application server (log recording time, log identification, call time consumption, upstream node IP address, method name, method input parameters, method output parameters, method return code, return code type and full-process flow number) is recorded in a file to generate the corresponding log file.
It should be noted that the operating system log records information about hardware, software and system problems in the system, and can also be used to monitor events occurring in the system.
The embodiment provides a specific implementation manner for acquiring a log file and an operating system log corresponding to call information on an application server: and deploying the log collection script to an application server, and acquiring a log file and an operating system log through the log collection script.
The log file and the operating system log are acquired through a custom-written log collection script and a system log collection method based on Fluentd (td-agent), and are used for the subsequent data processing.
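As a minimal sketch, the call information recorded in step S101 could be held in a small record type. The field names, the pipe-delimited layout and the sample line below are assumptions made for illustration; the source does not specify the on-disk log format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallRecord:
    """One log record of call information (field names are hypothetical)."""
    log_time: datetime   # log recording time
    log_id: str          # log identification, e.g. "E"
    elapsed_ms: int      # call time consumption
    upstream_ip: str     # upstream node IP address
    method: str          # method name
    in_params: str       # method input parameters
    out_params: str      # method output parameters
    return_code: str     # method return code
    return_type: str     # return code type
    flow_no: str         # full-process flow number


def parse_line(line: str) -> CallRecord:
    """Parse one pipe-delimited line (an assumed layout, not the patent's)."""
    f = line.rstrip("\n").split("|")
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    return CallRecord(datetime.fromisoformat(f[0]), f[1], int(f[2]), *f[3:])


# Invented sample line for demonstration only.
sample = "2021-03-31T10:00:01|E|35|10.0.0.7|queryBalance|acct=1|bal=9|0000|S|F20210331000001"
record = parse_line(sample)
```

Keeping the ten fields in a typed record makes the later cleaning and feature extraction steps independent of the raw text layout.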
S102: extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log;
In this step, the total transaction amount, the transaction failure rate, the maximum time consumption, the average time consumption and the median time consumption of the application server are extracted according to the call information in the log file. Meanwhile, the operating system log is parsed, and the hardware operation data of the application server is extracted. The transaction data comprises: total transaction amount, transaction failure amount and transaction failure rate. The time-consuming data comprises: maximum time consumption, average time consumption and median time consumption. The hardware operation data comprises: disk space usage, CPU usage and memory usage.
It is understood that the call information contains: log recording time, log identification, call time consumption, upstream node IP address, method name, method input parameters, method output parameters, method return code, return code type and full-process flow number.
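The transaction data and time-consuming data of step S102 can be computed from the call records roughly as follows. This is a sketch only: the input is assumed to be already reduced to pairs of call time consumption and a success flag, and how success is derived from the return code is not specified in the source.

```python
from statistics import mean, median


def extract_metrics(records):
    """records: list of (elapsed_ms, succeeded) tuples for one time window."""
    total = len(records)
    failed = sum(1 for _, ok in records if not ok)
    times = [t for t, _ in records]
    return {
        "total": total,                                   # total transaction amount
        "failed": failed,                                 # transaction failure amount
        "failure_rate": failed / total if total else 0.0, # transaction failure rate
        "max_ms": max(times) if times else 0,             # maximum time consumption
        "avg_ms": mean(times) if times else 0.0,          # average time consumption
        "median_ms": median(times) if times else 0.0,     # median time consumption
    }


# Invented sample data: four calls, one of which failed.
metrics = extract_metrics([(20, True), (40, True), (300, False), (40, True)])
```

The empty-window guards avoid division by zero when no transaction was logged in the window.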
S103: determining a fault monitoring result of an application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server.
In this step, the application server is subjected to fault diagnosis through a preset decision tree model. And periodically acquiring the transaction data, the time-consuming data and the hardware operating data of the application server, and judging whether the current state of the application server is in failure or not according to the acquired data.
In this embodiment, log information and hardware operation information are collected and fed to a decision tree model trained in advance with machine learning, and whether the application server has failed is determined from the prediction of the model. The implementation is simple, an early warning can be issued in time before a fault occurs, and the probability of production accidents is reduced to a certain extent.
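The diagnosis in step S103 amounts to walking a trained tree with the current feature values. The sketch below uses a hand-written toy tree; the node layout, the feature names and the thresholds are all invented for illustration and are not values from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Internal node tests feature < threshold; a leaf carries a verdict."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch taken when value < threshold
    right: Optional["Node"] = None   # branch taken otherwise
    fault: Optional[bool] = None     # set only on leaf nodes


def diagnose(node: Node, sample: dict) -> bool:
    """Walk from the root to a leaf and return the leaf's fault verdict."""
    while node.fault is None:
        node = node.left if sample[node.feature] < node.threshold else node.right
    return node.fault


# Toy tree: thresholds are made up, not learned from data.
tree = Node("failure_rate", 0.2,
            left=Node("cpu_usage", 0.95,
                      left=Node(fault=False), right=Node(fault=True)),
            right=Node(fault=True))

healthy = diagnose(tree, {"failure_rate": 0.01, "cpu_usage": 0.40})
faulty = diagnose(tree, {"failure_rate": 0.35, "cpu_usage": 0.40})
```

In a periodic monitoring loop, `diagnose` would be called on each freshly collected feature set.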
As can be seen from the above description, in the fault monitoring method for the application server provided in the embodiment of the present invention, the log file and the operating system log corresponding to the call information on the application server are obtained; extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log; determining a fault monitoring result of an application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server. The accuracy and the operation and maintenance efficiency of the fault monitoring of the application server can be effectively improved.
In an embodiment of the present invention, referring to fig. 2, after step S101 of the method for monitoring a failure of an application server, the method specifically includes the following steps:
S104: Cleaning the data in the log file to obtain a cleaned log file;
Correspondingly, step S102 of extracting transaction data and time-consuming data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log, includes:
S1021: Extracting transaction data and time-consuming data on the application server according to the cleaned log file, and extracting hardware operation data of the application server according to the operating system log.
It should be noted that, an application server records two pieces of log information for a call request of an application, but critical data is only recorded in the log with the log identifier E.
In this embodiment, data cleaning is performed on the collected log file to keep the records whose log identifier is E and filter out the other record, which reduces the data volume of the log file and speeds up the subsequent steps.
In a specific implementation, the log records containing the target data are extracted from the log file and determined as the cleaned log file; the log records containing the target data are those whose log identifier is E. The target data includes: log recording time, log identification, call time consumption, upstream node IP address, method name, method input parameters, method output parameters, method return code, return code type and full-process flow number.
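The cleaning step can be sketched as a simple filter on the log identifier. The pipe-delimited layout, the position of the identifier in the second field and the identifier "B" for the discarded record are assumptions; only the identifier E comes from the source.

```python
def clean_log(lines, keep_id="E"):
    """Keep only records whose log identifier equals keep_id.

    Assumes pipe-delimited lines with the identifier in the second field;
    the real layout is not given in the source.
    """
    cleaned = []
    for line in lines:
        fields = line.split("|")
        if len(fields) > 1 and fields[1] == keep_id:
            cleaned.append(line)
    return cleaned


# Invented sample: one request produces two records, plus one malformed line.
raw = [
    "2021-03-31T10:00:01|B|queryBalance",
    "2021-03-31T10:00:01|E|35|10.0.0.7|queryBalance|acct=1|bal=9|0000|S|F1",
    "malformed line",
]
cleaned = clean_log(raw)
```

Lines that cannot be split into at least two fields are dropped rather than raising, so one corrupt record does not stop the pipeline.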
In an embodiment of the present invention, referring to fig. 3, after step S103 of the method for monitoring a failure of an application server, the method specifically includes the following steps:
S105: Verifying the fault monitoring result to obtain a fault verification result;
S106: Training the decision tree model according to the fault verification result to obtain an optimized decision tree model.
In this embodiment, the fault monitoring result is verified, so that whether the fault monitoring result is accurate or not can be determined. And taking the fault monitoring result with accurate fault verification result as the training data of the decision tree model, so as to increase the training data of the decision tree model and change the depth of the decision tree. And optimizing the decision tree model through the added training data to obtain a decision tree model with higher precision, so as to improve the prediction precision of the decision tree model.
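One way to picture this feedback loop is shown below: only monitoring results that verification confirmed as accurate are appended to the training data. The data shapes and the function name are hypothetical, and the retraining call itself is left out.

```python
def add_verified_samples(training_set, predictions, verified_truth):
    """Append predictions confirmed by verification to the training set.

    predictions:    list of (features, predicted_fault)
    verified_truth: list of bool, the checked real state for each prediction
    Returns the number of samples added.
    """
    added = 0
    for (features, predicted), actual in zip(predictions, verified_truth):
        if predicted == actual:  # verification confirmed the monitoring result
            training_set.append((features, predicted))
            added += 1
    return added


# Invented data: the second prediction turned out to be wrong and is skipped.
training = [([0.01, 0.40], False)]
preds = [([0.35, 0.50], True), ([0.02, 0.97], False)]
added = add_verified_samples(training, preds, [True, True])
```

After enough confirmed samples accumulate, the model would be retrained on the enlarged set.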
In an embodiment of the present invention, referring to fig. 4, after step S103 of the method for monitoring a failure of an application server, the method specifically includes the following steps:
S107: If the fault monitoring result is determined to be a fault of the application server, sending a fault early warning request to the server side so that the server side can acquire the method parameter data from the log file of the application server according to the fault early warning request.
In this embodiment, when an application server fault is predicted, a fault early warning request is sent to the server side (back end). After receiving the request, the server side acquires the method parameter data from the log file of the application server according to the request and displays the specific error information of the application server.
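A possible shape for this exchange is sketched below. The request fields, the time-window selection and both function names are assumptions introduced for illustration; the source only states that the server side retrieves method parameter data from the log file on receiving the request.

```python
import json


def build_warning(server_id, start, end):
    """Serialize a fault early-warning request (field names are hypothetical)."""
    return json.dumps({"server": server_id, "from": start, "to": end})


def collect_method_params(request_json, records):
    """Server side: return method parameters logged inside the warned window.

    records: list of dicts with keys time, method, in_params, out_params,
    return_code (an assumed, already parsed form of the log file).
    """
    req = json.loads(request_json)
    return [
        {k: r[k] for k in ("method", "in_params", "out_params", "return_code")}
        for r in records
        if req["from"] <= r["time"] <= req["to"]
    ]


# Invented records; times are plain integers for brevity.
records = [
    {"time": 100, "method": "pay", "in_params": "amt=5",
     "out_params": "", "return_code": "9001"},
    {"time": 500, "method": "query", "in_params": "id=1",
     "out_params": "ok", "return_code": "0000"},
]
details = collect_method_params(build_warning("app-01", 90, 120), records)
```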
In an embodiment of the present invention, an embodiment of a method for training a preset decision tree model in a fault monitoring method for an application server specifically includes the following contents:
In the training method of this embodiment, destructive testing is performed on the application server (the database and the network are disrupted so that the application service becomes unavailable), and the label data of the application server is extracted (a label marks a piece of data as a positive or a negative case; the extraction steps are detailed in the label extraction part below). Meanwhile, the collected log files are cleaned, and feature data is obtained by comparing the log recording times with the destructive testing times and computing statistics. The sample data formed by combining the feature data and the label data is then trained with the extreme gradient boosting (XGBoost) machine learning algorithm to obtain a decision tree model for judging whether the application server is faulty. The specific operations are as follows:
(1) Data cleaning. The application server records two log records for each call request of the application, but the critical data is recorded only in the record whose log identifier is E. The other record is therefore filtered out of the collected log file data to reduce the data volume and speed up the subsequent steps. The critical data is the call information, which comprises: log recording time, log identification, call time consumption, upstream node IP address, method name, method input parameters, method output parameters, method return code, return code type and full-process flow number.
(2) Feature extraction. According to the call information in the log file, the total transaction amount, transaction failure rate, maximum time consumption, average time consumption and median time consumption of the application server over the most recent 10/30/60/180/300 seconds before a given moment are extracted. At the same time, the operating system log is parsed, and the disk space usage, CPU usage and memory usage of the application server at the current moment are extracted. The specific features are detailed in Table 1 below.
Table 1: Feature details
Dimension            | Features
Last 10/30 seconds   | Maximum elapsed time / average elapsed time / median elapsed time
Last 1/3/5 minutes   | Maximum elapsed time / average elapsed time / median elapsed time
Last 10/30 seconds   | Total transaction amount / successful transaction amount / failed transaction amount
Last 1/3/5 minutes   | Total transaction amount / successful transaction amount / failed transaction amount
Last 10/30 seconds   | Total technical transaction amount / successful technical transaction amount / failed technical transaction amount
Last 1/3/5 minutes   | Total technical transaction amount / successful technical transaction amount / failed technical transaction amount
Last 10/30 seconds   | Total business transaction amount / successful business transaction amount / failed business transaction amount
Last 1/3/5 minutes   | Total business transaction amount / successful business transaction amount / failed business transaction amount
Last 1/3/5 minutes   | Disk usage
Last 1/3/5 minutes   | CPU usage
Last 1/3/5 minutes   | Memory usage
(3) Label extraction. By continuously running destructive and non-destructive tests on the application server, an indication of whether the application server is faulty at a given time is extracted. While a destructive test is running, the server is labeled as faulty; while a non-destructive test is running, the server is labeled as not faulty.
(4) Sample generation. The feature data obtained in step 2 and the label data obtained in step 3 are joined by time to generate the corresponding sample data.
The format is as follows: label: feature 1, feature 2, feature 3, feature 4, feature 5.
(5) Model training. The sample data obtained in step 4 is trained with the extreme gradient boosting (XGBoost) machine learning algorithm to obtain the corresponding tree model.
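Steps (2) to (4) above can be sketched together as follows. This is a simplified illustration under stated assumptions: events are already parsed into (timestamp, elapsed time, success flag) tuples, only a subset of the Table 1 features is computed, the half-open window boundaries are a design choice of this sketch, and all data values are invented.

```python
from statistics import mean, median

WINDOWS = (10, 30, 60, 180, 300)  # seconds, as listed in step (2)


def window_features(events, now):
    """events: list of (timestamp_s, elapsed_ms, succeeded) tuples."""
    row = []
    for w in WINDOWS:
        recent = [(ms, ok) for ts, ms, ok in events if now - w < ts <= now]
        times = [ms for ms, _ in recent]
        failed = sum(1 for _, ok in recent if not ok)
        row += [len(recent), failed, max(times, default=0),
                mean(times) if times else 0.0,
                median(times) if times else 0.0]
    return row


def label_at(ts, destructive_intervals):
    """1 while a destructive test is running (fault), otherwise 0."""
    return int(any(start <= ts < end for start, end in destructive_intervals))


def make_samples(events, sample_times, destructive_intervals):
    """Join features and labels by time into 'label: f1, f2, ...' lines."""
    lines = []
    for ts in sample_times:
        feats = ", ".join(str(v) for v in window_features(events, ts))
        lines.append(f"{label_at(ts, destructive_intervals)}: {feats}")
    return lines


# Invented data: a destructive test runs from t=1000 to t=1100.
events = [(950, 20, True), (995, 30, True), (1042, 900, False), (1045, 700, False)]
samples = make_samples(events, sample_times=[999, 1050],
                       destructive_intervals=[(1000, 1100)])
```

Each resulting line carries one label followed by five features per window, which matches the "label: feature 1, feature 2, ..." layout described in step (4).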
It should be noted that extreme gradient boosting (XGBoost) builds K regression trees so that the predicted value of the tree ensemble is as close as possible to the true value while generalizing as well as possible. A regression tree is a decision tree model used for regression. Each non-leaf node of the tree is still split into subtrees according to some feature, but the feature values are continuous; a regression tree corresponds to a partition of the input space (i.e., the feature space) together with an output value on each partition cell. A decision tree model is a tree structure used for classification and regression. It consists of nodes and directed edges and generally includes one root node, several internal nodes and several leaf nodes. The decision process starts from the root node, compares the data to be tested with the feature node at each level, and selects the next branch according to the comparison result until a leaf node is reached, which gives the final decision result.
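The idea of K regression trees whose summed output approaches the true value can be illustrated with a small standard-library sketch. Note that this is not XGBoost: it is plain gradient boosting with squared-error loss and depth-1 regression trees (stumps), without XGBoost's regularized objective or second-order terms. The feature names, the data and the parameters `k` and `lr` are invented for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single-feature split minimising squared error on residuals."""
    best = None
    for j in range(len(xs[0])):
        for thr in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] < thr]
            right = [r for x, r in zip(xs, residuals) if x[j] >= thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    if best is None:
        raise ValueError("no valid split: all feature values are identical")
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, x):
    j, thr, lv, rv = stump
    return lv if x[j] < thr else rv


def boost(xs, ys, k=10, lr=0.5):
    """Fit k stumps in sequence, each on the residual of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(k):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps


def predict(model, x, lr=0.5):
    """Sum the base value and the scaled outputs of all stumps."""
    base, stumps = model
    return base + lr * sum(stump_predict(s, x) for s in stumps)


# Toy data: [failure_rate, cpu_usage] -> 1 (fault) or 0 (normal).
xs = [[0.01, 0.3], [0.02, 0.5], [0.40, 0.6], [0.55, 0.9]]
ys = [0, 0, 1, 1]
model = boost(xs, ys)
```

A score above 0.5 from `predict` would be read as a fault in this toy setup; the same `lr` must be passed to `boost` and `predict`.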
As can be seen from the above description, in this embodiment, a decision tree model with high accuracy for predicting whether an application server fails is obtained through data acquisition, feature generation, model training and model tuning. The decision tree model is used to predict online, in real time, whether the application server is failing, and an early warning is given when a fault is predicted. Compared with the prior art, the process is proactive: the relevant application support personnel and developers can be notified before the application server fails and can prepare corresponding solutions according to the related log information, so that the occurrence of production accidents is reduced to a certain extent and the operation and maintenance efficiency is improved.
An embodiment of the present invention provides a specific implementation of a fault monitoring device of an application server, which is capable of implementing all of the content of the fault monitoring method of the application server. Referring to Fig. 5, the fault monitoring device of the application server specifically includes:
an acquisition module 10, configured to acquire a log file and an operating system log corresponding to call information on the application server;
an extraction module 20, configured to extract transaction data and time-consuming data on the application server according to the log file, and to extract hardware operation data of the application server according to the operating system log;
a monitoring module 30, configured to determine a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used to determine whether the application server is currently in a fault state according to the transaction data, time-consuming data and hardware operation data of the application server.
Further, the device also includes:
a filtering module, configured to clean the data in the log file to obtain a cleaned log file;
correspondingly, the extraction module comprises:
an extraction submodule, configured to extract the transaction data and the time-consuming data on the application server according to the cleaned log file.
The filtering module comprises:
a filtering submodule, configured to extract the log records in the log file that contain target data and to take those records as the cleaned log file; wherein the target data comprises: log recording time, log identifier, call time consumption, upstream node IP address, method name, method entry and input parameters, method exit and output parameters, method return code, return code type and full-process serial number.
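The cleaning step can be illustrated with a short sketch. It assumes that each log record has already been parsed into a dictionary and that a record is kept only when every target field is present and non-empty; the source does not say whether all fields or only some are required, and the key names below are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical key names for the target data listed above.
TARGET_FIELDS = (
    "log_time", "log_id", "call_cost_ms", "upstream_ip", "method_name",
    "method_input", "method_output", "return_code", "return_code_type",
    "serial_no",
)


def clean_log(records: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the records in which every target field is present and non-empty."""
    return [
        record for record in records
        if all(record.get(field) not in (None, "") for field in TARGET_FIELDS)
    ]


# A complete record and one that lacks its return code (values are placeholders).
complete = {field: "x" for field in TARGET_FIELDS}
incomplete = {k: v for k, v in complete.items() if k != "return_code"}

cleaned = clean_log([complete, incomplete])  # only the complete record survives
```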
Further, the device also includes:
a verification module, configured to verify the fault monitoring result to obtain a fault verification result;
an optimization module, configured to train the decision tree model according to the fault verification result to obtain an optimized decision tree model.
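The verify-then-retrain loop can be sketched as follows. To stay self-contained, the example swaps the decision tree model for a one-feature decision stump (a depth-one tree) over a hypothetical "average call time" feature; the training data, the `fit_stump` and `verify` helpers and the retraining rule are all assumptions for illustration, not the patent's procedure.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (average call time in ms, verified fault label)


def fit_stump(samples: List[Sample]) -> float:
    """Pick the threshold with the fewest errors for the rule value >= threshold -> fault."""
    candidates = sorted({value for value, _ in samples})
    best_threshold, best_errors = candidates[0], len(samples) + 1
    for threshold in candidates:
        errors = sum((value >= threshold) != label for value, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def verify(prediction: bool, actual: bool) -> bool:
    """A monitoring result passes verification when it matches what really happened."""
    return prediction == actual


training_set: List[Sample] = [(100.0, False), (200.0, False), (800.0, True), (900.0, True)]
threshold = fit_stump(training_set)

# A server averaging 600 ms was predicted healthy but was later confirmed faulty.
observed_value, actual_fault = 600.0, True
prediction = observed_value >= threshold
if not verify(prediction, actual_fault):
    # Feed the verified sample back and retrain to obtain the optimized model.
    training_set.append((observed_value, actual_fault))
    new_threshold = fit_stump(training_set)
else:
    new_threshold = threshold
```

After the misclassified sample is fed back, the stump's threshold moves so that the same observation would now be flagged as a fault.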
Further, the device also includes:
a fault early warning module, configured to, if the fault monitoring result indicates that the application server is faulty, send a fault early warning request to the server side, so that the server side acquires method parameter data from the log file of the application server according to the fault early warning request.
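A minimal in-process sketch of this exchange is given below. The request format (a JSON object with `type`, `server_ip` and `serial_no`), the function names and the record keys are all hypothetical; the source does not specify the transport or the payload, so no network call is made here.

```python
import json
from typing import Dict, List, Optional


def build_warning_request(server_ip: str, is_fault: bool, serial_no: str) -> Optional[str]:
    """Build a fault early warning request only when a fault was predicted."""
    if not is_fault:
        return None
    return json.dumps(
        {"type": "fault_warning", "server_ip": server_ip, "serial_no": serial_no}
    )


def handle_warning_request(request_json: str, log_records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Server side: look up the method parameter data for the reported serial number."""
    request = json.loads(request_json)
    return [
        {
            "method_name": record["method_name"],
            "method_input": record["method_input"],
            "method_output": record["method_output"],
        }
        for record in log_records
        if record["serial_no"] == request["serial_no"]
    ]


# Placeholder log records; every value is made up for the example.
log_records = [
    {"serial_no": "S001", "method_name": "transfer", "method_input": "amount=10", "method_output": "code=E1"},
    {"serial_no": "S002", "method_name": "query", "method_input": "id=7", "method_output": "code=0"},
]

request = build_warning_request("10.0.0.8", True, "S001")
params = handle_warning_request(request, log_records) if request else []
```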
The acquisition module comprises:
an acquisition submodule, configured to deploy a log collection script to the application server and to acquire the log file and the operating system log through the log collection script.
The embodiment of the fault monitoring device of the application server provided by the present invention may be used to execute the processing flow of the foregoing embodiment of the fault monitoring method of the application server. Its functions are not described again here; reference may be made to the detailed description of the method embodiment.
As can be seen from the above description, the fault monitoring device of an application server provided in the embodiments of the present invention acquires a log file and an operating system log corresponding to call information on the application server; extracts transaction data and time-consuming data on the application server according to the log file, and extracts hardware operation data of the application server according to the operating system log; and determines a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model, where the decision tree model is used to determine whether the application server is currently in a fault state according to the transaction data, time-consuming data and hardware operation data of the application server. This can effectively improve the accuracy of fault monitoring of the application server and the efficiency of operation and maintenance.
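The extract-then-monitor flow summarized above might look like the following sketch. The feature names, the record keys, the OS log format and the hand-written `rule_model` (standing in for the preset, trained decision tree model) are assumptions for illustration only.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def extract_features(log_records: List[dict], os_log: Dict[str, float]) -> Features:
    """Derive transaction, time-consuming and hardware features from the two logs."""
    total = len(log_records)
    failed = sum(1 for record in log_records if record["return_code"] != "0")
    total_cost = sum(record["call_cost_ms"] for record in log_records)
    return {
        "tx_count": float(total),
        "tx_fail_rate": failed / total if total else 0.0,
        "avg_cost_ms": total_cost / total if total else 0.0,
        "cpu_usage": os_log["cpu_usage"],
        "mem_usage": os_log["mem_usage"],
    }


def rule_model(features: Features) -> bool:
    """Hand-written stand-in for the preset decision tree model (True means fault)."""
    if features["avg_cost_ms"] >= 500.0:
        return features["cpu_usage"] >= 0.9 or features["tx_fail_rate"] >= 0.2
    return features["tx_fail_rate"] >= 0.5


def monitor(log_records: List[dict], os_log: Dict[str, float],
            model: Callable[[Features], bool]) -> bool:
    """Return the fault monitoring result for one observation window."""
    return model(extract_features(log_records, os_log))


# Placeholder inputs; every value is made up for the example.
records = [
    {"return_code": "0", "call_cost_ms": 700},
    {"return_code": "E1", "call_cost_ms": 900},
]
os_log = {"cpu_usage": 0.95, "mem_usage": 0.60}

features = extract_features(records, os_log)
result = monitor(records, os_log, rule_model)
```

Passing the model in as a callable keeps the extraction code unchanged if the stand-in is later replaced by a trained model.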
The application provides an embodiment of an electronic device for implementing all or part of the content of the fault monitoring method of the application server. The electronic device specifically includes:
a processor, a memory, a communications interface and a bus; the processor, the memory and the communications interface communicate with one another through the bus; the communications interface is used for information transmission between related devices; the electronic device may be a desktop computer, a tablet computer, a mobile terminal or the like, but this embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiment of the fault monitoring method of the application server and the embodiment of the fault monitoring device of the application server, the contents of which are incorporated herein; repeated details are omitted.
Fig. 6 is a schematic block diagram of the system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in Fig. 6, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, Fig. 6 is exemplary; other types of structures may also be used, in addition to or in place of this structure, to implement telecommunications or other functions.
In one embodiment, the fault monitoring functionality of the application server may be integrated into the central processor 9100.
The central processor 9100 may be configured to control as follows:
acquiring a log file and an operating system log corresponding to calling information on an application server; extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log; determining a fault monitoring result of an application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server.
As can be seen from the above description, the electronic device provided in the embodiment of the present application acquires a log file and an operating system log corresponding to call information on the application server; extracts transaction data and time-consuming data on the application server according to the log file, and extracts hardware operation data of the application server according to the operating system log; and determines a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model, where the decision tree model is used to determine whether the application server is currently in a fault state according to the transaction data, time-consuming data and hardware operation data of the application server. This can effectively improve the accuracy of fault monitoring of the application server and the efficiency of operation and maintenance.
In another embodiment, the fault monitoring device of the application server may be configured separately from the central processor 9100; for example, the fault monitoring device of the application server may be configured as a chip connected to the central processor 9100, with the fault monitoring function of the application server implemented under the control of the central processor.
As shown in Fig. 6, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160 and a power supply 9170. It is noted that the electronic device 9600 does not necessarily include all of the components shown in Fig. 6; in addition, the electronic device 9600 may include components not shown in Fig. 6, for which reference may be made to the prior art.
As shown in fig. 6, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store the fault-related information described above as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to store or process information.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid-state memory, e.g., read-only memory (ROM), random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, can be selectively erased and can be supplied with further data, an example of which is sometimes called an EPROM. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which stores application programs and function programs, or the flow by which the central processor 9100 executes the operations of the electronic device 9600.
The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
An embodiment of the present invention further provides a computer-readable storage medium capable of implementing all the steps of the fault monitoring method of the application server in the above embodiment. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all the steps of the fault monitoring method of the application server in the above embodiment; for example, the processor implements the following steps when executing the computer program:
acquiring a log file and an operating system log corresponding to calling information on an application server; extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log; determining a fault monitoring result of an application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server.
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present invention enables the following: acquiring a log file and an operating system log corresponding to call information on the application server; extracting transaction data and time-consuming data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log; and determining a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model, where the decision tree model is used to determine whether the application server is currently in a fault state according to the transaction data, time-consuming data and hardware operation data of the application server. This can effectively improve the accuracy of fault monitoring of the application server and the efficiency of operation and maintenance.
Although the present invention provides method steps as described in the examples or flowcharts, more or fewer steps may be included based on routine or non-inventive labor. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, apparatus (system) or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention is not limited to any single aspect, nor is it limited to any single embodiment, nor is it limited to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the present invention may be utilized alone or in combination with one or more other aspects and/or embodiments thereof.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (14)

1. A fault monitoring method for an application server is characterized by comprising the following steps:
acquiring a log file and an operating system log corresponding to calling information on an application server;
extracting transaction data and time consumption data on the application server according to the log file, and extracting hardware operation data of the application server according to the operating system log;
determining a fault monitoring result of an application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used for determining whether the current state of the application server is in fault or not according to transaction data, time-consuming data and hardware operation data of the application server.
2. The fault monitoring method of an application server according to claim 1, wherein after acquiring the log file corresponding to the call information on the application server, the method further comprises:
cleaning the data in the log file to obtain a cleaned log file;
correspondingly, the step of extracting the transaction data and the time-consuming data on the application server according to the log file comprises:
extracting the transaction data and the time-consuming data on the application server according to the cleaned log file.
3. The fault monitoring method of an application server according to claim 2, wherein the step of cleaning the data in the log file to obtain a cleaned log file comprises:
extracting the log records in the log file that contain target data, and taking those records as the cleaned log file; wherein the target data comprises: log recording time, log identifier, call time consumption, upstream node IP address, method name, method entry and input parameters, method exit and output parameters, method return code, return code type and full-process serial number.
4. The fault monitoring method of an application server according to claim 1, wherein after determining the fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model, the method further comprises:
verifying the fault monitoring result to obtain a fault verification result;
training the decision tree model according to the fault verification result to obtain an optimized decision tree model.
5. The fault monitoring method of an application server according to claim 1, wherein after determining the fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model, the method further comprises:
if the fault monitoring result indicates that the application server is faulty, sending a fault early warning request to the server side, so that the server side acquires method parameter data from the log file of the application server according to the fault early warning request.
6. The fault monitoring method of an application server according to claim 1, wherein acquiring the log file and the operating system log corresponding to the call information on the application server comprises:
deploying a log collection script to the application server, and acquiring the log file and the operating system log through the log collection script.
7. A fault monitoring device of an application server, comprising:
an acquisition module, configured to acquire a log file and an operating system log corresponding to call information on the application server;
an extraction module, configured to extract transaction data and time-consuming data on the application server according to the log file, and to extract hardware operation data of the application server according to the operating system log;
a monitoring module, configured to determine a fault monitoring result of the application server based on the transaction data, the time-consuming data, the hardware operation data and a preset decision tree model; the decision tree model is used to determine whether the application server is currently in a fault state according to the transaction data, time-consuming data and hardware operation data of the application server.
8. The fault monitoring device of an application server according to claim 7, further comprising:
a filtering module, configured to clean the data in the log file to obtain a cleaned log file;
correspondingly, the extraction module comprises:
an extraction submodule, configured to extract the transaction data and the time-consuming data on the application server according to the cleaned log file.
9. The fault monitoring device of an application server according to claim 8, wherein the filtering module comprises:
a filtering submodule, configured to extract the log records in the log file that contain target data and to take those records as the cleaned log file; wherein the target data comprises: log recording time, log identifier, call time consumption, upstream node IP address, method name, method entry and input parameters, method exit and output parameters, method return code, return code type and full-process serial number.
10. The fault monitoring device of an application server according to claim 7, further comprising:
a verification module, configured to verify the fault monitoring result to obtain a fault verification result;
an optimization module, configured to train the decision tree model according to the fault verification result to obtain an optimized decision tree model.
11. The fault monitoring device of an application server according to claim 7, further comprising:
a fault early warning module, configured to, if the fault monitoring result indicates that the application server is faulty, send a fault early warning request to the server side, so that the server side acquires method parameter data from the log file of the application server according to the fault early warning request.
12. The fault monitoring device of an application server according to claim 7, wherein the acquisition module comprises:
an acquisition submodule, configured to deploy a log collection script to the application server and to acquire the log file and the operating system log through the log collection script.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the fault monitoring method of an application server according to any one of claims 1 to 6.
14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the fault monitoring method of an application server according to any one of claims 1 to 6.
CN202110352583.4A 2021-03-31 2021-03-31 Fault monitoring method and device of application server Pending CN112860527A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110352583.4A CN112860527A (en) 2021-03-31 2021-03-31 Fault monitoring method and device of application server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110352583.4A CN112860527A (en) 2021-03-31 2021-03-31 Fault monitoring method and device of application server

Publications (1)

Publication Number Publication Date
CN112860527A true CN112860527A (en) 2021-05-28

Family

ID=75992078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110352583.4A Pending CN112860527A (en) 2021-03-31 2021-03-31 Fault monitoring method and device of application server

Country Status (1)

Country Link
CN (1) CN112860527A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114640567A (en) * 2022-02-23 2022-06-17 中银金融科技有限公司 Apache log analysis method and device

Similar Documents

Publication Publication Date Title
CN112612675B (en) Distributed big data log link tracking method and system under micro-service architecture
CN111209131A (en) Method and system for determining fault of heterogeneous system based on machine learning
CN110442498B (en) Abnormal data node positioning method and device, storage medium and computer equipment
CN110046073B (en) Log collection method and device, equipment and storage medium
CN112395156A (en) Fault warning method and device, storage medium and electronic equipment
CN109005162B (en) Industrial control system security audit method and device
CN109862396A (en) A kind of analysis method of video code flow, electronic equipment and readable storage medium storing program for executing
CN113946499A (en) Micro-service link tracking and performance analysis method, system, equipment and application
CN115809183A (en) Method for discovering and disposing information-creating terminal fault based on knowledge graph
CN113360722A (en) Fault root cause positioning method and system based on multidimensional data map
CN109409948B (en) Transaction abnormity detection method, device, equipment and computer readable storage medium
CN111339052A (en) Unstructured log data processing method and device
CN112860527A (en) Fault monitoring method and device of application server
CN109284331B (en) Certificate making information acquisition method based on service data resources, terminal equipment and medium
CN114138601A (en) Service alarm method, device, equipment and storage medium
CN108613820A (en) A kind of online allophone monitoring algorithm for GIS bulk mechanicals defect diagonsis and positioning
CN111324583B (en) Service log classification method and device
CN113123955A (en) Plunger pump abnormality detection method and device, storage medium and electronic device
CN116244202A (en) Automatic performance test method and device
CN114500178B (en) Self-operation intelligent Internet of things gateway
CN115525392A (en) Container monitoring method and device, electronic equipment and storage medium
CN116302989A (en) Pressure testing method and system, storage medium and computer equipment
CN113626236B (en) Fault diagnosis method, device, equipment and medium for distributed file system
CN114118440A (en) Model iteration method, model iteration device, electronic equipment and computer readable storage medium
CN111639249A (en) Automatic monitoring method, device and equipment for user feedback error reporting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination