CN112433926B - IT product-based fault analysis method, system, equipment and storage medium - Google Patents

IT product-based fault analysis method, system, equipment and storage medium

Info

Publication number
CN112433926B
Authority
CN
China
Prior art keywords
data
fault
health examination
health
result data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011357555.3A
Other languages
Chinese (zh)
Other versions
CN112433926A (en)
Inventor
蒋钊
刘富林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202011357555.3A
Publication of CN112433926A
Application granted
Publication of CN112433926B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/80 Database-specific techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/865 Monitoring of software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides an IT product-based fault analysis method, system, device and storage medium. The fault analysis method comprises the following steps: structuring health check data and health check result data from multiple sources, storing them according to their corresponding attribute information, and acquiring incremental data; calculating, based on the incremental data of the health check result data, a fault parameter for a first predetermined period, the number of faults in a second predetermined period, and the average number of faults in a third predetermined period; and determining that the operation and maintenance environment is abnormal when the ratio between the number of faults in the second predetermined period and the average number of faults in the third predetermined period falls outside the range defined by the fault parameter. With this method, raw health check data from multiple sources are centrally managed and analyzed in the time dimension, and raw health check data from different fields are analyzed according to a general index, so that the stability of the operation and maintenance environment can be determined.

Description

IT product-based fault analysis method, system, equipment and storage medium
Technical Field
The present invention relates to the field of computer application technology, and more particularly, to a fault analysis method, system, device and storage medium based on IT products (information technology products).
Background
In the existing operation and maintenance mode, the collection, management, statistics and analysis of fault information are carried out separately for each type of operation and maintenance device and each set of operation and maintenance indexes. The data are not correlated with one another, and no comparative or correlation analysis is performed. This mode of management and analysis does not fully describe the health of the devices and reduces the usability of the data. Analysis scenarios, analysis indexes and the like cannot be applied across fields, so fault judgments made at different levels and under different criteria are inaccurate, and part of the operation and maintenance resources is consumed in judging and analyzing single fault scenarios.
For example, the health examination methods currently used in various fields are mainly classified into the following three types:
The first type acquires the current health condition of a product through the instructions provided with each product. This method generally relies on the self-checking features of the product, and the basis and criteria for judgment are set by the product itself. Examples include health checks of storage devices, health checks of servers, and device status queries, where the information is obtained directly without any processing or analysis.
The second type acquires history records through each product's instructions and then screens out, manually or automatically, the information with a definite meaning. This method simply analyzes the product's history over a past period of time and obtains the relevant information according to manually established standards. An example is fault log analysis in server health checks.
The third type obtains the current resource usage of a product through each product's instructions and then makes a judgment according to manually customized indexes. This method defines a threshold in advance, checks the product's current resource usage in real time, and judges the usage abnormal when it exceeds the set threshold. An example is a check of storage logical volume usage.
Therefore, in existing health checks the underlying data sources differ, the intermediate analysis methods differ, the upper-level index requirements are diverse, and the fields differ from one another, so the raw data are not fully utilized. In addition, current check items basically cover only the current health state of the equipment; there is no index for comparison with the historical health state, so the time dimension is missing.
Disclosure of Invention
In order to solve the problems or part of the problems in the prior art, embodiments of the present invention provide a fault analysis method, system, device and storage medium based on IT products, which perform centralized management and analysis on health inspection data and health inspection result data from multiple sources in a time dimension, and analyze health inspection raw data in different fields according to a general index to determine stability of an operation and maintenance environment.
According to a first aspect of the present invention, an embodiment of the present invention provides a fault analysis method based on IT products, including: respectively acquiring health check data and health check result data of the IT product according to a data source; carrying out structural processing on the health examination data and the health examination result data, and storing attribute information corresponding to the health examination data and the health examination result data subjected to structural processing; acquiring health examination data and incremental data of the health examination result data based on the stored health examination data and the health examination result data; calculating a fault parameter for a first predetermined period of time based on the delta data of the health check result data; acquiring the fault quantity of the IT product in a second preset time period and the average value of the fault quantity in a third preset time period according to the increment data of the health examination result data; and determining that the operation and maintenance environment of the IT product is abnormal when the ratio between the number of faults in the second preset time period and the average value of the number of faults in the third preset time period is out of the range defined by the fault parameters.
According to the embodiment of the invention, health check data and health check result data from different sources are structured and stored according to their attribute information, so that the health check source data and health check result data of different fields can be stored in one unified location, which facilitates cross-field analysis. Moreover, the abnormal condition of the operation and maintenance environment of the IT product is judged based on the fault parameters obtained from the incremental data, so that the raw data can be fully utilized, the health check results are analyzed in the time dimension, and the stability of the current operation and maintenance environment is judged.
In some embodiments of the invention, calculating the fault parameter for the first predetermined period of time based on the incremental data of the health check result data comprises: calculating the mean square difference of the number of faults of the IT product in the first predetermined period of time according to the incremental data of the health check result data, and taking the mean square difference as the fault parameter.
In some embodiments of the invention, the fault analysis method further comprises: acquiring fault statistical data of the IT product in a fourth preset time period according to the increment data of the health examination result data; acquiring a fault record curve based on the fault statistics data; and predicting the fault condition of the operation and maintenance environment of the IT product according to the fault record curve.
According to the embodiment of the invention, the fault condition of the operation and maintenance environment of the IT product is predicted through the fault record curve, so that the occurrence of faults can be prevented according to early warning, and the safety and reliability of the operation and maintenance environment are ensured.
In some embodiments of the invention, the fault analysis method further comprises: calculating the minimum square error of the fault mean value in the fourth preset time period based on the fault statistical data; and predicting a fault value range of the fault condition according to the calculated minimum square error.
According to the embodiment of the invention, through predicting the fault value range of the fault condition, more accurate fault prediction information can be provided for operation and maintenance personnel, and the stability of the operation and maintenance environment is further improved.
According to a second aspect of the present invention, an embodiment of the present invention provides a fault analysis system based on IT products, including: the source data acquisition module is used for respectively acquiring health check data and health check result data of the IT product according to a data source; the data storage module is used for carrying out structural processing on the health examination data and the health examination result data and storing attribute information corresponding to the health examination data and the health examination result data which are subjected to structural processing; the incremental data acquisition module is used for acquiring the incremental data of the health examination data and the health examination result data based on the stored health examination data and the health examination result data; a fault parameter calculation module for calculating a fault parameter within a first predetermined period of time based on the delta data of the health examination result data; the fault quantity calculation module is used for acquiring the fault quantity of the IT product in a second preset time period and the fault quantity average value in a third preset time period according to the increment data of the health examination result data; and the abnormality judgment module is used for determining that the operation and maintenance environment of the IT product is abnormal when the ratio between the number of faults in the second preset time period and the average value of the number of faults in the third preset time period exceeds the range defined by the fault parameters.
According to the embodiment of the invention, health check data and health check result data from different sources are structured and stored according to their attribute information, so that the health check source data and health check result data of different fields can be stored in one unified location, which facilitates cross-field analysis. Moreover, the abnormal condition of the operation and maintenance environment of the IT product is judged based on the fault parameters obtained from the incremental data, so that the raw data can be fully utilized, the health check results are analyzed in the time dimension, and the stability of the current operation and maintenance environment is judged.
In some embodiments of the invention, calculating the fault parameter for the first predetermined period of time based on the incremental data of the health check result data comprises: calculating the mean square difference of the number of faults of the IT product in the first predetermined period of time according to the incremental data of the health check result data, and taking the mean square difference as the fault parameter.
In some embodiments of the present invention, the fault analysis system further comprises a fault prediction module for performing the following operations: acquiring fault statistical data of the IT product in a fourth preset time period according to the increment data of the health examination result data; acquiring a fault record curve based on the fault statistics data; and predicting the fault condition of the operation and maintenance environment of the IT product according to the fault record curve.
According to the embodiment of the invention, the fault condition of the operation and maintenance environment of the IT product is predicted through the fault record curve, so that the occurrence of faults can be prevented according to early warning, and the safety and reliability of the operation and maintenance environment are ensured.
In some embodiments of the present invention, the fault prediction module is further configured to perform the following operations: calculating the minimum square error of the fault mean value in the fourth preset time period based on the fault statistical data; and predicting a fault value range of the fault condition according to the calculated minimum square error.
According to the embodiment of the invention, through predicting the fault value range of the fault condition, more accurate fault prediction information can be provided for operation and maintenance personnel, and the stability of the operation and maintenance environment is further improved.
According to a third aspect of the present invention, embodiments provide a computer storage medium having stored thereon computer readable instructions which, when executed by a processor, cause a computer to perform operations including the steps of the fault analysis method according to any of the embodiments above.
According to a fourth aspect of the present invention, embodiments of the present invention provide a computer device comprising a memory and a processor, the memory being configured to store one or more computer instructions, wherein the one or more computer instructions, when executed by the processor, enable the implementation of the fault analysis method according to any one of the embodiments above.
As can be seen from the foregoing, according to the fault analysis method, system, storage medium and device based on IT products provided by the embodiments of the present invention, raw data of health check can be centrally managed across fields by performing structural processing on raw data of multiple sources and storing the raw data according to attribute information, so as to implement joint analysis and comparison across fields. Meanwhile, based on fault parameters acquired by the incremental data, abnormal conditions of the operation and maintenance environment of the IT product are judged, and the health check result is analyzed in the time dimension to judge the stability of the current operation and maintenance environment.
Drawings
FIG. 1 is a flow diagram of a method of IT product-based fault analysis in accordance with one embodiment of the present invention;
FIG. 2 is an architecture diagram of an IT product-based fault analysis system in accordance with one embodiment of the present invention.
Detailed Description
Various aspects of the invention are described in detail below with reference to the drawings and detailed description. Well-known modules, units, and their connections, links, communications, or operations between each other are not shown or described in detail. Also, the described features, architectures, or functions may be combined in any manner in one or more implementations. It will be appreciated by those skilled in the art that the various embodiments described below are for illustration only and are not intended to limit the scope of the invention. It will be further appreciated that the modules or units or processes of the embodiments described herein and illustrated in the drawings may be combined and designed in a wide variety of different configurations.
The following is a brief description of the terminology used herein.
IT products: IT stands for Information Technology. IT products cover many kinds of products, such as upper-layer applications, operating systems, middleware, servers as underlying hardware, and network equipment.
FIG. 1 is a flow diagram of an IT product-based fault analysis method in accordance with one embodiment of the present invention. The running environment of the IT product separates the front end from the back end: the back end uses the Django framework (an open-source Web application framework written in Python, a cross-platform programming language) and is implemented in Python; the front end uses the Vue framework (a progressive framework for building user interfaces), with Ant Design as the front-end model.
As shown in fig. 1, in one embodiment of the present invention, the method may include: step S11, step S12, step S13, step S14, step S15, and step S16, which are specifically described below.
In step S11, health check data and health check result data of the IT product are acquired separately according to their data sources. In an alternative embodiment, because the data are of many kinds, they need to be acquired separately according to the different data sources. For example:
1. data saved as a txt file (text file) is read by opening the file with the open() function (the Python function for opening files);
2. data saved as an Excel file (spreadsheet file) is read through the xlrd module (a Python extension for reading Excel files);
3. data saved in a database can be obtained by accessing the database directly, or can be exported as a json file (json being a data format) and read with the json module.
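The three acquisition paths above can be sketched as a single dispatcher. This is a minimal illustration, not the patented implementation: the function name `load_source` and the choice to dispatch on file extension are assumptions, and direct database access is left out.

```python
import json
from pathlib import Path


def load_source(path):
    """Read one health-check export and return its content.

    Dispatching on the file extension is an assumption of this sketch;
    the description only says that each source is read in its own way.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        # Plain text: read with the built-in open() and return the lines.
        with open(path, encoding="utf-8") as fh:
            return [line.rstrip("\n") for line in fh]
    if suffix == ".json":
        # Database export in JSON form: parse with the json module.
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    if suffix == ".xls":
        # Excel workbook: xlrd is a third-party package, imported lazily
        # so the text and JSON paths work without it.
        import xlrd
        sheet = xlrd.open_workbook(path).sheet_by_index(0)
        return [sheet.row_values(i) for i in range(sheet.nrows)]
    raise ValueError(f"unsupported source type: {suffix}")
```

Each branch returns plain Python lists or dictionaries, so the later structuring step can treat all sources alike.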
In step S12, the health check data and the health check result data are structured, and the structured health check data and health check result data are stored according to their corresponding attribute information. Therefore, the source data (health check data and health check result data) of different fields can be associated according to specific relationships and then analyzed, for example by comparing the temperature and humidity data of the machine room environment with the number of storage device faults, or by comparing the servers and the storage devices of the same machine room.
In an alternative embodiment, because the data are of many types, the acquired data (health check data and health check result data) first undergo preliminary structuring to become standard data. Data that are convenient to analyze and display centrally are placed in one table for processing, and data that are not are stored in separate tables. The Django framework used at the back end provides a database-insertion function, which is used to store the acquired data.
In another alternative embodiment, different functions are written for different data to preprocess them, and the preprocessed data are stored separately according to their characteristics or attributes.
In other alternative embodiments, for data that can be analyzed and displayed centrally, the information is obtained and structured with the pandas module (a Python data analysis package), and data with the same meaning are placed in the same column. For example, the device names and serial numbers in the fault information of storage devices and of servers are placed in the same columns, the failed components are placed in one column, the fault locations are placed in one column, and so on.
Optionally, data that cannot yet be analyzed and displayed centrally are stored in separate tables as acquired, and are adjusted later if they can be associated and arranged with other data.
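The idea of placing same-meaning fields in the same column can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the raw field names, the shared column names and the `normalize` helper are all hypothetical, and the description itself uses pandas for this step (where `DataFrame.rename` would play the same role).

```python
# Hypothetical per-source field names mapped onto shared column names.
COLUMN_MAP = {
    "storage": {"dev": "device_name", "sn": "serial_number",
                "part": "failed_component", "slot": "fault_location"},
    "server": {"hostname": "device_name", "serial": "serial_number",
               "component": "failed_component", "position": "fault_location"},
}


def normalize(source, record):
    """Rename one raw record's fields to the shared column names."""
    mapping = COLUMN_MAP[source]
    row = {"source": source}
    for raw_key, value in record.items():
        # Fields with no shared meaning are left out of the common table.
        if raw_key in mapping:
            row[mapping[raw_key]] = value
    return row


rows = [
    normalize("storage", {"dev": "array-01", "sn": "S123",
                          "part": "disk", "slot": "bay 4"}),
    normalize("server", {"hostname": "srv-07", "serial": "X987",
                         "component": "fan", "position": "rear"}),
]
```

After normalization both rows carry the same keys, so storage and server faults can be stored in one table and compared directly.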
In step S13, incremental data of the health check data and the health check result data is acquired based on the stored health check data and health check result data. In an alternative embodiment, the health check data and health check result data stored in step S12 are read from MySQL (an open-source relational database management system), after which some basic processing (e.g., obtaining incremental data, obtaining statistical data) is performed with the numpy module (an open-source numerical computing extension for Python).
In an alternative embodiment, the incremental data cannot be obtained directly, because the data acquired are mostly current information that has not been compared with historical information. Therefore, the add_value() function (an incremental data acquisition function) takes a table name, the name of the required data, a filtering condition and a time span as inputs, compares the latest data with a given historical record, and returns the increment.
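One possible shape for such an incremental query is sketched below. The description gives only the name `add_value()` and its inputs; the signature, the `health_result` table, the `checked_at` timestamp column, and the use of SQLite in place of MySQL are all assumptions made so that the example is self-contained.

```python
import sqlite3
from datetime import datetime, timedelta


def add_value(conn, table, column, condition, params, span_days, now):
    """Return the latest value minus the value recorded span_days earlier.

    table, column and condition are interpolated into the SQL text, so
    they must come from trusted code, never from user input.
    """
    cutoff = (now - timedelta(days=span_days)).isoformat()
    base = f"SELECT {column} FROM {table} WHERE {condition} "
    latest = conn.execute(
        base + "ORDER BY checked_at DESC LIMIT 1", params
    ).fetchone()
    earlier = conn.execute(
        base + "AND checked_at <= ? ORDER BY checked_at DESC LIMIT 1",
        (*params, cutoff),
    ).fetchone()
    if latest is None or earlier is None:
        return None  # not enough history to form an increment
    return latest[0] - earlier[0]


# Illustrative data: a cumulative fault counter sampled once a day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE health_result (device TEXT, fault_total INTEGER, checked_at TEXT)"
)
now = datetime(2020, 11, 26)
conn.executemany(
    "INSERT INTO health_result VALUES (?, ?, ?)",
    [("srv-07", 10, (now - timedelta(days=2)).isoformat()),
     ("srv-07", 12, (now - timedelta(days=1)).isoformat()),
     ("srv-07", 15, now.isoformat())],
)
increment = add_value(conn, "health_result", "fault_total",
                      "device = ?", ("srv-07",), 1, now)
```

With the sample rows, the latest total is 15 and the total one day earlier is 12, so the one-day increment is 3.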
In other alternative embodiments, statistical data may be obtained from the stored health check data and health check result data, so that the scattered data acquired are aggregated. Specifically, because the data sources are inconsistent, statistics calculated separately are of limited significance; when data from different fields are put together, statistics over a unified time dimension are needed for comparison. Optionally, the statistical function account_value() takes a table name, a data name, a screening condition and a time condition as inputs and returns the statistical data under that time condition.
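A statistics helper over a unified time window might look like the following sketch. It is a stand-in for `account_value()`: instead of a table name it takes an in-memory list of records, and the record fields (`day`, `domain`, `component`) are hypothetical.

```python
from collections import Counter
from datetime import date


def account_value(records, field, predicate, start, end):
    """Count values of `field` among records dated within [start, end]
    that also satisfy the screening predicate."""
    counts = Counter()
    for rec in records:
        if start <= rec["day"] <= end and predicate(rec):
            counts[rec[field]] += 1
    return counts


records = [
    {"day": date(2020, 11, 24), "domain": "storage", "component": "disk"},
    {"day": date(2020, 11, 25), "domain": "server", "component": "fan"},
    {"day": date(2020, 11, 25), "domain": "storage", "component": "disk"},
    {"day": date(2020, 11, 10), "domain": "storage", "component": "disk"},
]
# Faults per domain in one shared window, so the domains are comparable.
per_domain = account_value(records, "domain", lambda r: True,
                           date(2020, 11, 20), date(2020, 11, 26))
```

Because every domain is counted over the same window, the resulting figures can be set side by side even though the underlying sources differ.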
In addition, the health check data and health check result data from various sources can be processed in other ways, such as taking the mean, so that the raw data can be processed according to the standards of the designed application scenario and the processed data can then be analyzed and judged accordingly.
In step S14, a fault parameter is calculated for a first predetermined period of time based on the incremental data of the health check result data. In an alternative embodiment, this may specifically include: calculating the mean square difference of the number of faults of the IT product in the first predetermined period of time according to the incremental data of the health check result data, and taking the mean square difference as the fault parameter.
In step S15, the number of faults of the IT product in the second predetermined period of time and the average value of the number of faults in the third predetermined period of time are obtained according to the increment data of the health check result data.
In step S16, an operation and maintenance environment abnormality of the IT product is determined when a ratio between the number of faults in the second predetermined period and an average value of the number of faults in a third predetermined period is out of a range defined by the fault parameter.
The specific values of "first predetermined period", "second predetermined period", and "third predetermined period" in step S14 and step S15 are set by the operation and maintenance personnel.
The invention provides an example of judging whether the operation and maintenance environment of an IT product is abnormal according to the above fault analysis method:
The mean square difference of the number of faults over the past month (i.e., the first predetermined period) is obtained from the incremental data of the health check result data, and the fault parameters are calculated as 0.2 and 1.8, so that the normal range defined by the fault parameters is 0.2 to 1.8. When the ratio between the number of faults on a single day (i.e., the second predetermined period) and the average number of faults over the previous 10 days (i.e., the third predetermined period) exceeds 1.8 or is less than 0.2, it is determined that the fault rate of the operation and maintenance environment is abnormal on that day.
According to this example, the fault analysis method described above uses no manually set values other than the time ranges. The fault parameters are values calculated from actual data, and the operating condition of the operation and maintenance environment of the IT product can be confirmed in the time dimension based on them. Through this secondary analysis, the difference between the current and historical health check results of the equipment can be judged, and further information, such as whether the current operation and maintenance environment is stable, can be estimated.
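The ratio test of steps S14 to S16 can be sketched as follows. The description does not say how the bounds 0.2 and 1.8 are derived from the mean square difference; the rule used in `fault_bounds` (one minus and one plus the standard deviation divided by the mean) is purely an assumption for illustration, as are both function names.

```python
from statistics import mean, pstdev


def fault_bounds(first_period_counts):
    """Derive (low, high) ratio bounds from daily fault counts.

    Assumption of this sketch: bounds are 1 -/+ (std deviation / mean).
    """
    spread = pstdev(first_period_counts) / mean(first_period_counts)
    return 1 - spread, 1 + spread


def is_abnormal(today_count, recent_counts, low, high):
    """True when today's count over the recent average leaves [low, high]."""
    average = mean(recent_counts)
    if average == 0:
        # No recent faults: any fault today is treated as abnormal.
        return today_count > 0
    ratio = today_count / average
    return ratio < low or ratio > high


# Using the bounds from the example in the text (0.2 and 1.8) and an
# average of 5 faults per day over the previous 10 days.
previous_ten_days = [5] * 10
spike = is_abnormal(10, previous_ten_days, 0.2, 1.8)   # ratio 2.0
normal = is_abnormal(6, previous_ten_days, 0.2, 1.8)   # ratio 1.2
```

Here a day with 10 faults gives a ratio of 2.0 and is flagged, while a day with 6 faults gives 1.2 and is not.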
By adopting the method of the embodiment of the invention, the health examination source data and the health examination result data of different sources are structured and stored according to the attribute information, so that the health examination source data and the health examination result in different fields can be stored in a unified position for centralized management and analysis, and cross-field joint analysis and comparison can be conveniently realized. And by calculating general class indexes (fault parameters) and analyzing the health state of the operation and maintenance environment of the IT product in the time dimension, the original data can be fully utilized and the health check result can be secondarily analyzed in the time dimension so as to determine the stability of the current operation and maintenance environment.
In an alternative embodiment, the method further includes: acquiring fault statistics of the IT product in a fourth predetermined period of time based on the incremental data of the health check result data; acquiring a fault record curve based on the fault statistics; and predicting the fault condition of the operation and maintenance environment of the IT product according to the fault record curve. The specific value of the fourth predetermined period of time is set by the operation and maintenance personnel. Optionally, the minimum square error of the fault mean value in the fourth predetermined period of time is calculated based on the fault statistics, and a fault value range of the fault condition is predicted according to the calculated minimum square error.
The present invention provides an example of predicting a fault condition of an operational environment of an IT product according to the above-described alternative embodiments:
A. acquire fault statistics for the month preceding the current date (i.e., the fourth predetermined period);
B. calculate the minimum square error of the fault mean value of the previous month by a statistical method;
C. calculate, through a cooling algorithm, the curve that best fits the historical fault records to date;
D. predict the faults that may occur on the current day according to the curve calculated in step C;
E. predict the range of possible fault values for the current day according to the minimum square error and the predicted fault value.
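Steps A to E can be illustrated with a simple trend fit. The description names a "cooling algorithm" for step C without giving details, so this sketch substitutes an ordinary least-squares straight line and uses the root-mean-square residual of that fit as the half-width of the predicted range. Both substitutions, and the function names, are assumptions.

```python
from math import sqrt


def fit_line(counts):
    """Least-squares line through (day index, fault count) points."""
    n = len(counts)
    xs = range(n)
    x_bar = sum(xs) / n
    y_bar = sum(counts) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_today(counts):
    """Return (predicted value, low, high) for the day after the history."""
    slope, intercept = fit_line(counts)
    fitted = [slope * x + intercept for x in range(len(counts))]
    # Root-mean-square residual of the fit, used as the range half-width.
    rms = sqrt(sum((y - f) ** 2 for y, f in zip(counts, fitted)) / len(counts))
    predicted = slope * len(counts) + intercept
    return predicted, predicted - rms, predicted + rms


# A perfectly linear history: the fit is exact and the range collapses.
predicted, low, high = predict_today([2, 4, 6, 8])
```

The history needs at least two days for the fit to be defined. A noisier history widens the interval between `low` and `high`.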
The fault condition of the operation and maintenance environment of the IT product is predicted through the fault record curve, the fault value range of the fault condition is predicted, the occurrence of faults can be prevented according to early warning, more accurate fault prediction information is provided for operation and maintenance personnel, and the safety and reliability of the operation and maintenance environment are ensured.
In another alternative embodiment, stability analysis of the operation and maintenance condition of an IT product may be performed as follows:
a. acquiring fault statistics for the month preceding the current date;
b. calculating the mean square deviation of the fault values for that month by a statistical method;
c. calculating, by a statistical method, the mean of the fault values for that month plus and minus the mean square deviation, to identify the data with larger errors in that month;
d. calculating the ratio of the number of data with larger errors to the total number of data;
e. calculating the ratio between the current fault value and the mean value; when this ratio exceeds the ratio obtained in step d, the current fault value is judged to be data with a larger error, that is, the current operation and maintenance condition is unstable.
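One possible reading of steps a to e is sketched below. Two assumptions are made and should be treated as such: "data with larger errors" is taken to mean values outside the band mean ± mean square deviation, and, because the comparison in step e is ambiguous in the text, the current value is flagged as unstable when it falls outside that same band. The function name `stability_report` is hypothetical.

```python
from statistics import mean, pstdev


def stability_report(month_values, current_value):
    """Summarize last month's fault values and classify the current value."""
    avg = mean(month_values)
    sigma = pstdev(month_values)              # step b: mean square deviation
    low, high = avg - sigma, avg + sigma      # step c: mean plus/minus deviation
    outliers = [v for v in month_values if v < low or v > high]
    outlier_ratio = len(outliers) / len(month_values)   # step d
    # Step e (assumed reading): unstable when the current value leaves the band.
    unstable = current_value < low or current_value > high
    return {
        "mean": avg,
        "sigma": sigma,
        "outlier_ratio": outlier_ratio,
        "unstable": unstable,
    }
```

With `month_values = [4, 5, 6, 5, 4, 6, 5, 20]`, only the value 20 lies outside the band, so the outlier ratio is 1/8; a current value of 5 is reported as stable and a current value of 15 as unstable.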
In other alternative embodiments, the acquired fault information may be displayed after it has been processed and analyzed. Optionally, the display uses modules built at the front end with Vue and Ant Design. The displays fall into two categories:
One is data display, which is presented in tables.
The other is statistical charts, which are displayed using ECharts (Enterprise Charts, a business-level charting library that provides intuitive, vivid, interactive, and highly customizable data visualization charts).
The data display part obtains each health check result from the database and displays it directly.
The statistical chart display obtains data from the database, organizes it, and displays it as different charts, such as line charts, bar charts, and pie charts, according to the display parameters. The charts also support data drill-down: deeper data can be obtained by operating on the chart.
In addition to the two kinds of display above, a small number of analysis results are displayed: this data is returned directly to the front end and displayed as text, with the color and font of the text changed according to the result.
Through this data display, operation and maintenance personnel are given a more intuitive and clearer view of the operation and maintenance condition.
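The "organize the data, then display it according to display parameters" step might look like the following back-end sketch, which groups records and emits an ECharts-style option object that a front end could pass to a chart. The record fields (`date`, `product`) and the function name `build_chart_option` are hypothetical; only the `xAxis`, `yAxis`, and `series` keys and the `line`, `bar`, and `pie` series types are taken from the ECharts option format.

```python
import json
from collections import Counter


def build_chart_option(records, group_key, chart_type="line"):
    """Count records per value of group_key and build a chart option dict."""
    counts = Counter(record[group_key] for record in records)
    labels = sorted(counts)
    if chart_type == "pie":
        # Pie series carry name/value pairs instead of axis data.
        data = [{"name": label, "value": counts[label]} for label in labels]
        return {"series": [{"type": "pie", "data": data}]}
    if chart_type not in ("line", "bar"):
        raise ValueError(f"unsupported chart type: {chart_type}")
    return {
        "xAxis": {"type": "category", "data": labels},
        "yAxis": {"type": "value"},
        "series": [{"type": chart_type, "data": [counts[label] for label in labels]}],
    }


if __name__ == "__main__":
    sample = [
        {"date": "2020-11-01", "product": "db"},
        {"date": "2020-11-01", "product": "mq"},
        {"date": "2020-11-02", "product": "db"},
    ]
    print(json.dumps(build_chart_option(sample, "date", "bar"), indent=2))
```

Switching the display parameter from `"bar"` to `"pie"` changes only the shape of the returned option, not the query or grouping logic.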
FIG. 2 is an architecture diagram of an IT product-based fault analysis system in accordance with one embodiment of the present invention.
As shown in FIG. 2, the fault analysis system includes:
A source data acquisition module 210, configured to acquire health check data and health check result data of the IT product according to a data source.
A data storage module 220, configured to structure the health check data and the health check result data, and to store attribute information corresponding to the structured health check data and health check result data.
An incremental data acquisition module 230, configured to acquire incremental data of the health check data and of the health check result data based on the stored health check data and health check result data.
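As a minimal sketch of how modules 220 and 230 might cooperate, the example below stores structured health check results together with attribute columns in an in-memory SQLite table and reads only the rows added since the last read. The table name, the column names, and the id-based watermark are assumptions made for illustration; the text does not prescribe a schema or an incremental-read mechanism.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a hypothetical structured schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE check_result ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " source TEXT NOT NULL,"        # attribute: which data source produced the row
        " product TEXT NOT NULL,"       # attribute: which IT product was checked
        " checked_at TEXT NOT NULL,"    # attribute: check date
        " fault_count INTEGER NOT NULL)"
    )
    return conn


def store_results(conn, source, rows):
    """Store structured result rows from one data source."""
    conn.executemany(
        "INSERT INTO check_result (source, product, checked_at, fault_count)"
        " VALUES (?, ?, ?, ?)",
        [(source, r["product"], r["checked_at"], r["fault_count"]) for r in rows],
    )
    conn.commit()


def fetch_increment(conn, last_seen_id):
    """Return rows added after last_seen_id and the new watermark."""
    rows = conn.execute(
        "SELECT id, source, product, checked_at, fault_count"
        " FROM check_result WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else last_seen_id
    return rows, new_watermark
```

Each analysis run would keep the returned watermark and pass it to the next call, so that only newly stored results are processed.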
A fault parameter calculation module 240, configured to calculate a fault parameter within a first predetermined period of time based on the incremental data of the health check result data. In an alternative embodiment, calculating the fault parameter may specifically include: calculating the mean square deviation of the number of faults of the IT product in the first predetermined period of time according to the incremental data of the health check result data, and taking the mean square deviation as the fault parameter.
A fault number calculation module 250, configured to obtain the number of faults of the IT product in a second predetermined period of time and the average number of faults in a third predetermined period of time according to the incremental data of the health check result data.
An anomaly determination module 260, configured to determine that the operation and maintenance environment of the IT product is abnormal when the ratio between the number of faults in the second predetermined period of time and the average number of faults in the third predetermined period of time is outside the range defined by the fault parameter.
A fault prediction module 270, configured to obtain fault statistics of the IT product in a fourth predetermined period of time according to the incremental data of the health check result data; to obtain a fault record curve based on the fault statistics; and to predict the fault condition of the operation and maintenance environment of the IT product according to the fault record curve. Optionally, the fault prediction module 270 is further configured to calculate the minimum square error of the fault mean value in the fourth predetermined period of time based on the fault statistics, and to predict a fault value range of the fault condition according to the calculated minimum square error.
The specific values of the "first predetermined period", "second predetermined period", "third predetermined period", and "fourth predetermined period" in each module are set by the operation and maintenance personnel.
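The interplay of modules 240 to 260 with operator-set periods can be sketched as follows. The text does not state how the fault parameter defines the permitted range, so this sketch assumes a band of 1 ± (mean square deviation / average) around a ratio of 1; it also assumes that the first and third periods end just before the second period begins and that a multi-day second period is compared by its per-day rate. The names `Periods` and `is_abnormal` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Periods:
    """Period lengths in days, as set by operation and maintenance personnel."""
    first_days: int = 30    # window for the fault parameter
    second_days: int = 1    # window whose fault count is being judged
    third_days: int = 30    # window for the average fault count


def is_abnormal(daily_faults, periods=None):
    """Return True when the recent fault ratio leaves the assumed range."""
    periods = periods or Periods()
    history = daily_faults[:-periods.second_days]
    recent = daily_faults[-periods.second_days:]
    first = history[-periods.first_days:]
    third = history[-periods.third_days:]
    if not first or not third:
        raise ValueError("not enough history before the second period")
    sigma = pstdev(first)                         # module 240: fault parameter
    baseline = mean(third)                        # module 250: average fault count
    recent_rate = sum(recent) / len(recent)       # module 250: recent faults per day
    if baseline == 0:
        return recent_rate > 0                    # any fault is a departure from zero
    ratio = recent_rate / baseline
    tolerance = sigma / baseline                  # assumed range: 1 +/- tolerance
    return abs(ratio - 1) > tolerance             # module 260: anomaly decision
```

For a history of `[8, 12, 8, 12, 8, 12]` with six-day first and third periods, the deviation is 2 and the average is 10, so a following day with 11 faults is within the assumed range while a day with 15 faults is flagged as abnormal.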
A data display module 280, configured to process and analyze the obtained health check data and health check result data, and to display information on the fault condition. Optionally, the data display module 280 is built at the front end with Vue and Ant Design. The displays fall into two categories:
One is data display, which is presented in tables.
The other is statistical charts, which are displayed using ECharts (Enterprise Charts, a business-level charting library that provides intuitive, vivid, interactive, and highly customizable data visualization charts).
The data display part obtains each health check result from the database and displays it directly.
The statistical chart display obtains data from the database, organizes it, and displays it as different charts, such as line charts, bar charts, and pie charts, according to the display parameters. The charts also support data drill-down: deeper data can be obtained by operating on the chart.
In addition to the two kinds of display above, a small number of analysis results are displayed: this data is returned directly to the front end and displayed as text, with the color and font of the text changed according to the result.
With the fault analysis system of the embodiment of the invention, health check source data and health check result data from different sources are structured and stored according to their attribute information, so that health check source data and health check results from different fields can be stored in a unified location for centralized management and analysis, which makes cross-field joint analysis and comparison straightforward. By calculating a general-purpose index (the fault parameter) and analyzing the health state of the operation and maintenance environment of the IT product in the time dimension, the original data can be fully used and the health check results can be analyzed a second time in the time dimension to determine the stability of the current operation and maintenance environment. In addition, the fault condition of the operation and maintenance environment of the IT product is predicted from the fault record curve, and the fault value range of that condition is predicted, so that faults can be prevented on the basis of early warning, operation and maintenance personnel receive more accurate fault prediction information, and the safety and reliability of the operation and maintenance environment are ensured. Finally, the data display gives operation and maintenance personnel a more intuitive and clearer view of the operation and maintenance condition, which facilitates subsequent fault handling.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software in combination with a hardware platform. On this understanding, all of the technical solution of the present invention, or the part of it that contributes over the background art, may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments, or in parts of the embodiments, of the present invention.
Correspondingly, an embodiment of the invention also provides a computer-readable storage medium on which computer-readable instructions or programs are stored which, when executed by a processor, cause the computer to perform operations comprising the steps of the fault analysis method according to any one of the foregoing embodiments, which are not repeated here. The storage medium may include, for example, optical disks, hard disks, floppy disks, flash memory, magnetic tape, and the like.
In addition, an embodiment of the invention further provides a computer device comprising a memory and a processor, wherein the memory is used for storing one or more computer instructions or programs which, when executed by the processor, implement the fault analysis method according to any one of the foregoing embodiments. The computer device may be, for example, a server, a desktop computer, a notebook computer, or the like.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention. The scope of the invention is therefore defined by the appended claims.

Claims (8)

1. A fault analysis method based on information technology (IT) products, characterized in that the fault analysis method comprises:
acquiring health check data and health check result data of the IT product respectively according to a data source;
structuring the health check data and the health check result data, and storing attribute information corresponding to the structured health check data and health check result data;
acquiring incremental data of the health check data and of the health check result data based on the stored health check data and health check result data;
calculating a fault parameter for a first predetermined period of time based on the incremental data of the health check result data;
acquiring the number of faults of the IT product in a second predetermined period of time and the average number of faults in a third predetermined period of time according to the incremental data of the health check result data;
determining that the operation and maintenance environment of the IT product is abnormal when the ratio between the number of faults in the second predetermined period of time and the average number of faults in the third predetermined period of time is outside the range defined by the fault parameter;
wherein calculating the fault parameter for the first predetermined period of time based on the incremental data of the health check result data comprises: calculating the mean square deviation of the number of faults of the IT product in the first predetermined period of time according to the incremental data of the health check result data, and taking the mean square deviation as the fault parameter.
2. The fault analysis method of claim 1, wherein the fault analysis method further comprises:
acquiring fault statistics of the IT product in a fourth predetermined period of time according to the incremental data of the health check result data;
acquiring a fault record curve based on the fault statistics;
and predicting the fault condition of the operation and maintenance environment of the IT product according to the fault record curve.
3. The fault analysis method of claim 2, wherein the fault analysis method further comprises:
calculating the minimum square error of the fault mean value in the fourth predetermined period of time based on the fault statistics;
and predicting a fault value range of the fault condition according to the calculated minimum square error.
4. A fault analysis system based on IT products, the fault analysis system comprising:
a source data acquisition module for acquiring health check data and health check result data of the IT product respectively according to a data source;
a data storage module for structuring the health check data and the health check result data, and for storing attribute information corresponding to the structured health check data and health check result data;
an incremental data acquisition module for acquiring incremental data of the health check data and of the health check result data based on the stored health check data and health check result data;
a fault parameter calculation module for calculating a fault parameter within a first predetermined period of time based on the incremental data of the health check result data;
a fault number calculation module for acquiring the number of faults of the IT product in a second predetermined period of time and the average number of faults in a third predetermined period of time according to the incremental data of the health check result data;
an abnormality determination module for determining that the operation and maintenance environment of the IT product is abnormal when the ratio between the number of faults in the second predetermined period of time and the average number of faults in the third predetermined period of time is outside the range defined by the fault parameter;
wherein calculating the fault parameter for the first predetermined period of time based on the incremental data of the health check result data comprises: calculating the mean square deviation of the number of faults of the IT product in the first predetermined period of time according to the incremental data of the health check result data, and taking the mean square deviation as the fault parameter.
5. The fault analysis system of claim 4, further comprising a fault prediction module to:
acquiring fault statistics of the IT product in a fourth predetermined period of time according to the incremental data of the health check result data;
acquiring a fault record curve based on the fault statistics;
and predicting the fault condition of the operation and maintenance environment of the IT product according to the fault record curve.
6. The fault analysis system of claim 5, wherein the fault prediction module is further configured to:
calculating the minimum square error of the fault mean value in the fourth predetermined period of time based on the fault statistics;
and predicting a fault value range of the fault condition according to the calculated minimum square error.
7. A computer storage medium storing computer software instructions for execution by a processor to implement the fault analysis method of any one of claims 1-3.
8. A computer device comprising a memory and a processor;
wherein the memory is configured to store one or more computer instructions that are executed by the processor to implement the fault analysis method of any one of claims 1-3.
CN202011357555.3A 2020-11-27 2020-11-27 IT product-based fault analysis method, system, equipment and storage medium Active CN112433926B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011357555.3A CN112433926B (en) 2020-11-27 2020-11-27 IT product-based fault analysis method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112433926A CN112433926A (en) 2021-03-02
CN112433926B true CN112433926B (en) 2024-03-01

Family

ID=74699259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011357555.3A Active CN112433926B (en) 2020-11-27 2020-11-27 IT product-based fault analysis method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112433926B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776102A (en) * 2016-12-27 2017-05-31 中国建设银行股份有限公司 A kind of application system health examination method and system
CN109376877A (en) * 2018-10-11 2019-02-22 华自科技股份有限公司 Equipment O&M method for early warning, device, computer equipment and storage medium
CN111047082A (en) * 2019-12-02 2020-04-21 广州智光电气股份有限公司 Early warning method and device for equipment, storage medium and electronic device
CN111176872A (en) * 2019-12-12 2020-05-19 北京邮电大学 Monitoring data processing method, system, device and storage medium for IT operation and maintenance
WO2020119369A1 (en) * 2018-12-13 2020-06-18 平安普惠企业管理有限公司 Intelligent it operation and maintenance fault positioning method, apparatus and device, and readable storage medium
CN111459129A (en) * 2020-03-04 2020-07-28 辽宁工程技术大学 Method for determining importance of fault event in fault process of electrical system

Also Published As

Publication number Publication date
CN112433926A (en) 2021-03-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant