CN115098326A - System anomaly detection method and device, storage medium and electronic equipment - Google Patents

System anomaly detection method and device, storage medium and electronic equipment

Info

Publication number
CN115098326A
Authority
CN
China
Prior art keywords
data
algorithm
target data
result
related data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210722199.3A
Other languages
Chinese (zh)
Inventor
王君
杨晓勤
李世宁
张明
安卫杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp filed Critical China Construction Bank Corp
Priority to CN202210722199.3A priority Critical patent/CN115098326A/en
Publication of CN115098326A publication Critical patent/CN115098326A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a system anomaly detection method and device, a storage medium and an electronic device. The method comprises: performing data processing in advance on system-related data of a service system collected in real time to obtain target data, and storing the target data in a database of a big data platform; acquiring, according to a preset time period, the target data pre-stored in the database of the big data platform; and performing anomaly detection on the target data by using expert rules, an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm to obtain an anomaly detection result. Because the big data platform and the service system to be detected are deployed on different servers, the anomaly detection of the service system is performed on the big data platform and does not increase the load of the service system; moreover, the target data is examined by expert rules together with the outlier algorithm, the trend prediction algorithm, the comparison algorithm and the inverse rule algorithm.

Description

System anomaly detection method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of fault monitoring technologies, and in particular, to a method and an apparatus for detecting system abnormality, a storage medium, and an electronic device.
Background
Guaranteeing the safe and stable operation of service systems is the primary task of data center operation and maintenance. Performing a routine health inspection of the service system every day allows potential risks of the system to be discovered in time, so that managers can deal with them in advance.
In the prior art, a preset check script is executed in a server for deploying a service system, and whether the service system has a risk is judged according to the output of the script. However, when a preset check script is directly executed in a server deploying a service system, a system load may be generated, and a high operation risk may be caused in the case of a shortage of system resources, and since the script logic cannot be too complex, only a simple manner of determining whether the service system is abnormal through a threshold, a keyword, and the like is supported, so that the accuracy of detecting the abnormality is low.
Disclosure of Invention
The application provides a system anomaly detection method and device, a storage medium and electronic equipment, and aims to solve the problems in the prior art that the added system load creates a high operation risk and that the accuracy of anomaly detection is low.
In order to achieve the above object, the present application provides the following technical solutions:
a system anomaly detection method is applied to a big data platform, the big data platform and a service system to be detected are deployed in different servers, and the method comprises the following steps:
acquiring target data prestored in a database of the big data platform according to a preset time period; the target data is obtained by data processing based on system related data of a service system acquired in real time;
carrying out anomaly detection on the target data by utilizing each preset algorithm included in the expert rules and the algorithm library to obtain an anomaly detection result; the algorithm library comprises an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm.
The method described above, optionally, the process of pre-storing the target data in the database includes:
collecting system related data of a service system in real time, and storing the collected system related data into a data resource pool of the big data platform;
acquiring system related data stored in the data resource pool;
and carrying out data processing on the system related data to obtain target data, and storing the target data into a database.
Optionally, the above method, performing data processing on the system-related data to obtain target data, includes:
carrying out data quality inspection on the system related data;
after the system related data passes the data quality verification, performing data relation association on each system related data passing the data quality verification according to a preset association rule;
performing data content fusion on the system related data after the data relation association;
and carrying out format standardization processing on the system related data after the data content fusion to obtain target data of the system related data.
Optionally, the above method, where the target data is subjected to anomaly detection by using the expert rules and each preset algorithm included in the algorithm library to obtain an anomaly detection result, includes:
carrying out rule inspection on the target data by using an expert rule to obtain a rule inspection result;
performing outlier risk analysis on system indexes in the target data by using an outlier algorithm to obtain an outlier risk result of a server deploying the service system;
based on the target data, a trend prediction algorithm is utilized to predict the system index reachable value of the service system, and based on the system index reachable value and a preset capacity threshold value, a first result of whether the system index of the service system is abnormal is obtained;
performing time sequence index feature extraction on the target data to obtain a time sequence index feature in the current time period, and comparing the time sequence index feature in the current time period with the time sequence index feature in the previous time period by using a comparison algorithm to obtain a comparison result;
identifying a system index mutation mode in the current time period based on the target data, and obtaining a second result of whether the system index mutation mode in the current time period is abnormal or not by utilizing an inverse rule algorithm based on the system index mutation mode in the current time period and a preset historical system index mutation mode;
and forming an abnormal inspection result of the business system by using the rule inspection result, the outlier risk result, the first result, the comparison result and the second result.
Optionally, the method further includes, after performing anomaly check on the target data by using the expert rules and each preset algorithm included in the algorithm library to obtain an anomaly check result:
calling a preset report template based on the abnormal checking result to generate a checking report of the service system;
if the abnormal inspection result comprises a risk item, pushing the risk item to a manager terminal, and generating processing tracking information of the risk item; the risk item is a data item which is included in the abnormal inspection result and is used for representing the existence of an abnormal risk.
A system anomaly detection device is applied to a big data platform, the big data platform and a service system to be detected are deployed in different servers, and the device comprises:
the acquisition unit is used for acquiring target data prestored in a database of the big data platform according to a preset time period; the target data is obtained by data processing based on system related data of a service system acquired in real time;
the inspection unit is used for performing anomaly inspection on the target data by utilizing the expert rules and each preset algorithm included in the algorithm library to obtain an anomaly inspection result; the algorithm library comprises an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm.
Optionally, the above apparatus, where the obtaining unit is specifically configured to, when the target data is stored in the database in advance:
collecting system related data of a service system in real time, and storing the collected system related data into a data resource pool of the big data platform;
acquiring system related data stored in the data resource pool;
and carrying out data processing on the system related data to obtain target data, and storing the target data into a database.
Optionally, in the apparatus described above, when the obtaining unit performs data processing on the system-related data to obtain target data, the obtaining unit is specifically configured to:
carrying out data quality inspection on the system related data;
after the system related data passes the data quality verification, performing data relation association on each system related data passing the data quality verification according to a preset association rule;
performing data content fusion on the system related data after the data relation association;
and carrying out format standardization processing on the system related data after data content fusion to obtain target data of the system related data.
Compared with the prior art, the method has the following advantages:
The application provides a system anomaly detection method and device, a storage medium and an electronic device. System-related data of a service system collected in real time is processed in advance to obtain target data, and the target data is stored in a database of a big data platform; the target data pre-stored in the database is acquired according to a preset time period; and anomaly detection is performed on the target data by using expert rules and the outlier algorithm, trend prediction algorithm, comparison algorithm and inverse rule algorithm included in an algorithm library to obtain an anomaly detection result. In this scheme, the big data platform and the service system to be detected are deployed on different servers, so the anomaly inspection of the service system is performed on the big data platform and does not increase the load of the service system; moreover, the target data is inspected by expert rules together with the outlier algorithm, the trend prediction algorithm, the comparison algorithm and the inverse rule algorithm.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a big data platform provided in the present application;
FIG. 2 is a flowchart of a system anomaly detection method provided in the present application;
FIG. 3 is another flowchart of the system anomaly detection method provided in the present application;
FIG. 4 is another flowchart of the system anomaly detection method provided in the present application;
FIG. 5 is another flowchart of the system anomaly detection method provided in the present application;
FIG. 6 is an exemplary diagram of the system anomaly detection method provided in the present application;
FIG. 7 is another exemplary diagram of the system anomaly detection method provided in the present application;
FIG. 8 is another exemplary diagram of the system anomaly detection method provided in the present application;
FIG. 9 is another flowchart of the system anomaly detection method provided in the present application;
FIG. 10 is a schematic structural diagram of a system anomaly detection device provided in the present application;
FIG. 11 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the disclosure of the present application are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules or units.
It is noted that the modifiers "a" and "an" in the disclosure are intended to be illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly dictates otherwise.
The application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.
The embodiment of the present application provides a big data platform, and a schematic structural diagram of the big data platform is shown in fig. 1, which specifically includes:
user interface, inspection processing engine, system management, report management, database, big data resource pool (namely the data resource pool mentioned above) and inspection configuration management.
The user interface is used for creating polling tasks, providing polling report (i.e. the above mentioned inspection report) subscription, query, display, feedback and tracking, and viewing statistical reports.
And the big data resource pool is used for storing basic monitoring acquisition data, transaction monitoring data, configuration data, trigger command acquisition data and operation state data.
The inspection configuration management comprises inspection task configuration, report configuration and evaluation method management, wherein the inspection task configuration comprises inspection object management, inspection item management, task setting management and task instance management, the report configuration comprises report template management and shielding policy management, and the evaluation method management comprises algorithm model management and expert rule management.
The inspection processing engine comprises data processing, inspection starting, inspection execution, task operation management and report life cycle management. The data processing comprises data quality verification, data relation association, data fusion processing and data standardization processing. The inspection starting comprises automatic triggering and manual triggering. The inspection execution comprises algorithm inspection, rule inspection and inspection result integration, performed on the result data of the data processing and the data included in the inspection configuration management. The task operation management comprises task monitoring, exception handling and task scheduling. The report life cycle management comprises report generation, report notification, tracking and feedback processes.
The system management comprises user and authority management, personalized setting, system parameters, notification management and interface management.
The report management is used for configuring statistical reports.
The database is used for storing result data after data processing.
The embodiment of the application provides a system anomaly detection method, which can be applied to a big data platform, wherein the big data platform and a service system to be detected are deployed in different servers, and a flow chart of the system anomaly detection method is shown in fig. 2, and specifically comprises the following steps:
S201, acquiring target data prestored in a database of the big data platform according to a preset time period.
In this embodiment, target data pre-stored in a database of the big data platform is acquired according to a preset time period.
In this embodiment, target data is pre-stored in a database of the big data platform. The target data is obtained by performing data processing on system-related data of the service system collected in real time. The system-related data includes, but is not limited to: performance data collected by basic monitoring (e.g., CPU usage, memory usage and number of active processes of a server); logs (system error logs, application error logs, etc.); transaction-related index data collected by transaction monitoring, such as system success rate, service response time, service processing time and transaction volume; configuration data collected by configuration management; job information collected by job management and control, such as job scheduling start time and job execution duration; and data collected by trigger commands.
In this embodiment, referring to fig. 3, a process of pre-storing target data in a database specifically includes the following steps:
S301, collecting system related data of the service system in real time, and storing the collected system related data into a data resource pool of the big data platform.
In this embodiment, the system-related data of the service system, that is, the data of the service system in the open system field, the network field, the environment field, and the application field, is collected in real time.
It should be noted that, in this embodiment, the collection of the system-related data of the service system is realized by a retrieval server, where the retrieval server may be Elasticsearch.
In this embodiment, the collected system-related data is stored in a data resource pool in the big data platform, thereby achieving cross-server storage of the system-related data.
S302, obtaining system related data stored in the data resource pool.
In this embodiment, the system-related data stored in the data resource pool is obtained, and optionally, the system-related data stored in the data resource pool may be obtained according to a preset data processing period.
And S303, carrying out data processing on the system related data to obtain target data, and storing the target data into a database.
In this embodiment, data processing is performed on system-related data to obtain target data. The data processing comprises data quality verification, data relation association, data content fusion and data format standardization.
Specifically, referring to fig. 4, the process of performing data processing on system-related data to obtain target data specifically includes the following steps:
S401, carrying out data quality verification on the system related data.
In this embodiment, data quality verification is performed on the system-related data, specifically, whether a preset key field in the system-related data is complete or not is verified, and whether a data timestamp meets the requirement of checking timeliness or not is verified.
In this embodiment, it is determined whether the system-related data passes the data quality check, and if the system-related data does not pass the data quality check, the process is directly ended, and if the system-related data passes the data quality check, the step S402 is executed.
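The quality verification of step S401 can be illustrated with a minimal sketch. The field names in `REQUIRED_FIELDS` and the 24-hour window in `MAX_AGE` are hypothetical; the text only specifies that preset key fields must be complete and that the data timestamp must satisfy the timeliness requirement of the inspection.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical key fields; the source only says "preset key fields".
REQUIRED_FIELDS = ("host", "metric", "value", "timestamp")
# Hypothetical timeliness window for the inspection.
MAX_AGE = timedelta(hours=24)


def passes_quality_check(record, now, required=REQUIRED_FIELDS, max_age=MAX_AGE):
    """Return True if all key fields are present and the timestamp is recent enough."""
    for field in required:
        if record.get(field) is None:
            return False  # a key field is missing: verification fails
    age = now - record["timestamp"]
    return timedelta(0) <= age <= max_age


now = datetime(2022, 6, 24, 12, 0, tzinfo=timezone.utc)
fresh = {"host": "srv-01", "metric": "cpu_usage", "value": 0.42,
         "timestamp": now - timedelta(hours=1)}
stale = dict(fresh, timestamp=now - timedelta(days=3))
incomplete = {"host": "srv-01", "metric": "cpu_usage", "timestamp": now}
```

Only records for which `passes_quality_check` returns True would continue to step S402.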
S402, after the system related data pass the data quality verification, performing data relation association on each system related data passing the data quality verification according to a preset association rule.
In this embodiment, after it is determined that the system-related data passes the data quality verification, based on the data source of the system-related data, data relation association is performed on each piece of system-related data that passes the data quality verification according to a preset association rule, where the preset association rule is set manually and can be modified as required.
The process of performing data relationship association on the data related to each system passing the data quality verification mentioned in step S402 is illustrated as follows:
The performance data of a certain server is obtained from the monitoring data; to complete the data relation association, the information of the application system to which the server belongs, the network storage information connected with the service, and the like need to be associated through the configuration information.
It should be noted that, if the data relationship is missing, the corresponding system-related data is determined to be error data, and the flow does not continue for that data.
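The association example above can be sketched as a lookup keyed by server. The function name `associate`, the `host` key and the configuration fields are assumptions made for illustration; the source does not describe the association rule itself.

```python
def associate(perf_records, config_by_host):
    """Attach configuration info to each performance record.

    Records whose host has no configuration entry are treated as error data
    and are not passed on to the later steps.
    """
    associated, errors = [], []
    for rec in perf_records:
        cfg = config_by_host.get(rec["host"])
        if cfg is None:
            errors.append(rec)  # data relationship is missing
            continue
        associated.append({**rec, **cfg})
    return associated, errors


# Hypothetical configuration information and monitoring data.
config = {"srv-01": {"application": "payments", "storage": "nas-07"}}
records = [
    {"host": "srv-01", "cpu_usage": 0.42},
    {"host": "srv-99", "cpu_usage": 0.10},
]
ok, bad = associate(records, config)
```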
And S403, performing data content fusion on the system related data after the data relation is correlated.
In this embodiment, after the data relationship association is completed, content fusion of different levels is performed on the system-related data of different data sources according to the inspection requirement. For example, to inspect servers by location, the server data needs to have a location information field; to inspect servers by deployment unit, the server data needs to have a deployment unit field.
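A minimal sketch of content fusion, assuming each data source is a dictionary keyed by host: the fields required by the inspection (such as location or deployment unit) are copied onto the server record from the first source that provides them. The function `fuse` and all field names are hypothetical.

```python
def fuse(record, sources, required_fields):
    """Copy each required field from the first source that has it for this host."""
    fused = dict(record)
    for field in required_fields:
        if field in fused:
            continue  # the record already carries this field
        for source in sources:
            value = source.get(record["host"], {}).get(field)
            if value is not None:
                fused[field] = value
                break
    return fused


# Hypothetical data sources keyed by host.
asset_db = {"srv-01": {"location": "DC-North"}}
deploy_db = {"srv-01": {"deployment_unit": "core-banking-A"}}
fused = fuse({"host": "srv-01", "cpu_usage": 0.42},
             [asset_db, deploy_db],
             ["location", "deployment_unit"])
```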
And S404, carrying out format standardization processing on the system related data after the data content is fused to obtain target data of the system related data.
In this embodiment, after the data content fusion is completed, format standardization processing is performed on the system-related data after the data content fusion to obtain target data, that is, the target data stored in the database is in a uniform format.
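Format standardization is not detailed in the text. One possible reading, sketched below, renames source-specific field names to a common name and emits timestamps in a single representation. The alias table and the choice of ISO 8601 in UTC are assumptions.

```python
from datetime import datetime, timezone

# Hypothetical aliases used by different data sources for the same field.
FIELD_ALIASES = {"hostname": "host", "ts": "timestamp", "val": "value"}


def standardize(record):
    """Rename aliased fields and emit the timestamp as an ISO 8601 UTC string."""
    out = {FIELD_ALIASES.get(key, key): val for key, val in record.items()}
    ts = out["timestamp"]
    if isinstance(ts, (int, float)):  # epoch seconds
        ts = datetime.fromtimestamp(ts, tz=timezone.utc)
    out["timestamp"] = ts.astimezone(timezone.utc).isoformat()
    out["value"] = float(out["value"])
    return out


standardized = standardize({"hostname": "srv-01", "ts": 0, "val": "42"})
```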
In this embodiment, cross-server data storage is realized by collecting the system-related data of the service system in real time and storing it into the data resource pool, and the efficiency of the subsequent anomaly inspection is improved by performing data quality verification, data relation association, data content fusion and format standardization processing on the system-related data in the data resource pool.
S202, carrying out anomaly check on the target data by utilizing the expert rules and each preset algorithm included in the algorithm library to obtain an anomaly check result.
In this embodiment, the expert rules and the preset algorithms included in the algorithm library are used to perform anomaly detection on the target data to obtain an anomaly detection result, wherein the algorithm library includes, but is not limited to, an outlier algorithm, a trend prediction algorithm, a comparison algorithm, and an inverse rule algorithm. Namely, the target data is subjected to anomaly detection by using an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm, so that an anomaly detection result is obtained.
Referring to fig. 5, the process of performing anomaly check on the target data by using the expert rules and each preset algorithm included in the algorithm library to obtain an anomaly check result specifically includes the following steps:
S501, carrying out rule inspection on the target data by using expert rules to obtain a rule inspection result.
In this embodiment, the expert rules are used to perform rule inspection on the target data, so as to obtain a rule inspection result, where the expert rules include, but are not limited to, a static threshold, a dynamic threshold, and keyword comparison.
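The three kinds of expert rule named above can be sketched as follows. The text does not define the dynamic threshold; here it is assumed to be the mean of recent history plus `k` standard deviations. The limits, keywords and sample fields are hypothetical.

```python
from statistics import mean, pstdev


def static_threshold(value, limit):
    """Flag a value that exceeds a fixed limit."""
    return value > limit


def dynamic_threshold(value, history, k=3.0):
    """Flag a value above mean + k * std of recent history (assumed definition)."""
    return value > mean(history) + k * pstdev(history)


def keyword_match(log_line, keywords):
    """Flag a log line containing any of the given keywords (case-insensitive)."""
    line = log_line.lower()
    return any(kw.lower() in line for kw in keywords)


def rule_inspection(sample):
    """Apply the three rule kinds to one sample and collect the rules that hit."""
    hits = []
    if static_threshold(sample["cpu_usage"], 0.90):
        hits.append("cpu_static")
    if dynamic_threshold(sample["cpu_usage"], sample["cpu_history"]):
        hits.append("cpu_dynamic")
    if keyword_match(sample["log"], ("OutOfMemory", "ERROR")):
        hits.append("log_keyword")
    return hits


busy = {"cpu_usage": 0.95, "cpu_history": [0.40, 0.42, 0.41, 0.39],
        "log": "java.lang.OutOfMemoryError"}
quiet = {"cpu_usage": 0.41, "cpu_history": [0.40, 0.42, 0.41, 0.39],
         "log": "started ok"}
```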
S502, performing outlier risk analysis on system indexes in the target data by using an outlier algorithm to obtain an outlier risk result of a server for deploying the service system.
In this embodiment, an outlier algorithm is used to perform outlier risk analysis on system indexes in target data, where the system indexes include service indexes and machine indexes, where the service indexes include, but are not limited to, a system success rate, a service success rate, service response time, service processing time, and transaction volume, and the machine indexes include, but are not limited to, a CPU usage rate and a memory usage rate.
In this embodiment, the system index in the target data is analyzed for the outlier risk based on the outlier threshold, wherein the outlier threshold is adjusted and set according to the training effect. Specifically, the system index in the target data is subjected to outlier risk analysis to obtain an analysis result, if the analysis result is greater than an outlier threshold, an outlier risk result representing that the server deploying the service system has an outlier risk is generated, and if the analysis result is not greater than the outlier threshold, an outlier risk result representing that the server deploying the service system does not have an outlier risk is generated.
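The source does not name the outlier algorithm. As one illustrative stand-in, the sketch below scores a server's index against peer servers with a robust z-score (median and median absolute deviation) and compares the score with the outlier threshold, as described above. The default threshold of 3.5 is an assumption; the text says the threshold is tuned according to the training effect.

```python
from statistics import median


def outlier_score(value, peers):
    """Robust z-score of `value` relative to peer values (median / MAD)."""
    med = median(peers)
    mad = median(abs(p - med) for p in peers)
    if mad == 0:
        return 0.0 if value == med else float("inf")
    return abs(value - med) / (1.4826 * mad)


def has_outlier_risk(value, peers, threshold=3.5):
    """True if the analysis result is greater than the outlier threshold."""
    return outlier_score(value, peers) > threshold


# Hypothetical CPU usage of peer servers in the same deployment.
peer_cpu = [0.40, 0.42, 0.41, 0.39, 0.43]
```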
S503, based on the target data, a trend prediction algorithm is used for predicting the system index reachable value of the service system, and based on the system index reachable value and a preset capacity threshold value, a first result of whether the system index of the service system is abnormal is obtained.
In this embodiment, the historical operation rule of the system index is learned, and based on the target data and the historical operation rule, the trend prediction algorithm is used to predict the reachable value of the system index of the service system, that is, the value that the system index may reach in a future time period. If the system index reachable value is greater than the preset capacity threshold value, a first result representing that the system index is abnormal is generated; if the system index reachable value is not greater than the preset capacity threshold value, a first result representing that the system index is not abnormal is generated.
For example, referring to fig. 6, which is an operation trend graph of the memory usage of a server, the trend predicts that the index of the server is at risk of exceeding the capacity threshold within the next 30 days. The dotted line with an arrow is composed of the capacity threshold values in different time periods, and its intersection with the operation trend graph is where the index reaches the capacity threshold value.
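The trend prediction algorithm is not specified in the source. As a minimal sketch, a least-squares straight line is fitted to the history and extrapolated over the prediction horizon; the result is treated as abnormal when the predicted reachable value exceeds the capacity threshold, which matches the Fig. 6 example. The linear model and all numbers are assumptions.

```python
def fit_line(ys):
    """Ordinary least-squares fit of y = slope * x + intercept for x = 0..n-1."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def reachable_value(ys, horizon):
    """Largest fitted value over the next `horizon` steps."""
    slope, intercept = fit_line(ys)
    last = len(ys) - 1
    return max(slope * (last + h) + intercept for h in range(1, horizon + 1))


def first_result(ys, horizon, capacity_threshold):
    """True (abnormal) if the reachable value exceeds the capacity threshold."""
    return reachable_value(ys, horizon) > capacity_threshold


# Hypothetical daily memory usage in percent, growing by one point per day.
memory = [50.0 + day for day in range(10)]
```

A real deployment would likely need a model that handles seasonality; the straight line is only the simplest instance of extrapolating a trend.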
S504, extracting time sequence index features of the target data to obtain time sequence index features in the current time period, and comparing the time sequence index features in the current time period with the time sequence index features in the previous time period by using a comparison algorithm to obtain a comparison result.
In this embodiment, the time sequence index feature extraction is performed on the target data to obtain the time sequence index feature in the current time period, and specifically, an average value, a maximum value, or a minimum value of the system index included in each target data in the current time period is calculated.
In this embodiment, a comparison algorithm is used to compare the time sequence index characteristic in the current time period with the time sequence index characteristic in the previous time period, if a difference between the time sequence index characteristic in the current time period and the time sequence index characteristic in the previous time period is greater than a preset threshold, a comparison result representing that a system performance risk exists is generated, and if the difference between the time sequence index characteristic in the current time period and the time sequence index characteristic in the previous time period is not greater than the preset threshold, a comparison result representing that the system performance is normal is generated.
For example, referring to fig. 7, a in fig. 7 is an index trend graph of a previous time period, and b in fig. 7 is an index trend graph of a current time period, and it can be seen from the graphs that an average value of the system index of the current time period is greater than an average value of the system index of the previous time period and differs by more than a preset threshold, and therefore, there may be a situation of system performance abnormality.
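The comparison step can be sketched directly from the description: extract the average, maximum and minimum of each period, then compare one feature across the two periods against a preset threshold. Using the absolute difference is an assumption; the text only says "difference". The sample values are hypothetical.

```python
from statistics import mean


def extract_features(values):
    """Time sequence index features named in the text: average, maximum, minimum."""
    return {"avg": mean(values), "max": max(values), "min": min(values)}


def compare_periods(current, previous, threshold, feature="avg"):
    """Return 'risk' if the feature differs by more than `threshold`, else 'normal'."""
    diff = abs(extract_features(current)[feature] - extract_features(previous)[feature])
    return "risk" if diff > threshold else "normal"


previous_period = [40, 42, 41, 39]
shifted_period = [60, 62, 61, 59]
steady_period = [41, 43, 42, 40]
```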
And S505, identifying a system index mutation mode in the current time period based on the target data, and obtaining a second result of whether the system index mutation mode in the current time period is abnormal or not by utilizing an inverse rule algorithm based on the system index mutation mode in the current time period and a preset historical system index mutation mode.
In this embodiment, based on the target data, the system index mutation mode of each system index in the current time period is identified. The system index mutation modes include slow rise, slow fall, sudden increase, sudden decrease, hold after sudden increase, and hold after sudden decrease. For example, referring to fig. 8, a in fig. 8 is the slow rise mode, b in fig. 8 is the slow fall mode, c in fig. 8 is the sudden increase mode, d in fig. 8 is the sudden decrease mode, e in fig. 8 is the hold after sudden increase mode, and f in fig. 8 is the hold after sudden decrease mode.
In this embodiment, the historical system index mutation mode of each system index is preset based on historical data.
In this embodiment, for each system index, based on a system index mutation mode of the system index in a current time period, a preset historical system index mutation mode corresponding to the system index is obtained, whether the system index mutation mode is an inverse mode of the historical system index mutation mode is determined, if yes, a second result representing that the system index mutation mode in the current time period is abnormal is generated, and if not, a second result representing that the system index mutation mode in the current time period is not abnormal is generated.
Whether the system index mutation mode is the inverse mode of the historical system index mutation mode is judged as follows: for example, if the historical system index mutation mode is slow rise and the system index mutation mode in the current time period is slow fall, the system index mutation mode is determined to be the inverse mode of the historical system index mutation mode. That is, slow rise and slow fall are inverse modes of each other, sudden increase and sudden decrease are inverse modes of each other, and hold after sudden increase and hold after sudden decrease are inverse modes of each other.
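The inverse rule check in S505 amounts to a lookup in a fixed table of opposite modes. The sketch below assumes the mutation mode has already been identified (the text does not say how), and the string labels and the function name `is_abnormal_pattern` are hypothetical.

```python
# Each mutation mode mapped to its inverse mode, as listed in the text.
INVERSE_PATTERN = {
    "slow_rise": "slow_fall",
    "slow_fall": "slow_rise",
    "sudden_increase": "sudden_decrease",
    "sudden_decrease": "sudden_increase",
    "hold_after_sudden_increase": "hold_after_sudden_decrease",
    "hold_after_sudden_decrease": "hold_after_sudden_increase",
}


def is_abnormal_pattern(current_pattern, historical_pattern):
    """Return True when the current mode is the inverse of the historical mode."""
    if historical_pattern not in INVERSE_PATTERN:
        # Assumption: an unknown historical mode is not treated as abnormal.
        return False
    return INVERSE_PATTERN[historical_pattern] == current_pattern
```

A system index whose historical mode is slow rise but which is currently in slow fall would be flagged, while the same mode in both periods would not.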
S506, forming the rule inspection result, the outlier risk result, the first result, the comparison result and the second result into the anomaly detection result of the service system.
In this embodiment, the rule inspection result, the outlier risk result, the first result, the comparison result, and the second result form an anomaly detection result of the service system.
The system anomaly detection method provided by the embodiment of the application performs data processing in advance on system related data of a service system acquired in real time to obtain target data, and stores the target data in the database of the big data platform. The target data pre-stored in the database of the big data platform is then acquired according to a preset time period, and anomaly detection is performed on the target data by using expert rules and the outlier algorithm, trend prediction algorithm, comparison algorithm and inverse rule algorithm included in the algorithm library to obtain an anomaly detection result. Because the big data platform and the service system to be detected are deployed in different servers, the anomaly detection of the service system is performed in the big data platform and does not increase the load of the service system. Because the target data is checked by the expert rules, the outlier algorithm, the trend prediction algorithm, the comparison algorithm and the inverse rule algorithm, the accuracy of anomaly detection is improved compared with simple script logic.
Referring to fig. 9, in the present embodiment, after step S102, the method for detecting system abnormality according to the embodiment of the present application may further include the following steps:
S901, calling a preset report template based on the anomaly detection result, and generating an inspection report of the service system.
In this embodiment, a report template is preset. The preset report template is called based on the anomaly detection result to generate an inspection report of the service system. Specifically, the attributes included in the report template are determined, the anomaly detection result is parsed, the data item corresponding to each attribute is extracted, and the data item corresponding to each attribute is written into the data writing position corresponding to that attribute in the report template, so that the inspection report of the service system is obtained.
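The template-filling procedure of S901 can be illustrated with a plain format string. This is a sketch under assumptions: the template text, the attribute names, the "n/a" placeholder for missing data items, and the function name `generate_report` are all hypothetical, and the anomaly detection result is assumed to be a flat dictionary.

```python
from string import Formatter

# Hypothetical report template; each {name} is an attribute with a write position.
REPORT_TEMPLATE = (
    "Inspection report: {system}\n"
    "Rule inspection: {rule_result}\n"
    "Comparison: {comparison_result}\n"
)


def generate_report(template, anomaly_result):
    """Fill every attribute of the template from the anomaly detection result."""
    # Determine the attributes the template contains.
    attributes = [name for _, name, _, _ in Formatter().parse(template) if name]
    # Extract the data item for each attribute; "n/a" marks a missing item.
    data_items = {attr: anomaly_result.get(attr, "n/a") for attr in attributes}
    # Write each data item into its position in the template.
    return template.format(**data_items)
```

Entries of the result that the template does not name are simply ignored, so the same result can feed templates with different attribute sets.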
S902, judging whether the anomaly detection result includes a risk item; if so, executing S903, and if not, ending directly.
In this embodiment, it is determined whether the anomaly detection result includes a risk item, where the risk item is a data item included in the anomaly detection result that indicates that an anomaly risk exists.
In this embodiment, if the anomaly detection result does not include the risk item, the process is terminated directly.
S903, pushing the risk items to an administrator terminal, and generating processing tracking information of the risk items.
In this embodiment, if the anomaly detection result includes a risk item, the risk item is pushed to the administrator terminal, so that the administrator can process the risk item.
In this embodiment, after the risk items are pushed to the administrator terminal, processing tracking information of the risk items is generated to monitor the processing progress of the risk items.
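Steps S902 and S903 can be sketched as below. The representation is an assumption: the anomaly detection result is taken to be a dictionary whose values are "risk" or "normal", the push channel is passed in as a callable, and `TrackingRecord`, `find_risk_items` and `handle_result` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TrackingRecord:
    """Processing tracking information for one pushed risk item."""
    risk_item: str
    status: str = "pushed"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def find_risk_items(anomaly_result):
    """Return the names of the data items that indicate an anomaly risk."""
    return [name for name, outcome in anomaly_result.items() if outcome == "risk"]


def handle_result(anomaly_result, push):
    """Push each risk item to the administrator terminal and start tracking it.

    Returns an empty list when there is no risk item, which corresponds to
    ending directly in S902.
    """
    records = []
    for item in find_risk_items(anomaly_result):
        push(item)  # e.g. a message-queue or notification client in practice
        records.append(TrackingRecord(item))
    return records
```

Keeping the push channel as a parameter lets the same logic be exercised with a plain list in tests and with a real notification client in deployment.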
According to the system anomaly detection method provided by the embodiment of the application, the report template is called to generate the inspection report of the service system, which improves the efficiency of generating the inspection report. The risk item is pushed to the administrator terminal so that the administrator can process the risk item in time, and after the risk item is pushed to the administrator terminal, the processing tracking information of the risk item is generated so that the processing progress of the risk item can be monitored.
It should be noted that while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous.
It should be understood that the various steps recited in the method embodiments disclosed herein may be performed in a different order and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the disclosure is not limited in this respect.
Corresponding to the method described in fig. 1, an embodiment of the present application further provides a system anomaly detection apparatus, which is applied to a big data platform, and is used for implementing the method in fig. 1 specifically, where a schematic structural diagram of the apparatus is shown in fig. 10, and specifically includes:
an obtaining unit 1001, configured to obtain target data pre-stored in a database of the big data platform according to a preset time period; the target data is obtained by data processing based on system-related data of a service system acquired in real time;
the inspection unit 1002 is configured to perform anomaly inspection on the target data by using expert rules and preset algorithms included in the algorithm library to obtain an anomaly inspection result; the algorithm library comprises an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm.
In the system anomaly detection device provided by the embodiment of the application, because the big data platform and the service system to be detected are deployed in different servers, the anomaly detection is performed on the service system in the big data platform, the load of the service system cannot be increased, and the anomaly detection is performed on the target data through an expert rule, an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm, so that the accuracy of the anomaly detection is improved compared with simple script logic.
In an embodiment of the present application, based on the foregoing scheme, when the target data is stored in the database in advance, the obtaining unit 1001 is specifically configured to:
collecting system related data of a service system in real time, and storing the collected system related data into a data resource pool of the big data resource platform;
acquiring system related data stored in the data resource pool;
and carrying out data processing on the system related data to obtain target data, and storing the target data into a database.
In an embodiment of the present application, based on the foregoing scheme, when the obtaining unit 1001 performs data processing on the system-related data to obtain target data, specifically configured to:
performing data quality inspection on the system-related data;
after the system related data passes the data quality verification, performing data relation association on each system related data passing the data quality verification according to a preset association rule;
performing data content fusion on the system related data after the data relation association;
and carrying out format standardization processing on the system related data after data content fusion to obtain target data of the system related data.
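The four processing stages listed above (quality check, relation association, content fusion, format standardization) might look like the following. Everything concrete here is an assumption made for illustration: the record fields `host`, `metric` and `value`, the quality criteria, association by host as the "preset association rule", and the function names.

```python
def passes_quality_check(record):
    """Assumed quality rule: host and metric present, value numeric."""
    return (
        record.get("host") is not None
        and isinstance(record.get("metric"), str)
        and isinstance(record.get("value"), (int, float))
    )


def process(records):
    """Turn raw system related data into target data, one entry per host."""
    # 1. Data quality check: keep only records that pass.
    valid = [r for r in records if passes_quality_check(r)]

    # 2. Relation association: group records by host (assumed rule).
    by_host = {}
    for r in valid:
        by_host.setdefault(r["host"], []).append(r)

    # 3 and 4. Content fusion into one record per host, with metric names
    # and values standardized to lower-case keys and floats.
    targets = []
    for host, group in by_host.items():
        fused = {"host": host}
        for r in group:
            fused[r["metric"].strip().lower()] = float(r["value"])
        targets.append(fused)
    return targets
```

With this layout each target record carries all indexes of one server, which is the shape the later checks in this sketch series assume.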
In an embodiment of the present application, based on the foregoing scheme, the inspection unit 1002 is specifically configured to:
carrying out rule inspection on the target data by using an expert rule to obtain a rule inspection result;
performing outlier risk analysis on system indexes in the target data by using an outlier algorithm to obtain an outlier risk result of a server deploying the service system;
based on the target data, a trend prediction algorithm is utilized to predict the system index reachable value of the service system, and based on the system index reachable value and a preset capacity threshold value, a first result of whether the system index of the service system is abnormal is obtained;
performing time sequence index feature extraction on the target data to obtain a time sequence index feature in the current time period, and comparing the time sequence index feature in the current time period with the time sequence index feature in the previous time period by using a comparison algorithm to obtain a comparison result;
identifying a system index mutation mode in the current time period based on the target data, and obtaining a second result whether the system index mutation mode in the current time period is abnormal or not by utilizing an inverse rule algorithm based on the system index mutation mode in the current time period and a preset historical system index mutation mode;
and forming the rule inspection result, the outlier risk result, the first result, the comparison result and the second result into an abnormal inspection result of the business system.
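One way to picture the inspection unit is as a loop over a registry of checks whose outcomes are collected into a single anomaly detection result. The sketch below is not the applicant's implementation: the two example checks, their field names and thresholds, and the names `run_inspection` and `EXAMPLE_CHECKS` are hypothetical stand-ins for an expert rule and a capacity comparison.

```python
from typing import Any, Callable, Dict

Check = Callable[[Dict[str, Any]], str]


def run_inspection(target_data: Dict[str, Any], checks: Dict[str, Check]) -> Dict[str, str]:
    """Apply every registered check to the target data and collect the outcomes."""
    return {name: check(target_data) for name, check in checks.items()}


def disk_rule(data):
    """Hypothetical expert rule: disk usage above 90 percent is a risk."""
    return "risk" if data["disk_used_pct"] > 90 else "normal"


def cpu_capacity(data):
    """Hypothetical capacity check: forecast value against a capacity threshold."""
    return "risk" if data["cpu_forecast_pct"] > data["cpu_capacity_pct"] else "normal"


EXAMPLE_CHECKS = {"rule_inspection": disk_rule, "first_result": cpu_capacity}
```

Registering checks by name keeps the composition step trivial and lets new algorithms be added to the library without changing the loop.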
In an embodiment of the present application, based on the foregoing scheme, the apparatus may further include:
a generating unit, configured to call a preset report template based on the anomaly detection result to generate an inspection report of the service system;
a pushing unit, configured to push the risk item to an administrator terminal if the anomaly detection result includes a risk item, and to generate processing tracking information of the risk item; wherein the risk item is a data item included in the anomaly detection result that indicates that an anomaly risk exists.
The embodiment of the present application further provides a storage medium, where an instruction set is stored in the storage medium, and when the instruction set runs, the system anomaly detection method disclosed in any of the above embodiments is executed.
An electronic device is further provided in an embodiment of the present application, and a schematic structural diagram of the electronic device is shown in fig. 11, and specifically includes a memory 1101 for storing at least one set of instruction sets; a processor 1102 configured to execute the instruction set stored in the memory, and implement the system anomaly detection method disclosed in any of the above embodiments by executing the instruction set.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
While several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
The foregoing description is only illustrative of the preferred embodiments disclosed herein and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other combinations of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (10)

1. A system anomaly detection method is applied to a big data platform, wherein the big data platform and a service system to be detected are deployed in different servers, and the method comprises the following steps:
acquiring target data prestored in a database of the big data platform according to a preset time period; the target data is obtained by data processing based on system-related data of a service system acquired in real time;
performing anomaly check on the target data by using the expert rules and each preset algorithm included in the algorithm library to obtain an anomaly check result; the algorithm library comprises an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm.
2. The method of claim 1, wherein the pre-storing of the target data into a database comprises:
collecting system related data of a service system in real time, and storing the collected system related data into a data resource pool of the big data resource platform;
acquiring system related data stored in the data resource pool;
and carrying out data processing on the system related data to obtain target data, and storing the target data into a database.
3. The method of claim 2, wherein the performing data processing on the system-related data to obtain target data comprises:
performing data quality inspection on the system-related data;
after the system related data passes the data quality verification, performing data relation association on each system related data passing the data quality verification according to a preset association rule;
performing data content fusion on the system related data after the data relation association;
and carrying out format standardization processing on the system related data after the data content fusion to obtain target data of the system related data.
4. The method according to claim 1, wherein the performing an anomaly check on the target data by using the expert rules and each preset algorithm included in the algorithm library to obtain an anomaly check result comprises:
carrying out rule inspection on the target data by using an expert rule to obtain a rule inspection result;
performing outlier risk analysis on system indexes in the target data by using an outlier algorithm to obtain an outlier risk result of a server deploying the service system;
based on the target data, a trend prediction algorithm is utilized to predict the system index reachable value of the service system, and based on the system index reachable value and a preset capacity threshold value, a first result of whether the system index of the service system is abnormal is obtained;
performing time sequence index feature extraction on the target data to obtain a time sequence index feature in the current time period, and comparing the time sequence index feature in the current time period with the time sequence index feature in the previous time period by using a comparison algorithm to obtain a comparison result;
identifying a system index mutation mode in the current time period based on the target data, and obtaining a second result whether the system index mutation mode in the current time period is abnormal or not by utilizing an inverse rule algorithm based on the system index mutation mode in the current time period and a preset historical system index mutation mode;
and forming the rule inspection result, the outlier risk result, the first result, the comparison result and the second result into an abnormal inspection result of the business system.
5. The method according to claim 1, wherein the performing the anomaly check on the target data by using the expert rules and each preset algorithm included in the algorithm library further comprises, after obtaining an anomaly check result:
calling a preset report template based on the abnormal checking result to generate a checking report of the service system;
if the abnormal inspection result comprises a risk item, pushing the risk item to a manager terminal, and generating processing tracking information of the risk item; wherein the risk item is a data item which is included in the abnormal inspection result and is used for representing that the abnormal risk exists.
6. A system anomaly detection device, applied to a big data platform, wherein the big data platform and a service system to be detected are deployed in different servers, and the device comprises:
the acquisition unit is used for acquiring target data prestored in a database of the big data platform according to a preset time period; the target data is obtained by data processing based on system related data of a service system acquired in real time;
the inspection unit is used for carrying out abnormal inspection on the target data by utilizing the expert rules and each preset algorithm included in the algorithm library to obtain an abnormal inspection result; the algorithm library comprises an outlier algorithm, a trend prediction algorithm, a comparison algorithm and an inverse rule algorithm.
7. The apparatus according to claim 6, wherein the obtaining unit is specifically configured to, during a process of pre-storing the target data in the database:
collecting system related data of a service system in real time, and storing the collected system related data into a data resource pool of the big data resource platform;
acquiring system related data stored in the data resource pool;
and carrying out data processing on the system related data to obtain target data, and storing the target data into a database.
8. The apparatus according to claim 7, wherein the obtaining unit, when performing data processing on the system-related data to obtain target data, is specifically configured to:
performing data quality inspection on the system-related data;
after the system related data passes the data quality verification, performing data relation association on each system related data passing the data quality verification according to a preset association rule;
performing data content fusion on the system related data after the data relation association;
and carrying out format standardization processing on the system related data after the data content fusion to obtain target data of the system related data.
9. A storage medium storing a set of instructions, wherein the set of instructions, when executed by a processor, implement the system anomaly detection method according to any one of claims 1-5.
10. An electronic device, comprising:
a memory for storing at least one set of instructions;
a processor for executing a set of instructions stored in said memory, said set of instructions being executable to implement the system anomaly detection method of any one of claims 1-5.
CN202210722199.3A 2022-06-24 2022-06-24 System anomaly detection method and device, storage medium and electronic equipment Pending CN115098326A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210722199.3A CN115098326A (en) 2022-06-24 2022-06-24 System anomaly detection method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210722199.3A CN115098326A (en) 2022-06-24 2022-06-24 System anomaly detection method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115098326A true CN115098326A (en) 2022-09-23

Family

ID=83292859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210722199.3A Pending CN115098326A (en) 2022-06-24 2022-06-24 System anomaly detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115098326A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116431341A (en) * 2023-03-30 2023-07-14 浙江大学 Resource specification adjustment method, device and storage medium

Similar Documents

Publication Publication Date Title
US10210036B2 (en) Time series metric data modeling and prediction
US10558545B2 (en) Multiple modeling paradigm for predictive analytics
US20170322120A1 (en) Fault detection using event-based predictive models
Jiang et al. Efficient fault detection and diagnosis in complex software systems with information-theoretic monitoring
JP2018045403A (en) Abnormality detection system and abnormality detection method
CN105183625A (en) Log data processing method and apparatus
Fu et al. A hybrid anomaly detection framework in cloud computing using one-class and two-class support vector machines
CN105743730A (en) Method and system used for providing real-time monitoring for webpage service of mobile terminal
US20210366268A1 (en) Automatic tuning of incident noise
US20150326446A1 (en) Automatic alert generation
US9489379B1 (en) Predicting data unavailability and data loss events in large database systems
US20180143897A1 (en) Determining idle testing periods
US10929258B1 (en) Method and system for model-based event-driven anomalous behavior detection
CN115421950B (en) Automatic system operation and maintenance management method and system based on machine learning
CN114398354A (en) Data monitoring method and device, electronic equipment and storage medium
US20190354991A1 (en) System and method for managing service requests
CN112988509A (en) Alarm message filtering method and device, electronic equipment and storage medium
CN115098326A (en) System anomaly detection method and device, storage medium and electronic equipment
KR20210083418A (en) Fire predictive analysis device and method of building
KR102372958B1 (en) Method and device for monitoring application performance in multi-cloud environment
CN113254153A (en) Process task processing method and device, computer equipment and storage medium
CN110677271B (en) Big data alarm method, device, equipment and storage medium based on ELK
CN116755974A (en) Cloud computing platform operation and maintenance method and device, electronic equipment and storage medium
WO2023224764A1 (en) Multi-modality root cause localization for cloud computing systems
US11102091B2 (en) Analyzing SCADA systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination