CN116701033A - Host switching abnormality detection method, device, computer equipment and storage medium - Google Patents

Host switching abnormality detection method, device, computer equipment and storage medium

Info

Publication number
CN116701033A
Authority
CN
China
Prior art keywords
anomaly
type
abnormality
data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310668698.3A
Other languages
Chinese (zh)
Inventor
闫美阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310668698.3A priority Critical patent/CN116701033A/en
Publication of CN116701033A publication Critical patent/CN116701033A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079Root cause analysis, i.e. error or fault diagnosis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a host switching abnormality detection method, a device, computer equipment, a storage medium and a computer program product. It belongs to the technical field of intelligent operation and maintenance and can be used in the financial technology field or other fields. The method comprises the following steps: acquiring operation data of a target host during a host switching process; performing abnormality detection on the operation data by adopting a preset abnormality detection strategy to obtain an abnormality score corresponding to each preset abnormality type; and, based on the abnormality scores corresponding to the abnormality types, determining a target abnormality type from the abnormality types by adopting a preset attribution strategy, and obtaining a target abnormality detection result based on the target abnormality type. The method can improve abnormality detection efficiency.

Description

Host switching abnormality detection method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of intelligent operation and maintenance technologies, and in particular, to a method and apparatus for detecting a host switching abnormality, a computer device, a storage medium, and a computer program product.
Background
To ensure the security of client information and the continuity of enterprise business, multi-center host architectures have become a trend. Under a multi-center host architecture, same-city and remote disaster recovery modes have become the main emergency measures against disasters, and host dual-active switching is the core solution for host disaster recovery. Host dual-active switching involves the operation of various host software such as the operating system, databases, middleware and replication links. If any of this software runs into a problem, the dual-active switching may fail, or services may not be provided normally after switching. It is therefore necessary to find and resolve abnormal problems during switching in a timely manner.
In the related art, operation and maintenance personnel usually check the running condition of each piece of software involved in the host switching process one by one to judge whether an abnormality has occurred. However, manual checking is inefficient and can hardly meet the real-time requirement of host dual-active switching.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a host switching abnormality detection method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the abnormality detection efficiency.
In a first aspect, the present application provides a method for detecting a host switching abnormality. The method comprises the following steps:
acquiring operation data of a target host in a host switching process;
performing anomaly detection on the operation data by adopting a preset anomaly detection strategy to obtain anomaly scores corresponding to preset anomaly types;
and determining a target abnormality type from the abnormality types by adopting a preset attribution strategy based on the abnormality score corresponding to each abnormality type, and obtaining a target abnormality detection result based on the target abnormality type.
In one embodiment, the operation data includes data of a plurality of data types, and performing anomaly detection on the operation data by adopting a preset anomaly detection strategy to obtain anomaly scores corresponding to preset anomaly types comprises:
for the operation data of each data type, performing anomaly detection on the operation data of that data type by adopting the anomaly detection strategy corresponding to the data type, to obtain an anomaly score corresponding to a preset anomaly type.
In one embodiment, the data types include a functional type, and performing anomaly detection on the operation data of each data type by adopting the anomaly detection strategy corresponding to the data type to obtain an anomaly score corresponding to a preset anomaly type comprises:
determining the current state of each target software according to the software start state information contained in the functional-type operation data, the current state being either a started state or a not-started state;
and determining an anomaly score corresponding to the preset anomaly type according to the current state of each target software.
In one embodiment, the data types include a performance type and a log type, and performing anomaly detection on the operation data of each data type by adopting the anomaly detection strategy corresponding to the data type to obtain an anomaly score corresponding to a preset anomaly type comprises:
Inputting the performance type operation data into a pre-trained performance data anomaly detection model to obtain a first anomaly detection result;
inputting the log type operation data into a pre-trained log data anomaly detection model to obtain a second anomaly detection result;
and determining an abnormality score corresponding to a preset abnormality type based on the first abnormality detection result and the second abnormality detection result.
In one embodiment, the determining, based on the anomaly score corresponding to each anomaly type, the target anomaly type from each anomaly type by using a preset attribution policy includes:
sorting the anomaly types by anomaly score in descending order, and determining the top first target number of anomaly types as candidate anomaly types;
obtaining a score for each candidate anomaly type according to the anomaly score corresponding to the candidate anomaly type and the target weight corresponding to the candidate anomaly type;
and sorting the candidate anomaly types by score in descending order, and determining the top second target number of candidate anomaly types as target anomaly types.
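The two-stage ranking described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the text does not say how the anomaly score and the target weight are combined, so the product used here is an assumption, and the function name, parameter names and sample anomaly type names are hypothetical.

```python
def attribute_root_causes(scores, weights, first_n, second_n):
    """Pick target anomaly types by two successive top-N rankings.

    scores  -- anomaly score per anomaly type
    weights -- target weight per anomaly type (missing types default to 1.0)
    """
    # Stage 1: keep the first_n anomaly types with the highest raw scores.
    candidates = sorted(scores, key=scores.get, reverse=True)[:first_n]
    # Stage 2: re-score each candidate; score * weight is an assumed rule.
    weighted = {name: scores[name] * weights.get(name, 1.0) for name in candidates}
    return sorted(weighted, key=weighted.get, reverse=True)[:second_n]


sample_scores = {"db_start": 0.9, "cpu": 0.8, "mem": 0.3, "gateway": 0.7}
sample_weights = {"db_start": 0.2, "cpu": 0.5, "mem": 0.9, "gateway": 0.3}
targets = attribute_root_causes(sample_scores, sample_weights, first_n=3, second_n=2)
```

With these sample values, "mem" is dropped in the first stage despite its high weight, which shows why the size of the first target number matters.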
In one embodiment, the determining the target weight includes:
Acquiring comparison data of each abnormal type;
generating a comparison matrix of each anomaly type according to the comparison data of each anomaly type;
determining a first target comparison matrix meeting consistency check conditions in the comparison matrixes of the abnormal types;
and calculating the target weight of each anomaly type according to the first target comparison matrix and the weight calculation rule of the analytic hierarchy process.
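The weight determination steps above can be sketched with a standard analytic hierarchy process calculation. The text does not specify which weight calculation rule or consistency condition is used, so this sketch assumes the row geometric mean approximation for the weights and the conventional check that the consistency ratio is below 0.1, using Saaty's random index values. All function names are hypothetical.

```python
import math

# Saaty's random consistency index for matrix orders 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate AHP weights by normalized row geometric means."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix, weights):
    """Return CR = CI / RI, with lambda_max estimated from A * w."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # matrices of order 1 or 2 are always consistent
    lam = sum(
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ) / n
    ci = (lam - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def first_consistent(matrices, threshold=0.1):
    """Return the first comparison matrix passing the check, with its weights."""
    for matrix in matrices:
        weights = ahp_weights(matrix)
        if consistency_ratio(matrix, weights) < threshold:
            return matrix, weights
    return None


inconsistent = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]
consistent = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
chosen, target_weights = first_consistent([inconsistent, consistent])
```

Here the first matrix is rejected because its pairwise judgments contradict each other, and the second yields weights of roughly 4/7, 2/7 and 1/7.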
In one embodiment, the operation data includes operation data in a plurality of detection periods, and the anomaly scores include the anomaly score corresponding to each anomaly type in each detection period. After performing anomaly detection on the operation data by adopting a preset anomaly detection strategy to obtain anomaly scores corresponding to preset anomaly types, the method further comprises:
for each detection period, sequencing the abnormality types in the detection period according to the abnormality score from large to small, and determining a first target number of candidate abnormality types sequenced in front;
obtaining a target index corresponding to each candidate abnormality type based on the abnormality score corresponding to each candidate abnormality type, wherein the target index comprises a functional index and a performance index;
Obtaining a correction index corresponding to the detection period according to the target index corresponding to each candidate abnormal type in the detection period;
sequencing the correction indexes corresponding to the detection periods from large to small, and determining a second target number of target correction indexes sequenced in front;
and determining each candidate abnormal type in the detection period corresponding to the target correction index as a type to be corrected, and outputting correction information based on the type to be corrected.
In a second aspect, the application further provides a device for detecting abnormal switching of the host. The device comprises:
the acquisition module is used for acquiring the operation data of the target host in the host switching process;
the detection module is used for carrying out abnormality detection on the operation data by adopting a preset abnormality detection strategy to obtain an abnormality score corresponding to a preset abnormality type;
the first determining module is used for determining a target abnormality type from the abnormality types by adopting a preset attribution strategy based on the abnormality score corresponding to each abnormality type, and obtaining a target abnormality detection result based on the target abnormality type.
In one embodiment, the operational data includes data of a plurality of data types; the detection module is specifically used for:
And aiming at the operation data of each data type, adopting an abnormality detection strategy corresponding to the data type to perform abnormality detection on the operation data of the data type to obtain an abnormality score corresponding to a preset abnormality type.
In one embodiment, the data type includes a functional type; the detection module is specifically used for:
determining the current state of each target software according to the software start state information contained in the functional-type operation data, the current state being either a started state or a not-started state; and determining an anomaly score corresponding to the preset anomaly type according to the current state of each target software.
In one embodiment, the data types include performance types and log types; the detection module is specifically used for:
inputting the performance type operation data into a pre-trained performance data anomaly detection model to obtain a first anomaly detection result; inputting the log type operation data into a pre-trained log data anomaly detection model to obtain a second anomaly detection result; and determining an abnormality score corresponding to a preset abnormality type based on the first abnormality detection result and the second abnormality detection result.
In one embodiment, the first determining module is specifically configured to:
sorting the anomaly types by anomaly score in descending order, and determining the top first target number of anomaly types as candidate anomaly types; obtaining a score for each candidate anomaly type according to the anomaly score corresponding to the candidate anomaly type and the target weight corresponding to the candidate anomaly type; and sorting the candidate anomaly types by score in descending order, and determining the top second target number of candidate anomaly types as target anomaly types.
In one embodiment, the apparatus further includes a weight determining module, configured to obtain comparison data of each of the anomaly types; generating a comparison matrix of each anomaly type according to the comparison data of each anomaly type; determining a first target comparison matrix meeting consistency check conditions in the comparison matrixes of the abnormal types; and calculating the target weight of each anomaly type according to the first target comparison matrix and the weight calculation rule of the analytic hierarchy process.
In one embodiment, the operational data includes operational data over a plurality of detection periods; the anomaly score comprises anomaly scores corresponding to the anomaly types in the detection periods; the apparatus further comprises:
The second determining module is used for sequencing the abnormality types in the detection period according to the abnormality score from large to small for each detection period, and determining a first target number of candidate abnormality types sequenced in front;
the third determining module is used for obtaining a target index corresponding to each candidate abnormal type based on the abnormal score corresponding to each candidate abnormal type, wherein the target index comprises a functional index and a performance index;
a fourth determining module, configured to obtain a correction index corresponding to the detection period according to the target index corresponding to each candidate abnormality type in the detection period;
a fifth determining module, configured to sort the correction indexes corresponding to the detection periods from large to small, and determine a second target number of target correction indexes that are sorted in front;
and the output module is used for determining each candidate abnormal type in the detection period corresponding to the target correction index as a type to be corrected and outputting correction information based on the type to be corrected.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method of the first aspect when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
According to the above method, device, computer equipment, storage medium and computer program product for detecting host switching abnormality, the operation data generated during the host switching process is acquired, abnormality detection is performed on the operation data to obtain an abnormality score for each abnormality type, and the target abnormality type (the root cause problem) is then determined from the abnormality types according to those scores, so that an abnormality detection result is obtained. Abnormality detection can therefore be performed in real time on the host's operation data during switching, which improves detection efficiency.
In addition, a host dual-active switching scenario involves the operation of multiple software products that are coupled to and interact with one another. Because the target abnormality type determined from the abnormality scores of the abnormality types can reflect the root cause of the host switching abnormality, a detection result based on the target abnormality type is more accurate. This helps operation and maintenance personnel locate the source of a problem quickly and accurately and improves operation and maintenance efficiency, thereby improving the success rate and efficiency of host dual-active switching.
Drawings
FIG. 1 is an application environment diagram of a host switching anomaly detection method in one example;
FIG. 2 is a flowchart of a method for detecting a host switching abnormality in one embodiment;
FIG. 3 is a flow diagram of determining anomaly scores in one embodiment;
FIG. 4 is a flow diagram of determining a target anomaly type in one embodiment;
FIG. 5 is a flowchart of a method for detecting a host switching abnormality in another embodiment;
FIG. 6 is a block diagram illustrating a host switching abnormality detection apparatus according to an embodiment;
FIG. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Before the technical solutions of the embodiments of the present application are described in detail, the technical background on which they are based is described. To ensure the security of client information and the continuity of enterprise business, multi-center host architectures have become a trend. Under a multi-center host architecture, same-city and remote disaster recovery modes have become the main emergency measures against disasters, and host dual-active switching is the core solution for host disaster recovery. When an enterprise decides to implement changes such as system software upgrades, replacement of the underlying system hardware, or application updates, having no impact on the dual-active switching function is generally treated as the primary condition. Host dual-active switching involves the operation of various host software such as the operating system, databases, middleware and replication links. If any of this software runs into a problem, the dual-active switching may fail, or services may not be provided normally after switching. It is therefore necessary to find and resolve abnormal problems during switching in a timely manner.
In the related art, operation and maintenance personnel usually check the running condition of each piece of software involved in the host switching process one by one to judge whether an abnormality has occurred. However, manual checking is inefficient and can hardly meet the real-time requirement of host dual-active switching. Against this background, the applicant, through long-term research, development and experimental verification, provides the present host switching abnormality detection method. The method performs abnormality detection in real time on the host's operation data during switching, which improves detection efficiency, and the determined target abnormality type can reflect the root cause of the host switching abnormality. This helps operation and maintenance personnel locate the source of a problem quickly and accurately and improves operation and maintenance efficiency, thereby improving the success rate and efficiency of host dual-active switching. In addition, the applicant has made considerable creative effort both in identifying the technical problems addressed by the present application and in arriving at the technical solutions of the following embodiments.
The method for detecting the host switching abnormality provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the primary data center 104 and the backup data center 106 may comprise a plurality of hosts, and the client 102 communicates with each host of the primary data center 104 and the backup data center 106, respectively, through a network. The client 102 may be implemented by various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, portable wearable devices, servers, and other computer devices.
In one embodiment, as shown in fig. 2, a method for detecting a host switching abnormality is provided, and the method is applied to the client in fig. 1 for illustration, and includes the following steps:
Step 201, in the host switching process, the operation data of the target host is obtained.
In implementation, under the dual-active disaster recovery scheme, the main data center and the standby data center generally serve users at the same time and back each other up. The two data centers can perform host dual-active switching as the situation requires. The target host may be a host of the main data center and/or a host of the standby data center. The operation data refers to data generated while the host is running. It may include functional-type data (also referred to as state function data), such as the database start state, gateway start state, middleware start state, dual-active component start state and operating system start state. It may also include performance-type data, such as system CPU performance, system memory performance, online transaction amount, online transaction response time, online transaction success rate, number of batch transaction processes executed, batch transaction process execution efficiency, CPU and memory consumed by the database, CPU and memory consumed by the middleware, CPU and memory consumed by the dual-active component, and the resource weights the operating system allocates to each subsystem. It may further include log-type data, such as state system logs corresponding to the functional-type data and/or performance system logs corresponding to the performance-type data.
In the host switching process, the client can acquire the operation data of the target host in real time. For example, the client may send a request to a host, and the host returns the requested data after receiving the request. In one example, the client may request host data by sending a POST request that carries a grab command, using the host system IP, a login user and a login password, and obtain the host's operation data from the response object. The grab command may be structured as "host data type+grab command corresponding to that type"; for example, the command for grabbing host log data is "system log+getlog". The client may send requests (grab commands) to the host periodically, for example issuing a data grab command every 10 seconds for each host data type. This data transmission mode enables real-time grabbing of host data, meets the timeliness requirement of host dual-active switching detection, saves labor and time, and improves the effectiveness of dual-active switching function tests.
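The periodic grabbing described above might look like the following sketch. Only the "system log+getlog" command appears in the text; the other command names, the payload field names and the injected send callable are hypothetical stand-ins for the real POST request, so that the polling logic can be read without a network dependency.

```python
import time

# Host data type -> grab command. Only "system log" -> "getlog" comes from
# the text; the other two entries are hypothetical placeholders.
GRAB_COMMANDS = {
    "system log": "getlog",
    "status": "getstatus",
    "performance": "getperf",
}


def build_grab_command(data_type):
    """Compose "host data type+grab command corresponding to that type"."""
    return f"{data_type}+{GRAB_COMMANDS[data_type]}"


def poll_host(send, host_ip, user, password, rounds, interval_s=10.0, sleep=time.sleep):
    """Issue one grab command per data type in each round, interval_s apart.

    send(host_ip, payload) is assumed to perform the POST request and return
    the host's response data.
    """
    results = []
    for i in range(rounds):
        for data_type in GRAB_COMMANDS:
            payload = {
                "user": user,
                "password": password,
                "command": build_grab_command(data_type),
            }
            results.append(send(host_ip, payload))
        if i < rounds - 1:
            sleep(interval_s)  # wait before the next polling round
    return results


# Dry run with a fake sender that echoes the command and skips the waiting.
echoed = poll_host(lambda ip, p: p["command"], "10.0.0.1", "ops", "secret",
                   rounds=2, sleep=lambda s: None)
```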
Optionally, to transmit the grabbed data efficiently in real time, the client may create a corresponding topic (TOPIC) for the operation data according to its data type (function type, performance type or log type); for example, the TOPIC of function-type data is "status", the TOPIC of performance-type data is "performance", and the TOPIC of log-type data is "log". The client may store the operation data as hot data in key-value form and transmit it asynchronously and concurrently through the respective KAFKA queues as messages of the form "{data source topic_data type: data content}".
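The per-type routing can be illustrated with the sketch below. It assumes the topic names "status", "performance" and "log" for function-, performance- and log-type data, and message keys of the form topic_source, consistent with keys such as status_database used elsewhere in the description. In-memory queues from the standard library stand in for the KAFKA queues, and the function names are hypothetical.

```python
import queue

# Assumed mapping from data type to topic name.
TOPIC_BY_TYPE = {"function": "status", "performance": "performance", "log": "log"}


def build_message(data_type, source, content):
    """Return (topic, message) with the message as a single key-value pair."""
    topic = TOPIC_BY_TYPE[data_type]
    return topic, {f"{topic}_{source}": content}


def publish(queues, data_type, source, content):
    """Put the message on the queue for its topic, creating the queue if needed."""
    topic, message = build_message(data_type, source, content)
    queues.setdefault(topic, queue.Queue()).put(message)
    return topic


topic_queues = {}
publish(topic_queues, "function", "database", "1672974684470_UP")
publish(topic_queues, "log", "database",
        "1672974684470_PY12_database_database is started successfully")
first_status = topic_queues["status"].get_nowait()
```

Keeping one queue per topic lets consumers for each data type run independently, which is the property the asynchronous transmission relies on.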
Optionally, to improve read/write performance and data availability, after the client obtains the operation data of the host, it may store data from the last three days in a cache module and place data older than three days in a persistent database storage module. The cache module may be a set of high-availability cache nodes with one master and three slaves built on the redis in-memory database, and the persistent database storage module may also use a one-master, three-slave architecture. A zookeeper cluster server platform may be built to monitor the state of the three slave cache devices and the database, and to send node states in real time, in the form of weight parameters, to the master device node (such as the node where the client is located). To improve read/write efficiency and reduce system pressure, a read/write separation strategy may be configured through device parameters: writes are handled by the master device node, and the slave device node that handles each read is designated by the weight parameters. Using the asynchronous processing mechanism of the message channel, the client can acquire different types of data from the various source domains of the host, capture and consume data in real time, and reduce the data delay caused by cross-platform transmission.
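The storage tiering and read/write separation can be sketched as two small routing decisions. The text only says that the read node is designated by weight parameters; choosing the slave with the highest weight is an assumption made here for illustration, and all names are hypothetical.

```python
THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000


def choose_store(record_ts_ms, now_ms):
    """Route data from the last three days to the cache, older data to the database."""
    return "cache" if now_ms - record_ts_ms <= THREE_DAYS_MS else "persistent_db"


def choose_node(operation, slave_weights):
    """Send writes to the master; send reads to the highest-weight slave (assumed rule)."""
    if operation == "write":
        return "master"
    return max(slave_weights, key=slave_weights.get)


now = 1672974684470
recent = choose_store(now - 60_000, now)                # one minute old
old = choose_store(now - 4 * 24 * 60 * 60 * 1000, now)  # four days old
reader = choose_node("read", {"slave1": 0.2, "slave2": 0.7, "slave3": 0.1})
```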
Step 202, performing anomaly detection on the operation data by adopting a preset anomaly detection strategy to obtain anomaly scores corresponding to preset anomaly types.
In implementation, multiple anomaly types (also referred to as abnormal problem types), that is, the types of anomaly that may occur in a host switching process, may be set in advance based on experience. Examples include an operating system start state anomaly, a database start state anomaly, a gateway start state anomaly, a middleware start state anomaly, a dual-active component start state anomaly, a system CPU performance anomaly, a system memory performance anomaly, an online transaction amount anomaly, an online transaction response time anomaly, an online transaction success rate anomaly, an anomaly in the number of batch transaction processes executed, a batch transaction process execution efficiency anomaly, anomalies in the CPU or memory consumed by the database, anomalies in the CPU or memory consumed by the middleware, anomalies in the CPU or memory consumed by the dual-active component, and an anomaly in the resource weights the operating system allocates to each subsystem.
The client can adopt a preset abnormality detection strategy to perform abnormality detection on the operation data to obtain an abnormality score corresponding to a preset abnormality type. For example, the client may input the operation data to a machine learning model trained in advance to perform anomaly detection, obtain prediction probabilities corresponding to respective anomaly types, and use the prediction probabilities as anomaly scores.
Optionally, the historical operation data used for training the anomaly detection model may be formatted data, and operation data of different data types may be formatted in correspondingly different ways. For example, for functional-type data (operation data whose key contains the "status" TOPIC), the value may be filtered into the format "timestamp_software state" (UP or DOWN) using the regular expression "^[0-9]{13}_[U,P,D,O,W,N]{2,4}$" with the re module in python. The "UP" (started) state is then set to "1" and the "DOWN" (stopped) state is set to "0", and the data is finally integrated into a standard format. For example, the data indicating that the database is in the started state at a certain timestamp is formatted as {status_database:1672974684470_1}.
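A minimal sketch of this functional-data formatting step is shown below. It uses a stricter pattern than the one quoted in the text, accepting only the literal states UP and DOWN, and the function name is hypothetical.

```python
import re

# 13-digit millisecond timestamp, an underscore, then UP or DOWN.
STATUS_VALUE = re.compile(r"^([0-9]{13})_(UP|DOWN)$")
STATE_CODE = {"UP": "1", "DOWN": "0"}


def format_status(key, value):
    """Normalize a functional-type record; return None if it does not qualify."""
    if "status" not in key:
        return None  # not functional-type data
    match = STATUS_VALUE.match(value)
    if match is None:
        return None  # value is not in "timestamp_software state" form
    timestamp, state = match.groups()
    return {key: f"{timestamp}_{STATE_CODE[state]}"}


formatted = format_status("status_database", "1672974684470_UP")
```

Records that fail the pattern are dropped rather than repaired, which keeps malformed values out of the training and detection inputs.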
For log-type data (data whose key contains the "log" TOPIC), the value may be filtered into the format "timestamp_system name_product name_message content" using the regular expression "^[0-9]{13}_[\w\d]{4}_[a-zA-Z]{2,8}_(.+)$" with the re module in python. For example, the system log indicating that the database is in the started state at a certain timestamp is formatted as {log_database:1672974684470_PY12_database_database is started successfully}. For performance-type data (data whose key contains the "performance" TOPIC), the value may be filtered into the format "timestamp_performance index_performance data" using the regular expression "^[0-9]{13}_[A-Z]{5,14}_[\d]{1,3}_[\d]{0,3}$" with the re module in python. For example, the performance data indicating that the database occupies 27M of system CPU resources at a certain timestamp is formatted as {performance_database:1672974684470_database CPU_27}.
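The log and performance formats can be parsed along the following lines. The patterns here are illustrative assumptions written to match the two example values given in the text, not the exact expressions of the application, and the function names and output field names are hypothetical.

```python
import re

# timestamp_system name_product name_message content
LOG_VALUE = re.compile(r"^([0-9]{13})_(\w{4})_([A-Za-z]{2,8})_(.+)$")
# timestamp_performance index_performance data (integer or decimal)
PERF_VALUE = re.compile(r"^([0-9]{13})_(.+)_([0-9]+(?:\.[0-9]+)?)$")


def parse_log(value):
    """Split a log-type value into its four fields, or return None."""
    match = LOG_VALUE.match(value)
    if match is None:
        return None
    timestamp, system, product, message = match.groups()
    return {"timestamp": int(timestamp), "system": system,
            "product": product, "message": message}


def parse_performance(value):
    """Split a performance-type value into its three fields, or return None."""
    match = PERF_VALUE.match(value)
    if match is None:
        return None
    timestamp, index, raw = match.groups()
    return {"timestamp": int(timestamp), "index": index, "value": float(raw)}


log_record = parse_log("1672974684470_PY12_database_database is started successfully")
perf_record = parse_performance("1672974684470_database CPU_27")
```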
Correspondingly, after the client acquires the operation data in the host switching process, the operation data can be subjected to corresponding formatting, and then the formatted data is input into an anomaly detection model to obtain anomaly scores of different types.
Step 203, determining a target anomaly type from the anomaly types by adopting a preset attribution strategy based on the anomaly scores corresponding to the anomaly types, and obtaining a target anomaly detection result based on the target anomaly type.
In implementation, the client may attribute the host switching anomaly based on the anomaly score of each anomaly type and determine the target anomaly type from among the anomaly types. The target anomaly type reflects the root cause of the host switching anomaly. Because the software products on the host are coupled and interact with one another, an anomaly at one point on the host usually triggers linked anomalies in several other software products. Determining the target anomaly type that corresponds to the root cause therefore improves problem-solving efficiency and reduces troubleshooting difficulty. The client can then obtain a target anomaly detection result based on the target anomaly type, for example by outputting identification information (such as the name) of the target anomaly type as the target anomaly detection result, so that operation and maintenance personnel can handle the problem promptly and host switching proceeds smoothly.
In the above host switching anomaly detection method, the operation data generated during host switching is acquired and subjected to anomaly detection to obtain the anomaly score of each anomaly type, and the target anomaly type (the root cause anomaly type) is then determined from the anomaly types according to these scores to obtain an anomaly detection result. Anomaly detection can thus be performed in real time on the host's operation data during host switching, which improves detection efficiency. In addition, the host double-activity switching scenario involves multiple software products that are coupled and interact with one another, and the target anomaly type determined from the anomaly scores reflects the root cause of the host switching anomaly, so the detection result obtained from the target anomaly type is more accurate. The method therefore helps operation and maintenance personnel locate the source of a problem quickly and accurately, improves operation and maintenance efficiency, and improves the success rate and efficiency of host double-activity switching.
In one embodiment, the operational data includes data of a plurality of data types. The process in step 202 of performing anomaly detection on the operation data to obtain the anomaly scores specifically includes the following step: for the operation data of each data type, performing anomaly detection using the anomaly detection strategy corresponding to that data type, to obtain the anomaly score corresponding to each preset anomaly type.
In implementations, the client may send requests to the target host to obtain running data of multiple data types, such as functional data, performance data, log data, and so on. The corresponding abnormality detection policy may be set for each data type, so that the client may use the abnormality detection policy corresponding to the data type to perform abnormality detection on the operation data of the data type, to obtain an abnormality score corresponding to the preset abnormality type.
For example, a separate anomaly detection model can be trained for each data type: historical operation data from the previous 7 days whose key contains the corresponding TOPIC can be extracted from a persistent database (a MySQL database) and input into the anomaly detection model for training. The model is then verified (or further trained) with normal-state online data (online data from daily operation, that is, outside the host double-activity switching scenario), so that parameters are tuned, model accuracy is improved, and the final anomaly detection model is obtained. Operation data captured in the host double-activity switching scenario can also be used as test data for the model. The client may then input the operation data of each data type obtained during host switching into the anomaly detection model for that data type to obtain the anomaly score of each anomaly type.
In this embodiment, by setting an anomaly detection policy matching with the characteristics of the data of different data types for the operation data of the different data types, the accuracy of anomaly detection can be improved.
In one embodiment, the data type of the operational data includes a functional type. The process of obtaining the anomaly score corresponding to the preset anomaly type in step 202 specifically includes the following steps: determining the current state of each target software according to the software start-up state information contained in the functional operation data, where the current state is either a started state or a non-started state; and determining the anomaly score corresponding to the preset anomaly type according to the current state of each target software.
In implementation, for functional operation data, the client may identify the software start-up state information contained in the operation data: if the data contains the state information "up" (or "1" after formatting), the software is in the started state; if it contains "down" (or "0" after formatting), the software is in the non-started (stopped) state. For example, in the operation data { status_database:1672974684470_1}, "status" is the TOPIC of the operation data and indicates that its data type is functional, "database" is the software identifier and denotes the database, "1672974684470" is the timestamp, and "1" is the software start-up state information and denotes the started state. The client can thus identify the current state of each target software from the operation data and then determine the anomaly score corresponding to each preset anomaly type according to those states. Target software refers to the software involved in the host double-activity switching process. Target software is expected to be in the started state during host double-activity switching, so if any target software is in the non-started state, the host switching is abnormal. Therefore, if the current state of a target software (such as a gateway) is the non-started state, the anomaly score of the corresponding anomaly type (such as a gateway start-up state anomaly) may be set to a first score, indicating a high anomaly probability; for target software (such as a database) whose current state is the started state, the anomaly score of the corresponding anomaly type (such as a database start-up state anomaly) may be set to a second score, indicating a low anomaly probability. The first score may be greater than the second score, for example a first score of 100 and a second score of 0.
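The scoring rule for functional data can be sketched as follows. The function name, the anomaly type naming scheme and the choice to treat a missing state report as non-started are assumptions made for illustration only.

```python
FIRST_SCORE = 100   # high anomaly probability (example value from the text)
SECOND_SCORE = 0    # low anomaly probability (example value from the text)


def functional_scores(states, target_software):
    """Map each target software to the score of its start-up state anomaly type.

    states: dict of software name -> 1 (started) or 0 (non-started).
    Assumption: software with no reported state is treated as non-started.
    """
    scores = {}
    for name in target_software:
        started = states.get(name) == 1
        scores[name + "_start_state_anomaly"] = SECOND_SCORE if started else FIRST_SCORE
    return scores


example = functional_scores({"database": 1, "gateway": 0},
                            ["database", "gateway", "middleware"])
```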
In this embodiment, for functional operation data (such as a database start state, a gateway start state, a middleware start state, a dual-active component start state, an operating system start state, etc.), an anomaly score of an anomaly type related to the software start state can be rapidly and accurately detected according to software start state information included in the operation data.
In one embodiment, the data types include performance types and log types. As shown in fig. 3, the process of obtaining the anomaly score corresponding to the preset anomaly type in step 202 specifically includes the following steps:
Step 301, inputting the performance type operation data into a pre-trained performance data anomaly detection model to obtain a first anomaly detection result.
In implementation, for performance-type operation data, such as system CPU performance and system memory performance, the client may input the data into the performance data anomaly detection model to obtain a first anomaly detection result. The performance data anomaly detection model may be a model constructed and trained based on a time series prediction algorithm. For example, anomaly detection on performance-type operation data can be implemented with an isolation forest model. During model training, normal time-series performance data from the previous 7 days can be used as the training set, normal-state time-series performance data (daily host operation data, outside the host double-activity switching process) as the validation set, and time-series performance data from the host double-activity switching process as the test set. Performance-type data makes it possible to detect whether the running performance of the target software is abnormal (for example, an abnormal number of executed batch transaction processes, abnormal batch transaction execution efficiency, abnormal database CPU consumption, or abnormal database memory consumption). The performance data anomaly detection model performs anomaly detection on the input performance-type operation data and outputs the anomaly probability of each preset anomaly type. The client may then derive an anomaly score from each anomaly probability (for example, if the anomaly probability of the database CPU consumption anomaly is 0.8, the corresponding anomaly score is 80).
Optionally, a separate performance data anomaly detection model can be trained for each kind of performance data. For example, when training the anomaly detection model for database CPU consumption data, operation data containing "databaseCPU" can be extracted to build the training data set; the structure of such operation data can be as follows: {performance_database:1672974684470_databaseCPU_27}. The model can detect the anomaly probability of the database CPU consumption anomaly. In one example, the relevant parameters of the anomaly detection model for the databaseCPU performance data are set as follows:
number of random trees (n_estimators) = 100 (number of estimators);
number of sub-samples per tree (max_samples) = auto (with "auto", scikit-learn draws min(256, n_samples) samples for each tree);
anomaly ratio (contamination) = 0.1 (ratio of anomalous data to total data);
maximum features (max_features) = 1 (number of features used to train each estimator);
Using the ensemble module of the sklearn library in Python, the program statements are as follows:
from sklearn.ensemble import IsolationForest
model = IsolationForest(n_estimators=100, max_samples='auto', contamination=0.1, max_features=1)
The decision_function method is called to calculate the anomaly score of each data point (in an isolation forest, the score is derived from the average path length needed to isolate the point):
prediction score (prediction_score) = model.decision_function(...)
And according to the prediction probability score, improving classification accuracy through continuous iteration (for example, the iteration number is set to be 100), and obtaining a trained performance data anomaly detection model for anomaly detection of CPU performance data consumed by the database.
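A minimal runnable sketch of this configuration is given below. The training data is synthetic and purely illustrative, random_state is added only for reproducibility, and the mapping of the isolation forest output onto a 0 to 100 scale is an assumption, since the text does not state how the anomaly probability is derived.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic "databaseCPU" readings: 200 normal values around 27 (illustrative only).
train = rng.normal(loc=27.0, scale=2.0, size=(200, 1))

model = IsolationForest(n_estimators=100, max_samples="auto",
                        contamination=0.1, max_features=1, random_state=0)
model.fit(train)

new_points = np.array([[27.0], [95.0]])
decision = model.decision_function(new_points)  # lower values are more anomalous
labels = model.predict(new_points)              # 1 = normal, -1 = anomalous

# Assumed mapping: -score_samples lies in (0, 1], so scale it to 0..100.
scores_0_100 = np.clip(-model.score_samples(new_points) * 100.0, 0.0, 100.0)
```

The reading of 95 lies far outside the training range and is expected to receive a lower decision value and a higher scaled score than the reading of 27.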
Step 302, inputting the log-type operation data into a pre-trained log data anomaly detection model to obtain a second anomaly detection result.
In implementation, log-type operation data generally includes both data that reflects the start-up state of the target software and data that reflects the performance of each target software, so the log data anomaly detection model can use log-type operation data to detect anomaly types related to software state functions and anomaly types related to software performance at the same time. Log data sets are large, log contents are varied, and logs for the same anomaly are highly similar. In view of these characteristics, a Word2Vec machine learning model, which determines the semantic relationships between words from the mathematical relationships between their feature vectors, can be adopted as the base model for constructing the log data anomaly detection model.
In one example, the parameter settings of the log data anomaly detection model are as follows:
dimension of the feature vector (num_features) = 300 (the number of features the model extracts from each input log sentence; too large a value may cause over-fitting, while too small a value reduces the model's detection ability);
word frequency threshold (min_word_count) = 5 (words whose frequency is below the threshold are ignored);
number of parallel threads (num_workers) = 4 (speeds up model training);
window size (context) = 10 (the maximum distance between the current word and the target word in a sentence, determined from the context; 10 means that 10 words before the current word are considered).
Based on the above parameters, the model is defined in the program as follows (in gensim 4.0 and later, the size parameter is named vector_size):
from gensim.models import word2vec
model = word2vec.Word2Vec(sentence, workers=num_workers,
                          size=num_features, min_count=min_word_count,
                          window=context)
The log data anomaly detection model processes the input log-type operation data and outputs whether each state function or performance item of the target software is abnormal or normal. For a state function or performance item in which an anomaly is detected, the anomaly score of the corresponding anomaly type may be set to the first score (e.g., 100); for one in which no anomaly is detected, the anomaly score may be set to the second score (e.g., 0).
Optionally, for the abnormal state function or performance output by the log data abnormal detection model, the client may further perform abnormal keyword recognition on the log-type operation data in which the abnormality is detected. If a related exception occurs during the host switching process, a related field will occur in the log data (e.g. a system CPU performance exception, a "system CPU unusual" field will occur in the log), and the related field may be set as an exception key. If the client identifies the abnormal keyword corresponding to the abnormal type (abnormal output by the model) in the log data, the detection result output by the log data abnormal detection model can be used as a final second abnormal detection result, and if the client does not identify the abnormal keyword corresponding to the abnormal type in the log data, the detection result output by the log data abnormal detection model can be corrected (the abnormal type is corrected to be normal, and the corresponding abnormal score is set as a second score), so as to obtain the final second abnormal detection result.
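The keyword-based correction can be sketched as follows. The keyword table is hypothetical except for "system CPU unusual", which is the example given in the text; the function name and the anomaly type identifiers are likewise illustrative.

```python
FIRST_SCORE, SECOND_SCORE = 100, 0

# Hypothetical keyword table; only "system CPU unusual" comes from the text.
ANOMALY_KEYWORDS = {
    "system_cpu_performance_anomaly": "system CPU unusual",
    "database_start_state_anomaly": "database start failed",
}


def correct_with_keywords(model_flags, log_messages):
    """Keep a model-detected anomaly only if its keyword appears in the logs.

    model_flags: dict of anomaly type -> True if the log model flagged it.
    log_messages: list of log message strings from the same detection window.
    """
    scores = {}
    for anomaly_type, flagged in model_flags.items():
        keyword = ANOMALY_KEYWORDS.get(anomaly_type)
        confirmed = (flagged and keyword is not None
                     and any(keyword in msg for msg in log_messages))
        scores[anomaly_type] = FIRST_SCORE if confirmed else SECOND_SCORE
    return scores


corrected = correct_with_keywords(
    {"system_cpu_performance_anomaly": True, "database_start_state_anomaly": True},
    ["1672974684470 system CPU unusual on node A"],
)
```

In this usage example the database anomaly is corrected to normal because no matching keyword appears in the log messages.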
Step 303, determining an anomaly score corresponding to the preset anomaly type based on the first anomaly detection result and the second anomaly detection result.
In an implementation, the first anomaly detection result may include an anomaly score of a performance-related anomaly type detected from the performance-type data, and the second anomaly detection result may include an anomaly score of a performance-related anomaly type and a functional state detected from the log-type data. For the anomaly type contained in both anomaly detection results, the client may use the greater anomaly score in both anomaly detection results as the anomaly score corresponding to the anomaly type. For example, if the abnormality score of the system CPU performance abnormality is 80 (the predicted abnormality probability is 0.8) based on the performance type data and the system CPU performance abnormality is detected (the abnormality score is 100) based on the log type data, the client may determine the score corresponding to the system CPU performance abnormality as 100. If the abnormality score of the database consumption CPU performance abnormality is detected as 40 (the predicted abnormality probability is 0.4) based on the performance type data, and the database consumption CPU performance abnormality is not detected (the detection is normal, the abnormality score is set as 0) based on the log type data, the client may determine the score corresponding to the database consumption CPU performance abnormality as 40. Thus, the anomaly score corresponding to each preset anomaly type can be obtained.
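The fusion rule of keeping the greater score can be sketched as follows, using the two worked cases from the text; the function name and the anomaly type identifiers are illustrative assumptions.

```python
def fuse_scores(*results):
    """Merge per-type anomaly scores, keeping the greatest score for each type."""
    fused = {}
    for result in results:
        for anomaly_type, score in result.items():
            fused[anomaly_type] = max(score, fused.get(anomaly_type, 0))
    return fused


first_result = {"system_cpu_anomaly": 80, "database_cpu_anomaly": 40}   # performance data
second_result = {"system_cpu_anomaly": 100, "database_cpu_anomaly": 0,  # log data
                 "gateway_start_state_anomaly": 100}
fused = fuse_scores(first_result, second_result)
```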
Optionally, the operation data may include functional data, performance data and log data at the same time, and anomaly detection may be performed on each type of data to obtain anomaly detection results corresponding to the three types of data, and the client may fuse (e.g. keep the highest score or perform weighted average) the anomaly detection results of the three types of data to obtain anomaly scores of each preset anomaly type.
In this embodiment, a performance data anomaly detection model is constructed with a time series prediction algorithm and applied to performance data to obtain detection results for performance-related anomaly types, and a log data anomaly detection model is constructed with a machine learning model for semantic analysis and applied to log data to obtain detection results for both performance-related and functional-state-related anomaly types (all preset anomaly types). The two kinds of detection results are then fused to obtain the anomaly score of each preset anomaly type. Anomaly detection on log-type data is based on semantics and can be corrected by anomaly keyword recognition, so its accuracy is high, and the score of an anomaly type detected from log-type data can be set to a high value (such as 100). However, log-type data is delayed, whereas performance-type data has good real-time behavior; therefore, if an anomaly type is not detected in the log data, the detection result from the performance-type data (and the functional data) prevails. The anomaly scores obtained after fusion are consequently more accurate. In addition, using unsupervised anomaly detection models provides adaptability across time periods and index types: the machine learning detection models fit the normal operating range and locate abnormal intervals without fixed thresholds, which effectively avoids the large number of false alarms produced by traditional threshold-based detection.
In one embodiment, as shown in fig. 4, the process of determining the target anomaly type in step 203 specifically includes the following steps:
Step 401, sorting the anomaly types by anomaly score in descending order, and determining the first target number of top-ranked anomaly types as candidate anomaly types.
In implementation, after obtaining the anomaly score of each anomaly type, the client can rank the anomaly types by anomaly score in descending order, so that types with larger scores are ranked first. The first target number of top-ranked anomaly types may then be determined as candidate anomaly types. The number of candidates may be determined according to a preset rule. For example, the largest anomaly score may be determined first; if it is greater than 90, the number of anomaly types whose score is also greater than 90 (the same order of magnitude) may be used as the target number, or the number of anomaly types whose score is within a certain difference (e.g., 5 points) of the largest score may be used as the target number. In other words, the anomaly types whose scores satisfy the condition are determined as candidate anomaly types.
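A minimal sketch of the candidate selection, assuming the "within a certain difference of the largest score" rule with the 5-point example; the function name and parameter are illustrative.

```python
def select_candidates(scores, max_gap=5.0):
    """Return (type, score) pairs within max_gap of the largest score, highest first."""
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_score = ranked[0][1]
    return [(t, s) for t, s in ranked if top_score - s <= max_gap]


candidates = select_candidates({"system_cpu_anomaly": 96, "database_cpu_anomaly": 93,
                                "gateway_start_state_anomaly": 80, "memory_anomaly": 40})
```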
Step 402, obtaining scores corresponding to the candidate anomaly types according to the anomaly scores corresponding to the candidate anomaly types and the target weights corresponding to the candidate anomaly types.
In practice, the magnitude of the target weight (or anomaly weight) may reflect the probability that the anomaly type is a root cause anomaly, that is, the probability that the occurrence of this anomaly type leads to other anomalies. The target weight of each anomaly type may be determined in advance using the analytic hierarchy process or another method. The client can multiply the anomaly score of each candidate anomaly type by its target weight to obtain the score of that candidate anomaly type. For example, if the anomaly score of an anomaly type is 96 and its anomaly weight is 0.5, the score of that anomaly type is 96×0.5=48.
Step 403, sorting the candidate exception types according to the scores from large to small, and determining a second target number of target exception types with the top sorting.
In implementation, the client may rank the candidate anomaly types by score in descending order, so that candidates with larger scores are ranked first, and take the second target number of top-ranked candidate anomaly types as target anomaly types. For example, the candidate anomaly type with the largest score may be taken as the target anomaly type: the target number is 1 if a single candidate has the largest score, and greater than 1 if several candidates share the largest score. In other words, the candidate anomaly types whose scores satisfy the condition are taken as the target anomaly types.
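Steps 402 and 403 can be sketched together as follows. The weights in the usage example are hypothetical, and treating a missing weight as 0 is an assumption.

```python
import math


def pick_targets(candidates, weights):
    """Weight each candidate's anomaly score and return the top-scoring type(s).

    candidates: list of (anomaly_type, anomaly_score) pairs.
    weights: dict of anomaly_type -> target weight (missing weight assumed 0).
    Returns (targets, weighted), where targets lists every type tied for first.
    """
    weighted = {t: s * weights.get(t, 0.0) for t, s in candidates}
    if not weighted:
        return [], weighted
    best = max(weighted.values())
    targets = sorted(t for t, v in weighted.items() if math.isclose(v, best))
    return targets, weighted


targets, weighted = pick_targets(
    [("system_cpu_anomaly", 96), ("database_cpu_anomaly", 93)],
    {"system_cpu_anomaly": 0.5, "database_cpu_anomaly": 0.3},  # hypothetical weights
)
```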
It can be appreciated that during the host switching process (the whole switching process may last for tens of minutes), the client may acquire the operation data of the host in quasi-real time at a certain period (e.g., 10 seconds), perform anomaly detection on the operation data of each period, and obtain the target anomaly type occurring in that period. If the target anomaly type detected in the current period is detected for the first time in the switching process, the client can send alarm information indicating that the anomaly corresponding to that anomaly type (such as a system CPU performance anomaly) has occurred on the host during switching, so that operation and maintenance personnel can handle it in time. If the target anomaly type detected in the current period is not detected for the first time, the client can check the time difference between the last alarm for the same target anomaly type and the current detection time. If the time difference is greater than the target duration (for example, 3 detection periods, that is, 30 seconds), the client sends the alarm information; if the time difference is less than or equal to the target duration, the client does not send alarm information after the current detection. This avoids wasting resources on overly frequent alarms. A large time difference means that the anomaly has not been repaired for a long time, has recurred, or has not been handled by operation and maintenance personnel in time, so the alarm information needs to be sent again to prompt timely handling.
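The alarm suppression logic can be sketched as a small class. The class and method names are illustrative, and times are plain seconds supplied by the caller.

```python
class AlarmThrottle:
    """Send an alarm on first detection, then only after the target duration has passed."""

    def __init__(self, target_duration_s=30.0):
        self.target_duration_s = target_duration_s
        self._last_sent = {}  # anomaly type -> time of the last alarm, in seconds

    def should_alarm(self, anomaly_type, now_s):
        last = self._last_sent.get(anomaly_type)
        if last is None or now_s - last > self.target_duration_s:
            self._last_sent[anomaly_type] = now_s
            return True
        return False


throttle = AlarmThrottle(target_duration_s=30.0)
decisions = [throttle.should_alarm("system_cpu_anomaly", t) for t in (0, 10, 30, 40)]
```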
In this embodiment, the target anomaly type is determined from the anomaly score of each anomaly type together with a target weight that reflects the probability that the anomaly type is the root cause anomaly, so the target anomaly type reflects the root cause of the host switching anomaly. The anomaly detection result obtained from the target anomaly type is therefore more accurate, which helps operation and maintenance personnel locate the source of the problem quickly and accurately, improves operation and maintenance efficiency, and improves the success rate and efficiency of host double-activity switching.
In one embodiment, the target weight of each anomaly type may be determined by the analytic hierarchy process, and the determination comprises the following steps: acquiring comparison data of the anomaly types; generating comparison matrices of the anomaly types according to the comparison data; determining, among the comparison matrices, a first target comparison matrix that satisfies the consistency check condition; and calculating the target weight of each anomaly type according to the first target comparison matrix and the weight calculation rule of the analytic hierarchy process.
In practice, the comparison data of the anomaly types may be pairwise comparison data that indicates, for any two anomaly types, which one is more likely to be the root cause anomaly. In one example, experts may provide the pairwise comparison data by filling out a questionnaire, and the client obtains the corresponding electronic data. The client may then convert the comparison data into a comparison matrix of the anomaly types. The elements of the comparison matrix may be assigned according to the Saaty scale, and each element represents the relative probability that the two corresponding anomaly types are root cause anomalies. It will be appreciated that multiple sets of comparison data (e.g., questionnaires from multiple experts) may be obtained to generate multiple comparison matrices. Because the comparison data may contain contradictory results for some comparison factors, the client may perform a consistency check on each comparison matrix and determine, from among the comparison matrices, one that satisfies the consistency check condition, that is, the first target comparison matrix. The client may then calculate the target weight (anomaly weight) of each anomaly type according to the first target comparison matrix and the weight calculation rule of the analytic hierarchy process. The target weight may be calculated by the following formula:
where n is the number of anomaly types; W_i is the weight of the i-th anomaly type; and a_ij is the element of the comparison matrix that represents how much more likely anomaly type i is to be the root cause anomaly than anomaly type j.
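A minimal sketch of an AHP weight calculation and consistency check is given below. It assumes the column-normalized row-average rule, Saaty's random index table and the customary threshold of 0.1 for the consistency ratio; these are common AHP choices and are not confirmed as the exact rule of this embodiment. The example matrices are hypothetical.

```python
# Saaty's random consistency index (RI) for matrix orders 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """W_i = (1/n) * sum_j (a_ij / sum_k a_kj): average of the column-normalized rows."""
    n = len(matrix)
    col_sums = [sum(matrix[k][j] for k in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency_ratio(matrix, weights):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); smaller is more consistent."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # matrices of order 1 or 2 are always consistent
    lambda_max = sum(
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ) / n
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons for three anomaly types (perfectly consistent).
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
w = ahp_weights(consistent)
cr = consistency_ratio(consistent, w)

# A contradictory matrix (A over B, B over C, C over A) fails the check.
contradictory = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]
cr_bad = consistency_ratio(contradictory, ahp_weights(contradictory))
```

Under these assumptions, a matrix would be accepted as the first target comparison matrix when its consistency ratio is below 0.1.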
In one example, the target weights (anomaly weights) of the anomaly types are shown in Table 1. It is to be understood that the kinds and number of anomaly types shown in Table 1 are merely examples and may be increased or decreased as needed. Table 1 also shows the anomaly keywords corresponding to each anomaly type, which can be used to correct the output of the log data anomaly detection model.
TABLE 1
In this embodiment, the target weights of the anomaly types may be determined by the analytic hierarchy process. Because the target weight reflects the probability that an anomaly type is a root cause anomaly, the target anomaly type determined from the target weights and the anomaly scores reflects the root cause anomaly, which helps improve operation and maintenance efficiency.
In one embodiment, the operational data includes operational data over a plurality of detection periods, and the anomaly score includes an anomaly score corresponding to each anomaly type over each detection period. After obtaining the anomaly score of each anomaly type in step 202, the method may further include a correction type determining process, as shown in fig. 5, specifically including the following steps:
Step 501, for each detection period, sorting the anomaly types in the detection period according to the anomaly score from large to small, and determining a first target number of candidate anomaly types sorted in front.
In implementation, the client may acquire the operation data of the host in quasi-real time at a certain period (e.g., 10 seconds) and perform anomaly detection on the operation data of each detection period to obtain the anomaly score of each anomaly type in that period. For each detection period, the client may rank the anomaly types by anomaly score in descending order and determine the first target number of top-ranked anomaly types (those whose anomaly scores satisfy the condition) as candidate anomaly types, as described in step 401.
Step 502, obtaining a target index corresponding to each candidate abnormality type based on the abnormality score corresponding to each candidate abnormality type, wherein the target index comprises a functional index and a performance index.
In implementation, for each detection period, the client may determine the target index of each candidate anomaly type based on the anomaly score determined for that type in the detection period. For example, the anomaly score of a candidate anomaly type may be used directly as its target index. Alternatively, the anomaly score of a target anomaly type among the candidates (e.g., the target anomaly type determined for the detection period in step 203) may be increased (e.g., doubled by multiplying by 2) and used as the target index of that target anomaly type, while the target index of each non-target anomaly type remains its anomaly score. It will be appreciated that anomaly types may include functional-state-related anomalies (e.g., an operating system start-up state anomaly) and performance-related anomalies (e.g., a system CPU performance anomaly), so the target indexes include functional indexes (for functional-state-related anomaly types) and performance indexes (for performance-related anomaly types). The target index of each candidate anomaly type in each detection period is thus obtained.
Step 503, obtaining a correction index corresponding to the detection period according to the target index corresponding to each candidate abnormality type in the detection period.
In implementation, the client may calculate the correction index of each detection period from the target indexes of the candidate anomaly types in that period, for example by summing the target indexes. The correction index reflects the severity of the anomalies occurring during host switching (for example, anomalies lasting longer or more anomalies occurring).
Step 504, sorting the correction indexes of the detection periods in descending order, and determining the second target number of top-ranked correction indexes as target correction indexes.
In implementation, the client may sort the correction indexes corresponding to each detection period from large to small, and determine the target number (e.g., the first 3) of correction indexes sorted in front as the target correction indexes.
In step 505, each candidate abnormal type in the detection period corresponding to the target correction index is determined as a type to be corrected, and correction information is output based on the type to be corrected.
In implementation, the client may determine each candidate anomaly type in the detection period corresponding to a target correction index as a type to be corrected. For example, if j candidate anomaly types are detected in the i-th detection period, the correction index of that period is calculated from the anomaly scores of those candidate anomaly types, and that correction index ranks first among all detection periods, then the client may use the j candidate anomaly types detected in that detection period as types to be corrected. The client can then output correction information corresponding to the types to be corrected, which reminds operation and maintenance personnel to correct the problems related to those anomaly types. For performance-related anomaly types, operation and maintenance personnel can be reminded to add the corresponding computing resources; for function-state-related anomaly types, they can be reminded to take measures such as close monitoring, early warning, and automatic emergency handling, so as to reduce the probability of anomalies in the next host dual-active switching process.
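The correction index calculation, the ranking of detection periods, and the selection of types to be corrected can be sketched as follows. This is a minimal Python sketch under stated assumptions: the correction index is taken as the sum of the target indexes (the example given in this embodiment), and the function name `types_to_correct`, the period numbers, and the anomaly type names are hypothetical.

```python
def types_to_correct(target_indexes_by_period, top_n=3):
    """Return (correction indexes, top-ranked periods, types to be corrected).

    target_indexes_by_period: dict of period id -> {candidate type: target index}
    top_n: the "second target number" of top-ranked correction indexes to keep
    """
    # Correction index of a period: sum of the target indexes of its candidates.
    correction = {
        period: sum(indexes.values())
        for period, indexes in target_indexes_by_period.items()
    }
    # Rank periods by correction index, largest first, and keep the top_n.
    top_periods = sorted(correction, key=correction.get, reverse=True)[:top_n]
    # Every candidate type seen in a top-ranked period becomes a type to be corrected.
    selected = []
    for period in top_periods:
        for anomaly_type in target_indexes_by_period[period]:
            if anomaly_type not in selected:
                selected.append(anomaly_type)
    return correction, top_periods, selected


# Hypothetical target indexes for three detection periods.
periods = {
    1: {"cpu_performance": 0.5, "log_error": 0.25},
    2: {"os_startup_state": 1.0, "cpu_performance": 0.5},
    3: {"log_error": 0.125},
}
correction, top_periods, selected = types_to_correct(periods, top_n=2)
print(correction, top_periods, selected)
```

In this example period 2 has the largest correction index, so its candidate types are listed first among the types to be corrected.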
Optionally, the client may further sort the target indexes corresponding to the candidate anomaly types across all detection periods, ranking the function indexes and the performance indexes separately, and take the anomaly types corresponding to the top target number (e.g., the first 3) of function indexes and of performance indexes as types to be corrected. These are merged (as a set union) with the types to be corrected determined from the target correction indexes to obtain the final types to be corrected, for which the corresponding correction information is output.
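A possible reading of this optional step is sketched below in Python. The record layout `(anomaly_type, category, target_index)`, the category labels, and the function name `merge_types_to_correct` are assumptions made for illustration.

```python
def merge_types_to_correct(records, already_selected, top_n=3):
    """Union the per-category top-ranked types with the types already selected.

    records: list of (anomaly_type, category, target_index) over all periods,
             where category is "function" or "performance" (assumed labels)
    already_selected: types to be corrected found via the target correction indexes
    """
    final = set(already_selected)
    for category in ("function", "performance"):
        # Rank the indexes of this category separately, largest first.
        ranked = sorted(
            (record for record in records if record[1] == category),
            key=lambda record: record[2],
            reverse=True,
        )
        final.update(anomaly_type for anomaly_type, _, _ in ranked[:top_n])
    return final


# Hypothetical target indexes collected over all detection periods.
records = [
    ("os_startup_state", "function", 1.0),
    ("db_service_state", "function", 0.5),
    ("cpu_performance", "performance", 0.75),
    ("memory_performance", "performance", 0.25),
]
final_types = merge_types_to_correct(records, {"log_error"}, top_n=1)
print(sorted(final_types))
```

Using a set for the result reflects the "form a set" wording: a type selected by both routes appears only once.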
In this embodiment, the types to be corrected are determined according to the anomaly scores of the anomaly types in each detection period, so they reflect the important anomaly types that occurred during the current host dual-active switching process. Correction information can then be output to help operation and maintenance personnel issue early warnings or deploy the relevant resources in advance. This provides a compensating localization capability and gives traceability to the automatic emergency strategy in the host dual-active switching scenario, which reduces the probability of anomalies in the next host dual-active switch and improves the success rate and efficiency of subsequent switches.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiments of the application also provide a host switching abnormality detection device for implementing the host switching abnormality detection method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the host switching abnormality detection device provided below, reference may be made to the limitations of the host switching abnormality detection method above, which are not repeated here.
In one embodiment, as shown in fig. 6, there is provided a host switching abnormality detection apparatus 600 including: an acquisition module 601, a detection module 602, and a first determination module 603, wherein:
The acquisition module 601 is configured to obtain operation data of the target host in a host switching process.
The detection module 602 is configured to perform anomaly detection on the operation data by using a preset anomaly detection policy, so as to obtain an anomaly score corresponding to a preset anomaly type.
The first determining module 603 is configured to determine a target anomaly type from the anomaly types by using a preset attribution policy based on the anomaly scores corresponding to the anomaly types, and obtain a target anomaly detection result based on the target anomaly type.
In one embodiment, the operation data includes data of a plurality of data types. The detection module 602 is specifically configured to: for the operation data of each data type, perform anomaly detection on the operation data of that data type by using the anomaly detection strategy corresponding to the data type, so as to obtain an anomaly score corresponding to the preset anomaly type.
In one embodiment, the data type includes a function type. The detection module 602 is specifically configured to: determine the current state of each target software according to the software startup state information contained in the function-type operation data, the current state being either a started state or a non-started state; and determine an anomaly score corresponding to the preset anomaly type according to the current state of each target software.
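The function-type detection described above can be illustrated with a short Python sketch. The scoring rule (1.0 for software that is not started, 0.0 otherwise), the treatment of missing status information as "not started", and all names are assumptions; the application does not specify them.

```python
def function_anomaly_scores(startup_info, target_software):
    """Score one startup-state anomaly type per target software.

    startup_info:    dict of software name -> True if started (assumed format)
    target_software: names of the software expected to be running after the switch
    """
    scores = {}
    for name in target_software:
        # Assumption: software with no reported status is treated as not started.
        started = startup_info.get(name, False)
        scores[f"{name}_startup_state"] = 0.0 if started else 1.0
    return scores


# Hypothetical startup state information collected during a host switch.
function_scores = function_anomaly_scores(
    {"database": True, "message_queue": False},
    ["database", "message_queue", "web_server"],
)
print(function_scores)
```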
In one embodiment, the data types include a performance type and a log type. The detection module 602 is specifically configured to: input the performance-type operation data into a pre-trained performance data anomaly detection model to obtain a first anomaly detection result; input the log-type operation data into a pre-trained log data anomaly detection model to obtain a second anomaly detection result; and determine an anomaly score corresponding to the preset anomaly type based on the first anomaly detection result and the second anomaly detection result.
In one embodiment, the first determining module 603 is specifically configured to: sort the anomaly types in descending order of anomaly score, and determine a first target number of top-ranked candidate anomaly types; obtain a score for each candidate anomaly type according to its anomaly score and its target weight; and sort the candidate anomaly types in descending order of score, and determine a second target number of top-ranked target anomaly types.
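The two-stage ranking of this attribution strategy can be sketched in Python as follows. Combining the anomaly score and the target weight by multiplication is an assumption (the text only says the score is obtained "according to" both), and the function name `attribute`, the parameter names, and the sample values are hypothetical.

```python
def attribute(anomaly_scores, target_weights, first_n=5, second_n=2):
    """Pick target anomaly types with a two-stage ranking.

    anomaly_scores: dict of anomaly type -> anomaly score
    target_weights: dict of anomaly type -> target weight
    first_n:  first target number (candidates kept after the first ranking)
    second_n: second target number (target types kept after the second ranking)
    """
    # Stage 1: keep the first_n types with the largest anomaly scores.
    candidates = sorted(anomaly_scores, key=anomaly_scores.get, reverse=True)[:first_n]
    # Stage 2: weight each candidate (assumed: score * weight) and rank again.
    weighted = {t: anomaly_scores[t] * target_weights[t] for t in candidates}
    return sorted(weighted, key=weighted.get, reverse=True)[:second_n]


# Hypothetical scores and weights.
scores = {"os_startup_state": 0.9, "cpu_performance": 0.8, "log_error": 0.5, "disk_io": 0.1}
weights = {"os_startup_state": 0.25, "cpu_performance": 0.5, "log_error": 0.125, "disk_io": 0.125}
targets = attribute(scores, weights, first_n=3, second_n=2)
print(targets)
```

In this example the weighting changes the order: `cpu_performance` has a lower raw anomaly score than `os_startup_state` but a higher weighted score, so it is ranked first.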
In one embodiment, the apparatus further comprises a weight determination module configured to: acquire comparison data of each anomaly type; generate comparison matrices of the anomaly types according to the comparison data; determine, among the comparison matrices of the anomaly types, a first target comparison matrix that meets a consistency check condition; and calculate the target weight of each anomaly type according to the first target comparison matrix and the weight calculation rule of the analytic hierarchy process.
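A rough Python sketch of the weight determination follows. It rests on several assumptions not stated in the application: weights are computed with the geometric-mean approximation of the analytic hierarchy process rather than an exact eigenvector, the consistency check condition is the commonly used consistency ratio below 0.1 with Saaty's random index values, and the first candidate matrix that passes the check is taken as the first target comparison matrix. All function names are hypothetical.

```python
import math

# Saaty's random consistency index for matrix sizes 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix):
    """Approximate AHP weights: normalized geometric mean of each row."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI, with lambda_max estimated from A*w."""
    n = len(matrix)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    lambda_max = sum(
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


def first_consistent_weights(matrices, threshold=0.1):
    """Return the weights of the first matrix that passes the consistency check."""
    for matrix in matrices:
        weights = ahp_weights(matrix)
        if consistency_ratio(matrix, weights) < threshold:
            return weights
    return None  # no candidate matrix is consistent enough


# Two hypothetical 3x3 pairwise comparison matrices for three anomaly types.
inconsistent = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
weights = first_consistent_weights([inconsistent, consistent])
print(weights)
```

Here the first matrix is strongly contradictory and is rejected, while the second is perfectly consistent and yields weights close to 4/7, 2/7, and 1/7.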
In one embodiment, the operational data includes operational data over a plurality of detection periods, and the anomaly score includes an anomaly score corresponding to each anomaly type over each detection period. The device also comprises a second determining module, a third determining module, a fourth determining module, a fifth determining module and an output module, wherein:
The second determining module is configured to, for each detection period, sort the anomaly types in the detection period in descending order of anomaly score, and determine a first target number of top-ranked candidate anomaly types.
The third determining module is configured to obtain a target index corresponding to each candidate anomaly type based on the anomaly score corresponding to each candidate anomaly type, wherein the target indexes include function indexes and performance indexes.
The fourth determining module is configured to obtain a correction index corresponding to the detection period according to the target indexes corresponding to the candidate anomaly types in the detection period.
The fifth determining module is configured to sort the correction indexes corresponding to the detection periods in descending order, and determine a second target number of top-ranked target correction indexes.
The output module is configured to determine each candidate anomaly type in the detection period corresponding to a target correction index as a type to be corrected, and output correction information based on the type to be corrected.
Each of the above modules in the host switching abnormality detection device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements a host switching anomaly detection method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The application provides a host switching abnormality detection method, a device, computer equipment, a storage medium, and a computer program product, which relate to the technical field of intelligent operation and maintenance. They can be used in the financial technology field or in other fields, and their field of application is not limited.
The user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may perform the steps of the method embodiments described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above embodiments represent only a few implementations of the application; although they are described in specific detail, they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make variations and improvements without departing from the concept of the application, and these all fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims (11)

1. A host switching abnormality detection method, the method comprising:
acquiring operation data of a target host in a host switching process;
performing anomaly detection on the operation data by adopting a preset anomaly detection strategy to obtain anomaly scores corresponding to preset anomaly types;
and determining a target abnormality type from the abnormality types by adopting a preset attribution strategy based on the abnormality score corresponding to each abnormality type, and obtaining a target abnormality detection result based on the target abnormality type.
2. The method of claim 1, wherein the operation data comprises data of a plurality of data types; and the performing anomaly detection on the operation data by adopting a preset anomaly detection strategy to obtain anomaly scores corresponding to preset anomaly types comprises:
for the operation data of each data type, performing anomaly detection on the operation data of the data type by adopting an anomaly detection strategy corresponding to the data type, to obtain an anomaly score corresponding to a preset anomaly type.
3. The method of claim 2, wherein the data type comprises a function type; and the performing, for the operation data of each data type, anomaly detection on the operation data of the data type by adopting an anomaly detection strategy corresponding to the data type, to obtain an anomaly score corresponding to a preset anomaly type comprises:
determining the current state of each target software according to software startup state information contained in the function-type operation data, the current state comprising a started state and a non-started state;
and determining an anomaly score corresponding to the preset anomaly type according to the current state of each target software.
4. The method of claim 2, wherein the data types comprise a performance type and a log type; and the performing, for the operation data of each data type, anomaly detection on the operation data of the data type by adopting an anomaly detection strategy corresponding to the data type, to obtain an anomaly score corresponding to a preset anomaly type comprises:
inputting the performance-type operation data into a pre-trained performance data anomaly detection model to obtain a first anomaly detection result;
inputting the log-type operation data into a pre-trained log data anomaly detection model to obtain a second anomaly detection result;
and determining an anomaly score corresponding to a preset anomaly type based on the first anomaly detection result and the second anomaly detection result.
5. The method of claim 1, wherein the determining, based on the anomaly score corresponding to each anomaly type, a target anomaly type from each anomaly type using a preset attribution policy comprises:
sorting the anomaly types in descending order of anomaly score, and determining a first target number of top-ranked candidate anomaly types;
obtaining scores corresponding to the candidate anomaly types according to the anomaly scores corresponding to the candidate anomaly types and the target weights corresponding to the candidate anomaly types;
and sorting the candidate anomaly types in descending order of score, and determining a second target number of top-ranked target anomaly types.
6. The method of claim 5, wherein the determining of the target weight comprises:
acquiring comparison data of each abnormal type;
generating a comparison matrix of each anomaly type according to the comparison data of each anomaly type;
determining a first target comparison matrix meeting consistency check conditions in the comparison matrixes of the abnormal types;
and calculating the target weight of each anomaly type according to the first target comparison matrix and the weight calculation rule of the analytic hierarchy process.
7. The method of claim 1, wherein the operation data comprises operation data in a plurality of detection periods; the anomaly scores comprise an anomaly score corresponding to each anomaly type in each detection period; and after the performing anomaly detection on the operation data by adopting a preset anomaly detection strategy to obtain anomaly scores corresponding to preset anomaly types, the method further comprises:
for each detection period, sorting the anomaly types in the detection period in descending order of anomaly score, and determining a first target number of top-ranked candidate anomaly types;
obtaining a target index corresponding to each candidate anomaly type based on the anomaly score corresponding to each candidate anomaly type, wherein the target indexes comprise function indexes and performance indexes;
obtaining a correction index corresponding to the detection period according to the target indexes corresponding to the candidate anomaly types in the detection period;
sorting the correction indexes corresponding to the detection periods in descending order, and determining a second target number of top-ranked target correction indexes;
and determining each candidate anomaly type in the detection period corresponding to a target correction index as a type to be corrected, and outputting correction information based on the type to be corrected.
8. A host switching abnormality detection apparatus, characterized by comprising:
the acquisition module is used for acquiring the operation data of the target host in the host switching process;
the detection module is used for carrying out abnormality detection on the operation data by adopting a preset abnormality detection strategy to obtain an abnormality score corresponding to a preset abnormality type;
The first determining module is used for determining a target abnormality type from the abnormality types by adopting a preset attribution strategy based on the abnormality score corresponding to each abnormality type, and obtaining a target abnormality detection result based on the target abnormality type.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310668698.3A 2023-06-07 2023-06-07 Host switching abnormality detection method, device, computer equipment and storage medium Pending CN116701033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310668698.3A CN116701033A (en) 2023-06-07 2023-06-07 Host switching abnormality detection method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310668698.3A CN116701033A (en) 2023-06-07 2023-06-07 Host switching abnormality detection method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116701033A true CN116701033A (en) 2023-09-05

Family

ID=87825164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310668698.3A Pending CN116701033A (en) 2023-06-07 2023-06-07 Host switching abnormality detection method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116701033A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150388A (en) * 2023-11-01 2023-12-01 江西现代职业技术学院 Abnormal state detection method and system for automobile chassis
CN117150388B (en) * 2023-11-01 2024-01-26 江西现代职业技术学院 Abnormal state detection method and system for automobile chassis
CN117459370A (en) * 2023-12-26 2024-01-26 深圳鼎信通达股份有限公司 Main-standby competition method and system of single machine double main control boards, communication gateway and storage medium
CN117459370B (en) * 2023-12-26 2024-03-15 深圳鼎信通达股份有限公司 Main-standby competition method and system of single machine double main control boards, communication gateway and storage medium
CN117873408A (en) * 2024-03-11 2024-04-12 珠海芯烨电子科技有限公司 Cloud printer data recovery method and related device
CN117873408B (en) * 2024-03-11 2024-05-31 珠海芯烨电子科技有限公司 Cloud printer data recovery method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination