WO2019179248A1 - Anomaly detection method and device - Google Patents


Info

Publication number
WO2019179248A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
abnormal
training set
extended
detection model
Application number
PCT/CN2019/073880
Other languages
French (fr)
Chinese (zh)
Inventor
周扬
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019179248A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3414 Workload generation, e.g. scripts, playback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3495 Performance evaluation by tracing or monitoring for systems

Definitions

  • The present specification relates to the field of computer technology, and in particular to an abnormality detection method and apparatus.
  • Data processing systems must cope with ever-increasing amounts of data, especially systems that support multiple services.
  • Such systems usually rely on a sizeable group of cooperating servers to achieve large-scale data processing.
  • Multiple platforms are generally deployed to support the different services.
  • Each platform can include one or more servers. As a result, the system may need hundreds or even thousands of servers, so the server fleet is very large.
  • The code, databases, and configuration of these servers change very frequently; the number of changes per week may reach tens of thousands or more. Negligence or an error in any link may cause a platform, or even the whole system, to fail.
  • The servers may also be distributed across different regions, so faults are difficult to locate and take too long to resolve, causing huge losses. Therefore, when the system fails, the abnormality needs to be identified accurately and promptly so that the damage can be contained and losses reduced in the shortest possible time.
  • A commonly used approach is to form a time series from business-critical indicators calculated per minute and to identify faults by identifying abnormalities in that time series.
  • This approach relies mainly on historical data from the running system. Because abnormalities in the system's historical data are usually few, they are insufficient as a basis for fault identification, so abnormalities are generally identified by analyzing the patterns in normal data. The samples are of a single kind, and the false-positive rate and miss rate of fault identification are relatively high.
  • The present specification provides an abnormality detection method and apparatus.
  • In a first aspect, an embodiment of the present specification provides an abnormality detection method.
  • The method includes:
  • extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
  • training the abnormality detection model according to the training set after the abnormal data is added, and determining the recognition effect of the abnormality detection model;
  • An embodiment of the present specification provides an abnormality detection device. The device includes:
  • a first acquiring unit, configured to acquire sampling data when the system is in normal operation and use the sampling data as normal samples in the training set;
  • a second acquiring unit, configured to acquire abnormal data according to a pre-made rule;
  • a looping unit, configured to cyclically trigger the extension unit, the training unit, and the second acquiring unit described below until the recognition effect of the abnormality detection model reaches an expectation, so that the abnormality detection model whose recognition effect reaches the expectation is used to perform abnormality detection on the data to be detected;
  • the extension unit, configured to extend the abnormal data and add the abnormal data and the extended abnormal data to the training set as abnormal samples;
  • the training unit, configured to train the abnormality detection model according to the training set after the abnormal data is added and determine the recognition effect of the abnormality detection model;
  • the second acquiring unit is further configured to acquire new abnormal data according to the pre-made rule when the recognition effect of the abnormality detection model is lower than expected.
  • An embodiment of the present specification provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method steps of the aforementioned first aspect when executing the program.
  • An embodiment of the present specification provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the method of the first aspect described above.
  • An embodiment of the present specification provides a computer program product comprising instructions that, when run on a computer, cause the computer to perform the method of the first aspect described above.
  • With the technical solution provided by the embodiments of the present specification, abnormal data can be acquired and extended to obtain more abnormal samples, and normal samples are acquired as well. This yields a training set with sufficient positive and negative samples and thereby improves the accuracy of fault identification by the anomaly detection model trained on that training set.
  • No single embodiment of the present specification needs to achieve all of the above effects.
  • FIG. 1 is a schematic diagram of an application scenario shown in an embodiment of the present specification.
  • FIG. 2 is a schematic diagram of an abnormality detecting method according to an embodiment of the present specification.
  • FIG. 3 is a schematic diagram of another abnormality detecting method shown in an embodiment of the present specification.
  • FIG. 4 is a schematic diagram of another abnormality detecting method shown in an embodiment of the present specification.
  • FIG. 5 is a schematic flow chart of an abnormality detecting method according to an embodiment of the present specification.
  • FIG. 6 is a schematic structural diagram of an abnormality detecting device according to an embodiment of the present specification.
  • FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present specification.
  • Data processing systems must deal with ever-increasing amounts of data, especially systems that support multiple services.
  • Such systems usually achieve large-scale data processing through a sizeable group of cooperating servers.
  • Multiple platforms are generally deployed to support the different services.
  • Each platform can include one or more servers.
  • For example, the services of Ant Financial span hundreds of businesses, such as convenience services, wealth management, capital exchanges, and shopping and entertainment.
  • The platforms supporting these business systems number in the hundreds. Because there are so many platforms, changes to code, databases, and configuration can be very frequent; tens of thousands of changes or more may occur every week.
  • Actual failures, however, are infrequent, and some platforms have never experienced a failure, so the abnormal data in the historical data sampled while Ant Financial's systems are running has insufficient coverage.
  • As a result, the detection effect is not satisfactory.
  • Moreover, identified abnormal data is difficult to match against historical abnormal data, so it is difficult to analyze the root cause of abnormal data through historical data. Experienced technical personnel must make the judgment, which is costly and inefficient.
  • The embodiments of the present specification provide an abnormality detection method and apparatus.
  • The entities involved in the embodiments of the present specification include a data processing system 100 and a computer device 200.
  • The data processing system 100 may include service servers, terminals, and the like.
  • The computer device 200 can be implemented independently of the data processing system 100 or by devices in the data processing system 100.
  • For example, the functionality of the computer device 200 can be implemented by a service server in the data processing system 100.
  • The computer device 200 trains the abnormality detection model and uses the trained model to perform abnormality detection on the data processing system 100.
  • The computer device 200 updates the abnormal samples in the training set by acquiring abnormal data and extending it, and trains the abnormality detection model on the updated training set. If the recognition effect of the trained model falls short of the expectation, the device continues to acquire and extend abnormal data to update the abnormal samples in the training set, until the recognition effect of the model trained on the updated training set reaches the expectation. Training then ends, and the model finally obtained is used to perform abnormality detection on the data to be detected of the data processing system. Each update of the training set increases the abnormal samples in it, so enough abnormal samples can be obtained as a basis for abnormality detection.
  • The computer device 200 can quantify the abnormal data acquired and extended at each update, so that each update of the training set increases the abnormal samples by a specified number or by 100%.
  • Alternatively, the abnormal samples added at each update of the training set can be controlled through the parameter coverage of the abnormal samples.
  • That is, after the abnormal samples in the training set are updated, it is determined whether the parameter coverage of the abnormal samples in the updated training set reaches the expected value. The anomaly detection model is then trained on the updated training set. If the recognition effect of the trained model does not reach the expectation, abnormal data continues to be acquired and extended to update the abnormal samples in the training set, while ensuring that the parameter coverage of the abnormal samples in the updated training set reaches the expected value, until the recognition effect of the model trained on the updated training set reaches the expectation and training ends.
  • The computer device 200 may also take the recognition effect into account when acquiring or extending abnormal samples at each update of the training set.
  • The manner of extending the abnormal samples may be adjusted according to the recognition effect. For example, when the trained abnormality detection model has a poor recognition effect on the abnormal samples corresponding to a certain service, the extension may focus on increasing the amount of data or the parameter coverage of the abnormal samples corresponding to that service.
  • The generation of abnormal samples (including the acquisition and extension of abnormal data) can be regarded as an offensive closed loop, and the training of the abnormality detection model on the updated training set can be regarded as a defensive closed loop.
  • A sufficient number of abnormal samples can be obtained through the offensive closed loop, and the anomaly detection model can be trained effectively through the defensive closed loop.
  • This attack-and-defense confrontation can effectively improve the recognition effect of the anomaly detection model.
  • The attack can be quantified by the parameter coverage or the data volume of the abnormal samples, which makes the trained anomaly detection model easier to iterate on.
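  • As an illustration only, the attack-and-defense loop described above might be sketched as follows. The one-dimensional threshold model, the function names (`train_threshold`, `attack_defense_loop`), the synthetic data, and the 0.95 expectation are all assumptions made for this sketch and are not part of the specification; for brevity, the recognition effect is measured on the training set itself.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (feature value, label): 0 = normal, 1 = abnormal


def train_threshold(samples: List[Sample]) -> float:
    """Defensive step: put a threshold midway between the two class means."""
    normal = [x for x, y in samples if y == 0]
    abnormal = [x for x, y in samples if y == 1]
    return (sum(normal) / len(normal) + sum(abnormal) / len(abnormal)) / 2


def accuracy(threshold: float, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted class matches their label."""
    hits = sum((x > threshold) == (y == 1) for x, y in samples)
    return hits / len(samples)


def attack_defense_loop(
    normal: List[Sample],
    acquire_abnormal: Callable[[], List[Sample]],
    extend: Callable[[List[Sample]], List[Sample]],
    expected: float = 0.95,
    max_rounds: int = 20,
) -> Tuple[float, float, int]:
    training_set = list(normal)
    threshold, score = float("nan"), 0.0
    for round_no in range(1, max_rounds + 1):
        # Offensive step: acquire abnormal data and extend it.
        seeds = acquire_abnormal()
        training_set += seeds + extend(seeds)
        # Defensive step: retrain and measure the recognition effect.
        threshold = train_threshold(training_set)
        score = accuracy(threshold, training_set)
        if score >= expected:
            break
    return threshold, score, round_no


# Synthetic data, invented for the example.
rng = random.Random(0)
normal = [(rng.gauss(100, 5), 0) for _ in range(200)]
acquire = lambda: [(rng.gauss(200, 5), 1) for _ in range(5)]
extend = lambda seeds: [(x + rng.gauss(0, 2), 1) for x, _ in seeds for _ in range(3)]

threshold, score, rounds = attack_defense_loop(normal, acquire, extend)
print(threshold, score, rounds)
```

  • In this sketch every round adds abnormal samples to the training set before retraining, which mirrors the statement that each update of the training set increases the abnormal samples.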
  • FIG. 5 is a schematic flowchart of an abnormality detection method according to an embodiment of the present specification. The method is applicable to a computer device. As shown in FIG. 5, the method includes steps 510-570:
  • Step 510: Acquire sampling data when the system is in normal operation, and use the sampled data as normal samples in the training set.
  • The solution provided in this specification can sample periodically while the data processing system is in normal operation to obtain sampling data for normal operation.
  • For example, the data of the data processing system in normal operation can be sampled every minute.
  • The sampled data acquired during normal operation of the data processing system is labeled with a class and used as normal samples in the training set.
  • For example, the class of the sampled data acquired during normal operation of the data processing system is marked as "0";
  • the class "0" indicates that the data so marked is a normal sample.
  • The data of the data processing system in normal operation includes one or more of call data, indicator data, change data, and operation and maintenance data.
  • The call data may include one or more of a call link, an interface name, input parameters, output parameters, and call duration.
  • The call link can be a directed acyclic graph whose nodes are the called interfaces and whose edges are the calling relationships.
  • The call data may relate to a single call request,
  • for example, a request in which a terminal invokes a payment service.
  • The indicator data can be key indicators of the data processing system, for example, the number of system calls for each service, aggregated per minute into a time series.
  • The change data can describe actions that trigger changes, such as code releases and configuration modifications of the data processing system.
  • The operation and maintenance data can include hardware data, for example, CPU usage, network latency, and memory usage.
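  • A normal sample of the kind described in step 510 might, for illustration, be represented as below. The `Sample` class, its field names, and the interface names are hypothetical; only the four data categories, the class label "0", and the directed-acyclic-graph shape of the call link come from the text. The acyclicity check uses Kahn's topological-sort algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List

NORMAL = 0  # class label for normal samples, as in the text


@dataclass
class Sample:
    call_link: Dict[str, List[str]]  # interface -> interfaces it calls
    indicators: Dict[str, float]     # e.g. per-minute call counts
    changes: List[str]               # e.g. code releases, config edits
    ops: Dict[str, float]            # e.g. CPU usage, network latency
    label: int = NORMAL


def is_acyclic(call_link: Dict[str, List[str]]) -> bool:
    """Return True if the call link is a directed acyclic graph."""
    nodes = set(call_link) | {c for callees in call_link.values() for c in callees}
    indegree = {n: 0 for n in nodes}
    for callees in call_link.values():
        for callee in callees:
            indegree[callee] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for callee in call_link.get(node, []):
            indegree[callee] -= 1
            if indegree[callee] == 0:
                ready.append(callee)
    # Every node is visited exactly when the graph has no cycle.
    return visited == len(nodes)


sample = Sample(
    call_link={
        "pay.create": ["risk.check", "account.debit"],
        "risk.check": ["account.debit"],
    },
    indicators={"pay_calls_per_min": 1200.0},
    changes=[],
    ops={"cpu_usage": 0.35, "network_latency_ms": 12.0},
)
print(is_acyclic(sample.call_link), sample.label)
```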
  • Step 520: Acquire abnormal data according to a pre-made rule.
  • The pre-made rule may be determined according to actual requirements. For example, the pre-made rule may generate a fault request for each service in the data processing system in turn, so that the abnormal samples obtained cover every service in the data processing system and the coverage of the abnormal samples is high.
  • A fault request may be generated according to the pre-made rule, the context data of the fault request obtained, and that context data added to the training set as an abnormal sample.
  • The context data of the fault request may be the running data collected from the data processing system after it receives the fault request.
  • The context data may include one or more of call data, indicator data, change data, and operation and maintenance data.
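  • Step 520 might be sketched as follows, assuming a pre-made rule that issues one fault request per service. The rule (`premade_rule`), the stand-in system (`fake_system`), the service names, and all field names and values are invented for the example; a real deployment would send the request to the data processing system and collect its actual running data.

```python
from typing import Callable, Dict, List

ABNORMAL = 1  # class label for abnormal samples, as in the text


def premade_rule(service: str) -> dict:
    """Hypothetical rule: one fault request per service with an invalid amount."""
    return {
        "service": service,
        "interface": f"{service}.handle",
        "params": {"amount": -1},
    }


def fake_system(request: dict) -> dict:
    """Stand-in for the data processing system: returns the context data
    observed after the fault request is received."""
    return {
        "call": {
            "interface": request["interface"],
            "input": request["params"],
            "elapsed_ms": 950,
        },
        "indicator": {"errors_per_min": 1},
        "change": [],
        "ops": {"cpu_usage": 0.93},
    }


def acquire_abnormal_data(
    services: List[str],
    rule: Callable[[str], dict],
    system: Callable[[dict], dict],
) -> List[Dict]:
    samples = []
    for service in services:  # one fault request for each service in turn
        request = rule(service)
        context = system(request)
        samples.append({"service": service, "context": context, "label": ABNORMAL})
    return samples


services = ["pay", "transfer", "wealth"]
abnormal_samples = acquire_abnormal_data(services, premade_rule, fake_system)
print(len(abnormal_samples))
```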
  • Steps 530-560 are executed cyclically until the recognition effect of the anomaly detection model reaches the expectation:
  • Step 530: Extend the abnormal data, and add the abnormal data and the extended abnormal data to the training set as abnormal samples.
  • The abnormal data can be extended by extending the pre-made rule. On this basis, the abnormal data generated according to the pre-made rule may first be added to the training set; the pre-made rule is then extended, an extended fault request is generated according to the extended pre-made rule, the context data of the extended fault request is obtained, and that context data is added to the training set as an abnormal sample.
  • Alternatively, the abnormal data generated according to the pre-made rule may first be added to the training set, and the following steps are then performed cyclically until the parameter coverage of the abnormal samples in the training set reaches an expected value: extend the pre-made rule, generate an extended fault request according to the extended pre-made rule, acquire the context data of the extended fault request, and add that context data to the training set as an abnormal sample; then determine whether the parameter coverage of the abnormal samples in the training set reaches the expected value. When it does not, the extended pre-made rule is taken as the new pre-made rule. For example, whether the parameter coverage reaches the expected value can be determined by checking whether the abnormal samples in the training set cover every service and whether the number of abnormal samples corresponding to each service reaches a threshold.
  • When the recognition effect of the anomaly detection model does not reach the expectation, the expected value of the parameter coverage of the abnormal samples can be raised.
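  • The coverage-driven loop of step 530 might look like the sketch below. It assumes the example criterion given above (every service is covered and each service has at least a threshold number of abnormal samples); the helper names, the service names, and the trivial `extend` callback are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List


def missing_services(samples: List[Dict], services: List[str], min_per_service: int) -> List[str]:
    """Services whose abnormal-sample count is still below the threshold."""
    counts = Counter(s["service"] for s in samples)
    return [svc for svc in services if counts[svc] < min_per_service]


def extend_until_covered(
    samples: List[Dict],
    services: List[str],
    min_per_service: int,
    extend: Callable[[List[str]], List[Dict]],
    max_rounds: int = 50,
) -> List[Dict]:
    samples = list(samples)
    for _ in range(max_rounds):
        gaps = missing_services(samples, services, min_per_service)
        if not gaps:  # parameter coverage has reached the expected value
            break
        # Extend the rule towards the under-covered services and collect
        # the resulting abnormal samples.
        samples += extend(gaps)
    return samples


services = ["pay", "transfer", "wealth"]
seeds = [{"service": "pay", "label": 1}]
# Hypothetical extension: one new abnormal sample per under-covered service.
extend = lambda gaps: [{"service": svc, "label": 1} for svc in gaps]

covered = extend_until_covered(seeds, services, min_per_service=3, extend=extend)
print(len(covered))
```

  • Raising `min_per_service` corresponds to raising the expected value of the parameter coverage when the recognition effect is still below the expectation.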
  • The pre-made rule can be extended in combination with business rules or by heuristic techniques. For example, it can be extended in one or more of the following ways:
  • intelligent fault extension: for example, the context data collected for the fault requests can be used as seed samples, and a genetic algorithm can be applied to perform fault extension.
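  • The specification does not detail the genetic algorithm, so the following is only one possible sketch. It assumes each fault is encoded as a list of numeric request parameters and that a caller-supplied `fitness` function scores how useful a candidate fault is; the encoding, the fitness function, and all parameter values are assumptions. Selection keeps the better half of the population, and children are produced by single-point crossover plus Gaussian mutation.

```python
import random
from typing import Callable, List, Optional

Params = List[float]  # hypothetical encoding of one fault request


def evolve_faults(
    seeds: List[Params],
    fitness: Callable[[Params], float],
    generations: int = 10,
    population_size: int = 20,
    mutation_scale: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[Params]:
    rng = rng or random.Random()
    population = [list(s) for s in seeds]
    # Fill the initial population with mutated copies of the seed samples.
    while len(population) < population_size:
        parent = rng.choice(seeds)
        population.append([g + rng.gauss(0, mutation_scale) for g in parent])
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))        # single-point crossover
            child = a[:cut] + b[cut:]
            children.append([g + rng.gauss(0, mutation_scale) for g in child])
        population = parents + children
    return population


# Hypothetical fitness: distance from the parameters seen in normal operation.
fitness = lambda p: abs(p[0] - 100.0) + abs(p[1] - 1.0)
seeds = [[100.0, 1.0], [250.0, 3.0]]

extended = evolve_faults(seeds, fitness, rng=random.Random(42))
print(len(extended))
```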
  • The context data of a fault request can be labeled with a class and used as an abnormal sample in the training set.
  • For example, the class of the context data of the fault request is marked as "1";
  • the class "1" indicates that the data so marked is an abnormal sample.
  • Step 540: Train the abnormality detection model according to the training set after the abnormal data is added, and determine the recognition effect of the abnormality detection model.
  • Feature preprocessing can be performed on the samples in the training set.
  • A variety of feature preprocessing methods can be employed to obtain features in one or more of the following forms of expression: parameter expression, structure expression, indicator aggregation, and change expression.
  • The features of each form of expression may correspond to one or more anomaly detection models, and features of different forms of expression correspond to different anomaly detection models.
  • The corresponding anomaly detection models are trained on the features of each form of expression. For example:
  • a time-series anomaly detection model can be trained on the indicator aggregation features;
  • a graph-based anomaly detection algorithm can be trained on the structure expression features;
  • anomaly detection models that are nearest-neighbor-based, linear, subspace-based, or based on supervised learning can be trained on the features of the parameter expression or the change expression.
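  • As a simple stand-in for the time-series model trained on indicator aggregation features, the sketch below fits a z-score detector on per-minute call counts from normal operation. The specification does not prescribe this detector; the class name `ZScoreDetector`, the threshold of three standard deviations, and the sample values are assumptions.

```python
from statistics import mean, pstdev
from typing import List


class ZScoreDetector:
    """Flags a per-minute indicator value that lies far from the normal mean."""

    def fit(self, normal_values: List[float], threshold: float = 3.0) -> "ZScoreDetector":
        self.mu = mean(normal_values)
        # Guard against a zero deviation when all normal values are equal.
        self.sigma = pstdev(normal_values) or 1.0
        self.threshold = threshold
        return self

    def is_abnormal(self, value: float) -> bool:
        return abs(value - self.mu) / self.sigma > self.threshold


# Per-minute call counts during normal operation (invented values).
normal_counts = [100, 102, 98, 101, 99, 100, 103, 97]
detector = ZScoreDetector().fit(normal_counts)

print(detector.is_abnormal(101), detector.is_abnormal(40))
```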
  • During training, the recognition effect of the abnormality detection model can be determined; once the recognition effect becomes stable, that stable value is taken as the recognition effect of the trained abnormality detection model.
  • The recognition effect can be expressed by one or more of recognition accuracy, recognition coverage, and KS value.
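  • The three measures might be computed as in the sketch below. It interprets recognition accuracy as the fraction of correctly classified samples, recognition coverage as the fraction of abnormal samples that are detected, and the KS value as the largest gap between the true-positive rate and the false-positive rate over all score thresholds; these interpretations, the function names, and the data are assumptions, since the specification does not define the measures.

```python
from typing import List, Tuple


def accuracy_and_coverage(labels: List[int], scores: List[float], threshold: float) -> Tuple[float, float]:
    predicted = [s >= threshold for s in scores]
    correct = sum(p == (y == 1) for p, y in zip(predicted, labels))
    detected = sum(p for p, y in zip(predicted, labels) if y == 1)
    abnormal_total = sum(1 for y in labels if y == 1)
    return correct / len(labels), detected / abnormal_total


def ks_value(labels: List[int], scores: List[float]) -> float:
    """Kolmogorov-Smirnov statistic: max |TPR - FPR| over score thresholds."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        tpr = sum(s >= t for s in pos) / len(pos)
        fpr = sum(s >= t for s in neg) / len(neg)
        best = max(best, abs(tpr - fpr))
    return best


# Labels (0 = normal, 1 = abnormal) and model scores, invented for the example.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

acc, cov = accuracy_and_coverage(labels, scores, threshold=0.5)
ks = ks_value(labels, scores)
print(acc, cov, ks)  # 0.75 0.75 0.75
```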
  • Step 550: Determine whether the recognition effect of the abnormality detection model reaches the expectation.
  • The expectation may be a threshold on one or more of the recognition accuracy, the recognition coverage, the KS value, and the like; for example, the expectation may be that the recognition accuracy is not less than 99.5%.
  • Step 560: When the recognition effect of the abnormality detection model is lower than expected, acquire new abnormal data according to the pre-made rule.
  • The pre-made rule in step 560 may be an extended pre-made rule or the initial pre-made rule, where the initial pre-made rule is a pre-made rule that has not been extended.
  • The abnormal samples may be acquired or extended in light of the recognition effect.
  • The manner of extending the pre-made rule may be adjusted according to the recognition effect. For example, when the trained abnormality detection model has a poor recognition effect on the abnormal samples corresponding to a certain service, the extended pre-made rule may focus on generating more fault requests for that service, so as to obtain richer abnormal samples for it and thereby improve the trained model's ability to identify data to be detected corresponding to that service.
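  • One way to focus the extension on poorly recognized services is to split a budget of new fault requests in proportion to each service's miss rate, as sketched below. The proportional scheme, the function name `allocate_fault_requests`, and the figures are assumptions; because each share is rounded independently, the shares are not guaranteed to sum exactly to the budget in general.

```python
from typing import Dict


def allocate_fault_requests(coverage_by_service: Dict[str, float], budget: int) -> Dict[str, int]:
    """Give each service a share of new fault requests proportional to
    the fraction of its abnormal samples that the model missed."""
    misses = {svc: 1.0 - cov for svc, cov in coverage_by_service.items()}
    total = sum(misses.values())
    if total == 0:  # nothing is missed, so no extra fault requests are needed
        return {svc: 0 for svc in coverage_by_service}
    return {svc: round(budget * miss / total) for svc, miss in misses.items()}


# Recognition coverage per service after the latest training round (invented).
coverage = {"pay": 0.99, "transfer": 0.90, "wealth": 0.60}
plan = allocate_fault_requests(coverage, budget=100)
print(plan)  # {'pay': 2, 'transfer': 20, 'wealth': 78}
```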
  • Step 570: When the recognition effect of the abnormality detection model reaches the expectation, use the abnormality detection model whose recognition effect reaches the expectation to perform abnormality detection on the data to be detected.
  • When the data processing system receives a service processing request, abnormality detection by the model whose recognition effect reaches the expectation may be triggered. After the abnormality detection is triggered, the data to be detected generated by the service processing request may be collected in real time or periodically.
  • The data to be detected includes one or more of call data, indicator data, change data, and operation and maintenance data.
  • Feature preprocessing may first be performed on the data to be detected.
  • A plurality of feature preprocessing methods may be used to obtain features in one or more of the following forms of expression: parameter expression, structure expression, indicator aggregation, and change expression.
  • The anomaly detection model corresponding to the features of each form of expression is used to identify whether the features are abnormal.
  • When the features of the same form of expression correspond to a plurality of anomaly detection models and the detection results of those models are inconsistent, whether the features are abnormal may be determined by voting.
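  • The voting in step 570 might be sketched as a simple majority vote per form of expression. The specification does not fix the voting rule, so strict majority (with ties counted as not abnormal), the function names, and the toy models below are assumptions.

```python
from typing import Any, Callable, Dict, List

Model = Callable[[Any], bool]  # returns True when the features look abnormal


def majority_vote(verdicts: List[bool]) -> bool:
    """Abnormal only if more than half of the models say so."""
    return 2 * sum(verdicts) > len(verdicts)


def detect(features_by_form: Dict[str, Any], models_by_form: Dict[str, List[Model]]) -> Dict[str, bool]:
    results = {}
    for form, models in models_by_form.items():
        verdicts = [model(features_by_form[form]) for model in models]
        results[form] = majority_vote(verdicts)
    return results


# Toy models standing in for trained anomaly detection models.
models_by_form = {
    "indicator_aggregation": [
        lambda calls: calls > 150,
        lambda calls: calls > 120,
        lambda calls: calls > 300,
    ],
    "parameter_expression": [lambda params: params["amount"] < 0],
}
features_by_form = {
    "indicator_aggregation": 200,
    "parameter_expression": {"amount": 10},
}

results = detect(features_by_form, models_by_form)
print(results)
```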
  • With the technical solution provided by the embodiments of the present specification, abnormal data can be acquired and extended to obtain more abnormal samples, and normal samples are acquired as well. This yields a training set with sufficient positive and negative samples and thereby improves the accuracy of fault identification by the anomaly detection model trained on that training set.
  • In addition, there is more room to improve the recognition effect, and a basis is provided for determining the root cause of a fault, which helps locate problems in the system faster. Detection can be performed at the level of system call links, parameters, and system changes.
  • The context slices collected during fault injection preserve fine-grained data that can reconstruct the situation at the time of a system fault. Multiple fine-grained data sources can be combined for identification, which gives high performance and good recognition, and the fine-grained data can also be used when locating faults.
  • The embodiments of the present specification further provide an abnormality detection device.
  • The device may include:
  • a first obtaining unit 601, configured to acquire sampling data when the system is in normal operation and use the sampling data as normal samples in the training set;
  • a second obtaining unit 602, configured to acquire abnormal data according to a pre-made rule;
  • a looping unit 603, configured to cyclically trigger the extension unit, the training unit, and the second obtaining unit described below until the recognition effect of the abnormality detection model reaches an expectation, so that the abnormality detection model whose recognition effect reaches the expectation is used to perform abnormality detection on the data to be detected;
  • the extension unit 604, configured to extend the abnormal data and add the abnormal data and the extended abnormal data to the training set as abnormal samples;
  • the training unit 605, configured to train the abnormality detection model according to the training set after the abnormal data is added and determine the recognition effect of the abnormality detection model;
  • the second obtaining unit 602 is further configured to acquire new abnormal data according to the pre-made rule when the recognition effect of the abnormality detection model is lower than expected.
  • The samples in the training set include one or more of call data, indicator data, change data, and operation and maintenance data.
  • The training unit 605 is specifically configured to:
  • train the corresponding anomaly detection model according to the features of each form of expression.
  • The second obtaining unit 602 is specifically configured to generate a fault request according to the pre-made rule and acquire the context data of the fault request.
  • The extension unit 604 is specifically configured to: extend the pre-made rule, generate an extended fault request according to the extended pre-made rule, acquire the context data of the extended fault request, and add the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
  • The extension unit 604 is specifically configured to:
  • cyclically perform the following steps until the parameter coverage of the abnormal samples in the training set reaches the expected value:
  • extend the pre-made rule, generate an extended fault request according to the extended pre-made rule, acquire the context data of the extended fault request, and add the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
  • take the extended pre-made rule as the new pre-made rule.
  • The embodiments of the present specification further provide a computer device including at least a memory, a processor, and a computer program stored on the memory and executable on the processor; the computer device may be implemented in the form of an anomaly detection server.
  • When the processor executes the program, the foregoing abnormality detection method is implemented.
  • The method at least includes:
  • extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
  • training the abnormality detection model according to the training set after the abnormal data is added, and determining the recognition effect of the abnormality detection model;
  • The samples in the training set include one or more of call data, indicator data, change data, and operation and maintenance data.
  • Training the anomaly detection model according to the training set includes:
  • training the corresponding anomaly detection model according to the features of each form of expression.
  • Acquiring the abnormal data according to the pre-made rule includes:
  • generating a fault request according to the pre-made rule, and acquiring the context data of the fault request.
  • Extending the abnormal data and adding the abnormal data and the extended abnormal data to the training set as abnormal samples includes:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
  • Extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, and acquiring the context data of the extended fault request includes:
  • cyclically performing the following steps until the parameter coverage of the abnormal samples in the training set reaches the expected value:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
  • taking the extended pre-made rule as the new pre-made rule.
  • FIG. 7 shows a schematic diagram of a more specific computer device structure provided by an embodiment of the present specification.
  • the computer device may include a processor 710, a memory 720, an input/output interface 730, a communication interface 740, and a bus 750.
  • the processor 77, the memory 720, the input/output interface 730, and the communication interface 740 implement a communication connection between the devices via the bus 750.
  • the processor 710 can be implemented by using a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits for performing correlation.
  • the program is implemented to implement the technical solutions provided by the embodiments of the present specification.
  • the memory 720 can be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
  • the memory 720 can store the operating system and other applications.
  • When the technical solution provided by the embodiments of this specification is implemented by software or firmware, the related program code is stored in the memory 720 and is called and executed by the processor 710.
  • the input/output interface 730 is used to connect an input/output module to implement information input and output.
  • The input/output module can be configured as a component in the device (not shown) or connected externally to the device to provide the corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various types of sensors, and the like, and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 740 is used to connect a communication module (not shown) to implement communication interaction between the device and other devices.
  • the communication module can communicate by wired means (such as USB, network cable, etc.), or can communicate by wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 750 includes a path for transferring information between various components of the device, such as processor 710, memory 720, input/output interface 730, and communication interface 740.
  • Although the above device shows only the processor 710, the memory 720, the input/output interface 730, the communication interface 740, and the bus 750, in a specific implementation the device may also include other components necessary for normal operation.
  • The above device may also include only the components necessary to implement the embodiments of this specification, and need not include all the components shown in the drawings.
  • An embodiment of this specification further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the aforementioned anomaly detection method.
  • The method at least includes:
  • extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
  • training the anomaly detection model according to the training set to which the abnormal data has been added, and determining the recognition effect of the anomaly detection model;
  • the samples in the training set include one or more of call data, metric data, change data, and operational data.
  • the training the anomaly detection model according to the training set includes:
  • the corresponding anomaly detection model is trained according to the characteristics of each expression form.
  • the obtaining the abnormal data according to the pre-made rule includes:
  • a fault request is generated according to the pre-made rule, and the context data of the fault request is obtained.
  • The extending the abnormal data and adding the abnormal data and the extended abnormal data to the training set as abnormal samples includes:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
  • The extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, and acquiring the context data of the extended fault request includes:
  • cyclically performing the following steps until the parameter coverage of the abnormal samples in the training set reaches the expected value:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
  • taking the extended pre-made rule as the new pre-made rule.
  • Computer-readable media include persistent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • Computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • The embodiments of this specification can be implemented by software together with a necessary general-purpose hardware platform. On that understanding, the technical solutions of the embodiments may in essence be embodied in the form of a software product. The software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this specification or in certain parts of the embodiments.
  • The system, device, module, or unit described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • A typical implementation device is a computer. The computer may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email transceiver, or a game console.
  • The embodiments in this specification are described progressively. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
  • For the device embodiments, the description is relatively brief, and the relevant parts can be found in the description of the method embodiments.
  • The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and when the embodiments of this specification are implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.

Abstract

Disclosed are an anomaly detection method and device. The method comprises using, as a normal sample of a training set, data sampled during normal operation of a system; acquiring anomaly data; and cyclically performing the following steps until an expected identification effect of an anomaly detection model is achieved, so as to facilitate performing anomaly detection on data under detection by means of the anomaly detection model having achieved the expected identification effect: extending the anomaly data, and adding, as anomaly samples, the anomaly data and the extended anomaly data to the training set; training the anomaly detection model according to the training set, and determining the identification effect of the anomaly detection model; and when the identification effect of the anomaly detection model is worse than expected, acquiring new anomaly data. The method is adopted to acquire more anomaly samples, such that a training sample set having sufficient positive samples and negative samples is acquired with reference to normal samples, thereby improving the accuracy of fault identification performed by an anomaly detection model trained by the training set.

Description

Anomaly detection method and device
Technical field
This specification relates to the field of computer technology, and in particular to an anomaly detection method and device.
Background
As technology develops, data processing systems must handle ever larger volumes of data, especially systems that support multiple services. A data processing system usually relies on a sizable group of cooperating servers to process data at scale. A system that provides multiple services is generally also divided into platforms that support different services, and each platform may include one or more servers. As a result, the system may need hundreds, thousands, or even more servers. While the system is running, the code, databases, and configurations of these servers change very frequently, possibly thousands or tens of thousands of times per week or more, and an oversight or error in any step can cause a platform failure or even a system failure. When a failure is being resolved, the large scale of the system and the possible distribution of servers across regions make the fault hard to locate and slow to fix, which causes heavy losses. Identifying an anomaly accurately and promptly when the system fails therefore helps stop the loss in the shortest possible time.
The approach commonly used today is to compute key business indicators per minute to form time series and to identify failures by detecting anomalies in those time series. However, this approach relies mainly on historical data from system operation. Because that historical data usually contains few anomalies, too few to serve as a basis for fault identification, anomalies are generally identified by analyzing the regularities in normal data. The samples used in this approach are of a single kind, so the rates of false positives and missed detections are relatively high.
Summary of the invention
In view of the above technical problems, this specification provides an anomaly detection method and device.
Specifically, this specification is implemented through the following technical solutions:
In a first aspect, an embodiment of this specification provides an anomaly detection method. The method includes:
acquiring sampling data from normal operation of the system, and using the sampling data as normal samples in a training set;
acquiring abnormal data according to a pre-made rule, and cyclically performing the following steps until the recognition effect of an anomaly detection model reaches the expected level, so that the anomaly detection model whose recognition effect has reached the expected level can be used to perform anomaly detection on data to be detected:
extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
training the anomaly detection model according to the training set to which the abnormal data has been added, and determining the recognition effect of the anomaly detection model;
when the recognition effect of the anomaly detection model is below the expected level, acquiring new abnormal data according to the pre-made rule.
In a second aspect, an embodiment of this specification provides an anomaly detection device, characterized in that the device includes:
a first acquiring unit, configured to acquire sampling data from normal operation of the system and use the sampling data as normal samples in a training set;
a second acquiring unit, configured to acquire abnormal data according to a pre-made rule;
a loop unit, configured to cyclically execute the steps performed by the extension unit, the training unit, and the second acquiring unit described below until the recognition effect of an anomaly detection model reaches the expected level, so that the anomaly detection model whose recognition effect has reached the expected level can be used to perform anomaly detection on data to be detected;
the extension unit, configured to extend the abnormal data and add the abnormal data and the extended abnormal data to the training set as abnormal samples;
the training unit, configured to train the anomaly detection model according to the training set to which the abnormal data has been added, and determine the recognition effect of the anomaly detection model;
the second acquiring unit being further configured to acquire new abnormal data according to the pre-made rule when the recognition effect of the anomaly detection model is below the expected level.
In a third aspect, an embodiment of this specification provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method steps of the first aspect when executing the program.
In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program that, when executed by a processor, implements the method of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided. When the instructions run on a computer, they cause the computer to perform the method of the first aspect.
With the embodiments of this specification, abnormal data can be acquired and extended to obtain more abnormal samples. Combined with the normal samples, this yields a training set with sufficient positive and negative samples, which improves the accuracy of fault identification by the anomaly detection model trained on that training set.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the embodiments of this specification.
In addition, no single embodiment of this specification needs to achieve all of the above effects.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in this specification, and a person of ordinary skill in the art can derive other drawings from them.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of this specification;
FIG. 2 is a schematic diagram of an anomaly detection method according to an embodiment of this specification;
FIG. 3 is a schematic diagram of another anomaly detection method according to an embodiment of this specification;
FIG. 4 is a schematic diagram of another anomaly detection method according to an embodiment of this specification;
FIG. 5 is a schematic flowchart of an anomaly detection method according to an embodiment of this specification;
FIG. 6 is a schematic structural diagram of an anomaly detection device according to an embodiment of this specification;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of this specification.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used here refers to and includes any or all possible combinations of one or more of the associated listed items. Depending on the context, the word "if" as used here may be interpreted as "at the time of", "when", or "in response to determining".
Data processing systems must handle ever larger volumes of data, especially systems that support multiple services. A data processing system usually relies on a sizable group of cooperating servers to process data at scale. A system that supports multiple services is generally also divided into platforms that support different services, and each platform may include one or more servers.
The following takes the Ant Financial business data processing system as an example. Ant Financial's business covers hundreds of services, including everyday convenience services, wealth management, fund transfers, and shopping and entertainment, and hundreds of system platforms support them. With so many platforms, changes to code, databases, and configurations are very frequent and may occur thousands or tens of thousands of times per week or more. In operation, however, the system actually fails infrequently, and only some of the platforms have ever failed. The historical data sampled during operation therefore covers too few kinds of abnormal data, and anomaly detection based on that historical data performs poorly. In addition, because historical abnormal data is scarce, newly identified abnormal data is hard to match against it, so the root cause of the abnormal data is hard to determine from historical data and must be judged by experienced engineers, which is costly and inefficient.
To address these problems, the embodiments of this specification provide an anomaly detection method and device. The system architecture in which the solution runs is described first. As shown in FIG. 1, the entities involved include a data processing system 100 and a computer device 200. The data processing system 100 may include service servers, terminals, and so on. The computer device 200 may be independent of the data processing system 100, or its functions may be implemented by a device in the data processing system 100. For example, the functions of the computer device 200 may be implemented by a service server in the data processing system 100.
In the embodiments of this specification, the computer device 200 trains an anomaly detection model and uses the trained model to perform anomaly detection on the data to be detected of the data processing system 100.
In one example, as shown in FIG. 2, the computer device 200 acquires abnormal data and extends it to update the abnormal samples in the training set, and then trains the anomaly detection model on the updated training set. If the recognition effect of the trained model does not reach the expected level, the device continues to acquire abnormal data and extend it to update the abnormal samples in the training set. Training ends once the recognition effect of the model trained on the updated training set reaches the expected level, and the resulting model is then used to perform anomaly detection on the data to be detected of the data processing system. Each update adds abnormal samples to the training set, so that enough abnormal samples accumulate to serve as a basis for anomaly detection.
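The loop of FIG. 2 can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the specification: the callables `acquire`, `extend`, and `train_and_score`, the numeric `expected` threshold, and the `max_rounds` safety cap are all assumptions introduced here, as is the label 1 for abnormal samples.

```python
from typing import Callable, List, Tuple

Sample = Tuple[dict, int]  # (features, label); 0 = normal, 1 = abnormal (1 is an assumption)


def train_until_expected(
    normal: List[Sample],
    acquire: Callable[[], List[dict]],                  # hypothetical: yields abnormal data
    extend: Callable[[List[dict]], List[dict]],         # hypothetical: derives more abnormal data
    train_and_score: Callable[[List[Sample]], float],   # hypothetical: trains, returns a score
    expected: float,
    max_rounds: int = 10,                               # assumed cap so the sketch terminates
) -> Tuple[List[Sample], float, int]:
    """Grow the abnormal samples until the model's score reaches `expected`."""
    training_set = list(normal)
    abnormal = acquire()
    score = 0.0
    for round_no in range(1, max_rounds + 1):
        extended = extend(abnormal)
        # Both the acquired and the extended abnormal data become abnormal samples.
        training_set += [(x, 1) for x in abnormal + extended]
        score = train_and_score(training_set)
        if score >= expected:
            return training_set, score, round_no
        # Recognition effect below expectation: acquire new abnormal data and repeat.
        abnormal = acquire()
    return training_set, score, max_rounds
```

Passing the training and scoring step in as a callable keeps the sketch independent of any particular model or metric, since the text does not fix either.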
In another example, as shown in FIG. 3, each time the computer device 200 updates the training set it can quantify the abnormal data that is acquired and extended, so that after each update the abnormal samples increase by a specified number or percentage. For example, the abnormal samples added in each update can be controlled through the parameter coverage of the abnormal samples.
On this basis, in the embodiments of this specification, abnormal data is first acquired and extended to update the abnormal samples in the training set, and it is then determined whether the parameter coverage of the abnormal samples in the updated training set reaches the expected level.
If it does not, the abnormal samples continue to be extended.
If it does, the anomaly detection model is trained on the updated training set. If the recognition effect of the trained model does not reach the expected level, abnormal data continues to be acquired and extended to update the abnormal samples in the training set, while ensuring that the parameter coverage of the abnormal samples in the updated training set reaches the expected level. Training ends once the recognition effect of the model trained on the updated training set reaches the expected level.
In another example, as shown in FIG. 4, each time the computer device 200 updates the training set it can also take the recognition effect into account when acquiring or extending abnormal samples. In one example, the way abnormal samples are extended can be adjusted according to the recognition effect. For example, if the trained anomaly detection model recognizes the abnormal samples of a certain service poorly, subsequent extension can focus on increasing the data volume or parameter coverage of the abnormal samples for that service.
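One way to picture this feedback is to split a budget of new abnormal samples across services in proportion to how often the model misses each service's anomalies. The sketch below is only an illustration under assumptions: the text does not specify a metric or an allocation formula, so per-service recall, the `budget` parameter, and the proportional split are all hypothetical.

```python
from typing import Dict


def extension_quota(recall_by_service: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split `budget` new abnormal samples across services, weighted by miss rate."""
    # Miss rate = 1 - recall; services the model recognizes poorly get a larger share.
    miss = {service: 1.0 - recall for service, recall in recall_by_service.items()}
    total = sum(miss.values())
    if total == 0:
        # Every service is recognized perfectly, so nothing needs extending.
        return {service: 0 for service in miss}
    return {service: int(budget * m / total) for service, m in miss.items()}
```

For instance, with recalls of 0.75 and 0.25 for two services and a budget of 8, the weaker service receives 6 of the 8 new samples.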
In the embodiments of this specification, the generation of abnormal samples (including acquiring and extending abnormal data) can be regarded as an offensive closed loop, and training the anomaly detection model on the updated training set can be regarded as a defensive closed loop. The offensive loop yields enough abnormal samples, the defensive loop trains the model effectively, and the contest between offense and defense improves the model's recognition effect. Further, the offense can be quantified through the parameter coverage or data volume of the abnormal samples, which makes the training of the model easier to iterate.
The embodiments of the present invention are further described below with reference to the drawings.
FIG. 5 is a schematic flowchart of the anomaly detection method provided by an embodiment of this specification. The method is applicable to a computer device. As shown in FIG. 5, the method specifically includes steps 510-560:
Step 510: acquire sampling data from normal operation of the system, and use the sampling data as normal samples in the training set.
In the solution provided by this specification, sampling can be performed periodically while the data processing system is operating normally to obtain sampling data from normal operation. For example, the data from normal operation can be sampled every minute. The acquired sampling data is then marked as one class and used as the normal samples in the training set. For example, the sampling data is given the class label "0", which indicates that the data it marks is a normal sample.
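The labeling step can be illustrated in a few lines. The class label "0" comes from the text. The sample representation (a dictionary of numeric fields) and the function name are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

NORMAL_LABEL = 0  # the text uses class label "0" to mark normal samples


def label_normal(samples: List[Dict[str, float]]) -> List[Tuple[Dict[str, float], int]]:
    """Attach the normal-class label to each periodically sampled record."""
    return [(sample, NORMAL_LABEL) for sample in samples]
```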
Detection is performed at the level of system call links, parameters, and system changes to obtain several kinds of detailed data from normal operation. Anomaly detection based on such detailed data is highly flexible, and the theoretical upper bound on its recognition effect is high. Here, the data from normal operation of the data processing system includes one or more of call data, metric data, change data, and operation and maintenance data.
Specifically, the call data may include one or more of a call link, an interface name, input parameters, output parameters, and call duration. The call link may be a directed acyclic graph in which the nodes are the called interfaces and the edges are the call relationships. The call data may be specific to a call request, for example a request by a terminal to invoke the payment service in the Ant Financial data processing system.
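A call link of this shape could be represented as below. This is a sketch under assumptions: the class and field names are hypothetical, and the acyclicity check uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` when the graph contains a cycle.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set


@dataclass
class CallRecord:
    """One node of the call link: a called interface and its call details."""
    interface: str
    in_params: dict
    out_params: dict
    elapsed_ms: float


@dataclass
class CallLink:
    """Directed graph of calls: nodes are interfaces, edges are call relationships."""
    edges: Dict[str, Set[str]] = field(default_factory=dict)  # caller -> callees

    def add_call(self, caller: str, callee: str) -> None:
        self.edges.setdefault(caller, set()).add(callee)

    def is_acyclic(self) -> bool:
        try:
            # static_order() raises CycleError if the graph is not a DAG.
            list(TopologicalSorter(self.edges).static_order())
            return True
        except CycleError:
            return False
```

`TopologicalSorter` treats the mapped values as predecessors, which reverses the resulting order but does not affect the cycle check used here.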
The metric data may be key indicators of the data processing system, for example the system call volume of each service aggregated per minute as a time series.
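As a rough illustration of per-minute aggregation, the sketch below counts call events per service per minute. The event format, a `(timestamp, service)` pair, and the function name are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple


def calls_per_minute(
    events: Iterable[Tuple[datetime, str]],
) -> Dict[Tuple[str, datetime], int]:
    """Count calls per (service, minute); each key's minute has seconds zeroed."""
    counts: Counter = Counter()
    for timestamp, service in events:
        minute = timestamp.replace(second=0, microsecond=0)
        counts[(service, minute)] += 1
    return dict(counts)
```

Sorting the keys of one service by minute gives the time series of call volume for that service.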
The change data may be information about changes triggered by operations such as code releases and modifications to the configuration of the data processing system.
The operation and maintenance data may include hardware data, for example CPU usage, network latency, and memory usage.
Step 520: acquire abnormal data according to the pre-made rule.
The pre-made rule can be determined according to actual needs. For example, the pre-made rule may be to generate fault requests for each service in the data processing system in turn, so that the resulting abnormal samples correspond to every service in the data processing system and the coverage of the abnormal samples is high.
In the solution provided by the embodiments of this specification, a fault request can be generated according to the pre-made rule, the context data of the fault request can be acquired, and that context data can be added to the training set as an abnormal sample.
The context data of the fault request may be the operating data collected from the data processing system after it receives the fault request. The context data may include one or more of call data, metric data, change data, and operation and maintenance data.
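Step 520 can be sketched as follows. All names here are hypothetical: `FaultRequest`, `ContextData`, and the `send` callable stand in for whatever mechanism issues a fault request and collects the system's operating data afterwards. The abnormal label 1 is also an assumption, since the text only fixes "0" for normal samples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

ABNORMAL_LABEL = 1  # assumption: the text only specifies "0" for normal samples


@dataclass(frozen=True)
class FaultRequest:
    """A request generated by the pre-made rule to provoke a fault in one service."""
    service: str
    params: Tuple[Tuple[str, str], ...] = ()


@dataclass
class ContextData:
    """Operating data collected after the system receives a fault request."""
    call_data: Dict[str, object] = field(default_factory=dict)
    metric_data: Dict[str, float] = field(default_factory=dict)
    change_data: List[str] = field(default_factory=list)
    ops_data: Dict[str, float] = field(default_factory=dict)


def collect_abnormal_samples(
    requests: Iterable[FaultRequest],
    send: Callable[[FaultRequest], ContextData],  # hypothetical: issues request, gathers context
) -> List[Tuple[ContextData, int]]:
    """Issue each fault request and label the resulting context data as abnormal."""
    return [(send(request), ABNORMAL_LABEL) for request in requests]
```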
循环执行步骤530-560,直至异常检测模型的识别效果达到预期:Steps 530-560 are executed cyclically until the recognition effect of the anomaly detection model is expected to be:
步骤530,对异常数据进行延伸,将异常数据和延伸的异常数据作为异常样本增加在训练集合中。In step 530, the abnormal data is extended, and the abnormal data and the extended abnormal data are added as abnormal samples in the training set.
在一个示例中,可以通过规则的延伸,以实现异常数据的延伸。基于此,首先可以将根据预制规则生成的异常检测数据增减在训练集合中,然后对预制规则进行延伸,根据延伸后的预制规则生成延伸的故障请求,获取延伸的故障请求的上下文数据,将延伸的故障请求的上下文数据作为异常样本增加在训练集合中。In one example, the extension of the rules can be used to extend the anomaly data. Based on this, the abnormality detection data generated according to the prefabrication rule may be added or subtracted in the training set, and then the prefabricated rule is extended, and the extended fault request is generated according to the extended prefabrication rule, and the context data of the extended fault request is obtained, and The context data of the extended fault request is added as an exception sample in the training set.
在另一个示例中,首先可以将根据预制规则生成的异常检测数据增减在训练集合中,然后循环执行如下步骤,直至训练集合中的异常样本的参数覆盖率达到预期:对所述预制规则进行延伸,根据延伸后的预制规则生成延伸的故障请求,获取延伸的故障请求的上下文数据,将延伸的故障请求的上下文数据作为异常样本增加在所述训练集合中;判断训练集合中的异常样本的参数覆盖了是否达到预期,当训练集合中的异常样本的参数覆盖率未达到预期时,将延伸后的预制规作为新的预制规则。例如,判断训练集合中的异常样本的参数覆盖了是否达到预期可通过判断训练集合中的异常样本是否遍布各个业务,以及各个业务对应的异常样本的数量是否达到阈值来实现。In another example, the abnormality detection data generated according to the pre-made rule may first be added or subtracted in the training set, and then the following steps are performed cyclically until the parameter coverage of the abnormal sample in the training set reaches an expected value: the pre-made rule is performed Extending, generating an extended fault request according to the extended pre-made rule, acquiring context data of the extended fault request, adding context data of the extended fault request as an abnormal sample in the training set; and determining an abnormal sample in the training set The parameters cover whether the expected condition is reached. When the parameter coverage of the abnormal samples in the training set does not reach the expected level, the extended pre-formation is used as the new pre-made rule. For example, determining whether the parameter of the abnormal sample in the training set covers whether the expected value can be achieved can be achieved by determining whether the abnormal sample in the training set is spread over each service, and whether the number of abnormal samples corresponding to each service reaches a threshold.
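The coverage-driven loop in this example can be sketched as follows. The sketch assumes that each abnormal sample is a dictionary with a `service` field and that rule extension and fault injection are supplied as callables; `coverage_reached`, `extend_until_covered`, `extend_rule`, `inject`, and `max_rounds` are hypothetical names introduced only for illustration.

```python
from collections import Counter


def coverage_reached(abnormal_samples, services, min_per_service):
    """True when every service has at least min_per_service abnormal samples."""
    counts = Counter(sample["service"] for sample in abnormal_samples)
    return all(counts[svc] >= min_per_service for svc in services)


def extend_until_covered(initial_samples, services, min_per_service,
                         rule, extend_rule, inject, max_rounds=50):
    """Extend the rule and inject faults until the coverage target is met.

    extend_rule(rule) returns an extended rule; inject(rule) returns the
    context samples collected for the fault requests generated by that rule.
    Both are assumed, caller-supplied callables.
    """
    samples = list(initial_samples)
    for _ in range(max_rounds):  # safety bound, not part of the source text
        extended = extend_rule(rule)
        samples.extend(inject(extended))
        if coverage_reached(samples, services, min_per_service):
            break
        # Coverage not yet reached: the extended rule becomes the new rule.
        rule = extended
    return samples
```

The `max_rounds` bound is a design choice of this sketch; it only prevents an endless loop when the extension never reaches the target.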
If the anomaly detection algorithm is trained on a training set whose abnormal-sample parameter coverage has already reached the expected value, and the recognition performance of the algorithm still does not meet expectations, the expected value of the parameter coverage of the abnormal samples may be raised.
The preset rule may be extended in combination with business rules or by using heuristic shortcuts. For example, it may be extended in one or more of the following ways:
extending according to historical faults that occurred while the data processing system was running;
extending according to historical faults of the same type as the fault request;
extending according to faults that may occur as recorded in the use-case library;
intelligent fault extension; for example, the context collected for a fault request may be used as a seed sample, and a genetic algorithm may be used to extend the fault.
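The genetic-algorithm style of extension mentioned in the last item can be sketched as follows. This is a generic crossover-and-mutation sketch, not the procedure of this specification: the encoding of a seed sample as a flat dictionary of fault parameters, the `domains` of allowed values, and the `fitness` function are all assumptions made for illustration.

```python
import random


def crossover(a, b, rng):
    """Build a child by taking each parameter from one of the two parents."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}


def mutate(fault, domains, rng, rate=0.2):
    """Replace each parameter with a random allowed value with probability rate."""
    return {k: (rng.choice(domains[k]) if rng.random() < rate else v)
            for k, v in fault.items()}


def evolve_faults(seeds, domains, fitness, generations=5, population=20, seed=0):
    """Grow a pool of fault descriptions from seed samples.

    fitness(fault) is an assumed scoring function, for example how poorly
    the current model detects the fault; higher scores are kept.
    """
    rng = random.Random(seed)
    pool = list(seeds)
    for _ in range(generations):
        children = []
        while len(children) < population:
            a, b = rng.choice(pool), rng.choice(pool)
            children.append(mutate(crossover(a, b, rng), domains, rng))
        # Keep the fittest candidates as parents of the next generation.
        pool = sorted(pool + children, key=fitness, reverse=True)[:population]
    return pool
```

Each surviving fault description could then be turned into an extended fault request whose context is collected as a further abnormal sample.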
In addition, the context data of fault requests may be labeled as one class and used as the abnormal samples among the training samples. For example, the context data of a fault request is given the class label "1", which indicates that the labeled data is an abnormal sample.
Step 540: train the anomaly detection model on the training set to which the abnormal data has been added, and determine the recognition performance of the anomaly detection model.
In the solution provided by this specification, feature preprocessing may first be performed on the samples in the training set. Multiple feature preprocessing methods may be used here to obtain features in one or more of the following representations: parameter representation, structure representation, indicator aggregation, and change representation. The features of each representation may correspond to one or more anomaly detection models, and the features of different representations correspond to different anomaly detection models.
Then, the corresponding anomaly detection models are trained on the features of each representation. For example, a time-series anomaly detection model is trained on the indicator aggregation features; a graph-based anomaly detection algorithm may be trained on the structure representation features; and anomaly detection models such as proximity-based, linear, subspace-based, and supervised-learning-based models may be trained on the parameter representation or change representation features.
When training the anomaly detection model, its recognition performance may be determined; once the recognition performance has stabilized, that stable value is the recognition performance of the trained anomaly detection model.
In addition, the recognition performance may be expressed by one or more of recognition accuracy, recognition coverage, the KS value, and the like.
Step 550: determine whether the recognition performance of the anomaly detection model meets expectations.
The expectation may be a threshold corresponding to one or more of recognition accuracy, recognition coverage, the KS value, and the like; for example, the expectation may be that the recognition accuracy is not lower than 99.5%.
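The three measures and the threshold check can be sketched as follows. The sketch assumes binary labels (1 for abnormal, 0 for normal), interprets recognition coverage as the share of true abnormal samples that are flagged, and takes the KS value as the largest gap between the cumulative score distributions of abnormal and normal samples; these interpretations and all function names are assumptions, not definitions from this specification.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def coverage(y_true, y_pred):
    """Share of true abnormal samples (label 1) that were flagged as abnormal."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(hits) / len(hits)


def ks_value(y_true, scores):
    """Largest gap between the score CDFs of abnormal and normal samples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    best = 0.0
    for cut in sorted(set(scores)):
        cdf_pos = sum(s <= cut for s in pos) / len(pos)
        cdf_neg = sum(s <= cut for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def meets_expectation(metrics, thresholds):
    """True when every thresholded metric is at or above its threshold."""
    return all(metrics[name] >= limit for name, limit in thresholds.items())
```

With these helpers, the example threshold in the text corresponds to `meets_expectation({"accuracy": value}, {"accuracy": 0.995})`.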
Step 560: when the recognition performance of the anomaly detection model is below expectations, acquire new abnormal data according to the preset rule.
The preset rule in step 560 may be the extended preset rule or the initial preset rule, where the initial preset rule refers to the preset rule before any extension.
In addition, each time the training set is updated, the recognition performance may be taken into account when acquiring abnormal samples or extending them. In one example, the way the preset rule is extended may be adjusted according to the recognition performance. For example, when the trained anomaly detection model recognizes the abnormal samples of a certain service poorly, the extended preset rule may focus on generating more fault requests for that service, so as to obtain richer abnormal samples for it and thereby improve the ability of the trained anomaly detection model to identify the data to be detected that corresponds to that service.
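The loop over steps 530-560 can be sketched as follows. All five callables (`acquire`, `extend`, `train`, `evaluate`) and the `max_rounds` bound are hypothetical placeholders for the operations described in the text; the sketch only shows the control flow, not any particular model.

```python
def train_until_expected(normal_samples, acquire, extend, train, evaluate,
                         expected, max_rounds=20):
    """Grow the training set with abnormal samples until the model is good enough.

    acquire() returns abnormal samples obtained under the preset rule,
    extend(samples) returns extended abnormal samples, train(training_set)
    returns a model, and evaluate(model) returns a single score to compare
    with expected. All of these are assumed, caller-supplied callables.
    """
    training_set = list(normal_samples)
    abnormal = acquire()                               # step 520
    model, score = None, float("-inf")
    for _ in range(max_rounds):                        # bound added by this sketch
        training_set += abnormal + extend(abnormal)    # step 530
        model = train(training_set)                    # step 540
        score = evaluate(model)
        if score >= expected:                          # step 550
            break
        abnormal = acquire()                           # step 560
    return model, score, training_set
```

A caller that wants the extension to focus on poorly recognized services could let `acquire` and `extend` read per-service scores from the latest evaluation; that feedback path is left out here to keep the sketch short.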
Step 570: when the recognition performance of the anomaly detection model meets expectations, use that anomaly detection model to perform anomaly detection on the data to be detected.
In the embodiments of this specification, when the data processing system receives a service processing request, anomaly detection using the anomaly detection model whose recognition performance meets expectations may be triggered. Once anomaly detection has been triggered, the data to be detected that is produced by the service processing request may be collected in real time or periodically. The data to be detected includes one or more of call data, indicator data, change data, and operation and maintenance data.
When the anomaly detection model is used to examine the data to be detected, feature preprocessing may first be performed on that data. Multiple feature preprocessing methods may be used here to obtain features in one or more of the following representations: parameter representation, structure representation, indicator aggregation, and change representation.
The anomaly detection model corresponding to the features of each representation is used to identify whether those features are abnormal. When the features of one representation correspond to multiple anomaly detection models and the detection results of those models disagree, whether the features are abnormal may be decided by voting.
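The voting step can be sketched as follows. The sketch assumes a simple majority vote in which a tie counts as normal, and it models each anomaly detection model as a callable that returns a truthy value for abnormal features; both choices, and the names `vote` and `detect`, are assumptions for illustration.

```python
def vote(verdicts):
    """Return True (abnormal) when more than half of the verdicts are abnormal."""
    return sum(verdicts) * 2 > len(verdicts)


def detect(features_by_form, models_by_form):
    """Run every model registered for a representation and vote per representation.

    features_by_form maps a representation name to its features;
    models_by_form maps the same name to a list of model callables.
    """
    results = {}
    for form, features in features_by_form.items():
        verdicts = [bool(model(features)) for model in models_by_form[form]]
        results[form] = vote(verdicts)
    return results
```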
With the embodiments of this specification, abnormal data can be acquired and extended to obtain more abnormal samples; combined with the normal samples, this yields a training set in which both positive and negative samples are sufficient, thereby improving the accuracy of fault identification by the anomaly detection model trained on that training set.
By building a closed attack-and-defense loop in an adversarial manner and quantifying the effects of attack and defense, the iteration becomes a virtuous cycle, which solves the problem that anomaly detection is difficult to iterate.
Fine-grained identification and localization based on data leaves more room to improve recognition performance and also provides a basis for determining the root cause of a fault, helping the system locate problems faster. Detection can be performed at the level of system call links, parameters, and system changes. The context slices collected during fault injection preserve fine-grained data and can reproduce fairly completely the state of the system at the time of the fault. Identification fuses multiple detailed data sources, which gives high flexibility and good recognition performance, and the fine-grained data is also used when locating faults.
Corresponding to the above method embodiments, the embodiments of this specification further provide an anomaly detection apparatus. As shown in FIG. 6, the apparatus may include:
a first acquisition unit 601, configured to acquire sampled data from the system during normal operation and use the sampled data as normal samples in a training set;
a second acquisition unit 602, configured to acquire abnormal data according to a preset rule;
a loop unit 603, configured to execute in a loop the steps performed by the following extension unit, the training unit, and the second acquisition unit, until the recognition performance of the anomaly detection model meets expectations, so that the anomaly detection model whose recognition performance meets expectations is used to perform anomaly detection on the data to be detected;
the extension unit 604, configured to extend the abnormal data and add the abnormal data and the extended abnormal data to the training set as abnormal samples;
the training unit 605, configured to train the anomaly detection model on the training set to which the abnormal data has been added, and determine the recognition performance of the anomaly detection model;
the second acquisition unit 602 is further configured to acquire new abnormal data according to the preset rule when the recognition performance of the anomaly detection model is below expectations.
In one example, the samples in the training set include one or more of call data, indicator data, change data, and operation and maintenance data.
In another example, the training unit 605 is specifically configured to:
perform feature preprocessing on the samples in the training set to obtain features in one or more of parameter representation, structure representation, indicator aggregation, and change representation, where each representation corresponds to one or more anomaly detection models;
train the corresponding anomaly detection models on the features of each representation.
In another example, the second acquisition unit 602 is specifically configured to generate a fault request according to the preset rule and acquire the context data of the fault request.
In another example, the extension unit 604 is specifically configured to extend the preset rule, generate an extended fault request according to the extended preset rule, acquire the context data of the extended fault request, and add the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
In another example, the extension unit 604 is specifically configured to:
execute the following steps in a loop until the parameter coverage of the abnormal samples in the training set meets expectations:
extend the preset rule, generate an extended fault request according to the extended preset rule, acquire the context data of the extended fault request, and add the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
when the parameter coverage of the abnormal samples in the training set does not meet expectations, take the extended preset rule as the new preset rule.
For the implementation of the functions and roles of each module in the above apparatus, see the implementation of the corresponding steps in the above method; details are not repeated here.
The embodiments of this specification further provide a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor. The computer device may be implemented in the form of an anomaly detection server. The processor implements the foregoing anomaly detection method when executing the program. The method includes at least:
acquiring sampled data from the system during normal operation, and using the sampled data as normal samples in a training set;
acquiring abnormal data according to a preset rule, and executing the following steps in a loop until the recognition performance of the anomaly detection model meets expectations, so that the anomaly detection model whose recognition performance meets expectations is used to perform anomaly detection on the data to be detected:
extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
training the anomaly detection model on the training set to which the abnormal data has been added, and determining the recognition performance of the anomaly detection model;
when the recognition performance of the anomaly detection model is below expectations, acquiring new abnormal data according to the preset rule.
In one example, the samples in the training set include one or more of call data, indicator data, change data, and operation and maintenance data.
In another example, training the anomaly detection model on the training set includes:
performing feature preprocessing on the samples in the training set to obtain features in one or more of parameter representation, structure representation, indicator aggregation, and change representation, where each representation corresponds to one or more anomaly detection models;
training the corresponding anomaly detection models on the features of each representation.
In another example, acquiring abnormal data according to the preset rule includes:
generating a fault request according to the preset rule, and acquiring the context data of the fault request.
In another example, extending the abnormal data and adding the abnormal data and the extended abnormal data to the training set as abnormal samples includes:
extending the preset rule, generating an extended fault request according to the extended preset rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
In another example, extending the preset rule, generating an extended fault request according to the extended preset rule, and acquiring the context data of the extended fault request includes:
executing the following steps in a loop until the parameter coverage of the abnormal samples in the training set meets expectations:
extending the preset rule, generating an extended fault request according to the extended preset rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
when the parameter coverage of the abnormal samples in the training set does not meet expectations, taking the extended preset rule as the new preset rule.
FIG. 7 shows a more specific schematic structural diagram of a computer device provided by the embodiments of this specification. The computer device may include a processor 710, a memory 720, an input/output interface 730, a communication interface 740, and a bus 750. The processor 710, the memory 720, the input/output interface 730, and the communication interface 740 are communicatively connected to one another inside the device via the bus 750.
The processor 710 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 720 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 720 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 720 and is called and executed by the processor 710.
The input/output interface 730 is used to connect an input/output module for information input and output. The input/output module may be configured in the device as a component (not shown in the figure) or may be externally connected to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.
The communication interface 740 is used to connect a communication module (not shown in the figure) for communication between this device and other devices. The communication module may communicate by wired means (for example, USB or a network cable) or by wireless means (for example, a mobile network, WIFI, or Bluetooth).
The bus 750 includes a path that transfers information between the components of the device (for example, the processor 710, the memory 720, the input/output interface 730, and the communication interface 740).
It should be noted that although only the processor 710, the memory 720, the input/output interface 730, the communication interface 740, and the bus 750 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may include only the components necessary to implement the solutions of the embodiments of this specification and need not include all the components shown in the figure.
The embodiments of this specification further provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the foregoing anomaly detection method. The method includes at least:
acquiring sampled data from the system during normal operation, and using the sampled data as normal samples in a training set;
acquiring abnormal data according to a preset rule, and executing the following steps in a loop until the recognition performance of the anomaly detection model meets expectations, so that the anomaly detection model whose recognition performance meets expectations is used to perform anomaly detection on the data to be detected:
extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
training the anomaly detection model on the training set to which the abnormal data has been added, and determining the recognition performance of the anomaly detection model;
when the recognition performance of the anomaly detection model is below expectations, acquiring new abnormal data according to the preset rule.
In one example, the samples in the training set include one or more of call data, indicator data, change data, and operation and maintenance data.
In another example, training the anomaly detection model on the training set includes:
performing feature preprocessing on the samples in the training set to obtain features in one or more of parameter representation, structure representation, indicator aggregation, and change representation, where each representation corresponds to one or more anomaly detection models;
training the corresponding anomaly detection models on the features of each representation.
In another example, acquiring abnormal data according to the preset rule includes:
generating a fault request according to the preset rule, and acquiring the context data of the fault request.
In another example, extending the abnormal data and adding the abnormal data and the extended abnormal data to the training set as abnormal samples includes:
extending the preset rule, generating an extended fault request according to the extended preset rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
In another example, extending the preset rule, generating an extended fault request according to the extended preset rule, and acquiring the context data of the extended fault request includes:
executing the following steps in a loop until the parameter coverage of the abnormal samples in the training set meets expectations:
extending the preset rule, generating an extended fault request according to the extended preset rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
when the parameter coverage of the abnormal samples in the training set does not meet expectations, taking the extended preset rule as the new preset rule.
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。Computer readable media includes both permanent and non-persistent, removable and non-removable media. Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory. (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, Magnetic tape cartridges, magnetic tape storage or other magnetic storage devices or any other non-transportable media can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include temporary storage of computer readable media, such as modulated data signals and carrier waves.
通过以上的实施方式的描述可知,本领域的技术人员可以清楚地了解到本说明书实施例可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本说明书实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本说明书实施例各个实施例或者实施例的某些部分所述的方法。It can be clearly understood by those skilled in the art that the embodiments of the present specification can be implemented by means of software plus a necessary general hardware platform. Based on such understanding, the technical solution of the embodiments of the present specification may be embodied in the form of a software product in essence or in the form of a software product, which may be stored in a storage medium such as a ROM/RAM. Disks, optical disks, and the like, including instructions for causing a computer device (which may be a personal computer, server, or network device, etc.) to perform the methods described in various embodiments of the embodiments of the present specification or embodiments.
上述实施例阐明的系统、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机,计算机的具体形式可以是个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件收发设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任意几种设备的组合。The system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, and a game control. A combination of a tablet, a tablet, a wearable device, or any of these devices.
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,在实施本说明书实施例方案时可以把各模块的功能在同一个或多个软件和/或硬件中实现。也可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。The various embodiments in the specification are described in a progressive manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment. The device embodiments described above are merely illustrative, and the modules described as separate components may or may not be physically separated, and the functions of the modules may be the same in the implementation of the embodiments of the present specification. Or implemented in multiple software and/or hardware. It is also possible to select some or all of the modules according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement without any creative effort.
The foregoing describes only specific implementations of the embodiments of this specification. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and these improvements and refinements shall also be regarded as falling within the protection scope of the embodiments of this specification.

Claims (13)

  1. An anomaly detection method, characterized in that the method comprises:
    obtaining sampled data from normal operation of a system, and using the sampled data as normal samples in a training set;
    obtaining abnormal data according to a preset rule, and cyclically performing the following steps until the recognition effect of an anomaly detection model meets expectations, so that the anomaly detection model whose recognition effect meets expectations is used to perform anomaly detection on data to be detected:
    extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
    training the anomaly detection model on the training set to which the abnormal data has been added, and determining the recognition effect of the anomaly detection model;
    when the recognition effect of the anomaly detection model is below expectations, obtaining new abnormal data according to the preset rule.
  2. The method according to claim 1, characterized in that the samples in the training set comprise one or more of call data, metric data, change data, and operation and maintenance data.
  3. The method according to claim 2, characterized in that training the anomaly detection model on the training set comprises:
    performing feature preprocessing on the samples in the training set to obtain features in one or more of the following representation forms: parameter representation, structure representation, metric aggregation, and change representation, wherein each representation form corresponds to one or more anomaly detection models;
    training the corresponding anomaly detection model on the features of each representation form.
  4. The method according to claim 1, characterized in that obtaining abnormal data according to the preset rule comprises:
    generating a fault request according to the preset rule, and obtaining context data of the fault request.
  5. The method according to claim 4, characterized in that extending the abnormal data and adding the abnormal data and the extended abnormal data to the training set as abnormal samples comprises:
    extending the preset rule, generating an extended fault request according to the extended preset rule, obtaining context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
  6. The method according to claim 5, characterized in that extending the preset rule, generating an extended fault request according to the extended preset rule, and obtaining context data of the extended fault request comprises:
    cyclically performing the following steps until the parameter coverage of the abnormal samples in the training set meets expectations:
    extending the preset rule, generating an extended fault request according to the extended preset rule, obtaining context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
    when the parameter coverage of the abnormal samples in the training set does not meet expectations, using the extended preset rule as the new preset rule.
  7. An anomaly detection apparatus, characterized in that the apparatus comprises:
    a first obtaining unit, configured to obtain sampled data from normal operation of a system and use the sampled data as normal samples in a training set;
    a second obtaining unit, configured to obtain abnormal data according to a preset rule;
    a loop unit, configured to cyclically perform the steps performed by the extension unit, the training unit, and the second obtaining unit described below until the recognition effect of an anomaly detection model meets expectations, so that the anomaly detection model whose recognition effect meets expectations is used to perform anomaly detection on data to be detected;
    the extension unit, configured to extend the abnormal data and add the abnormal data and the extended abnormal data to the training set as abnormal samples;
    the training unit, configured to train the anomaly detection model on the training set to which the abnormal data has been added, and determine the recognition effect of the anomaly detection model;
    wherein the second obtaining unit is further configured to obtain new abnormal data according to the preset rule when the recognition effect of the anomaly detection model is below expectations.
  8. The apparatus according to claim 7, characterized in that the samples in the training set comprise one or more of call data, metric data, change data, and operation and maintenance data.
  9. The apparatus according to claim 8, characterized in that the training unit is specifically configured to:
    perform feature preprocessing on the samples in the training set to obtain features in one or more of the following representation forms: parameter representation, structure representation, metric aggregation, and change representation, wherein each representation form corresponds to one or more anomaly detection models;
    train the corresponding anomaly detection model on the features of each representation form.
  10. The apparatus according to claim 7, characterized in that the second obtaining unit is specifically configured to generate a fault request according to the preset rule and obtain context data of the fault request.
  11. The apparatus according to claim 10, characterized in that the extension unit is specifically configured to extend the preset rule, generate an extended fault request according to the extended preset rule, obtain context data of the extended fault request, and add the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
  12. The apparatus according to claim 11, characterized in that the extension unit is specifically configured to:
    cyclically perform the following steps until the parameter coverage of the abnormal samples in the training set meets expectations:
    extend the preset rule, generate an extended fault request according to the extended preset rule, obtain context data of the extended fault request, and add the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
    when the parameter coverage of the abnormal samples in the training set does not meet expectations, use the extended preset rule as the new preset rule.
  13. A computer device, comprising a memory, a processor, and a computer program that is stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the program:
    obtaining sampled data from normal operation of a system, and using the sampled data as normal samples in a training set;
    obtaining abnormal data according to a preset rule, and cyclically performing the following steps until the recognition effect of an anomaly detection model meets expectations, so that the anomaly detection model whose recognition effect meets expectations is used to perform anomaly detection on data to be detected:
    extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
    training the anomaly detection model on the training set to which the abnormal data has been added, and determining the recognition effect of the anomaly detection model;
    when the recognition effect of the anomaly detection model is below expectations, obtaining new abnormal data according to the preset rule.
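
The training loop recited in claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the latency-based samples, the delay-injecting preset rule, the random scaling used to extend abnormal data, the midpoint-threshold model, the use of training-set accuracy as the recognition effect, the `max_rounds` safety cap, and every function name below are assumptions chosen only to make the control flow concrete.

```python
import random
import statistics
from typing import List, Tuple


def obtain_abnormal_data(delay_ms: float, rng: random.Random, n: int = 20) -> List[float]:
    """Hypothetical preset rule: add a fixed delay to simulated request latencies."""
    return [rng.gauss(100.0, 10.0) + delay_ms for _ in range(n)]


def extend_abnormal_data(abnormal: List[float], rng: random.Random) -> List[float]:
    """Hypothetical extension: scale each abnormal sample by a random factor."""
    return [x * rng.uniform(0.8, 1.2) for x in abnormal]


def train(normal: List[float], abnormal: List[float]) -> float:
    """Toy model (an assumption): a threshold halfway between the two class means."""
    return (statistics.mean(normal) + statistics.mean(abnormal)) / 2.0


def recognition_effect(threshold: float, normal: List[float], abnormal: List[float]) -> float:
    """Fraction of samples classified correctly; values above the threshold count as abnormal."""
    correct = sum(x <= threshold for x in normal) + sum(x > threshold for x in abnormal)
    return correct / (len(normal) + len(abnormal))


def build_model(expected: float = 0.95, max_rounds: int = 10, seed: int = 0) -> Tuple[float, float, int]:
    """Return (threshold, recognition effect, rounds used)."""
    rng = random.Random(seed)
    # Sampled data from normal operation become the normal samples.
    normal = [rng.gauss(100.0, 10.0) for _ in range(200)]
    abnormal_samples: List[float] = []
    delay_ms = 80.0  # assumed parameter of the hypothetical preset rule
    abnormal = obtain_abnormal_data(delay_ms, rng)
    threshold, effect, rounds = 0.0, 0.0, 0
    for rounds in range(1, max_rounds + 1):
        # Add the abnormal data and its extension to the training set.
        abnormal_samples += abnormal + extend_abnormal_data(abnormal, rng)
        # Train on the enlarged training set and determine the recognition effect.
        threshold = train(normal, abnormal_samples)
        effect = recognition_effect(threshold, normal, abnormal_samples)
        if effect >= expected:
            break
        # Effect below expectations: obtain new abnormal data from the rule.
        abnormal = obtain_abnormal_data(delay_ms, rng)
    return threshold, effect, rounds


if __name__ == "__main__":
    print(build_model())
```

The sketch stops either when the recognition effect meets the expected value or when the assumed round cap is reached. A real system would more plausibly measure the recognition effect on held-out data and replace the toy rule with actual fault requests and their context data, as in claims 4 to 6.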
PCT/CN2019/073880 2018-03-19 2019-01-30 Anomaly detection method and device WO2019179248A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810223680.1A CN108563548B (en) 2018-03-19 2018-03-19 Abnormality detection method and apparatus
CN201810223680.1 2018-03-19

Publications (1)

Publication Number Publication Date
WO2019179248A1 true WO2019179248A1 (en) 2019-09-26

Family

ID=63532649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073880 WO2019179248A1 (en) 2018-03-19 2019-01-30 Anomaly detection method and device

Country Status (3)

Country Link
CN (1) CN108563548B (en)
TW (1) TW201941058A (en)
WO (1) WO2019179248A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563548B (en) * 2018-03-19 2020-10-16 创新先进技术有限公司 Abnormality detection method and apparatus
CN109614299B (en) * 2018-09-25 2022-05-31 创新先进技术有限公司 System anomaly detection method and device and electronic equipment
CN110991779A (en) * 2018-09-30 2020-04-10 北京国双科技有限公司 Anomaly detection method and device for oil pumping well
CN109885417B (en) * 2018-12-28 2022-08-02 广州卓动信息科技有限公司 Anomaly analysis method, electronic device and readable storage medium
CN109905362B (en) * 2019-01-08 2022-05-13 平安科技(深圳)有限公司 User request detection method and device, computer equipment and storage medium
CN109886290A (en) * 2019-01-08 2019-06-14 平安科技(深圳)有限公司 Detection method, device, computer equipment and the storage medium of user's request
CN109936561B (en) * 2019-01-08 2022-05-13 平安科技(深圳)有限公司 User request detection method and device, computer equipment and storage medium
CN110113226B (en) * 2019-04-16 2021-03-12 新华三信息安全技术有限公司 Method and device for detecting equipment abnormity
CN111918280B (en) * 2019-05-07 2022-07-22 华为技术有限公司 Terminal information processing method, device and system
CN110399268B (en) * 2019-07-26 2023-09-26 创新先进技术有限公司 Abnormal data detection method, device and equipment
CN110554047B (en) * 2019-09-06 2021-07-02 腾讯科技(深圳)有限公司 Method, device, system and equipment for processing product defect detection data
CN112540842A (en) * 2019-09-20 2021-03-23 北京国双科技有限公司 Method and device for dynamically adjusting system resources
CN112818066A (en) * 2019-11-15 2021-05-18 深信服科技股份有限公司 Time sequence data anomaly detection method and device, electronic equipment and storage medium
CN111625516B (en) * 2020-01-10 2024-04-05 京东科技控股股份有限公司 Method, apparatus, computer device and storage medium for detecting data state
WO2021258348A1 (en) * 2020-06-24 2021-12-30 深圳市欢太科技有限公司 Abnormal flow detection method and system and computer storage medium
CN111813593B (en) * 2020-07-23 2023-08-18 平安银行股份有限公司 Data processing method, device, server and storage medium
CN111832666B (en) * 2020-09-15 2020-12-25 平安国际智慧城市科技股份有限公司 Medical image data amplification method, device, medium, and electronic apparatus
CN114386874B (en) * 2022-01-21 2022-11-29 北京国讯医疗软件有限公司 Multi-module linkage based medical and moral medical treatment and treatment integrated management method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942453A (en) * 2014-05-07 2014-07-23 华北电力大学 Intelligent electricity utilization anomaly detection method for non-technical loss
CN106886915A (en) * 2017-01-17 2017-06-23 华南理工大学 A kind of ad click predictor method based on time decay sampling
CN108563548A (en) * 2018-03-19 2018-09-21 阿里巴巴集团控股有限公司 Method for detecting abnormality and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339389B (en) * 2011-09-14 2013-05-29 清华大学 Fault detection method for one-class support vector machine based on density parameter optimization
US9916194B2 (en) * 2015-10-01 2018-03-13 International Business Machines Corporation System component failure diagnosis
CN107291911B (en) * 2017-06-26 2020-01-21 北京奇艺世纪科技有限公司 Anomaly detection method and device

Also Published As

Publication number Publication date
CN108563548B (en) 2020-10-16
CN108563548A (en) 2018-09-21
TW201941058A (en) 2019-10-16

Similar Documents

Publication Publication Date Title
WO2019179248A1 (en) Anomaly detection method and device
CN106656536B (en) Method and equipment for processing service calling information
US11354219B2 (en) Machine defect prediction based on a signature
CN109933452B (en) Micro-service intelligent monitoring method facing abnormal propagation
US9672085B2 (en) Adaptive fault diagnosis
US10452983B2 (en) Determining an anomalous state of a system at a future point in time
US9940187B2 (en) Nexus determination in a computing device
CN108573355B (en) Method and device for replacing operation after model updating and business server
CN107124289B (en) Weblog time alignment method, device and host
CN108734304B (en) Training method and device of data model and computer equipment
CN111581036B (en) Internet of things fault detection method, detection system and storage medium
CN110096437A (en) The test method and Related product of micro services framework
WO2021188196A1 (en) Causality determination of upgrade regressions via comparisons of telemetry data
US9811447B2 (en) Generating a fingerprint representing a response of an application to a simulation of a fault of an external service
CN115373888A (en) Fault positioning method and device, electronic equipment and storage medium
CN114490375A (en) Method, device and equipment for testing performance of application program and storage medium
CN115118621A (en) Micro-service performance diagnosis method and system based on dependency graph
CN111506580A (en) Transaction storage method based on centralized block chain type account book
CN110543462A (en) Microservice reliability prediction method, prediction device, electronic device, and storage medium
US20210294717A1 (en) Graph analysis and database for aggregated distributed trace flows
US20220138557A1 (en) Deep Hybrid Graph-Based Forecasting Systems
CN115118580B (en) Alarm analysis method and device
CN108712284B (en) Fault service positioning method and device and service server
ChuahM et al. Failure diagnosis for cluster systems using partial correlations
CN110032488B (en) Monitoring system, method and device for specific nodes in cluster and service server

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 19771052; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 19771052; Country of ref document: EP; Kind code of ref document: A1