WO2019179248A1 - Abnormality detecting method and apparatus - Google Patents


Publication number
WO2019179248A1
WO2019179248A1 PCT/CN2019/073880 CN2019073880W WO2019179248A1 WO 2019179248 A1 WO2019179248 A1 WO 2019179248A1 CN 2019073880 W CN2019073880 W CN 2019073880W WO 2019179248 A1 WO2019179248 A1 WO 2019179248A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
abnormal
training set
extended
detection model
Prior art date
Application number
PCT/CN2019/073880
Other languages
English (en)
Chinese (zh)
Inventor
周扬
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019179248A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3414 Workload generation, e.g. scripts, playback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3495 Performance evaluation by tracing or monitoring for systems

Definitions

  • The present specification relates to the field of computer technology, and in particular to an abnormality detecting method and apparatus.
  • Data processing systems must cope with ever-increasing amounts of data, especially systems that support multiple services.
  • A data processing system usually relies on a sizable group of cooperating servers to process data at large scale.
  • Multiple platforms are generally deployed to support different services.
  • Each platform can include one or more servers. As a result, the system may need hundreds or even thousands of servers, so the server fleet is very large.
  • The code, databases, and configuration of these servers change very frequently; the number of changes per week may reach tens of thousands or more. Negligence or an error in any link may cause a platform, or even the whole system, to fail.
  • The servers may also be distributed across different regions, so a fault is difficult to locate and takes too long to resolve, causing huge losses. Therefore, when a system failure occurs, identifying the abnormality accurately and promptly allows the system to stop the damage and reduce losses in the shortest possible time.
  • A commonly used approach is to form a time series from business-critical indicators calculated per minute and to identify faults by identifying abnormalities in that time series.
  • However, this approach relies mainly on historical data from the running system. Because abnormalities are usually rare in the system's historical data, they are not sufficient as a basis for fault identification, so abnormalities are generally identified by analyzing the regularities in normal data. The samples are of a single kind, and the misjudgment rate and miss rate of fault identification are relatively high.
  • The present specification provides an abnormality detecting method and apparatus.
  • An embodiment of the present specification provides an abnormality detecting method.
  • The method includes:
  • extending the abnormal data, and adding the abnormal data and the extended abnormal data as abnormal samples to the training set;
  • training the abnormality detection model according to the training set to which the abnormal data has been added, and determining the recognition effect of the abnormality detection model;
  • An embodiment of the present specification provides an abnormality detecting device, the device including:
  • a first acquiring unit configured to acquire sampling data when the system is in normal operation, and to use the sampling data as normal samples in the training set;
  • a second acquiring unit configured to acquire abnormal data according to a pre-made rule;
  • a looping unit configured to cyclically execute the steps of the following extension unit, training unit, and second acquiring unit until the recognition effect of the abnormality detection model reaches the expectation, so that the abnormality detection model whose recognition effect reaches the expectation is used to perform abnormality detection on the data to be detected;
  • the extension unit, configured to extend the abnormal data and to add the abnormal data and the extended abnormal data as abnormal samples to the training set;
  • the training unit, configured to train the abnormality detection model according to the training set to which the abnormal data has been added, and to determine the recognition effect of the abnormality detection model;
  • the second acquiring unit, further configured to acquire new abnormal data according to the pre-made rule when the recognition effect of the abnormality detection model is lower than expected.
  • An embodiment of the present disclosure provides a computer device, including a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor, when executing the program, implements the method steps of the aforementioned first aspect.
  • a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the method of the first aspect described above.
  • a computer program product comprising instructions for causing a computer to perform the method of the first aspect described above when the instructions are run on a computer.
  • Abnormal data can be acquired and extended to obtain more abnormal samples, and normal samples are also obtained, yielding a training set with sufficient positive and negative samples. This improves the accuracy of fault identification by the anomaly detection model trained on that training set.
  • No single embodiment of the present specification needs to achieve all of the above effects.
  • FIG. 1 is a schematic diagram of an application scenario shown in an embodiment of the present specification
  • FIG. 2 is a schematic diagram of an abnormality detecting method according to an embodiment of the present specification
  • FIG. 3 is a schematic diagram of another abnormality detecting method shown in an embodiment of the present specification.
  • FIG. 4 is a schematic diagram of another abnormality detecting method shown in an embodiment of the present specification.
  • FIG. 5 is a schematic flow chart of an abnormality detecting method according to an embodiment of the present specification.
  • FIG. 6 is a schematic structural diagram of an abnormality detecting device according to an embodiment of the present specification.
  • FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present specification.
  • Data processing systems must deal with ever-increasing amounts of data, especially data processing systems that support multiple services.
  • A data processing system usually achieves large-scale data processing through a sizable group of cooperating servers.
  • Multiple platforms are generally deployed to support different services.
  • Each platform can include one or more servers.
  • For example, Ant Financial Services mainly involves hundreds of businesses such as convenience services, wealth management, capital exchanges, and shopping and entertainment.
  • Hundreds of platforms support these business systems. Because of the sheer number of platforms, changes to code, databases, and configuration can be very frequent, with tens of thousands of changes or more every week.
  • Actual failures, however, are infrequent, and some platforms have never experienced a failure at all. As a result, the coverage of abnormal data in the historical data sampled by Ant Financial at run time is low, and the detection effect is unsatisfactory.
  • The identified abnormal data is difficult to match against historical anomaly data, so it is hard to analyze the root cause of abnormal data from historical data. Judgment by experienced technical personnel is required, which is costly and inefficient.
  • the embodiment of the present specification provides an abnormality detecting method and apparatus.
  • the entity involved in the embodiment of the present specification includes: a data processing system 100 and a computer device 200.
  • the data processing system 100 may include a service server, a terminal, and the like.
  • the computer device 200 can be implemented independently of the data processing system 100 or by devices in the data processing system 100.
  • For example, the functionality of the computer device 200 can be implemented by a service server in the data processing system 100.
  • The computer device 200 trains the abnormality detection model, and the trained abnormality detection model performs abnormality detection on the data to be detected of the data processing system 100.
  • The computer device 200 updates the abnormal samples in the training set by acquiring abnormal data and extending it, and trains the abnormality detection model on the updated training set. If the recognition effect of the trained model falls short of the expectation, the device continues to acquire abnormal data and extend it to update the abnormal samples in the training set, until the recognition effect of the model trained on the updated training set reaches the expectation, at which point training ends. The abnormality detection model finally obtained is then used to perform abnormality detection on the data to be detected of the data processing system. Each update adds abnormal samples to the training set, so that enough abnormal samples are obtained to serve as the basis for abnormality detection.
  • The computer device 200 can quantify the abnormal data acquired and extended at each update of the training set, so that each update increases the abnormal samples by a specified number or by 100%.
  • The abnormal samples added at each update can also be controlled through the parameter coverage of the abnormal samples.
  • Each time the abnormal samples in the training set are updated, it is determined whether the parameter coverage of the abnormal samples in the updated training set reaches the expected value.
  • Once it does, the anomaly detection model is trained on the updated training set. If the recognition effect of the trained model does not reach the expectation, abnormal data continues to be acquired and extended to update the abnormal samples in the training set, while ensuring that the parameter coverage of the abnormal samples in the updated training set reaches the expected value, until the recognition effect of the model trained on the updated training set reaches the expectation and training ends.
  • The computer device 200 may also take the recognition effect into account when acquiring or extending abnormal samples at each update of the training set.
  • The manner of extending the abnormal samples may be adjusted according to the recognition effect. For example, when the trained abnormality detection model recognizes the abnormal samples of a certain service poorly, the extension may focus on increasing the data volume or parameter coverage of the abnormal samples for that service.
  • The generation of abnormal samples (including the acquisition and extension of abnormal data) can be regarded as an offensive closed loop, and the training of the abnormality detection model on the updated training set can be regarded as a defensive closed loop.
  • A sufficient number of abnormal samples can be obtained through the offensive closed loop, and the anomaly detection model can be trained effectively through the defensive closed loop.
  • This attack-and-defense confrontation can effectively improve the recognition effect of the anomaly detection model.
  • The attack can be quantified by the parameter coverage or the data volume of the abnormal samples, which makes the training of the anomaly detection model easier to iterate.
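  • The interplay of the offensive and defensive closed loops can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the specification: samples are reduced to a single numeric feature, `train` is a toy threshold model standing in for a real anomaly detection model, the recognition effect is measured as accuracy on the training set itself, and the names `attack_defense_loop`, `acquire_abnormal`, and `extend` are hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (feature value, class): 0 = normal, 1 = abnormal


def train(samples: List[Sample]) -> float:
    """Toy stand-in for model training: a threshold midway between class means."""
    normal = [x for x, y in samples if y == 0]
    abnormal = [x for x, y in samples if y == 1]
    return (sum(normal) / len(normal) + sum(abnormal) / len(abnormal)) / 2


def recognition_effect(threshold: float, samples: List[Sample]) -> float:
    """Fraction of samples classified correctly (measured on the training set)."""
    correct = sum((x > threshold) == (y == 1) for x, y in samples)
    return correct / len(samples)


def attack_defense_loop(
    normal: List[Sample],
    acquire_abnormal: Callable[[], List[Sample]],
    extend: Callable[[List[Sample]], List[Sample]],
    expected: float,
    max_rounds: int = 20,
) -> Tuple[float, float, int]:
    training_set = list(normal)
    abnormal = acquire_abnormal()
    for round_no in range(1, max_rounds + 1):
        # Offensive loop: add the abnormal data and its extensions as abnormal samples.
        training_set += abnormal + extend(abnormal)
        # Defensive loop: retrain and measure the recognition effect.
        model = train(training_set)
        effect = recognition_effect(model, training_set)
        if effect >= expected:
            return model, effect, round_no
        abnormal = acquire_abnormal()  # effect too low: acquire new abnormal data
    raise RuntimeError("recognition effect did not reach the expectation")


normal_samples = [(i / 10, 0) for i in range(10)]
model, effect, rounds = attack_defense_loop(
    normal_samples,
    acquire_abnormal=lambda: [(5.0, 1), (6.0, 1)],
    extend=lambda data: [(x + d, 1) for x, _ in data for d in (-0.5, 0.5)],
    expected=0.995,
)
```

  • The `max_rounds` guard is a design choice of this sketch so that the loop cannot run forever if the expectation is never reached; the specification itself only states that the loop repeats until the expectation is met.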
  • FIG. 5 is a schematic flowchart of an abnormality detecting method according to an embodiment of the present disclosure. The method is applicable to a computer device. As shown in FIG. 5, the method includes steps 510-570:
  • Step 510: Acquire sampling data when the system is in normal operation, and use the sampled data as normal samples in the training set.
  • The solution provided in this specification can sample periodically while the data processing system is in normal operation to obtain sampling data for normal operation.
  • For example, data on the normal operation of the data processing system can be sampled every minute.
  • The sampled data acquired during normal operation is labeled with a class and used as normal samples in the training set.
  • For example, the data sampled during normal operation is marked with the class "0", which indicates that the data so marked is a normal sample.
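  • As a rough illustration of per-minute sampling and labeling, the following Python sketch buckets call timestamps into one-minute windows and marks each resulting record with class "0". The record layout and the function names `calls_per_minute` and `to_normal_samples` are assumptions of this sketch, not part of the specification.

```python
from collections import Counter
from typing import Dict, Iterable, List


def calls_per_minute(call_timestamps: Iterable[float]) -> Dict[int, int]:
    """Count calls in each one-minute bucket (timestamps given in seconds)."""
    return dict(Counter(int(ts // 60) for ts in call_timestamps))


def to_normal_samples(per_minute: Dict[int, int]) -> List[dict]:
    """Label every per-minute record with class "0" (normal sample)."""
    return [
        {"minute": minute, "calls": count, "label": "0"}
        for minute, count in sorted(per_minute.items())
    ]


series = calls_per_minute([0, 10, 59, 60, 125])
normal_samples = to_normal_samples(series)
```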
  • The data on the normal operation of the data processing system includes one or more of call data, indicator data, change data, and operation and maintenance data.
  • The call data may include one or more of a call link, an interface name, input parameters, output parameters, and call duration.
  • The call link can be a directed acyclic graph in which the nodes are the called interfaces and the edges are the calling relationships.
  • The call data may relate to a single call request, for example a request by which the terminal invokes a payment service.
  • The indicator data can be key indicators of the data processing system, for example the number of system calls for each service, aggregated per minute in the form of a time series.
  • The change data can record actions that trigger changes, such as code releases and configuration modifications of the data processing system.
  • The operation and maintenance data can include hardware data, for example CPU usage, network latency, and memory usage.
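  • One possible in-memory layout for such a sample is sketched below in Python. The class and field names (`CallData`, `Sample`, `duration_ms`, and so on) are hypothetical; the specification names only the four data categories and the parts of the call data. The helper `is_acyclic` uses Kahn's algorithm to check that a call link really is a directed acyclic graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (calling interface, called interface)


@dataclass
class CallData:
    call_link: List[Edge]
    interface_name: str
    input_params: Dict[str, str]
    output_params: Dict[str, str]
    duration_ms: float


@dataclass
class Sample:
    call: CallData
    indicators: Dict[str, List[int]] = field(default_factory=dict)  # per-minute series
    changes: List[str] = field(default_factory=list)                # e.g. code releases
    ops: Dict[str, float] = field(default_factory=dict)             # e.g. CPU usage
    label: str = "0"                                                # "0" = normal


def is_acyclic(edges: List[Edge]) -> bool:
    """Kahn's algorithm: the graph is acyclic if every node can be removed."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return removed == len(nodes)


link = [("gateway", "payment"), ("payment", "ledger")]
sample = Sample(
    call=CallData(link, "payment", {"amount": "10"}, {"status": "ok"}, 35.0),
    indicators={"payment_calls": [120, 118, 125]},
    ops={"cpu_usage": 0.42},
)
```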
  • Step 520: Acquire abnormal data according to the pre-made rule.
  • The pre-made rule may be determined according to actual requirements. For example, the pre-made rule may generate a fault request for each service in the data processing system in turn, so that the abnormal samples obtained correspond to every service in the data processing system and the coverage of the abnormal samples is high.
  • A fault request may be generated according to the pre-made rule, the context data of the fault request is acquired, and that context data is added as abnormal samples to the training set.
  • the context data of the fault request may be the running data of the collected data processing system after receiving the fault request.
  • the context data may include one or more of call data, indicator data, change data, and operation and maintenance data.
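  • A minimal Python sketch of this step follows, under stated assumptions: the pre-made rule is modeled as a mapping from service name to fault types, and `collect_context` is a placeholder that returns empty data where a real system would gather the running data observed after the fault request is received. The service names, fault names, and function names are all hypothetical.

```python
from typing import Dict, List

# Hypothetical pre-made rule: one or more fault types per service.
PRE_MADE_RULE: Dict[str, List[str]] = {
    "payment": ["timeout"],
    "transfer": ["invalid_parameter"],
    "wealth": ["dependency_down"],
}


def generate_fault_requests(rule: Dict[str, List[str]]) -> List[dict]:
    """Generate fault requests for each service in turn."""
    return [
        {"service": service, "fault": fault}
        for service, faults in rule.items()
        for fault in faults
    ]


def collect_context(request: dict) -> dict:
    """Placeholder for collecting the context data of a fault request."""
    return {
        "request": request,
        "call_data": {},
        "indicator_data": {},
        "change_data": {},
        "ops_data": {},
        "label": "1",  # abnormal sample
    }


abnormal_samples = [collect_context(r) for r in generate_fault_requests(PRE_MADE_RULE)]
```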
  • Steps 530-560 are executed cyclically until the recognition effect of the anomaly detection model reaches the expectation:
  • Step 530: Extend the abnormal data, and add the abnormal data and the extended abnormal data as abnormal samples to the training set.
  • The abnormal data can be extended by extending the rule. On this basis, the abnormal data generated according to the pre-made rule may first be added to the training set; the pre-made rule is then extended, an extended fault request is generated according to the extended pre-made rule, the context data of the extended fault request is acquired, and that context data is added as abnormal samples to the training set.
  • Alternatively, the abnormal data generated according to the pre-made rule may first be added to the training set, and the following steps are then performed cyclically until the parameter coverage of the abnormal samples in the training set reaches the expected value: extend the pre-made rule; generate an extended fault request according to the extended pre-made rule; acquire the context data of the extended fault request; add that context data as abnormal samples to the training set; and determine whether the parameter coverage of the abnormal samples in the training set reaches the expected value. When it does not, the extended pre-made rule is used as the new pre-made rule. For example, whether the parameter coverage reaches the expected value can be determined by checking whether the abnormal samples in the training set cover every service and whether the number of abnormal samples for each service reaches a threshold.
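  • The coverage-driven loop can be sketched in Python as follows. The coverage test mirrors the example in the text (every service is covered and each service has at least a threshold number of abnormal samples). The way the rule is extended here, by appending one more variant of the first fault per service, is purely an illustrative assumption, as are the function names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rule = Dict[str, List[str]]  # service name -> fault types


def coverage_reached(samples: List[dict], services: List[str], min_per_service: int) -> bool:
    """True when every service has at least `min_per_service` abnormal samples."""
    counts = Counter(sample["service"] for sample in samples)
    return all(counts[service] >= min_per_service for service in services)


def extend_rule(rule: Rule) -> Rule:
    """Hypothetical extension: add one new variant of the first fault per service."""
    return {svc: faults + [f"{faults[0]}#v{len(faults)}"] for svc, faults in rule.items()}


def fill_until_covered(
    rule: Rule, services: List[str], min_per_service: int, max_rounds: int = 50
) -> Tuple[List[dict], Rule]:
    # Abnormal data generated from the initial pre-made rule goes in first.
    samples = [
        {"service": svc, "fault": fault, "label": "1"}
        for svc, faults in rule.items()
        for fault in faults
    ]
    rounds = 0
    while not coverage_reached(samples, services, min_per_service) and rounds < max_rounds:
        rule = extend_rule(rule)  # the extended rule becomes the new pre-made rule
        samples += [
            {"service": svc, "fault": faults[-1], "label": "1"}
            for svc, faults in rule.items()
        ]
        rounds += 1
    return samples, rule


initial_rule = {"payment": ["timeout"], "transfer": ["invalid_parameter"]}
samples, final_rule = fill_until_covered(initial_rule, ["payment", "transfer"], 3)
```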
  • When the recognition effect of the anomaly detection algorithm does not reach the expected value, the expected value of the parameter coverage of the abnormal samples can be raised.
  • The pre-made rule can be extended in conjunction with business rules or by intelligent techniques. For example, it can be extended in one or more of the following ways:
  • intelligent fault extension: for example, the context collected for the fault request can be used as seed samples, and a genetic algorithm can be applied to extend the faults.
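  • The specification does not describe the genetic algorithm further, so the following Python sketch shows only one conventional form: truncation selection with elitism, uniform crossover, and Gaussian mutation. Representing a fault as a tuple of numeric parameters, the caller-supplied `fitness` function, and all parameter names are assumptions of this sketch. At least two seed faults are required.

```python
import random
from typing import Callable, List, Tuple

Fault = Tuple[float, ...]  # hypothetical numeric fault parameters


def evolve_faults(
    seeds: List[Fault],
    fitness: Callable[[Fault], float],
    generations: int = 10,
    population: int = 20,
    mutation_scale: float = 0.1,
    seed: int = 0,
) -> List[Fault]:
    rng = random.Random(seed)
    pop = list(seeds)
    for _ in range(generations):
        # Selection: keep the fitter half (at least two) as parents (elitism).
        pop.sort(key=fitness, reverse=True)
        parents = pop[: max(2, len(pop) // 2)]
        children: List[Fault] = []
        while len(parents) + len(children) < population:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each parameter from either parent.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            # Mutation: perturb each parameter by a small relative amount.
            child = tuple(g * (1 + rng.gauss(0, mutation_scale)) for g in child)
            children.append(child)
        pop = parents + children
    return pop


def severity(fault: Fault) -> float:
    """Hypothetical fitness: latency multiplied by error rate."""
    return fault[0] * fault[1]


seed_faults = [(100.0, 0.01), (200.0, 0.05), (150.0, 0.02)]
extended_faults = evolve_faults(seed_faults, severity)
```

  • In practice the fitness function would more plausibly reward faults that the current model fails to recognize; the product used here only keeps the example self-contained.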
  • The context data of the fault request can be labeled with a class and used as abnormal samples in the training set.
  • For example, the context data of the fault request is marked with the class "1", which indicates that the data so marked is an abnormal sample.
  • Step 540: Train the abnormality detection model according to the training set to which the abnormal data has been added, and determine the recognition effect of the abnormality detection model.
  • Feature preprocessing can be performed on the samples in the training set.
  • A variety of feature preprocessing methods can be employed to obtain features in one or more expression forms among parameter expression, structure expression, indicator aggregation, and change expression.
  • The features of each expression form may correspond to one or more anomaly detection models, and features of different expression forms correspond to different anomaly detection models.
  • The corresponding anomaly detection models are trained on the features of each expression form.
  • For example, a time-series anomaly detection model is trained on the indicator aggregation features;
  • a graph-based anomaly detection algorithm can be trained on the structure expression features;
  • and proximity-based, linear, subspace-based, or supervised-learning-based anomaly detection models can be trained on the parameter expression or change expression features.
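  • The mapping from expression form to model can be sketched as a simple dispatch table in Python. The two trainers below are deliberately simple stand-ins chosen for this sketch (a three-sigma check for the time series and a known-edge check for call links); they are not the models the specification has in mind, and the form names used as dictionary keys are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]


def train_time_series(values: List[float]) -> Callable[[float], bool]:
    """Stand-in time-series model: flag values more than 3 sigma from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return lambda x: abs(x - mu) > 3 * sigma


def train_graph(call_links: List[List[Edge]]) -> Callable[[List[Edge]], bool]:
    """Stand-in graph model: flag call links containing an edge never seen before."""
    known = {edge for link in call_links for edge in link}
    return lambda link: any(edge not in known for edge in link)


# One trainer per feature expression form.
TRAINERS: Dict[str, Callable] = {
    "indicator_aggregation": train_time_series,
    "structure_expression": train_graph,
}


def train_all(features_by_form: Dict[str, list]) -> Dict[str, Callable]:
    return {form: TRAINERS[form](data) for form, data in features_by_form.items()}


models = train_all({
    "indicator_aggregation": [100, 102, 98, 101, 99],
    "structure_expression": [[("gateway", "payment"), ("payment", "ledger")]],
})
```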
  • During training, the recognition effect of the abnormality detection model can be evaluated; once the recognition effect no longer changes, that stable value is taken as the recognition effect of the trained abnormality detection model.
  • The recognition effect can be expressed by one or more of recognition accuracy, recognition coverage, and KS value.
  • Step 550: Determine whether the recognition effect of the abnormality detection model meets the expectation.
  • The expectation may be a threshold for one or more of the recognition accuracy, the recognition coverage, the KS value, and so on. For example, the expectation may be that the recognition accuracy is not less than 99.5%.
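  • The three measures and the threshold check can be sketched in Python as below. Two interpretations are assumptions of this sketch: recognition coverage is treated as recall on the abnormal class, and the KS value is computed as the maximum gap between the true-positive rate and the false-positive rate over all score thresholds. Labels are 1 for abnormal and 0 for normal.

```python
from typing import Dict, List


def accuracy(predicted: List[int], labels: List[int]) -> float:
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


def coverage(predicted: List[int], labels: List[int]) -> float:
    """Share of truly abnormal samples that were flagged (recall)."""
    flagged = [p for p, y in zip(predicted, labels) if y == 1]
    return sum(flagged) / len(flagged)


def ks_value(scores: List[float], labels: List[int]) -> float:
    """Maximum |TPR - FPR| when sweeping the decision threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:  # handle tied scores
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        best = max(best, abs(tp / positives - fp / negatives))
        i = j
    return best


def meets_expectation(effect: Dict[str, float], expectation: Dict[str, float]) -> bool:
    return all(effect[name] >= threshold for name, threshold in expectation.items())


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
predicted = [int(s >= 0.5) for s in scores]
effect = {
    "accuracy": accuracy(predicted, labels),
    "coverage": coverage(predicted, labels),
    "ks": ks_value(scores, labels),
}
ok = meets_expectation(effect, {"accuracy": 0.995})
```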
  • Step 560: When the recognition effect of the abnormality detection model is lower than expected, acquire new abnormal data according to the pre-made rule.
  • The pre-made rule in step 560 may be an extended pre-made rule or the initial pre-made rule, where the initial pre-made rule refers to a pre-made rule that has not been extended.
  • The abnormal samples may be acquired or extended in light of the recognition effect.
  • The manner of extending the pre-made rule may be adjusted according to the recognition effect. For example, when the trained abnormality detection model recognizes the abnormal samples of a certain service poorly, the extended pre-made rule may focus on generating more fault requests for that service, so as to obtain richer abnormal samples for the service and thereby improve the trained model's ability to identify the data to be detected for that service.
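  • One way to focus the extension on weakly recognized services is to split a budget of new fault requests in proportion to each service's shortfall against the expectation. The Python sketch below illustrates that idea only; the proportional allocation, the function names, and the use of per-service recall as the recognition effect are all assumptions, not details from the specification.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def per_service_recall(results: List[Tuple[str, bool]]) -> Dict[str, float]:
    """results: (service, detected) for each abnormal sample."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for service, detected in results:
        totals[service] += 1
        hits[service] += int(detected)
    return {service: hits[service] / totals[service] for service in totals}


def extension_budget(
    recalls: Dict[str, float], total_new_requests: int, expected: float
) -> Dict[str, int]:
    """Allocate new fault requests in proportion to each service's shortfall."""
    shortfall = {svc: max(0.0, expected - r) for svc, r in recalls.items()}
    total = sum(shortfall.values())
    if total == 0:
        return {svc: 0 for svc in recalls}
    # Rounding means the allocations may not sum exactly to the budget.
    return {svc: round(total_new_requests * s / total) for svc, s in shortfall.items()}


recalls = per_service_recall([
    ("payment", True), ("payment", True),
    ("transfer", True), ("transfer", False),
])
budget = extension_budget(recalls, total_new_requests=100, expected=1.0)
```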
  • Step 570: When the recognition effect of the abnormality detection model reaches the expectation, use that abnormality detection model to perform abnormality detection on the data to be detected.
  • When the data processing system receives a service processing request, abnormality detection by the model whose recognition effect reaches the expectation may be triggered. After abnormality detection is triggered, the data to be detected generated by the service processing request may be collected in real time or periodically.
  • the data to be detected includes one or more of call data, indicator data, change data, and operation and maintenance data.
  • Feature preprocessing may first be performed on the data to be detected.
  • A plurality of feature preprocessing methods may be used to obtain features in one or more expression forms among parameter expression, structure expression, indicator aggregation, and change expression.
  • The anomaly detection model corresponding to the features of each expression form is used to identify whether the features are abnormal.
  • When the features of one expression form correspond to a plurality of anomaly detection models and the detection results of those models are inconsistent, whether the features are abnormal may be determined by voting.
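  • The voting step can be sketched in Python as a simple majority vote. The specification does not state the voting rule, so strict majority with ties resolved as "not abnormal" is an assumption of this sketch, as are the three toy detectors.

```python
from typing import Callable, List


def vote(verdicts: List[bool]) -> bool:
    """Strict majority vote; a tie counts as not abnormal (assumption)."""
    return sum(verdicts) * 2 > len(verdicts)


def detect(feature: float, detectors: List[Callable[[float], bool]]) -> bool:
    """Run every model for this expression form and combine the results."""
    return vote([detector(feature) for detector in detectors])


# Three hypothetical detectors for the same expression form.
detectors = [
    lambda x: x > 100,
    lambda x: x > 120,
    lambda x: x > 200,
]
```

  • With these detectors, `detect(150, detectors)` is abnormal because two of three models agree, while `detect(110, detectors)` is not.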
  • Abnormal data can be acquired and extended to obtain more abnormal samples, and normal samples are also obtained, yielding a training set with sufficient positive and negative samples. This improves the accuracy of fault identification by the anomaly detection model trained on that training set.
  • This leaves more room for improving the recognition effect and provides a basis for determining the root cause of a fault, which helps locate problems in the system faster. Detection can be performed at the level of system call links, parameters, and system changes.
  • The context slices collected during fault injection preserve fine-grained data that can reconstruct the situation at the time of a system fault. Identification can draw on multiple detailed data sources, giving high performance and good recognition, and the fine-grained data can be used when locating faults.
  • the embodiment of the present specification further provides an abnormality detecting device.
  • the device may include:
  • a first acquiring unit 601, configured to acquire sampling data when the system is in normal operation, and to use the sampling data as normal samples in the training set;
  • a second acquiring unit 602, configured to acquire abnormal data according to the pre-made rule;
  • a looping unit 603, configured to cyclically execute the steps of the following extension unit, training unit, and second acquiring unit until the recognition effect of the abnormality detection model reaches the expectation, so that the abnormality detection model whose recognition effect reaches the expectation is used to perform abnormality detection on the data to be detected;
  • the extension unit 604, configured to extend the abnormal data and to add the abnormal data and the extended abnormal data as abnormal samples to the training set;
  • the training unit 605, configured to train the abnormality detection model according to the training set to which the abnormal data has been added, and to determine the recognition effect of the abnormality detection model;
  • the second acquiring unit 602, further configured to acquire new abnormal data according to the pre-made rule when the recognition effect of the abnormality detection model is lower than expected.
  • the samples in the training set include one or more of call data, metric data, change data, and operational data.
  • The training unit 605 is specifically configured to train the corresponding anomaly detection model on the features of each expression form.
  • The second acquiring unit 602 is specifically configured to generate a fault request according to the pre-made rule, and to acquire the context data of the fault request.
  • The extension unit 604 is specifically configured to: extend the pre-made rule, generate an extended fault request according to the extended pre-made rule, acquire the context data of the extended fault request, and add the context data of the fault request and the context data of the extended fault request as abnormal samples to the training set.
  • The extension unit 604 is specifically configured to:
  • cyclically perform the following steps until the parameter coverage of the abnormal samples in the training set reaches the expected value:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request as abnormal samples to the training set;
  • taking the extended pre-made rule as the new pre-made rule.
  • the embodiments of the present specification further provide a computer device including at least a memory, a processor, and a computer program stored on the memory and operable on the processor, the computer device being implemented in the form of an anomaly detection server.
  • When the processor executes the program, the foregoing abnormality detecting method is implemented.
  • The method at least includes:
  • extending the abnormal data, and adding the abnormal data and the extended abnormal data as abnormal samples to the training set;
  • training the abnormality detection model according to the training set to which the abnormal data has been added, and determining the recognition effect of the abnormality detection model;
  • the samples in the training set include one or more of call data, metric data, change data, and operational data.
  • Training the anomaly detection model according to the training set includes:
  • training the corresponding anomaly detection model on the features of each expression form.
  • Acquiring the abnormal data according to the pre-made rule includes:
  • generating a fault request according to the pre-made rule, and acquiring the context data of the fault request.
  • Extending the abnormal data and adding the abnormal data and the extended abnormal data as abnormal samples to the training set includes:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request as abnormal samples to the training set.
  • Extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, and acquiring the context data of the extended fault request includes:
  • cyclically performing the following steps until the parameter coverage of the abnormal samples in the training set reaches the expected value:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring the context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request as abnormal samples to the training set;
  • taking the extended pre-made rule as the new pre-made rule.
  • FIG. 7 shows a schematic diagram of a more specific computer device structure provided by an embodiment of the present specification.
  • the computer device may include a processor 710, a memory 720, an input/output interface 730, a communication interface 740, and a bus 750.
  • The processor 710, the memory 720, the input/output interface 730, and the communication interface 740 are communicatively connected to one another within the device via the bus 750.
  • The processor 710 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs so as to implement the technical solutions provided by the embodiments of the present specification.
  • the memory 720 can be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like.
  • the memory 720 can store an operating system and other application programs.
  • when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the related program code is stored in the memory 720 and is called and executed by the processor 710.
  • the input/output interface 730 is used to connect an input/output module to implement information input and output.
  • the input/output module can be configured as a component inside the device (not shown) or connected externally to the device to provide the corresponding function.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various types of sensors, and the like, and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 740 is used to connect a communication module (not shown) to implement communication interaction between the device and other devices.
  • the communication module can communicate by wired means (such as USB, network cable, etc.), or can communicate by wireless means (such as mobile network, WIFI, Bluetooth, etc.).
  • Bus 750 includes a path for transferring information between various components of the device, such as processor 710, memory 720, input/output interface 730, and communication interface 740.
  • although the above device only shows the processor 710, the memory 720, the input/output interface 730, the communication interface 740, and the bus 750, in a specific implementation the device may also include other components necessary for normal operation.
  • the above device may also include only the components necessary for implementing the embodiments of the present specification, and does not necessarily include all the components shown in the drawings.
  • the embodiments of the present specification further provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aforementioned abnormality detecting method.
  • the method at least includes:
  • extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples;
  • training the abnormality detection model according to the training set to which the abnormal samples have been added, and determining the recognition effect of the abnormality detection model;
  • the samples in the training set include one or more of call data, metric data, change data, and operational data.
  • the training the anomaly detection model according to the training set includes:
  • the corresponding anomaly detection model is trained according to the characteristics of each expression form.
  • the obtaining the abnormal data according to the pre-made rule includes:
  • a fault request is generated according to the pre-made rule, and the context data of the fault request is obtained.
  • extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples includes:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples.
  • Extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, and acquiring the context data of the extended fault request includes:
  • performing the following steps in a loop until the parameter coverage of the abnormal samples in the training set reaches the expected value:
  • extending the pre-made rule, generating an extended fault request according to the extended pre-made rule, acquiring context data of the extended fault request, and adding the context data of the fault request and the context data of the extended fault request to the training set as abnormal samples;
  • taking the extended pre-made rule as the new pre-made rule.
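The overall method (extend the abnormal data, train, check the recognition effect, and acquire new abnormal data when the effect falls short) can be sketched as follows. This is a toy illustration under stated assumptions, not the model used in the specification: each sample is reduced to a single hypothetical numeric metric (for instance a latency), `extend` perturbs abnormal values by fixed factors, the "model" is a plain threshold chosen to minimize training errors, and the recognition effect is accuracy on a labelled validation list. All function and parameter names are hypothetical.

```python
def extend(abnormal, factors=(0.9, 1.1)):
    """Hypothetical extension: derive extra abnormal values by perturbing each one."""
    return [x * f for x in abnormal for f in factors]


def train_threshold(normal, abnormal):
    """Toy model: choose the threshold with the fewest training errors.

    A value x is classified as abnormal when x >= threshold.
    """
    candidates = sorted(set(normal) | set(abnormal))

    def errors(threshold):
        false_alarms = sum(x >= threshold for x in normal)
        misses = sum(x < threshold for x in abnormal)
        return false_alarms + misses

    return min(candidates, key=errors)


def recognition_effect(threshold, validation):
    """Accuracy on labelled (value, is_abnormal) validation pairs."""
    correct = sum((x >= threshold) == is_abnormal for x, is_abnormal in validation)
    return correct / len(validation)


def fit_until_expected(normal, abnormal_batches, validation, expected):
    """Loop: extend abnormal data, train, evaluate; fetch new abnormal data if needed."""
    training_abnormal = []
    batches = iter(abnormal_batches)
    abnormal = next(batches)
    while True:
        # Add the abnormal data and its extension to the training set.
        training_abnormal += abnormal + extend(abnormal)
        threshold = train_threshold(normal, training_abnormal)
        effect = recognition_effect(threshold, validation)
        if effect >= expected:
            return threshold, effect
        # Recognition effect is worse than expected: acquire new abnormal data.
        abnormal = next(batches, None)
        if abnormal is None:
            return threshold, effect  # no further abnormal data available
```

With normal values `[100, 110, 120, 130]` and a first abnormal batch of `[1000.0]`, the threshold lands near 900 and misses milder anomalies in the validation list; a second batch such as `[220.0]` pulls the threshold down so that those anomalies are recognized as well.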
  • Computer readable media include both persistent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • as defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • the embodiments of the present specification can be implemented by means of software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the embodiments of the present specification, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the present specification or in certain parts of those embodiments.
  • the system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • a typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, and a game console.
  • the various embodiments in the specification are described in a progressive manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
  • because the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for the relevant parts, refer to the description of the method embodiments.
  • the device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separated, and when implementing the embodiments of the present specification the functions of the modules may be implemented in one or more pieces of software and/or hardware. It is also possible to select some or all of the modules according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without any creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

Disclosed are an anomaly detection method and device. The method comprises: using data sampled during normal operation of a system as normal samples of a training set; acquiring abnormal data; and cyclically performing the following steps until an expected recognition effect of an anomaly detection model is reached, so that anomaly detection can be performed on data to be detected by means of the anomaly detection model that has reached the expected recognition effect: extending the abnormal data, and adding the abnormal data and the extended abnormal data to the training set as abnormal samples; training the anomaly detection model according to the training set, and determining the recognition effect of the anomaly detection model; and, when the recognition effect of the anomaly detection model is worse than expected, acquiring new abnormal data. The method is used to acquire more abnormal samples, so that, together with the normal samples, a training sample set with sufficient positive and negative samples is obtained, thereby improving the accuracy of fault identification by an anomaly detection model trained on the training set.
PCT/CN2019/073880 2018-03-19 2019-01-30 Procédé et dispositif de détection d'anomalie WO2019179248A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810223680.1 2018-03-19
CN201810223680.1A CN108563548B (zh) 2018-03-19 2018-03-19 异常检测方法及装置

Publications (1)

Publication Number Publication Date
WO2019179248A1 true WO2019179248A1 (fr) 2019-09-26

Family

ID=63532649

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073880 WO2019179248A1 (fr) 2018-03-19 2019-01-30 Procédé et dispositif de détection d'anomalie

Country Status (3)

Country Link
CN (1) CN108563548B (fr)
TW (1) TW201941058A (fr)
WO (1) WO2019179248A1 (fr)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563548B (zh) * 2018-03-19 2020-10-16 创新先进技术有限公司 异常检测方法及装置
CN109614299B (zh) * 2018-09-25 2022-05-31 创新先进技术有限公司 一种系统异常检测方法、装置及电子设备
CN110991779A (zh) * 2018-09-30 2020-04-10 北京国双科技有限公司 抽油机井的异常检测方法及装置
CN109885417B (zh) * 2018-12-28 2022-08-02 广州卓动信息科技有限公司 异常分析方法及电子设备、可读存储介质
CN109886290A (zh) * 2019-01-08 2019-06-14 平安科技(深圳)有限公司 用户请求的检测方法、装置、计算机设备及存储介质
CN109936561B (zh) * 2019-01-08 2022-05-13 平安科技(深圳)有限公司 用户请求的检测方法、装置、计算机设备及存储介质
CN109905362B (zh) * 2019-01-08 2022-05-13 平安科技(深圳)有限公司 用户请求的检测方法、装置、计算机设备及存储介质
EP3712736A1 (fr) * 2019-03-22 2020-09-23 L'air Liquide, Societe Anonyme Pour L'etude Et L'exploitation Des Procedes Georges Claude Méthode de détection d'anomalies dans une installation de traitement des eaux mettant en oeuvre un appareil d'injection d'oxygène dans un bassin d'épuration
CN110113226B (zh) * 2019-04-16 2021-03-12 新华三信息安全技术有限公司 一种检测设备异常的方法及装置
CN111918280B (zh) * 2019-05-07 2022-07-22 华为技术有限公司 一种终端信息的处理方法、装置及系统
CN110399268B (zh) * 2019-07-26 2023-09-26 创新先进技术有限公司 一种异常数据检测的方法、装置及设备
CN110554047B (zh) * 2019-09-06 2021-07-02 腾讯科技(深圳)有限公司 产品缺陷检测数据处理方法、装置、系统和设备
CN112540842A (zh) * 2019-09-20 2021-03-23 北京国双科技有限公司 动态调整系统资源的方法及装置
CN112818066A (zh) * 2019-11-15 2021-05-18 深信服科技股份有限公司 一种时序数据异常检测方法、装置及电子设备和存储介质
CN111625516B (zh) * 2020-01-10 2024-04-05 京东科技控股股份有限公司 检测数据状态的方法、装置、计算机设备和存储介质
WO2021258348A1 (fr) * 2020-06-24 2021-12-30 深圳市欢太科技有限公司 Procédé et système de détection de flux anormal et support de stockage informatique
CN111813593B (zh) * 2020-07-23 2023-08-18 平安银行股份有限公司 一种数据处理方法、设备、服务器及存储介质
CN111832666B (zh) * 2020-09-15 2020-12-25 平安国际智慧城市科技股份有限公司 医疗影像数据扩增方法、装置、介质及电子设备
CN114386874B (zh) * 2022-01-21 2022-11-29 北京国讯医疗软件有限公司 一种基于多模块联动的医德医风综合管理方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942453A (zh) * 2014-05-07 2014-07-23 华北电力大学 一种针对非技术性损失的智能用电异常检测方法
CN106886915A (zh) * 2017-01-17 2017-06-23 华南理工大学 一种基于时间衰减采样的广告点击预估方法
CN108563548A (zh) * 2018-03-19 2018-09-21 阿里巴巴集团控股有限公司 异常检测方法及装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339389B (zh) * 2011-09-14 2013-05-29 清华大学 一种基于密度的参数优化单分类支持向量机故障检测方法
US9916194B2 (en) * 2015-10-01 2018-03-13 International Business Machines Corporation System component failure diagnosis
CN107291911B (zh) * 2017-06-26 2020-01-21 北京奇艺世纪科技有限公司 一种异常检测方法和装置

Also Published As

Publication number Publication date
CN108563548A (zh) 2018-09-21
CN108563548B (zh) 2020-10-16
TW201941058A (zh) 2019-10-16

Similar Documents

Publication Publication Date Title
WO2019179248A1 (fr) Procédé et dispositif de détection d'anomalie
US11354219B2 (en) Machine defect prediction based on a signature
AU2016351091B2 (en) Method and device for processing service calling information
US9672085B2 (en) Adaptive fault diagnosis
US9940187B2 (en) Nexus determination in a computing device
CN108573355B (zh) 模型更新后替换运行的方法、装置、及业务服务器
CN107124289B (zh) 网络日志时间对齐方法、装置及主机
CN108734304B (zh) 一种数据模型的训练方法、装置、及计算机设备
CN111581036B (zh) 一种物联网故障检测方法、检测系统、存储介质
CN110096437A (zh) 微服务架构的测试方法及相关产品
US8832839B2 (en) Assessing system performance impact of security attacks
WO2021188196A1 (fr) Détermination de causalité de régressions de mise à niveau par des comparaisons de données de télémesure
CN114844768A (zh) 信息分析方法、装置及电子设备
CN115373888A (zh) 故障定位方法、装置、电子设备和存储介质
CN115118621A (zh) 一种基于依赖关系图的微服务性能诊断方法及系统
WO2014204470A1 (fr) Génération d'une empreinte digitale représentant une réponse d'une application à une simulation d'une panne d'un service externe
CN111506580A (zh) 一种基于中心化块链式账本的交易存储方法
CN110543462A (zh) 微服务可靠性预测方法、预测装置、电子设备及存储介质
CN115118580B (zh) 告警分析方法以及装置
CN114546799A (zh) 埋点日志校验方法、装置、电子设备、存储介质及产品
ChuahM et al. Failure diagnosis for cluster systems using partial correlations
CN112905479B (zh) 一种基于云平台报警事故根因最佳路径确定方法及系统
US11758040B2 (en) Systems and methods for use in blocking of robocall and scam call phone numbers
Wenzler Automated performance regression detection in microservice architectures
US8510601B1 (en) Generating service call patterns for systems under test

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19771052

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19771052

Country of ref document: EP

Kind code of ref document: A1