WO2024060767A1

WO2024060767A1 - Anomaly detection method and related apparatus

Info

Publication number: WO2024060767A1
Application number: PCT/CN2023/103993
Authority: WO
Inventors: 杨松; 周颖杰; 陈亮; 邵传领; 王腾宇; 吴迪; 刘凡兴; 邓怡然
Original assignee: 华为云计算技术有限公司
Priority date: 2022-09-20
Filing date: 2023-06-29
Publication date: 2024-03-28
Also published as: CN117792662A

Abstract

The present application relates to the technical field of cloud services. Disclosed are an anomaly detection method and a related apparatus. The solution does not require related personnel in the field of cloud services to manually create security rules, and does not require manual in-depth analysis and summarization of various attack modes, thereby avoiding the loopholes of manually created security rules, reducing missed detections and false detections, and improving anomaly detection efficiency. In addition, even if the related personnel have little or even no professional knowledge of machine learning and deep learning, for example, knowledge of model design and tuning, the solution enables quick self-service construction of an anomaly detection model to realize anomaly detection oriented to specific tasks.

Description

Anomaly detection method and related device

This application claims priority to the Chinese patent application with application number 202211145851.6 and the invention title "Anomaly Detection Method and Related Devices" submitted on September 20, 2022, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of cloud service technology, and in particular to an anomaly detection method and related devices.

Background

With the development of cloud service technology, more and more users operate and access cloud platforms through application programming interfaces (APIs) to use the cloud services that the platforms provide. As users use cloud services, a large amount of valuable data is generated on the cloud platform, such as users' operational data, personal information, and business information. To keep users' use of cloud services secure and to prevent attackers from stealing or destroying resources in the cloud platform through illegal operations, the various operations on and accesses to the cloud platform need to be comprehensively monitored so that potential threats and anomalies in the cloud platform can be discovered.

In the related art, a web application firewall (WAF) is deployed between the server of the cloud platform and the Internet to which clients are connected, and the WAF is used to detect anomalies in API requests from the Internet and to protect against them. For example, the WAF filters API requests based on manually created security rules, which cover some common attack patterns. When the WAF detects that certain API requests match an attack pattern included in the security rules, it can reject these API requests, thereby preventing malicious parties from attacking the cloud platform through APIs.

However, WAF-based anomaly detection solutions require manual creation of security rules, which is not only time-consuming and labor-intensive, but also requires manual in-depth analysis and summary of various attack modes to achieve better anomaly detection results. In addition, manually created security rules will inevitably have loopholes, resulting in missed detections and false detections in the above solutions.

Summary of the invention

This application provides an anomaly detection method and related devices, which can lower the barrier to anomaly detection for relevant personnel, reduce missed detections and false detections, and improve the efficiency of anomaly detection. The technical solutions are as follows:

In a first aspect, an anomaly detection method is provided, and the method includes:

Receive the configuration parameters of an anomaly detection task, where the configuration parameters indicate a sample set, a test set and candidate attribute fields, the sample set includes log data of the cloud platform used for parameter tuning, the test set includes log data of the cloud platform on which anomaly detection is to be performed, and the candidate attribute fields are attribute fields corresponding to the log data of the cloud platform; based on the sample set and the candidate attribute fields, determine target attribute fields from the candidate attribute fields, where the target attribute fields are the attribute fields used to perform the anomaly detection task; based on the sample set and the target attribute fields, tune the first hyperparameter of a first detection model; and based on the target attribute fields, perform anomaly detection on the test set through the parameter-tuned first detection model to obtain the anomaly detection result of the test set.
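As a rough illustration only, the four steps above can be pictured as a small pipeline. The names `DetectionTaskConfig`, `run_task`, `select_fields`, `tune` and `detect` are hypothetical and do not come from this application; the three callables stand in for the field selection, parameter tuning and detection steps, whose internals are described in the optional implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

LogRecord = Dict[str, Any]  # one log entry, keyed by attribute field name


@dataclass
class DetectionTaskConfig:
    """Hypothetical container for the configuration parameters of one task."""
    sample_set: List[LogRecord]   # log data used for parameter tuning
    test_set: List[LogRecord]     # log data on which anomaly detection is performed
    candidate_fields: List[str]   # attribute fields of the log data


def run_task(
    config: DetectionTaskConfig,
    select_fields: Callable[[List[LogRecord], List[str]], List[str]],
    tune: Callable[[List[LogRecord], List[str]], Dict[str, Any]],
    detect: Callable[[List[LogRecord], List[str], Dict[str, Any]], List[bool]],
) -> List[bool]:
    """Chain the steps: field selection, hyperparameter tuning, detection."""
    # Step 2: pick the target attribute fields from the candidates.
    target_fields = select_fields(config.sample_set, config.candidate_fields)
    # Step 3: tune the hyperparameters on the sample set with those fields.
    hyperparams = tune(config.sample_set, target_fields)
    # Step 4: run the tuned model on the test set.
    return detect(config.test_set, target_fields, hyperparams)
```

Passing the steps in as callables is just one way to keep the sketch self-contained; any concrete selection, tuning or detection routine with matching inputs could be substituted.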

This solution does not require relevant personnel in the cloud service field to manually create security rules, nor does it require manual in-depth analysis and summary of various attack modes. It thus avoids the loopholes of manually created security rules, reduces missed detections and false detections, and improves the efficiency of anomaly detection. In addition, even if the relevant personnel have little or even no professional knowledge of machine learning and deep learning, such as knowledge of model design and tuning, this solution enables them to build an anomaly detection model quickly and in a self-service manner, so as to achieve anomaly detection oriented to a specific task.

Optionally, the candidate attribute fields include m selected fields and n candidate fields, where m is an integer not less than 0 and n is an integer greater than 0. Determining the target attribute fields from the candidate attribute fields based on the sample set and the candidate attribute fields includes: based on the sample set, the m selected fields and the n candidate fields, determining the field scores corresponding to the n candidate fields, where a field score represents the degree to which the anomaly detection effect is improved after the corresponding candidate field is added to the m selected fields; based on the field scores corresponding to the n candidate fields, determining p candidate fields from the n candidate fields, where p is a positive integer not greater than n; and determining the m selected fields and the p candidate fields as the target attribute fields. That is, the sample set is used to filter the fields according to the degree to which each candidate field improves the anomaly detection effect, thereby filtering out the more valuable attribute fields.
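A minimal sketch of the last two steps is given below. It assumes that a higher field score means a greater improvement and that the p highest-scoring candidate fields are kept; the text above does not fix how the p fields are chosen, so this ranking rule and the function name `choose_target_fields` are illustrative assumptions.

```python
from typing import Dict, List


def choose_target_fields(
    selected: List[str], scores: Dict[str, float], p: int
) -> List[str]:
    """Return the m selected fields plus the p best-scoring candidate fields.

    Assumption: a larger score means the field improves detection more.
    """
    if not 1 <= p <= len(scores):
        raise ValueError("p must be a positive integer not greater than n")
    # Rank candidate fields from highest to lowest score (ties keep input order).
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return list(selected) + ranked[:p]
```

For example, with one selected field `user` and scores `{"ip": 0.2, "api": 0.9, "agent": 0.5}`, choosing p = 2 keeps `api` and `agent` in addition to `user`.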

Optionally, the sample set includes a training subset and a verification subset. Determining the field scores corresponding to the n candidate fields based on the sample set, the m selected fields and the n candidate fields includes: forming the m selected fields into a selected field set and the n candidate fields into a candidate field set; based on the training subset, the selected field set and the candidate field set, determining the mutual information corresponding to each candidate field in the candidate field set, where the mutual information represents the correlation between the corresponding candidate field and all fields in the selected field set; selecting the k candidate fields with the smallest mutual information from the candidate field set, where k is a positive integer not greater than n; based on the training subset, the verification subset, the selected field set and the k candidate fields, determining the reconstruction losses corresponding to the k candidate fields, where a reconstruction loss represents the effect of performing anomaly detection on the verification subset using the corresponding candidate field together with the selected field set; based on the mutual information and the reconstruction losses corresponding to the k candidate fields, selecting one candidate field from the k candidate fields and determining the field score of the selected candidate field; and moving the selected candidate field from the candidate field set to the selected field set and returning to the step of determining the mutual information corresponding to each candidate field in the candidate field set based on the training subset, the selected field set and the candidate field set, until the candidate field set is empty, at which point the field scores corresponding to the n candidate fields are obtained. That is, the fields are filtered using the mutual information between each candidate field and the selected fields together with the reconstruction loss corresponding to each candidate field, thereby filtering out the more valuable attribute fields.
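The iterative procedure above can be sketched as a greedy loop. In this sketch, `mutual_info` and `recon_loss` are hypothetical callables supplied by the caller, and the rule for combining the two quantities (pick the field with the smallest sum, and use the negated sum as its score) is purely an assumption made for illustration; the text above does not specify the combination.

```python
from typing import Callable, Dict, List, Sequence

# Both callables take (candidate field, current selected field set) -> float.
Metric = Callable[[str, Sequence[str]], float]


def score_fields(
    selected: Sequence[str],
    candidates: Sequence[str],
    k: int,
    mutual_info: Metric,
    recon_loss: Metric,
) -> Dict[str, float]:
    """Greedily score every candidate field until the candidate set is empty."""
    selected_set = list(selected)
    remaining = list(candidates)
    scores: Dict[str, float] = {}
    while remaining:
        # Mutual information of each remaining field with the selected set.
        mi = {f: mutual_info(f, selected_set) for f in remaining}
        # Keep the k least redundant fields (fewer than k near the end).
        shortlist = sorted(remaining, key=lambda f: mi[f])[:k]
        # Reconstruction loss is only evaluated for the shortlist.
        loss = {f: recon_loss(f, selected_set) for f in shortlist}
        # Assumed rule: the smallest (mutual information + loss) wins.
        best = min(shortlist, key=lambda f: mi[f] + loss[f])
        scores[best] = -(mi[best] + loss[best])
        remaining.remove(best)
        selected_set.append(best)
    return scores
```

Restricting the reconstruction-loss evaluation to the k shortlisted fields matters in practice because, per the following optional implementation, each loss requires training a separate detection model.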

Optionally, determining the reconstruction losses corresponding to the k candidate fields based on the training subset, the verification subset, the selected field set and the k candidate fields includes: for a first candidate field among the k candidate fields, adding the first candidate field to the selected field set to obtain a candidate field set, where the first candidate field is any one of the k candidate fields; based on the training subset and the candidate field set, determining a second detection model corresponding to the first candidate field; and based on the verification subset and the candidate field set, determining the reconstruction loss corresponding to the first candidate field through the second detection model corresponding to the first candidate field.

Optionally, determining the second detection model corresponding to the first candidate field based on the training subset and the candidate field set includes: determining the reference statistical features of the training subset based on the candidate field set, where the reference statistical features of the training subset include statistics of the data, in the training subset, of all fields included in the candidate field set; and training an initial detection model with the reference statistical features of the training subset to obtain the second detection model corresponding to the first candidate field.

Optionally, determining the reconstruction loss corresponding to the first candidate field through the second detection model corresponding to the first candidate field based on the verification subset and the candidate field set includes: determining the reference statistical features of the verification subset based on the candidate field set, where the reference statistical features of the verification subset include statistics of the data, in the verification subset, of all fields included in the candidate field set; inputting the reference statistical features of the verification subset into the second detection model corresponding to the first candidate field to obtain the reference reconstruction features of the verification subset, where the reference reconstruction features of the verification subset include reconstruction statistics of the data, in the verification subset, of all fields included in the candidate field set; and based on the reference statistical features and the reference reconstruction features of the verification subset, determining the reconstruction loss corresponding to the first candidate field.
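The final step compares each sample's statistical features with the features reconstructed by the model. The sketch below assumes the loss is the mean squared error averaged over samples; the text above does not name a specific loss function, so this choice and the function name `reconstruction_loss` are assumptions.

```python
from typing import List, Sequence


def reconstruction_loss(
    stats: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
) -> float:
    """Average, over samples, of the mean squared error per feature vector."""
    if not stats or len(stats) != len(reconstructed):
        raise ValueError("need the same, non-zero number of samples")
    per_sample: List[float] = []
    for x, x_hat in zip(stats, reconstructed):
        if not x or len(x) != len(x_hat):
            raise ValueError("feature vectors must be non-empty and equal length")
        # Squared difference per statistic, averaged over the vector.
        per_sample.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return sum(per_sample) / len(per_sample)
```

A lower value means the model trained on the candidate field set reconstructs the verification data more faithfully.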

Optionally, the configuration parameter also indicates the category of each attribute field in the candidate attribute fields. Different categories of attribute fields correspond to different types of statistics. This can improve the anomaly detection effect by calculating more valuable statistics.
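To make the idea of category-dependent statistics concrete, the following sketch maps two hypothetical categories to different statistics. The category names `numeric` and `categorical` and the particular statistics computed for each are assumptions chosen for illustration; this passage does not enumerate the categories or the statistics.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence


def field_statistics(values: Sequence[object], category: str) -> List[float]:
    """Compute a small statistics vector whose contents depend on the category."""
    if not values:
        raise ValueError("values must not be empty")
    if category == "numeric":
        nums = [float(v) for v in values]
        # Assumed statistics for numeric fields: mean, maximum, minimum.
        return [mean(nums), max(nums), min(nums)]
    if category == "categorical":
        counts = Counter(values)
        top_count = counts.most_common(1)[0][1]
        # Assumed statistics: number of distinct values, share of the most common.
        return [float(len(counts)), top_count / len(values)]
    raise ValueError(f"unknown category: {category}")
```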

Optionally, the first hyperparameter includes the learning rate, the number of training epochs, and the hidden layer dimensions. That is, the first hyperparameters that need to be tuned are three parameters that have a relatively large impact on model performance. This improves the execution efficiency of the anomaly detection task while ensuring the performance of the parameter-tuned first detection model.

Optionally, the first detection model includes an input layer, a first hidden layer and a second hidden layer; the hidden layer dimensions include the dimensions of the first hidden layer and of the second hidden layer; the dimension of the input layer is determined based on the number of fields included in the target attribute fields; and the dimensions of the first hidden layer and the second hidden layer are determined based on the dimension of the input layer. That is, the hidden layer dimensions are not set arbitrarily, so the search space of the hidden layer dimensions is small.
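One way to picture such a small search space is a grid in which the hidden layer dimensions are derived from the input dimension rather than searched freely. Everything numeric in the sketch below is an assumption made for illustration: the number of statistics per field, the learning rates, the epoch counts, and the rule that the first hidden layer is the input dimension divided by a shrink factor with the second hidden layer half of the first.

```python
from itertools import product
from typing import Any, Dict, List, Sequence, Tuple


def search_space(
    num_target_fields: int,
    stats_per_field: int = 2,                       # assumed statistics per field
    learning_rates: Sequence[float] = (1e-3, 1e-2),  # assumed values
    epochs: Sequence[int] = (50, 100),               # assumed values
    shrink_factors: Sequence[int] = (2, 4),          # assumed values
) -> Tuple[int, List[Dict[str, Any]]]:
    """Return the input dimension and the grid of hyperparameter combinations."""
    # Input dimension follows from the number of target attribute fields.
    input_dim = num_target_fields * stats_per_field
    grid: List[Dict[str, Any]] = []
    for lr, ep, shrink in product(learning_rates, epochs, shrink_factors):
        # Hidden sizes are tied to the input size, which keeps the grid small.
        h1 = max(1, input_dim // shrink)
        h2 = max(1, h1 // 2)
        grid.append({"learning_rate": lr, "epochs": ep, "hidden_dims": (h1, h2)})
    return input_dim, grid
```

With these defaults the grid has only eight combinations regardless of how many target attribute fields there are, which is the practical benefit of tying the hidden layer dimensions to the input dimension.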

Optionally, the first detection model includes an encoder, a decoder and a discriminator, and the parameters of the discriminator include an error threshold. Performing anomaly detection on the test set through the parameter-tuned first detection model based on the target attribute fields to obtain the anomaly detection result of the test set includes: determining the statistical features of the test set based on the target attribute fields, where the statistical features of the test set include statistics of the data of the target attribute fields in the test set; inputting the statistical features of the test set into the encoder to obtain the encoding features of the test set; inputting the encoding features of the test set into the decoder to obtain the reconstructed features of the test set; and inputting the statistical features and the reconstructed features of the test set into the discriminator to determine the anomaly detection result of the test set according to the error threshold.

The error threshold is determined according to the mean of multiple reconstruction losses, which include the error between the statistical features and the reconstructed features of each sample to be tested in the test set, or the error between the statistical features and the reconstructed features of each training sample in the training subset. That is, the error threshold is determined according to the average error of the sample population, which can improve the accuracy of anomaly detection.
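The discriminator step can be sketched as follows, leaving the encoder and decoder out. The sketch assumes the threshold is the mean reconstruction loss multiplied by a factor and that a sample is flagged when its mean squared error exceeds the threshold; the factor, the error measure and the function names are assumptions, since the text only states that the threshold is determined according to the mean.

```python
from typing import List, Sequence


def error_threshold(losses: Sequence[float], factor: float = 2.0) -> float:
    """Derive the threshold from the mean loss; `factor` is an assumed knob."""
    if not losses:
        raise ValueError("losses must not be empty")
    return factor * sum(losses) / len(losses)


def discriminate(
    stats: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
    threshold: float,
) -> List[bool]:
    """Flag each sample whose reconstruction error exceeds the threshold."""
    flags: List[bool] = []
    for x, x_hat in zip(stats, reconstructed):
        # Per-sample mean squared error between original and reconstruction.
        err = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        flags.append(err > threshold)
    return flags
```

Samples that the model reconstructs poorly relative to the population average are the ones reported as anomalous.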

In a second aspect, an anomaly detection device is provided. The anomaly detection device has the function of implementing the behavior of the anomaly detection method in the first aspect. The anomaly detection device includes one or more modules, which are used to implement the anomaly detection method provided by the first aspect.

That is, an anomaly detection device is provided, which device includes:

The receiving module is used to receive the configuration parameters of the anomaly detection task. The configuration parameters indicate the sample set, the test set and the candidate attribute fields. The sample set includes log data used for parameter tuning in the cloud platform, the test set includes log data of the cloud platform on which anomaly detection is to be performed, and the candidate attribute fields are attribute fields corresponding to the log data of the cloud platform;

A determination module, configured to determine a target attribute field from the candidate attribute field based on the sample set and the candidate attribute field, where the target attribute field is an attribute field used to perform the anomaly detection task;

A parameter tuning module, configured to tune the first hyperparameter of the first detection model based on the sample set and the target attribute field;

An anomaly detection module, configured to perform anomaly detection on the test set based on the target attribute field through a parameter-tuned first detection model to obtain an anomaly detection result of the test set.

Optionally, the candidate attribute fields include m selected fields and n candidate fields, m is an integer not less than 0, and n is an integer greater than 0;

The determination module includes:

The first determination sub-module is used to determine the field scores corresponding to the n candidate fields based on the sample set, the m selected fields and the n candidate fields, where a field score represents the extent to which the anomaly detection effect is improved after the corresponding candidate field is added to the m selected fields;

The second determination sub-module is used to determine p candidate fields from the n candidate fields based on the field scores corresponding to the n candidate fields, where the p is a positive integer not greater than n;

The third determination sub-module is used to determine the m selected fields and the p candidate fields as the target attribute fields.

Optionally, the sample set includes a training subset and a validation subset;

The first determination sub-module is specifically used for:

The m selected fields are formed into a selected field set, and the n candidate fields are formed into a candidate field set. Based on the training subset, the selected field set and the candidate field set, the mutual information corresponding to each candidate field in the candidate field set is determined, where the mutual information represents the correlation between the corresponding candidate field and all fields in the selected field set;

Select k candidate fields with the smallest mutual information from the set of candidate fields, where k is a positive integer not greater than n;

Based on the training subset, the verification subset, the selected field set and the k candidate fields, determine the reconstruction losses corresponding to the k candidate fields, where a reconstruction loss represents the effect of performing anomaly detection on the verification subset using the corresponding candidate field and the selected field set;

Selecting a candidate field from the k candidate fields based on the mutual information and the reconstruction loss respectively corresponding to the k candidate fields, and determining a field score of the selected candidate field;

Move the selected candidate field from the candidate field set to the selected field set, and return to the step of determining the mutual information corresponding to each candidate field in the candidate field set based on the training subset, the selected field set and the candidate field set, until the candidate field set is empty, at which point the field scores corresponding to the n candidate fields are obtained.

Optionally, the first determination sub-module is specifically used to:

For a first candidate field among the k candidate fields, add the first candidate field to the selected field set to obtain a candidate field set, where the first candidate field is any one of the k candidate fields;

Based on the training subset and the candidate field set, determine a second detection model corresponding to the first candidate field;

Based on the verification subset and the candidate field set, the reconstruction loss corresponding to the first candidate field is determined through the second detection model corresponding to the first candidate field.

Optionally, the first determination sub-module is specifically used to:

Determine reference statistical characteristics of the training subset based on the candidate field set, where the reference statistical characteristics of the training subset include statistics of data of all fields included in the candidate field set in the training subset;

An initial detection model is trained using the reference statistical features of the training subset to obtain a second detection model corresponding to the first candidate field.

Optionally, the first determination sub-module is specifically used to:

Determine reference statistical characteristics of the verification subset based on the candidate field set, where the reference statistical characteristics of the verification subset include statistics of data of all fields included in the candidate field set in the verification subset;

Input the reference statistical features of the verification subset into the second detection model corresponding to the first candidate field to obtain the reference reconstruction features of the verification subset, where the reference reconstruction features of the verification subset include reconstruction statistics of the data, in the verification subset, of all fields included in the candidate field set;

Based on the reference statistical features and reference reconstruction features of the verification subset, the reconstruction loss corresponding to the first candidate field is determined.

Optionally, the configuration parameter also indicates the category of each attribute field in the candidate attribute field, and attribute fields of different categories have different types of statistics corresponding to them.

Optionally, the first hyperparameters include a learning rate, a number of training rounds, and a hidden layer dimension.

Optionally, the first detection model includes an input layer, a first hidden layer and a second hidden layer; the hidden layer dimensions include the dimensions of the first hidden layer and the second hidden layer; the dimension of the input layer is determined based on the number of fields included in the target attribute fields; and the dimensions of the first hidden layer and the second hidden layer are determined based on the dimension of the input layer.

Optionally, the first detection model includes an encoder, a decoder and a discriminator, and the parameters of the discriminator include an error threshold;

The anomaly detection module includes:

The fourth determination sub-module is used to determine the statistical characteristics of the test set based on the target attribute field, where the statistical characteristics of the test set include statistics of the data of the target attribute field in the test set;

The first input submodule is used to input the statistical characteristics of the test set into the encoder to obtain the coding characteristics of the test set;

The second input submodule is used to input the coding features of the test set into the decoder to obtain the reconstructed features of the test set;

The third input submodule is used to input the statistical features and reconstructed features of the test set into the discriminator to determine the anomaly detection result of the test set according to the error threshold.

In a third aspect, a computing device cluster is provided. The computing device cluster includes at least one computing device, and each computing device includes a processor and a memory. The memory of the at least one computing device is used to store the program (that is, the instructions) required for executing the anomaly detection method provided in the first aspect, and to store the data involved in implementing the anomaly detection method provided in the first aspect. The processor is configured to execute the program stored in the memory. The computing device may also include a communication bus for establishing a connection between the processor and the memory.

A fourth aspect provides a computer-readable storage medium in which a computer program is stored, which when run on a computer causes the computer to execute the anomaly detection method described in the first aspect.

A fifth aspect provides a computer program product containing instructions that, when run on a computer, causes the computer to execute the anomaly detection method described in the first aspect.

The technical effects obtained by the above-mentioned second aspect, third aspect, fourth aspect and fifth aspect are similar to those obtained by the corresponding technical means in the first aspect, and will not be described again here.

Description of the drawings

Figure 1 is a schematic structural diagram of an anomaly detection device provided by an embodiment of the present application;

Figure 2 is a schematic structural diagram of a computing device provided by an embodiment of the present application;

Figure 3 is a schematic structural diagram of a computing device cluster provided by an embodiment of the present application;

Figure 4 is a schematic structural diagram of another computing device cluster provided by an embodiment of the present application;

Figure 5 is a system architecture diagram involved in an anomaly detection method provided by an embodiment of the present application;

Figure 6 is a flow chart of an anomaly detection method provided by an embodiment of the present application;

Figure 7 is a flow chart of a method for determining field scores provided by an embodiment of the present application;

Figure 8 is a flow chart of another anomaly detection method provided by an embodiment of the present application;

Figure 9 is a flow chart of yet another anomaly detection method provided by an embodiment of the present application.

Detailed description

In order to make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Currently, there are many types of anomaly detection tasks in cloud service scenarios. The heterogeneous resources involved in these anomaly detection tasks are not only numerous but also often have highly complex relationships with one another. Access to and operations on these resources in different types of cloud services usually generate massive log data on the cloud platform. Since the scale of the log data is extremely large and it is extremely difficult to manually summarize and analyze the rules for determining the relevant anomalies, it is difficult to achieve accurate and efficient anomaly detection by manually monitoring and analyzing the relevant data.

Existing machine-learning-based anomaly detection methods have the following two problems when performing the above tasks. On the one hand, most existing methods are specifically designed and tuned for particular types of anomaly detection tasks, so a universal anomaly detection model cannot be built. On the other hand, designing and tuning an anomaly detection model relies heavily on professional knowledge of machine learning and deep learning. Experts in the cloud service field often have rich business experience in these tasks but are not familiar with how to design and tune anomaly detection models, so even if they understand some machine learning algorithms, they find it difficult to build an effective anomaly detection model for a specific problem.

Based on this, it is important to design a prototype system that domain experts can easily use in a self-service manner, that is general across the related anomaly detection tasks in cloud platforms, that is lightweight, and that can intelligently build anomaly detection models. With such a prototype system, domain experts can quickly build an anomaly detection model with as little machine learning and deep learning knowledge as possible, so as to achieve anomaly detection oriented to the target task.

An embodiment of the present application provides an anomaly detection device, as shown in Figure 1. The anomaly detection device includes:

The receiving module is used to receive the configuration parameters of the anomaly detection task. The configuration parameters indicate the sample set, the test set and the candidate attribute fields. The sample set includes log data of the cloud platform used for parameter tuning, the test set includes log data of the cloud platform on which anomaly detection is to be performed, and the candidate attribute fields are the attribute fields corresponding to the log data of the cloud platform;

The determination module is used to determine the target attribute field from the candidate attribute field based on the sample set and the candidate attribute field, and the target attribute field is the attribute field used for anomaly detection tasks;

A parameter tuning module, used to tune the first hyperparameter of the first detection model based on the sample set and the target attribute field;

The anomaly detection module is used to perform anomaly detection on the test set through the parameter-tuned first detection model based on the target attribute field to obtain the anomaly detection result of the test set. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.

Optionally, the candidate attribute fields include m selected fields and n candidate fields, where m is an integer not less than 0 and n is an integer greater than 0;

The determination module includes:

The first determination sub-module is used to determine the field scores corresponding to the n candidate fields based on the sample set, the m selected fields and the n candidate fields, where a field score represents the extent to which the anomaly detection effect is improved after the corresponding candidate field is added to the m selected fields;

The second determination sub-module is used to determine p candidate fields from the n candidate fields based on the field scores corresponding to the n candidate fields, where p is a positive integer not greater than n;

The third determination sub-module is used to determine the m selected fields and p candidate fields as target attribute fields. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.

Optionally, the sample set includes a training subset and a validation subset;

The first determination sub-module is specifically used for:

The m selected fields are formed into a selected field set, and the n candidate fields are formed into a candidate field set. Based on the training subset, the selected field set and the candidate field set, the mutual information corresponding to each candidate field in the candidate field set is determined, where the mutual information represents the correlation between the corresponding candidate field and all fields in the selected field set;

Select k candidate fields with the smallest mutual information from the candidate field set, where k is a positive integer not greater than n;

Based on the training subset, the verification subset, the selected field set and the k candidate fields, determine the reconstruction losses corresponding to the k candidate fields, where a reconstruction loss represents the effect of performing anomaly detection on the verification subset using the corresponding candidate field and the selected field set;

Based on the mutual information and reconstruction loss corresponding to the k candidate fields, select one candidate field from the k candidate fields, and determine the field score of the selected candidate field;

Move the selected candidate field from the candidate field set to the selected field set, and return to the step of determining the mutual information corresponding to each candidate field in the candidate field set based on the training subset, the selected field set and the candidate field set, until the candidate field set is empty, at which point the field scores corresponding to the n candidate fields are obtained. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.

Optionally, the first determination sub-module is specifically used to:

For a first candidate field among the k candidate fields, add the first candidate field to the selected field set to obtain a candidate field set, where the first candidate field is any one of the k candidate fields;

Based on the training subset and the candidate field set, determine the second detection model corresponding to the first candidate field;

Based on the verification subset and the candidate field set, the reconstruction loss corresponding to the first candidate field is determined through the second detection model corresponding to the first candidate field. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.

Optionally, the first determination sub-module is specifically used to:

Determine the reference statistical characteristics of the training subset based on the candidate field set, and the reference statistical characteristics of the training subset include statistics of data of all fields included in the candidate field set in the training subset;

The initial detection model is trained by using the reference statistical features of the training subset to obtain a second detection model corresponding to the first candidate field. For a specific implementation method, please refer to the relevant introduction of the embodiment of FIG. 6 .

Optionally, the first determination sub-module is specifically used to:

Determine the reference statistical characteristics of the verification subset based on the candidate field set, and the reference statistical characteristics of the verification subset include statistics of data of all fields included in the candidate field set in the verification subset;

Input the reference statistical characteristics of the verification subset into the second detection model corresponding to the first candidate field to obtain the reference reconstruction characteristics of the verification subset, where the reference reconstruction characteristics of the verification subset include the reconstructed statistics of the data of all fields included in the candidate field set in the verification subset;

Based on the reference statistical features and reference reconstruction features of the verification subset, the reconstruction loss corresponding to the first candidate field is determined. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.
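As an illustration of the last step, the reconstruction loss can be computed by comparing each statistic with its reconstructed counterpart. The sketch below uses the mean squared error; this choice of metric and the function name `reconstruction_loss` are assumptions, since the text does not fix a specific loss formula here.

```python
from typing import Sequence


def reconstruction_loss(
    stat_features: Sequence[Sequence[float]],
    recon_features: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between statistical and reconstructed features.

    Each row is one verification sample; each column is one statistic.
    """
    if len(stat_features) != len(recon_features):
        raise ValueError("feature matrices must have the same number of rows")
    total, count = 0.0, 0
    for stat_row, recon_row in zip(stat_features, recon_features):
        if len(stat_row) != len(recon_row):
            raise ValueError("rows must have the same number of columns")
        for s, r in zip(stat_row, recon_row):
            total += (s - r) ** 2
            count += 1
    if count == 0:
        raise ValueError("no features to compare")
    return total / count


loss = reconstruction_loss([[1.0, 2.0]], [[1.0, 4.0]])
```

A smaller value indicates that the second detection model reconstructs the verification subset more faithfully with the given field set.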

Optionally, the configuration parameter also indicates the category of each attribute field in the candidate attribute fields. Different categories of attribute fields correspond to different types of statistics. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.

Optionally, the first hyperparameter includes learning rate, number of training epochs, and hidden layer dimensions. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.

Optionally, the first detection model includes an input layer, a first hidden layer and a second hidden layer; the hidden layer dimensions include the dimensions of the first hidden layer and the second hidden layer; the dimension of the input layer is determined based on the number of fields included in the target attribute fields, and the dimensions of the first hidden layer and the second hidden layer are determined based on the dimension of the input layer. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.
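The relationship between the layer dimensions can be sketched as follows. This is a minimal Python illustration under stated assumptions: the names `FirstHyperparameter` and `derive_dims`, the parameter `stats_per_field`, and the rule of halving each successive layer are hypothetical and are not specified in this passage, which only says that the hidden layer dimensions are derived from the input layer dimension.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FirstHyperparameter:
    """Hyperparameters named in the text: learning rate, epochs, hidden dims."""
    learning_rate: float
    epochs: int
    hidden_dims: Tuple[int, int]


def derive_dims(
    num_target_fields: int, stats_per_field: int = 1, shrink: int = 2
) -> Tuple[int, Tuple[int, int]]:
    """Return (input_dim, (first_hidden_dim, second_hidden_dim)).

    Assumption: every target field contributes `stats_per_field` statistics,
    and each hidden layer is `shrink` times narrower than the previous layer.
    """
    input_dim = num_target_fields * stats_per_field
    first = max(1, input_dim // shrink)
    second = max(1, first // shrink)
    return input_dim, (first, second)


input_dim, hidden = derive_dims(num_target_fields=10, stats_per_field=4)
params = FirstHyperparameter(learning_rate=1e-3, epochs=50, hidden_dims=hidden)
```

The learning rate and epoch values shown are placeholders; in the described solution they are obtained through parameter tuning rather than set by hand.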

Anomaly detection modules include:

A fourth determination submodule is used to determine the statistical characteristics of the test set based on the target attribute field, where the statistical characteristics of the test set include the statistics of the data of the target attribute field in the test set;

A second input submodule, used for inputting the encoded features of the test set into the decoder to obtain the reconstructed features of the test set;

The third input submodule is used to input the statistical features and reconstructed features of the test set into the discriminator to determine the anomaly detection result of the test set according to the error threshold. For specific implementation methods, please refer to the relevant introduction of the embodiment in Figure 6.
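The data flow through the anomaly detection module can be sketched as below. This is an illustrative Python outline rather than the model of this application: `encoder` and `decoder` are passed in as plain callables, the encoder step is assumed from the term "encoded features", and the discriminator is assumed to compare a mean squared reconstruction error with the error threshold.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def discriminator(features: Vector, reconstructed: Vector, threshold: float) -> bool:
    """Return True (anomalous) when the reconstruction error exceeds the threshold."""
    error = sum((a - b) ** 2 for a, b in zip(features, reconstructed)) / len(features)
    return error > threshold


def detect_anomalies(
    test_features: Sequence[Vector],
    encoder: Callable[[Vector], Vector],
    decoder: Callable[[Vector], Vector],
    error_threshold: float,
) -> List[bool]:
    """Encode, decode and judge each test sample's statistical features."""
    results = []
    for features in test_features:
        encoded = encoder(features)          # encoded features of the sample
        reconstructed = decoder(encoded)     # reconstructed features
        results.append(discriminator(features, reconstructed, error_threshold))
    return results


# Toy stand-ins for a trained model: compress two values to their mean.
toy_encoder = lambda x: [sum(x) / len(x)]
toy_decoder = lambda z: [z[0], z[0]]
flags = detect_anomalies([[1.0, 1.0], [0.0, 4.0]], toy_encoder, toy_decoder, 1.0)
```

In this toy run the first sample is reconstructed exactly and is judged normal, while the second has a reconstruction error of 4.0 and is flagged.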

The receiving module, determination module, parameter tuning module and anomaly detection module can each be implemented by software, by hardware, or by a combination of software and hardware. Illustratively, the following takes the parameter tuning module as an example to introduce its implementation. Similarly, for the implementation of the receiving module, determination module and anomaly detection module, reference can be made to the implementation of the parameter tuning module.

As an example of a software functional unit, a module can include code that runs on a computing instance. The computing instance may include at least one of a physical host (computing device), a virtual machine, and a container. Furthermore, there may be one or more such computing instances. For example, a parameter tuning module can include code running on multiple hosts/virtual machines/containers. It should be noted that the multiple hosts/virtual machines/containers used to run the code can be distributed in the same region or in different regions. Furthermore, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same availability zone (AZ) or in different AZs. Each AZ includes one data center or multiple geographically close data centers. Usually, a region can include multiple AZs.

Similarly, the multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs. Usually, a VPC is set up within one region. For communication between two VPCs in the same region, and for cross-region communication between VPCs in different regions, a communication gateway needs to be set up in each VPC so that the VPCs are interconnected through the communication gateways.

As an example of a hardware functional unit, a module may include at least one computing device, such as a server. Alternatively, the parameter tuning module may also be a device implemented using an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) or any combination thereof.

Multiple computing devices included in the parameter tuning module can be distributed in the same region or in different regions. Multiple computing devices included in the parameter tuning module can be distributed in the same AZ or in different AZs. Similarly, multiple computing devices included in the parameter tuning module can be distributed in the same VPC or in multiple VPCs. The plurality of computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.

It should be noted that in other embodiments, each of the receiving module, the determination module, the parameter tuning module and the anomaly detection module can be used to perform any step in the anomaly detection method. The steps that the receiving module, determination module, parameter tuning module and anomaly detection module are responsible for implementing can be specified as needed, and the modules respectively implement different steps of the anomaly detection method so as to realize all functions of the anomaly detection device.

In the embodiment of this application, there is no need for relevant personnel in the cloud service field to manually create security rules, and there is no need to manually conduct in-depth analysis and summary of various attack modes, which avoids the loopholes of manually created security rules, reduces missed detections and false detections, and also improves the efficiency of anomaly detection. In addition, even if the relevant personnel have little or even no professional knowledge of machine learning and deep learning, such as knowledge of model design and tuning, this solution enables self-service and rapid construction of an anomaly detection model to achieve task-specific anomaly detection.

It should be noted that when the abnormality detection device provided in the above embodiment performs abnormality detection, only the division of the above functional modules is used as an example. In practical applications, the above function allocation can be completed by different functional modules as needed. That is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the anomaly detection device provided by the above embodiments and the anomaly detection method embodiments belong to the same concept. Please refer to the method embodiments for the specific implementation process, which will not be described again here.

An embodiment of the present application also provides a computing device 100. As shown in FIG. 2, the computing device 100 includes a bus 102, a processor 104, a memory 106, and a communication interface 108. The processor 104, the memory 106 and the communication interface 108 communicate through the bus 102. The computing device 100 may be a server or a terminal device. It should be understood that this application does not limit the number of processors and memories in the computing device 100.

The bus 102 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc. The bus can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one line is used in Figure 2, but this does not mean that there is only one bus or one type of bus. The bus 102 may include a path that carries information between the components of the computing device 100 (e.g., the memory 106, the processor 104 and the communication interface 108).

The processor 104 may include any one or more of a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP) or a digital signal processor (DSP).

The memory 106 may include volatile memory, such as random access memory (RAM). The memory 106 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).

The memory 106 stores executable program code, and the processor 104 executes the executable program code to respectively implement the functions of the aforementioned receiving module, determining module, parameter tuning module and anomaly detection module, thereby implementing the anomaly detection method. That is, the memory 106 stores instructions for executing the anomaly detection method.

The communication interface 108 uses transceiver modules such as, but not limited to, network interface cards and transceivers to implement communication between the computing device 100 and other devices or communication networks.

An embodiment of the present application also provides a computing device cluster. The computing device cluster includes at least one computing device. The computing device may be a server, such as a central server, an edge server, or a local server in a local data center. In some embodiments, the computing device may also be a terminal device such as a desktop computer, a laptop computer, or a smartphone.

As shown in Fig. 3, the computing device cluster includes at least one computing device 100. The memory 106 in one or more computing devices 100 in the computing device cluster may store the same instructions for executing the anomaly detection method.

In some possible implementations, the memory 106 of one or more computing devices 100 in the computing device cluster may also store partial instructions for executing the anomaly detection method. In other words, a combination of one or more computing devices 100 may collectively execute instructions for performing the anomaly detection method.

It should be noted that the memories 106 in different computing devices 100 in the computing device cluster can store different instructions, respectively used to execute part of the functions of the anomaly detection device. That is, the instructions stored in the memory 106 in different computing devices 100 can implement the functions of one or more modules among the receiving module, the determining module, the parameter tuning module and the anomaly detection module.

In some possible implementations, one or more computing devices in a cluster of computing devices may be connected through a network. Wherein, the network may be a wide area network or a local area network, etc. Figure 4 shows a possible implementation. As shown in Figure 4, two computing devices 100A and 100B are connected through a network. Specifically, the connection to the network is made through a communication interface in each computing device. In this type of possible implementation, the memory 106 in the computing device 100A stores instructions for performing the functions of the receiving module and the determining module. At the same time, instructions for executing the functions of the parameter tuning module and the anomaly detection module are stored in the memory 106 of the computing device 100B.

The connection manner of the computing device cluster shown in Figure 4 may be based on the following consideration: the anomaly detection method provided by this application requires a large amount of data storage and computing resources, so the functions implemented by the parameter tuning module and the anomaly detection module are handed over to the computing device 100B for execution.

It should be understood that the functions of the computing device 100A shown in FIG. 4 may also be performed by multiple computing devices 100. Likewise, the functions of the computing device 100B may also be performed by multiple computing devices 100.

The embodiment of the present application also provides another computing device cluster. The connection relationship between the computing devices in the computing device cluster can be similar to the connection manner of the computing device cluster described in FIG. 3 and FIG. 4. The difference is that the same instructions for executing the anomaly detection method may be stored in the memory 106 of one or more computing devices 100 in the computing device cluster.

The embodiment of the present application also provides a computer program product including instructions. The computer program product may be software or a program product including instructions that can be run on a computing device or stored in any available medium. When the computer program product is run on at least one computing device, the at least one computing device executes the above-mentioned anomaly detection method.

An embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computing device can store or a data storage device such as a data center that contains one or more available media. The available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, DVD), or semiconductor media (eg, solid state drive), etc. The computer-readable storage medium includes instructions that instruct the computing device to perform the above-mentioned anomaly detection method.

Figure 5 is a system architecture diagram involved in an anomaly detection method provided by an embodiment of the present application. Referring to Figure 5, this system can be called an anomaly detection system, which includes a client and a detection device.

The detection device is used to execute the anomaly detection task according to the configuration parameters of the received anomaly detection task. That is, the detection device is used to perform the steps of the anomaly detection method provided by the embodiment of the present application.

The client is used to send the configuration parameters of the anomaly detection task to the detection device. For example, when a configuration operation is detected, the client determines the configuration parameters of the anomaly detection task and sends the configuration parameters to the detection device.

Optionally, the detection device is the computing device shown in Figure 2, or includes multiple computing devices shown in Figures 3/4.

The system architecture and business scenarios described in the embodiments of this application are intended to explain the technical solutions of the embodiments of this application more clearly, and do not constitute a limitation on the technical solutions provided by the embodiments of this application. Persons of ordinary skill in the art will know that, with the evolution of the system architecture and the emergence of new business scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.

Figure 6 is a flow chart of an anomaly detection method provided by an embodiment of the present application. Taking this method applied to detection equipment as an example, please refer to Figure 6. The method includes the following steps.

Step 601: Receive the configuration parameters of the anomaly detection task. The configuration parameters indicate the sample set, the test set and the candidate attribute fields. The sample set includes the log data used for parameter tuning in the cloud platform, the test set includes the log data in the cloud platform on which anomaly detection is to be performed, and the candidate attribute fields are the attribute fields corresponding to the log data of the cloud platform.

To enable abnormal operations and access in the cloud platform to be detected through an anomaly detection model even when the relevant personnel have little or even no knowledge of machine learning and deep learning, in the embodiment of this application, some parameters that do not require professional knowledge of model design and tuning are configured manually, and the configuration parameters of the anomaly detection task are submitted to the detection device. The detection device receives the configuration parameters, automatically performs feature engineering, model construction and parameter tuning based on the configuration parameters, and performs anomaly detection for the anomaly detection task through the parameter-tuned anomaly detection model.

Among them, the configuration parameter indicates the candidate attribute field, sample set and test set. The sample set includes log data used for parameter tuning in the cloud platform, and the test set includes log data to be used for anomaly detection in the cloud platform. It should be understood that the log data in the cloud platform includes data of multiple attribute fields, and the candidate attribute fields include part of the multiple attribute fields. Of course, they may also include all of the multiple attribute fields. That is, the candidate attribute fields are attribute fields corresponding to the log data of the cloud platform. The sample set is used for feature engineering, model building and parameter tuning. In the process of feature engineering, the detection equipment needs to filter out the target attribute fields from the candidate attribute fields based on the sample set. The test set is the data to be detected indicated by the anomaly detection task. The subsequent detection equipment will perform anomaly detection on the test set through a parameter-tuned anomaly detection model based on the target attribute field.

In the embodiment of this application, the sample set includes a training subset and a validation subset. This configuration parameter includes the start and end time of the training subset, the start and end time of the validation subset, and the start and end time of the test set. The above anomaly detection model obtains the training subset from the log data of the cloud platform based on the start and end time of the training subset, obtains the verification subset from the log data of the cloud platform based on the start and end time of the verification subset, and obtains the test set from the log data of the cloud platform based on the start and end time of the test set. The training subset and the verification subset are used together to build the anomaly detection model and tune its parameters.

As can be seen from the foregoing, there are many types of anomaly detection tasks in cloud service scenarios. Based on this, the configuration parameter also indicates the detection object. For example, the configuration parameter includes the identification of the detection object, and the detection object can be a target user, a target service, a target host, etc. When the detection object is a target user, the anomaly detection task is used to detect anomalies in the target user's operations and access to the cloud platform, and the sample set and test set include log data related to the target user in the cloud platform. When the detection object is a target service, the anomaly detection task is used to detect anomalies in the target service in the cloud platform, and the sample set and test set include log data related to the target service in the cloud platform. When the detection object is a target host, the anomaly detection task is used to detect anomalies in operations and access related to the target host in the cloud platform, and the sample set and test set include log data related to the target host in the cloud platform.

The training subset includes multiple training samples, the validation subset includes multiple validation samples, and the test set includes one or more samples to be tested. In order to determine which time period of log data each sample included in the training subset, validation subset, and test set specifically includes, this configuration parameter also indicates the time granularity of anomaly detection. Based on the time granularity of anomaly detection, the anomaly detection model determines multiple samples included in the training subset, validation subset, and test set respectively. The time granularity of each sample is equal to the time granularity of the anomaly detection. Among them, a sample included in the training subset is called a training sample, a sample included in the verification subset is called a verification sample, and a sample included in the test set is called a test sample.

For example, the detection object is user a, the time granularity of anomaly detection is 24 hours, and the start and end time of the training subset is from January 1 to June 30, 2022; that is, the training subset includes the log data related to user a in these 6 months. The detection device determines the log data of each 24-hour period within these 6 months as one training sample included in the training subset.

In order to achieve more fine-grained data statistics for each of the above samples, so as to improve the performance of anomaly detection through more accurate feature engineering and model tuning, this configuration parameter also indicates the time granularity of a single sample, and the time granularity of anomaly detection is an integer multiple of the time granularity of a single sample. Each of the above samples includes multiple single samples. Based on the time granularity of a single sample, the detection device determines the multiple single samples in each sample included in the training subset, verification subset, and test set respectively.

Still taking the above example, the detection object is user a, the time granularity of anomaly detection is 24 hours, and the time granularity of a single sample is 1 hour. The detection device determines the log data of each hour of the 24 hours included in each training sample as a single sample, thus obtaining the 24 single samples included in each training sample. To put it another way, the detection device determines the log data of each hour in the training subset as a single sample according to the time granularity of a single sample, and groups the 24 single samples of every 24 hours in the training subset into one training sample.
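The division in this example can be reproduced with a short Python sketch. It is illustrative only: the helper name `split_windows` is hypothetical, and treating June 30 as inclusive (so that the exclusive upper bound is July 1) is an assumption about how the end time is interpreted.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]


def split_windows(start: datetime, end: datetime, granularity: timedelta) -> List[Window]:
    """Split the half-open range [start, end) into consecutive full windows."""
    windows = []
    current = start
    while current + granularity <= end:
        windows.append((current, current + granularity))
        current += granularity
    return windows


# Training subset: January 1 to June 30, 2022 (end bound is exclusive).
train_start = datetime(2022, 1, 1)
train_end = datetime(2022, 7, 1)

# One training sample per 24 hours (time granularity of anomaly detection).
training_samples = split_windows(train_start, train_end, timedelta(hours=24))

# 24 single samples of 1 hour inside every training sample.
single_samples = [
    split_windows(sample_start, sample_end, timedelta(hours=1))
    for sample_start, sample_end in training_samples
]
```

Under these assumptions the training subset yields 181 training samples, each containing 24 single samples.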

Optionally, the candidate attribute fields indicated by this configuration parameter include selected fields and candidate fields. The selected fields are the fields that are determined to be used in the subsequent anomaly detection on the test set, and the candidate fields are the fields for which, before feature engineering, it is not certain whether they will be used in the subsequent anomaly detection on the test set. In the process of feature engineering, the detection device determines a part of the candidate fields from all the candidate fields included in the candidate attribute fields, and determines this part of the candidate fields together with all the selected fields as the target attribute fields. The target attribute fields include all fields required in the subsequent anomaly detection on the test set; that is, the target attribute fields are the attribute fields used for the anomaly detection task.

In the subsequent process of feature engineering, the anomaly detection model obtains the statistics of the candidate attribute fields in the log data and performs feature engineering based on these statistics. The candidate attribute fields include multiple attribute fields. Because different attribute fields have different characteristics, the specific statistical methods for different attribute fields differ, so that more valuable statistics can be obtained. Based on this, the configuration parameter also indicates the category of each attribute field in the candidate attribute fields, and different categories of attribute fields correspond to different types of statistics. The specific statistics corresponding to the different categories of attribute fields will be introduced in detail in step 602 below.

In this embodiment of the present application, the category of each attribute field is a first-type attribute, a second-type attribute, a third-type attribute, or a fourth-type attribute. As shown in Table 1, the first type of attributes is situational attributes. For example, fields that represent a specific scenario, such as service name or service type, belong to the first category of attributes. The values of the attribute fields belonging to the second type of attributes are discrete values, and the types of values do not exceed the type threshold. That is, the values of the attribute fields belonging to the second type of attributes are discrete values and have fewer types. Generally speaking, all possible values of an attribute field with limited values are ordered. For example, the status field belongs to the second type of attribute, and the possible values of the status field include ‘0, 1, 2, 3’. The values of the attribute fields belonging to the third type of attributes are discrete values, and the types of values exceed the type threshold, that is, the values of the attribute fields belonging to the third type of attributes are discrete values and have many types. For example, the remote operation address field belongs to the third category of attributes, and the value of the remote operation address field can be '192.168.2.1', '192.168.1.3', etc. The values of attribute fields belonging to the fourth category of attributes are continuous values. For example, the total length field of a hypertext transfer protocol (HTTP) request belongs to the fourth category of attributes.
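The four categories can be summarized with a small Python sketch. It only illustrates the definitions above: in the described solution the category of each field is supplied through the configuration parameter, and the function name `categorize_field`, the flags `is_situational` and `is_continuous`, and the default type threshold of 10 are hypothetical.

```python
from typing import Hashable, Iterable


def categorize_field(
    values: Iterable[Hashable],
    is_situational: bool = False,
    is_continuous: bool = False,
    type_threshold: int = 10,
) -> str:
    """Map an attribute field to one of the four categories described above."""
    if is_situational:
        return "first"    # situational attribute, e.g. service name
    if is_continuous:
        return "fourth"   # continuous values, e.g. total HTTP request length
    distinct = len(set(values))
    # Discrete values: few distinct values -> second type, many -> third type.
    return "second" if distinct <= type_threshold else "third"


status_category = categorize_field([0, 1, 2, 3, 1, 0])
addr_category = categorize_field([f"192.168.1.{i}" for i in range(20)])
length_category = categorize_field([512.0, 1033.5], is_continuous=True)
service_category = categorize_field(["service_a"], is_situational=True)
```

Here the status field has four distinct values and falls into the second type, while the twenty generated addresses exceed the assumed threshold and fall into the third type.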

Table 1

As an example, a piece of log data includes but is not limited to data in the following attribute fields: time (timestamp), userId (user identification), remote_addr (remote operation address), service_name (service name, also called service type), api_id (the application identification of the operation; one service can provide one or more applications), body_bytes_sent (the HTTP content length of the operation), forward_flag (the flag of whether the backend forwards), status (status, indicating whether the operation succeeded), request_method (HTTP request method), request_length (total length of the HTTP request), diff_time (response time), accessModel (access mode), deploy_type (indicates the database type). Among them, time is used to determine the log data included in each sample, userId is used to filter users, time and userId are not used as statistical objects, service_name is the first type of attribute, and the categories of the other attribute fields are as follows.

In some embodiments, the selected fields in the candidate attribute fields include three attribute fields: remote_addr, api_id, and body_bytes_sent, and the candidate fields in the candidate attribute fields include seven attribute fields: forward_flag, status, accessModel, deploy_type, request_method, diff_time, and request_length.

In addition to determining the statistics that need to be calculated according to the category of the attribute field, the statistics that need to be calculated can also be determined based on whether the anomaly detection task is time-series related. Optionally, this configuration parameter also indicates whether the anomaly detection task is time-series related. In the case where the anomaly detection task is time-series related, the statistics corresponding to some attribute fields (such as attribute fields belonging to the second type of attributes and/or the third type of attributes) also include user access profiles and/or access profile similarity. This content will also be introduced in detail in step 602.

In a cloud service scenario, log data in the cloud platform may be stored on multiple devices and under multiple paths. For example, log data related to cloud service a is stored in device a, and log data related to cloud service b is stored in device b; the log data of enterprise 1 is stored in path 1, and the log data of enterprise 2 is stored in path 2. Based on this, the storage location of the log data related to the anomaly detection task can also be configured manually. Optionally, this configuration parameter also indicates the storage location of log data related to the anomaly detection task. The anomaly detection model obtains the log data in the sample set and test set from the cloud platform according to the storage location.

The configuration parameters introduced above can be configured by experts in the cloud field based on their business experience, even if these personnel have little or even no professional knowledge of machine learning and deep learning. Table 2 is an input information table for configuration parameters provided by the embodiment of the present application. The above configuration parameters can be configured according to Table 2.

Table 2

In order to ensure the successful execution of the anomaly detection task, after receiving the configuration parameters of the anomaly detection task, the detection device performs a logical check on the configuration parameters. If a logical anomaly of the configuration parameters is detected, the detection device feeds back prompt information to prompt reconfiguration. For example, if the time granularity of anomaly detection is smaller than the time granularity of a single sample, or the time granularity of anomaly detection is not an integer multiple of the time granularity of a single sample, or the time period corresponding to the start and end time of the test set overlaps with the time period corresponding to the start and end time of the training subset, the configuration parameters are logically abnormal. If the logic of the configuration parameters is detected to be normal, the detection device continues to perform step 602.
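The logical check on the configuration parameters can be sketched as follows. This Python outline is an assumption-laden illustration, not the check performed by the detection device: the `TaskConfig` structure, its field names and the message strings are hypothetical, and the integer-multiple rule is taken from the earlier requirement that the time granularity of anomaly detection be an integer multiple of the time granularity of a single sample.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical subset of the configuration parameters (half-open ranges)."""
    train_start: datetime
    train_end: datetime
    test_start: datetime
    test_end: datetime
    detection_granularity: timedelta
    single_sample_granularity: timedelta


def check_config(cfg: TaskConfig) -> List[str]:
    """Return human-readable problems; an empty list means the logic is normal."""
    problems = []
    zero = timedelta(0)
    if cfg.single_sample_granularity <= zero:
        problems.append("single-sample granularity must be positive")
    elif cfg.detection_granularity < cfg.single_sample_granularity:
        problems.append("detection granularity is smaller than single-sample granularity")
    elif cfg.detection_granularity % cfg.single_sample_granularity != zero:
        problems.append("detection granularity is not an integer multiple of single-sample granularity")
    # Two half-open ranges overlap when each starts before the other ends.
    if cfg.test_start < cfg.train_end and cfg.train_start < cfg.test_end:
        problems.append("test set period overlaps the training subset period")
    return problems


good = TaskConfig(
    train_start=datetime(2022, 1, 1), train_end=datetime(2022, 7, 1),
    test_start=datetime(2022, 8, 1), test_end=datetime(2022, 9, 1),
    detection_granularity=timedelta(hours=24),
    single_sample_granularity=timedelta(hours=1),
)
bad = replace(good, test_start=datetime(2022, 6, 1),
              detection_granularity=timedelta(minutes=90))
good_problems = check_config(good)
bad_problems = check_config(bad)
```

Returning a list of problems, rather than raising on the first one, lets the prompt information report every logical anomaly in a single round of reconfiguration.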

Step 602: Based on the sample set and the candidate attribute fields, determine the target attribute field from the candidate attribute fields. The target attribute field is the attribute field used for the anomaly detection task.

In this embodiment of the present application, the detection device obtains the sample set indicated by the configuration parameter from the log data of the cloud platform.

For example, the configuration parameters include the start and end time of the training subset and the start and end time of the verification subset, and the sample set includes the training subset and the verification subset. The detection device obtains all training samples included in the training subset according to the start and end time of the training subset. In the same way, the detection device obtains all verification samples included in the verification subset according to the start and end time of the verification subset.

For another example, the configuration parameters include the identification of the detection object, the start and end time of the training subset, the start and end time of the verification subset, the storage location of the log data, and the time granularity of a single sample, and the sample set includes the training subset and the verification subset. The detection device obtains all single samples included in the training subset based on the identification of the detection object, the start and end time of the training subset, the storage location of the log data, and the time granularity of a single sample. In the same way, the detection device obtains all single samples included in the verification subset according to the identification of the detection object, the start and end time of the verification subset, the storage location of the log data, and the time granularity of a single sample. For specific implementation methods, please refer to the relevant description in step 601.

After obtaining the sample set, the detection device determines the target attribute field from the candidate attribute fields based on the sample set and the candidate attribute fields. That is, the detection device uses the sample set to filter out the target attribute fields from the candidate attribute fields through feature engineering.

As can be seen from the foregoing, the candidate attribute fields include selected fields and candidate fields. In order to remove redundant fields from the candidate fields and ensure the value of the filtered target attribute fields to the anomaly detection task, the embodiment of this application filters the candidate fields by determining the field score of each candidate field. This will be described in detail next.

In this embodiment of the present application, the candidate attribute fields include m selected fields and n candidate fields, m is an integer not less than 0, and n is an integer greater than 0. The detection device determines the field scores corresponding to the n candidate fields based on the sample set, the m selected fields and the n candidate fields. The detection device determines p candidate fields from the n candidate fields based on the field scores corresponding to the n candidate fields, where p is a positive integer not greater than n. The detection device determines the m selected fields and p candidate fields as target attribute fields. Among them, the field score represents the degree to which the anomaly detection effect is improved after adding the corresponding candidate fields to the m selected fields. Put another way, the field score represents the effect of anomaly detection through the selected fields and the corresponding candidate fields.

In order to verify the degree of improvement in the anomaly detection effect after adding the corresponding candidate fields to the m selected fields, the embodiment of this application determines the field score by combining the mutual information between fields and the reconstruction loss corresponding to the fields. This will be introduced next in conjunction with Figure 7.

Figure 7 is a flow chart of a method for determining field scores provided by an embodiment of the present application. The method includes steps 6021 to 6026.

Step 6021: The m selected fields are formed into a selected field set, and the n candidate fields are formed into a candidate field set.

Step 6022: Based on the training subset, the selected field set and the candidate field set, determine the mutual information corresponding to each candidate field in the candidate field set.

Among them, mutual information represents the correlation between the corresponding candidate field and all fields in the selected field set.

In the embodiment of this application, the detection device determines the mutual information corresponding to each candidate field in the candidate field set based on the training subset, the selected field set, and the candidate field set as follows: determining the statistics of the data of all fields included in the selected field set and the candidate field set in the training subset; for the second candidate field in the candidate field set, determining a plurality of first mutual information based on the statistics of the data of the second candidate field and of all selected fields included in the selected field set in the training subset, where the plurality of first mutual information include the mutual information between the second candidate field and each selected field in the selected field set, and the second candidate field is any candidate field in the candidate field set; and determining the maximum value among the plurality of first mutual information as the mutual information corresponding to the second candidate field. Simply put, the mutual information between fields is calculated through the statistics corresponding to the fields.

In one embodiment, the statistics of the data of the first selected field in the training subset include R first statistics, and the statistics of the data of the second candidate field in the training subset include S second statistics. The first selected field is any selected field in the selected field set, and R and S are both integers greater than 0. The process in which the detection device determines the plurality of first mutual information based on the statistics of the data of the second candidate field and of all selected fields included in the selected field set in the training subset includes: determining S second mutual information through multiple rounds of iteration based on the R first statistics and the S second statistics; and determining the mean value of the S second mutual information as the mutual information, among the plurality of first mutual information, between the second candidate field and the first selected field.

Here, in the j-th round of iteration, the detection device determines the mutual information between the j-th second statistic among the S second statistics and each of the (R-j+1) first statistics remaining from the R first statistics, to obtain (R-j+1) reference mutual information corresponding to the (R-j+1) first statistics, where j is an integer greater than 0 and not greater than R. The detection device determines the maximum value among the (R-j+1) reference mutual information as the j-th second mutual information among the S second mutual information, and determines the (R-j) first statistics, among the (R-j+1) first statistics, other than the first statistic corresponding to the maximum value as the (R-j) first statistics used in the (j+1)-th round of iteration.

For example, status is a selected field, and the statistics corresponding to status in the training subset include 500 first statistics, that is, 500 dimensions. diff_time is a candidate field, and the statistics corresponding to diff_time in the training subset include 300 second statistics, that is, 300 dimensions. In the first round of iteration, the detection device calculates the mutual information between the 1st dimension of diff_time and each of the 500 dimensions of status to obtain 500 reference mutual information. Assuming that the maximum value among these 500 reference mutual information corresponds to the 321st dimension of status, the detection device determines the maximum value as the mutual information between the 1st dimension of diff_time and status, that is, obtains the first second mutual information, and then removes the 321st dimension of status. In the second round of iteration, the detection device calculates the mutual information between the 2nd dimension of diff_time and each of the remaining 499 dimensions of status to obtain 499 reference mutual information. Assuming that the maximum value among these 499 reference mutual information corresponds to the 432nd dimension of status, the detection device determines the maximum value as the mutual information between the 2nd dimension of diff_time and status, that is, obtains the second second mutual information, and then removes the 432nd dimension of status. By analogy, after calculating the mutual information between each of the remaining 298 dimensions of diff_time and status, the detection device obtains a total of 300 mutual information between the 300 dimensions of diff_time and status, that is, 300 second mutual information. The detection device uses the mean value of these 300 second mutual information as the mutual information between diff_time and status.
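The round-by-round matching in the status and diff_time example can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the representation of dimensions as hashable labels, and the `mi` callable (which stands in for whatever computes the mutual information between two statistics) are all assumptions. The toy lookup table contains made-up values.

```python
def field_pair_mutual_information(candidate_dims, selected_dims, mi):
    """Greedy matching: each candidate dimension takes its best remaining selected dimension."""
    remaining = list(selected_dims)
    second_mi = []
    for cand in candidate_dims:
        if not remaining:
            break  # no selected dimensions left to match against
        # Reference mutual information against every remaining selected dimension.
        best = max(remaining, key=lambda sel: mi(cand, sel))
        second_mi.append(mi(cand, best))
        remaining.remove(best)  # the matched dimension is excluded from later rounds
    # The mean of the per-dimension maxima is the field-to-field mutual information.
    return sum(second_mi) / len(second_mi)

def candidate_mutual_information(candidate_dims, selected_fields, mi):
    """Mutual information of a candidate field: the maximum over all selected fields."""
    return max(
        field_pair_mutual_information(candidate_dims, dims, mi)
        for dims in selected_fields.values()
    )

# Toy pairwise values (made up): two candidate dimensions, three selected dimensions.
toy = {
    ("c0", "s0"): 0.2, ("c0", "s1"): 0.9, ("c0", "s2"): 0.1,
    ("c1", "s0"): 0.4, ("c1", "s1"): 0.8, ("c1", "s2"): 0.3,
}
lookup = lambda c, s: toy[(c, s)]
pair_value = field_pair_mutual_information(["c0", "c1"], ["s0", "s1", "s2"], lookup)
field_value = candidate_mutual_information(["c0", "c1"], {"status": ["s0", "s1", "s2"]}, lookup)
```

In the toy data, c0 is matched with s1 (0.9) and s1 is removed, so c1 can only be matched with s0 (0.4) even though its value against s1 is higher; the resulting mean is 0.65.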

In the embodiment of the present application, the detection device can calculate the mutual information between statistics according to formula (1). In formula (1), $I(S_u; S_c)$ represents the mutual information between the statistic $S_u$ and the statistic $S_c$, and $s_u$ and $s_c$ represent the values of $S_u$ and $S_c$ respectively. $P(\cdot,\cdot)$ represents the joint probability distribution, and $P(\cdot)$ represents the probability density.
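Formula (1) itself is not reproduced in this text. Assuming it is the standard definition of mutual information for discrete variables, $I(S_u;S_c)=\sum_{s_u}\sum_{s_c}P(s_u,s_c)\log\frac{P(s_u,s_c)}{P(s_u)P(s_c)}$, a minimal sketch with probabilities estimated from value counts could look like this. The function name and the use of the natural logarithm are assumptions.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Mutual information of two equally long sequences of discrete values (natural log)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (s_u, s_c) pair
    px = Counter(xs)              # counts of each s_u
    py = Counter(ys)              # counts of each s_c
    total = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # P(x, y) / (P(x) * P(y)) simplifies to c * n / (count_x * count_y).
        total += p_xy * log(c * n / (px[x] * py[y]))
    return total

identical = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])    # fully dependent
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # no dependence
```

Two identical balanced binary sequences give log 2, while the second pair, in which every combination occurs equally often, gives 0.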

The greater the mutual information between two statistics, the greater the correlation between them and the more redundant information they share, so adding a statistic of an unselected field on top of a statistic of a selected field contributes less to the anomaly detection effect. Conversely, the smaller the mutual information between two statistics, the smaller the correlation between them and the less redundant information they share, so adding a statistic of an unselected field on top of a statistic of a selected field contributes more to the anomaly detection effect.

In order to facilitate calculation, in the embodiment of the present application, statistics with continuous values are discretized, and mutual information is calculated based on the discretized statistics. The discretization method may be a quartile-based method or other methods, which are not limited in the embodiments of the present application.

For example, the detection device discretizes statistics with continuous values based on the quartile method. In the specific implementation, for each statistic, the detection device calculates the quartiles based on all values of the statistic in all training samples, and the three quartile points are denoted Q1, Q2 and Q3 in sequence. Let IQR=Q3-Q1. Six intervals are determined with Q1-1.5IQR, Q1, Q2, Q3 and Q3+1.5IQR as the dividing points, and the statistics of all training samples are discretized into these six intervals. The statistics of the verification subset and the test set are then also discretized according to these six intervals.
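The quartile-based discretization can be sketched as follows. The dividing points are fitted on training values only and then reused for other data, as described above. The function names, the choice of the "inclusive" quartile method of `statistics.quantiles`, and the convention that a value equal to a dividing point falls into the interval on its right are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles

def fit_cut_points(train_values):
    """Compute the five dividing points from the training values of one statistic."""
    q1, q2, q3 = quantiles(train_values, n=4, method="inclusive")
    iqr = q3 - q1
    return [q1 - 1.5 * iqr, q1, q2, q3, q3 + 1.5 * iqr]

def discretize(value, cut_points):
    """Map a continuous value to one of six interval indices, 0 to 5."""
    return bisect_right(cut_points, value)

# Toy training values 1..9 give Q1=3, Q2=5, Q3=7 and IQR=4.
cuts = fit_cut_points([1, 2, 3, 4, 5, 6, 7, 8, 9])
low, middle, high = discretize(-10, cuts), discretize(4, cuts), discretize(100, cuts)
```

Values from the verification subset or the test set would be passed to `discretize` with the same `cuts`, so that all subsets share one set of intervals.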

Step 6023: Select k candidate fields with the smallest mutual information from the candidate field set.

That is, the detection device selects k candidate fields from the candidate field set in ascending order of mutual information. Among them, k is a positive integer not greater than n. More precisely, k is not greater than the total number of fields included in the candidate field set.

In this embodiment of the present application, the detection device determines k according to a preset value. If multiple candidate fields in the current candidate field set have the same mutual information, so that the total number of candidate fields with the smallest mutual information exceeds the preset value, the detection device sets the current k equal to the total number of candidate fields with the smallest mutual information. If the total number of fields included in the current candidate field set is less than the preset value, the detection device sets the current k equal to the total number of fields included in the candidate field set, or sets the current k to the total number of fields included in the candidate field set minus a specified value (such as 1), to ensure that k does not exceed the total number of fields included in the candidate field set.
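One possible reading of the rule for determining k is sketched below. It assumes that "candidate fields with the smallest mutual information" means the fields tied at the minimum value, and it implements only the first of the two options for a short candidate field set (k equal to the set size). The function names and the alphabetical tie-break in the ordering are assumptions.

```python
def choose_k(mi_by_field, preset_k):
    """Decide how many candidate fields to shortlist in the current round."""
    total = len(mi_by_field)
    if total < preset_k:
        return total  # fewer fields than the preset value: take them all
    smallest = min(mi_by_field.values())
    tied = sum(1 for v in mi_by_field.values() if v == smallest)
    # If more fields are tied at the minimum than the preset value allows, keep all of them.
    return max(preset_k, tied)

def smallest_k_fields(mi_by_field, preset_k):
    """Return the k candidate fields in ascending order of mutual information."""
    k = choose_k(mi_by_field, preset_k)
    ranked = sorted(mi_by_field, key=lambda f: (mi_by_field[f], f))
    return ranked[:k]

k_tied = choose_k({"a": 0.1, "b": 0.1, "c": 0.1, "d": 0.5}, 2)
k_short = choose_k({"a": 0.1, "b": 0.2}, 5)
shortlist = smallest_k_fields({"a": 0.3, "b": 0.1, "c": 0.2, "d": 0.5}, 2)
```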

Step 6024: Based on the training subset, the verification subset, the selected field set and the k candidate fields, determine the reconstruction losses corresponding to the k candidate fields.

Here, the reconstruction loss represents the effect of anomaly detection on the verification subset through the corresponding candidate field together with the selected field set.

In this embodiment of the present application, the detection device determines the reconstruction losses corresponding to the k candidate fields based on the training subset, the verification subset, the selected field set and the k candidate fields as follows: for the first candidate field among the k candidate fields, adding the first candidate field to the selected field set to obtain a candidate field set (here, the selected field set plus the first candidate field, as distinct from the candidate field set formed in step 6021), where the first candidate field is any one of the k candidate fields; based on the training subset and the candidate field set, determining the second detection model corresponding to the first candidate field; and based on the verification subset and the candidate field set, determining, through the second detection model corresponding to the first candidate field, the reconstruction loss corresponding to the first candidate field. It should be understood that during this implementation process, the detection device can obtain k candidate field sets that correspond to the k candidate fields one-to-one, and k second detection models that correspond to the k candidate fields one-to-one.

To put it simply, after the detection device adds any one of the k candidate fields to the selected field set, it trains a second detection model on the training subset based on the selected field set with that candidate field added, and then uses the verification subset and the second detection model to verify the improvement in the anomaly detection effect after adding that candidate field to the selected field set.

Here, the detection device determines the reference statistical features of the training subset based on the candidate field set (that is, the candidate field set corresponding to the first candidate field among the k candidate field sets). The reference statistical features of the training subset include the statistics of the data of all fields included in this candidate field set in the training subset. The detection device trains the initial detection model with the reference statistical features of the training subset to obtain the second detection model corresponding to the first candidate field.

The detection device determines the reference statistical features of the verification subset based on the candidate field set. The reference statistical features of the verification subset include the statistics of the data of all fields included in the candidate field set in the verification subset. The detection device inputs the reference statistical features of the verification subset into the second detection model corresponding to the first candidate field to obtain the reference reconstruction features of the verification subset. The reference reconstruction features of the verification subset include the reconstructed statistics of the data of all fields included in the candidate field set in the verification subset. The detection device determines the reconstruction loss corresponding to the first candidate field based on the reference statistical features and the reference reconstruction features of the verification subset. The reconstruction loss corresponding to the first candidate field also represents the reconstruction effect on the reference statistical features of the verification subset after adding the first candidate field to the selected field set.

As an example, the detection device determines the reconstruction loss of the first candidate field according to formula (2). In formula (2), $L(X,\hat{X})$ represents the reconstruction loss corresponding to the first candidate field, and $X$ and $\hat{X}$ respectively represent the reference statistical features and the reference reconstruction features of the verification subset. n represents the total number of verification samples included in the verification subset, m represents the total dimension of the statistics of the data of the first candidate field and of all selected fields included in the selected field set in each verification sample, d denotes the d-th dimension in the total dimension, and i denotes the i-th single sample included in the verification subset. $x_i^d$ represents the d-th dimension reference statistic (that is, the input statistic of the second detection model) included in the reference statistical features of the i-th single sample in the verification subset, and $\hat{x}_i^d$ represents the d-th dimension reconstruction statistic (that is, the output statistic of the second detection model) included in the reference reconstruction features of the i-th single sample in the verification subset.
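Formula (2) is not reproduced in this text. As an illustration only, the sketch below assumes that it is a mean squared error over all samples and all dimensions, which is a common reconstruction loss for autoencoder-style detection models; the actual formula may differ (for example, it may use absolute errors or a different normalization). The function name and the list-of-rows data layout are assumptions.

```python
def reconstruction_loss(X, X_hat):
    """Mean squared difference between input statistics X and reconstructed statistics X_hat.

    X and X_hat are lists of rows; each row holds the m statistics of one sample.
    """
    n = len(X)     # number of samples
    m = len(X[0])  # number of statistic dimensions per sample
    total = 0.0
    for x_row, r_row in zip(X, X_hat):
        for x, r in zip(x_row, r_row):
            total += (x - r) ** 2
    return total / (n * m)

# Toy values: only one of the four entries is reconstructed incorrectly (4.0 vs 2.0).
loss = reconstruction_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 2.0]])
perfect = reconstruction_loss([[1.0, 2.0]], [[1.0, 2.0]])
```

Under this assumption, a lower value indicates that the second detection model reconstructs the verification subset more faithfully with the candidate field added.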

Optionally, the detection device normalizes the reference statistical features and the reference reconstruction features of the verification subset, and determines the reconstruction loss corresponding to the first candidate field based on the normalized reference statistical features and reference reconstruction features of the verification subset. For example, the detection device normalizes the reference statistics and the reconstruction statistics so that their value ranges are mapped to the interval [-1,1]. For example, for a single sample, the statistics at the time granularity of a single sample (such as one-hour-granularity statistics) are normalized based on all statistics at the time granularity of a single sample (such as the statistics at each hour granularity in 24 hours) of the verification sample to which the single sample belongs. The statistics at the time granularity of anomaly detection for a certain verification sample (such as 24-hour-granularity statistics) are normalized based on the statistics at the time granularity of anomaly detection of all verification samples (such as the statistics at each 24-hour granularity in 7 days).

The above-mentioned second detection model is an anomaly detection model. In order to ensure that the anomaly detection effect of the second detection model used in field screening is relatively consistent with that of the first detection model used for subsequent anomaly detection on the test set, the structure of the second detection model is roughly the same as that of the first detection model. The differences include the dimension of the input layer and some model parameters. For example, the dimension of the input layer in the second detection model matches the total number of fields included in the selected field set after the first candidate field is added, while the dimension of the input layer in the first detection model matches the number of fields included in the target attribute fields. For another example, all hyperparameters of the second detection model may be preset, while part of the hyperparameters of the first detection model (such as the first hyperparameter below) are to be tuned.

Step 6025: Based on the mutual information and reconstruction loss corresponding to the k candidate fields, select one candidate field from the k candidate fields, and determine the field score of the selected candidate field.

After obtaining the mutual information and reconstruction loss corresponding to the k candidate fields, the detection device will be able to combine the mutual information and reconstruction loss to determine the field score of the candidate field.

In an embodiment of the present application, the detection device selects a candidate field from the k candidate fields based on the mutual information and reconstruction loss corresponding to the k candidate fields, and determines the implementation process of the field score of the selected candidate field as follows: based on the mutual information and reconstruction loss corresponding to the k candidate fields, determine the comprehensive scores corresponding to the k candidate fields, determine the candidate field with the highest comprehensive score among the k candidate fields as the selected candidate field, and determine the comprehensive score of the selected candidate field as the field score of the selected candidate field. It should be understood that in this implementation process, the detection device obtains k comprehensive scores corresponding to the k candidate fields one by one.

The greater the mutual information between fields, the stronger the correlation between the fields; and the greater the reconstruction loss corresponding to a field (that is, the smaller the decrease in the reconstruction loss after adding the field to the selected fields relative to the reconstruction loss obtained without adding the field), the weaker the improvement that the field brings to the anomaly detection effect. Therefore, in order to remove, as far as possible, the candidate fields that have a strong correlation with the selected fields and bring a weaker improvement to the anomaly detection effect, the detection device can process the mutual information and reconstruction losses corresponding to the k candidate fields by weighted summation or other methods to obtain the comprehensive scores corresponding to the k candidate fields.

Optionally, the detection device determines the mutual information scores corresponding to the k candidate fields based on the mutual information corresponding to the k candidate fields. The higher the mutual information score of the candidate field, the greater the improvement in the anomaly detection effect. That is, the size of the mutual information is negatively correlated with the anomaly detection effect, while the size of the mutual information score is positively correlated with the anomaly detection effect. The detection device determines the comprehensive scores corresponding to the k candidate fields based on the mutual information scores and reconstruction losses corresponding to the k candidate fields.

As an example, the detection device determines the mutual information score of the candidate field according to formula (3), and determines the comprehensive score of the candidate field according to formula (4). In formulas (3) and (4), field represents a candidate field, I represents the mutual information corresponding to the candidate field, Score(I) represents the mutual information score of the candidate field, Score(Loss) represents the reconstruction loss corresponding to the candidate field, and Score(field) represents the field score of the candidate field. β is an adjustable parameter, and its default value can be 1 or another value.
Score(I) = 1 - (2 × sigmoid(I) - 1)    (3)

It can be seen from formula (3) that the detection device normalizes the mutual information corresponding to the candidate field through the sigmoid function, that is, the value range of the mutual information is mapped to the interval (0,1). The value range of the mutual information score obtained by formula (3) is also the interval (0,1). The larger the mutual information score corresponding to a candidate field, the higher the degree of improvement of the anomaly detection effect of the candidate field.
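Formula (3) can be written directly as a short sketch. The function names are assumptions; formula (4) is not reproduced in this text and is therefore not implemented here.

```python
from math import exp

def sigmoid(x):
    """Logistic function; adequate for the non-negative inputs used here."""
    return 1.0 / (1.0 + exp(-x))

def mutual_information_score(mi):
    """Formula (3): Score(I) = 1 - (2 * sigmoid(I) - 1)."""
    return 1.0 - (2.0 * sigmoid(mi) - 1.0)

score_zero = mutual_information_score(0.0)
score_half = mutual_information_score(0.5)
score_one = mutual_information_score(1.0)
score_large = mutual_information_score(10.0)
```

Because mutual information is non-negative, sigmoid(I) is at least 0.5, so the score decreases monotonically from 1 (at I = 0) toward 0 as the mutual information grows. This matches the statement that a smaller mutual information corresponds to a larger mutual information score.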

Step 6026: Move the selected candidate fields from the candidate field set to the selected field set, and return to step 6022 until the candidate field set is empty, and obtain the field scores corresponding to the n candidate fields.

That is, after the detection device moves the selected candidate fields from the candidate field set into the selected field set, if the candidate field set is not empty, it returns to step 6022; if the candidate field set is empty, the detection device obtains the field scores corresponding to the n candidate fields respectively.

Based on the above description of steps 6021 to 6026, it can be seen that the detection device essentially determines the field scores corresponding to the n candidate fields through multiple rounds of iterations. In each iteration process, the field score of a candidate field is obtained. After n rounds of iteration, the field scores corresponding to the n candidate fields are obtained. To put it another way, the detection device determines the target attribute field from the candidate attribute field based on the sample set and the candidate attribute field as follows:

Determine the statistics of the data of the m selected fields and the statistics of the data of the n candidate fields in the training subset; determine the target attribute field by performing multiple rounds of iterations.

Here, the detection device first forms the m selected fields into a selected field set and the n candidate fields into a candidate field set. In the i-th round of iteration, the detection device determines the mutual information corresponding to each candidate field in the candidate field set based on the statistics of the data of all fields included in the selected field set and the candidate field set in the training subset, where i is an integer greater than 0 and not greater than n. The detection device selects k candidate fields from the candidate field set in ascending order of mutual information, and adds each of the k candidate fields to the selected field set respectively, to obtain k candidate field sets that correspond one-to-one to the k candidate fields. For the first candidate field set among the k candidate field sets, the detection device determines the statistics of the data of all fields included in the first candidate field set in the training subset, and determines the statistics of the data of all fields included in the first candidate field set in the verification subset, where the first candidate field set is any one of the k candidate field sets. The detection device trains the initial detection model with the statistics of the data of all fields included in the first candidate field set in the training subset to obtain the second detection model corresponding to the first candidate field, where the first candidate field is the candidate field, among the k candidate fields, that corresponds to the first candidate field set. The detection device inputs the statistics of the data of all fields included in the first candidate field set in the verification subset into the second detection model to obtain the reconstructed statistics of the data of all fields included in the first candidate field set in the verification subset.
The detection device determines the reconstruction loss corresponding to the first candidate field based on the statistics and the reconstructed statistics of the data of all fields included in the first candidate field set in the verification subset. The detection device determines the comprehensive score of each of the k candidate fields based on the mutual information and reconstruction loss corresponding to each of the k candidate fields. The detection device selects the candidate field with the highest comprehensive score among the k candidate fields, determines the comprehensive score of the selected candidate field as its field score, and moves the selected candidate field from the candidate field set into the selected field set, to obtain the selected field set and the candidate field set for the next round of iteration. After the last round of iteration is completed, the detection device obtains the field score of each of the n candidate fields. Then, the detection device determines p candidate fields from the n candidate fields according to the field score of each of the n candidate fields, and determines the m selected fields and the p candidate fields as the target attribute fields.

To put it simply, in each round of iteration, all current candidate fields are traversed, the mutual information between each candidate field and all currently selected fields is calculated one by one, and the k candidate fields with the smallest mutual information are selected. Then, each of the k candidate fields is added in turn on top of all currently selected fields to obtain k candidate field sets. The training subset and the verification subset are used to verify the reconstruction effects corresponding to the k candidate field sets, that is, to obtain the reconstruction losses corresponding to the k candidate fields. Based on the mutual information and reconstruction loss corresponding to the k candidate fields, the comprehensive score of each of the k candidate fields is determined. According to the comprehensive score, the best candidate field in this round of iteration is selected from the k candidate fields, and the selected candidate field is moved from the set of all current candidate fields to the set of all currently selected fields. The above process is iterated until the set of all current candidate fields is empty. After that, the target attribute fields are determined based on the comprehensive scores of the candidate fields selected in each round of iteration.
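The iteration summarized above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions: `mutual_info`, `recon_loss` and `combine` are hypothetical caller-supplied callables standing in for, respectively, the mutual information computation, the train-and-verify step with the second detection model, and the comprehensive score (formula (4), which is not reproduced in this text). The toy values and the toy `combine` rule are made up.

```python
def score_candidate_fields(selected, candidates, mutual_info, recon_loss, combine, preset_k):
    """Greedy forward selection that assigns a field score to every candidate field."""
    selected = list(selected)
    remaining = list(candidates)
    field_scores = {}
    while remaining:
        # Mutual information of every remaining candidate against the selected fields.
        mi = {f: mutual_info(f, selected) for f in remaining}
        # Shortlist the k candidates with the smallest mutual information.
        k = min(preset_k, len(remaining))
        shortlist = sorted(remaining, key=lambda f: mi[f])[:k]
        # Evaluate each shortlisted candidate added to the selected fields.
        scored = {f: combine(mi[f], recon_loss(selected + [f])) for f in shortlist}
        # Move the best candidate of this round into the selected fields.
        best = max(shortlist, key=lambda f: scored[f])
        field_scores[best] = scored[best]
        selected.append(best)
        remaining.remove(best)
    return field_scores

# Toy stand-ins: fixed mutual information and loss per field, lower is better for both.
toy_mi = {"a": 0.1, "b": 0.5, "c": 0.9}
toy_loss = {"a": 0.4, "b": 0.2, "c": 0.3}
scores = score_candidate_fields(
    selected=["status"],
    candidates=["a", "b", "c"],
    mutual_info=lambda f, sel: toy_mi[f],
    recon_loss=lambda fields: toy_loss[fields[-1]],
    combine=lambda mi, loss: -(mi + loss),
    preset_k=2,
)
```

Each round scores exactly one candidate field, so n candidate fields take n rounds, and the insertion order of `scores` reflects the order in which the fields were moved into the selected field set.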

If the above m is 0, that is, the number of selected fields among the candidate attribute fields is 0, which means that the relevant personnel of the cloud platform have not specified any selected field this time, then after forming the m selected fields into a selected field set (which is empty) and forming the n candidate fields into a candidate field set, the detection device, in the first round of iteration, determines the mutual information corresponding to each candidate field in the candidate field set based on the training subset, the selected field set and the candidate field set as follows: the detection device determines a first reference candidate field from the candidate field set, and uses the n-1 candidate fields other than the first reference candidate field as a hypothetical selected field set. Based on the training subset, the hypothetical selected field set and the first reference candidate field, the detection device determines the mutual information corresponding to the first reference candidate field. After determining the second reference candidate field, the detection device determines the mutual information corresponding to the second reference candidate field in the same manner. By analogy, in the first round of iteration, the mutual information corresponding to a total of n reference candidate fields is obtained, which is the mutual information corresponding to the n candidate fields in the candidate field set.

Then, the detection device selects the k candidate fields with the smallest mutual information from the candidate field set. Based on the training subset, the verification subset, the hypothetical selected field set and the k candidate fields, the reconstruction losses corresponding to the k candidate fields are determined. The detection device selects one candidate field from the k candidate fields based on the mutual information and reconstruction loss corresponding to the k candidate fields, and determines the field score of the selected candidate field. The detection device moves the selected candidate field from the candidate field set into the selected field set. At this point, the first selected field in the selected field set is obtained. In the second round of iteration, the detection device determines the mutual information corresponding to each candidate field in the candidate field set based on the training subset, the selected field set, and the candidate field set, and so on, until the candidate field set is empty, at which point the field scores corresponding to the n candidate fields are obtained.

After the field scores corresponding to the n candidate fields are obtained, p candidate fields can be selected from the n candidate fields either through manual decision-making or through automatic decision-making. The detection device then determines the m selected fields and the p candidate fields as the target attribute fields.

In one embodiment of selecting p candidate fields through manual decision-making, the detection device sends field score information, where the field score information includes the field score of each of the n candidate fields. The detection device receives field decision information, which indicates p candidate fields. The detection device determines the p candidate fields from the n candidate fields based on the field decision information. That is, the detection device feeds back the comprehensive score of each candidate field to the relevant personnel, who then make the final decision based on the comprehensive score.

Optionally, the field score information also includes the mutual information (and/or mutual information score) and the reconstruction loss corresponding to each candidate field, as obtained in the process of determining the field score. That is, the detection device feeds back the mutual information, reconstruction loss and comprehensive score corresponding to each candidate field to the relevant personnel, who then make the final decision based on the mutual information, reconstruction loss and comprehensive score.

In one embodiment of determining the target attribute field through automatic decision-making, the detection device determines p candidate fields from the n candidate fields based on the preset number of fields and the field score of each of the n candidate fields. Here, the preset number of fields represents the total number of attribute fields required for anomaly detection on the test set, in which case p+m is equal to the preset number of fields. Alternatively, the preset number of fields represents the total number of candidate fields that need to be selected from the n candidate fields, in which case p is equal to the preset number of fields.
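The automatic decision can be illustrated with a minimal Python sketch. It assumes that a higher field score marks a more useful candidate field, which the text does not state explicitly; the function name, its parameters and the field names in the usage lines are hypothetical.

```python
def select_fields_automatically(field_scores, preset_count,
                                num_selected=0, counts_selected=True):
    """Pick p candidate fields by field score.

    field_scores:    dict mapping candidate field name -> field score.
    preset_count:    the preset number of fields.
    num_selected:    m, the number of already selected fields.
    counts_selected: True  -> p + m == preset_count,
                     False -> p == preset_count.
    """
    p = preset_count - num_selected if counts_selected else preset_count
    p = max(0, min(p, len(field_scores)))  # clamp p to the range [0, n]
    # Assumption: a higher field score is better.
    ranked = sorted(field_scores, key=field_scores.get, reverse=True)
    return ranked[:p]


scores = {"status": 0.9, "region": 0.4, "method": 0.7}
chosen_total = select_fields_automatically(scores, 3, num_selected=1)
chosen_only = select_fields_automatically(scores, 1, counts_selected=False)
```

With one selected field (m = 1) and a preset number of 3, the first call returns two candidate fields; in the second interpretation the preset number directly gives p.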

Next, the statistics corresponding to the attribute fields in the embodiment of this application are introduced.

As can be seen from the foregoing, the configuration parameter also indicates the category of each attribute field in the candidate attribute field, and attribute fields of different categories have different types of statistics corresponding to them. For example, the category of each attribute field is a first-type attribute or a second-type attribute or a third-type attribute or a fourth-type attribute. There are different types of statistics corresponding to these four types of attribute fields. For the characteristics of these four types of attribute fields, please refer to the relevant introduction in Table 1 in step 601.

Table 3 is a relationship table between different categories of attribute fields and statistics provided by the embodiment of the present application. As shown in Table 3, the statistics corresponding to each type of attribute field include basic statistics, first type statistics and second type statistics. Among them, the first type statistics corresponding to the attribute fields belonging to the first type of attributes include maximum value, mean value, numerical standard deviation, information entropy and the number of field value types. The first type of statistics corresponding to attribute fields belonging to the second type of attributes include maximum value, mean, numerical standard deviation, proportional standard deviation, information entropy and the number of field value types. The first type of statistics corresponding to attribute fields belonging to the third type of attributes include maximum value, mean value, numerical standard deviation, proportional standard deviation, information entropy and the number of field value types. The first type of statistics corresponding to attribute fields belonging to the fourth type of attributes include mean and numerical standard deviation. In addition, when the anomaly detection task is time series related, the statistics corresponding to the attribute fields belonging to the second type of attributes and the third type of attributes also include user access profiles and/or access profile similarities. In Table 3, ‘√’ is used to indicate the statistics that need to be calculated, ‘×’ is used to indicate the statistics that do not need to be calculated, and ‘*’ is used to indicate the statistics that need to be calculated when the anomaly detection task is time series related.

Table 3

Among them, the basic statistics are determined by counting the number of logs included in the corresponding sample. Table 4 is a table of meanings of basic statistics corresponding to different categories of attribute fields provided by the embodiment of the present application. Taking the time granularity of a single sample as 1 hour and the time granularity of anomaly detection as 24 hours as an example, the meanings of the basic statistics corresponding to different categories of attribute fields are shown in Table 4. Among them, the basic statistics of the first type of attributes include two types, and these two basic statistics correspond to the two situations of no differentiated services and differentiated services respectively. The basic statistics for the remaining three types of attributes include one.

Table 4

It can be seen from Table 4 that the basic statistics corresponding to the first type of attributes include the first basic statistic and the second basic statistic. The first basic statistic includes the number of logs of each cloud service in each single sample included in the corresponding sample, and the number of logs of each cloud service across all of the single samples included in the corresponding sample. The second basic statistic includes the number of logs of each application in each cloud service in each single sample included in the corresponding sample, and the number of logs of each application in each cloud service across all of the single samples included in the corresponding sample. The basic statistics corresponding to the second type of attributes, the third type of attributes and the fourth type of attributes include the third basic statistic. The third basic statistic includes, for each value of the attribute field of the corresponding category, the number of logs under each cloud service in each single sample included in the corresponding sample, and the number of logs under each cloud service across all of the single samples included in the corresponding sample.

Taking the detection object as user 1 as an example, user 1 operates two services within a day: elastic compute service (ECS) and object storage service (OBS). ECS provides 2 applications and OBS provides 3 applications. For the first type of attributes with differentiated services, the detection device counts the number of logs generated by user 1 operating each application in ECS in every hour of the day and the number of logs generated by operating each application in OBS in every hour, obtaining 24*(2+3)-dimensional statistics in total. The detection device also counts the number of logs generated by user 1 operating each application in ECS during the day and the number of logs generated by operating each application in OBS during the day, obtaining (2+3)-dimensional statistics in total. Thus, for the first type of attributes with differentiated services, the detection device obtains (24+1)*5-dimensional basic statistics in total.

For the first type of attributes with no distinction between services, the detection device counts the number of logs generated by user 1 operating ECS in every hour of the day and the number of logs generated by operating OBS in every hour, obtaining 24*2-dimensional statistics in total. The detection device also counts the number of logs generated by user 1 operating ECS during the day and the number of logs generated by operating OBS during the day, obtaining 1*2-dimensional statistics in total. Thus, for the first type of attributes with no distinction between services, the detection device obtains (24+1)*2-dimensional basic statistics in total.

Take as an example the case where the log data generated by user 1 when using the cloud service includes data of a status field. The value range of the status field is [0,1,2], and the status field belongs to the second type of attributes. For each value of the status field in the log data generated by user 1 in each hour of the day, the detection device counts the number of logs under each cloud service, obtaining 24*3-dimensional statistics in total. For each value of the status field in the log data generated within the day, the detection device counts the number of logs under each cloud service, obtaining 1*3-dimensional statistics in total. Thus, for each field belonging to the second type of attributes, the detection device obtains (24+1)*3-dimensional basic statistics in total, where '3' represents the number of field value types of the status field.
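The (24+1)*3 count in the status field example can be reproduced with a short Python sketch. It is simplified to a single cloud service, and the function name and the (hour, value) log representation are assumptions made for illustration only.

```python
from collections import Counter


def basic_statistics(logs, values, hours=24):
    """Count logs per (hour, field value) and per field value over the day.

    logs:   iterable of (hour, value) pairs for one detection object,
            one day and (for simplicity) one cloud service.
    values: the possible values of the attribute field, e.g. [0, 1, 2].
    Returns a flat list of hours*len(values) hourly counts followed by
    len(values) whole-day counts, i.e. (hours + 1) * len(values) entries.
    """
    logs = list(logs)
    hourly = Counter(logs)                    # (hour, value) -> count
    daily = Counter(v for _, v in logs)       # value -> count
    stats = [hourly[(h, v)] for h in range(hours) for v in values]
    stats += [daily[v] for v in values]
    return stats


# Hypothetical logs: two logs with status 0 and one with status 2 in hour 0,
# and one log with status 1 in hour 13.
example_logs = [(0, 0), (0, 0), (0, 2), (13, 1)]
example_stats = basic_statistics(example_logs, [0, 1, 2])
```

The same counting scheme applies to the first type of attributes by replacing the field values with service names or application names.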

After obtaining the basic statistics, the detection device determines the first type of statistics based on the basic statistics. Table 5 shows the calculation methods of the first type of statistics corresponding to different categories of attribute fields provided by the embodiment of the present application. Taking the time granularity of a single sample as 1 hour and the time granularity of anomaly detection as 24 hours as an example, and building on Table 4, the calculation methods of the first type of statistics corresponding to different categories of attribute fields are shown in Table 5.

Table 5

Table 6 is a table of meanings of the first type of statistics corresponding to different types of attribute fields provided by the embodiment of the present application. On the basis of Table 4, the meanings of the first type of statistics corresponding to different categories of attribute fields are shown in Table 6. Among the first type of statistics, the maximum value represents the upper bound of the statistics included in the basic statistics of the corresponding sample, the mean represents their average level, the numerical standard deviation represents their degree of dispersion, the proportional standard deviation represents their degree of imbalance, the information entropy represents their degree of disorder, and the number of field value types represents the number of possible values of each attribute field in the corresponding sample.

Table 6

Taking the detection object as user 1 as an example, user 1 operates two services within a day, ECS and OBS, where ECS provides 2 applications and OBS provides 3 applications. For the first type of attributes without distinguishing services, the detection device counts the number of logs generated by user 1 operating each of these five applications in each hour of the day, so that the basic statistics for each hour include 5 log counts, and calculates the maximum value, mean, numerical standard deviation and information entropy of these 5 log counts for each hour to obtain part of the first type of statistics for each hour. In the same way, the detection device counts the number of logs generated by user 1 operating each of the five applications during the day, so that the basic statistics for the day include 5 log counts, and calculates the maximum value, mean, numerical standard deviation and information entropy of these 5 log counts to obtain part of the first type of statistics for the day. For the first type of attributes with differentiated services, the detection device counts the number of logs generated by user 1 operating ECS and the number of logs generated by operating OBS in each hour of the day, so that the basic statistics for each hour include another 2 log counts, and calculates the maximum value, mean, numerical standard deviation and information entropy of these 2 log counts for each hour to obtain another part of the first type of statistics for each hour. In the same way, the detection device counts the number of logs generated by user 1 operating ECS and the number of logs generated by operating OBS during the day, so that the basic statistics for the day include another 2 log counts, and calculates the maximum value, mean, numerical standard deviation and information entropy of these 2 log counts to obtain another part of the first type of statistics for the day.

Take as an example the case where the log data generated by user 1 when using the cloud service includes data of a status field. The value range of the status field is [0,1,2], and the status field belongs to the second type of attributes. For each value of the status field in the log data generated by user 1 in each hour of the day, the detection device counts the number of logs under each cloud service, so that the basic statistics for each hour include 3 log counts, and calculates the maximum value, mean, numerical standard deviation, proportional standard deviation and information entropy of these 3 log counts for each hour to obtain part of the first type of statistics for each hour. For each value of the status field in the log data generated during the day, the detection device counts the number of logs under each cloud service, so that the basic statistics for the day include 3 log counts, and calculates the maximum value, mean, numerical standard deviation, proportional standard deviation and information entropy of these 3 log counts to obtain part of the first type of statistics for the day.
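As a rough illustration, the first type of statistics for one group of log counts might be computed as in the Python sketch below. The exact calculation methods are those of Table 5, which is not reproduced in the text, so the definitions used here are assumptions: the proportional standard deviation is taken as the standard deviation of each count's share of the total, the information entropy as the Shannon entropy (base 2) of those shares, and the standard deviations as population standard deviations.

```python
import math


def first_type_statistics(counts):
    """Compute illustrative first type statistics for a list of log counts."""
    n = len(counts)
    total = sum(counts)
    mean = total / n
    std = math.sqrt(sum((c - mean) ** 2 for c in counts) / n)
    # Share of each count in the total (all zeros if there are no logs).
    props = [c / total for c in counts] if total else [0.0] * n
    prop_mean = sum(props) / n
    prop_std = math.sqrt(sum((p - prop_mean) ** 2 for p in props) / n)
    # Shannon entropy of the shares; zero shares contribute nothing.
    entropy = -sum(p * math.log2(p) for p in props if p > 0)
    return {
        "max": max(counts),
        "mean": mean,
        "std": std,
        "prop_std": prop_std,
        "entropy": entropy,
    }


# Hypothetical hourly counts for the three status values 0, 1 and 2.
hourly_stats = first_type_statistics([12, 3, 0])
uniform_stats = first_type_statistics([4, 4, 4, 4])
```

Evenly spread counts give the highest entropy and zero standard deviation, while counts concentrated on one value give low entropy and a large proportional standard deviation.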

After obtaining the first type of statistics, the detection device determines the second type of statistics based on the first type of statistics. The second type of statistics represents the difference between each statistic included in the first type of statistics of the corresponding sample and that statistic over all samples. Simply put, the second type of statistics reflects the relative standard distance of a single sample from the mean of the sample population, that is, the difference in the first type of statistics between the sample and the sample population in the historical observation period.

As an example, the detection device calculates the second type of statistics by calculating the z-score according to formula (5):

$$z_i = \frac{x_i - \mu}{\sigma}, \quad i = 1, 2, \ldots, n_s \tag{5}$$

In formula (5), $x_i$ represents the i-th statistic in the first type of statistics for each day or each hour, $\mu$ and $\sigma$ respectively represent the mean and numerical standard deviation of $x_i$ over all samples, and $z_i$ represents the i-th statistic in the second type of statistics for each day or each hour, that is, the second type of statistic corresponding to the i-th statistic in the first type of statistics. $n_s$ represents the total number of statistics included in the first type of statistics. For example, according to Table 5, n_s = (1*N)*(5+num(2nd_attr)*6+num(3nd_attr)*6+num(4nd_attr)*2).
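The z-score calculation can be sketched in Python as follows. The sketch assumes that the mean and standard deviation of each statistic are taken over a list of historical samples, that the population standard deviation is used, and that a statistic with zero deviation maps to 0; none of these details is fixed by the text, and the function name is hypothetical.

```python
import statistics


def second_type_statistics(history, current):
    """Return the z-score of each first type statistic of `current`.

    history: list of samples, each a list of first type statistics.
    current: one sample, a list of first type statistics of equal length.
    """
    z = []
    for i, x in enumerate(current):
        column = [sample[i] for sample in history]
        mu = statistics.fmean(column)
        sigma = statistics.pstdev(column)
        # Assumption: a constant statistic carries no deviation signal.
        z.append((x - mu) / sigma if sigma > 0 else 0.0)
    return z


z_scores = second_type_statistics([[1.0, 5.0], [3.0, 5.0]], [3.0, 7.0])
```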

As can be seen from the foregoing, the configuration parameters also indicate whether the anomaly detection task is time-series related. When it is, the statistics corresponding to the attribute fields belonging to the second type of attributes and the third type of attributes also include user access profiles and/or access profile similarities. The user access profile represents one or more of the following: changes in the number of logs across the multiple single samples included in the corresponding sample, changes in the information entropy corresponding to the second type of attributes, and changes in the information entropy corresponding to the third type of attributes. The access profile similarity characterizes the similarity between the user access profile of the corresponding sample and the user access profile of a reference sample. The reference sample can be the first sample in the sample set, for example, the first training sample in the training subset.

The change in the number of logs includes the change over time in the number of logs of each cloud service, and/or the change over time in the total number of logs of all cloud services. The change in information entropy corresponding to the second type of attributes includes the change over time, under each cloud service, of the information entropy for each value of each attribute field belonging to the second type of attributes, and/or the change over time of the information entropy of each second type attribute under all cloud services. The change in information entropy corresponding to the third type of attributes includes the change over time of the information entropy of each third type attribute under each cloud service, and/or the change over time of the information entropy of each third type attribute under all cloud services.

For example, the detection device counts how the number of logs generated by user 1 operating a certain service within 24 hours of a day changes over time, and obtains a 24-dimensional user access profile.

The detection device calculates the access profile similarity by calculating the cosine similarity according to formula (6):

$$\mathrm{sim}(A, B) = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}} \tag{6}$$

In formula (6), vector A and vector B represent two user access profiles respectively, and i represents the element subscript in the vector.

The detection device calculates, according to formula (6), the access profile similarity between the user access profile of each training sample in the training subset and the user access profile of the reference sample, obtaining as many access profile similarities as there are training samples. In the same way, the detection device calculates, according to formula (6), the access profile similarity between the user access profile of each verification sample in the verification subset and the user access profile of the reference sample, and the access profile similarity between the user access profile of each sample to be tested in the test set and the user access profile of the reference sample.
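A minimal Python sketch of the cosine similarity between two user access profiles is given below. Returning 0.0 for an all-zero profile is an assumption added so that the function is defined for every input; the function name and the example profiles are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equally long access profile vectors."""
    if len(a) != len(b):
        raise ValueError("access profiles must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical 24-dimensional profiles: hourly log counts of one service.
reference_profile = [0] * 8 + [5, 9, 12, 10, 4, 8, 11, 9, 6, 2] + [0] * 6
sample_profile = [2 * c for c in reference_profile]
profile_similarity = cosine_similarity(sample_profile, reference_profile)
```

Because cosine similarity ignores the overall scale, a profile with the same hourly shape but twice the volume is still rated as fully similar.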

Optionally, the detection device determines the mean and numerical standard deviation of the multiple access profile similarities obtained based on the training subset. Based on this mean and numerical standard deviation, the detection device normalizes the access profile similarities obtained based on the training subset, the verification subset and the test set respectively, to obtain the normalized access profile similarities of the training subset, the verification subset and the test set. The detection device subsequently uses the normalized access profile similarity as a statistic of the corresponding sample.

As an example, the detection device normalizes the access profile similarity according to formula (7):

$$\widehat{\mathrm{sim}}(g) = \frac{\mathrm{sim}(g) - \mu_{sim}}{\sigma_{sim}} \tag{7}$$

In formula (7), $\mathrm{sim}(g)$ represents the access profile similarity to be normalized for a sample (a sample to be tested, a verification sample or a training sample), $\widehat{\mathrm{sim}}(g)$ represents the normalized access profile similarity of the sample, and $\mu_{sim}$ and $\sigma_{sim}$ respectively represent the mean and numerical standard deviation of the multiple access profile similarities obtained based on the training subset.
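The key point of the normalization is that the mean and standard deviation come from the training subset only and are then reused for the verification subset and the test set. A minimal Python sketch, assuming the population standard deviation and using hypothetical names and values:

```python
import statistics


def fit_similarity_normalizer(train_similarities):
    """Return a function that normalizes a similarity with training statistics."""
    mu_sim = statistics.fmean(train_similarities)
    sigma_sim = statistics.pstdev(train_similarities)

    def normalize(sim):
        # Assumption: if all training similarities are equal, return 0.0.
        return (sim - mu_sim) / sigma_sim if sigma_sim > 0 else 0.0

    return normalize


normalize = fit_similarity_normalizer([0.8, 1.0])   # training subset
normalized_validation = normalize(1.0)              # a verification sample
normalized_test = normalize(0.7)                    # a sample to be tested
```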

The statistics corresponding to the attribute fields in the embodiment of this application are introduced above. It should be understood that the above content is not intended to limit the embodiments of the present application. The statistics required to be calculated and the calculation method of each statistic in different embodiments may be the same or different.

Step 603: Based on the sample set and the target attribute field, tune the first hyperparameter of the first detection model.

After determining the target attribute field, the detection device determines the statistical characteristics of the sample set based on the target attribute field. The statistical characteristics of the sample set include statistics of the data of the target attribute field in the sample set. That is, the detection device determines the statistics of the data in the target attribute field in the sample set as the statistical characteristics of the sample set. The detection device optimizes the first hyperparameter of the first detection model based on the statistical characteristics of the sample set to obtain a parameter-tuned first detection model.

In the embodiment where the sample set includes a training subset and a verification subset, the statistical characteristics of the sample set include the statistical characteristics of the training subset and of the verification subset. The detection device determines the statistics of the data of the target attribute field in the training subset as the statistical characteristics of the training subset, and determines the statistics of the data of the target attribute field in the verification subset as the statistical characteristics of the verification subset. The detection device optimizes the first hyperparameter of the first detection model based on the statistical characteristics of the training subset and the statistical characteristics of the verification subset.

The first hyperparameters include the learning rate, the number of training rounds and the hidden layer dimensions, each of which has multiple possible values to be tuned. The detection device sets the first hyperparameters of the first detection model according to each combination of values of the learning rate, number of training rounds and hidden layer dimensions, trains the first detection model under each combination using the statistical characteristics of the training subset, and verifies the anomaly detection effect of the trained first detection model using the statistical characteristics of the verification subset. After obtaining the anomaly detection effects corresponding to the different combinations, the detection device determines the combination with the best anomaly detection effect as the tuned first hyperparameters, that is, as the first hyperparameters of the parameter-tuned first detection model. Simply put, the detection device traverses all possible first hyperparameter combinations by grid search to find the optimal combination in the first hyperparameter search space. The search space of the first hyperparameters is the Cartesian product of the search spaces of the three hyperparameters: learning rate, number of training rounds and hidden layer dimensions.
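The grid search over the Cartesian product can be sketched in Python as follows. The callback `train_and_validate` is hypothetical: it stands for training the first detection model with one combination and returning a validation score, where a higher score is assumed to mean a better anomaly detection effect. The toy scoring function in the usage lines is only a stand-in for real training.

```python
from itertools import product


def grid_search(learning_rates, round_counts, hidden_dims, train_and_validate):
    """Try every hyperparameter combination and keep the best-scoring one."""
    best_combo, best_score = None, float("-inf")
    for combo in product(learning_rates, round_counts, hidden_dims):
        score = train_and_validate(*combo)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


def toy_score(lr, rounds, dim):
    # Stand-in for real training: peaks at lr=0.01, rounds=20, dim=32.
    return -abs(lr - 0.01) - abs(rounds - 20) / 100 - abs(dim - 32) / 1000


best_combo, best_score = grid_search(
    [0.1, 0.01, 0.001], [10, 20], [16, 32], toy_score)
```

The number of evaluations is the product of the three search space sizes, which is why the text keeps the hidden layer search space small.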

In this embodiment of the present application, the first detection model includes an input layer, a first hidden layer and a second hidden layer (which may also be called a bottleneck layer). The hidden layer dimensions to be tuned include the dimensions of the first hidden layer and the second hidden layer. The dimensions of the input layer are determined based on the number of fields included in the target attribute field, and the dimensions of the first hidden layer and the second hidden layer are determined based on the dimensions of the input layer. That is to say, the dimensions of the hidden layer are not set arbitrarily, and the search space of the hidden layer dimension is small.

Among them, the more fields the target attribute field includes, the higher the dimension of the input layer will be. The dimension of the input layer is also related to the dimension of the statistics of the fields included in the target attribute field. The higher the dimension of the statistics of the fields included in the target attribute field, that is, the greater the number of statistics, the higher the dimension of the input layer will be.

Taking the dimension of the input layer as N1, the dimension of the first hidden layer as N2, and the dimension of the second hidden layer as N3 as an example, in some embodiments, N1 and N satisfy $N = 2^{\operatorname{ceil}(\log_2 N_1)}$, where ceil() means rounding up, that is, N is the smallest integer power of 2 that is not less than N1. When N is located in the first interval, N2=N/4, and N3 is one of the integer powers of 2 located in the interval [8, N/8]. When N is located in the second interval, N2=N/2, and N3 is one of the integer powers of 2 located in the interval [4, N/4]. The values in the second interval are smaller than the values in the first interval.

For example, when N is an integer power of 2 within the first interval [512,4096], that is, N1∈(256,4096], N2=N/4, and the search space of N3 is the part of {8,16,32,64,128,256,512} that does not exceed N/8. When N is an integer power of 2 within the second interval [64,256], that is, N1∈(32,256], N2=N/2, and the search space of N3 is the part of {4,8,16,32,64} that does not exceed N/4.
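The derivation of the hidden layer dimensions from the input dimension can be sketched in Python. The interval bounds [512, 4096] and [64, 256] are the example values from the text, and raising an error outside them is an assumption, since the text does not describe that case; the function name is hypothetical.

```python
def hidden_layer_search_space(n1):
    """Return (N, N2, candidate values of N3) for input dimension n1."""
    if n1 < 1:
        raise ValueError("input dimension must be positive")
    # Smallest power of 2 not less than n1, i.e. 2 ** ceil(log2(n1)).
    n = 1 << (n1 - 1).bit_length()
    if 512 <= n <= 4096:        # first interval (example values)
        n2, low, high = n // 4, 8, n // 8
    elif 64 <= n <= 256:        # second interval (example values)
        n2, low, high = n // 2, 4, n // 4
    else:
        raise ValueError("N is outside the example intervals")
    n3_candidates = []
    value = low
    while value <= high:
        n3_candidates.append(value)
        value *= 2
    return n, n2, n3_candidates


large = hidden_layer_search_space(3000)
small = hidden_layer_search_space(200)
```

Because N3 is restricted to a handful of powers of 2, the hidden layer part of the grid contributes at most seven candidates.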

Optionally, the learning rate is equal to Where l is an integer in the interval [0, L+1], and L does not exceed the search point threshold. In other words, the search space of the learning rate is Where l = 0, 1, ..., L + 1, L does not exceed the search point threshold, that is, there are L + 2 search points for the learning rate, including L search points within the learning rate optimization interval and two search points on the boundary of the learning rate optimization interval. The search point threshold is 12, 14, 16, etc.

Regarding the number of training rounds, usually the more training rounds there are, the closer the model can get to convergence on the training subset, but the more easily the trained model overfits. In order to allow the model to converge while minimizing the degree of overfitting, in the embodiment of the present application, the verification subset is used to test the degree of overfitting of the first detection model trained on the training subset. When there are no abnormalities in the log data in the verification subset, all verification samples in the verification subset are regarded as positive samples. When the trained first detection model performs well on the verification subset, that is, when more than a preset proportion of the verification samples in the verification subset are detected as positive samples, the detection device may consider that the trained first detection model is close to convergence without overfitting. The preset proportion is, for example, 95%, 98% or 99%.

In some embodiments, the value range of the number of training rounds is [1,100], that is, the search space of the number of training rounds is {1,2,3,...,100}. The detection equipment determines the number of rounds with the best performance on the verification subset among 100 rounds of training as the number of training rounds after tuning, that is, the optimal number of rounds is obtained.

In this embodiment of the present application, the first detection model also includes second hyperparameters, and the second hyperparameters include optimizer parameters, activation function, loss function, parameter initialization method, batch size, and number of hidden layers. The second hyperparameter is a hyperparameter that does not require tuning. For example, the second hyperparameter is a preset parameter. Among them, the activation function of each hidden layer can be a nonlinear function such as Relu, sigmoid, etc., and the activation function of the output layer can be a linear function.

In order to implement lightweight automatic anomaly detection, embodiments of this application provide an anomaly detection model based on an autoencoder. That is, the first detection model and the second detection model in this document are implemented based on an autoencoder.

In the embodiment of the present application, the first detection model includes an encoder and a decoder connected in series. The encoder includes an input layer, a first hidden layer and a second hidden layer connected in series. The decoder includes a third hidden layer and an output layer connected in series. The input layer of the encoder and the output layer of the decoder have the same dimension, and the first hidden layer and the third hidden layer have the same dimension. The second hidden layer can be regarded as the output layer of the encoder or the input layer of the decoder, that is, as a network layer shared by the encoder and the decoder, so the encoder and the decoder are symmetrical to each other. The input layer of the encoder is used to input the statistical features of a sample, and the output layer of the decoder is used to output the reconstructed features of the sample.
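The symmetric layer layout can be illustrated with a dependency-free Python sketch of the forward pass. It is not a trained model: the weights are random placeholders, ReLU hidden activations and a linear output layer are chosen from the options the text mentions, and the class and helper names are hypothetical.

```python
import random


def relu(vector):
    return [max(0.0, x) for x in vector]


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class AutoencoderSketch:
    def __init__(self, n1, n2, n3, seed=0):
        rng = random.Random(seed)
        self.enc1 = init_layer(n1, n2, rng)  # input -> first hidden layer
        self.enc2 = init_layer(n2, n3, rng)  # first hidden -> second hidden (bottleneck)
        self.dec1 = init_layer(n3, n2, rng)  # bottleneck -> third hidden layer
        self.out = init_layer(n2, n1, rng)   # third hidden -> output layer

    def encode(self, x):
        h1 = relu(dense(x, *self.enc1))
        return relu(dense(h1, *self.enc2))

    def decode(self, code):
        h3 = relu(dense(code, *self.dec1))
        return dense(h3, *self.out)          # linear output layer

    def reconstruct(self, x):
        return self.decode(self.encode(x))


model = AutoencoderSketch(n1=16, n2=8, n3=4)
features = [0.5] * 16
coding = model.encode(features)
reconstruction = model.reconstruct(features)
```

The coding features have the bottleneck dimension N3, and the reconstructed features have the same dimension N1 as the input statistical features.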

In addition, the first detection model also includes a discriminator, and the parameters of the discriminator include an error threshold. The detection device inputs the statistical features of the sample set into the input layer of the encoder. After the statistical features of the sample set are processed by the input layer, the first hidden layer and the second hidden layer of the encoder, the coding features of the sample set are obtained. After the coding features of the sample set are processed by the third hidden layer and output layer of the decoder, the reconstructed features of the sample set are obtained. The detection device inputs the statistical features and reconstruction features of the sample set into the discriminator to determine the reconstruction loss of the sample set according to the error threshold. The detection device performs training and hyperparameter tuning on the first detection model based on the reconstruction loss of the sample set.

The error threshold is determined based on the mean of multiple reconstruction losses. In the process of parameter tuning of the first detection model, the multiple reconstruction losses include the error between the statistical characteristics and the reconstructed characteristics of each training sample in the training subset. For example, the detection device determines the mean of the multiple reconstruction losses as the error threshold. Alternatively, the error threshold is determined based on the mean and standard deviation of the multiple reconstruction losses. For example, the detection device determines the error threshold (threshold) according to formula (8). In formula (8), mean(loss) and std(loss) respectively represent the mean and standard deviation of the multiple reconstruction losses, and α is a preset parameter that can take values such as 0.4, 0.5 or 0.6.
threshold=mean(loss)+α*std(loss) (8)
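Formula (8) translates directly into a few lines of Python. The sketch assumes the population standard deviation, which the text does not specify; the function name and the loss values are hypothetical.

```python
import statistics


def error_threshold(losses, alpha=0.5):
    """threshold = mean(loss) + alpha * std(loss), as in formula (8)."""
    return statistics.fmean(losses) + alpha * statistics.pstdev(losses)


# Hypothetical reconstruction losses of the training samples.
training_losses = [1.0, 3.0]
threshold = error_threshold(training_losses, alpha=0.5)
mean_only_threshold = error_threshold(training_losses, alpha=0.0)
```

Setting α to 0 reduces the threshold to the mean of the reconstruction losses, which is the first option described in the text.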

In this document, the reconstruction loss between the statistical characteristics and the reconstructed characteristics of a sample can be the root mean square error (RMSE), the mean square error (MSE) or another form of error, which is not limited in the embodiments of this application. That is, the loss function can be an RMSE function, an MSE function or another function.

From the above, it can be seen that the first hyperparameters that need to be tuned in the embodiment of the present application include the learning rate, the number of training rounds and the hidden layer dimension, which have a relatively large impact on the model performance. The second hyperparameters that have a relatively small impact on the model performance can be pre-set to speed up the construction of the model and parameter tuning, thereby improving the execution efficiency of the anomaly detection task while ensuring the performance of the first detection model after parameter optimization.
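
A minimal sketch of tuning the three first hyperparameters follows. Exhaustive grid search is used here only as a stand-in, because this passage does not specify the search strategy; the function name and the `evaluate` callable (assumed to train a model with the given values and return its reconstruction loss on held-out samples) are hypothetical.

```python
from itertools import product

def tune_first_hyperparameters(evaluate, learning_rates, epoch_counts, hidden_dims):
    """Return the (config, loss) pair with the lowest loss over a small grid.

    `evaluate(learning_rate, epochs, hidden_dim)` is a hypothetical callable
    supplied by the caller; lower return values are better.
    """
    best_loss, best_config = float("inf"), None
    for lr, epochs, hidden in product(learning_rates, epoch_counts, hidden_dims):
        loss = evaluate(lr, epochs, hidden)
        if loss < best_loss:
            best_loss = loss
            best_config = {"learning_rate": lr, "epochs": epochs, "hidden_dim": hidden}
    return best_config, best_loss
```

Second hyperparameters, which the text says are preset, would simply be fixed inside `evaluate` rather than added to the grid.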

Step 604: Based on the target attribute field, perform anomaly detection on the test set through the parameter-tuned first detection model to obtain anomaly detection results of the test set.

After obtaining the parameter-tuned first detection model, the detection device determines the statistical characteristics of the test set based on the target attribute field, and the statistical characteristics of the test set include statistics of the data of the target attribute field in the test set. The detection device inputs the statistical characteristics of the test set into the parameter-tuned first detection model to obtain anomaly detection results of the test set.

For example, suppose the time granularity of a single sample is 1 hour and the time granularity of anomaly detection is 24 hours. If the detection object is user a, and each sample to be tested in the test set includes the log data of user a within 24 hours, then the anomaly detection result of the sample to be tested indicates whether user a has abnormal operations and/or abnormal access behaviors within those 24 hours. If the detection object is cloud service a, and each sample to be tested in the test set includes the log data of cloud service a within 24 hours, then the anomaly detection result of the sample to be tested indicates whether cloud service a has abnormalities within those 24 hours.

In this embodiment of the present application, the first detection model includes an encoder, a decoder and a discriminator, and the parameters of the discriminator include an error threshold. The detection device inputs the statistical features of the test set into the encoder to obtain the coding features of the test set; it inputs the coding features of the test set into the decoder to obtain the reconstructed features of the test set. The detection device inputs the statistical features and reconstructed features of the test set into the discriminator to determine the anomaly detection results of the test set according to the error threshold.

The error threshold is determined based on the mean of multiple reconstruction losses. During anomaly detection on the test set, the multiple reconstruction losses include the error between the statistical features and the reconstructed features of each sample to be tested in the test set, or additionally include the error between the statistical features and the reconstructed features of each training sample in the training subset. The statistical features of a sample to be tested include the statistics of the data of the target attribute field in that sample, and the reconstructed features of a sample to be tested refer to the reconstructed statistics of the data of the target attribute field in that sample.
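
The discriminator step can be sketched as follows. The choice of RMSE and the rule that a sample is flagged when its loss is strictly greater than the threshold are assumptions; the source only states that the result is determined according to the error threshold.

```python
import math

def rmse(features, reconstructed):
    """Root mean square error between two equally long feature vectors."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(features, reconstructed)) / len(features)
    )

def discriminate(features, reconstructions, threshold):
    """Compare each sample's reconstruction loss with the error threshold."""
    results = []
    for x, x_hat in zip(features, reconstructions):
        loss = rmse(x, x_hat)
        # Assumption: a loss strictly above the threshold marks an anomaly.
        results.append({"loss": loss, "anomalous": loss > threshold})
    return results

# Two hypothetical test samples: the first is reconstructed exactly, the second is not.
results = discriminate([[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [4.0, 5.0]], threshold=1.0)
```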

Figure 8 is a flow chart of another anomaly detection method provided by an embodiment of the present application. Referring to Figure 8, the detection device determines the target attribute field from the candidate attribute fields through automatic feature selection based on the original log data; that is, it automatically performs feature engineering to screen the candidate fields, and the candidate fields that pass screening, together with the selected fields among the candidate attribute fields, are used as the target attribute field. After the fields are screened, the feature-selected data is obtained, which includes the statistics of the data of the target attribute field in the sample. The detection device automatically encodes and decodes the feature-selected data to obtain the reconstructed statistics of the data of the target attribute field in the sample, that is, it outputs the reconstruction result. The detection device calculates the error between the feature-selected data and the reconstruction result (RMSE, as shown in Figure 8) to obtain the reconstruction loss of the sample. The detection device then discriminates the reconstruction loss of the sample according to the error threshold and outputs the discrimination result of the sample, that is, the anomaly detection result.
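
The automatic field screening can be sketched as the iterative loop recited in the claims: compute mutual information for the remaining candidate fields, shortlist the k fields with the smallest mutual information, evaluate a reconstruction loss for each, pick one, and repeat until no candidate remains. In the sketch below, `mutual_info`, `recon_loss` and `pick` are hypothetical callables supplied by the caller; in particular, how mutual information and reconstruction loss are combined into a field score is not given in this passage and is left to `pick`.

```python
def rank_candidate_fields(selected, candidates, mutual_info, recon_loss, pick, k):
    """Greedy scoring of candidate fields; returns {field: score} in pick order.

    mutual_info(field, selected_fields) -> float   (hypothetical)
    recon_loss(field_list) -> float                (hypothetical; trains and
                                                    validates a model on those fields)
    pick(shortlist, mi, losses) -> (field, score)  (hypothetical combination rule)
    """
    selected = list(selected)
    remaining = list(candidates)
    scores = {}
    while remaining:
        mi = {f: mutual_info(f, selected) for f in remaining}
        shortlist = sorted(remaining, key=mi.get)[:k]   # k smallest mutual information
        losses = {f: recon_loss(selected + [f]) for f in shortlist}
        chosen, score = pick(shortlist, mi, losses)
        scores[chosen] = score
        remaining.remove(chosen)      # move the field out of the candidate pool
        selected.append(chosen)       # and into the selected field set
    return scores
```

The p candidate fields that join the target attribute field would then be chosen from the returned scores, for example by taking the highest-scoring ones.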

Figure 9 is a flow chart of yet another anomaly detection method provided by an embodiment of the present application. Referring to Figure 9, cloud field personnel configure the relevant parameters of the anomaly detection task on the client and submit the configuration file (including the configuration parameters) to the anomaly detection system. The anomaly detection system provides an anomaly detection self-service tool. Cloud field personnel or devices normalize the selected raw log data according to the log specification format. The self-service tool automatically calculates statistics on the normalized log data based on the configuration file, and automatically screens fields based on the statistics and the configured candidate attribute fields. The self-service tool then automatically performs anomaly detection on the samples to be tested after field screening to obtain the anomaly detection results.
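
The configuration parameters submitted to the self-service tool might be modeled as in the sketch below. The class name, attribute names, paths and field names are all hypothetical; only the kinds of information (sample set, test set, selected and to-be-selected candidate attribute fields, and field categories) come from this document.

```python
from dataclasses import dataclass, field

@dataclass
class AnomalyDetectionTaskConfig:
    """Hypothetical container for the configuration parameters of one task."""
    sample_set: str                                        # log data used for parameter tuning
    test_set: str                                          # log data to be checked for anomalies
    selected_fields: list = field(default_factory=list)    # m fields, m >= 0
    candidate_fields: list = field(default_factory=list)   # n fields, n > 0
    field_categories: dict = field(default_factory=dict)   # field name -> category

    def validate(self):
        if not self.candidate_fields:
            raise ValueError("at least one candidate field is required (n > 0)")
        known = set(self.selected_fields) | set(self.candidate_fields)
        unknown = set(self.field_categories) - known
        if unknown:
            raise ValueError(f"categories given for unknown fields: {sorted(unknown)}")
        return self

# Illustrative values only; none of these names appear in the source.
config = AnomalyDetectionTaskConfig(
    sample_set="logs/sample_set",
    test_set="logs/test_set",
    selected_fields=["source_ip"],
    candidate_fields=["api_name", "response_code"],
    field_categories={"source_ip": "categorical", "response_code": "numerical"},
).validate()
```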

To sum up, in the embodiments of the present application, anomaly detection of log data in the cloud platform can be realized with simple manual configuration. Relevant personnel in the cloud service field need not manually create security rules, nor manually analyze and summarize various attack modes in depth. This avoids the loopholes of manually created security rules, reduces missed detections and false detections, and improves the efficiency of anomaly detection. In addition, even if relevant personnel lack professional knowledge of machine learning and deep learning, such as knowledge of model design and tuning, this solution enables rapid self-service construction of an anomaly detection model to perform anomaly detection for specific tasks.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center through wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (such as floppy disks, hard disks, tapes), optical media (such as digital versatile discs (DVD)) or semiconductor media (such as solid state disks (SSD)), etc. It is worth noting that the computer-readable storage media mentioned in the embodiments of this application may be non-volatile storage media, in other words, non-transitory storage media.

It should be understood that "at least one" mentioned herein refers to one or more, and "plurality" refers to two or more. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B can mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships can exist. For example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, to facilitate a clear description of the technical solutions of the embodiments of the present application, words such as "first" and "second" are used to distinguish identical or similar items with basically the same functions and effects. Those skilled in the art can understand that words such as "first" and "second" do not limit the number or the execution order.

It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data used for analysis, stored data, displayed data, etc.) and signals involved in the embodiments of this application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions. For example, the log data involved in the embodiments of this application are all obtained with full authorization.

The above-mentioned embodiments are provided for this application and are not intended to limit this application. Any modifications, equivalent substitutions, improvements, etc. made within the principles of this application shall be included in the protection scope of this application.

Claims

An anomaly detection method, characterized in that the method includes:

Receive the configuration parameters of the anomaly detection task, where the configuration parameters indicate a sample set, a test set and candidate attribute fields, the sample set includes log data used for parameter tuning in the cloud platform, the test set includes log data in the cloud platform on which anomaly detection is to be performed, and the candidate attribute fields are attribute fields corresponding to the log data of the cloud platform;

Based on the sample set and the candidate attribute field, determine a target attribute field from the candidate attribute field, where the target attribute field is an attribute field used to perform the anomaly detection task;

Based on the sample set and the target attribute field, tuning a first hyperparameter of a first detection model;

Based on the target attribute field, anomaly detection is performed on the test set through a parameter-tuned first detection model to obtain anomaly detection results of the test set.
The method of claim 1, wherein the candidate attribute fields include m selected fields and n candidate fields, m is an integer not less than 0, and n is an integer greater than 0;

Determining a target attribute field from the candidate attribute field based on the sample set and the candidate attribute field includes:

Based on the sample set, the m selected fields and the n candidate fields, determine the field scores corresponding to the n candidate fields, where a field score represents the extent to which the anomaly detection effect is improved after the corresponding candidate field is added to the m selected fields;

Based on the field scores corresponding to the n candidate fields, determine p candidate fields from the n candidate fields, where p is a positive integer not greater than n;

The m selected fields and the p candidate fields are determined as the target attribute fields.
The method of claim 2, wherein the sample set includes a training subset and a validation subset;

Determining the field scores corresponding to the n candidate fields based on the sample set, the m selected fields and the n candidate fields includes:

The m selected fields are grouped into a selected field set, the n to-be-selected fields are grouped into a to-be-selected field set, and mutual information corresponding to each to-be-selected field in the to-be-selected field set is determined based on the training subset, the selected field set and the to-be-selected field set, wherein the mutual information represents the correlation between the corresponding to-be-selected field and all fields in the selected field set;

Select, from the to-be-selected field set, the k candidate fields with the smallest mutual information, where k is a positive integer not greater than n;

Based on the training subset, the verification subset, the selected field set and the k candidate fields, determine the reconstruction losses corresponding to the k candidate fields, where a reconstruction loss represents the effect of performing anomaly detection on the verification subset using the corresponding candidate field together with the selected field set;

Based on the mutual information and reconstruction loss corresponding to the k candidate fields, select one candidate field from the k candidate fields, and determine the field score of the selected candidate field;

Move the selected candidate field from the to-be-selected field set to the selected field set, and return to the step of determining, based on the training subset, the selected field set and the to-be-selected field set, the mutual information corresponding to each to-be-selected field in the to-be-selected field set, until the to-be-selected field set is empty, thereby obtaining the field scores corresponding to the n candidate fields.
The method of claim 3, wherein determining the reconstruction losses corresponding to the k candidate fields based on the training subset, the verification subset, the selected field set and the k candidate fields includes:

For a first candidate field among the k candidate fields, add the first candidate field to the selected field set to obtain a candidate field set, where the first candidate field is any candidate field among the k candidate fields;

Determine, based on the training subset and the candidate field set, a second detection model corresponding to the first candidate field;

Based on the verification subset and the candidate field set, the reconstruction loss corresponding to the first candidate field is determined through the second detection model corresponding to the first candidate field.
The method of claim 4, wherein determining the second detection model corresponding to the first candidate field based on the training subset and the candidate field set includes:

Determine reference statistical characteristics of the training subset based on the candidate field set, where the reference statistical characteristics of the training subset include statistics of data of all fields included in the candidate field set in the training subset;

An initial detection model is trained using the reference statistical features of the training subset to obtain a second detection model corresponding to the first candidate field.
The method according to claim 4 or 5, characterized in that determining, based on the verification subset and the candidate field set, the reconstruction loss corresponding to the first candidate field through the second detection model corresponding to the first candidate field includes:

Determine reference statistical characteristics of the verification subset based on the candidate field set, where the reference statistical characteristics of the verification subset include statistics of data of all fields included in the candidate field set in the verification subset;

The reference statistical characteristics of the verification subset are input into the second detection model corresponding to the first candidate field to obtain the reference reconstruction characteristics of the verification subset, where the reference reconstruction characteristics of the verification subset include reconstruction statistics of the data, in the verification subset, of all fields included in the candidate field set;

Based on the reference statistical features and reference reconstruction features of the verification subset, the reconstruction loss corresponding to the first candidate field is determined.
The method according to claim 5 or 6, characterized in that the configuration parameters also indicate the category of each attribute field in the candidate attribute fields, and attribute fields of different categories have different types of statistics corresponding to them.
The method according to any one of claims 1 to 7, characterized in that the first hyperparameter includes a learning rate, a number of training rounds, and a hidden layer dimension.
The method as claimed in claim 8 is characterized in that the first detection model includes an input layer, a first hidden layer and a second hidden layer; the hidden layer dimensions include the dimensions of the first hidden layer and the second hidden layer, the dimension of the input layer is determined based on the number of fields included in the target attribute field, and the dimensions of the first hidden layer and the second hidden layer are determined based on the dimension of the input layer.
The method according to any one of claims 1 to 9, wherein the first detection model includes an encoder, a decoder and a discriminator, and the parameters of the discriminator include an error threshold;

The step of performing anomaly detection on the test set based on the target attribute field through the parameter-tuned first detection model to obtain the anomaly detection results of the test set includes:

Determine statistical characteristics of the test set based on the target attribute field, and the statistical characteristics of the test set include statistics of the data of the target attribute field in the test set;

Input the statistical features of the test set into the encoder to obtain the coding features of the test set;

Input the encoding features of the test set into the decoder to obtain the reconstructed features of the test set;

The statistical features and reconstructed features of the test set are input into the discriminator to determine the anomaly detection result of the test set according to the error threshold.
An abnormality detection device, characterized in that the device comprises:

The receiving module is used to receive the configuration parameters of the anomaly detection task, where the configuration parameters indicate a sample set, a test set and candidate attribute fields, the sample set includes log data used for parameter tuning in the cloud platform, the test set includes log data in the cloud platform on which anomaly detection is to be performed, and the candidate attribute fields are attribute fields corresponding to the log data of the cloud platform;

A determination module, configured to determine a target attribute field from the candidate attribute field based on the sample set and the candidate attribute field, where the target attribute field is an attribute field used to perform the anomaly detection task;

A parameter tuning module, configured to tune the first hyperparameter of the first detection model based on the sample set and the target attribute field;

The anomaly detection module is used to perform anomaly detection on the test set based on the target attribute field through a first detection model with optimized parameters to obtain anomaly detection results of the test set.
The device of claim 11, wherein the candidate attribute fields include m selected fields and n candidate fields, m is an integer not less than 0, and n is an integer greater than 0;

The determination module includes:

The first determination sub-module is used to determine the field scores corresponding to the n candidate fields based on the sample set, the m selected fields and the n candidate fields, where a field score represents the extent to which the anomaly detection effect is improved after the corresponding candidate field is added to the m selected fields;

The second determination sub-module is used to determine p candidate fields from the n candidate fields based on the field scores corresponding to the n candidate fields, where the p is a positive integer not greater than n;

The third determination sub-module is used to determine the m selected fields and the p candidate fields as the target attribute fields.
The device according to claim 12, wherein the sample set includes a training subset and a verification subset;

The first determination sub-module is specifically used for:

The m selected fields are formed into a selected field set, and the n candidate fields are formed into a candidate field set; based on the training subset, the selected field set and the candidate field set, the mutual information corresponding to each candidate field in the candidate field set is determined, where the mutual information represents the correlation between the corresponding candidate field and all fields in the selected field set;

Select k candidate fields with the smallest mutual information from the candidate field set, where k is a positive integer not greater than n;

Based on the training subset, the verification subset, the selected field set and the k candidate fields, determine the reconstruction losses corresponding to the k candidate fields, where a reconstruction loss represents the effect of performing anomaly detection on the verification subset using the corresponding candidate field together with the selected field set;

Based on the mutual information and reconstruction loss corresponding to the k candidate fields, select one candidate field from the k candidate fields, and determine the field score of the selected candidate field;

Move the selected candidate field from the candidate field set to the selected field set, and return to the step of determining, based on the training subset, the selected field set and the candidate field set, the mutual information corresponding to each candidate field in the candidate field set, until the candidate field set is empty, thereby obtaining the field scores corresponding to the n candidate fields.
The device according to claim 13, characterized in that the first determining sub-module is specifically used to:

For a first candidate field among the k candidate fields, add the first candidate field to the selected field set to obtain a candidate field set, where the first candidate field is any candidate field among the k candidate fields;

Based on the training subset and the candidate field set, determine a second detection model corresponding to the first candidate field;

Based on the verification subset and the candidate field set, the reconstruction loss corresponding to the first candidate field is determined through the second detection model corresponding to the first candidate field.
The device according to claim 14, characterized in that the first determination sub-module is specifically used to:

Determine reference statistical characteristics of the training subset based on the candidate field set, where the reference statistical characteristics of the training subset include statistics of data of all fields included in the candidate field set in the training subset;

An initial detection model is trained using the reference statistical features of the training subset to obtain a second detection model corresponding to the first candidate field.
The device according to claim 14 or 15, characterized in that the first determining sub-module is specifically used to:

Determine reference statistical characteristics of the verification subset based on the candidate field set, where the reference statistical characteristics of the verification subset include statistics of data of all fields included in the candidate field set in the verification subset;

The reference statistical features of the verification subset are input into the second detection model corresponding to the first candidate field to obtain the reference reconstruction features of the verification subset, where the reference reconstruction features of the verification subset include reconstruction statistics of the data, in the verification subset, of all fields included in the candidate field set;

Based on the reference statistical features and the reference reconstruction features of the verification subset, a reconstruction loss corresponding to the first candidate field is determined.
The device according to claim 15 or 16, wherein the configuration parameter further indicates the category of each attribute field in the candidate attribute fields, and attribute fields of different categories have different types of statistics corresponding to them.
The device according to any one of claims 11-17, wherein the first hyperparameter includes learning rate, number of training rounds and hidden layer dimensions.
The device of claim 18, wherein the first detection model includes an input layer, a first hidden layer and a second hidden layer; the hidden layer dimensions include the dimensions of the first hidden layer and the second hidden layer, the dimension of the input layer is determined based on the number of fields included in the target attribute field, and the dimensions of the first hidden layer and the second hidden layer are determined based on the dimension of the input layer.
The device according to any one of claims 11 to 19, wherein the first detection model includes an encoder, a decoder and a discriminator, and the parameters of the discriminator include an error threshold;

The anomaly detection module includes:

A fourth determination submodule, configured to determine a statistical feature of the test set based on the target attribute field, wherein the statistical feature of the test set includes a statistical value of data of the target attribute field in the test set;

The first input submodule is used to input the statistical characteristics of the test set into the encoder to obtain the coding characteristics of the test set;

The second input submodule is used to input the coding features of the test set into the decoder to obtain the reconstructed features of the test set;

The third input submodule is used to input the statistical features and reconstructed features of the test set into the discriminator to determine the anomaly detection result of the test set according to the error threshold.
A computing device cluster, characterized by including at least one computing device, each computing device including a processor and a memory;

The processor of the at least one computing device is configured to execute instructions stored in the memory of the at least one computing device, so that the computing device cluster performs the method according to any one of claims 1-10.
A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the method described in any one of claims 1-10 are implemented.
A computer program product, characterized in that computer instructions are stored in the computer program product, and when the computer instructions are executed by a processor, the steps of the method described in any one of claims 1-10 are implemented.