WO2020078059A1 - Interpretation feature determination method and device for anomaly detection

Info

Publication number: WO2020078059A1
Application number: PCT/CN2019/097171
Authority: WO
Inventors: 方文静
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2018-10-17
Filing date: 2019-07-23
Publication date: 2020-04-23
Also published as: TWI723476B; CN109583470A; TW202044111A

Abstract

Embodiments of the description provide an interpretation feature determination method and device for anomaly detection. The method comprises: for a sample input to an anomaly detection model and comprising at least one sample feature, determining, according to a distribution parameter of each sample feature, the degree of deviation of the sample feature, wherein the distribution parameter is used to represent distribution characteristics of the sample feature in training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model; and determining, according to the degree of deviation of all of the sample features of the sample, at least one sample feature to be an interpretation feature corresponding to the sample, wherein the interpretation feature is used to interpret the association between the sample and a model output result of the corresponding anomaly detection model.

Description

Anomaly detection interpretation feature determination method and device

Technical field

The present disclosure relates to the field of big data technology, and in particular, to an anomaly detection interpretation feature determination method and device.

Background

Anomaly detection is an important part of data mining and can be applied in fields such as intrusion detection, fraud detection, fault detection, system health monitoring, sensor network event detection, and ecosystem disturbance detection. In practical anomaly detection applications, one class of algorithms is the unsupervised anomaly detection model. Such a model is often a black box whose internal working state users cannot observe. To improve confidence in using the model, model interpretation is crucial. Interpreting the model makes it possible to better understand its output, for example which features of the input sample have the greatest impact on the model output. Model interpretation can thus provide a direction for analyzing why the anomaly detection model produced a given output.

Summary of the invention

In view of this, one or more embodiments of the present specification provide a method and apparatus for determining an interpretation feature of anomaly detection, so as to improve the accuracy of acquiring the interpretation feature of anomaly detection.

Specifically, one or more embodiments of this specification are implemented by the following technical solutions:

In a first aspect, a method for determining an interpretation feature for anomaly detection is provided. The method includes:

For a sample input to an anomaly detection model, the sample including at least one sample feature, determining the degree of deviation of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;

According to the degree of deviation of each sample feature in the sample, determining at least one sample feature as an interpretation feature corresponding to the sample, where the interpretation feature is used to interpret the association between the sample and the corresponding model output result of the anomaly detection model.

In a second aspect, an interpretation feature determination device for anomaly detection is provided. The device includes:

An offset calculation module, configured to, for a sample input to an anomaly detection model, the sample including at least one sample feature, determine the degree of deviation of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;

A feature determination module, configured to determine at least one sample feature as an interpretation feature corresponding to the sample according to the degree of deviation of each sample feature in the sample, where the interpretation feature is used to interpret the association between the sample and the corresponding model output result of the anomaly detection model.

In a third aspect, an interpretation feature determination device for anomaly detection is provided. The device includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When executing the program, the processor implements the steps of the method of the first aspect.

The method and device for determining interpretation features for anomaly detection in one or more embodiments of this specification find the interpretation features based on distribution parameters, that is, based on the data distribution characteristics of the feature values of the sample features themselves. This approach is independent of the model and does not rely on it, so imperfections in the model, for example those caused by sample imbalance, do not affect the determination of interpretation features. Moreover, using distribution parameters to identify interpretation features is consistent with how outliers are distributed in anomaly detection, so the interpretation features are obtained with higher accuracy.

Brief description of the drawings

In order to explain more clearly the technical solutions in one or more embodiments of this specification or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments of this specification. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of the principle of anomaly detection provided by one or more embodiments of this specification;

FIG. 2 is a flowchart of a method for determining an interpretation feature for anomaly detection provided by one or more embodiments of this specification;

FIG. 3 is a schematic structural diagram of a device for determining an interpretation feature for anomaly detection provided by one or more embodiments of this specification;

FIG. 4 is a schematic structural diagram of another device for determining an interpretation feature for anomaly detection provided by one or more embodiments of this specification.

Detailed description

In order to enable those skilled in the art to better understand the technical solutions in one or more embodiments of this specification, the technical solutions are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments, not all of them. Based on one or more embodiments of this specification, all other embodiments obtained by persons of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.

Anomaly detection is also called outlier detection. An outlier is an object that deviates significantly from other data points. Outliers differ from most of the data and make up only a small part of the overall data. Anomaly detection needs to identify these outliers in the data. For example, it can be used to identify abnormal transactions.

At least one embodiment of this specification provides an interpretation feature determination method for anomaly detection, which can be applied to the interpretation of an unsupervised anomaly detection model. The interpretation scheme does not require introducing an additional interpretation model and does not rely on the anomaly detection model itself.

The following describes some of the terms used in the description of the method:

Sample: The sample can be used as the input of the anomaly detection model, and can correspond to the model output of an anomaly detection model. For example, you can input A into the anomaly detection model and get B output by the model, then A is the sample.

Sample features: A sample may have at least one sample feature, which describes the attributes of the sample in different aspects. For example, the sample may be a user whose user ID is 1100, and the at least one sample feature included in the sample may include the user's age, address, and working years. Here, age is one sample feature, and address is another sample feature.

Interpretation features: In machine learning tasks, different models are proposed to model a problem. Beyond the direct output of the model, the results often need to be understood further, for example which features have the greatest impact on the model output and what factors determine the corresponding output. This requires a corresponding interpretation of the model. In the embodiments of this specification, "interpretation feature" denotes a feature that can explain the model output result of the anomaly detection model. The interpretation feature can be used to explain the association between the input sample of the anomaly detection model and the model output result. For example, if the sample Y1 is input to the anomaly detection model to obtain the model output D1, and the determined interpretation features are t1 and t2, then the features t1 and t2 included in the sample Y1 contribute more to the output D1, and it may be precisely the sample features t1 and t2 that lead to D1. The interpretation features may be a subset of the above-mentioned sample features. For example, the sample features may include F1, F2, and F3, and the interpretation features may be F1 and F2 among them.

On the basis of the above-mentioned feature description, the following describes an explanation feature determination method of an embodiment of this specification.

As shown in FIG. 1, the process of anomaly detection includes two stages, "training" and "prediction". In the "training" stage, the anomaly detection model is trained on the training set data. In the "prediction" stage, a sample in the test set data is used as the input of the anomaly detection model to predict whether the input sample is abnormal. However, the interpretation scheme for the anomaly detection model provided by at least one embodiment of this specification is independent of both training the anomaly detection model and applying the model for prediction. That is, model interpretation and model training or prediction are two independent parts.

Referring again to FIG. 1 in conjunction with FIG. 2, FIG. 2 describes a method for determining an interpretation feature for anomaly detection. It should first be noted that this method uses local model interpretation when interpreting the anomaly detection model, that is, it provides a corresponding interpretation for the prediction of a specific sample.

As shown in FIG. 2, the method may include:

In step 200, according to the training set data of the anomaly detection model, the distribution parameters of each sample feature in the training set data are obtained respectively.

In this step, the anomaly detection model may be an unsupervised model.

The training set data may be data for training an anomaly detection model. The training set data may include multiple samples, and each sample may include at least one sample feature.

Exemplarily, the sample may be a user whose user ID is 1100, and at least one sample characteristic included in the sample may include: the user's age, address, working years, and annual income.

Each sample feature can obtain a corresponding distribution parameter. For example, the sample feature "age" corresponds to a distribution parameter S1, and the sample feature "working years" corresponds to a distribution parameter S2.

The distribution parameter of each sample feature can be obtained as follows: the same sample feature, which can be called a target sample feature, is obtained from each sample of the training set data, yielding a target feature set that includes multiple target sample features; the distribution parameter of the target sample feature is then determined according to the target feature set.

For example, taking the sample feature "annual income" as an example, the training set data may include multiple samples, assuming that the user identified as 1100, the user identified as 1101, and the user identified as 1102 are included. Each user's sample features include this "annual income". The "annual income" sample feature can be obtained from each sample, and this feature can be called the target sample feature. A target feature set can be obtained, and the target feature set includes the "annual income" of the above three users. Then, the distribution parameter corresponding to the feature "annual income" can be determined according to the feature value of the "annual income" in the target feature set.
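The collection of a target feature set and the computation of its distribution parameters can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `fit_distribution_params`, the dictionary layout, and all numeric values are hypothetical. It also assumes that the distribution parameters are a mean and a spread measure, and uses the population standard deviation as that spread.

```python
from statistics import fmean, pstdev


def fit_distribution_params(training_samples):
    """Return {feature_name: (mean, spread)} for every feature in the training set.

    Assumption: every sample is a dict with the same numeric features, and the
    spread is the population standard deviation.
    """
    params = {}
    for name in training_samples[0].keys():
        # Target feature set: the same feature taken from every training sample.
        target_feature_set = [sample[name] for sample in training_samples]
        params[name] = (fmean(target_feature_set), pstdev(target_feature_set))
    return params


# Hypothetical training set standing in for users 1100, 1101, and 1102.
training_samples = [
    {"age": 30, "annual_income": 50_000},
    {"age": 40, "annual_income": 60_000},
    {"age": 50, "annual_income": 70_000},
]
params = fit_distribution_params(training_samples)
```

Here `params["annual_income"]` holds the mean and spread derived from the three hypothetical users' annual incomes.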

Distribution parameters can be used to represent the distribution characteristics of sample features in the training set data of the anomaly detection model. For example, the multivariate Gaussian model is a classic anomaly detection algorithm that assumes each feature dimension of the data follows a normal distribution. Under this assumption, the well-known 3-sigma principle applies: the region within three standard deviations (sigma) of the mean contains about 99.7% of the data, and points outside this region can be regarded as outliers. A 2-sigma or 1-sigma principle can be applied in the same way.

The above describes a data distribution characteristic. The outliers that anomaly detection aims to identify are usually points that, in terms of distribution, deviate from the region where most data is located. This "region where most data is located" can be expressed by such a distribution characteristic, for example, the range within 3 sigma of the mean.
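The coverage figures behind the 1-sigma, 2-sigma, and 3-sigma principles can be checked numerically. The sketch below assumes a normal distribution and uses the standard identity that the fraction of values within k standard deviations of the mean equals erf(k / sqrt(2)); the function name `fraction_within` is hypothetical.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Coverage for the 1-sigma, 2-sigma, and 3-sigma regions.
coverage = {k: fraction_within(k) for k in (1, 2, 3)}
```

For k = 3 the result is approximately 0.9973, which matches the 99.7% figure quoted for the 3-sigma region.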

Based on the above, for example, the distribution parameters calculated in this step may include: the mean and variance of the sample features. For example, the mean can be represented by u, and the variance can be represented by s.

In step 202, for a sample input to the anomaly detection model, the sample including at least one sample feature, the degree of deviation of each sample feature is determined according to the distribution parameter of that sample feature.

In this step, the sample is a sample in the test set data. The test set data may include multiple samples, and each sample may include at least one sample feature. As mentioned before, this method's interpretation scheme for anomaly detection is applied to local model interpretation, that is, to explain the anomaly detection of each specific sample.

For example, inputting the sample Y1 into the trained anomaly detection model yields the model output result D1, and inputting the sample Y2 yields the model output result D2. The model interpretation of this method is used to explain the association between Y1 and D1 and the association between Y2 and D2, for example which features of Y1 contribute more to the result D1, and which features of Y2 contribute more to the result D2. Therefore, step 202 and step 204 are performed on one of the samples in the test set data.

Similar to the training set data, each sample in the test set data may also include multiple sample features. In this step, a corresponding degree of deviation (offset) is calculated for each sample feature. The degree of deviation is an index for measuring whether the sample feature lies within the above-mentioned "region where most data is located".

For example, the degree of deviation can be calculated on the following principle: for each feature dimension, calculate how far a new sample deviates from the training set mean, expressed as a multiple of the variance. The greater the deviation, the more abnormal the data. Taking mean and variance as the distribution parameters, the following formula (1) can be used to calculate the degree of deviation:

n = (v - u) / s        (1)

In formula (1), n is the degree of deviation, which provides a uniform abnormality measure across different sample features; v is the actual feature value of a sample feature in the sample; u is the mean of that sample feature computed from the training set; and s is the variance of that sample feature computed from the training set. According to formula (1), the distance by which the actual value deviates from the mean, expressed as a multiple of the variance, is determined as the degree of deviation.
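Formula (1) can be sketched in a few lines. The function name `deviation_degree` and the example values are hypothetical. The source names s the variance; because the 3-sigma principle it invokes is stated in standard deviations, this sketch assumes the caller passes whichever spread measure was computed from the training set. The guard against a zero spread is an addition that the source does not discuss.

```python
def deviation_degree(v, u, s):
    """Formula (1): n = (v - u) / s.

    v: actual feature value in the sample being interpreted
    u: mean of the feature over the training set
    s: spread of the feature over the training set (assumed non-zero)
    """
    if s == 0:
        # Assumption: a constant feature carries no deviation information.
        raise ValueError("spread must be non-zero")
    return (v - u) / s


# Hypothetical values: an annual income of 90,000 against a training mean of
# 60,000 and a spread of 10,000.
n = deviation_degree(90_000, 60_000, 10_000)
```

Because n is dimensionless, values for features with very different units, such as age and annual income, can be compared directly.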

In step 204, at least one sample feature is determined as the interpretation feature of this anomaly detection corresponding to the sample according to the degree of deviation of each sample feature in the sample.

The interpretation feature is used to explain the association between the sample input in this anomaly detection and the model output result. For example, if the sample Y1 is input to the anomaly detection model to obtain the model output result D1, and the determined interpretation features are t1 and t2, then the features t1 and t2 included in the sample Y1 contribute relatively strongly to the output D1, and it may be these two sample features that lead to the model output D1. The cause of the anomaly detection output D1 corresponding to Y1 can then be analyzed further on the basis of the interpretation features.

For example, the interpretation features may be obtained by sorting the sample features in descending order of their degree of deviation in the sample input to the model, and taking at least one sample feature ranked within a preset number of top positions as the interpretation feature. This approach selects the several sample features with the highest degree of deviation as the interpretation features. A specific implementation is not limited to this approach. For example, a threshold on the degree of deviation may be set instead, and sample features whose degree of deviation exceeds the threshold are used as interpretation features.
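Both selection strategies, top-ranked features and a deviation threshold, can be sketched as follows. The function names and the example values are hypothetical. Ranking by absolute value is an assumption: formula (1) is signed, and the source does not state how negative deviations are ranked.

```python
def top_k_features(deviations, k):
    """Return the k feature names with the largest absolute degree of deviation."""
    ranked = sorted(deviations, key=lambda name: abs(deviations[name]), reverse=True)
    return ranked[:k]


def features_above_threshold(deviations, threshold):
    """Return feature names whose absolute degree of deviation exceeds the threshold."""
    return [name for name, n in deviations.items() if abs(n) > threshold]


# Hypothetical degrees of deviation for one input sample.
deviations = {"age": 0.4, "working_years": -2.5, "annual_income": 3.1}

by_rank = top_k_features(deviations, 2)
by_threshold = features_above_threshold(deviations, 3.0)
```

With these values, ranking with k = 2 selects `annual_income` and `working_years`, while a threshold of 3.0 selects only `annual_income`.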

The above steps can be executed on the same device or on different devices. For example, step 200 can be performed on one device and belongs to the training phase, that is, the training phase of the anomaly detection model can include two parts, one is the training of the conventional anomaly detection model, and the other is to obtain the distribution parameters according to the training set data.

Steps 202 and 204 can be executed on another device (or on the same device) and belong to the prediction phase of the model. That is, the prediction phase of the anomaly detection model also includes two parts: one is the conventional use of the model to predict whether a sample is abnormal, and the other is obtaining the interpretation features according to the distribution parameters. In each phase, training or prediction, the model interpretation scheme and the model's training or prediction scheme can run independently. It is also possible to calculate the distribution parameters while training, or to calculate the interpretation features from the input samples while predicting.

The method for determining interpretation features for anomaly detection in at least one embodiment of this specification finds the interpretation features based on distribution parameters, that is, based on the data distribution characteristics of the feature values of the sample features themselves. This approach does not depend on the model, so imperfections in the model, for example those caused by sample imbalance, do not affect the determination of interpretation features. In addition, using distribution parameters to identify interpretation features is consistent with how outliers are distributed in anomaly detection, so the interpretation features are obtained with higher accuracy.
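When step 200 runs on one device and steps 202 and 204 run on another, the distribution parameters have to be passed between them. The sketch below shows one possible way to do this, using JSON as the exchange format. The format, the function names, and the values are all assumptions, since the source does not specify how the parameters are transferred.

```python
import json


def export_params(params):
    """Serialize {feature: {"mean": ..., "spread": ...}} on the training device."""
    return json.dumps(params, sort_keys=True)


def import_params(payload):
    """Restore the distribution parameters on the prediction device."""
    return json.loads(payload)


# Hypothetical distribution parameters produced during the training phase.
params = {"annual_income": {"mean": 60000.0, "spread": 10000.0}}

payload = export_params(params)    # produced alongside model training
restored = import_params(payload)  # consumed alongside model prediction
```

Only the per-feature parameters cross the device boundary, so the interpretation step needs no access to the anomaly detection model itself.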

FIG. 3 is an anomaly detection interpretation feature determination device provided by one or more embodiments of the present specification. As shown in FIG. 3, the device may include: an offset calculation module 31 and a feature determination module 32.

The offset calculation module 31 is configured to, for a sample input to the anomaly detection model, the sample including at least one sample feature, determine the degree of deviation of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;

The feature determination module 32 is configured to determine at least one sample feature as an interpretation feature corresponding to the sample according to the degree of deviation of each sample feature in the sample, where the interpretation feature is used to interpret the association between the sample and the corresponding model output result of the anomaly detection model.

FIG. 4 is another device for determining an interpretation feature for anomaly detection provided by one or more embodiments of this specification. As shown in FIG. 4, the device may further include a distribution calculation module 33 in addition to the structure shown in FIG. 3.

The distribution calculation module 33 is configured to obtain target sample features from each sample of the training set data to obtain a target feature set including multiple target sample features, and to determine the distribution parameters of the target sample features according to the target feature set; the training set data includes multiple samples, and each sample includes at least one sample feature.

In another example, the offset calculation module 31 is specifically configured to: for one of the sample features of a sample in the test set data of the anomaly detection model, determine the actual value of the sample feature in the sample; obtain the mean value of the sample feature in the training set data; and determine, as the degree of deviation, the distance by which the actual value deviates from the mean, expressed as a multiple of the variance. The distribution parameters include the mean and variance of the sample features.

At least one embodiment of this specification also provides an interpretation feature determination device for anomaly detection. The device includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When executing the program, the processor implements the steps of the method described in the above embodiments.

The order in which the steps shown in the above method embodiments are executed is not limited to the sequence in the flowchart. In addition, each step can be implemented in the form of software, hardware, or a combination thereof. For example, those skilled in the art can implement a step as software code, in the form of computer-executable instructions that realize the logical function corresponding to the step. When implemented in software, the executable instructions can be stored in the memory and executed by the processor in the device.

The device or module described in the above embodiments may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or any combination of these devices.

For convenience of description, the above device is described with its functions divided into separate modules. When implementing one or more embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art should understand that one or more embodiments of this specification may be provided as a method, system, or computer program product. Therefore, one or more embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.

One or more embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. One or more embodiments of this specification can also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media including storage devices.

The embodiments in this specification are described in a progressive manner. The same or similar parts between the embodiments can be referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the embodiments of the data collection device or the data processing device, since they are basically similar to the method embodiments, the description is relatively simple. For the relevant parts, please refer to the description of the method embodiments.

The foregoing describes specific embodiments of the present specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown or sequential order to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The above are only preferred embodiments of one or more embodiments of this specification and are not intended to limit this disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of this disclosure shall be included in the protection scope of this disclosure.

Claims

1. An interpretation feature determination method for anomaly detection, the method comprising:

For a sample input to an anomaly detection model, the sample including at least one sample feature, determining the degree of deviation of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;

According to the degree of deviation of each sample feature in the sample, determining at least one sample feature as an interpretation feature corresponding to the sample, where the interpretation feature is used to interpret the association between the sample and the corresponding model output result of the anomaly detection model.

2. The method according to claim 1, wherein before determining the degree of deviation of the sample feature according to the distribution parameter of each sample feature, the method further comprises:

According to the training set data of the anomaly detection model, obtaining the distribution parameter of each sample feature in the training set data respectively.

3. The method according to claim 2, wherein obtaining the distribution parameter of each sample feature in the training set data respectively comprises:

The training set data includes multiple samples, and each sample includes at least one sample feature;

Obtaining target sample features from each sample of the training set data, to obtain a target feature set including multiple target sample features;

According to the target feature set, determining the distribution parameters of the target sample features.

4. The method according to claim 1, wherein the distribution parameters include: the mean and variance of the sample features.

5. The method according to claim 4, wherein determining the degree of deviation of the sample feature according to the distribution parameter of each sample feature comprises:

For one of the sample features of a sample in the test set data of the anomaly detection model, determining the actual value of the sample feature in the sample;

Obtaining the mean value of the sample feature in the training set data;

Determining, as the degree of deviation, the distance by which the actual value deviates from the mean value, expressed as a multiple of the variance.

6. The method according to claim 1, wherein determining at least one sample feature as the interpretation feature corresponding to the sample according to the degree of deviation of each sample feature in the sample comprises:

According to the degree of deviation of each sample feature in the sample, sorting the sample features in descending order, and using at least one sample feature ranked within a preset number of top positions as the interpretation feature.

7. An interpretation feature determination device for anomaly detection, the device comprising:

An offset calculation module, configured to, for a sample input to an anomaly detection model, the sample including at least one sample feature, determine the degree of deviation of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;

A feature determination module, configured to determine at least one sample feature as an interpretation feature corresponding to the sample according to the degree of deviation of each sample feature in the sample, where the interpretation feature is used to interpret the association between the sample and the corresponding model output result of the anomaly detection model.

8. The device according to claim 7, further comprising:

A distribution calculation module, configured to obtain target sample features from each sample of the training set data to obtain a target feature set including multiple target sample features, and to determine the distribution parameters of the target sample features according to the target feature set; the training set data includes multiple samples, and each sample includes at least one sample feature.

9. The device according to claim 7, wherein:

The offset calculation module is specifically configured to: for one of the sample features of a sample in the test set data of the anomaly detection model, determine the actual value of the sample feature in the sample; obtain the mean value of the sample feature in the training set data; and determine, as the degree of deviation, the distance by which the actual value deviates from the mean, expressed as a multiple of the variance; the distribution parameters include the mean and variance of the sample features.

10. An interpretation feature determination device for anomaly detection, the device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the processor executes the program, the following steps are implemented:

For a sample input to an anomaly detection model, the sample including at least one sample feature, determining the degree of deviation of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;

According to the degree of deviation of each sample feature in the sample, determining at least one sample feature as an interpretation feature corresponding to the sample, where the interpretation feature is used to interpret the association between the sample and the corresponding model output result of the anomaly detection model.