CN109583470A - Method and apparatus for determining explanation features for anomaly detection - Google Patents


Info

Publication number
CN109583470A
Authority
CN
China
Prior art keywords
sample
abnormality detection
feature
sample characteristics
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811208609.2A
Other languages
Chinese (zh)
Inventor
方文静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811208609.2A
Publication of CN109583470A
Priority to PCT/CN2019/097171 (WO2020078059A1)
Priority to TW108126301A (TWI723476B)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

Embodiments of this specification provide a method and apparatus for determining explanation features for anomaly detection. The method may include: for a sample input to an anomaly detection model, the sample including at least one sample feature, determining a deviation degree of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model; and, according to the deviation degree of each sample feature in the sample, determining at least one sample feature as the explanation feature corresponding to the sample, where the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.

Description

Method and apparatus for determining explanation features for anomaly detection
Technical field
This disclosure relates to the field of big data technology, and in particular to a method and apparatus for determining explanation features for anomaly detection.
Background
Anomaly detection is an important part of data mining and can be applied in many fields, such as intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, and ecosystem disturbance detection. In practical anomaly detection applications, one class of algorithms is the unsupervised anomaly detection model. An anomaly detection model is often a black box whose inner workings the user cannot observe, so model explanation is crucial for improving confidence in using the model. Explaining the model makes its output easier to understand, for example by showing which features of an input sample influence the model output the most. Model explanation can thus point to directions for analyzing why the anomaly detection model produced a given output.
Summary of the invention
In view of this, one or more embodiments of this specification provide a method and apparatus for determining explanation features for anomaly detection, so as to improve the accuracy with which explanation features are obtained.
Specifically, one or more embodiments of this specification are implemented through the following technical solutions:
In a first aspect, a method for determining explanation features for anomaly detection is provided, the method comprising:
For a sample input to an anomaly detection model, the sample including at least one sample feature, determining a deviation degree of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;
According to the deviation degree of each sample feature in the sample, determining at least one sample feature as the explanation feature corresponding to the sample, where the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
In a second aspect, an apparatus for determining explanation features for anomaly detection is provided, the apparatus comprising:
A deviation degree computing module, configured to, for a sample input to an anomaly detection model, the sample including at least one sample feature, determine a deviation degree of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;
A feature determination module, configured to determine, according to the deviation degree of each sample feature in the sample, at least one sample feature as the explanation feature corresponding to the sample, where the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
In a third aspect, a device for determining explanation features for anomaly detection is provided. The device includes a memory, a processor, and a computer program stored on the memory and runnable on the processor, and the processor implements the following steps when executing the program:
For a sample input to an anomaly detection model, the sample including at least one sample feature, determining a deviation degree of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;
According to the deviation degree of each sample feature in the sample, determining at least one sample feature as the explanation feature corresponding to the sample, where the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
The method and apparatus of one or more embodiments of this specification find the explanation features of an anomaly according to distribution parameters. The explanation features are found from the data distribution of the sample feature values themselves, so the approach is unrelated to and independent of the model. Consequently, imperfections in model-related information, such as sample imbalance, do not affect the detection of explanation features. Moreover, identifying explanation features by distribution parameters matches the numerical distribution characteristics of anomalous points in anomaly detection, so the explanation features are obtained with higher accuracy.
Brief description of the drawings
To describe the technical solutions in one or more embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in one or more embodiments of this specification, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the principle of anomaly detection provided by one or more embodiments of this specification;
Fig. 2 shows a method for determining explanation features for anomaly detection provided by one or more embodiments of this specification;
Fig. 3 is a schematic structural diagram of an apparatus for determining explanation features for anomaly detection provided by one or more embodiments of this specification;
Fig. 4 is a schematic structural diagram of another apparatus for determining explanation features for anomaly detection provided by one or more embodiments of this specification.
Detailed description
To help those skilled in the art better understand the technical solutions in one or more embodiments of this specification, these technical solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of one or more embodiments of this specification, without creative effort, shall fall within the protection scope of this application.
Anomaly detection is also called outlier detection. An outlier is an object that deviates markedly from other data points; outliers differ from the majority of the data and account for only a small fraction of the whole data set. Anomaly detection needs to separate these outliers from the rest of the data. For example, it can be used to identify abnormal transactions.
At least one embodiment of this specification provides a method for determining explanation features for anomaly detection. The method can be applied to explaining an unsupervised anomaly detection model, and the explanation scheme neither needs to introduce an additional explanation model nor depends on the anomaly detection model itself.
Some of the terms used in describing the method are explained as follows:
Sample: a sample serves as input to the anomaly detection model and corresponds to one model output result of the anomaly detection model. For example, if A is input to the anomaly detection model and the model outputs B, then A is the sample.
Sample feature: a sample can have at least one sample feature, which describes an attribute of the sample in some respect. For example, the sample may be the user whose user identifier is 1100, and the sample features of that sample may include the user's age, address, length of service, and so on. Here, age is one sample feature and address is another.
Explanation feature: in machine learning tasks, different models are proposed to model a problem. Beyond the model's direct output, the result also needs to be understood further, for example which features influence the model output the most and which factors determine the corresponding output; this requires explaining the model. In the embodiments of this specification, an "explanation feature" denotes a feature that explains the model output result of the anomaly detection model, and it can be used to explain the association between an input sample of the anomaly detection model and the model output result. For example, if sample Y1 is input to the anomaly detection model to obtain model output result D1, and the explanation features determined are t1 and t2, then the features t1 and t2 contained in sample Y1 contribute more to output D1; it may be precisely these two sample features that led to D1. Explanation features can be a subset of the sample features; for example, if the sample features include F1, F2 and F3, the explanation features may be F1 and F2.
Building on the explanation of the above terms, the method for determining explanation features in the embodiments of this specification is described below.
Referring to Fig. 1, anomaly detection includes two processes, "training" and "prediction". In the training stage, the anomaly detection model is trained with training set data. In the prediction stage, a sample from the test set data is used as input to the anomaly detection model to predict whether the input sample is anomalous. The scheme for explaining the anomaly detection model provided by at least one embodiment of this specification is unrelated to training the model and to applying the model for prediction; that is, model explanation and model training or prediction are two independently operating parts.
Continuing with Fig. 1 in conjunction with Fig. 2, Fig. 2 describes a method for determining explanation features for anomaly detection. It should first be noted that the method uses local explanation when explaining the anomaly detection model, that is, it provides an explanation for the prediction on one specific sample.
As shown in Fig. 2, this method may include:
In step 200, the distribution parameter of each sample feature in the training set data is obtained according to the training set data of the anomaly detection model.
In this step, the anomaly detection model can be an unsupervised model.
The training set data is the data used to train the anomaly detection model. It can include multiple samples, and each sample can include at least one sample feature.
For example, a sample may be the user whose user identifier is 1100, and the sample features included in the sample may include the user's age, address, length of service, annual income, and so on.
Each sample feature has a corresponding distribution parameter; for example, the sample feature "age" corresponds to a distribution parameter S1, and the sample feature "length of service" corresponds to a distribution parameter S2.
The distribution parameter of each sample feature can be obtained as follows: the same sample feature, referred to as the target sample feature, is taken from each sample of the training set data, yielding a target feature set that includes multiple values of the target sample feature; the distribution parameter of the target sample feature is then determined according to the target feature set.
Take the sample feature "annual income" as an example. The training set data may include multiple samples, say the users identified as 1100, 1101 and 1102, and the sample features of each user include "annual income". The "annual income" sample feature is taken from each sample and is the target sample feature. This yields a target feature set that contains the "annual income" of the three users. The distribution parameter corresponding to the feature "annual income" can then be determined from the feature values in the target feature set.
A distribution parameter represents the distribution characteristics of a sample feature in the training set data of the anomaly detection model. For example, in anomaly detection the multivariate Gaussian model is a classic algorithm: the data is assumed to follow a normal distribution in each feature dimension. Under this assumption the well-known 3-sigma rule applies: about 99.7% of the data lies within three standard deviations of the mean (the region this description refers to as 3 variances), and a point outside this region can be regarded as an anomalous point (outlier). There are also the 2-sigma rule, the 1-sigma rule, and so on.
The above describes a data distribution characteristic: from the point of view of the distribution, the anomalous points that anomaly detection aims to identify are usually points that deviate from the region where most of the data lies, and that region has definite characteristics, for example lying within 3 variances of the mean.
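The coverage figures behind the sigma rules can be checked numerically. The following is a minimal sketch, assuming a normal distribution and measuring the region in standard deviations; the function name `coverage_within` is hypothetical and not part of this description.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution within k standard deviations of its mean."""
    # For a normal variable X: P(|X - mean| <= k * sigma) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"{k}-sigma coverage: {coverage_within(k):.4f}")
```

For k = 3 this gives roughly 0.9973, which is the 99.7% quoted for the 3-sigma rule; k = 1 and k = 2 give roughly 0.6827 and 0.9545.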
Based on the above, the distribution parameters computed in this step may include, for example, the mean and variance of a sample feature. The mean can be denoted by u and the variance by s.
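Step 200 might be sketched as follows. This is an illustration under stated assumptions rather than the implementation of this specification: samples are assumed to be dictionaries of numeric features, the function name and the example values are hypothetical, and the spread s is computed as the population standard deviation so that it is consistent with the 3-sigma rule (the description calls it the variance).

```python
from statistics import fmean, pstdev


def feature_distribution_params(training_samples):
    """Return {feature_name: (u, s)} computed over the training set."""
    params = {}
    for name in training_samples[0].keys():
        # Target feature set: this feature's value taken from every training sample.
        target_feature_set = [sample[name] for sample in training_samples]
        u = fmean(target_feature_set)   # mean of the target sample feature
        s = pstdev(target_feature_set)  # assumption: spread as standard deviation
        params[name] = (u, s)
    return params


# Hypothetical training set with three users (for example 1100, 1101, 1102).
training_set = [
    {"age": 30, "annual_income": 50_000},
    {"age": 40, "annual_income": 60_000},
    {"age": 50, "annual_income": 70_000},
]
params = feature_distribution_params(training_set)
print(params["annual_income"])
```

Each feature is handled independently of the others and of the anomaly detection model, which is what lets the explanation scheme run separately from model training.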
In step 202, for a sample input to the anomaly detection model, the input sample including at least one sample feature, the deviation degree of each sample feature is determined according to the distribution parameter of that sample feature.
In this step, the sample is one sample from the test set data. The test set data may include multiple samples, and each sample may include at least one sample feature. As mentioned above, the explanation scheme of this method is a local explanation; that is, the anomaly detection of each specific sample is explained.
For example, inputting sample Y1 to the trained anomaly detection model gives model output result D1, and inputting sample Y2 gives model output result D2. The model explanation of this method explains the association between Y1 and D1 and the association between Y2 and D2 separately, for example which features of Y1 contribute more to result D1 and which features of Y2 contribute more to D2. Steps 202 and 204 can therefore be executed for one sample of the test set data at a time.
As with the training set data, each sample in the test set data may include multiple sample features. In this step, a deviation degree is computed for each sample feature. The deviation degree is an index for measuring whether the sample feature lies in the "region where most of the data lies" described above.
For example, the deviation degree can be computed on the following principle: for each feature dimension, compute how many multiples of the training-set variance a new sample lies from the mean; the larger the deviation, the more anomalous the data. Taking the mean and variance as the distribution parameters, the following formula (1) can serve as the formula for the deviation degree:
n = (v - u) / s        (1)
In formula (1), n is the deviation degree, which provides a unified anomaly measure across different sample features; v is the actual feature value of a sample feature in the sample; u is the mean of that sample feature computed from the training set data; and s is the variance of that sample feature computed from the training set data. According to formula (1), the distance by which the actual value deviates from the mean, expressed as a multiple of the variance, is taken as the deviation degree.
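Formula (1) translates directly into code. The sketch below is illustrative only: the function name and the example numbers are hypothetical, and the guard against a zero spread is an added assumption, since the description does not say how a constant feature should be handled.

```python
def deviation_degree(v, u, s):
    """Formula (1): n = (v - u) / s."""
    if s == 0:
        # Assumption: a feature with no spread in the training set cannot be scored.
        raise ValueError("spread s must be non-zero")
    return (v - u) / s


# Hypothetical values: feature value 90000, training mean 60000, spread 10000.
n = deviation_degree(90_000, 60_000, 10_000)
print(n)
```

Because n is expressed in multiples of s, values for features on very different scales (age in years, income in currency units) become directly comparable.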
In step 204, according to the deviation degree of each sample feature in the sample, at least one sample feature is determined as the explanation feature of this anomaly detection for the sample.
The explanation feature is used to explain the association between the sample input in this anomaly detection and the model output result. For example, if sample Y1 is input to the anomaly detection model to obtain model output result D1, and the explanation features determined are t1 and t2, then sample Y1 includes the features t1 and t2, and t1 and t2 contribute more to output D1; it may be precisely these two sample features that led to model output result D1. The reason why the anomaly detection of Y1 produced output D1 can of course be analyzed in further detail on the basis of the explanation features.
For example, the explanation features can be obtained as follows: according to the deviation degree of each sample feature in the sample input to the model, sort the sample features in descending order, and take at least one sample feature ranked within a preset number of top positions as the explanation feature. This approach selects the several sample features with the highest deviation degrees as explanation features. Specific implementations are not limited to this approach; for example, a deviation degree threshold can be set, and the sample features whose deviation degree is higher than the threshold are taken as explanation features.
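Both selection strategies, the top-ranked features and the threshold, can be sketched in a few lines. The function names and sample values are hypothetical. Ranking by absolute value is an assumption made here so that a feature far below the mean counts as much as one far above it; the description only says the features are sorted in descending order of deviation degree.

```python
def top_k_explanation_features(deviations, k):
    """Return the k feature names with the largest deviation degree."""
    # Assumption: rank by magnitude, so negative deviations are not ignored.
    ranked = sorted(deviations, key=lambda name: abs(deviations[name]), reverse=True)
    return ranked[:k]


def threshold_explanation_features(deviations, threshold):
    """Return the feature names whose deviation degree exceeds the threshold."""
    return [name for name, n in deviations.items() if abs(n) > threshold]


# Hypothetical deviation degrees of one input sample.
deviations = {"age": 0.4, "annual_income": 3.2, "length_of_service": -2.5}
print(top_k_explanation_features(deviations, 2))
print(threshold_explanation_features(deviations, 3.0))
```

The top-k variant always returns an explanation, while the threshold variant may return none when no feature is far enough from its training-set mean.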
The above steps can be executed on the same device or on different devices. For example, step 200 can be executed on one device and belongs to the training stage; that is, the training stage of the anomaly detection model can include two parts: one is the conventional training of the anomaly detection model, and the other is obtaining the distribution parameters from the training set data. Steps 202 and 204 can be executed on another device (or on the same device) and belong to the prediction stage of the model; that is, the prediction stage also includes two parts: one is the conventional use of the model to predict whether a sample is anomalous, and the other is obtaining the explanation features according to the distribution parameters. In each stage, whether training or prediction, the model explanation scheme and the model training or prediction scheme can run independently. Alternatively, the distribution parameters can be computed while training, or the explanation features can be computed from the input sample while predicting.
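One way to compute the distribution parameters while training is to update them as each training sample streams past. The sketch below uses Welford's online algorithm, which is a technique chosen here for illustration and is not named in this description; the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Running mean and population variance of one sample feature (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared differences from the current mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance; defined as 0.0 before any value has been seen.
        return self._m2 / self.count if self.count else 0.0


# Hypothetical stream of "annual income" values seen during training.
income_stats = RunningStats()
for income in (50_000, 60_000, 70_000):
    income_stats.update(income)
print(income_stats.mean, income_stats.variance)
```

A single pass suffices, so the statistics can be gathered in the same loop that feeds samples to the model without storing the whole target feature set.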
The method for determining explanation features for anomaly detection in at least one embodiment of this specification finds the explanation features of an anomaly according to distribution parameters. The explanation features are found from the data distribution of the sample feature values themselves, so the approach is unrelated to and independent of the model. Consequently, imperfections in model-related information, such as sample imbalance, do not affect the detection of explanation features. Moreover, identifying explanation features by distribution parameters matches the numerical distribution characteristics of anomalous points in anomaly detection, so the explanation features are obtained with higher accuracy.
Fig. 3 shows an apparatus for determining explanation features for anomaly detection provided by one or more embodiments of this specification. As shown in Fig. 3, the apparatus may include a deviation degree computing module 31 and a feature determination module 32.
The deviation degree computing module 31 is configured to, for a sample input to an anomaly detection model, the sample including at least one sample feature, determine a deviation degree of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;
The feature determination module 32 is configured to determine, according to the deviation degree of each sample feature in the sample, at least one sample feature as the explanation feature corresponding to the sample, where the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
Fig. 4 shows another apparatus for determining explanation features for anomaly detection provided by one or more embodiments of this specification. As shown in Fig. 4, on the basis of the apparatus structure shown in Fig. 3, the apparatus may further include a distribution calculation module 33.
The distribution calculation module 33 is configured to take the target sample feature from each sample of the training set data to obtain a target feature set including multiple values of the target sample feature, and to determine the distribution parameter of the target sample feature according to the target feature set; the training set data includes multiple samples, and each sample includes at least one sample feature.
In another example, the deviation degree computing module 31 is specifically configured to: for one sample feature of the sample in the test set data of the anomaly detection model, determine the actual value of the sample feature in the sample; obtain the mean of the sample feature in the training set data; and determine the distance by which the actual value deviates from the mean, as a multiple of the variance, as the deviation degree; the distribution parameter includes the mean and variance of the sample feature.
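The three modules of Figs. 3 and 4 could be arranged as in the following sketch. It is only one possible layout under stated assumptions: the class names, method names and sample data are hypothetical, samples are dictionaries of numeric features, the spread is computed as a standard deviation and assumed non-zero, and ranking uses absolute values.

```python
from statistics import fmean, pstdev


class DistributionCalculationModule:
    """Module 33: derives (mean, spread) per feature from the training set."""

    def compute(self, training_samples):
        params = {}
        for name in training_samples[0].keys():
            values = [sample[name] for sample in training_samples]  # target feature set
            params[name] = (fmean(values), pstdev(values))
        return params


class DeviationDegreeModule:
    """Module 31: scores each feature of one input sample with n = (v - u) / s."""

    def __init__(self, params):
        self.params = params

    def compute(self, sample):
        # Assumption: every spread s is non-zero.
        return {name: (v - self.params[name][0]) / self.params[name][1]
                for name, v in sample.items()}


class FeatureDeterminationModule:
    """Module 32: keeps the top-ranked features as explanation features."""

    def __init__(self, top_k):
        self.top_k = top_k

    def determine(self, deviations):
        ranked = sorted(deviations, key=lambda n: abs(deviations[n]), reverse=True)
        return ranked[:self.top_k]


training_set = [
    {"age": 30, "annual_income": 50_000},
    {"age": 40, "annual_income": 60_000},
    {"age": 50, "annual_income": 70_000},
]
params = DistributionCalculationModule().compute(training_set)
deviations = DeviationDegreeModule(params).compute({"age": 42, "annual_income": 120_000})
explanation = FeatureDeterminationModule(top_k=1).determine(deviations)
print(explanation)
```

None of the three classes touches the anomaly detection model itself, which mirrors the point that the explanation scheme is independent of the model.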
At least one embodiment of this specification further provides a device for determining explanation features for anomaly detection. The device includes a memory, a processor, and a computer program stored on the memory and runnable on the processor, and the processor implements the following steps when executing the program:
For a sample input to an anomaly detection model, the sample including at least one sample feature, determining a deviation degree of each sample feature according to the distribution parameter of that sample feature, where the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model;
According to the deviation degree of each sample feature in the sample, determining at least one sample feature as the explanation feature corresponding to the sample, where the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
The steps in the flows shown in the above method embodiments are not limited to the order shown in the flowcharts. In addition, each step can be implemented as software, hardware, or a combination of the two. For example, a person skilled in the art can implement the steps as software code, namely computer-executable instructions that realize the logic functions corresponding to the steps. When implemented in software, the executable instructions can be stored in a memory and executed by a processor of the device.
The apparatus or modules described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer, which can take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of modules divided by function. Of course, when implementing one or more embodiments of this specification, the functions of the modules can be realized in one or more pieces of software and/or hardware.
A person skilled in the art should understand that one or more embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
It should also be noted that, the terms "include", "comprise" or its any other variant are intended to nonexcludability It include so that the process, method, commodity or the equipment that include a series of elements not only include those elements, but also to wrap Include other elements that are not explicitly listed, or further include for this process, method, commodity or equipment intrinsic want Element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including described want There is also other identical elements in the process, method of element, commodity or equipment.
One or more embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the data acquisition device and data processing device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing describes merely preferred embodiments of one or more embodiments of this specification and is not intended to limit the disclosure. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the disclosure shall fall within its scope of protection.

Claims (10)

1. A method for determining explanation features for anomaly detection, the method comprising:
for a sample input to an anomaly detection model, the sample comprising at least one sample feature, determining a deviation degree of each sample feature according to a distribution parameter of that sample feature, wherein the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model; and
determining, according to the deviation degree of each sample feature in the sample, at least one sample feature as an explanation feature corresponding to the sample, wherein the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
2. The method according to claim 1, wherein before the deviation degree of each sample feature is determined according to the distribution parameter of that sample feature, the method further comprises:
obtaining, according to the training set data of the anomaly detection model, the distribution parameter of each sample feature in the training set data.
3. The method according to claim 2, wherein obtaining the distribution parameter of each sample feature in the training set data comprises:
the training set data comprising a plurality of samples, each sample comprising at least one sample feature;
obtaining a target sample feature from each sample of the training set data, to obtain a target feature set comprising a plurality of target sample features; and
determining the distribution parameter of the target sample feature according to the target feature set.
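The steps of claim 3 can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: it assumes the distribution parameter is the mean and variance named in claim 4, that each sample is a dictionary of numeric features, and that the population variance is intended (the claims do not say which variance estimator is used). The function name `feature_distribution_params` and the feature names are hypothetical.

```python
from statistics import fmean, pvariance


def feature_distribution_params(training_set):
    """Return {feature_name: (mean, variance)} over all training samples.

    Assumption: every sample is a dict with the same numeric features.
    """
    params = {}
    for name in training_set[0].keys():
        # Target feature set: the value of this one feature in every sample.
        target_feature_set = [sample[name] for sample in training_set]
        # Assumption: population variance; the claims do not fix the estimator.
        params[name] = (fmean(target_feature_set), pvariance(target_feature_set))
    return params


# Hypothetical training data with two features per sample.
training_set = [
    {"amount": 10.0, "logins": 1.0},
    {"amount": 12.0, "logins": 3.0},
    {"amount": 14.0, "logins": 2.0},
]
params = feature_distribution_params(training_set)
```

Because the parameters depend only on the training set, they can be computed once and reused for every sample that is later scored.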
4. The method according to claim 1, wherein
the distribution parameter comprises the mean and the variance of the sample feature.
5. The method according to claim 4, wherein determining the deviation degree of each sample feature according to the distribution parameter of that sample feature comprises:
for one of the sample features of a sample in the test set data of the anomaly detection model, determining the actual value of the sample feature in the sample;
obtaining the mean of the sample feature in the training set data; and
determining, as the deviation degree, the distance by which the actual value deviates from the mean, measured in multiples of the variance.
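One literal reading of claim 5 is the absolute distance from the training mean divided by the training variance. The sketch below follows that reading; the function name `deviation_degree` and the handling of a zero variance are assumptions made for illustration and are not stated in the claims. Dividing by the standard deviation instead (a z-score) would be a common alternative, but it is not what the claim text says.

```python
def deviation_degree(actual_value, mean, variance):
    """Distance of actual_value from mean, in multiples of the variance."""
    if variance == 0:
        # Assumption: a feature that is constant in training data deviates
        # infinitely as soon as it differs from that constant.
        return 0.0 if actual_value == mean else float("inf")
    return abs(actual_value - mean) / variance


# Hypothetical feature: training mean 12.0, training variance 4.0.
d = deviation_degree(20.0, 12.0, 4.0)
```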
6. The method according to claim 1, wherein determining, according to the deviation degree of each sample feature in the sample, at least one sample feature as the explanation feature corresponding to the sample comprises:
sorting the sample features in descending order of deviation degree, and taking the at least one sample feature ranked within a preset number of top positions as the explanation feature.
7. An apparatus for determining explanation features for anomaly detection, the apparatus comprising:
a deviation degree calculation module, configured to: for a sample input to an anomaly detection model, the sample comprising at least one sample feature, determine a deviation degree of each sample feature according to a distribution parameter of that sample feature, wherein the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model; and
a feature determination module, configured to determine, according to the deviation degree of each sample feature in the sample, at least one sample feature as an explanation feature corresponding to the sample, wherein the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
8. The apparatus according to claim 7, further comprising:
a distribution calculation module, configured to obtain a target sample feature from each sample of the training set data, to obtain a target feature set comprising a plurality of target sample features, and to determine the distribution parameter of the target sample feature according to the target feature set, wherein the training set data comprises a plurality of samples, each sample comprising at least one sample feature.
9. The apparatus according to claim 7, wherein
the deviation degree calculation module is specifically configured to: for one of the sample features of a sample in the test set data of the anomaly detection model, determine the actual value of the sample feature in the sample; obtain the mean of the sample feature in the training set data; and determine, as the deviation degree, the distance by which the actual value deviates from the mean, measured in multiples of the variance, wherein the distribution parameter comprises the mean and the variance of the sample feature.
10. A device for determining explanation features for anomaly detection, the device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the following steps:
for a sample input to an anomaly detection model, the sample comprising at least one sample feature, determining a deviation degree of each sample feature according to a distribution parameter of that sample feature, wherein the distribution parameter represents the distribution characteristics of the sample feature in the training set data of the anomaly detection model, and the anomaly detection model is an unsupervised model; and
determining, according to the deviation degree of each sample feature in the sample, at least one sample feature as an explanation feature corresponding to the sample, wherein the explanation feature is used to explain the association between the sample and the corresponding model output result of the anomaly detection model.
CN201811208609.2A 2018-10-17 2018-10-17 A kind of explanation feature of abnormality detection determines method and apparatus Pending CN109583470A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201811208609.2A CN109583470A (en) 2018-10-17 2018-10-17 A kind of explanation feature of abnormality detection determines method and apparatus
PCT/CN2019/097171 WO2020078059A1 (en) 2018-10-17 2019-07-23 Interpretation feature determination method and device for anomaly detection
TW108126301A TWI723476B (en) 2018-10-17 2019-07-25 Interpretation feature determination method, device and equipment for abnormal detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811208609.2A CN109583470A (en) 2018-10-17 2018-10-17 A kind of explanation feature of abnormality detection determines method and apparatus

Publications (1)

Publication Number Publication Date
CN109583470A true CN109583470A (en) 2019-04-05

Family

ID=65920123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811208609.2A Pending CN109583470A (en) 2018-10-17 2018-10-17 A kind of explanation feature of abnormality detection determines method and apparatus

Country Status (3)

Country Link
CN (1) CN109583470A (en)
TW (1) TWI723476B (en)
WO (1) WO2020078059A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027607A (en) * 2019-11-29 2020-04-17 泰康保险集团股份有限公司 Unsupervised high-dimensional data feature importance evaluation and selection method and unsupervised high-dimensional data feature importance evaluation and selection device
WO2020078059A1 (en) * 2018-10-17 2020-04-23 阿里巴巴集团控股有限公司 Interpretation feature determination method and device for anomaly detection
CN111262887A (en) * 2020-04-26 2020-06-09 腾讯科技(深圳)有限公司 Network risk detection method, device, equipment and medium based on object characteristics
CN111340102A (en) * 2020-02-24 2020-06-26 支付宝(杭州)信息技术有限公司 Method and apparatus for evaluating model interpretation tools
CN116130095A (en) * 2023-04-04 2023-05-16 深圳市金瑞铭科技有限公司 State monitoring method and device based on sensing technology and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767938B (en) * 2020-05-09 2023-12-19 北京奇艺世纪科技有限公司 Abnormal data detection method and device and electronic equipment
CN116304641B (en) * 2023-05-15 2023-09-15 山东省计算中心(国家超级计算济南中心) Anomaly detection interpretation method and system based on reference point search and feature interaction
CN116881724B (en) * 2023-09-07 2023-12-19 中国电子科技集团公司第十五研究所 Sample labeling method, device and equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9218698B2 (en) * 2012-03-14 2015-12-22 Autoconnect Holdings Llc Vehicle damage detection and indication
ES2700498T3 (en) * 2012-07-25 2019-02-18 Theranos Ip Co Llc System for the analysis of a sample
CN106776641B (en) * 2015-11-24 2020-09-08 华为技术有限公司 Data processing method and device
WO2018061842A1 (en) * 2016-09-27 2018-04-05 東京エレクトロン株式会社 Abnormality detection program, abnormality detection method and abnormality detection device
CN108108743B (en) * 2016-11-24 2022-06-24 百度在线网络技术(北京)有限公司 Abnormal user identification method and device for identifying abnormal user
CN108038211A (en) * 2017-12-13 2018-05-15 南京大学 A kind of unsupervised relation data method for detecting abnormality based on context
CN108512827B (en) * 2018-02-09 2021-09-21 世纪龙信息网络有限责任公司 Method, device, equipment and storage medium for establishing abnormal login identification and supervised learning model
CN109583470A (en) * 2018-10-17 2019-04-05 阿里巴巴集团控股有限公司 A kind of explanation feature of abnormality detection determines method and apparatus

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020078059A1 (en) * 2018-10-17 2020-04-23 阿里巴巴集团控股有限公司 Interpretation feature determination method and device for anomaly detection
CN111027607A (en) * 2019-11-29 2020-04-17 泰康保险集团股份有限公司 Unsupervised high-dimensional data feature importance evaluation and selection method and unsupervised high-dimensional data feature importance evaluation and selection device
CN111027607B (en) * 2019-11-29 2023-10-17 泰康保险集团股份有限公司 Unsupervised high-dimensional data feature importance assessment and selection method and device
CN111340102A (en) * 2020-02-24 2020-06-26 支付宝(杭州)信息技术有限公司 Method and apparatus for evaluating model interpretation tools
CN111262887A (en) * 2020-04-26 2020-06-09 腾讯科技(深圳)有限公司 Network risk detection method, device, equipment and medium based on object characteristics
CN111262887B (en) * 2020-04-26 2020-08-28 腾讯科技(深圳)有限公司 Network risk detection method, device, equipment and medium based on object characteristics
CN116130095A (en) * 2023-04-04 2023-05-16 深圳市金瑞铭科技有限公司 State monitoring method and device based on sensing technology and storage medium
CN116130095B (en) * 2023-04-04 2023-07-11 深圳市金瑞铭科技有限公司 State monitoring method and device based on sensing technology and storage medium

Also Published As

Publication number Publication date
TW202044111A (en) 2020-12-01
TWI723476B (en) 2021-04-01
WO2020078059A1 (en) 2020-04-23

Similar Documents

Publication Publication Date Title
CN109583470A (en) A kind of explanation feature of abnormality detection determines method and apparatus
CN107633331A (en) Time series models method for building up and device
CN109063886A (en) A kind of method for detecting abnormality, device and equipment
TWI675580B (en) Method and device for user authentication based on feature information
CN106453437A (en) Equipment identification code acquisition method and device
JP2020501232A (en) Risk control event automatic processing method and apparatus
CN109934268B (en) Abnormal transaction detection method and system
JP6737277B2 (en) Manufacturing process analysis device, manufacturing process analysis method, and manufacturing process analysis program
CN105677572B (en) Based on self organizing maps model cloud software performance exception error diagnostic method and system
Grbac et al. Stability of software defect prediction in relation to levels of data imbalance
CN108647737A (en) A kind of auto-adaptive time sequence variation detection method and device based on cluster
JP6739622B2 (en) Measurement-yield correlation analysis method and system
WO2021120845A1 (en) Homogeneous risk unit feature set generation method, apparatus and device, and medium
CN109976986B (en) Abnormal equipment detection method and device
CN102508766A (en) Static analysis method of errors during operation of aerospace embedded C language software
CN111540202B (en) Similar bayonet determining method and device, electronic equipment and readable storage medium
Armah et al. A deep analysis of the precision formula for imbalanced class distribution
CN110213123A (en) A kind of flux monitoring method, device and equipment
CN115665783A (en) Abnormal index tracing method and device, electronic equipment and storage medium
US10817366B2 (en) Method and apparatus for tracing common cause failure in integrated drawing
CN102902838A (en) Trend-based target setting method and system for process control
US9286353B2 (en) Method for generating processing specifications for a stream of data items
US20190384272A1 (en) System section data management device and method thereof
CN110309047A (en) A kind of test point generation method, apparatus and system
CN110018957A (en) A kind of money damage verification script detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190405

RJ01 Rejection of invention patent application after publication