CN111539769A - Training method and device of anomaly detection model based on differential privacy

Training method and device of anomaly detection model based on differential privacy

Info

Publication number
CN111539769A
Authority
CN
China
Prior art keywords
vector
gradient
sample
evaluation
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010343419.2A
Other languages
Chinese (zh)
Inventor
熊涛 (Xiong Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010343419.2A (CN111539769A)
Publication of CN111539769A
Priority to TW110110603A (TWI764640B)
Priority to PCT/CN2021/089398 (WO2021218828A1)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 - Commerce
    • G06Q 30/02 - Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0207 - Discounts or incentives, e.g. coupons or rebates
    • G06Q 30/0225 - Avoiding frauds
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate


Abstract

Embodiments of this specification provide a method for training a differential-privacy-based anomaly detection model, comprising the following steps. A first vector of any sample in a training set is input into a self-encoding network; an encoder outputs a dimension-reduced second vector, and a decoder outputs a third vector that restores the first vector. An evaluation vector is then constructed based on the second vector and input into an evaluation network, which outputs the sub-distribution probabilities that the sample belongs to each of K sub-Gaussian distributions in a mixed Gaussian distribution. Next, a first probability of the sample under the mixed Gaussian distribution is obtained from the evaluation vectors and sub-distribution probabilities of the samples in the training set. A prediction loss is determined therefrom that is negatively correlated with the first probability of each sample and with the similarity between the first vector and the third vector. Finally, noise is added by means of differential privacy to the original gradient derived from the prediction loss, and the model parameters of the anomaly detection model are adjusted using the noise-containing gradient.

Description

Training method and device of anomaly detection model based on differential privacy
Technical Field
One or more embodiments of this specification relate to the field of computer technology, and in particular to a computer-executed method and apparatus for training an anomaly detection model based on differential privacy.
Background
With the development of computer technology, security becomes an increasing concern, such as security of computer data, security of transactions for electronic payments, security of network access, and the like. For this reason, in many scenarios, it is necessary to find abnormal samples that may affect security from a large number of samples and take measures against the abnormal samples.
For example, it is desirable to detect abnormal transaction operations from a large number of transaction-operation samples, so that fraudulent transactions can be guarded against in advance; to detect abnormal accesses from network-access samples, so as to discover insecure accesses such as hacker attacks; to identify abnormal accounts among user accounts performing various operations, so as to lock accounts suspected of high-risk operations (fraudulent transactions, fake transactions such as order brushing, network attacks); and to discover abnormal operations among a large number of benefit-collection operations (e.g., operations that collect marketing red packets, rewards, or coupons), so as to guard against "black industry" operations that maliciously collect benefits, and so on.
However, labeling abnormal samples is often very time- and labor-consuming, and the number of abnormal samples is generally small, which makes conventional supervised learning methods difficult to apply. Therefore, unsupervised approaches have been proposed that attempt to detect abnormal samples from a large number of samples. Unsupervised anomaly detection is typically based on estimating the distribution probability or density of the samples, and statistically finds the outlier samples that deviate from the majority of regular samples, treating them as abnormal.
However, existing unsupervised anomaly detection models often risk leaking the training samples, and suffer from insufficient robustness and generalization caused by overfitting. Improved approaches yielding safer and more effective anomaly detection models are therefore desired.
Disclosure of Invention
One or more embodiments of the present specification describe a method for training an anomaly detection model based on differential privacy, so as to obtain an anomaly detection model with privacy protection and robustness.
According to a first aspect, there is provided a training method of an anomaly detection model based on differential privacy, the anomaly detection model comprising a self-coding network and an evaluation network, the self-coding network comprising an encoder and a decoder; the method comprises the following steps:
inputting a first feature vector corresponding to any service sample in a training set into the self-coding network, outputting a second feature vector for reducing the dimension of the first feature vector through the encoder, and outputting a third feature vector for restoring the first feature vector based on the second feature vector through the decoder;
constructing an evaluation vector based on the second feature vector, and inputting the evaluation vector into the evaluation network;
acquiring sub-distribution probabilities, output by the evaluation network, that the arbitrary service sample belongs to each of K sub-Gaussian distributions in a mixed Gaussian distribution;
obtaining a first probability of the arbitrary service sample under the mixed Gaussian distribution according to the evaluation vector and the sub-distribution probability corresponding to each service sample in the training set;
determining a prediction loss corresponding to the training set, wherein the prediction loss is inversely related to the first probability corresponding to each service sample and inversely related to a similarity between the first feature vector and the third feature vector corresponding to each service sample;
and adding noise to the original gradient obtained based on the prediction loss by using a differential privacy mode, and adjusting the model parameters of the anomaly detection model by using the gradient containing the noise.
In one embodiment, the evaluation vector is the second feature vector.
In another embodiment, the evaluation vector is constructed by: obtaining a reconstruction error vector based on the first feature vector and the third feature vector; combining the second feature vector and the reconstructed error vector as the evaluation vector.
According to one embodiment, the first probability is determined by: determining the mean value and covariance of each sub-Gaussian distribution in the K sub-Gaussian distributions and the occurrence probability of the sub-Gaussian distribution in the K sub-Gaussian distributions according to the evaluation vector and the sub-distribution probability of each service sample; reconstructing the mixed Gaussian distribution according to the mean value, covariance and occurrence probability of each sub-Gaussian distribution; and substituting the evaluation vector of any service sample into the reconstructed mixed Gaussian distribution to obtain the first probability.
In one embodiment, the step of determining the prediction loss corresponding to the training set may include: determining a first loss term according to the first probability corresponding to each service sample, the first loss term being inversely related to the first probability of each service sample; determining a second loss term according to the similarity between the first feature vector and the third feature vector corresponding to each service sample, the second loss term being inversely related to the similarity; and performing weighted summation of the first loss term and the second loss term according to a preset weight factor to obtain the prediction loss.
According to an embodiment, adding noise to the original gradient obtained based on the prediction loss by using a differential privacy method may specifically include: determining an original gradient that reduces the prediction loss according to the prediction loss; based on a preset clipping threshold value, clipping the original gradient to obtain a clipping gradient; determining Gaussian noise for realizing differential privacy by utilizing a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superposing the Gaussian noise and the cutting gradient to obtain the gradient containing the noise.
In one embodiment, a first original gradient corresponding to the evaluation network and a second original gradient corresponding to the self-encoding network are determined by gradient back propagation, respectively; respectively adding noise in the first original gradient and the second original gradient by using a differential privacy mode to obtain a first noise gradient and a second noise gradient; adjusting a parameter of the evaluation network using the first noise gradient; and adjusting the parameters of the self-coding network by using the second noise gradient.
In another embodiment, on the basis of respectively determining a first original gradient and a second original gradient through gradient back propagation, noise is added to the second original gradient in a differential privacy mode to obtain a second noise gradient; adjusting parameters of the evaluation network by using the first original gradient; and adjusting the parameters of the self-coding network by using the second noise gradient.
In various embodiments, the arbitrary service sample may include one of: a sample user, a sample merchant, a sample event.
According to a second aspect, there is provided a method of predicting an abnormal sample, comprising:
acquiring an anomaly detection model based on differential privacy, which is obtained by training according to the method of the first aspect, wherein the anomaly detection model comprises a self-coding network and an evaluation network, and the self-coding network comprises an encoder and a decoder;
inputting a first target vector corresponding to a target service sample to be detected into the self-coding network, and outputting a second target vector for reducing the dimension of the first target vector through the encoder;
constructing a target evaluation vector based on the second target vector;
inputting the target evaluation vector into a Gaussian mixture distribution constructed by the evaluation network to obtain a target probability of the target service sample in the Gaussian mixture distribution;
and determining whether the target service sample is an abnormal sample or not according to the target probability.
According to a third aspect, there is provided a training apparatus for an anomaly detection model based on differential privacy, the anomaly detection model comprising a self-encoding network and an evaluation network, the self-encoding network comprising an encoder and a decoder; the device comprises:
a first input unit, configured to input a first feature vector corresponding to any service sample in a training set into the self-coding network, output a second feature vector for reducing the dimension of the first feature vector through the encoder, and output a third feature vector for restoring the first feature vector based on the second feature vector through the decoder;
a second input unit configured to construct an evaluation vector based on the second feature vector, and input the evaluation vector into the evaluation network;
a sub-distribution obtaining unit configured to obtain the sub-distribution probabilities, output by the evaluation network, that the arbitrary service sample belongs to each of K sub-Gaussian distributions in a mixed Gaussian distribution;
a probability determining unit configured to obtain a first probability of the arbitrary service sample in the gaussian mixture distribution according to the evaluation vector and the sub-distribution probability corresponding to each service sample in the training set;
a loss determining unit configured to determine a prediction loss corresponding to the training set, wherein the prediction loss is negatively correlated with the first probability corresponding to each business sample and is negatively correlated with a similarity between a first feature vector and a third feature vector corresponding to each business sample;
and a parameter adjusting unit configured to add noise to an original gradient obtained based on the prediction loss in a differential privacy manner, and adjust a model parameter of the abnormality detection model using a gradient including the noise.
According to a fourth aspect, there is provided an apparatus for predicting abnormal samples, comprising:
a model obtaining unit configured to obtain an anomaly detection model based on differential privacy, which is trained by the apparatus according to the third aspect, the anomaly detection model including a self-coding network and an evaluation network, the self-coding network including an encoder and a decoder;
the input unit is configured to input a first target vector corresponding to a target service sample to be detected into the self-coding network, and output a second target vector for reducing the dimension of the first target vector through the encoder;
a vector construction unit configured to construct a target evaluation vector based on the second target vector;
the probability determining unit is configured to input the target evaluation vector into a Gaussian mixture distribution constructed by the evaluation network to obtain a target probability of the target service sample in the Gaussian mixture distribution;
and the abnormity judging unit is configured to determine whether the target service sample is an abnormal sample according to the target probability.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first or second aspect.
With the method and apparatus provided by the embodiments of this specification, differential privacy is introduced into the anomaly detection model through differentially private gradient descent. The anomaly detection model thus obtained has at least two advantages. First, because differential privacy is introduced, the information of the training samples is difficult to reverse-engineer or identify from the published model, which provides privacy protection. Second, the objective of training an unsupervised anomaly detection model is to fit the distribution of the training samples. Conventional training often overfits some samples; in particular, a training set sometimes contains noise samples, and a model that overfits them loses predictive performance. With differential privacy, noise is added to the gradient, so the model can resist the influence of noise samples and avoid overfitting, improving the robustness and predictive performance of the anomaly detection model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 illustrates an architectural diagram of an anomaly detection model according to the concepts of the present technology;
FIG. 2 illustrates a flow diagram of a method of training a differential privacy based anomaly detection model, according to one embodiment;
FIG. 3 illustrates a flow diagram of a method for anomaly detection of traffic samples in one embodiment;
FIG. 4 shows a schematic block diagram of a training apparatus of an anomaly detection model according to one embodiment;
FIG. 5 shows a schematic block diagram of an apparatus to predict an abnormal sample according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Fig. 1 illustrates an architecture diagram of an anomaly detection model according to the technical concept of this specification. As shown in Fig. 1, the anomaly detection model generally includes a self-encoding network 100 and an evaluation network 200, where the self-encoding network 100 includes an encoder 110 and a decoder 120. The encoder 110 encodes a high-dimensional feature vector x of an input business sample into a low-dimensional vector z_c, and the decoder 120 outputs, based on the low-dimensional vector z_c, a decoded vector x' that restores the high-dimensional feature vector x. In a trained self-encoding network, the low-dimensional vector z_c produced by the encoder characterizes the core features of the original high-dimensional feature vector x well, thus performing vector dimension reduction.
Distribution statistics over the sample set are then performed on the reduced-dimension vectors z_c. Specifically, the low-dimensional vector z_c that the encoder outputs for each sample is input into the evaluation network 200. According to an embodiment of this specification, the evaluation network 200 is based on a Gaussian Mixture Model (GMM), which assumes that the samples as a whole obey a mixed Gaussian distribution composed of K sub-Gaussian distributions. Accordingly, for each sample, the evaluation network 200 may output the sub-distribution probabilities that the sample belongs to each of the K sub-Gaussian distributions. The sub-distribution probabilities of all samples taken together can be used to reconstruct the mixed Gaussian distribution, thereby realizing unsupervised training of the GMM.
Further, to enhance the privacy security and robustness of the model, differential privacy may be introduced into the anomaly detection model, in particular into the encoder 110. Specifically, an encoder with differential privacy can be obtained by adopting differential-privacy-based gradient descent, i.e., by adding noise to the gradient during training. On the one hand, this protects the security of private data by preventing the training samples from being reverse-deduced from the trained anomaly detection model; on the other hand, the introduced differential privacy prevents the model from overfitting to certain samples (particularly noise-contaminated samples), thereby improving the robustness of the anomaly detection model.
The following describes a specific implementation of the above concept.
FIG. 2 illustrates a flow diagram of a method of training a differential privacy based anomaly detection model, according to one embodiment. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. The following describes a training process of the anomaly detection model based on differential privacy, with reference to the architecture of the anomaly detection model shown in fig. 1 and the method flow shown in fig. 2.
First, in step 21, a first feature vector x corresponding to an arbitrary first business sample in a training set is input into the self-encoding network; the encoder outputs a second feature vector z_c that reduces the dimension of the first feature vector x, and the decoder outputs a third feature vector x' that restores the first feature vector x based on the second feature vector z_c.
Specifically, the training set may be a sample set obtained by randomly sampling business samples, where none of the business samples carries a manually labeled abnormal/normal label. In different embodiments, a business sample may be a sample user, a sample merchant, a sample event, and so on, where a sample event may in turn include, for example, a transaction event, a login event, a purchase event, a social interaction event, and the like.
Assume the training set contains N business samples; the first business sample may be any one of them. The first feature vector x may contain different content depending on what the business sample is. For example, when the business sample is a user, the first feature vector x may contain attribute features of the user, such as basic attributes like age, gender, registration duration, and education level, and behavioral attributes such as recent browsing history and recent purchase history. When the business sample is a merchant, the first feature vector x may contain attribute features of the merchant, such as merchant category, registration duration, number of commodities, sales volume, and number of followers. Or, in one example, the business sample is a business event, such as a login event, and the corresponding first feature vector x may contain attribute features of the logging-in user, behavioral features of the login behavior, device features of the device used for login, and the like.
Generally, to better characterize a business sample, the first feature vector x may have a rather high dimension, e.g., several hundred dimensions or more. High-dimensional vectors pose certain difficulties for sample distribution statistics; therefore, in the embodiments of this specification, a self-encoding network is used for dimension reduction.
Specifically, the first feature vector x is input into the encoder 110 shown in Fig. 1. The encoder 110 may be implemented as a multi-layer perceptron in which the number of neurons per layer decreases progressively, producing at the output layer a second feature vector z_c, also called the code vector. The dimension d of the code vector z_c is much smaller than the dimension D of the input first feature vector x, thereby reducing the dimensionality of the input vector. For example, a feature vector x of several hundred dimensions may be compressed into a code vector z_c of several tens of dimensions, or even fewer.
The code vector z_c is further input into the decoder 120. The decoder 120 is structurally symmetric to the encoder 110, and its algorithm and model parameters are associated with (e.g., the inverse of) the corresponding ones in the encoder 110. The decoder 120 can thus restore the first feature vector x based on the code vector z_c and output a third feature vector x'. It will be appreciated that the code vector z_c is a dimension-reduced version of the first feature vector x; the smaller the information loss of the dimension-reduction operation, i.e., the higher the information content of the reduced code vector z_c, the easier it is to restore the input feature vector x, and hence the higher the similarity between the first feature vector x and the restored third feature vector x'. This property is later used to train the self-encoding network.
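For illustration, the following is a minimal sketch of such a self-encoding network in PyTorch. The framework, layer widths, and dimensions (D = 300, d = 10) are assumptions of this sketch; the specification does not prescribe a particular architecture.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Self-encoding network: the encoder compresses x (dimension D) into a
    code vector z_c (dimension d << D); the decoder restores x' from z_c."""
    def __init__(self, input_dim: int = 300, code_dim: int = 10):
        super().__init__()
        # Encoder: a multi-layer perceptron whose layer widths decrease (D -> ... -> d)
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64), nn.Tanh(),
            nn.Linear(64, code_dim),
        )
        # Decoder: structurally symmetric to the encoder (d -> ... -> D)
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 64), nn.Tanh(),
            nn.Linear(64, input_dim),
        )

    def forward(self, x: torch.Tensor):
        z_c = self.encoder(x)       # second feature vector (code vector)
        x_rec = self.decoder(z_c)   # third feature vector x'
        return z_c, x_rec
```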
Next, in step 22, an evaluation vector z is constructed based on the dimension-reduced second feature vector z_c, and the evaluation vector z is input into the evaluation network.
In one embodiment, the second feature vector z_c may be used directly as the evaluation vector z and input into the evaluation network 200 of Fig. 1.
In another embodiment, a reconstruction error vector z_r may first be obtained based on the first feature vector x and the restored third feature vector x', and the second feature vector z_c and the reconstruction error vector z_r are then combined into the evaluation vector z. This process can be expressed as:

z_r = f(x, x')    (1)

z = [z_c, z_r]    (2)

where f in equation (1) denotes the function computing the reconstruction error vector z_r. In different examples, f may compute the absolute Euclidean distance, the relative Euclidean distance, the cosine similarity, etc., of the first feature vector x and the third feature vector x'. The combination of the second feature vector z_c and the reconstruction error vector z_r in equation (2) may include concatenation, summation, weighted summation, and the like.
In any of the above ways, an evaluation vector z is obtained whose dimension is much smaller than that of the original first feature vector x. The evaluation vector z is then input into the evaluation network 200.
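As an illustration of equations (1) and (2), one hedged way to construct the evaluation vector is sketched below, taking the relative Euclidean distance and cosine similarity as the reconstruction-error features; the specification leaves the choice of f open.

```python
import torch
import torch.nn.functional as F

def build_evaluation_vector(x: torch.Tensor, x_rec: torch.Tensor,
                            z_c: torch.Tensor) -> torch.Tensor:
    """Equations (1)-(2): z_r = f(x, x'), z = [z_c, z_r].
    Here f yields two error features (relative Euclidean distance and
    cosine similarity); other choices of f are equally valid."""
    rel_euclid = (x - x_rec).norm(dim=-1) / x.norm(dim=-1).clamp_min(1e-12)
    cos_sim = F.cosine_similarity(x, x_rec, dim=-1)
    z_r = torch.stack([rel_euclid, cos_sim], dim=-1)   # reconstruction error vector
    return torch.cat([z_c, z_r], dim=-1)               # combination by concatenation
```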
As previously mentioned, the evaluation network 200 is based on a Gaussian Mixture Model (GMM). Under the GMM, the sample distribution is assumed to follow a mixed Gaussian distribution, which can be decomposed into a combination of K sub-Gaussian distributions. When the evaluation vector z corresponding to the first business sample is input into the evaluation network 200, in step 23 the evaluation network 200 may output, based on the evaluation vector z, the sub-distribution probability vector γ̂ of the first business sample over the K sub-Gaussian distributions. Here γ̂ is a K-dimensional vector whose k-th element is the probability of the first business sample under the k-th sub-Gaussian distribution. In one example, the sub-distribution probability vector γ̂ is normalized using the softmax function, so that its K elements sum to 1.
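A minimal sketch of such an evaluation network follows. The single hidden layer and its width are assumptions of this sketch; the essential point is the softmax output yielding the K sub-distribution probabilities γ̂.

```python
import torch
import torch.nn as nn

class EvaluationNetwork(nn.Module):
    """Maps an evaluation vector z to the K-dimensional sub-distribution
    probability vector gamma_hat (softmax-normalized, elements sum to 1)."""
    def __init__(self, z_dim: int = 12, hidden: int = 10, num_components: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, num_components),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(z), dim=-1)  # gamma_hat, shape (N, K)
```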
It is understood that the above first business sample is any one of the N samples contained in the training set. For each sample i of the N samples, its evaluation vector z_i and sub-distribution probability vector γ̂_i can be obtained through steps 21-23 above.
Then, in step 24, the mixed Gaussian distribution may be reconstructed according to the evaluation vectors and sub-distribution probabilities corresponding to the N business samples of the training set, so as to obtain the first probability of the first business sample under the mixed Gaussian distribution.
In one embodiment, for any k-th of the K sub-Gaussian distributions, its occurrence probability among the K sub-Gaussian distributions, its mean, and its covariance may first be determined based on the evaluation vector z_i and the sub-distribution probability vector γ̂_i of each business sample i.
Specifically, in one example, the occurrence probability φ_k of the k-th sub-Gaussian distribution among the K sub-Gaussian distributions can be determined by the following equation (3):

φ_k = (1/N) Σ_{i=1}^{N} γ̂_{ik}    (3)

where γ̂_{ik} denotes the probability of sample i under the k-th sub-Gaussian distribution, i.e., the k-th element of the sub-distribution probability vector γ̂_i corresponding to sample i. The occurrence probability φ_k is thus obtained by summing the probabilities of the N samples under the k-th sub-Gaussian distribution and normalizing by N.

From the definitions of the mean and covariance of a Gaussian distribution, the mean μ_k of the k-th sub-Gaussian distribution can be determined by the following equation (4), and its covariance Σ_k by the following equation (5):

μ_k = Σ_{i=1}^{N} γ̂_{ik} z_i / Σ_{i=1}^{N} γ̂_{ik}    (4)

Σ_k = Σ_{i=1}^{N} γ̂_{ik} (z_i − μ_k)(z_i − μ_k)^T / Σ_{i=1}^{N} γ̂_{ik}    (5)

In equations (4) and (5), γ̂_{ik} denotes the probability of sample i of the N samples under the k-th sub-Gaussian distribution, and z_i is the evaluation vector of sample i.

Thus, based on the evaluation vectors and sub-distribution probabilities of the N samples in the training set, the occurrence probability, mean, and covariance of each sub-Gaussian distribution are obtained. Each sub-Gaussian distribution is reconstructed from its mean and covariance, and the mixed Gaussian distribution is reconstructed by combining the sub-Gaussian distributions with their occurrence probabilities as weights, giving the total distribution.

Based on the reconstructed mixed Gaussian distribution, the first probability P of the first business sample under the mixed Gaussian distribution can be obtained:

P(z) = Σ_{k=1}^{K} φ_k · N(z; μ_k, Σ_k)    (6)

where N(z; μ_k, Σ_k) denotes the Gaussian probability density with mean μ_k and covariance Σ_k evaluated at z. That is, the first probability P is obtained by substituting the evaluation vector z of the first business sample into the mixed Gaussian distribution.
Next, in step 25, the prediction loss L corresponding to the training set is determined based on how well, for each sample in the training set, the third feature vector output by the decoder restores the first feature vector, and on the first probability obtained above for each sample. The prediction loss L is negatively correlated with the first probability P corresponding to each business sample, and negatively correlated with the similarity between the first feature vector and the third feature vector corresponding to each business sample.
Specifically, in one embodiment, a first loss term L1 may be determined based on the first probabilities of the respective samples, the first loss term L1 being inversely related to the first probability of each sample. For example, the probability loss corresponding to the arbitrary first business sample may be set to E(z) (also called the sample energy), which is negatively related to the first probability P of the sample. In one example:

E(z) = −log P(z)    (7)

The first loss term L1 may then be the sum or mean of the probability losses of the N samples, for example:

L1 = (1/N) Σ_{i=1}^{N} E(z_i)    (8)
it should be understood that, the gaussian mixture is reconstructed based on the sub-distribution probability of each sample in each sub-gaussian distribution, and then the probability of each sample in the reconstructed gaussian mixture is obtained, so that the whole first probability of the N samples may reflect the fitting condition of the gaussian mixture to the N sample distributions, and the first loss term L1 actually corresponds to the fitting loss of the whole N samples fitting the gaussian mixture.
On the other hand, a second loss term L2 may be determined according to the similarity between the first feature vector and the third feature vector corresponding to each business sample, the second loss term L2 being negatively correlated with that similarity. For example, the vector reconstruction loss corresponding to the arbitrary first business sample may be set to L_r(x, x'), which is negatively related to the similarity between x and x': the more similar x and x', the smaller L_r. The similarity between two vectors can be computed and measured in a number of ways, such as cosine similarity or Euclidean distance. The second loss term L2 may then be the sum or mean of the vector reconstruction losses of the N samples, for example:

L2 = (1/N) Σ_{i=1}^{N} L_r(x_i, x'_i)    (9)
then, the first loss term L1 and the second loss term L2 are weighted and summed according to a preset weighting factor to obtain the total predicted loss L of the training set. In one example, the predicted loss L can be written as:
Figure BDA0002469298080000122
wherein λ is1As a weighting factor, a hyperparameter may be used.
In another embodiment, the prediction loss L may also be set to:

L = L2 + λ_1 · L1 + λ_2 · P(Σ̂)    (11)

where λ_1 and λ_2 are weight factors, and the last term P(Σ̂) is a penalty on the covariance matrices Σ̂_k of the sub-Gaussian distributions (for example, a sum of the reciprocals of their diagonal elements), used to prevent the matrices from becoming singular (irreversible).
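A hedged sketch of the total loss of equations (7)-(11), reusing the helpers above; the squared-error form of L_r and the reciprocal-diagonal covariance penalty are assumptions of this sketch.

```python
import torch

def prediction_loss(x, x_rec, z, gamma, lambda1: float = 0.1, lambda2: float = 0.005):
    """Equations (7)-(11), reusing gmm_parameters / first_probability above.
    Squared error stands in for L_r, and the covariance penalty P(Sigma) is a
    sum of reciprocal diagonal entries -- both assumptions of this sketch."""
    phi, mu, sigma = gmm_parameters(z, gamma)
    energy = -torch.log(first_probability(z, phi, mu, sigma) + 1e-12)  # (7) E(z)
    l1 = energy.mean()                              # (8) fitting loss term
    l2 = ((x - x_rec) ** 2).sum(dim=-1).mean()      # (9) reconstruction loss term
    p_sigma = (1.0 / torch.diagonal(sigma, dim1=-2, dim2=-1)).sum()  # P(Sigma)
    return l2 + lambda1 * l1 + lambda2 * p_sigma    # (10)/(11)
```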
Thus, in the above manner, a prediction loss for the training set is obtained. Next, based on the predicted loss, a gradient of model parameters that reduces the loss may be determined for updating and tuning the model parameters.
In the embodiments of this specification, in step 26, noise is added in a differential-privacy manner to the original gradient derived from the above prediction loss, and the model parameters of the anomaly detection model are adjusted using the noise-containing gradient.
Differential privacy is a technique from cryptography that aims to maximize the accuracy of queries on a statistical database while minimizing the chance of identifying individual records. A random algorithm M provides ε-differential privacy protection if, for any two neighboring data sets D and D' (differing in a single record) and any set S_M of possible outputs, Pr[M(D) ∈ S_M] ≤ e^ε × Pr[M(D') ∈ S_M], where the parameter ε is called the privacy protection budget, balancing the degree of privacy protection against accuracy, and is generally preset. The closer ε is to 0, the closer e^ε is to 1, the closer the processing results of the random algorithm on the two neighboring data sets D and D', and hence the stronger the privacy protection.
Implementations of differential privacy include noise mechanisms, exponential mechanisms, and the like. To introduce differential privacy into the model, embodiments of this specification use a noise mechanism, achieving differential privacy by adding noise to the parameter gradient. Depending on the noise scheme, the noise may be Laplacian noise, Gaussian noise, or the like. According to one embodiment, in this step 26, differential privacy is achieved by adding Gaussian noise to the gradient. The specific process may include the following steps.
First, an original gradient that reduces the prediction loss is determined from the prediction loss L; then, the original gradient is clipped based on a preset clipping threshold to obtain a clipped gradient; next, Gaussian noise for realizing differential privacy is determined using a Gaussian distribution based on the clipping threshold, where the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; finally, the Gaussian noise thus obtained is superimposed on the clipped gradient to obtain the noise-containing gradient.
More specifically, as an example, assume that for the above training set the resulting original gradient is:

g_t(X) = ∇_{θ_t} L(θ_t, X)    (12)

where t denotes the current, t-th, round of iterative training, X denotes the training set (batch) used in the current round, g_t(X) denotes the loss gradient obtained for that batch, θ_t denotes the model parameters at the start of the t-th round, and L(θ_t, X) denotes the aforementioned prediction loss.
As described above, noise for implementing differential privacy may be added to the original gradient by means such as Laplacian noise or Gaussian noise. In an embodiment taking Gaussian noise as an example, the original gradient may be clipped based on a preset clipping threshold to obtain a clipped gradient; Gaussian noise for implementing differential privacy is determined based on the clipping threshold and a predetermined noise scaling coefficient (a preset hyperparameter); and the clipped gradient and the Gaussian noise are then fused (e.g., summed) to obtain the noise-containing gradient. It can be understood that this clips the original gradient on the one hand and superimposes noise on the clipped gradient on the other, thereby applying to the gradient a differential-privacy treatment based on Gaussian noise.
For example, the original gradient is clipped as:

ḡ_t(X) = g_t(X) / max(1, ‖g_t(X)‖_2 / C)    (13)

where ḡ_t(X) denotes the clipped gradient, C denotes the clipping threshold, and ‖g_t(X)‖_2 denotes the second-order norm of g_t(X). That is, when the gradient norm is less than or equal to the clipping threshold C, the original gradient is retained; when it is greater than C, the original gradient is scaled down proportionally so that its norm equals C.
Gaussian noise is then added to the clipped gradient to obtain the noise-containing gradient, for example:

g̃_t(X) = ḡ_t(X) + 1_t · N(0, σ²C²I)    (14)

where g̃_t(X) denotes the noise-containing gradient; N(0, σ²C²I) denotes Gaussian noise whose probability density follows a Gaussian distribution with mean 0 and variance σ²C²; σ denotes the noise scaling coefficient, a preset hyperparameter that can be set as required; C is the clipping threshold; and 1_t is an indicator function that may take the value 0 or 1: for example, it may be set to 1 in even-numbered training rounds and 0 in odd-numbered rounds.
The noise-containing gradient can then be used to adjust the model parameters, with the goal of minimizing the aforementioned prediction loss L:

θ_{t+1} = θ_t − η_t · g̃_t(X)    (15)

where η_t is the learning step size (learning rate), a preset hyperparameter such as 0.5 or 0.3, and θ_{t+1} denotes the adjusted model parameters obtained in the t-th training round. Since the Gaussian noise added to the gradient satisfies differential privacy, the adjustment of the model parameters satisfies differential privacy.
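Equations (12)-(15) amount to a DP-SGD-style update. Below is a hedged per-batch sketch with the indicator function fixed to 1; practical DP-SGD implementations usually clip per-example gradients rather than the batch gradient, so this illustrates the formulas exactly as written.

```python
import torch

def dp_sgd_step(model, loss, clip_c: float = 1.0, sigma: float = 1.0, lr: float = 0.3):
    """One update following equations (12)-(15): compute the gradient, clip its
    norm to C, superimpose N(0, sigma^2 C^2) noise, then descend."""
    model.zero_grad()
    loss.backward()                                       # (12) original gradient g_t
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    scale = 1.0 / max(1.0, (total_norm / clip_c).item())  # (13) clipping factor
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            noisy = p.grad * scale + sigma * clip_c * torch.randn_like(p.grad)  # (14)
            p -= lr * noisy                               # (15) parameter update
```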
The above describes an implementation of adding noise to the gradient and updating the model parameters according to the gradient containing the noise.
On the other hand, as shown in Fig. 1, the anomaly detection model in the present solution includes a self-encoding network and an evaluation network; accordingly, the model parameters can be divided into self-encoding network parameters and evaluation network parameters, each updated with its corresponding gradient. Generally, in models implemented as multi-layer neural networks, gradients are determined layer by layer through back propagation. Therefore, in the anomaly detection model of Fig. 1, after the prediction loss is obtained from the model output, back propagation first determines the first original gradient corresponding to the evaluation network, and then continues to determine the second original gradient corresponding to the self-encoding network. When noise is added to the gradients based on differential privacy, it may be added to both original gradients, or only to the second original gradient.
Specifically, in one embodiment, after the first original gradient corresponding to the evaluation network and the second original gradient corresponding to the self-encoding network are determined, noise is added in a differential-privacy manner to each of them, yielding a first noise gradient and a second noise gradient. The parameters of the evaluation network are then adjusted using the first noise gradient, and the parameters of the self-encoding network using the second noise gradient. In this way, differential privacy is introduced throughout the anomaly detection model.
In another embodiment, after the first original gradient corresponding to the evaluation network and the second original gradient corresponding to the self-encoding network are determined, noise is added only to the second original gradient in a differential-privacy manner to obtain a second noise gradient. The parameters of the evaluation network are then adjusted using the first original gradient, and the parameters of the self-encoding network using the second noise gradient. The core of adjusting the parameters of the self-encoding network is adjusting the parameters of the encoder, since the parameters of the decoder are associated with those of the encoder. In this way, differential privacy is introduced in the encoder.
It is to be understood that, in the forward processing of business samples, the encoder is located furthest upstream in the entire network model. Introducing differential privacy into the encoder therefore gives all subsequent processing the differential-privacy property, which can make the entire anomaly detection model differentially private.
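One hedged way to realize this second variant (noise only on the self-encoding network's gradient) is to restrict the clip-and-noise step to that parameter group, as sketched below; the split into two parameter groups is an assumption of this sketch.

```python
import torch

def dp_step_encoder_only(autoencoder, estimator, loss,
                         clip_c: float = 1.0, sigma: float = 1.0, lr: float = 0.3):
    """Back-propagate once; update the evaluation network with its original
    (first) gradient, and noise only the self-encoding network's (second) gradient."""
    autoencoder.zero_grad()
    estimator.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in estimator.parameters():          # first original gradient, no noise
            if p.grad is not None:
                p -= lr * p.grad
        grads = [p.grad for p in autoencoder.parameters() if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        scale = 1.0 / max(1.0, (norm / clip_c).item())
        for p in autoencoder.parameters():        # second noise gradient
            if p.grad is not None:
                p -= lr * (p.grad * scale + sigma * clip_c * torch.randn_like(p.grad))
```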
Thus, differential privacy is introduced into the anomaly detection model through differentially private gradient descent. The anomaly detection model obtained in this way has at least two advantages. First, because differential privacy is introduced, the information of the training samples is difficult to reverse-engineer or identify from the published model, which provides privacy protection. Second, the objective of training an unsupervised anomaly detection model is to fit the distribution of the training samples. Conventional training often overfits some samples; in particular, a training set sometimes contains noise samples, and a model that overfits them loses predictive performance. With differential privacy, noise is added to the gradient, so the model can resist the influence of noise samples and avoid overfitting, improving the robustness and predictive performance of the anomaly detection model.
With the differential-privacy-based anomaly detection model obtained by the above training method, anomaly detection can be performed on a target sample to be tested. Fig. 3 illustrates a flow diagram of a method for anomaly detection of business samples in one embodiment. As before, the method may be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities.
As shown in Fig. 3, in step 31, the differential-privacy-based anomaly detection model trained in the above manner is first obtained. As shown in Fig. 1, the anomaly detection model includes a self-encoding network and an evaluation network, where the self-encoding network includes an encoder and a decoder. Through the training process, the evaluation network has constructed a mixed Gaussian model that fits the distribution of the business samples well. The anomaly detection model is a model into which differential privacy has been introduced; more particularly, at least its encoder has the differential-privacy property.
In step 32, the first target vector x_t corresponding to the target business sample to be tested is input into the self-encoding network, and the encoder outputs a second target vector that reduces the dimension of the first target vector. This process is similar to step 21 of Fig. 2 and is not repeated.
Then, in step 33, a target evaluation vector z_t is constructed based on the second target vector, in a manner corresponding to the training phase. In one case, the second target vector is used directly as the target evaluation vector. In another case, a third target vector x'_t output by the decoder is obtained; a reconstruction error vector is computed based on the first target vector x_t and the third target vector x'_t; and the second target vector and the reconstruction error vector are then combined into the target evaluation vector z_t.
Next, in step 34, the target evaluation vector z_t is input into the mixed Gaussian distribution constructed by the evaluation network to obtain the target probability of the target business sample under the mixed Gaussian distribution. Specifically, the target evaluation vector z_t can be substituted directly into equation (6), where the parameters of the mixed Gaussian distribution are those determined by the evaluation network through the training process.
Then, in step 35, whether the target business sample is an abnormal sample is determined according to the target probability. Specifically, the target probability may be compared with a preset probability threshold; when the target probability is below the threshold, the current target business sample is regarded as abnormal.
In another example, the target probability may further be substituted into equation (7) (equivalently, the target evaluation vector is substituted directly into equation (7)) to obtain the probability loss E(z_t) of the business sample. When the probability loss exceeds a certain threshold, the current target business sample is considered an abnormal sample. In this way, anomaly detection of the business sample is realized.
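A hedged end-to-end inference sketch combining the pieces defined earlier; the energy threshold and the use of sample energy rather than the raw probability are illustrative choices.

```python
import torch

def detect_anomaly(autoencoder, x_t, phi, mu, sigma, energy_threshold: float = 10.0):
    """Steps 31-35: encode the target sample, build its evaluation vector,
    and score it under the trained mixture (eqs. (6)-(7))."""
    with torch.no_grad():
        z_c, x_rec = autoencoder(x_t)                    # steps 31-32
        z_t = build_evaluation_vector(x_t, x_rec, z_c)   # step 33
        p_t = first_probability(z_t, phi, mu, sigma)     # step 34, target probability
        energy = -torch.log(p_t + 1e-12)                 # probability loss E(z_t)
    return energy > energy_threshold                     # step 35: True = abnormal
```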
According to another aspect of the embodiments, there is also provided a training apparatus for an anomaly detection model based on differential privacy, which may be deployed in any apparatus, device, platform, or device cluster having computing and processing capabilities. FIG. 4 shows a schematic block diagram of a training apparatus of an anomaly detection model according to one embodiment. As shown in fig. 4, the training apparatus 400 includes:
a first input unit 41, configured to input a first feature vector corresponding to any service sample in a training set into the self-coding network, output a second feature vector for dimensionality reduction of the first feature vector through the encoder, and output a third feature vector for restoration of the first feature vector based on the second feature vector through the decoder;
a second input unit 42 configured to construct an evaluation vector based on the second feature vector, and input the evaluation vector into the evaluation network;
a sub-distribution obtaining unit 43, configured to obtain the sub-distribution probabilities, output by the evaluation network, that the arbitrary service sample belongs to each of K sub-Gaussian distributions in a mixed Gaussian distribution;
a probability determining unit 44, configured to obtain a first probability of the arbitrary service sample under the mixed Gaussian distribution according to the evaluation vector and the sub-distribution probability corresponding to each service sample in the training set;
a loss determining unit 45 configured to determine a prediction loss corresponding to the training set, where the prediction loss is negatively correlated with the first probability corresponding to each business sample, and is negatively correlated with a similarity between the first feature vector and the third feature vector corresponding to each business sample;
a parameter adjusting unit 46 configured to add noise to the original gradient obtained based on the prediction loss by using a differential privacy method, and adjust the model parameters of the abnormality detection model by using a gradient including the noise.
In one embodiment, the second input unit 42 is configured to: and taking the second feature vector as the evaluation vector.
In another embodiment, the second input unit 42 is configured to: obtaining a reconstruction error vector based on the first feature vector and the third feature vector; combining the second feature vector and the reconstructed error vector as the evaluation vector.
According to one embodiment, the probability determination unit 44 is configured to: determining the mean value and covariance of each sub-Gaussian distribution in the K sub-Gaussian distributions and the occurrence probability of the sub-Gaussian distribution in the K sub-Gaussian distributions according to the evaluation vector and the sub-distribution probability of each service sample; reconstructing the mixed Gaussian distribution according to the mean value, covariance and occurrence probability of each sub-Gaussian distribution; and substituting the evaluation vector of any service sample into the reconstructed mixed Gaussian distribution to obtain the first probability.
In one embodiment, the loss determining unit 45 is configured to: determine a first loss term according to the first probability corresponding to each service sample, the first loss term being inversely related to the first probability of each service sample; determine a second loss term according to the similarity between the first feature vector and the third feature vector corresponding to each service sample, the second loss term being inversely related to the similarity; and perform weighted summation of the first loss term and the second loss term according to a preset weight factor to obtain the prediction loss.
According to an embodiment, the parameter adjustment unit 46 is configured to: determining an original gradient that reduces the prediction loss according to the prediction loss; based on a preset clipping threshold value, clipping the original gradient to obtain a clipping gradient; determining Gaussian noise for realizing differential privacy by utilizing a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold; and superposing the Gaussian noise and the cutting gradient to obtain the gradient containing the noise.
In one embodiment, the parameter adjustment unit 46 may be configured to:
determining a first original gradient corresponding to the evaluation network and a second original gradient corresponding to the self-encoding network respectively through gradient back propagation; respectively adding noise in the first original gradient and the second original gradient by using a differential privacy mode to obtain a first noise gradient and a second noise gradient;
adjusting a parameter of the evaluation network using the first noise gradient; and adjusting the parameters of the self-coding network by using the second noise gradient.
In another embodiment, the parameter adjustment unit 46 may be configured to:
determining a first original gradient corresponding to the evaluation network and a second original gradient corresponding to the self-encoding network respectively through gradient back propagation; adding noise in the second original gradient by using a differential privacy mode to obtain a second noise gradient;
adjusting parameters of the evaluation network by using the first original gradient; and adjusting the parameters of the self-coding network by using the second noise gradient.
In various embodiments, the service samples may include one of: a sample user, a sample merchant, a sample event.
It should be noted that the apparatus 400 shown in fig. 4 is an apparatus embodiment corresponding to the method embodiment shown in fig. 2, and the corresponding description in the method embodiment shown in fig. 2 is also applicable to the apparatus 400, and is not repeated herein.
According to another aspect, an apparatus for predicting an abnormal sample is also provided, which may be deployed in any apparatus, device, platform, or device cluster having computing and processing capabilities. FIG. 5 shows a schematic block diagram of an apparatus to predict an abnormal sample according to one embodiment. As shown in fig. 5, the prediction apparatus 500 includes:
a model obtaining unit 51 configured to obtain an anomaly detection model based on differential privacy trained according to the apparatus of fig. 4, where the anomaly detection model includes a self-coding network and an evaluation network, and the self-coding network includes an encoder and a decoder;
an input unit 52 configured to input a first target vector corresponding to a target service sample to be detected into the self-coding network, and output, through the encoder, a second target vector obtained by reducing the dimensionality of the first target vector;
a vector construction unit 53 configured to construct a target evaluation vector based on the second target vector;
a probability determining unit 54 configured to input the target evaluation vector into the Gaussian mixture distribution constructed by the evaluation network, so as to obtain a target probability of the target service sample under the Gaussian mixture distribution;
and an abnormality determination unit 55 configured to determine, according to the target probability, whether the target service sample is an abnormal sample.
In an embodiment, the vector construction unit 53 is specifically configured to: acquiring a third target vector output by the decoder; obtaining a reconstruction error vector based on the first target vector and the third target vector; and combining the second target vector and the reconstruction error vector to form the target evaluation vector.
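For illustration only, the full prediction path of units 51 to 55 may be sketched as follows. The relative-Euclidean-distance and cosine-similarity reconstruction features, the gmm_params tuple of mixture parameters fixed after training, and the probability threshold are assumptions of this sketch; the embodiment only requires some reconstruction error vector and the trained Gaussian mixture distribution.

    import numpy as np

    def is_abnormal(x, encoder, decoder, gmm_params, threshold, eps=1e-12):
        # Score one target service sample against the trained mixture.
        z_c = encoder(x)                                # second target vector
        x_rec = decoder(z_c)                            # third target vector
        # Reconstruction error vector (assumed features).
        rel_dist = np.linalg.norm(x - x_rec) / (np.linalg.norm(x) + eps)
        cos_sim = x @ x_rec / (np.linalg.norm(x) * np.linalg.norm(x_rec) + eps)
        z = np.concatenate([z_c, [rel_dist, cos_sim]])  # target evaluation vector
        phi, mu, cov = gmm_params                       # fixed after training
        log_terms = []
        for k in range(len(phi)):
            dk = z - mu[k]
            inv = np.linalg.inv(cov[k])
            _, logdet = np.linalg.slogdet(cov[k])
            log_terms.append(np.log(phi[k]) - 0.5 * (dk @ inv @ dk + logdet
                                                     + z.size * np.log(2 * np.pi)))
        target_prob = np.exp(np.logaddexp.reduce(np.array(log_terms)))
        return target_prob < threshold                  # low probability => abnormal sample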
According to an embodiment of a further aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of this specification may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments further describe the objectives, technical solutions and advantages of this specification in detail. It should be understood that they are merely specific embodiments of the technical idea of this specification and are not intended to limit its scope of protection; any modification, equivalent replacement or improvement made on the basis of the technical solutions of the embodiments of this specification shall fall within that scope of protection.

Claims (23)

1. A method for training an anomaly detection model based on differential privacy, wherein the anomaly detection model comprises a self-coding network and an evaluation network, and the self-coding network comprises an encoder and a decoder; the method comprises the following steps:
inputting a first feature vector corresponding to any service sample in a training set into the self-coding network, outputting, through the encoder, a second feature vector obtained by reducing the dimensionality of the first feature vector, and outputting, through the decoder, a third feature vector that restores the first feature vector based on the second feature vector;
constructing an evaluation vector based on the second feature vector, and inputting the evaluation vector into the evaluation network;
acquiring, from the evaluation network, sub-distribution probabilities that the service sample belongs to each of K sub-Gaussian distributions, wherein the K sub-Gaussian distributions constitute a Gaussian mixture distribution;
obtaining a first probability of the service sample under the Gaussian mixture distribution according to the evaluation vectors and sub-distribution probabilities corresponding to the respective service samples in the training set;
determining a prediction loss corresponding to the training set, wherein the prediction loss is negatively correlated with the first probability corresponding to each service sample and negatively correlated with the similarity between the first feature vector and the third feature vector corresponding to each service sample;
and adding noise, in a differential privacy manner, to an original gradient obtained based on the prediction loss, and adjusting model parameters of the anomaly detection model using the noise-containing gradient.
2. The method of claim 1, wherein constructing an evaluation vector based on the second feature vector comprises: taking the second feature vector as the evaluation vector.
3. The method of claim 1, wherein constructing an evaluation vector based on the second feature vector comprises:
obtaining a reconstruction error vector based on the first feature vector and the third feature vector;
combining the second feature vector and the reconstruction error vector to form the evaluation vector.
4. The method of claim 1, wherein obtaining a first probability of the service sample under the Gaussian mixture distribution according to the evaluation vector and the sub-distribution probabilities corresponding to each service sample in the training set comprises:
determining, according to the evaluation vector and sub-distribution probabilities of each service sample, the mean and covariance of each of the K sub-Gaussian distributions and its occurrence probability among the K sub-Gaussian distributions;
reconstructing the Gaussian mixture distribution according to the mean, covariance, and occurrence probability of each sub-Gaussian distribution;
and substituting the evaluation vector of the service sample into the reconstructed Gaussian mixture distribution to obtain the first probability.
5. The method of claim 1, wherein determining the prediction loss corresponding to the training set comprises:
determining a first loss term according to the first probability corresponding to each service sample, wherein the first loss term is negatively correlated with the first probability of each service sample;
determining a second loss term according to the similarity between the first feature vector and the third feature vector corresponding to each service sample, wherein the second loss term is negatively correlated with the similarity;
and performing a weighted summation of the first loss term and the second loss term according to a preset weight factor to obtain the prediction loss.
6. The method of claim 1, wherein adding noise, in a differential privacy manner, to the original gradient obtained based on the prediction loss comprises:
determining, according to the prediction loss, an original gradient that reduces the prediction loss;
clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient;
determining Gaussian noise for realizing differential privacy using a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superposing the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.
7. The method of claim 1, wherein adding noise, in a differential privacy manner, to the original gradient obtained based on the prediction loss comprises: determining, through gradient back propagation, a first original gradient corresponding to the evaluation network and a second original gradient corresponding to the self-coding network, respectively; and adding noise to the first original gradient and the second original gradient, respectively, in a differential privacy manner to obtain a first noise gradient and a second noise gradient;
wherein adjusting the model parameters of the anomaly detection model using the noise-containing gradient comprises:
adjusting parameters of the evaluation network using the first noise gradient; and adjusting parameters of the self-coding network using the second noise gradient.
8. The method of claim 1, wherein adding noise, in a differential privacy manner, to the original gradient obtained based on the prediction loss comprises: determining, through gradient back propagation, a first original gradient corresponding to the evaluation network and a second original gradient corresponding to the self-coding network, respectively; and adding noise to the second original gradient in a differential privacy manner to obtain a second noise gradient;
wherein adjusting the model parameters of the anomaly detection model using the noise-containing gradient comprises:
adjusting parameters of the evaluation network using the first original gradient; and adjusting parameters of the self-coding network using the second noise gradient.
9. The method of claim 1, wherein the service sample comprises one of: a sample user, a sample merchant, and a sample event.
10. A method of predicting an abnormal sample, comprising:
acquiring a differential-privacy-based anomaly detection model trained according to the method of claim 1, wherein the anomaly detection model comprises a self-coding network and an evaluation network, and the self-coding network comprises an encoder and a decoder;
inputting a first target vector corresponding to a target service sample to be detected into the self-coding network, and outputting, through the encoder, a second target vector obtained by reducing the dimensionality of the first target vector;
constructing a target evaluation vector based on the second target vector;
inputting the target evaluation vector into a Gaussian mixture distribution constructed by the evaluation network to obtain a target probability of the target service sample in the Gaussian mixture distribution;
and determining whether the target service sample is an abnormal sample or not according to the target probability.
11. The method of claim 10, wherein constructing a target evaluation vector based on the second target vector comprises:
acquiring a third target vector output by the decoder;
obtaining a reconstruction error vector based on the first target vector and the third target vector;
combining the second target vector and the reconstruction error vector to form the target evaluation vector.
12. An apparatus for training an anomaly detection model based on differential privacy, wherein the anomaly detection model comprises a self-coding network and an evaluation network, and the self-coding network comprises an encoder and a decoder; the apparatus comprises:
a first input unit configured to input a first feature vector corresponding to any service sample in a training set into the self-coding network, output, through the encoder, a second feature vector obtained by reducing the dimensionality of the first feature vector, and output, through the decoder, a third feature vector that restores the first feature vector based on the second feature vector;
a second input unit configured to construct an evaluation vector based on the second feature vector, and input the evaluation vector into the evaluation network;
a sub-distribution obtaining unit configured to acquire, from the evaluation network, sub-distribution probabilities that the service sample belongs to each of K sub-Gaussian distributions of a Gaussian mixture distribution;
a probability determining unit configured to obtain a first probability of the service sample under the Gaussian mixture distribution according to the evaluation vectors and sub-distribution probabilities corresponding to the service samples in the training set;
a loss determining unit configured to determine a prediction loss corresponding to the training set, wherein the prediction loss is negatively correlated with the first probability corresponding to each service sample and negatively correlated with the similarity between the first feature vector and the third feature vector corresponding to each service sample;
and a parameter adjusting unit configured to add noise, in a differential privacy manner, to an original gradient obtained based on the prediction loss, and adjust model parameters of the anomaly detection model using the noise-containing gradient.
13. The apparatus of claim 12, wherein the second input unit is configured to take the second feature vector as the evaluation vector.
14. The apparatus of claim 12, wherein the second input unit is configured to:
obtaining a reconstruction error vector based on the first feature vector and the third feature vector;
combining the second feature vector and the reconstruction error vector to form the evaluation vector.
15. The apparatus of claim 12, wherein the probability determining unit is configured to:
determining, according to the evaluation vector and sub-distribution probabilities of each service sample, the mean and covariance of each of the K sub-Gaussian distributions and its occurrence probability among the K sub-Gaussian distributions;
reconstructing the Gaussian mixture distribution according to the mean, covariance, and occurrence probability of each sub-Gaussian distribution;
and substituting the evaluation vector of the service sample into the reconstructed Gaussian mixture distribution to obtain the first probability.
16. The apparatus of claim 12, wherein the loss determination unit is configured to:
determining a first loss term according to the first probability corresponding to each service sample, wherein the first loss term is negatively correlated with the first probability of each service sample;
determining a second loss term according to the similarity between the first feature vector and the third feature vector corresponding to each service sample, wherein the second loss term is negatively correlated with the similarity;
and performing a weighted summation of the first loss term and the second loss term according to a preset weight factor to obtain the prediction loss.
17. The apparatus of claim 12, wherein the parameter adjustment unit is configured to:
determining, according to the prediction loss, an original gradient that reduces the prediction loss;
clipping the original gradient based on a preset clipping threshold to obtain a clipped gradient;
determining Gaussian noise for realizing differential privacy using a Gaussian distribution determined based on the clipping threshold, wherein the variance of the Gaussian distribution is positively correlated with the square of the clipping threshold;
and superposing the Gaussian noise on the clipped gradient to obtain the noise-containing gradient.
18. The apparatus of claim 12, wherein the parameter adjustment unit is configured to:
determining, through gradient back propagation, a first original gradient corresponding to the evaluation network and a second original gradient corresponding to the self-coding network, respectively; adding noise to the first original gradient and the second original gradient, respectively, in a differential privacy manner to obtain a first noise gradient and a second noise gradient;
adjusting parameters of the evaluation network using the first noise gradient; and adjusting parameters of the self-coding network using the second noise gradient.
19. The apparatus of claim 12, wherein the parameter adjustment unit is configured to:
determining, through gradient back propagation, a first original gradient corresponding to the evaluation network and a second original gradient corresponding to the self-coding network, respectively; adding noise to the second original gradient in a differential privacy manner to obtain a second noise gradient;
adjusting parameters of the evaluation network using the first original gradient; and adjusting parameters of the self-coding network using the second noise gradient.
20. The apparatus of claim 12, wherein the service sample comprises one of: a sample user, a sample merchant, and a sample event.
21. An apparatus for predicting an abnormal sample, comprising:
a model obtaining unit configured to acquire a differential-privacy-based anomaly detection model trained by the apparatus of claim 12, the anomaly detection model including a self-coding network and an evaluation network, the self-coding network including an encoder and a decoder;
an input unit configured to input a first target vector corresponding to a target service sample to be detected into the self-coding network, and output, through the encoder, a second target vector obtained by reducing the dimensionality of the first target vector;
a vector construction unit configured to construct a target evaluation vector based on the second target vector;
a probability determining unit configured to input the target evaluation vector into the Gaussian mixture distribution constructed by the evaluation network to obtain a target probability of the target service sample under the Gaussian mixture distribution;
and an abnormality determination unit configured to determine, according to the target probability, whether the target service sample is an abnormal sample.
22. The apparatus of claim 21, wherein the vector construction unit is configured to:
acquiring a third target vector output by the decoder;
obtaining a reconstruction error vector based on the first target vector and the third target vector;
combining the second target vector and the reconstruction error vector to form the target evaluation vector.
23. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1-11.
CN202010343419.2A 2020-04-27 2020-04-27 Training method and device of anomaly detection model based on differential privacy Pending CN111539769A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010343419.2A CN111539769A (en) 2020-04-27 2020-04-27 Training method and device of anomaly detection model based on differential privacy
TW110110603A TWI764640B (en) 2020-04-27 2021-03-24 Training method and device for anomaly detection model based on differential privacy
PCT/CN2021/089398 WO2021218828A1 (en) 2020-04-27 2021-04-23 Training for differential privacy-based anomaly detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010343419.2A CN111539769A (en) 2020-04-27 2020-04-27 Training method and device of anomaly detection model based on differential privacy

Publications (1)

Publication Number Publication Date
CN111539769A 2020-08-14

Family

ID=71977322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010343419.2A Pending CN111539769A (en) 2020-04-27 2020-04-27 Training method and device of anomaly detection model based on differential privacy

Country Status (3)

Country Link
CN (1) CN111539769A (en)
TW (1) TWI764640B (en)
WO (1) WO2021218828A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186583B (en) * 2021-12-02 2022-12-27 国家石油天然气管网集团有限公司 Method and system for recovering abnormal signal of corrosion detection of tank wall of oil storage tank
CN114283306A (en) * 2021-12-23 2022-04-05 福州大学 Industrial control network anomaly detection method and system
TWI781874B (en) * 2022-01-19 2022-10-21 中華電信股份有限公司 Electronic device and method for detecting anomaly of telecommunication network based on autoencoder neural network model
CN115184054B (en) * 2022-05-30 2022-12-27 深圳技术大学 Mechanical equipment semi-supervised fault detection and analysis method, device, terminal and medium
CN114974220A (en) * 2022-06-17 2022-08-30 中国电信股份有限公司 Network model training method, and voice object gender identification method and device
CN115238827B (en) * 2022-09-16 2022-11-25 支付宝(杭州)信息技术有限公司 Privacy-protecting sample detection system training method and device
CN115842812B (en) * 2022-11-21 2024-04-12 浪潮通信信息系统有限公司 User perception evaluation method and system based on PCA and integrated self-encoder
CN115564577B (en) * 2022-12-02 2023-04-07 成都新希望金融信息有限公司 Abnormal user identification method and device, electronic equipment and storage medium
CN117474464B (en) * 2023-09-28 2024-05-07 光谷技术有限公司 Multi-service processing model training method, multi-service processing method and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7779268B2 (en) * 2004-12-07 2010-08-17 Mitsubishi Electric Research Laboratories, Inc. Biometric based user authentication and data encryption
US20190244138A1 (en) * 2018-02-08 2019-08-08 Apple Inc. Privatized machine learning using generative adversarial networks
CN109033854B (en) * 2018-07-17 2020-06-09 阿里巴巴集团控股有限公司 Model-based prediction method and device
CN109886388B (en) * 2019-01-09 2024-03-22 平安科技(深圳)有限公司 Training sample data expansion method and device based on variation self-encoder
CN111046433B (en) * 2019-12-13 2021-03-05 支付宝(杭州)信息技术有限公司 Model training method based on federal learning
CN111539769A (en) * 2020-04-27 2020-08-14 支付宝(杭州)信息技术有限公司 Training method and device of anomaly detection model based on differential privacy

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101792520B1 (en) * 2016-12-30 2017-11-03 한라대학교 산학협력단 Differential privacy method using secret sharing scheme
CN107368752A (en) * 2017-07-25 2017-11-21 北京工商大学 A kind of depth difference method for secret protection based on production confrontation network
CN110334548A (en) * 2019-07-16 2019-10-15 桂林电子科技大学 A kind of data exception detection method based on difference privacy
CN110796497A (en) * 2019-10-31 2020-02-14 支付宝(杭州)信息技术有限公司 Method and device for detecting abnormal operation behaviors

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021218828A1 (en) * 2020-04-27 2021-11-04 支付宝(杭州)信息技术有限公司 Training for differential privacy-based anomaly detection model
CN112434213A (en) * 2020-10-15 2021-03-02 中国科学院深圳先进技术研究院 Network model training method, information pushing method and related device
CN112434213B (en) * 2020-10-15 2023-09-29 中国科学院深圳先进技术研究院 Training method of network model, information pushing method and related devices
CN112101946A (en) * 2020-11-20 2020-12-18 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model
CN112446040A (en) * 2020-11-24 2021-03-05 平安科技(深圳)有限公司 Federal modeling method based on selective gradient update and related equipment
CN112541574A (en) * 2020-12-03 2021-03-23 支付宝(杭州)信息技术有限公司 Privacy-protecting business prediction method and device
CN112541574B (en) * 2020-12-03 2022-05-17 支付宝(杭州)信息技术有限公司 Privacy-protecting business prediction method and device
CN113055930A (en) * 2021-03-09 2021-06-29 Oppo广东移动通信有限公司 Data processing method, communication device, server, and storage medium
CN113762967A (en) * 2021-03-31 2021-12-07 北京沃东天骏信息技术有限公司 Risk information determination method, model training method, device, and program product
CN113052693A (en) * 2021-06-02 2021-06-29 北京轻松筹信息技术有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN113127931B (en) * 2021-06-18 2021-09-03 国网浙江省电力有限公司信息通信分公司 Federal learning differential privacy protection method for adding noise based on Rayleigh divergence
CN113127931A (en) * 2021-06-18 2021-07-16 国网浙江省电力有限公司信息通信分公司 Federal learning differential privacy protection method for adding noise based on Rayleigh divergence
CN113591479A (en) * 2021-07-23 2021-11-02 深圳供电局有限公司 Named entity identification method and device for power metering and computer equipment
CN113779045A (en) * 2021-11-12 2021-12-10 航天宏康智能科技(北京)有限公司 Training method and training device for industrial control protocol data anomaly detection model
CN114297036A (en) * 2022-01-05 2022-04-08 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and readable storage medium
CN114297036B (en) * 2022-01-05 2023-06-09 腾讯科技(深圳)有限公司 Data processing method, device, electronic equipment and readable storage medium
CN116188834A (en) * 2022-12-08 2023-05-30 赛维森(广州)医疗科技服务有限公司 Full-slice image classification method and device based on self-adaptive training model
CN116188834B (en) * 2022-12-08 2023-10-20 赛维森(广州)医疗科技服务有限公司 Full-slice image classification method and device based on self-adaptive training model
CN116150622A (en) * 2023-02-17 2023-05-23 支付宝(杭州)信息技术有限公司 Model training method and device, storage medium and electronic equipment
CN116150622B (en) * 2023-02-17 2023-08-11 支付宝(杭州)信息技术有限公司 Model training method and device, storage medium and electronic equipment
CN116756656A (en) * 2023-08-11 2023-09-15 北京航空航天大学 Engineering structure anomaly identification method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
TW202143146A (en) 2021-11-16
WO2021218828A1 (en) 2021-11-04
TWI764640B (en) 2022-05-11

Similar Documents

Publication Publication Date Title
CN111539769A (en) Training method and device of anomaly detection model based on differential privacy
Razavi et al. A practical feature-engineering framework for electricity theft detection in smart grids
Benchaji et al. Enhanced credit card fraud detection based on attention mechanism and LSTM deep model
EP3723008A1 (en) Method for protecting a machine learning model against extraction
Maamar et al. A Hybrid Model for Anomalies Detection in AMI System Combining K-means Clustering and Deep Neural Network.
Afriyie et al. A supervised machine learning algorithm for detecting and predicting fraud in credit card transactions
CN111523668B (en) Training method and device of data generation system based on differential privacy
Roseline et al. Autonomous credit card fraud detection using machine learning approach
Ashofteh et al. A conservative approach for online credit scoring
Singh et al. Energy theft detection for AMI using principal component analysis based reconstructed data
US20060236395A1 (en) System and method for conducting surveillance on a distributed network
Óskarsdóttir et al. Social network analytics for supervised fraud detection in insurance
Giudici et al. Artificial Intelligence risk measurement
Bazán et al. Power and reversal power links for binary regressions: An application for motor insurance policyholders
Madhure et al. Cnn-lstm based electricity theft detector in advanced metering infrastructure
Zhang et al. The optimized anomaly detection models based on an approach of dealing with imbalanced dataset for credit card fraud detection
CN114782161A (en) Method, device, storage medium and electronic device for identifying risky users
CN114548241A (en) Stolen account detection method and device and electronic equipment
El-Toukhy et al. Electricity theft detection using deep reinforcement learning in smart power grids
Ashofteh et al. A non-parametric-based computationally efficient approach for credit scoring
CN116823428A (en) Anti-fraud detection method, device, equipment and storage medium
CN114003960A (en) Training method of neural network model
CN116911882B (en) Insurance fraud prevention prediction method and system based on machine learning
Xiang et al. A bonus-malus framework for cyber risk insurance and optimal cybersecurity provisioning
CN115345727B (en) Method and device for identifying fraudulent loan application

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40035499

Country of ref document: HK