CN117151768A - Construction method and system of wind control rule base of generated marketing event - Google Patents

Construction method and system of wind control rule base of generated marketing event

Info

Publication number
CN117151768A
Authority
CN
China
Prior art keywords
generator
constructing
discriminator
sample
marketing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311417507.2A
Other languages
Chinese (zh)
Inventor
杨柳欣
孙钢
沈然
徐世予
刘欢
程杰慧
蒋弋帆
郭励
罗宇恒
方炳坤
章一新
金王英
汪金荣
谷泓杰
章江铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Zhejiang Electric Power Co Ltd
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
State Grid Zhejiang Electric Power Co Ltd
Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Zhejiang Electric Power Co Ltd and Marketing Service Center of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202311417507.2A
Publication of CN117151768A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The invention discloses a method and a system for constructing a generative marketing event risk control rule base, and belongs to the field of unsupervised risk identification. The method comprises the following steps: constructing a generator; constructing a discriminator; defining loss functions; training an unsupervised generative cooperative learning model through negative learning feedback, wherein the model comprises the generator and the discriminator, which are trained alternately; identifying risk events; and writing them into a risk rule base. By establishing a cross-supervision mechanism between the generator and the discriminator, the invention achieves unsupervised learning by exploiting the very low frequency of anomalies, can handle cases where labeled data is lacking and anomaly types are unknown, and has good applicability and flexibility.

Description

Method and system for constructing a generative marketing event risk control rule base
Technical Field
The invention relates to the field of deep-learning-based anomaly detection, and in particular to a method and a system for constructing a generative marketing event risk control rule base based on unsupervised learning.
Background
Because anomalies occur at low frequency, generally lack a clear definition, and often lack ground-truth labels, unsupervised anomaly detection methods have so far been studied relatively little. Marketing events likewise often lack ground-truth labels, which makes detecting abnormal risk events considerably more difficult.
As the number of sensors in modern intelligent industry continues to increase, anomaly detection for marketing events becomes increasingly difficult. How to process multi-sensor measurements effectively, so as to learn the dynamics and context of the environment and improve the operating efficiency of the system, is a new challenge. Conventional mathematical learning methods have limitations in processing large-scale data sources, so introducing deep learning (DL) is an attractive option. DL can effectively capture the latent patterns behind context awareness and is widely recognized as one of the most advanced methods in many fields. In DL-based representation learning, some research has been devoted to designing misbehavior detection methods that detect abnormal events in multi-sensor data. However, these methods generally cannot extract a robust, unified representation of normal behavior from multi-sensor data while preserving the dependencies between events. There is therefore a need to explore and develop new approaches that overcome these challenges and improve the ability to detect and model risk events in multi-sensor data accurately and efficiently.
Disclosure of Invention
The invention aims to solve the problems of unsupervised anomaly detection in the prior art, and provides a method and a system for constructing a generative marketing event risk control rule base based on unsupervised learning. By combining negative learning with unsupervised learning, it addresses the challenges of missing ground-truth labels and multi-sensor data, and effectively supports the construction of a risk rule base and the risk control of marketing events.
To this end, the invention adopts the following technical scheme: a method for constructing a generative marketing event risk control rule base, comprising the following steps:
step 1), data preparation: collecting and preparing a marketing risk event sample dataset for training and testing;
step 2), constructing a generator, wherein the generator receives marketing risk event sample data as input and generates marketing risk event reconstructed samples from it;
step 3), constructing a discriminator for distinguishing real marketing risk event samples from the reconstructed samples generated by the generator;
step 4), defining loss functions: by minimizing the loss functions of the generator and the discriminator, the generator learns to generate realistic reconstructed samples and the discriminator gains better discrimination capability;
step 5), training an unsupervised generative cooperative learning (GCL) model through negative learning (NL) feedback, wherein the model comprises the generator and the discriminator; during training, the generator and the discriminator are trained alternately: first, the generator is fixed and the parameters of the discriminator are updated by minimizing the discriminator loss; then, the discriminator is fixed and the parameters of the generator are updated by minimizing the generator loss; the performance of the generator is improved through the negative learning feedback mechanism until the generator and the discriminator converge;
step 6), risk event identification: identifying risk events using the trained unsupervised generative cooperative learning model;
step 7), writing to the risk rule base: writing the data judged to be abnormal risk events into the risk rule base, and updating the risk rule base in real time.
The invention provides a new unsupervised generative cooperative learning (GCL) anomaly detection method, which detects and discriminates anomalous marketing risk events by exploiting the low frequency of anomalous events together with cross-supervision and mutual learning between the generator and the discriminator. A unified learning network for multi-sensor data (USMD) is adopted as the discriminator. The discriminator uses an autoencoder architecture, a time-dependent network (TDN) and attention unit based on LSTM, and the isolation forest algorithm to discriminate between the reconstructed samples generated by the generator and the real samples. The loss function value of the generator is then calculated and compared with a given threshold to judge whether the sample is abnormal; if so, the event is written into the risk rule base.
The method is particularly suitable for risk event identification scenarios. It can locate abnormal events in complex monitoring scenes without labeled training data, enables virtual generation and identification of novel marketing event risks, and enriches the risk control rule base.
Further, in step 2), the generator is an autoencoder (AE) consisting of an encoder, which maps the input data to a low-dimensional representation in the latent space, and a decoder, which maps that low-dimensional representation back to the original data space.
Further, in step 2), the training process of the generator is as follows:
1) Input real marketing risk event sample data;
2) The encoder converts the real marketing risk event sample data into latent representation vectors;
3) The decoder decodes the latent representation vectors into risk event reconstructed samples;
4) Calculate the difference between the reconstructed samples and the real samples, namely the reconstruction loss;
5) Update the parameters of the generator using the back-propagation algorithm to minimize the reconstruction loss, bringing the reconstructed samples closer to the real samples.
Further, in step 3), the discriminator adopts a model for detecting system anomalies, called a unified learning network, which uses the isolation forest algorithm as its anomaly detection method; the discriminator accepts a sample as input and outputs a scalar value indicating whether the input sample is a real sample or a reconstructed sample.
Furthermore, in step 3), the isolation forest algorithm is a tree-based unsupervised anomaly detection algorithm for quickly identifying outliers in the reconstructed samples; its specific steps are as follows:
1) Constructing an isolation tree: each isolation tree is constructed by randomly selecting features and split values; first, a feature is randomly selected from the input feature set; then a split value is randomly selected from the value range of that feature; samples in the dataset less than or equal to the split value are placed in the left subset, and samples greater than the split value in the right subset; finally, the same operation is applied recursively to the left and right subsets, continuing to select features and split values until the stopping condition is met;
2) Constructing an isolation forest: step 1) is repeated to construct multiple isolation trees, which together form an isolation forest;
3) Calculating the anomaly score of a sample: for each sample, its anomaly score in the isolation forest is calculated from its path length in the trees, where the path length is the number of edges traversed from the root node to the terminal node;
4) Setting a threshold and identifying anomalous points: the samples are divided into normal points and anomalous points according to the anomaly score and a preset threshold.
Further, the unified learning network is specified as follows:
the unified learning network uses a combination of long short-term memory (LSTM) and attention units to produce a unified representation of the input data, and then uses the isolation forest algorithm as the error detector for discrimination.
Further, in step 4), the loss function of the generator is the reconstruction loss function $L_r$, the objective function used in autoencoders, which is calculated as a mean squared error loss; the loss function of the discriminator is a binary cross-entropy loss, which measures its classification performance.
Further, in step 5), negative learning is specified as follows:
negative learning is a feedback mechanism from the discriminator to the generator: while training the generator, the generator is encouraged to perform label-flipped reconstruction on samples with abnormal pseudo-labels, and to reconstruct samples with normal pseudo-labels as usual with minimal error; an original vector judged abnormal is replaced by an all-ones vector, while an original vector judged normal is used unchanged.
Further, in step 6), risk events are identified as follows:
data is input and preprocessed, then fed into the unsupervised generative cooperative learning model; a suitable threshold is selected according to actual conditions, and the event is judged abnormal when the generator's reconstruction loss $L_r$ is greater than the given threshold or the discriminator's binary cross-entropy loss is greater than the given threshold.
The invention also provides a system for constructing a generative marketing event risk control rule base, which is used to implement the above construction method.
Compared with the prior art, the invention has the following beneficial effects:
(1) Unsupervised learning capability: by establishing a cross-supervision mechanism between the generator and the discriminator, the invention achieves unsupervised learning by exploiting the very low frequency of anomalies, can handle cases where labeled data is lacking and anomaly types are unknown, and has good applicability and flexibility.
(2) Large-scale data processing capability: the method is suitable for large-scale data processing and can efficiently judge abnormal events among marketing events. With the unsupervised learning method, anomalies can be detected quickly and accurately in large amounts of data and the risk rule base is updated in real time, providing reliable risk management and decision support.
(3) Real-time updating and optimization: the invention can update the risk rule base of abnormal events in real time, so that the model adapts promptly to new data and changing conditions. This mechanism allows the anomaly detection model to keep improving its performance, accuracy, and reliability.
Drawings
FIG. 1 is a schematic diagram of the method of the present invention for constructing a generative marketing event risk control rule base;
FIG. 2 is a diagram of the construction of the unsupervised generative cooperative learning model of the present invention;
FIG. 3 is a schematic diagram of the construction of the discriminator of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and detailed description.
Examples
Aiming at the defect that the prior art cannot detect abnormal risk events in marketing event scenarios, this embodiment provides a method for constructing a generative marketing event risk control rule base, which can accurately identify abnormal events among marketing events.
The method for constructing the generative marketing event risk control rule base comprises the following steps:
step 1), data preparation: collecting and preparing a marketing risk event sample dataset for training and testing;
step 2), constructing a generator, wherein the generator receives marketing risk event sample data as input and generates marketing risk event reconstructed samples from it;
step 3), constructing a discriminator for distinguishing real marketing risk event samples from the reconstructed samples generated by the generator;
step 4), defining loss functions: by minimizing the loss functions of the generator and the discriminator, the generator learns to generate realistic reconstructed samples and the discriminator gains better discrimination capability;
step 5), training an unsupervised generative cooperative learning model through negative learning feedback, wherein the model comprises the generator and the discriminator; during training, the generator and the discriminator are trained alternately: first, the generator is fixed and the parameters of the discriminator are updated by minimizing the discriminator loss; then, the discriminator is fixed and the parameters of the generator are updated by minimizing the generator loss; the performance of the generator is improved through the negative learning feedback mechanism until the generator and the discriminator converge;
step 6), risk event identification: identifying risk events using the trained unsupervised generative cooperative learning model;
step 7), writing to the risk rule base: writing the data judged to be abnormal risk events into the risk rule base, and updating the risk rule base in real time.
The principle of the method for constructing the generative marketing event risk control rule base is shown in FIG. 1.
Specifically, in step 2), the generator is an autoencoder consisting of an encoder, which maps the input data to a low-dimensional representation in the latent space, and a decoder, which maps that low-dimensional representation back to the original data space.
More specifically, in step 2), the training process of the generator is as follows:
1) Input real marketing risk event sample data;
2) The encoder converts the real marketing risk event sample data into latent representation vectors;
3) The decoder decodes the latent representation vectors into risk event reconstructed samples;
4) Calculate the difference between the reconstructed samples and the real samples, namely the reconstruction loss;
5) Update the parameters of the generator using the back-propagation algorithm to minimize the reconstruction loss, bringing the reconstructed samples closer to the real samples.
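The five training steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the encoder and decoder are assumed to be single linear layers without biases, the reconstruction loss is a mean squared error, the update is plain stochastic gradient descent, and the names `train_autoencoder`, `latent_dim`, `lr`, and `epochs` are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def train_autoencoder(samples, latent_dim=2, lr=0.05, epochs=200, seed=0):
    """Train a linear autoencoder (assumed architecture) and return its weights and loss history."""
    rng = random.Random(seed)
    d = len(samples[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(latent_dim)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(d)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in samples:                      # 1) input a real sample
            z = matvec(enc, x)                 # 2) encode into a latent vector
            x_hat = matvec(dec, z)             # 3) decode into a reconstructed sample
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            total += sum(e * e for e in err) / d   # 4) reconstruction loss (MSE)
            # 5) back-propagation: gradients of the MSE w.r.t. both weight matrices
            g_out = [2.0 * e / d for e in err]
            g_z = [sum(dec[i][j] * g_out[i] for i in range(d)) for j in range(latent_dim)]
            for i in range(d):
                for j in range(latent_dim):
                    dec[i][j] -= lr * g_out[i] * z[j]
            for j in range(latent_dim):
                for k in range(d):
                    enc[j][k] -= lr * g_z[j] * x[k]
        history.append(total / len(samples))   # mean loss over the epoch
    return enc, dec, history
```

Calling `train_autoencoder` on a small list of equal-length feature vectors returns the encoder weights, the decoder weights, and the per-epoch mean reconstruction loss, which can be inspected to check that the reconstructions move closer to the real samples.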
Specifically, in step 3), the discriminator adopts a model for detecting system anomalies, called a unified learning network, which uses the isolation forest algorithm as its anomaly detection method; the discriminator accepts a sample as input and outputs a scalar value indicating whether the input sample is a real sample or a reconstructed sample.
More specifically, in step 3), the isolation forest algorithm is a tree-based unsupervised anomaly detection algorithm for quickly identifying outliers in the reconstructed samples; its specific steps are as follows:
1) Constructing an isolation tree: each isolation tree is constructed by randomly selecting features and split values; first, a feature is randomly selected from the input feature set; then a split value is randomly selected from the value range of that feature; samples in the dataset less than or equal to the split value are placed in the left subset, and samples greater than the split value in the right subset; finally, the same operation is applied recursively to the left and right subsets, continuing to select features and split values until the stopping condition is met;
2) Constructing an isolation forest: step 1) is repeated to construct multiple isolation trees, which together form an isolation forest;
3) Calculating the anomaly score of a sample: for each sample, its anomaly score in the isolation forest is calculated from its path length in the trees, where the path length is the number of edges traversed from the root node to the terminal node;
4) Setting a threshold and identifying anomalous points: the samples are divided into normal points and anomalous points according to the anomaly score and a preset threshold.
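The four steps above can be sketched in plain Python. This is a simplified illustration rather than the implementation of the invention: it builds every tree on the full dataset (no sub-sampling), uses the raw average path length instead of a normalized score, and the names `build_isolation_forest`, `average_path_length`, `flag_anomalies`, and `max_avg_path` are hypothetical.

```python
import math
import random


def build_isolation_tree(rows, rng, depth, max_depth):
    """Step 1: recursively split rows on a random feature and a random split value."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}                         # terminal node
    splittable = [f for f in range(len(rows[0]))
                  if min(r[f] for r in rows) < max(r[f] for r in rows)]
    if not splittable:                                     # all rows identical
        return {"size": len(rows)}
    feature = rng.choice(splittable)                       # random feature
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    split = rng.uniform(lo, hi)                            # random split value
    left = [r for r in rows if r[feature] <= split]
    right = [r for r in rows if r[feature] > split]
    return {"feature": feature, "split": split,
            "left": build_isolation_tree(left, rng, depth + 1, max_depth),
            "right": build_isolation_tree(right, rng, depth + 1, max_depth)}


def build_isolation_forest(rows, n_trees=50, seed=0):
    """Step 2: repeat step 1 to obtain several isolation trees."""
    rng = random.Random(seed)
    max_depth = max(1, math.ceil(math.log2(len(rows))))    # assumed stopping condition
    return [build_isolation_tree(rows, rng, 0, max_depth) for _ in range(n_trees)]


def path_length(tree, row):
    """Number of edges traversed from the root to the terminal node reached by row."""
    edges = 0
    while "feature" in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["split"] else tree["right"]
        edges += 1
    return edges


def average_path_length(forest, row):
    """Step 3 (simplified): average path length of row over all trees."""
    return sum(path_length(tree, row) for tree in forest) / len(forest)


def flag_anomalies(forest, rows, max_avg_path):
    """Step 4: rows isolated within max_avg_path edges on average are flagged."""
    return [average_path_length(forest, row) <= max_avg_path for row in rows]
```

A point far from the bulk of the data tends to be separated by one of the first random splits, so its average path length is short; points inside a dense cluster need many more splits.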
Specifically, the unified learning network is specified as follows:
the unified learning network uses a combination of long short-term memory (LSTM) and attention units to produce a unified representation of the input data, and then uses the isolation forest algorithm as the error detector for discrimination.
Specifically, in step 4), the loss function of the generator is the reconstruction loss function $L_r$, the objective function used in autoencoders, which is calculated as a mean squared error loss; the loss function of the discriminator is a binary cross-entropy loss, which measures its classification performance.
Specifically, in step 5), negative learning is specified as follows:
negative learning is a feedback mechanism from the discriminator to the generator, in which the generator is encouraged to perform label-flipped reconstruction on samples with abnormal pseudo-labels, and to reconstruct samples with normal pseudo-labels as usual with minimal error; an original vector judged abnormal is replaced by an all-ones vector, while an original vector judged normal is used unchanged.
Specifically, in step 6), risk events are identified as follows:
data is input and preprocessed, then fed into the unsupervised generative cooperative learning model; a suitable threshold is selected according to actual conditions, and the event is judged abnormal when the generator's reconstruction loss $L_r$ is greater than the given threshold or the discriminator's binary cross-entropy loss is greater than the given threshold.
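The decision rule of step 6), together with the write to the rule base in step 7), might look like the sketch below. It assumes that a reconstruction loss and a binary cross-entropy loss have already been computed for each event and that the risk rule base can be represented as a plain list; the function name `identify_risk_events` and all parameter names are hypothetical.

```python
def identify_risk_events(events, recon_losses, bce_losses,
                         recon_threshold, bce_threshold, rule_base):
    """Flag events whose generator or discriminator loss exceeds its threshold.

    Flagged events are appended to rule_base (assumed here to be a list).
    """
    flagged = []
    for event, recon_loss, bce_loss in zip(events, recon_losses, bce_losses):
        # Abnormal if either loss is greater than its given threshold.
        if recon_loss > recon_threshold or bce_loss > bce_threshold:
            rule_base.append(event)
            flagged.append(event)
    return flagged
```

Both thresholds are left as parameters because the text only states that they are selected according to actual conditions.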
This embodiment also provides a system for constructing a generative marketing event risk control rule base, which is used to implement the above construction method.
Application example
The method for constructing the generative marketing event risk control rule base reconstructs an event with the generator, so that the generated data is as similar to the original data as possible, and inputs the generated data and the original data into the discriminator, which judges whether the event is abnormal. If the original data is abnormal, the reconstructed data generated by the generator differs greatly from the original data, so the loss function of the discriminator becomes excessively large; in this way, for example, an abnormal amount for the same user can be identified. Clustering is then used to group risk events of this kind into a type that forms a risk rule, for example: the total amount transferred into red packets by the same user account within one hour exceeds XX yuan. The next time an event of this type exceeds XX yuan, it is judged to be abnormal.
Table 1 power marketing event data
The construction of the unsupervised generative cooperative learning model, shown in FIG. 2, is described further below and includes the following specific steps:
(1) Generator construction. Typically, the generator G is trained by minimizing the reconstruction loss $L_r$, which is defined as follows:
$$L_r = \frac{1}{b} \sum_{i=1}^{b} L_i^G \qquad (1)$$
$$L_i^G = \left\| x_i - \hat{x}_i \right\|_2 \qquad (2)$$
where $x_i$ is the feature vector input to the generator, $\hat{x}_i$ is the corresponding reconstructed feature vector, q is the data size, and b is the batch size. $L_i^G$ is the Euclidean distance between the reconstructed vector generated by the generator and the input feature vector, and serves as the reconstruction loss of the instance; $L_r$ is the sum of the $L_i^G$ divided by the batch size b, and serves as the reconstruction loss of the generator. By minimizing $L_r$, the generator is trained to generate reconstructed features that are similar to the input features.
(2) Generator pseudo-labels. Pseudo-labels are created from the generator to train the discriminator; they are assigned based on the reconstruction loss $L_i^G$ of each instance. The main idea is to treat feature vectors that produce a higher loss value as abnormal and feature vectors that produce a smaller loss value as normal. To achieve this, a suitable threshold $L_G^{th}$ is selected:
$$l_i^G = \begin{cases} 1, & L_i^G \ge L_G^{th} \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where $l_i^G$ is the pseudo-label generated by the generator for each instance, $L_i^G$ is the reconstruction loss of each instance, and $L_G^{th}$ is a suitable threshold selected according to the specific situation.
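Equations (1) to (3) can be illustrated with a short sketch. The function names are hypothetical, and `mean_plus_std_threshold` is only one assumed way of choosing the threshold; the text itself leaves the choice to the specific situation.

```python
import math
import statistics


def instance_losses(inputs, reconstructions):
    """Equation (2): Euclidean distance between each input and its reconstruction."""
    return [math.dist(x, x_hat) for x, x_hat in zip(inputs, reconstructions)]


def batch_loss(losses):
    """Equation (1): per-instance losses averaged over the batch size b."""
    return sum(losses) / len(losses)


def generator_pseudo_labels(losses, threshold):
    """Equation (3): 1 (abnormal) if the loss reaches the threshold, else 0 (normal)."""
    return [1 if loss >= threshold else 0 for loss in losses]


def mean_plus_std_threshold(losses, k=1.0):
    """Assumed heuristic: mean loss plus k population standard deviations."""
    return statistics.fmean(losses) + k * statistics.pstdev(losses)
```

For a batch in which one instance reconstructs much worse than the rest, only that instance receives pseudo-label 1 under either a fixed threshold or the heuristic one.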
(3) Discriminator construction. A unified learning network (USMD) model is used as the detection function, as shown in FIG. 3, and the USMD model uses the isolation forest algorithm (iForest) as its anomaly detection method. iForest is an unsupervised anomaly detection method suitable for continuous data. It separates normal data from abnormal data by isolating the abnormal data: the dataset is recursively and randomly partitioned until all sample points are isolated. Under this partitioning strategy, anomalous points generally have shorter paths. For each instance Q, where q is the size of the data, the average path length of a tree is first calculated as follows:
$$c(q) = 2H(q-1) - \frac{2(q-1)}{q} \qquad (4)$$
where H(i) is the harmonic number, which can be estimated as $\ln(i) + \gamma$, and $\gamma$ is the Euler-Mascheroni constant, approximately 0.5772156649.
The data of instance Q is passed through each tree (iTree) of the isolation forest, and its path length $h_k(Q)$ in the k-th tree is obtained. The average expected path length $E(h(Q))$ over the t trees can then be calculated:
$$E(h(Q)) = \frac{1}{t} \sum_{k=1}^{t} h_k(Q) \qquad (5)$$
The isolation forest score can then be calculated:
$$s(Q, q) = 2^{-\frac{E(h(Q))}{c(q)}} \qquad (6)$$
The resulting isolation forest score values are:
(7)
If the score s is close to 1, the instance is classified as abnormal; if s is much less than 1, it is considered normal. If s is close to 0.5 for all instances, the sample as a whole has no obvious anomalies and can be considered normal. Therefore, s = 0.5 is used as the threshold for judging data anomalies.
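The score computation can be checked numerically with the sketch below. It assumes the standard isolation forest formulas, namely $c(q) = 2H(q-1) - 2(q-1)/q$ with $H(i) \approx \ln(i) + \gamma$ and $s = 2^{-E(h(Q))/c(q)}$; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def harmonic(i):
    """Estimate of the harmonic number H(i) as ln(i) + gamma."""
    return math.log(i) + EULER_GAMMA


def average_tree_path_length(q):
    """c(q): average path length of a tree built on q data points (q >= 2)."""
    if q < 2:
        raise ValueError("q must be at least 2")
    return 2.0 * harmonic(q - 1) - 2.0 * (q - 1) / q


def isolation_score(path_lengths, q):
    """Score s from the path lengths of one instance across all trees."""
    expected = sum(path_lengths) / len(path_lengths)   # E(h(Q))
    return 2.0 ** (-expected / average_tree_path_length(q))


def is_abnormal(score, threshold=0.5):
    """Apply the s = 0.5 threshold stated in the text."""
    return score > threshold
```

With q = 256, c(q) is about 10.24; an instance whose average path length equals c(q) scores exactly 0.5, and shorter paths push the score toward 1.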
The following algorithm flow implements the USMD:
The procedure above gives the steps for constructing the USMD. Step 1 uses orthogonal initialization, after which stochastic gradient descent optimization is performed to train the network. Step 5 shows the time-dependent network; the attention unit is shown in steps 6 to 8; in step 9 the time-dependent network maps the unified representation back to the original input space. Step 11 performs a gradient descent step for optimization. Step 18 sets the tree height limit based on the sub-sample size, which approximates the average tree height; this focuses attention on observations whose path lengths are shorter than the average, since they are more likely to be anomalous. Steps 19 to 23 show the stages of constructing the isolation forest, which finally returns a set of trees; this gives the output of the USMD, that is, the output of the discriminator.
The loss function of the discriminator is then calculated. The generated reconstruction data and the real sample data are input into the discriminator. The discriminator loss is computed from the pseudo-labels generated by the generator G, and the discriminator is trained by minimizing the binary cross-entropy loss $L_D$ over a batch of size b. The loss function is defined as follows:
$$L_D = -\frac{1}{b}\sum_{i=1}^{b}\left[ l_G^i \ln \hat{l}_D^i + (1 - l_G^i)\ln(1 - \hat{l}_D^i) \right]$$ (8)
where $l_G^i \in \{0, 1\}$ is the pseudo-label generated by the generator, and $\hat{l}_D^i$ is the output of the discriminator when the input feature vector is $x_i$.
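As a rough illustration of the discriminator loss, the sketch below computes the binary cross-entropy over one batch from the generator's pseudo-labels and the discriminator's outputs. The function name and the clipping constant `eps` are assumptions added for numerical safety; they are not part of the source.

```python
import math


def discriminator_bce_loss(pseudo_labels, outputs, eps=1e-12):
    """Mean binary cross-entropy over a batch.

    pseudo_labels: 0/1 labels produced by the generator.
    outputs: discriminator outputs in (0, 1), one per feature vector.
    """
    total = 0.0
    for label, prob in zip(pseudo_labels, outputs):
        prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
        total += label * math.log(prob) + (1 - label) * math.log(1.0 - prob)
    return -total / len(pseudo_labels)
```

The loss falls as the discriminator's outputs move toward the pseudo-labels, which is what minimizing it during training is meant to achieve.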
(4) Generating the pseudo-labels of the discriminator. The pseudo-labels of the discriminator are used to improve the reconstruction discrimination capability of the generator. The output $\hat{l}_D^i$ of the discriminator is the probability that the feature vector $x_i$ is abnormal. Feature vectors with a higher output probability are therefore regarded as abnormal using a threshold mechanism. The pseudo-labels generated by the discriminator are then used to fine-tune the generator in the next iteration.
$$l_D^i = \begin{cases} 1, & \hat{l}_D^i > th_D \\ 0, & \text{otherwise} \end{cases}$$ (9)
where $l_D^i$ is the pseudo-label generated by the discriminator, $\hat{l}_D^i$ is the output of the discriminator, and $th_D$ is the selected threshold.
(5) Negative learning construction step of the generator. The generator is trained with negative learning (NL), using the pseudo-labels from the discriminator to increase the difference between the reconstructions of normal and abnormal inputs. The generator is encouraged to produce label-flipped reconstructions for samples with abnormal pseudo-labels, while samples with normal pseudo-labels are reconstructed as usual with minimal error. Modifying equation (1) gives a loss function that includes negative learning:
$$L_G = \frac{1}{b}\sum_{i=1}^{b}\left\| t_i - G(x_i) \right\|_2^2$$ (10)
$$t_i = \begin{cases} x_i, & l_D^i = 0 \\ \mathbf{1}, & l_D^i = 1 \end{cases}$$ (11)
where $t_i$ is the reconstruction target vector, $x_i$ is the input feature vector, $G(x_i)$ is its reconstruction by the generator, $l_D^i$ is the pseudo-label from the discriminator, $\mathbf{1}$ is the all-ones vector, and $L_G$ is the loss function calculated with the target vector. The generator is trained and optimized by minimizing this loss function.
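The pseudo-label thresholding and the negative learning target can be sketched together as follows. This is a hedged illustration under stated assumptions: the all-ones replacement follows the description of negative learning in claim 8, the loss is taken as the batch mean of squared errors, and every function name here is hypothetical.

```python
def pseudo_labels(disc_outputs, threshold):
    """Label a feature vector abnormal (1) when the discriminator output exceeds the threshold."""
    return [1 if prob > threshold else 0 for prob in disc_outputs]


def negative_learning_targets(inputs, labels):
    """Keep the original vector for normal samples; use an all-ones vector for abnormal ones."""
    return [
        [1.0] * len(x) if label == 1 else list(x)
        for x, label in zip(inputs, labels)
    ]


def generator_loss(targets, reconstructions):
    """Batch mean of the squared L2 distance between targets and reconstructions."""
    total = 0.0
    for target, recon in zip(targets, reconstructions):
        total += sum((t - r) ** 2 for t, r in zip(target, recon))
    return total / len(targets)
```

With these targets, a generator that faithfully reconstructs an abnormal input is penalized, so over the alternating iterations its reconstruction error on abnormal samples grows relative to normal ones.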
The above steps are performed cyclically, training the generator and the discriminator alternately until a predetermined number of training iterations or a stop condition is reached. In each iteration, the generator produces reconstructed samples, from which pseudo-labels are created for training the discriminator. The trained discriminator then creates pseudo-labels to improve the training of the generator. By alternately training the generator and the discriminator, the whole system gradually learns the distribution and characteristics of the data, so as to detect and reconstruct abnormal samples.
(6) Constructing the risk rule base. The GCL model is used to virtually generate and identify marketing risk events. If a marketing risk event is judged to be abnormal, it is added to the risk rule base so that the rule base is updated in real time, and marketing risk events of the same type are grouped into one class using the K-Means clustering method to form the risk rule base. The specific steps are as follows:
(1) The risk event data is input into the GCL model.
(2) Either the reconstruction loss function of the generator or the loss function of the discriminator is selected for calculation.
(3) Whether the marketing risk event is abnormal is judged by comparing the loss with a set threshold. If the loss is greater than the given threshold, the marketing risk event is judged to be abnormal.
(4) If the risk event is judged to be abnormal, it is added to the risk rule base, which is thereby updated in real time.
(5) The risk rule base stores the risk events judged to be abnormal; the K-Means clustering algorithm is used to cluster them, and risk rules are formed from the clustering result.
In this way, risk events such as a sudden increase in account points can be grouped into one class by the clustering algorithm, and when the same situation is encountered again, the user's abnormality can be identified as a sudden point increase. The risk rule base is constructed by this method.
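The rule-base steps above can be sketched as a threshold filter followed by clustering. This is only an illustrative sketch: the event feature vectors and loss values are made up, the helper names are hypothetical, and the hand-written Lloyd's iteration stands in for whatever K-Means implementation is actually used.

```python
import random


def flag_abnormal(events, losses, threshold):
    """Keep only the events whose loss is greater than the given threshold."""
    return [event for event, loss in zip(events, losses) if loss > threshold]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest center, then recompute centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def build_rule_base(events, losses, threshold, k):
    """Group abnormal events by cluster; each group is a candidate risk rule."""
    abnormal = flag_abnormal(events, losses, threshold)
    labels, _ = kmeans(abnormal, k)
    rule_base = {}
    for event, label in zip(abnormal, labels):
        rule_base.setdefault(label, []).append(event)
    return rule_base
```

Each cluster in the returned dictionary collects abnormal events of a similar kind, for example sudden point increases, so a new event that falls near that cluster can be attributed to the same risk type.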

Claims (10)

1. The method for constructing the wind control rule base of the generated marketing event is characterized by comprising the following steps of:
step 1), data preparation, collection and preparation of a marketing risk event sample dataset for training and testing;
step 2), constructing a generator, wherein the generator receives marketing risk event sample data as input and generates a marketing risk event reconstruction sample according to the marketing risk event sample;
step 3), constructing a discriminator for distinguishing a real marketing risk event sample from a marketing risk event reconstruction sample generated by the generator;
step 4), defining a loss function, generating a vivid reconstruction sample by using a generator by minimizing the loss functions of the generator and the discriminator, and enabling the discriminator to have better discrimination capability;
step 5), training an unsupervised generation cooperative learning model through negative learning feedback, wherein the unsupervised generation cooperative learning model comprises a generator and a discriminator; in the training process, the training generator and the discriminator are alternated, firstly, the generator is fixed, and parameters of the discriminator are updated by minimizing the loss of the discriminator; then, fixing the discriminant, and updating parameters of the generator by minimizing loss of the generator; the performance of the generator is improved through a negative learning feedback mechanism until the generator and the discriminator reach a convergence state;
step 6), risk event identification, namely carrying out identification of marketing risk events by using a trained unsupervised generation collaborative learning model;
and 7) writing the risk rule base, writing the data which is judged to be the abnormal risk event into the risk rule base, and updating the risk rule base in real time.
2. The method of claim 1, wherein in step 2), the generator is an automatic encoder comprising an encoder and a decoder, the encoder mapping the input data to a low-dimensional representation in the potential space, and the decoder mapping the low-dimensional representation in the potential space back to the original data space.
3. The method for constructing a wind-control rule base for generated marketing events according to claim 2, wherein in the step 2), the training process of the generator is as follows:
1) Inputting real marketing risk event sample data;
2) The encoder converts the actual marketing risk event sample data into potential representation vectors;
3) The decoder decodes the potential representation vector into marketing risk event reconstruction samples;
4) Calculating the difference between the reconstructed sample and the real sample, namely the reconstruction loss;
5) The parameters of the generator are updated using a back-propagation algorithm to minimize reconstruction losses, bringing the reconstructed samples closer to the real samples.
4. The method for constructing a wind-control rule base for generated marketing events according to claim 1, wherein in the step 3), the discriminator adopts a model for detecting system anomalies, called a unified learning network, and in the anomaly behavior detection, the discriminator uses an isolated forest algorithm as an anomaly detection method; the arbiter accepts a sample as input and outputs a scalar value representing whether the input sample is a true sample or a reconstructed sample.
5. The method for constructing a wind-control rule base for generated marketing events according to claim 4, wherein in the step 3), the isolated forest algorithm is a tree-based unsupervised anomaly detection algorithm for rapidly identifying anomaly points in the reconstructed samples; the specific steps of the isolated forest algorithm are as follows:
1) Constructing an isolation tree: each isolation tree is constructed by randomly selecting characteristics and segmentation values; firstly, randomly selecting a feature from an input feature set; then randomly selecting a segmentation value from the value range of the selected feature; then placing the samples smaller than or equal to the segmentation value in the data set into a left subset, and placing the samples larger than the segmentation value into a right subset; finally, respectively carrying out recursion operation on the left subset and the right subset, and continuing to select the characteristics and the segmentation values until the stopping condition is met;
2) Constructing an isolation forest: repeating the step 1), constructing a plurality of isolation trees, and forming an isolation forest;
3) Calculating an anomaly score for the sample: for each sample, calculating its anomaly score in the isolated forest through the path length of the tree, the path length representing the number of edges traversed from the root node to the terminal node;
4) Setting a threshold value and identifying an abnormal point: the samples are divided into normal points and abnormal points according to the abnormality score and a preset threshold value.
6. The method for constructing the wind control rule base for the generated marketing event according to claim 4, wherein the unified learning network specifically comprises:
the unified learning network uses the combination of long-term memory and attention units to perform unified representation on input data, and then uses an isolated forest algorithm as an error detector to perform discrimination.
7. The method for building a wind-control rule base for generated marketing events according to claim 1, wherein in the step 4),
the loss function of the generator is a reconstruction loss function; the reconstruction loss function is the objective function used in the automatic encoder and is calculated using a mean square error loss; the loss function of the discriminator measures its classification performance using a binary cross-entropy loss function.
8. The method for constructing a wind-control rule base for generated marketing events according to claim 1, wherein in the step 5), the negative learning is specifically described as follows:
negative learning is a feedback mechanism from the discriminator to the generator; during negative learning, the generator is encouraged to produce label-flipped reconstructions for samples with abnormal pseudo-labels, while samples with normal pseudo-labels are reconstructed normally with minimal error; an original vector judged to be abnormal is replaced by an all-ones vector, and the original vector is used if it is judged to be normal.
9. The method for constructing a wind-control rule base for generated marketing events according to claim 1, wherein in the step 6), the risk event is identified as follows:
inputting data, preprocessing the data, and inputting it into the unsupervised generation collaborative learning model; selecting a suitable threshold according to actual conditions; and judging the event to be abnormal when the reconstruction loss function of the generator is greater than the given threshold or the binary cross-entropy loss function of the discriminator is greater than the given threshold.
10. A system for constructing a wind control rule base of a generated marketing event, which is used for realizing the method for constructing the wind control rule base of the generated marketing event according to any one of claims 1 to 9.
CN202311417507.2A 2023-10-30 2023-10-30 Construction method and system of wind control rule base of generated marketing event Pending CN117151768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311417507.2A CN117151768A (en) 2023-10-30 2023-10-30 Construction method and system of wind control rule base of generated marketing event

Publications (1)

Publication Number Publication Date
CN117151768A true CN117151768A (en) 2023-12-01

Family

ID=88884790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311417507.2A Pending CN117151768A (en) 2023-10-30 2023-10-30 Construction method and system of wind control rule base of generated marketing event

Country Status (1)

Country Link
CN (1) CN117151768A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543943A (en) * 2018-10-17 2019-03-29 国网辽宁省电力有限公司电力科学研究院 A kind of electricity price inspection execution method based on big data deep learning
CN110210686A (en) * 2019-06-13 2019-09-06 郑州轻工业学院 A kind of electricity charge risk model construction method of electric power big data
CN111275480A (en) * 2020-01-07 2020-06-12 成都信息工程大学 Multi-dimensional sparse sales data warehouse oriented fraud behavior mining method
CN111582651A (en) * 2020-04-09 2020-08-25 上海淇毓信息科技有限公司 User risk analysis model training method and device and electronic equipment
CN112199670A (en) * 2020-09-30 2021-01-08 西安理工大学 Log monitoring method for improving IFOREST (entry face detection sequence) to conduct abnormity detection based on deep learning
WO2021258348A1 (en) * 2020-06-24 2021-12-30 深圳市欢太科技有限公司 Abnormal flow detection method and system and computer storage medium
CN114707571A (en) * 2022-02-24 2022-07-05 南京审计大学 Credit data anomaly detection method based on enhanced isolation forest
CN116128544A (en) * 2022-12-20 2023-05-16 烟台海颐软件股份有限公司 Active auditing method and system for electric power marketing abnormal business data
CN116307281A (en) * 2023-05-18 2023-06-23 深圳市迪博企业风险管理技术有限公司 Enterprise operation abnormity early warning method for generating countermeasure network based on time sequence
CN116541782A (en) * 2023-04-26 2023-08-04 国网新疆电力有限公司哈密供电公司 Power marketing data anomaly identification method

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543943A (en) * 2018-10-17 2019-03-29 国网辽宁省电力有限公司电力科学研究院 A kind of electricity price inspection execution method based on big data deep learning
CN110210686A (en) * 2019-06-13 2019-09-06 郑州轻工业学院 A kind of electricity charge risk model construction method of electric power big data
CN111275480A (en) * 2020-01-07 2020-06-12 成都信息工程大学 Multi-dimensional sparse sales data warehouse oriented fraud behavior mining method
CN111582651A (en) * 2020-04-09 2020-08-25 上海淇毓信息科技有限公司 User risk analysis model training method and device and electronic equipment
WO2021258348A1 (en) * 2020-06-24 2021-12-30 深圳市欢太科技有限公司 Abnormal flow detection method and system and computer storage medium
CN115606162A (en) * 2020-06-24 2023-01-13 深圳市欢太科技有限公司(Cn) Abnormal flow detection method and system, and computer storage medium
CN112199670A (en) * 2020-09-30 2021-01-08 西安理工大学 Log monitoring method for improving IFOREST (entry face detection sequence) to conduct abnormity detection based on deep learning
CN114707571A (en) * 2022-02-24 2022-07-05 南京审计大学 Credit data anomaly detection method based on enhanced isolation forest
CN116128544A (en) * 2022-12-20 2023-05-16 烟台海颐软件股份有限公司 Active auditing method and system for electric power marketing abnormal business data
CN116541782A (en) * 2023-04-26 2023-08-04 国网新疆电力有限公司哈密供电公司 Power marketing data anomaly identification method
CN116307281A (en) * 2023-05-18 2023-06-23 深圳市迪博企业风险管理技术有限公司 Enterprise operation abnormity early warning method for generating countermeasure network based on time sequence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
1. EZGI GURSEL: "Using artificial intelligence to detect human errors in nuclear power plants: A case in operation and maintenance", 《NUCLEAR ENGINEERING AND TECHNOLOGY》 *
陈杰: "基于改进生成式对抗网络的电网异常数据辨识方法", 《电力建设》, vol. 42, no. 5 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20231201
