CN112232948A - Method and device for detecting abnormality of flow data - Google Patents

Method and device for detecting abnormality of flow data

Info

Publication number
CN112232948A
CN112232948A (application CN202011205158.4A)
Authority
CN
China
Prior art keywords
flow data
vector
network model
inputting
reconstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011205158.4A
Other languages
Chinese (zh)
Inventor
柳毅
郭三田
凌捷
罗玉
陈家辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202011205158.4A
Publication of CN112232948A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/03 Credit; Loans; Processing thereof
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Abstract

The invention discloses a method and a device for detecting anomalies in traffic data, together with an electronic device and a storage medium. The method comprises the following steps: inputting preprocessed traffic data into an automatic coding network model, which automatically encodes and decodes the data to obtain a reconstructed feature vector; inputting the reconstructed feature vector into a recurrent neural network model to obtain internal feature information, where the internal feature information characterizes the contextual correlation of the internal features of the traffic data; and finally classifying the internal feature information with a Sigmoid function to obtain a binary classification result indicating whether the traffic data is abnormal, thereby improving the accuracy of anomaly detection.

Description

Method and device for detecting abnormality of flow data
Technical Field
The present invention relates to the field of neural network technologies, and in particular, to a method and an apparatus for detecting an anomaly of traffic data, an electronic device, and a storage medium.
Background
Nowadays, with the popularization of WeChat Pay, Alipay and various credit cards, more and more people choose convenient and fast online payment, and transaction fraud in the financial field has grown accordingly. For example, a fraudster may obtain a credit card from an issuing bank with stolen identity information, or bind a forged card for shopping and subsequent cash-out. Such behavior causes economic losses to financial institutions and seriously damages their reputation and image, so effectively detecting transactions that are likely to be fraudulent has become a key concern of banking and financial institutions.
For fraud prevention and detection, the traditional approach is rule-based: rules are formulated manually by experts and then used to label transaction data. This approach lacks flexibility, so lawbreakers can easily bypass the formulated rules and exploit platform vulnerabilities to carry out various frauds, causing huge losses to banks and to customers of financial products.
Existing systems mostly rely on risk-control architectures and big-data-based credit assessment, which usually only perform passive post-hoc analysis of problematic transactions and cannot recover the resulting losses. Moreover, risk examination is generally done manually, and manual review is uncertain and unstable, so it cannot keep pace with the ever-growing volume of financial transaction data in the market.
In addition, most of these systems feed the preprocessed raw data directly into a learning algorithm to build an anti-fraud detection model, such as the technical scheme disclosed in the Chinese patent application "A fraud detection model training method and apparatus and fraud detection method and apparatus" (publication date 2019.03.01, publication No. CN109410036A). The drawback of this approach is that the raw data must undergo manual data cleaning, feature selection or dimensionality reduction before being used for training; because of subjective human factors, deep-level features of the original data may be filtered out or lost during feature selection, which affects the recognition of the subsequent model and makes the detection result insufficiently accurate.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method and a device for detecting anomalies in traffic data, an electronic device and a storage medium, which can improve the accuracy of anomaly detection.
In order to solve the above technical problems, the technical scheme of the invention is as follows:
A first aspect of the embodiments of the present invention discloses an anomaly detection method for traffic data, comprising the following steps:
S1: inputting preprocessed traffic data into an automatic coding network model to automatically encode and decode the traffic data to obtain a reconstructed feature vector;
S2: inputting the reconstructed feature vector into a recurrent neural network model to obtain internal feature information, wherein the internal feature information characterizes the contextual correlation of the internal features of the traffic data;
S3: classifying the internal feature information through a Sigmoid function to obtain a binary classification result, wherein the binary classification result indicates whether the traffic data is abnormal.
Further, the automatic coding network model comprises an encoder and a decoder, and step S1 comprises:
S1.1: inputting the preprocessed traffic data into the encoder so that the encoder maps the traffic data into a low-dimensional intermediate vector;
S1.2: inputting the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain the reconstructed feature vector.
Further, step S1.2 comprises:
inputting the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain a candidate reconstructed feature vector; and, if the closeness between the candidate reconstructed feature vector and the traffic data reaches a specified condition, adjusting the candidate reconstructed feature vector through a reshape function to obtain a reconstructed feature vector that can be input into the recurrent neural network model.
Further, in step S3 the binary classification result is specifically 1 or 0: a result of 1 indicates that the traffic data is abnormal, and a result of 0 indicates that the traffic data is not abnormal.
A second aspect of the present invention discloses an anomaly detection device for traffic data, comprising:
a reconstruction unit, configured to input preprocessed traffic data into an automatic coding network model so as to automatically encode and decode the traffic data to obtain a reconstructed feature vector;
a feature acquisition unit, configured to input the reconstructed feature vector into a recurrent neural network model to obtain internal feature information, wherein the internal feature information characterizes the contextual correlation of the internal features of the traffic data;
a classification unit, configured to classify the internal feature information through a Sigmoid function to obtain a binary classification result, wherein the binary classification result indicates whether the traffic data is abnormal.
Further, the automatic coding network model comprises an encoder and a decoder, and the reconstruction unit comprises:
a dimensionality reduction module, configured to input the preprocessed traffic data into the encoder so that the encoder maps the traffic data into a low-dimensional intermediate vector;
a reconstruction module, configured to input the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain the reconstructed feature vector.
Further, the reconstruction module is specifically configured to input the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain a candidate reconstructed feature vector, and, if the closeness between the candidate reconstructed feature vector and the traffic data reaches a specified condition, to adjust the candidate reconstructed feature vector through a reshape function to obtain a reconstructed feature vector that can be input into the recurrent neural network model.
Further, the binary classification result is specifically 1 or 0: a result of 1 indicates that the traffic data is abnormal, and a result of 0 indicates that the traffic data is not abnormal.
A third aspect of an embodiment of the present invention discloses an electronic device, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the anomaly detection method for traffic data disclosed in the first aspect of the embodiments of the present invention.
A fourth aspect of the present invention discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the anomaly detection method for traffic data disclosed in the first aspect. The computer-readable storage medium includes a ROM/RAM, a magnetic disk, an optical disk, or the like.
A fifth aspect of the embodiments of the present invention discloses a computer program product, which, when running on a computer, causes the computer to perform part or all of the steps of any one of the methods of the first aspect.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention discloses an anomaly detection method and device for traffic data, an electronic device and a storage medium. Preprocessed traffic data is input into an automatic coding network model, which automatically encodes and decodes it to obtain a reconstructed feature vector; the reconstructed feature vector is then input into a recurrent neural network model to obtain internal feature information that characterizes the contextual correlation of the internal features of the traffic data; finally, the internal feature information is classified by a Sigmoid function into a binary result indicating whether the traffic data is abnormal. In this way, complex high-dimensional data can be mapped into a low-dimensional vector by the automatic coding network model while the completeness of the data is preserved, and the contextual correlation of the internal features of the traffic data is deeply mined to obtain its implicit sequence features, so that the characteristics of the data are better exploited and the accuracy of anomaly detection is improved. Meanwhile, because a neural network algorithm is used for detection, the efficiency of anomaly detection is also greatly improved.
Drawings
Fig. 1 is a flowchart of an anomaly detection method for traffic data according to an embodiment of the present invention.
Fig. 2 is a schematic network structure diagram of a BiLSTM network model disclosed in the embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an anomaly detection device for traffic data according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Wherein: 301. a reconstruction unit; 302. a feature acquisition unit; 303. a classification unit; 401. a memory; 402. a processor.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent. The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
Example 1
As shown in fig. 1, the present embodiment provides an anomaly detection method for traffic data, comprising the following steps:
S1: inputting the preprocessed traffic data into an automatic coding network model to automatically encode and decode the traffic data and obtain a reconstructed feature vector.
The traffic data may include, but is not limited to, financial transaction data, code traffic data, and the like.
It should be noted that, in an application scenario in which the traffic data includes financial transaction data, the anomaly detection method disclosed in the embodiment of the present invention, compared with a conventional big-data-based credit evaluation system, maps complex high-dimensional data into a low-dimensional vector through the automatic coding network model while preserving data integrity, and deeply mines the contextual correlation of the internal features of the financial transaction data to obtain its implicit sequence features. The characteristics of the data can therefore be better exploited, abnormal financial transactions can be detected more accurately, fraud can be identified and financial transaction risk reduced; at the same time, the efficiency of the neural network algorithm allows abnormal financial transaction data to be detected more quickly.
The automatic coding network model includes, but is not limited to, an Encoder network model or a denoising autoencoder (DAE). The automatic coding network model is obtained by neural network training: given a neural network whose output is required to equal its input, training the network adjusts its parameters and yields the weights of each layer, and once these weights are obtained the automatic coding network model is trained. An automatic coding network model is a neural network that reproduces its input signal as faithfully as possible. To achieve this reproduction, the model must capture the most important factors of the input signal and find the principal components that represent the original information, which is the encoding process; it then decodes from these principal components, i.e. reconstructs the input signal.
Accordingly, when the preprocessed traffic data is input into the automatic coding network model, the model automatically extracts features from the traffic data according to the per-layer weights adjusted during training, so that several different representations of the traffic data are obtained (one per layer). The representation of any layer is a principal component of the traffic data; producing it is the encoding process, and decoding from the representation of any layer reconstructs the traffic data, reproducing a reconstructed feature vector that is very close to the original traffic data.
Optionally, before step S1, the following step may further be included:
preprocessing the acquired traffic data, including but not limited to cleaning and normalizing missing values, duplicate values and outliers in the traffic data, and resampling the positive and negative samples when the class distribution of the traffic data is unbalanced. Augmenting the data with the synthetic minority oversampling technique (SMOTE) can further improve detection accuracy.
Optionally, the automatic coding network model comprises an encoder and a decoder, and step S1 may include:
S1.1: inputting the preprocessed traffic data into the encoder so that the encoder maps the traffic data into a low-dimensional intermediate vector.
Assume the preprocessed traffic data is x. After x is input into the automatic coding network model, the encoder reduces the dimensionality of x using formula (1) below, i.e. it maps x into a low-dimensional intermediate vector y whose dimension is lower than that of the traffic data:
y = σ(Wx + b)   (1)
where σ denotes a nonlinear activation function, W denotes the encoding weight matrix, and b denotes the encoding bias.
S1.2: inputting the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain the reconstructed feature vector.
After the low-dimensional intermediate vector y is input into the decoder of the automatic coding network model, the decoder decodes and reconstructs it using formula (2) below to obtain a reconstructed feature vector Z that retains most of the key information:
Z = σ(W'y + b')   (2)
where W' denotes the decoding weight matrix and b' denotes the decoding bias.
Further optionally, step S1.2 may comprise: inputting the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain a candidate reconstructed feature vector; and, if the closeness between the candidate reconstructed feature vector and the traffic data reaches a specified condition, adjusting the candidate reconstructed feature vector with a reshape function to obtain a reconstructed feature vector that can be input into the recurrent neural network model.
The specified condition can be set according to the actual situation. The closeness between the candidate reconstructed feature vector and the traffic data reflects how close the key information retained by the candidate vector is to the key information of the traffic data. When the closeness reaches the specified condition, the candidate reconstructed feature vector retains enough key information, i.e. it preserves the main features of the traffic data well, and its dimensions can then be adjusted to obtain a reconstructed feature vector whose shape matches the recurrent neural network model; the dimension of the reconstructed feature vector is smaller than that of the traffic data. In this way the traffic data is compressed and its dimensionality reduced (the intermediate layer is smaller than the input layer), and the compressed reconstructed feature vector captures the most representative and meaningful salient features of the traffic data.
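A minimal sketch of such an automatic coding network model, written here in PyTorch purely for illustration (the patent does not prescribe a framework), could implement formulas (1) and (2) together with the reshape adjustment just described; the layer sizes and the timestep split are assumptions:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())  # y = σ(Wx + b)
        self.decoder = nn.Sequential(nn.Linear(hid_dim, in_dim), nn.Sigmoid())  # Z = σ(W'y + b')

    def forward(self, x):
        y = self.encoder(x)   # low-dimensional intermediate vector
        z = self.decoder(y)   # candidate reconstructed feature vector
        return y, z

# Reshape the retained reconstruction into a 3-D tensor (batch, timesteps, features)
# so that it can be fed to the recurrent neural network model.
ae = AutoEncoder(in_dim=30, hid_dim=8)   # 30 input features is an assumed value
x = torch.rand(64, 30)                   # a batch of preprocessed traffic data (placeholder)
_, z = ae(x)
z_seq = z.reshape(64, 5, 6)              # 5 timesteps x 6 features per step (assumed split)
```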
Alternatively, the reconstructed feature vector may be a three-dimensional vector.
S2: inputting the reconstructed feature vector into the recurrent neural network model to obtain internal feature information, wherein the internal feature information characterizes the contextual correlation of the internal features of the traffic data.
The recurrent neural network model learns the implicit sequence features of the reconstructed feature vector so as to obtain internal feature information that characterizes the contextual correlation of the internal features of the traffic data. The recurrent neural network model remembers the output value of the network at the previous moment and uses it when generating the output value at the current moment; this is implemented by the recurrent layer of the network. Its input is the reconstructed feature vector: at each moment the model receives an input, weights it, and applies an activation function to obtain an output (i.e. the internal feature information). Because this output is jointly determined by the preceding sequence, i.e. the state value at the previous moment is combined with the input value at the current moment, the internal feature information can characterize the contextual correlation of the internal features of the traffic data.
Optionally, the recurrent neural network model includes, but is not limited to, a BiLSTM network model, a gated recurrent unit (GRU) network, or the like.
Optionally, if the automatic coding network model is an Encoder network model, the recurrent neural network model may be a BiLSTM network model. In that case, before step S1, a combined Encoder-BiLSTM model framework comprising an Encoder network model and a BiLSTM network model may be designed; an original data sample set for training is then acquired and the framework is trained to obtain the trained Encoder and BiLSTM network models. The Encoder network model automatically encodes and decodes the preprocessed traffic data and generates the reconstructed feature vector, and the BiLSTM network model learns the implicit sequence features of the reconstructed feature vector so as to obtain internal feature information that characterizes the contextual correlation of the internal features of the traffic data.
Alternatively, the original data sample set may be a set of personal financial transaction records comprising a number of financial transaction data samples. Before training the model, the raw data samples can be preprocessed. Specifically, considering the privacy of the financial transaction data, sensitive feature information may first be deleted; each remaining sample then contains feature information of several dimensions, such as the traded commodity, the transaction time, the transaction amount and the transaction location. Because the samples may contain missing, duplicate or abnormal values, it is further checked whether such values exist and irrelevant redundant information is removed. Next, because different features take values on different scales, the features of the financial transaction data samples are normalized and mapped to the [0, 1] interval, which makes the network easier to train and improves training efficiency.
Optionally, because the proportion of positive and negative sample classes in the original data sample set is usually extremely unbalanced, the financial transaction data samples can further be resampled with the synthetic minority oversampling technique (SMOTE), so that the model trained on them achieves better accuracy.
Optionally, in order to improve the accuracy of the Encoder network model, its parameters may be optimized with the mean squared error (MSE) function during training.
The mean squared error function is shown in formula (3):
MSE = (1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²   (3)
where y_i denotes the true output vector label, ŷ_i denotes the predicted output vector label used to mark whether the traffic data is abnormal, and n denotes the number of data samples in the original data sample set.
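As an illustrative sketch only (the patent does not disclose training code), optimizing an autoencoder of the kind sketched above against an MSE loss might look like this in PyTorch; the optimizer choice, learning rate, epoch count and the use of the input itself as the regression target (the usual autoencoder convention) are all assumptions:

```python
import torch
import torch.nn as nn

def train_autoencoder(ae: nn.Module, data: torch.Tensor,
                      epochs: int = 50, lr: float = 1e-3) -> nn.Module:
    # `ae` is assumed to return (intermediate_vector, reconstruction), as in the sketch above.
    criterion = nn.MSELoss()                              # mean squared error, formula (3)
    optimizer = torch.optim.Adam(ae.parameters(), lr=lr)  # Adam chosen here for convenience
    for _ in range(epochs):
        optimizer.zero_grad()
        _, z = ae(data)                 # reconstruction Z = σ(W'y + b')
        loss = criterion(z, data)       # mean squared reconstruction error
        loss.backward()
        optimizer.step()
    return ae
```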
Alternatively, the network structure of the BiLSTM network model may be as shown in fig. 2. The BiLSTM is an improved version of the LSTM and comprises two LSTM layers: a forward LSTM layer that processes the sequence from left to right and a backward LSTM layer that processes it from right to left. Each LSTM layer contains n LSTM neurons, shown as circles in fig. 2, and the weights are shared among the LSTM neurons of a layer. Each LSTM neuron internally contains a forget gate, an input gate and an output gate (not shown): the forget gate controls the elimination of redundant information, the input gate controls the retention of input information, and the output gate receives the information passed through the forget gate and the input gate, so that each LSTM neuron can filter the signal internally and then pass it to the next LSTM neuron. The BiLSTM network model can therefore analyze the data in both directions, store long-range dependencies among data features, and provide finer-grained computation.
As shown in fig. 2, X_0, X_1, X_{t-1}, X_t and X_n denote the input signals at times 0, 1, t-1, t and n respectively; each input is a data dimension matrix, i.e. a reconstructed feature vector. Correspondingly, y_0, y_1, y_{t-1}, y_t and y_n denote the output signals at those times, i.e. the internal feature information. Taking the signal processing at time t as an example, the input signal X_t and the previous backward output h_{t-1}^← pass through the backward LSTM neuron to produce h_t^←, while the input signal X_t and the previous forward output h_{t-1}^→ pass through the forward LSTM neuron to produce h_t^→. Finally, the output y_t of the BiLSTM network model at time t is obtained from h_t^→ and h_t^←, the activation function g, the output weight matrix U and the output bias c.
Specifically, optionally, the calculation process of the BiLSTM network model may be represented by the following formulas:
h_t^← = LSTM(X_t, h_{t-1}^←; W^←, b^←)   (4)
h_t^→ = LSTM(X_t, h_{t-1}^→; W^→, b^→)   (5)
y_t = g(U·[h_t^→, h_t^←] + c)   (6)
where LSTM(·) denotes the computation of an LSTM layer, h_t^← and h_t^→ denote the outputs of the backward and forward LSTM layers at time t, h_{t-1}^← and h_{t-1}^→ denote the outputs of the backward and forward LSTM layers at time t-1, W^← and W^→ denote the hidden-layer parameters of the backward and forward LSTM layers, X_t denotes the reconstructed feature vector, b^← and b^→ denote the bias values of the backward and forward LSTM layers, g denotes the Sigmoid activation function, U denotes the output weight matrix, c denotes the output bias, and y_t denotes the output of the BiLSTM network model at time t.
Optionally, in order to obtain a more accurate BiLSTM network model, a logarithmic loss function and the adaptive moment estimation (Adam) optimizer may be used during training to iteratively optimize the model until the desired model is obtained.
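Purely as an illustrative sketch (the patent does not disclose implementation code), a BiLSTM classifier with a Sigmoid output trained with a log loss (binary cross-entropy) and the Adam optimizer might look as follows in PyTorch; the hidden size, learning rate and the use of the last timestep's output are assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int = 32):
        super().__init__()
        # Bidirectional LSTM: forward and backward layers over the reconstructed sequence.
        self.bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Output layer y_t = g(U·[h_forward, h_backward] + c) with a Sigmoid activation.
        self.out = nn.Sequential(nn.Linear(2 * hidden_dim, 1), nn.Sigmoid())

    def forward(self, z_seq):                      # z_seq: (batch, timesteps, features)
        h, _ = self.bilstm(z_seq)                  # concatenated forward/backward outputs
        return self.out(h[:, -1, :]).squeeze(-1)   # probability that the traffic is abnormal

# Hypothetical training loop with log loss (BCE) and Adam, as described above.
model = BiLSTMClassifier(feat_dim=6)          # 6 matches the assumed reshape above
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

z_seq = torch.rand(64, 5, 6)                  # reconstructed feature vectors (placeholder data)
labels = torch.randint(0, 2, (64,)).float()   # 1 = abnormal, 0 = normal (placeholder labels)
for _ in range(20):
    optimizer.zero_grad()
    loss = criterion(model(z_seq), labels)
    loss.backward()
    optimizer.step()
```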
S3: and classifying the internal characteristic information through a Sigmoid function to obtain a binary classification result.
And the classification result is used for representing whether the flow data is abnormal or not.
The Sigmoid function is a common S-shaped function, also known as the S-type growth curve. Because it is monotonically increasing and its inverse function is also monotonically increasing, the Sigmoid function can be used as a threshold function of the neural network to map the internal feature information into the interval between 0 and 1, as calculated by formula (7):
S(z) = 1 / (1 + e^(-z))   (7)
where S(z) denotes the mapping result, e is the natural constant, and z denotes the internal feature information.
Therefore, step S3 may comprise: mapping the internal feature information with formula (7) to obtain a mapping result in the [0, 1] interval, and then judging whether the mapping result is biased toward 0 or toward 1 to obtain the binary classification result of the internal feature information. Optionally, the binary classification result may be 1 or 0: a result of 1 indicates a high risk, i.e. the traffic data is abnormal, and a result of 0 indicates a low risk, i.e. the traffic data is not abnormal.
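For illustration only, the decision in step S3 can be expressed as a simple threshold on the Sigmoid output; the 0.5 cut-off is an assumption, since the patent only states that the result is judged to be biased toward 0 or toward 1:

```python
import torch

def classify(probability: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # S(z) lies in [0, 1]; values biased toward 1 are labelled abnormal (1), otherwise normal (0).
    return (probability >= threshold).long()

# Example: probabilities produced by the BiLSTM + Sigmoid head sketched above.
print(classify(torch.tensor([0.91, 0.07, 0.55])))   # tensor([1, 0, 1])
```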
This embodiment provides an anomaly detection method for traffic data: preprocessed traffic data is input into an automatic coding network model, which automatically encodes and decodes it to obtain a reconstructed feature vector; the reconstructed feature vector is then input into a recurrent neural network model to obtain internal feature information that characterizes the contextual correlation of the internal features of the traffic data; finally, the internal feature information is classified by a Sigmoid function into a binary result indicating whether the traffic data is abnormal. In this way, complex high-dimensional data can be mapped into a low-dimensional vector by the automatic coding network model while the completeness of the data is preserved, and the contextual correlation of the internal features of the traffic data is deeply mined to obtain its implicit sequence features, so that the characteristics of the data are better exploited and the accuracy of anomaly detection is improved. Meanwhile, because a neural network algorithm is used for detection, the efficiency of anomaly detection is also greatly improved.
Example 2
As shown in fig. 3, this embodiment provides an anomaly detection device for traffic data, comprising a reconstruction unit 301, a feature acquisition unit 302 and a classification unit 303, wherein:
the reconstruction unit 301 is configured to input preprocessed traffic data into an automatic coding network model so as to automatically encode and decode the traffic data to obtain a reconstructed feature vector;
the feature acquisition unit 302 is configured to input the reconstructed feature vector into a recurrent neural network model to obtain internal feature information, wherein the internal feature information characterizes the contextual correlation of the internal features of the traffic data;
the classification unit 303 is configured to classify the internal feature information through a Sigmoid function to obtain a binary classification result, wherein the binary classification result indicates whether the traffic data is abnormal.
Optionally, the automatic coding network model comprises an encoder and a decoder, and the reconstruction unit 301 may include the following modules (not shown):
a dimensionality reduction module, configured to input the preprocessed traffic data into the encoder so that the encoder maps the traffic data into a low-dimensional intermediate vector;
a reconstruction module, configured to input the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain the reconstructed feature vector.
Optionally, the reconstruction module is specifically configured to input the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain a candidate reconstructed feature vector, and, if the closeness between the candidate reconstructed feature vector and the traffic data reaches a specified condition, to adjust the candidate reconstructed feature vector with a reshape function to obtain a reconstructed feature vector that can be input into the recurrent neural network model.
Optionally, the binary classification result is specifically 1 or 0: a result of 1 indicates that the traffic data is abnormal, and a result of 0 indicates that the traffic data is not abnormal.
This embodiment provides an anomaly detection device for traffic data. The device inputs preprocessed traffic data into an automatic coding network model, which automatically encodes and decodes it to obtain a reconstructed feature vector; it then inputs the reconstructed feature vector into a recurrent neural network model to obtain internal feature information that characterizes the contextual correlation of the internal features of the traffic data; finally, the internal feature information is classified by a Sigmoid function into a binary result indicating whether the traffic data is abnormal. In this way, complex high-dimensional data can be mapped into a low-dimensional vector by the automatic coding network model while the completeness of the data is preserved, and the contextual correlation of the internal features of the traffic data is deeply mined to obtain its implicit sequence features, so that the characteristics of the data are better exploited and the accuracy of anomaly detection is improved. Meanwhile, because a neural network algorithm is used for detection, the efficiency of anomaly detection is also greatly improved.
Example 3
As shown in fig. 4, the present embodiment provides an electronic device, including:
a memory 401 storing executable program code;
a processor 402 coupled with the memory 401;
the processor 402 calls the executable program code stored in the memory 401 to execute the anomaly detection method for traffic data described in the above embodiments.
It should be noted that the electronic device shown in fig. 4 may further include components that are not shown, such as a power supply module, input keys, a speaker module, a microphone, a camera module, a display screen, a battery module, an RF circuit, wireless communication modules (e.g. a mobile communication module, a Wi-Fi module and a Bluetooth module), sensor modules (e.g. a proximity sensor and a pressure sensor), and user interface modules (e.g. a charging interface, an external power supply interface, a card slot and a wired earphone interface), which are not described in detail in this embodiment.
An embodiment of the present application also discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the anomaly detection method for traffic data described in the above embodiments.
The embodiments of the present application also disclose a computer program product, wherein, when the computer program product runs on a computer, the computer is caused to execute part or all of the steps of the method as in the above method embodiments.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. An anomaly detection method for traffic data, characterized by comprising the following steps:
S1: inputting preprocessed traffic data into an automatic coding network model to automatically encode and decode the traffic data to obtain a reconstructed feature vector;
S2: inputting the reconstructed feature vector into a recurrent neural network model to obtain internal feature information, wherein the internal feature information characterizes the contextual correlation of the internal features of the traffic data;
S3: classifying the internal feature information through a Sigmoid function to obtain a binary classification result, wherein the binary classification result indicates whether the traffic data is abnormal.
2. The anomaly detection method for traffic data according to claim 1, wherein the automatic coding network model comprises an encoder and a decoder, and step S1 comprises:
S1.1: inputting the preprocessed traffic data into the encoder so that the encoder maps the traffic data into a low-dimensional intermediate vector;
S1.2: inputting the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain the reconstructed feature vector.
3. The anomaly detection method for traffic data according to claim 2, wherein step S1.2 comprises:
inputting the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain a candidate reconstructed feature vector; and, if the closeness between the candidate reconstructed feature vector and the traffic data reaches a specified condition, adjusting the candidate reconstructed feature vector through a reshape function to obtain a reconstructed feature vector that can be input into the recurrent neural network model.
4. The anomaly detection method for traffic data according to any one of claims 1 to 3, wherein in step S3 the binary classification result is specifically 1 or 0, a result of 1 indicating that the traffic data is abnormal and a result of 0 indicating that the traffic data is not abnormal.
5. An anomaly detection device for traffic data, characterized by comprising:
a reconstruction unit, configured to input preprocessed traffic data into an automatic coding network model so as to automatically encode and decode the traffic data to obtain a reconstructed feature vector;
a feature acquisition unit, configured to input the reconstructed feature vector into a recurrent neural network model to obtain internal feature information, wherein the internal feature information characterizes the contextual correlation of the internal features of the traffic data;
a classification unit, configured to classify the internal feature information through a Sigmoid function to obtain a binary classification result, wherein the binary classification result indicates whether the traffic data is abnormal.
6. The anomaly detection device for traffic data according to claim 5, wherein the automatic coding network model comprises an encoder and a decoder, and the reconstruction unit comprises:
a dimensionality reduction module, configured to input the preprocessed traffic data into the encoder so that the encoder maps the traffic data into a low-dimensional intermediate vector;
a reconstruction module, configured to input the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain the reconstructed feature vector.
7. The anomaly detection device for traffic data according to claim 6, wherein the reconstruction module is specifically configured to input the low-dimensional intermediate vector into the decoder so that the decoder decodes and reconstructs it to obtain a candidate reconstructed feature vector, and, if the closeness between the candidate reconstructed feature vector and the traffic data reaches a specified condition, to adjust the candidate reconstructed feature vector through a reshape function to obtain a reconstructed feature vector that can be input into the recurrent neural network model.
8. The anomaly detection device for traffic data according to any one of claims 5 to 7, wherein the binary classification result is specifically 1 or 0, a result of 1 indicating that the traffic data is abnormal and a result of 0 indicating that the traffic data is not abnormal.
9. An electronic device, comprising:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the anomaly detection method for traffic data according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program causes a computer to execute the anomaly detection method for traffic data according to any one of claims 1 to 4.
CN202011205158.4A 2020-11-02 2020-11-02 Method and device for detecting abnormality of flow data Pending CN112232948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011205158.4A CN112232948A (en) 2020-11-02 2020-11-02 Method and device for detecting abnormality of flow data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011205158.4A CN112232948A (en) 2020-11-02 2020-11-02 Method and device for detecting abnormality of flow data

Publications (1)

Publication Number Publication Date
CN112232948A true CN112232948A (en) 2021-01-15

Family

ID=74122439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011205158.4A Pending CN112232948A (en) 2020-11-02 2020-11-02 Method and device for detecting abnormality of flow data

Country Status (1)

Country Link
CN (1) CN112232948A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095301A1 (en) * 2017-09-22 2019-03-28 Penta Security Systems Inc. Method for detecting abnormal session
CN109639739A (en) * 2019-01-30 2019-04-16 大连理工大学 A kind of anomalous traffic detection method based on autocoder network
CN111275098A (en) * 2020-01-17 2020-06-12 同济大学 Encoder-LSTM deep learning model applied to credit card fraud detection and method thereof

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112857669A (en) * 2021-03-30 2021-05-28 武汉飞恩微电子有限公司 Fault detection method, device and equipment of pressure sensor and storage medium
CN112857669B (en) * 2021-03-30 2022-12-06 武汉飞恩微电子有限公司 Fault detection method, device and equipment of pressure sensor and storage medium
CN113037775A (en) * 2021-03-31 2021-06-25 上海天旦网络科技发展有限公司 Network application layer full-flow vectorization record generation method and system
CN113037775B (en) * 2021-03-31 2022-07-29 上海天旦网络科技发展有限公司 Network application layer full-flow vectorization record generation method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination