CN108596630B - Fraud transaction identification method, system and storage medium based on deep learning - Google Patents


Info

Publication number
CN108596630B
Authority
CN
China
Prior art keywords
neural network
layer
rbm neural
training
fraud
Legal status
Active
Application number
CN201810407275.5A
Other languages
Chinese (zh)
Other versions
CN108596630A (en)
Inventor
许泰清
盛帅
张文慧
曾征
曾卓然
Current Assignee
China Merchants Bank Co Ltd
Original Assignee
China Merchants Bank Co Ltd
Application filed by China Merchants Bank Co Ltd
Priority to CN201810407275.5A
Publication of CN108596630A
Application granted
Publication of CN108596630B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Abstract

The invention discloses a deep learning based fraud transaction identification method, system and storage medium. The method comprises: acquiring training samples, which are transaction data used to establish a fraud transaction detection model; constructing a stacked RBM neural network structure and training it on the training samples to generate a dimensionality reducer; reducing the dimensionality of the training samples with the dimensionality reducer and clustering the resulting binary state vectors to establish the fraud transaction detection model; and acquiring transaction data to be detected and analyzing it with the fraud transaction detection model to identify fraud transactions. Because no similarity measure has to be defined in advance, the invention improves the accuracy of fraud transaction identification while reducing difficulty and cost, and it places few requirements on the sample data.

Description

Fraud transaction identification method, system and storage medium based on deep learning
Technical Field
The invention relates to the field of financial risk control, and in particular to a deep learning based fraud transaction identification method, system and storage medium.
Background
The financial field places high demands on transaction risk control. Current deep learning approaches to fraud transaction identification generally train the detection model with a supervised learning algorithm, building the training features from labeled historical transaction data. A model trained this way can effectively identify historical fraud types, but it generally cannot identify unknown fraud types for which fraud samples are lacking (for example, fraud that has not yet occurred or that has mutated). Because such a model can only learn from fraud after the fact, transaction risk identification lags behind new fraud and its accuracy is lower.
On the other hand, existing unsupervised approaches to fraud transaction identification generally cluster the data directly (without dimensionality reduction) using the K-Means algorithm or a density-based clustering algorithm. Clustering is essentially similarity-based metric learning, so the distance between samples must be defined manually in advance from experience. The features used to train the detection model are high-dimensional, however, and for high-dimensional data an appropriate similarity measure is hard to determine by hand; it can usually be found only through a large number of experiments, which consumes considerable time and manpower. The most common feature dimensionality reduction method at present is Principal Component Analysis (PCA), but PCA is suited to linear data that follows a Gaussian distribution. Data in practical applications is largely nonlinear, so the conditions for applying PCA are rarely met, and in practice PCA often fails to achieve the expected dimensionality reduction or fails outright. In short, existing unsupervised fraud transaction identification is difficult and costly, and it places strict requirements on the sample data that the data available in practice rarely meets.
Disclosure of Invention
The main object of the invention is to provide a deep learning based fraud transaction identification method that solves the technical problems of existing fraud transaction identification methods: insufficient accuracy, high difficulty and cost, and strict requirements on sample data.
In order to achieve the above object, the present invention provides a deep learning based fraud transaction identification method, including:
acquiring a training sample, wherein the training sample is transaction data used for establishing a fraud transaction detection model;
constructing a stacked restricted Boltzmann machine (RBM) neural network structure, training the stacked RBM neural network structure based on the training sample, and generating a dimensionality reducer;
reducing the dimensionality of the training sample with the dimensionality reducer, and clustering the binary state vectors obtained through dimensionality reduction to establish a fraud transaction detection model;
and acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model so as to identify fraud transactions.
Optionally, the step of constructing a stacked RBM neural network structure includes:
setting the number of layers of the stacked RBM neural network structure and the number of output nodes of each RBM layer.
Optionally, the step of training the stacked RBM neural network structure based on the training samples and generating the dimensionality reducer includes:
determining the features of the training samples, constructing high-dimensional feature vectors from the features, and forming a high-dimensional feature space from the high-dimensional feature vectors;
training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space;
and stacking the trained RBM neural networks to generate the dimensionality reducer.
Optionally, the step of training each layer of the stacked RBM neural network structure one by one based on the high-dimensional feature space includes:
based on the high-dimensional feature space, determining the parameters of each RBM layer by training the RBM layers in the stacked RBM neural network structure one by one.
Optionally, the step of determining parameters of each layer of RBM neural network by training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space includes:
randomly generating initial parameter values of the first-layer RBM neural network from a normal distribution;
training the first-layer RBM neural network with the dimension of the high-dimensional feature space as its number of input nodes, and obtaining the parameters of the first-layer RBM neural network by adjusting its initial parameter values during training;
after the parameters of the (N-1)th-layer RBM neural network are obtained, randomly generating initial parameter values of the Nth-layer RBM neural network from a normal distribution;
and training the Nth-layer RBM neural network with the number of output nodes of the (N-1)th-layer RBM neural network as its number of input nodes, and obtaining the parameters of the Nth-layer RBM neural network by adjusting its initial parameter values during training, so that the parameters of every RBM layer are obtained, where N ≥ 2.
Optionally, each RBM layer includes a visible layer and a hidden layer, and its parameters include the weight matrix between the visible layer and the hidden layer, the offsets of the visible nodes in the visible layer, and the offsets of the hidden nodes in the hidden layer.
Optionally, the step of performing dimensionality reduction on the training sample by the dimensionality reducer and clustering the binary state vectors obtained through dimensionality reduction includes:
mapping the training samples into binary state vectors through the dimensionality reducer;
grouping training samples that have the same binary state vector into the same group, so that the training samples are divided into groups.
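Because the grouping rule is exact equality of binary state vectors, it needs no distance measure: the vector itself can serve as a dictionary key. A minimal Python sketch, where `group_by_state_vector`, the sample IDs and the state vectors are hypothetical stand-ins for the dimensionality reducer's output:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def group_by_state_vector(
    sample_ids: Sequence[str],
    state_vectors: Sequence[Sequence[int]],
) -> Dict[Tuple[int, ...], List[str]]:
    """Put samples whose binary state vectors are identical into the same group."""
    if len(sample_ids) != len(state_vectors):
        raise ValueError("each sample needs exactly one state vector")
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for sample_id, vector in zip(sample_ids, state_vectors):
        # A tuple of 0/1 values is hashable, so it serves directly as the group key.
        groups[tuple(int(bit) for bit in vector)].append(sample_id)
    return dict(groups)

groups = group_by_state_vector(
    ["t1", "t2", "t3", "t4"],
    [[1, 0, 1], [0, 0, 1], [1, 0, 1], [0, 0, 1]],
)
print(groups)  # {(1, 0, 1): ['t1', 't3'], (0, 0, 1): ['t2', 't4']}
```

With k output nodes in the top layer there are at most 2^k possible groups, so the top-layer width also bounds the number of clusters (at most 8 for a 3-node top layer).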
Optionally, the step of analyzing the transaction data to be detected according to the fraud transaction detection model to identify a fraud transaction includes:
substituting the transaction data to be detected into the fraud transaction detection model, which performs dimensionality reduction and then clustering, to obtain transaction groups to be detected;
and analyzing the fraud probability of each transaction group to be detected, and determining from the analyzed fraud probabilities which transaction groups require focused review, so as to identify fraud transactions.
In addition, to achieve the above object, the present invention further provides a deep learning based fraud transaction identification system, comprising a memory, a processor, and a deep learning based fraud transaction identification program stored on the memory and executable on the processor, the program implementing the following steps when executed by the processor:
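The description does not say how each group's fraud probability is obtained, so the sketch below takes the per-group estimates as given input and only shows the selection step. The function name `groups_for_focused_review`, the 0.5 threshold and the example probabilities are all hypothetical:

```python
from typing import Dict, List, Tuple

def groups_for_focused_review(
    fraud_probability: Dict[Tuple[int, ...], float],
    threshold: float = 0.5,
) -> List[Tuple[int, ...]]:
    """Groups whose estimated fraud probability is at least `threshold`, riskiest first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    flagged = [group for group, p in fraud_probability.items() if p >= threshold]
    # Sort so reviewers see the highest-risk groups first.
    return sorted(flagged, key=lambda group: fraud_probability[group], reverse=True)

# Hypothetical per-group estimates keyed by binary state vector.
estimates = {(1, 0, 1): 0.82, (0, 0, 1): 0.03, (1, 1, 0): 0.61}
print(groups_for_focused_review(estimates))  # [(1, 0, 1), (1, 1, 0)]
```

Reviewing whole groups rather than single transactions is what lets one decision cover every transaction that the model maps to the same binary state vector.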
acquiring a training sample, wherein the training sample is transaction data used for establishing a fraud transaction detection model;
constructing a stacked restricted Boltzmann machine (RBM) neural network structure, training the stacked RBM neural network structure based on the training sample, and generating a dimensionality reducer;
reducing the dimensionality of the training sample with the dimensionality reducer, and clustering the binary state vectors obtained through dimensionality reduction to establish a fraud transaction detection model;
and acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model so as to identify fraud transactions.
Furthermore, to achieve the above object, the present invention further provides a storage medium having a deep learning based fraud transaction identification program stored thereon, the deep learning based fraud transaction identification program implementing the following steps when executed by a processor:
acquiring a training sample, wherein the training sample is transaction data used for establishing a fraud transaction detection model;
constructing a stacked restricted Boltzmann machine (RBM) neural network structure, training the stacked RBM neural network structure based on the training sample, and generating a dimensionality reducer;
reducing the dimensionality of the training sample with the dimensionality reducer, and clustering the binary state vectors obtained through dimensionality reduction to establish a fraud transaction detection model;
and acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model so as to identify fraud transactions.
In the invention, a stacked RBM neural network structure is constructed and trained on unlabeled high-dimensional data samples to generate a dimensionality reducer. The generated dimensionality reducer then reduces the dimensionality of the unlabeled high-dimensional data samples, and the binary state vectors obtained are clustered. No similarity measure has to be defined in advance in this process, which reduces difficulty and cost and makes the method tolerant of the sample data. The resulting fraud transaction detection model is used to analyze transaction data to be detected: reducing the dimensionality of that data and clustering it yields transaction groups with distinct characteristics, and assessing the fraud risk of each group allows both historical and unknown fraud types to be identified effectively, improving the accuracy of fraud transaction identification.
Drawings
FIG. 1 is a schematic diagram of the terminal structure of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a deep learning based fraud transaction identification method according to the present invention;
FIG. 3 is a schematic diagram of a 3-layer RBM neural network according to the present invention;
FIG. 4 is a detailed flowchart of a deep learning-based fraud transaction identification method according to a first embodiment of the present invention;
FIG. 5 is a schematic diagram of a single RBM neural network layer according to the present invention;
FIG. 6 is a schematic flowchart of a second embodiment of the deep learning based fraud transaction identification method according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiments of the invention is as follows: acquiring a training sample, wherein the training sample is transaction data used for establishing a fraud transaction detection model; constructing a stacked RBM neural network structure, training the stacked RBM neural network structure based on the training sample, and generating a dimensionality reducer; reducing the dimensionality of the training sample with the dimensionality reducer, and clustering the binary state vectors obtained through dimensionality reduction to establish a fraud transaction detection model; and acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model so as to identify fraud transactions.
Fig. 1 is a schematic diagram of the terminal structure of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention is provided with a fraud transaction identification system based on deep learning.
As shown in fig. 1, the terminal may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display screen and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory), and may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a deep learning based fraud transaction identification program therein.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to invoke the deep learning based fraudulent transaction identification program stored in the memory 1005 and perform the following operations:
acquiring a training sample, wherein the training sample is transaction data used for establishing a fraud transaction detection model;
constructing a stacked RBM neural network structure, training the stacked RBM neural network structure based on the training sample, and generating a dimensionality reducer;
reducing the dimensionality of the training sample with the dimensionality reducer, and clustering the binary state vectors obtained through dimensionality reduction to establish a fraud transaction detection model;
and acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model so as to identify fraud transactions.
Further, processor 1001 may invoke a deep learning based fraudulent transaction identification program stored in memory 1005, and also perform the following operations:
setting the number of layers of the stacked RBM neural network structure and the number of output nodes of each RBM layer.
Further, processor 1001 may invoke a deep learning based fraudulent transaction identification program stored in memory 1005, and also perform the following operations:
determining the features of the training samples, constructing high-dimensional feature vectors from the features, and forming a high-dimensional feature space from the high-dimensional feature vectors;
training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space;
and stacking the trained RBM neural networks to generate the dimensionality reducer.
Further, processor 1001 may invoke a deep learning based fraudulent transaction identification program stored in memory 1005, and also perform the following operations:
based on the high-dimensional feature space, determining the parameters of each RBM layer by training the RBM layers in the stacked RBM neural network structure one by one.
Further, processor 1001 may invoke a deep learning based fraudulent transaction identification program stored in memory 1005, and also perform the following operations:
randomly generating initial parameter values of the first-layer RBM neural network from a normal distribution;
training the first-layer RBM neural network with the dimension of the high-dimensional feature space as its number of input nodes, and obtaining the parameters of the first-layer RBM neural network by adjusting its initial parameter values during training;
after the parameters of the (N-1)th-layer RBM neural network are obtained, randomly generating initial parameter values of the Nth-layer RBM neural network from a normal distribution;
and training the Nth-layer RBM neural network with the number of output nodes of the (N-1)th-layer RBM neural network as its number of input nodes, and obtaining the parameters of the Nth-layer RBM neural network by adjusting its initial parameter values during training, so that the parameters of every RBM layer are obtained, where N ≥ 2.
Further, each RBM layer comprises a visible layer and a hidden layer, and its parameters comprise the weight matrix between the visible layer and the hidden layer, the offsets of the visible nodes in the visible layer and the offsets of the hidden nodes in the hidden layer.
Further, processor 1001 may invoke a deep learning based fraudulent transaction identification program stored in memory 1005, and also perform the following operations:
mapping the training samples into binary state vectors through the dimensionality reducer;
grouping training samples that have the same binary state vector into the same group, so that the training samples are divided into groups.
Further, processor 1001 may invoke a deep learning based fraudulent transaction identification program stored in memory 1005, and also perform the following operations:
substituting the transaction data to be detected into the fraud transaction detection model, which performs dimensionality reduction and then clustering, to obtain transaction groups to be detected;
and analyzing the fraud probability of each transaction group to be detected, and determining from the analyzed fraud probabilities which transaction groups require focused review, so as to identify fraud transactions.
Based on the hardware structure, the invention provides various embodiments of the fraud transaction identification method based on deep learning.
Referring to fig. 2, a first embodiment of the deep learning-based fraudulent transaction identification method of the present invention provides a deep learning-based fraudulent transaction identification method, which includes:
step S10, obtaining training samples, wherein the training samples are transaction data used for establishing a fraud transaction detection model;
In this embodiment, the deep learning based fraud transaction identification method is applied to a deep learning based fraud transaction identification system, and the fraud transaction detection model is established by unsupervised learning.
In this embodiment, step S10 is preceded by a step of collecting historical transaction data over a period of time. The historical transaction data include basic information such as the transaction time, transaction IP address, transaction area, transaction amount and transaction device, and the collected data are used as the training samples for establishing the fraud transaction detection model. The training samples form a set whose elements are transaction data samples, for example:
training samples = {transaction data sample 1, transaction data sample 2, ...}
= {(transaction time 1, transaction IP address 1, transaction area 1, transaction amount 1, transaction device 1), (transaction time 2, transaction IP address 2, transaction area 2, transaction amount 2, transaction device 2), ...}.
That is, each training sample is high-dimensional data composed of the basic information of a transaction, and the training samples carry no data labels.
Step S20, constructing a stacked RBM neural network structure, training the stacked RBM neural network structure based on the training sample, and generating a dimensionality reducer;
In this embodiment, a stacked RBM neural network structure is constructed and trained on the training samples. An RBM (Restricted Boltzmann Machine) is a probabilistic graphical model that can be interpreted as a stochastic neural network. "Stochastic" means that the neurons in the network are random neurons whose output has only two states (inactive and active), generally represented by binary 0 and 1; that is, each output node of the RBM takes the value 0 or 1, and its specific value is determined by a probabilistic rule. Neurons within the same layer are not connected, while neurons in adjacent layers are fully connected, so the RBM has a bipartite (probabilistic) graph structure. A stacked RBM neural network structure is constructed by setting the number of RBM layers and the number of output nodes of each layer. Taking fig. 3 as an example, which is a schematic view of a 3-layer RBM neural network structure, the number of layers is set to 3 and the numbers of output nodes of the layers are set to 6, 4 and 3, respectively, from bottom to top. The constructed RBM neural network structure is then trained on the training samples to generate the dimensionality reducer. Specifically, referring to fig. 4, the step of training the stacked RBM neural network structure based on the training samples and generating the dimensionality reducer comprises:
step S21, determining the features of the training samples, constructing high-dimensional feature vectors from the features, and forming a high-dimensional feature space from the high-dimensional feature vectors;
step S22, training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space;
and step S23, stacking the RBM neural networks of each layer after training to generate the dimensionality reducer.
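To make the structure described above concrete, the sketch below computes the (input nodes, output nodes) of each layer for the fig. 3 example and models a single stochastic binary neuron. It assumes the logistic sigmoid as the "probabilistic rule", which is the usual choice for binary RBMs but is not stated in the description; the 5-dimensional input is taken from the transaction example later in this embodiment, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def layer_shapes(input_dim: int, output_nodes: Sequence[int]) -> List[Tuple[int, int]]:
    """(input nodes, output nodes) of each RBM layer, listed bottom to top."""
    shapes: List[Tuple[int, int]] = []
    n_in = input_dim
    for n_out in output_nodes:
        shapes.append((n_in, n_out))
        n_in = n_out  # the outputs of layer N-1 are the inputs of layer N
    return shapes

def sample_binary_unit(activation: float, rng: random.Random) -> int:
    """Stochastic binary neuron: output 1 with probability sigmoid(activation), else 0."""
    return 1 if rng.random() < sigmoid(activation) else 0

# The fig. 3 example: 3 layers with 6, 4 and 3 output nodes over a 5-dimensional input.
print(layer_shapes(5, [6, 4, 3]))  # [(5, 6), (6, 4), (4, 3)]
rng = random.Random(0)
print([sample_binary_unit(0.0, rng) for _ in range(8)])  # each bit is 1 with probability 0.5
```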
In a specific implementation, the features of the training samples are determined first. Because each training sample is high-dimensional data composed of the basic information of a transaction, a high-dimensional feature vector can be constructed from that information, and the high-dimensional feature vectors form a high-dimensional feature space. For example, when each training sample is 5-dimensional data composed of the transaction time, transaction IP address, transaction area, transaction amount and transaction device, a feature vector over these five fields can be constructed for each sample, so that the training samples lie in a 5-dimensional feature space.
Based on the high-dimensional feature space, the RBM layers are trained one by one from the bottom (first) layer upward, and the trained RBM neural networks are stacked to generate the dimensionality reducer, which maps the high-dimensional feature space of each training sample to a low-dimensional space. For example, the RBM neural network structure shown in fig. 3 has 3 RBM layers and its top layer has 3 output nodes, so the 5-dimensional feature space is reduced to 3 dimensions after passing through the 3 layers.
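The description does not specify how raw fields such as an IP address or a device are turned into numbers. The sketch below is one hypothetical encoding, chosen so that every component lies in [0, 1] as the visible units of a binary RBM expect: the hour of day is divided by 23, the amount is min-max scaled, and the categorical fields get integer codes in order of first appearance, then scaled. The `Transaction` type, its field names and the example records are illustrative only.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple, Sequence

class Transaction(NamedTuple):
    time: datetime
    ip_address: str
    area: str
    amount: float
    device: str

def _scaled_codes(values: Sequence[str]) -> List[float]:
    """Categorical values -> integer codes by first appearance, scaled into [0, 1]."""
    codes: Dict[str, int] = {}
    for value in values:
        codes.setdefault(value, len(codes))
    denom = max(len(codes) - 1, 1)
    return [codes[value] / denom for value in values]

def build_feature_space(samples: Sequence[Transaction]) -> List[List[float]]:
    """One 5-dimensional feature vector per transaction, each component in [0, 1]."""
    if not samples:
        return []
    amounts = [s.amount for s in samples]
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0  # avoid division by zero when all amounts are equal
    ips = _scaled_codes([s.ip_address for s in samples])
    areas = _scaled_codes([s.area for s in samples])
    devices = _scaled_codes([s.device for s in samples])
    return [
        [s.time.hour / 23.0, ips[k], areas[k], (s.amount - lo) / span, devices[k]]
        for k, s in enumerate(samples)
    ]

features = build_feature_space([
    Transaction(datetime(2018, 4, 1, 23, 5), "10.0.0.1", "Shenzhen", 100.0, "phone-A"),
    Transaction(datetime(2018, 4, 2, 0, 30), "10.0.0.2", "Shenzhen", 5100.0, "phone-A"),
])
print(features)  # [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0]]
```

Integer codes impose an arbitrary order on categories; a one-hot encoding avoids that at the cost of a larger feature space, which the stacked RBMs would then compress.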
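A minimal sketch of how a stack of trained layers could map a feature vector to a binary state vector. Two choices here are assumptions rather than statements from the description: real-valued activation probabilities are passed between layers, and the top layer is read out deterministically (probability ≥ 0.5 gives 1) so that the same transaction always lands in the same group. The parameters below are untrained N(0, 1) draws used only to show the 5 → 6 → 4 → 3 shape flow; `reduce_dimension` and `Layer` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

# One trained layer: weights[i][j] between visible node i and hidden node j, plus hidden offsets c[j].
Layer = Tuple[List[List[float]], List[float]]

def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def reduce_dimension(v: Sequence[float], stack: Sequence[Layer]) -> List[int]:
    """Map a feature vector through the stacked layers to a binary state vector."""
    x = list(v)
    for weights, c in stack:
        if len(weights) != len(x):
            raise ValueError("layer input size does not match the previous layer's output")
        # Hidden-node probabilities P(h_j = 1 | x), passed upward as real values.
        x = [sigmoid(c[j] + sum(x[i] * weights[i][j] for i in range(len(x))))
             for j in range(len(c))]
    # Deterministic read-out of the top layer.
    return [1 if p >= 0.5 else 0 for p in x]

rng = random.Random(42)
stack = [([[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)], [0.0] * n_out)
         for n_in, n_out in [(5, 6), (6, 4), (4, 3)]]
code = reduce_dimension([1.0, 0.0, 0.0, 0.0, 0.0], stack)
print(len(code))  # 3
```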
Specifically, the step S22 may include:
step S220, based on the high-dimensional feature space, determining parameters of each layer of RBM neural network by training each layer of RBM neural network in the stacked RBM neural network structure one by one.
Wherein, step S220 may include:
step S221, randomly generating initial parameter values of the first-layer RBM neural network from a normal distribution;
step S222, training the first-layer RBM neural network with the dimension of the high-dimensional feature space as its number of input nodes, and obtaining its parameters by adjusting the initial parameter values during training;
step S223, after the parameters of the (N-1)th-layer RBM neural network are obtained, randomly generating initial parameter values of the Nth-layer RBM neural network from a normal distribution;
step S224, training the Nth-layer RBM neural network with the number of output nodes of the (N-1)th-layer RBM neural network as its number of input nodes, and obtaining its parameters by adjusting the initial parameter values during training, so that the parameters of every RBM layer are obtained, where N ≥ 2.
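Steps S221 to S224 describe greedy layer-wise training. The description does not name the algorithm used to adjust each layer's parameters, so the sketch below swaps in one-step contrastive divergence (CD-1), a common choice for RBMs; the learning rate, epoch count, class and function names, and toy data are hypothetical. Initial values are drawn from N(0, 1) as the description states.

```python
import math
import random
from typing import List, Sequence

def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class RBM:
    """One layer: weights w[i][j], visible offsets b[i], hidden offsets c[j]."""

    def __init__(self, n_visible: int, n_hidden: int, rng: random.Random) -> None:
        # Initial parameter values are random numbers from N(0, 1).
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [rng.gauss(0.0, 1.0) for _ in range(n_visible)]
        self.c = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.rng = rng

    def hidden_probs(self, v: Sequence[float]) -> List[float]:
        """P(h_j = 1 | v) for every hidden node j."""
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(self.b))))
                for j in range(len(self.c))]

    def visible_probs(self, h: Sequence[float]) -> List[float]:
        """P(v_i = 1 | h) for every visible node i."""
        return [sigmoid(self.b[i] + sum(self.w[i][j] * h[j] for j in range(len(self.c))))
                for i in range(len(self.b))]

    def cd1_step(self, v0: Sequence[float], lr: float) -> None:
        """One contrastive-divergence (CD-1) update from a single training vector."""
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sampled binary states
        v1 = self.visible_probs(h0)  # one-step reconstruction (probabilities)
        ph1 = self.hidden_probs(v1)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])

def train_stack(data: Sequence[Sequence[float]], output_nodes: Sequence[int],
                epochs: int = 20, lr: float = 0.05, seed: int = 0) -> List[RBM]:
    """Greedy layer-wise training, bottom layer first."""
    rng = random.Random(seed)
    layer_input = [list(v) for v in data]
    n_visible = len(layer_input[0])  # layer 1: input nodes = feature-space dimension
    stack: List[RBM] = []
    for n_hidden in output_nodes:
        rbm = RBM(n_visible, n_hidden, rng)  # initialized only after the layer below is trained
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_step(v, lr)
        stack.append(rbm)
        # The trained layer's outputs become the next layer's training data.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
        n_visible = n_hidden
    return stack

data = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0]] * 2
stack = train_stack(data, [6, 4, 3], epochs=10)
print([(len(r.b), len(r.c)) for r in stack])  # [(5, 6), (6, 4), (4, 3)]
```

Training each layer to completion before creating the next keeps every RBM a small, independent problem, which is the point of the layer-by-layer procedure.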
Training each RBM layer one by one means adjusting the parameters of each RBM layer. Referring to fig. 5, which is a schematic diagram of a single RBM layer, each layer comprises a visible layer and a hidden layer. The parameters of each layer include the weight matrix $w_{i,j}$ between the visible layer and the hidden layer, where $w_{i,j}$ is the connection weight between the $i$th visible node (neuron) in the visible layer and the $j$th hidden node (neuron) in the hidden layer; the offsets of the visible nodes in the visible layer, $b = (b_1, b_2, b_3, \ldots, b_i)$, where $b_i$ is the offset of the $i$th visible node; and the offsets of the hidden nodes in the hidden layer, $c = (c_1, c_2, c_3, \ldots, c_j)$, where $c_j$ is the offset of the $j$th hidden node.
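The description names the parameters but not the model they define. Under the standard binary RBM formulation (assumed here, not stated in the patent), $w_{i,j}$, $b_i$ and $c_j$ enter an energy function over visible states $v_i$ and hidden states $h_j$, and because there are no connections within a layer the conditional probabilities factorize node by node:

```latex
E(v, h) = -\sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i}\sum_{j} v_i\, w_{i,j}\, h_j,
\qquad
P(v, h) = \frac{e^{-E(v, h)}}{Z}, \quad Z = \sum_{v, h} e^{-E(v, h)}

P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_{i} v_i\, w_{i,j}\Big),
\qquad
P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_{j} w_{i,j}\, h_j\Big),
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```

In this formulation, the first conditional is the probabilistic rule that decides whether each output node takes the value 0 or 1.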
Specifically, the step of training each layer of RBM neural network one by one is as follows:
First, the parameters of the bottom (first) layer RBM neural network are initialized: their initial values are generated randomly from the standard normal distribution N(0, 1). The first layer is then trained with the dimensionality of the high-dimensional feature space as its number of input nodes, and its parameters are learned during training by adjusting these initial values.
After the parameters of the first layer are obtained, the initial parameter values of the second layer RBM neural network are likewise generated randomly from the normal distribution. The second layer is trained with the number of output nodes of the first layer as its number of input nodes, and its parameters are obtained by adjusting its initial values during training.
By analogy, the Nth layer RBM neural network (N ≥ 2) is trained with the number of output nodes of the (N-1)th layer as its number of input nodes, and its parameters are obtained by adjusting its initial values during training. Once the parameters of every layer have been obtained, training is complete, and the trained RBM neural networks are stacked to generate a dimensionality reducer.
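The layer-wise procedure above can be sketched in Python as follows. This is a minimal illustrative sketch under stated assumptions, not the patented implementation: the text does not name a training rule, so one-step contrastive divergence (CD-1) is swapped in, and the learning rate, epoch count, the 0.5 threshold used to read out a binary state vector, and all names (`RBM`, `train_stack`, `encode`, `init_std`) are hypothetical. Initial parameter values are drawn from a normal distribution whose standard deviation `init_std` defaults to 1.0 to match the N(0, 1) initialization described above.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary RBM with weights w[i][j], visible offsets b[i], hidden offsets c[j]."""

    def __init__(self, n_visible, n_hidden, init_std=1.0, rng=None):
        self.rng = rng if rng is not None else random.Random(0)
        # Initial values drawn from a normal distribution (N(0, 1) by default).
        self.w = [[self.rng.gauss(0.0, init_std) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [self.rng.gauss(0.0, init_std) for _ in range(n_visible)]
        self.c = [self.rng.gauss(0.0, init_std) for _ in range(n_hidden)]

    def hidden_probs(self, v):
        # P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * w_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # P(v_i = 1 | h) = sigmoid(b_i + sum_j w_ij * h_j)
        return [sigmoid(self.b[i] + sum(self.w[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr):
        # Assumed training rule: one step of contrastive divergence (CD-1).
        ph0 = self.hidden_probs(v0)
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = self.visible_probs(h0)  # reconstruction uses probabilities, not samples
        ph1 = self.hidden_probs(v1)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])


def train_stack(data, layer_sizes, epochs=5, lr=0.05, init_std=1.0, seed=0):
    """Greedy layer-wise training; layer_sizes lists the output node count of each layer."""
    rng = random.Random(seed)
    layers = []
    inputs = data
    n_in = len(data[0])  # first layer: dimensionality of the feature space
    for n_out in layer_sizes:
        rbm = RBM(n_in, n_out, init_std, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        # Outputs of layer N-1 become the training inputs of layer N.
        inputs = [rbm.hidden_probs(v) for v in inputs]
        layers.append(rbm)
        n_in = n_out
    return layers


def encode(layers, v):
    """Dimensionality reducer: propagate through the stack, then threshold at 0.5."""
    for rbm in layers:
        v = rbm.hidden_probs(v)
    return tuple(1 if p >= 0.5 else 0 for p in v)
```

For example, `train_stack(data, [4, 3])` on six-dimensional samples trains a 6-to-4 layer followed by a 4-to-3 layer, and `encode(layers, v)` returns a 3-bit tuple. Practical RBM guides usually initialize weights with a much smaller standard deviation (e.g. 0.01), since N(0, 1) values can saturate the sigmoid units when the input dimension is large; `init_std` is exposed for that reason.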
Step S30: the dimensionality of the training samples is reduced by the dimensionality reducer, and the binary state vectors obtained through dimensionality reduction are clustered to establish a fraud transaction detection model. This step may include:
step S31, mapping the training sample into a binary state vector through the dimensionality reducer;
step S32, grouping training samples with the same binary state vector into the same group, so that the training samples are divided into several groups.
After the dimensionality reducer is generated, it can be used to reduce the dimensionality of the training samples, so that each training sample is mapped to a binary state vector. Assuming that the dimensionality reducer reduces a training sample with an n-dimensional feature space to m dimensions (m ≤ n), up to $2^m$ distinct binary state vectors can theoretically be generated. Note that the dimensionality is actually reduced only when m is smaller than n. In practical experiments, a 2000-dimensional sample was mapped to the 35-dimensional binary state vector (11101111001011111110111111111111111), i.e., n = 2000 and m = 35 after dimensionality reduction. In this process no similarity measure needs to be defined in advance, so the similarity of high-dimensional data samples does not have to be determined manually through a large number of experiments, which reduces both difficulty and cost.
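The arithmetic behind the $2^m$ bound, and one way to turn top-layer activations into a binary state vector, can be illustrated as below. The patent does not say how binary states are read out; thresholding activation probabilities at 0.5 is an assumption here, and the function names are hypothetical. With m = 35, as in the experiment above, the bound is about 3.4 × 10^10 codes.

```python
def to_binary_code(hidden_probs, threshold=0.5):
    """Hypothetical read-out rule: a unit is 'on' if its activation probability >= threshold."""
    return tuple(1 if p >= threshold else 0 for p in hidden_probs)


def max_distinct_codes(m):
    """Upper bound on the number of distinct binary state vectors of length m."""
    return 2 ** m


def code_to_string(code):
    """Render a binary state vector as a bit string, e.g. (1, 0, 1) -> '101'."""
    return "".join(str(bit) for bit in code)


print(max_distinct_codes(35))  # 34359738368
print(code_to_string(to_binary_code([0.9, 0.2, 0.5, 0.49])))  # 1010
```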
Next, the binary state vectors obtained through dimensionality reduction are clustered: training samples with identical binary state vectors are grouped together, dividing the training samples into several groups (G groups in total). This establishes a fraud transaction detection model with both dimensionality reduction and clustering functions. In practical experiments, the final number of groups G was far lower than $2^m$, which also indicates that the RBM has strong feature extraction and noise handling capability and a high tolerance to the samples.
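Because the clustering rule is simply "identical binary state vector means same group", it amounts to a dictionary keyed by the code, with no distance or similarity measure involved. A minimal sketch (the function name is hypothetical):

```python
from collections import defaultdict


def group_by_code(codes):
    """Group sample indices whose binary state vectors are identical."""
    groups = defaultdict(list)
    for idx, code in enumerate(codes):
        groups[tuple(code)].append(idx)
    return dict(groups)


codes = [(1, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
groups = group_by_code(codes)
print(len(groups))        # 3 groups (G), out of 2**3 = 8 possible codes
print(groups[(1, 0, 1)])  # [0, 2]
```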
In this embodiment, the training samples are unsupervised (unlabeled) high-dimensional data samples that can effectively represent data characteristics. These samples are used to train the pre-constructed RBM neural network structure; because the RBM has strong feature extraction and noise handling capability, the accuracy of the dimensionality reducer is greatly improved, and so is the accuracy of the fraud transaction detection model.
And step S40, acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model to identify fraud transactions.
In this embodiment, when transaction data to be detected is received, it can be fed into the fraud transaction detection model, which performs dimensionality reduction and clustering in sequence to obtain the transaction groups to be detected. The fraud risk of each group is then assessed so that fraud transactions can be identified.
In summary, a stacked RBM neural network structure is constructed and trained on unsupervised high-dimensional data samples to generate a dimensionality reducer. The generated dimensionality reducer then reduces the dimensionality of these samples, and the resulting binary state vectors are clustered. No similarity measure needs to be defined in advance in this process, which reduces difficulty and cost, and the tolerance to sample data is high. The fraud transaction detection model established in this way is used to analyze transaction data to be detected: dimensionality reduction and clustering of that data yield transaction groups with distinct characteristics, and fraud risk identification is performed on each group. Both historical and previously unknown fraud types can thus be identified effectively, improving the accuracy of fraud transaction identification.
Further, referring to fig. 6, a second embodiment of the deep learning based fraud transaction identification method of the present invention is provided. Based on the embodiment shown in fig. 2, step S40 may include:
step S41, substituting the transaction data to be detected into the fraud transaction detection model to sequentially perform dimensionality reduction and clustering to obtain a transaction group to be detected;
and step S42, analyzing the fraud probability corresponding to each transaction group to be detected, and determining from the analyzed fraud probabilities which transaction groups to be detected require focused review, so as to identify fraud transactions.
In this embodiment, when transaction data to be detected is received, it is fed into the fraud transaction detection model for dimensionality reduction and clustering. Because the model is established from unsupervised high-dimensional data samples and RBMs with strong feature extraction and noise handling capability, the transaction data yields transaction groups with distinct characteristics after passing through the model. The fraud probability of each transaction group is then analyzed, the groups that require focused review are determined from these probabilities, and fraud transactions are identified within those groups, which improves the accuracy of fraud transaction identification.
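The patent does not specify how the fraud probability of a group is computed. The sketch below assumes, hypothetically, that some historical samples carry confirmed fraud outcomes, estimates each group's fraud probability as the share of confirmed-fraud samples with that binary state vector, and flags groups at or above a hypothetical threshold of 0.3. As a further assumed policy (`review_unseen`), codes never seen during training are also flagged, since they may correspond to unknown fraud types. All names are illustrative.

```python
from collections import defaultdict


def group_fraud_rates(train_codes, train_is_fraud):
    """Hypothetical estimate: share of confirmed-fraud samples among historical samples per code."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for code, is_fraud in zip(train_codes, train_is_fraud):
        totals[code] += 1
        frauds[code] += int(is_fraud)
    return {code: frauds[code] / totals[code] for code in totals}


def select_for_review(new_codes, rates, threshold=0.3, review_unseen=True):
    """Group to-be-detected transactions by code and return the groups needing focused review."""
    groups = defaultdict(list)
    for idx, code in enumerate(new_codes):
        groups[code].append(idx)
    flagged = {}
    for code, members in groups.items():
        rate = rates.get(code)
        if rate is None:
            if review_unseen:  # assumed policy for codes absent from training
                flagged[code] = members
        elif rate >= threshold:
            flagged[code] = members
    return flagged


rates = group_fraud_rates([(1, 0), (1, 0), (0, 1), (0, 1), (0, 1)],
                          [True, False, False, False, False])
print(select_for_review([(1, 0), (0, 1), (1, 1), (1, 0)], rates))
# {(1, 0): [0, 3], (1, 1): [2]}
```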
In addition, the embodiment of the invention also provides a storage medium.
The storage medium of the invention stores a fraud transaction identification program based on deep learning, and the fraud transaction identification program based on deep learning realizes the following operations when being executed by a processor:
acquiring a training sample, wherein the training sample is transaction data used for establishing a fraud transaction detection model;
constructing a stacked RBM neural network structure, training the stacked RBM neural network structure based on the training sample, and generating a dimensionality reducer;
reducing the dimension of the training sample through the dimension reducer, and clustering the binary state vectors obtained through dimension reduction to establish a fraud transaction detection model;
and acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model so as to identify fraud transactions.
The specific embodiment of the storage medium of the present invention is substantially the same as the embodiments of the above-mentioned fraud transaction identification method based on deep learning, and will not be described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and including instructions that enable a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A deep learning based fraud transaction identification method, said method comprising:
obtaining a training sample, wherein the training sample is historical transaction data used for establishing a fraud transaction detection model;
constructing a stacked restricted Boltzmann machine (RBM) neural network structure, training the stacked RBM neural network structure based on the training sample, and setting the number of layers of the stacked RBM neural network structure and the number of output nodes of each layer of RBM neural network to generate a dimensionality reducer, wherein the numerical value of the output nodes is calculated through a probability statistical rule, and the neurons of the RBM neural network structure have no connections within a layer and are fully connected between layers;
reducing the dimension of the training sample through the dimension reducer, and clustering the binary state vectors obtained through dimension reduction to establish a fraud transaction detection model;
acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model to identify fraud transactions;
wherein the step of training the stacked RBM neural network structure based on the training samples to generate a dimensionality reducer comprises:
determining the characteristics of the training sample, constructing a high-dimensional characteristic vector according to the characteristics, and forming a high-dimensional characteristic space by the high-dimensional characteristic vector;
training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space;
and stacking the trained RBM neural networks to generate the dimensionality reducer.
2. The deep learning based fraud transaction identification method of claim 1, wherein the step of training each layer of RBM neural networks in the stacked RBM neural network structure one by one based on the high-dimensional feature space comprises:
based on the high-dimensional feature space, determining parameters of each layer of RBM neural network by training each layer of RBM neural network in the stacked RBM neural network structure one by one.
3. The deep learning based fraud transaction identification method of claim 2, wherein the step of determining parameters of each layer of RBM neural network by training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space comprises:
randomly generating a parameter initial value of a first layer RBM neural network by utilizing normal distribution;
training the first layer RBM neural network using the dimensionality of the high-dimensional feature space as its number of input nodes, and obtaining the parameters of the first layer RBM neural network by adjusting its initial parameter values during training;
after obtaining the parameters of the (N-1)th layer RBM neural network, randomly generating the initial parameter values of the Nth layer RBM neural network from a normal distribution;
and training the Nth layer RBM neural network using the number of output nodes of the (N-1)th layer RBM neural network as its number of input nodes, and obtaining the parameters of the Nth layer RBM neural network by adjusting its initial parameter values during training, so as to obtain the parameters of each layer of RBM neural network, wherein N ≥ 2.
4. The deep learning based fraud transaction identification method of claim 3, wherein each layer of RBM neural network comprises a visible layer and a hidden layer, and the parameters of each layer of RBM neural network comprise a weight matrix between the visible layer and the hidden layer, an offset of a visible node in the visible layer and an offset of a hidden node in the hidden layer.
5. The deep learning based fraud transaction identification method of claim 1, wherein the step of performing dimensionality reduction on the training samples by the dimensionality reducer and clustering the binary state vectors obtained through dimensionality reduction comprises:
mapping the training samples into binary state vectors through the dimensionality reducer;
grouping training samples with the same binary state vector into the same group, so that the training samples are divided into several groups.
6. The deep learning based fraud transaction identification method of claim 1, wherein said step of analyzing the transaction data to be detected to identify fraud transactions according to said fraud transaction detection model comprises:
substituting the transaction data to be detected into the fraud transaction detection model to sequentially perform dimensionality reduction and clustering to obtain a transaction group to be detected;
and analyzing the fraud probability corresponding to each transaction group to be detected, and determining from the analyzed fraud probabilities which transaction groups to be detected require focused review, so as to identify fraud transactions.
7. A deep learning based fraud transaction identification system characterized in that said deep learning based fraud transaction identification system comprises: a memory, a processor, and a deep learning based fraud transaction identification program stored on the memory and executable on the processor, the deep learning based fraud transaction identification program when executed by the processor implementing the steps of:
obtaining a training sample, wherein the training sample is historical transaction data used for establishing a fraud transaction detection model;
constructing a stacked restricted Boltzmann machine (RBM) neural network structure, training the stacked RBM neural network structure based on the training sample, and setting the number of layers of the stacked RBM neural network structure and the number of output nodes of each layer of RBM neural network to generate a dimensionality reducer, wherein the numerical value of the output nodes is calculated through a probability statistical rule, and the neurons of the RBM neural network structure have no connections within a layer and are fully connected between layers;
reducing the dimension of the training sample through the dimension reducer, and clustering the binary state vectors obtained through dimension reduction to establish a fraud transaction detection model;
acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model to identify fraud transactions;
wherein the step of training the stacked RBM neural network structure based on the training samples to generate a dimensionality reducer comprises:
determining the characteristics of the training sample, constructing a high-dimensional characteristic vector according to the characteristics, and forming a high-dimensional characteristic space by the high-dimensional characteristic vector;
training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space;
and stacking the trained RBM neural networks to generate the dimensionality reducer.
8. A storage medium having stored thereon a deep learning based fraud transaction identification program, said deep learning based fraud transaction identification program when executed by a processor implementing the steps of:
obtaining a training sample, wherein the training sample is historical transaction data used for establishing a fraud transaction detection model;
constructing a stacked restricted Boltzmann machine (RBM) neural network structure, training the stacked RBM neural network structure based on the training sample, and setting the number of layers of the stacked RBM neural network structure and the number of output nodes of each layer of RBM neural network to generate a dimensionality reducer, wherein the numerical value of the output nodes is calculated through a probability statistical rule, and the neurons of the RBM neural network structure have no connections within a layer and are fully connected between layers;
reducing the dimension of the training sample through the dimension reducer, and clustering the binary state vectors obtained through dimension reduction to establish a fraud transaction detection model;
acquiring transaction data to be detected, and analyzing the transaction data to be detected according to the fraud transaction detection model to identify fraud transactions;
wherein the step of training the stacked RBM neural network structure based on the training samples to generate a dimensionality reducer comprises:
determining the characteristics of the training sample, constructing a high-dimensional characteristic vector according to the characteristics, and forming a high-dimensional characteristic space by the high-dimensional characteristic vector;
training each layer of RBM neural network in the stacked RBM neural network structure one by one based on the high-dimensional feature space;
and stacking the trained RBM neural networks to generate the dimensionality reducer.
CN201810407275.5A 2018-04-28 2018-04-28 Fraud transaction identification method, system and storage medium based on deep learning Active CN108596630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810407275.5A CN108596630B (en) 2018-04-28 2018-04-28 Fraud transaction identification method, system and storage medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810407275.5A CN108596630B (en) 2018-04-28 2018-04-28 Fraud transaction identification method, system and storage medium based on deep learning

Publications (2)

Publication Number Publication Date
CN108596630A CN108596630A (en) 2018-09-28
CN108596630B true CN108596630B (en) 2022-03-01

Family

ID=63619471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810407275.5A Active CN108596630B (en) 2018-04-28 2018-04-28 Fraud transaction identification method, system and storage medium based on deep learning

Country Status (1)

Country Link
CN (1) CN108596630B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544163B (en) * 2018-11-30 2021-01-29 华青融天(北京)软件股份有限公司 Risk control method, device, equipment and medium for user payment behavior
CN111445259A (en) * 2018-12-27 2020-07-24 中国移动通信集团辽宁有限公司 Method, device, equipment and medium for determining business fraud behaviors
CN111401393B (en) * 2019-01-02 2023-04-07 中国移动通信有限公司研究院 Data processing method and device, electronic equipment and storage medium
CN109784403B (en) * 2019-01-16 2022-07-05 武汉斗鱼鱼乐网络科技有限公司 Method for identifying risk equipment and related equipment
CN109858930B (en) * 2019-01-24 2023-06-09 同济大学 Online transaction fraud detection method based on association graph characterization learning
CN112330328A (en) * 2019-08-05 2021-02-05 四川大学 Credit card fraud detection method based on feature extraction
CN111275098A (en) * 2020-01-17 2020-06-12 同济大学 Encoder-LSTM deep learning model applied to credit card fraud detection and method thereof
CN111415167B (en) * 2020-02-19 2023-05-16 同济大学 Network fraud transaction detection method and device, computer storage medium and terminal
CN113469695B (en) * 2020-03-30 2023-06-30 同济大学 Electronic fraud transaction identification method, system and device based on kernel supervision hash model
CN111507382B (en) * 2020-04-01 2023-05-05 北京互金新融科技有限公司 Sample file clustering method and device and electronic equipment
CN111340509B (en) * 2020-05-22 2020-08-21 支付宝(杭州)信息技术有限公司 False transaction identification method and device and electronic equipment

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101894130A (en) * 2010-06-08 2010-11-24 浙江大学 Sparse dimension reduction-based spectral hash indexing method
CN106033426A (en) * 2015-03-11 2016-10-19 中国科学院西安光学精密机械研究所 A latent semantic min-Hash-based image retrieval method
CN106779094A (en) * 2017-01-13 2017-05-31 湖南文理学院 Restricted Boltzmann machine learning method and device based on random feedback
CN107044976A (en) * 2017-05-10 2017-08-15 中国科学院合肥物质科学研究院 Method for analyzing and predicting heavy metal content in soil based on LIBS and stacked RBM deep learning technology
CN107679859A (en) * 2017-07-18 2018-02-09 中国银联股份有限公司 Risk identification method and system based on transfer deep learning
CN107688201A (en) * 2017-08-23 2018-02-13 电子科技大学 RBM-based clustering method for seismic prestack signals

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US8873813B2 (en) * 2012-09-17 2014-10-28 Z Advanced Computing, Inc. Application of Z-webs and Z-factors to analytics, search engine, learning, recognition, natural language, and other utilities
US10162051B2 (en) * 2014-03-13 2018-12-25 Kustom Signals, Inc. USB/Wireless based traffic radar system
US9690898B2 (en) * 2015-06-25 2017-06-27 Globalfoundries Inc. Generative learning for realistic and ground rule clean hot spot synthesis
US10365639B2 (en) * 2016-01-06 2019-07-30 Kla-Tencor Corporation Feature selection and automated process window monitoring through outlier detection
CN106997474A (en) * 2016-12-29 2017-08-01 南京邮电大学 Graph node multi-label classification method based on deep learning


Non-Patent Citations (3)

Title
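An optimized dimensionality reduction model for high-dimensional data based on Restricted Boltzmann Machines;Ke Zhang et al.;《The 27th Chinese Control and Decision Conference (2015 CCDC)》;20150720;2939-2944 *
Research on credit card transaction fraud detection based on deep learning technology;Ding Weixing;《China Master's Theses Full-text Database, Information Science and Technology Series》;20170315 (No. 03 (2017));I138-3834 *
Research on process monitoring based on Gaussian-Bernoulli restricted Boltzmann machines;Chen Xi;《China Master's Theses Full-text Database, Information Science and Technology Series》;20160815 (No. 08 (2016));I140-617 *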

Also Published As

Publication number Publication date
CN108596630A (en) 2018-09-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant