CN112270548A - Credit card fraud detection method based on deep learning - Google Patents

Credit card fraud detection method based on deep learning

Publication number
CN112270548A
CN112270548A CN202011283215.0A CN202011283215A CN112270548A CN 112270548 A CN112270548 A CN 112270548A CN 202011283215 A CN202011283215 A CN 202011283215A CN 112270548 A CN112270548 A CN 112270548A
Authority
CN
China
Prior art keywords
data
neural network
model
deep learning
neurons
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011283215.0A
Other languages
Chinese (zh)
Other versions
CN112270548B (en)
Inventor
程光权
黄亭飞
黄魁华
杜航
成清
胡星辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202011283215.0A
Publication of CN112270548A
Application granted
Publication of CN112270548B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4016Transaction verification involving fraud or risk level assessment in transaction processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a credit card fraud detection method based on deep learning, which comprises the following steps: obtaining an original data set of a credit card fraudulent transaction, and preprocessing the data set; dividing data into two parts, wherein one part is used as a training set, and the other part is used as a test set; inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal; and inputting the test set into the trained model to obtain a classification label. The method provides a feature engineering framework comprising two neural networks to generate feature variables for a fraud detection model, then fuses the extracted features with raw data, and inputs the fused features into a classifier to obtain good fraud detection performance.

Description

Credit card fraud detection method based on deep learning
Technical Field
The invention belongs to the technical field of credit card fraud detection, and particularly relates to a credit card fraud detection method based on deep learning.
Background
In recent years, the number of credit card transactions has increased dramatically with the spread of mobile payment. With the large-scale use of credit cards, credit card fraud has become increasingly prevalent and causes significant losses.
Some machine learning methods have been applied to fraud detection problems and have achieved excellent performance. It should be noted that these methods all belong to supervised learning and are statistical models with shallow structures. A shallow model here is one that contains only a single layer of non-linear transformation, whose function is to map the input data from the original space to a feature space for feature extraction. In contrast, a deep structure has multiple layers of non-linear transformation connected layer by layer, with the output of one layer serving as the input to the next. A deep structure can extract high-level features of the data; these are recombinations of the previously extracted features and are a highly generalized representation of the properties of the original data. In recent years, deep models have achieved great results in fields such as image and speech coding, image and speech recognition, and information retrieval, with results superior to those of traditional machine learning methods.
Disclosure of Invention
In view of the above, the present invention provides a credit card fraud detection method based on deep learning, which proposes a feature engineering framework including two neural networks to generate feature variables for a fraud detection model, then fuses the extracted features with raw data, and then inputs the fused features into a classifier to obtain good fraud detection performance.
The invention is realized in such a way that a credit card fraud detection method based on deep learning comprises the following steps:
step 1, obtaining an original data set of credit card fraudulent transactions, and preprocessing the data set;
step 2, dividing the data into two parts, wherein one part is used as a training set, and the other part is used as a test set;
step 3, inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal;
step 4, inputting the test set into the trained model to obtain a classification label.
Specifically, the training of the deep learning model in step 3 includes the following steps:
step 301, the original data features, denoted data, pass through a fully connected neural network fc1 having 29 neurons to obtain data features data1;
step 302, the data features data1 pass through a fully connected neural network fc2 having 116 neurons to obtain data features data2;
step 303, the data features data2 pass through a fully connected neural network fc3 having 99 neurons to obtain data features data3;
step 304, the data features data3 are fused with the original data features data and then pass through a fully connected neural network fc4 having 128 neurons to obtain data features data4;
step 305, the data features data4 pass through a fully connected neural network fc5 having 64 neurons to obtain data features data5;
step 306, the data features data5 pass through a fully connected neural network fc6 having 2 neurons to obtain a data tag.
Preferably, the training set is two thirds of the total number of samples, and the testing set is one third of the total number of samples.
Specifically, the data tag is 0 or 1, 0 indicates that the transaction is a normal transaction, and 1 indicates that the transaction is a fraudulent transaction.
Furthermore, the values of the corresponding performance indicators, including accuracy and recall, are calculated from the classification labels.
After training on historical data, the method can examine online transaction data to identify fraudulent transactions. It extracts deep features of the original data with a fully connected network and fuses them with the original data into new features, thereby providing a new credit card fraud detection model that can simulate transaction behavior more effectively and achieves excellent performance on sensitivity.
Drawings
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a flowchart illustrating a deep learning method according to an embodiment of the present invention;
FIG. 3 is a graph comparing results of accuracy in examples of the present invention;
FIG. 4 is a graph comparing recall results according to an embodiment of the present invention;
FIG. 5 is a graph comparing the results of F1 scores in examples of the present invention;
FIG. 6 is a graph showing a comparison of the results of the specificity in the examples of the present invention;
FIG. 7 is a graph comparing results of accuracy in the examples of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 shows a flow chart of an embodiment of the invention. A deep learning-based credit card fraud detection method comprises the following steps:
step 1, obtaining an original data set of credit card fraudulent transactions, and preprocessing the data set;
step 2, dividing the data into two parts, wherein one part is used as a training set, and the other part is used as a test set;
step 3, inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal;
step 4, inputting the test set into the trained model to obtain a classification label.
Specifically, as shown in fig. 2, the training of the deep learning model in step 3 includes the following steps:
step 301, the original data features, denoted data, pass through a fully connected neural network fc1 having 29 neurons to obtain data features data1;
step 302, the data features data1 pass through a fully connected neural network fc2 having 116 neurons to obtain data features data2;
step 303, the data features data2 pass through a fully connected neural network fc3 having 99 neurons to obtain data features data3;
step 304, the data features data3 are fused with the original data features data and then pass through a fully connected neural network fc4 having 128 neurons to obtain data features data4;
step 305, the data features data4 pass through a fully connected neural network fc5 having 64 neurons to obtain data features data5;
step 306, the data features data5 pass through a fully connected neural network fc6 having 2 neurons to obtain a data tag.
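The layer structure of steps 301 to 306 can be illustrated with a minimal forward-pass sketch. Only the layer names and neuron counts come from the steps above. The ReLU activation, the random initialisation, the use of concatenation for the fusion in step 304, and the helper names init_params, forward and predict are assumptions made for illustration and are not specified in the description.

```python
import numpy as np

# Layer widths taken from steps 301-306; everything else here is an assumption.
N_FEATURES = 29
WIDTHS = {"fc1": 29, "fc2": 116, "fc3": 99, "fc4": 128, "fc5": 64, "fc6": 2}


def init_params(seed=0):
    """Randomly initialise (weights, bias) for fc1..fc6 (hypothetical initialisation)."""
    rng = np.random.default_rng(seed)
    # fc4 receives data3 (99 values) fused with the raw features (29 values).
    fan_in = {
        "fc1": N_FEATURES,
        "fc2": WIDTHS["fc1"],
        "fc3": WIDTHS["fc2"],
        "fc4": WIDTHS["fc3"] + N_FEATURES,
        "fc5": WIDTHS["fc4"],
        "fc6": WIDTHS["fc5"],
    }
    params = {}
    for name, width in WIDTHS.items():
        scale = np.sqrt(2.0 / fan_in[name])
        weights = rng.normal(0.0, scale, size=(fan_in[name], width))
        params[name] = (weights, np.zeros(width))
    return params


def relu(x):
    # Assumed activation; the description does not name one.
    return np.maximum(x, 0.0)


def dense(x, layer):
    weights, bias = layer
    return x @ weights + bias


def forward(params, data):
    """Map raw features of shape (batch, 29) to class scores of shape (batch, 2)."""
    data1 = relu(dense(data, params["fc1"]))       # step 301
    data2 = relu(dense(data1, params["fc2"]))      # step 302
    data3 = relu(dense(data2, params["fc3"]))      # step 303
    fused = np.concatenate([data3, data], axis=1)  # step 304: fusion, assumed to be concatenation
    data4 = relu(dense(fused, params["fc4"]))
    data5 = relu(dense(data4, params["fc5"]))      # step 305
    return dense(data5, params["fc6"])             # step 306: two output neurons


def predict(params, data):
    """Return tag 1 (fraudulent) when the second output neuron scores higher, else 0."""
    return np.argmax(forward(params, data), axis=1)
```

Under the concatenation assumption, fc4 receives 99 + 29 = 128 inputs, which is why its weight matrix in this sketch is 128 by 128.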
Preferably, the training set is two thirds of the total number of samples, and the testing set is one third of the total number of samples.
Specifically, the data tag is 0 or 1, 0 indicates that the transaction is a normal transaction, and 1 indicates that the transaction is a fraudulent transaction.
Furthermore, the values of the corresponding performance indicators, including accuracy and recall, are calculated from the classification labels.
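As an illustration of how such indicators can be calculated from the classification labels, the following sketch uses the standard confusion-matrix definitions, with tag 1 (fraudulent) treated as the positive class. The function name classification_metrics and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def classification_metrics(y_true, y_pred):
    """Compute standard binary metrics; tag 1 (fraud) is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

For example, classification_metrics([1, 1, 0, 0], [1, 0, 0, 0]) counts one true positive, one false negative and two true negatives.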
The data set comes from the well-known competition website Kaggle and contains credit card transaction data for two days in September 2012. The data contain only numerical information; for privacy reasons, most of the attributes, 27 in total, have been transformed by PCA. The only untransformed attributes are the transaction amount and the transaction time, so there are 29 attribute features after data preprocessing.
The labels in the data set are binary class labels indicating whether or not a transaction is fraudulent. The data contain 284807 transactions, 492 of which are fraudulent (about 0.17%), making this a typical imbalanced classification problem. The data are divided into a training set and a test set: the training set comprises 184694 transactions, 314 of them fraudulent, or about two-thirds of the data; the test set contains 90969 transactions, 159 of them fraudulent, or about one-third.
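A two-thirds to one-third division of such imbalanced data might be sketched as follows. The description does not say how the division was performed; the per-class (stratified) shuffling, the function name stratified_split and the fixed random seed are assumptions chosen so that both parts keep roughly the same proportion of fraudulent transactions.

```python
import numpy as np


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx), splitting each class separately (assumed strategy)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for cls in np.unique(labels):
        # Shuffle the row indices of this class, then cut at the requested fraction.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_fraction * idx.size))
        train_parts.append(idx[:cut])
        test_parts.append(idx[cut:])
    return np.concatenate(train_parts), np.concatenate(test_parts)
```

The returned index arrays can then be used to select the rows of the feature matrix and the label vector for each part.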
In the experiments, several independent repeated runs are carried out and the results are averaged. Model training runs for 70 iterations, and the model starts to converge at the 10th iteration, so the performance of the two algorithms is analyzed and compared from the 10th iteration onward.
Fig. 3 shows the accuracy of both algorithms; the solid line shows the accuracy of the proposed method and the dashed line that of the traditional deep learning method. After the model has basically converged, the accuracy of the method of the present invention stays above that of the conventional method until the end of the iterations. The accuracy of both algorithms rises steadily throughout: sharply from the 10th to the 30th iteration, slowly over the next 10 iterations, and gradually from the 40th iteration to the end.
Fig. 4 shows the recall of both algorithms; the solid line shows the recall of the proposed method and the dashed line that of the traditional deep learning method. After the model has substantially converged, the recall of the method of the present invention stays above that of the conventional method until the 60th iteration, after which the conventional method performs better. From the 20th to the 50th iteration, the recall of both algorithms rises steadily; the difference between them is large at first and then gradually narrows. From the 60th to the 70th iteration, the recall of the method of the present invention begins to decrease, while that of the conventional algorithm is still increasing. It follows that the recall of the method converges faster than that of the traditional algorithm and reaches a high value.
Fig. 5 shows the F1 scores of both algorithms; the solid line shows the F1 score of the proposed method and the dashed line that of the traditional deep learning method. After the model has substantially converged, the F1 score of the method of the present invention stays above that of the conventional method until the end of the iterations. The F1 scores of both algorithms rise steadily; the difference between them is large from about the 30th to the 50th iteration and then gradually decreases.
Fig. 6 shows the specificity of the two algorithms; the solid line shows the specificity of the proposed method and the dotted line that of the traditional deep learning method. After the model has substantially converged, the specificity of the method of the present invention stays above that of the conventional method until the end of the iterations. The specificity of both algorithms rises overall: it increases sharply from the 10th to the 30th iteration, stays essentially unchanged over the next 10 iterations, and increases gradually from the 40th iteration to the end.
Fig. 7 shows the accuracy of both algorithms; the solid line shows the accuracy of the proposed method and the dashed line that of the traditional deep learning method. After the model has substantially converged, the accuracy of the method of the present invention remains above that of the conventional method, and the gap does not narrow until about the 60th iteration, when the accuracy of both algorithms starts to decline; the presumed cause is model overfitting. Before that point the accuracy of both algorithms rises steadily, faster before the 40th iteration and more slowly afterwards.
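The averaging over repeated runs and the restriction to the converged part of training can be sketched as below. The array layout (one row per run, one column per iteration), the 1-based iteration numbering and the function name mean_curve_after_convergence are assumptions made for illustration.

```python
import numpy as np


def mean_curve_after_convergence(runs, first_iteration=10):
    """Average a metric over repeated runs and keep iterations >= first_iteration.

    runs: array-like of shape (n_runs, n_iterations); column 0 is iteration 1.
    """
    runs = np.asarray(runs, dtype=float)
    mean_curve = runs.mean(axis=0)           # average across independent runs
    return mean_curve[first_iteration - 1:]  # drop the pre-convergence iterations
```

With 70 iterations per run and first_iteration=10, the returned curve has 61 points, one for each of iterations 10 through 70.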
TABLE 1 Algorithm comparison results
[Table 1 appears as an image (BDA0002781473070000071) in the original publication; its values are not reproduced here.]
Table 1 shows the mean results of the two algorithms, i.e. the mean over 10 independent repeated experiments from the start to the end of the iterations. The improvements in accuracy and specificity are small because the data contain far more negative samples, so the number of samples whose classification is corrected is very small relative to the number of negative samples; the improvements in the other three measures are more pronounced, reaching about 2 percent.
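The effect of class imbalance on these measures can be made concrete with a small calculation based on the test-set sizes given above (90969 transactions, 159 of them fraudulent). The scenario of three additional fraudulent transactions being detected is purely hypothetical and is not a result from Table 1; the function name metric_gains is likewise an assumption.

```python
TEST_TOTAL = 90969  # test-set size reported in the description
TEST_FRAUD = 159    # fraudulent transactions in the test set


def metric_gains(extra_fraud_caught):
    """Gain in accuracy and recall when more fraud cases are correctly flagged.

    Assumes every other prediction stays unchanged (hypothetical scenario).
    """
    accuracy_gain = extra_fraud_caught / TEST_TOTAL
    recall_gain = extra_fraud_caught / TEST_FRAUD
    return accuracy_gain, recall_gain


acc_gain, rec_gain = metric_gains(3)  # hypothetical: three more frauds caught
print(f"accuracy gain: {acc_gain:.4%}, recall gain: {rec_gain:.2%}")
```

Because the recall denominator is only 159 while the accuracy denominator is 90969, the same corrected samples move recall several hundred times more than accuracy, which is consistent with the explanation given for the small accuracy and specificity improvements.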
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (4)

1. A credit card fraud detection method based on deep learning is characterized by comprising the following steps:
step 1, obtaining an original data set of credit card fraudulent transactions, and preprocessing the data set;
step 2, dividing the data into two parts, wherein one part is used as a training set, and the other part is used as a test set;
step 3, inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal;
step 4, inputting the test set into the trained model to obtain a classification label;
the training of the deep learning model in the step 3 comprises the following steps:
step 301, the original data features, denoted data, pass through a fully connected neural network fc1 having 29 neurons to obtain data features data1;
step 302, the data features data1 pass through a fully connected neural network fc2 having 116 neurons to obtain data features data2;
step 303, the data features data2 pass through a fully connected neural network fc3 having 99 neurons to obtain data features data3;
step 304, the data features data3 are fused with the original data features data and then pass through a fully connected neural network fc4 having 128 neurons to obtain data features data4;
step 305, the data features data4 pass through a fully connected neural network fc5 having 64 neurons to obtain data features data5;
step 306, the data features data5 pass through a fully connected neural network fc6 having 2 neurons to obtain a data tag.
2. The method of claim 1, wherein the training set is two-thirds of the total number of samples and the testing set is one-third of the total number of samples.
3. The method of claim 1 or 2, wherein the data tag is 0 or 1, 0 indicating that the transaction is a normal transaction, and 1 indicating that the transaction is a fraudulent transaction.
4. The credit card fraud detection method of claim 3, wherein values of corresponding indicators, including accuracy and recall, are calculated from the classification labels.
CN202011283215.0A 2020-11-17 2020-11-17 Credit card fraud detection method based on deep learning Active CN112270548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011283215.0A CN112270548B (en) 2020-11-17 2020-11-17 Credit card fraud detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011283215.0A CN112270548B (en) 2020-11-17 2020-11-17 Credit card fraud detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN112270548A (en) 2021-01-26
CN112270548B (en) 2022-09-20

Family

ID=74339064

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011283215.0A Active CN112270548B (en) 2020-11-17 2020-11-17 Credit card fraud detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN112270548B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202022107234U1 (en) 2022-12-23 2023-02-13 Jalawi Sulaiman Alshudukhi Online banking fraud detection system using blockchain and artificial intelligence through backlogging

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190188212A1 (en) * 2016-07-27 2019-06-20 Anomalee Inc. Prioritized detection and classification of clusters of anomalous samples on high-dimensional continuous and mixed discrete/continuous feature spaces
CN110321950A (en) * 2019-06-30 2019-10-11 哈尔滨理工大学 A kind of credit card fraud recognition methods
CN111105303A (en) * 2019-11-12 2020-05-05 同济大学 Network loan fraud detection method based on incremental network characterization learning
CN111275098A (en) * 2020-01-17 2020-06-12 同济大学 Encoder-LSTM deep learning model applied to credit card fraud detection and method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘中华: "应用多层神经网络模型的信用卡欺诈算法", 《福建电脑》 *

Also Published As

Publication number Publication date
CN112270548B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
WO2021164382A1 (en) Method and apparatus for performing feature processing for user classification model
WO2022121145A1 (en) Ethereum phishing scam detection method and apparatus based on graph classification
CN111666350B (en) Medical text relation extraction method based on BERT model
CN111695597B (en) Credit fraud group identification method and system based on improved isolated forest algorithm
CN110473140B (en) Image dimension reduction method of extreme learning machine based on graph embedding
CN111325248A (en) Method and system for reducing pre-loan business risk
CN110826618A (en) Personal credit risk assessment method based on random forest
Lu et al. Telecom fraud identification based on ADASYN and random forest
CN113283590B (en) Defending method for back door attack
Chen et al. CatBoost for fraud detection in financial transactions
CN114298176A (en) Method, device, medium and electronic equipment for detecting fraudulent user
CN112270548B (en) Credit card fraud detection method based on deep learning
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
CN116010793A (en) Classification model training method and device and category detection method
CN113569048A (en) Method and system for automatically dividing affiliated industries based on enterprise operation range
CN111737688B (en) Attack defense system based on user portrait
CN114119191A (en) Wind control method, overdue prediction method, model training method and related equipment
CN111369339A (en) Over-sampling improved svdd-based bank client transaction behavior abnormity identification method
CN112733144B (en) Intelligent malicious program detection method based on deep learning technology
Lai Default Prediction of Internet Finance Users Based on Imbalance-XGBoost
Xiao et al. Explainable fraud detection for few labeled time series data
CN113641824A (en) Text classification system and method based on deep learning
CN116843432B (en) Anti-fraud method and device based on address text information
CN114936615B (en) Small sample log information anomaly detection method based on characterization consistency correction
CN118154292A (en) Money back-flushing identification method based on BiLSTM and graph convolution neural network combination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant