CN112270548B - Credit card fraud detection method based on deep learning - Google Patents
- Publication number
- CN112270548B (application number CN202011283215.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- neural network
- model
- deep learning
- neurons
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Technology Law (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Computer Security & Cryptography (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a credit card fraud detection method based on deep learning, which comprises the following steps: obtaining an original data set of a credit card fraudulent transaction, and preprocessing the data set; dividing data into two parts, wherein one part is used as a training set, and the other part is used as a test set; inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal; and inputting the test set into the trained model to obtain a classification label. The method provides a feature engineering framework comprising two neural networks to generate feature variables for a fraud detection model, then fuses the extracted features with raw data, and inputs the fused features into a classifier to obtain good fraud detection performance.
Description
Technical Field
The invention belongs to the technical field of credit card fraud detection, and particularly relates to a credit card fraud detection method based on deep learning.
Background
In recent years, the number of credit card transactions has increased dramatically with the spread of mobile payment. As credit cards have come into large-scale use, credit card fraud has become increasingly prevalent and causes significant losses.
Some machine learning methods have been applied to fraud detection problems and have achieved excellent performance. These methods all belong to supervised learning and are statistical models with shallow structures. Here, a shallow model is one that contains only one layer of non-linear transformation, whose function is to map the input data from the original space to a feature space for feature extraction. In contrast, a deep structure has multiple layers of non-linear transformations connected layer by layer, with the output of the previous layer serving as the input to the next layer. A deep structure can extract high-level features of the data, which are recombinations of the lower-level extracted features and a high-level generalization of the properties of the original data. In recent years, deep structure models have achieved great results in fields such as image and speech coding, image and speech recognition, and information retrieval, with results superior to those of traditional machine learning methods.
Disclosure of Invention
In view of the above, the present invention provides a credit card fraud detection method based on deep learning, which proposes a feature engineering framework including two neural networks to generate feature variables for a fraud detection model, then fuses the extracted features with raw data, and then inputs the fused features into a classifier to obtain good fraud detection performance.
The invention is realized in such a way that a credit card fraud detection method based on deep learning comprises the following steps:
step 1, obtaining an original data set of credit card fraudulent transactions, and preprocessing the data set;
step 2, dividing the data into two parts, wherein one part is used as a training set, and the other part is used as a test set;
step 3, inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal;
and step 4, inputting the test set into the trained model to obtain a classification label.
Specifically, the training of the deep learning model in step 3 includes the following steps:
step 301, the original data characteristic data firstly passes through a fully-connected neural network fc1, wherein the neural network comprises 29 neurons, and data characteristic data1 is obtained;
step 302, the data characteristic data1 passes through a fully connected neural network fc2, the neural network has 116 neurons, and the data characteristic data2 is obtained;
step 303, the data characteristic data2 passes through a fully connected neural network fc3, the neural network has 99 neurons, and the data characteristic data3 is obtained;
step 304, fusing the data characteristic data3 with the original data characteristic data, and then passing through a fully-connected neural network fc4, wherein the neural network has 128 neurons, so as to obtain data characteristic data4;
step 305, the data characteristic data4 passes through a fully connected neural network fc5, the neural network has 64 neurons, and data characteristic data5 is obtained;
and step 306, the data characteristic data5 passes through a fully connected neural network fc6, the neural network has 2 neurons, and a data tag is obtained.
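Steps 301 to 306 can be sketched as a forward pass in plain Python. This is only an illustration under stated assumptions, not the patented implementation: the text does not name the activation function or the fusion operator, so ReLU and concatenation are assumed here (concatenation gives 99 + 29 = 128 values entering fc4), the weights are random rather than trained, and the helper names `make_layer`, `dense`, `build_model` and `forward` are hypothetical.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=True):
    """Apply one fully connected layer; ReLU is an assumed activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


def build_model(n_features=29, seed=0):
    """Layer widths follow steps 301 to 306; the weights are untrained."""
    rng = random.Random(seed)
    return {
        "fc1": make_layer(n_features, 29, rng),
        "fc2": make_layer(29, 116, rng),
        "fc3": make_layer(116, 99, rng),
        # fc4 receives data3 fused with the raw features (assumed concatenation).
        "fc4": make_layer(99 + n_features, 128, rng),
        "fc5": make_layer(128, 64, rng),
        "fc6": make_layer(64, 2, rng),
    }


def forward(model, data):
    """Return the two output values of fc6 and the resulting 0/1 label."""
    data1 = dense(data, model["fc1"])
    data2 = dense(data1, model["fc2"])
    data3 = dense(data2, model["fc3"])
    fused = data3 + list(data)  # assumed fusion: list concatenation
    data4 = dense(fused, model["fc4"])
    data5 = dense(data4, model["fc5"])
    logits = dense(data5, model["fc6"], relu=False)
    label = 0 if logits[0] >= logits[1] else 1  # 0 = normal, 1 = fraudulent
    return logits, label


model = build_model()
logits, label = forward(model, [0.1] * 29)
```

With trained weights in place of the random ones, `label` would be the data tag produced in step 306.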
Preferably, the training set is two thirds of the total number of samples, and the testing set is one third of the total number of samples.
Specifically, the data tag is 0 or 1, 0 indicates that the transaction is a normal transaction, and 1 indicates that the transaction is a fraudulent transaction.
Furthermore, evaluation metrics are computed from the classification labels; the metrics include accuracy and recall.
After training on historical data, the method can detect fraudulent transactions in online transaction data. It extracts deep features of the original data with a fully connected network and fuses them with the original data into new features. This provides a new credit card fraud detection model that simulates transaction behavior more effectively and obtains excellent performance on sensitivity.
Drawings
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a flowchart illustrating a deep learning method according to an embodiment of the present invention;
FIG. 3 is a graph comparing results of accuracy in examples of the present invention;
FIG. 4 is a graph comparing recall results according to an embodiment of the present invention;
FIG. 5 is a graph comparing the results of F1 scores in examples of the present invention;
FIG. 6 is a graph showing a comparison of the results of the specificity in the examples of the present invention;
FIG. 7 is a graph comparing results of accuracy in the examples of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 shows a flow chart of an embodiment of the invention, and a deep learning-based credit card fraud detection method comprises the following steps:
step 1, acquiring an original data set of credit card fraudulent transactions, and preprocessing the data set;
step 2, dividing the data into two parts, wherein one part is used as a training set, and the other part is used as a test set;
step 3, inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal;
and step 4, inputting the test set into the trained model to obtain a classification label.
Specifically, as shown in fig. 2, the training of the deep learning model in step 3 includes the following steps:
step 301, the original data feature data passes through a fully-connected neural network fc1, the neural network has 29 neurons, and data feature data1 is obtained;
step 302, the data characteristic data1 passes through a fully connected neural network fc2, the neural network has 116 neurons, and the data characteristic data2 is obtained;
step 303, the data characteristic data2 passes through a fully connected neural network fc3, the neural network has 99 neurons, and the data characteristic data3 is obtained;
step 304, fusing the data characteristic data3 with the original data characteristic data, and then passing through a fully-connected neural network fc4, wherein the neural network has 128 neurons, so as to obtain data characteristic data4;
step 305, the data characteristic data4 passes through a fully connected neural network fc5, the neural network has 64 neurons, and data characteristic data5 is obtained;
and step 306, the data characteristic data5 passes through a fully connected neural network fc6, the neural network has 2 neurons, and a data tag is obtained.
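To give a sense of the size of the network in fig. 2, the following sketch counts the trainable parameters implied by the stated layer widths. It assumes that every layer has a bias term and that the fusion in step 304 is a concatenation of the 99 values of data3 with the 29 raw features; neither detail is stated in the text, and the names `dense_params` and `LAYERS` are hypothetical.

```python
N_FEATURES = 29  # attribute features after preprocessing


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer (bias assumed)."""
    return n_in * n_out + n_out


# (name, input width, output width) for steps 301 to 306.
LAYERS = [
    ("fc1", N_FEATURES, 29),
    ("fc2", 29, 116),
    ("fc3", 116, 99),
    ("fc4", 99 + N_FEATURES, 128),  # assumed concatenation with the raw features
    ("fc5", 128, 64),
    ("fc6", 64, 2),
]

counts = {name: dense_params(n_in, n_out) for name, n_in, n_out in LAYERS}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n} parameters")
print(f"total: {total} parameters")
```

Under these assumptions the model has 40831 parameters in total, most of them in fc3 and fc4.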
Preferably, the training set is two thirds of the total number of samples, and the testing set is one third of the total number of samples.
Specifically, the data tag is 0 or 1, 0 indicates that the transaction is a normal transaction, and 1 indicates that the transaction is a fraudulent transaction.
Furthermore, evaluation metrics are computed from the classification labels; the metrics include accuracy and recall.
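The metrics compared in the figures can all be derived from the confusion counts of the predicted and true labels. The sketch below uses the standard definitions, with label 1 (fraudulent) as the positive class; precision is included only because the F1 score is defined from it, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with label 1 (fraudulent) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Avoid division by zero when a class is absent."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Toy labels for illustration only; they are not taken from the experiments.
scores = evaluate([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])
```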
The data set comes from the well-known competition website Kaggle and contains credit card transaction data for two days of September 2012. The data contains only numerical information. For privacy reasons, most of the attributes have been transformed by PCA; there are 27 such attributes. The only untransformed attributes are the transaction amount and the transaction time, so there are 29 attribute features after data preprocessing.
The labels in the data set are binary class labels indicating whether a transaction is fraudulent. The data contains 284807 transactions, 492 of which are fraudulent, which makes it a typical imbalanced classification data set. The data is divided into a training set and a test set: the training set comprises 184694 transactions, 314 of them fraudulent, which is about two thirds; the test set comprises 90969 transactions, 159 of them fraudulent, which is about one third.
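One way to obtain a two-thirds / one-third split that keeps roughly the same share of fraudulent transactions in both parts is to split each class separately. The text does not say how the split was drawn, so the stratified approach below is an assumption, and `stratified_split` is a hypothetical helper shown on toy labels.

```python
import random


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split indices so each class contributes about train_fraction to training."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only: 30 fraudulent and 300 normal transactions.
labels = [1] * 30 + [0] * 300
train, test = stratified_split(labels)
train_frauds = sum(labels[i] for i in train)
test_frauds = sum(labels[i] for i in test)
```

On the toy labels this puts 20 fraudulent transactions in the training part and 10 in the test part.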
In the experiment, several independent repeated runs are carried out and the results are averaged. Model training runs for 70 iterations, and the model starts to converge at the 10th iteration, so the performance of the two algorithms is analyzed and compared from the 10th iteration onward.
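The averaging protocol described above can be sketched as follows: each run yields one metric value per iteration, the runs are averaged iteration by iteration, and only the part of the curve from the 10th iteration onward is compared. The function names and the toy values are hypothetical; iterations are assumed to be numbered from 1.

```python
def mean(values):
    return sum(values) / len(values)


def average_runs(runs):
    """Average several runs iteration by iteration.

    runs: list of lists, one list per run, one metric value per iteration.
    """
    n_iter = len(runs[0])
    if any(len(run) != n_iter for run in runs):
        raise ValueError("all runs must have the same number of iterations")
    return [mean([run[i] for run in runs]) for i in range(n_iter)]


def post_convergence(curve, start_iteration=10):
    """Keep the values from start_iteration onward (iterations numbered from 1)."""
    return curve[start_iteration - 1:]


# Toy values for illustration only; they are not experimental results.
avg = average_runs([[0.5, 0.6, 0.7], [0.7, 0.8, 0.9]])
kept = post_convergence(list(range(1, 71)))
```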
Fig. 3 shows the accuracy of both algorithms; the solid line shows the proposed method and the dashed line the traditional deep learning method. After the model has basically converged, the accuracy of the method of the present invention stays above that of the conventional method until the end of the iterations. The accuracy of both algorithms keeps a steady upward trend: it increases sharply from the 10th to the 30th iteration, increases slowly over the following 10 iterations, and increases gradually from the 40th iteration to the end.
Fig. 4 shows the recall of both algorithms; the solid line shows the proposed method and the dashed line the conventional deep learning method. After the model has substantially converged, the recall of the method of the present invention stays above that of the conventional method until about the 60th iteration, after which the conventional method performs better. From the 20th to the 50th iteration, the accuracy of both algorithms keeps a steady upward trend and the difference between them is large, after which it gradually narrows; from the 60th to the 70th iteration, the recall of the method of the present invention begins to decrease, while the recall of the conventional algorithm is still increasing. It follows that the recall of the method converges faster than that of the traditional algorithm and reaches a high value.
Fig. 5 shows the F1 scores of both algorithms; the solid line shows the proposed method and the dashed line the traditional deep learning method. After the model has substantially converged, the F1 score of the method of the present invention stays above that of the conventional method until the end of the iterations. The accuracy of both algorithms keeps a steady upward trend; the difference between the two algorithms is large from about the 30th to the 50th iteration and then gradually decreases.
Fig. 6 shows the specificity of both algorithms; the solid line shows the proposed method and the dotted line the traditional deep learning method. After the model has substantially converged, the specificity of the method of the present invention stays above that of the conventional method until the end of the iterations. The accuracy of both algorithms keeps a steady upward trend: it increases sharply from the 10th to the 30th iteration, remains essentially unchanged over the following 10 iterations, and increases gradually from the 40th iteration to the end.
Fig. 7 shows the accuracy of both algorithms; the solid line shows the proposed method and the dashed line the traditional deep learning method. After the model has substantially converged, the accuracy of the method of the present invention remains above that of the conventional method, and this lead does not ease until about the 60th iteration. The presumed reason is model overfitting, which causes the accuracy of both algorithms to start to decline. During this period the accuracy of both algorithms keeps a steady upward trend, growing faster before the 40th iteration and then growing slowly until the end.
TABLE 1 Algorithm comparison results
Table 1 shows the mean results of the two algorithms, that is, the mean over 10 independent repeated experiments from the start to the end of the iterations. The improvement in accuracy and specificity is small because the data contains far more negative samples, and the number of corrected samples is very small relative to the number of negative samples; the improvement in the other three measures is clear, reaching about 2 percent.
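The effect of class imbalance on these measures can be illustrated with the test-set sizes given above (90969 transactions, 159 of them fraudulent). The sketch below assumes a hypothetical case in which a model correctly flags three additional fraudulent transactions and changes no other prediction; the figure of three is an illustrative assumption, not a reported result.

```python
TEST_TOTAL = 90969  # transactions in the test set
TEST_FRAUD = 159    # fraudulent transactions in the test set


def gains(extra_frauds_caught, total=TEST_TOTAL, fraud=TEST_FRAUD):
    """Change in accuracy and recall when extra frauds are caught and nothing else changes."""
    return {
        "accuracy_gain": extra_frauds_caught / total,
        "recall_gain": extra_frauds_caught / fraud,
    }


g = gains(3)  # hypothetical: three more frauds detected
print(f"recall gain:   {g['recall_gain']:.4%}")
print(f"accuracy gain: {g['accuracy_gain']:.4%}")
```

The same three corrected samples move recall by 3/159, a little under two percentage points, but move accuracy by only 3/90969, because accuracy is divided by the full test set, which consists mostly of negative samples.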
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Claims (4)
1. A credit card fraud detection method based on deep learning is characterized by comprising the following steps:
step 1, obtaining an original data set of credit card fraudulent transactions, and preprocessing the data set;
step 2, dividing the data into two parts, wherein one part is used as a training set, and the other part is used as a test set;
step 3, inputting the training set into a deep learning model for training, optimizing model parameters, and adjusting hyper-parameters to enable the performance of the model to be optimal;
step 4, inputting the test set into the trained model to obtain a classification label;
the training of the deep learning model in the step 3 comprises the following steps:
step 301, the original data feature data passes through a fully-connected neural network fc1, the neural network has 29 neurons, and data feature data1 is obtained;
step 302, the data characteristic data1 passes through a fully connected neural network fc2, the neural network has 116 neurons, and the data characteristic data2 is obtained;
step 303, the data characteristic data2 passes through a fully connected neural network fc3, the neural network has 99 neurons, and the data characteristic data3 is obtained;
step 304, fusing the data characteristic data3 with the original data characteristic data, and then passing through a fully-connected neural network fc4, wherein the neural network has 128 neurons, so as to obtain data characteristic data4;
step 305, the data characteristic data4 passes through a fully connected neural network fc5, the neural network has 64 neurons, and data characteristic data5 is obtained;
and step 306, the data characteristic data5 passes through a fully connected neural network fc6, the neural network has 2 neurons, and a data tag is obtained.
2. The method of claim 1, wherein the training set is two-thirds of the total number of samples and the testing set is one-third of the total number of samples.
3. The method of claim 1 or 2, wherein the data tag is 0 or 1, 0 indicating that the transaction is a normal transaction, and 1 indicating that the transaction is a fraudulent transaction.
4. The credit card fraud detection method of claim 3, wherein evaluation metrics are computed from the classification labels, the metrics including accuracy and recall.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011283215.0A CN112270548B (en) | 2020-11-17 | 2020-11-17 | Credit card fraud detection method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112270548A CN112270548A (en) | 2021-01-26 |
CN112270548B true CN112270548B (en) | 2022-09-20 |
Family
ID=74339064
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011283215.0A Active CN112270548B (en) | 2020-11-17 | 2020-11-17 | Credit card fraud detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112270548B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE202022107234U1 (en) | 2022-12-23 | 2023-02-13 | Jalawi Sulaiman Alshudukhi | Online banking fraud detection system using blockchain and artificial intelligence through backlogging |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321950A (en) * | 2019-06-30 | 2019-10-11 | 哈尔滨理工大学 | A kind of credit card fraud recognition methods |
CN111105303A (en) * | 2019-11-12 | 2020-05-05 | 同济大学 | Network loan fraud detection method based on incremental network characterization learning |
CN111275098A (en) * | 2020-01-17 | 2020-06-12 | 同济大学 | Encoder-LSTM deep learning model applied to credit card fraud detection and method thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10846308B2 (en) * | 2016-07-27 | 2020-11-24 | Anomalee Inc. | Prioritized detection and classification of clusters of anomalous samples on high-dimensional continuous and mixed discrete/continuous feature spaces |
Non-Patent Citations (1)
Title |
---|
应用多层神经网络模型的信用卡欺诈算法;刘中华;《福建电脑》;20200425(第04期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112270548A (en) | 2021-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109918511B (en) | BFS and LPA based knowledge graph anti-fraud feature extraction method | |
CN111798312A (en) | Financial transaction system abnormity identification method based on isolated forest algorithm | |
CN111695597B (en) | Credit fraud group identification method and system based on improved isolated forest algorithm | |
CN110826618A (en) | Personal credit risk assessment method based on random forest | |
Chen et al. | CatBoost for fraud detection in financial transactions | |
CN110473140B (en) | Image dimension reduction method of extreme learning machine based on graph embedding | |
CN112270548B (en) | Credit card fraud detection method based on deep learning | |
CN112215629B (en) | Multi-target advertisement generating system and method based on construction countermeasure sample | |
CN113887214B (en) | Willingness presumption method based on artificial intelligence and related equipment thereof | |
CN112966728B (en) | Transaction monitoring method and device | |
CN113569048A (en) | Method and system for automatically dividing affiliated industries based on enterprise operation range | |
CN111737688B (en) | Attack defense system based on user portrait | |
CN114119191A (en) | Wind control method, overdue prediction method, model training method and related equipment | |
CN116805245A (en) | Fraud detection method and system based on graph neural network and decoupling representation learning | |
CN117034110A (en) | Stem cell exosome detection method based on deep learning | |
CN111369339A (en) | Over-sampling improved svdd-based bank client transaction behavior abnormity identification method | |
CN116821688A (en) | Method for processing data set in credit card fraud transaction based on clustering downsampling technology | |
CN116468271A (en) | Enterprise risk analysis method, system and medium based on big data | |
CN114936615A (en) | Small sample log information anomaly detection method based on characterization consistency correction | |
Chen et al. | Semi-supervised convolutional neural networks with label propagation for image classification | |
CN113641824A (en) | Text classification system and method based on deep learning | |
CN112733144A (en) | Malicious program intelligent detection method based on deep learning technology | |
Sun et al. | [Retracted] Enterprise Financial Risk Analysis Based on Improved Model C‐Means Clustering Algorithm | |
CN111401783A (en) | Power system operation data integration feature selection method | |
Sharma et al. | Combatting Digital Financial Fraud through Strategic Deep Learning Approaches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||