CN111369339A - Over-sampling improved svdd-based bank client transaction behavior abnormity identification method
- Publication number: CN111369339A
- Application number: CN202010137063.7A
- Authority: CN (China)
- Prior art keywords: abnormal, data, behaviors, behavior, svdd
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4016—Transaction verification involving fraud or risk level assessment in transaction processing
Landscapes
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Engineering & Computer Science (AREA)
- Finance (AREA)
- Physics & Mathematics (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- Technology Law (AREA)
- Development Economics (AREA)
- Computer Security & Cryptography (AREA)
- Complex Calculations (AREA)
Abstract
A method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD (support vector data description), in the technical field of bank risk-control data processing, comprises the following steps: S1, performing a consistency check on the raw data; S2, setting a value p and using the SMOTE oversampling algorithm to expand the abnormal-behavior data p-fold; S3, building an SVDD model on the p-fold expanded abnormal-behavior data and calculating the center a and the radius R of the model; S4, calculating the distance from data whose abnormality status is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R to be abnormal, and otherwise judging it to be not abnormal. The method overcomes the shortcomings of existing anomaly identification algorithms when applied to bank customer transaction behavior, so that abnormal bank customer transaction behavior can be identified. Finally, the identified abnormal transaction behavior is reported to a verification module for further security verification, so as to better guard against bank transaction risk.
Description
Technical Field
The invention relates to the technical field of bank risk-control data processing, and in particular to an improved data analysis method for identifying abnormal customer transaction behavior in bank risk control.
Background
Risk control is one of the most important functions in the banking industry, and identifying abnormal customer transaction behavior can effectively improve a bank's risk-control capability.
The usual way to identify abnormal customer behavior is to build a supervised classification model with two classes, abnormal and not abnormal. This approach has a significant drawback. A customer transaction can be confirmed as abnormal, for example when a bank customer's credit card is stolen and used fraudulently, but a transaction with no recorded abnormality can only be regarded as not abnormal so far, since an abnormality may still come to light later. Because the "not abnormal" labels are therefore not fully reliable, supervised models are unsuitable in this setting. Instead, the semi-supervised model SVDD can be used to identify abnormal transaction behavior.
SVDD, however, is accurate only when a large amount of labeled data is available. In identifying abnormal bank customer transaction behavior, labeled data means data confirmed to be risky, such as transactions in which a credit card was used fraudulently, and such data is scarce.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing anomaly identification algorithms when applied to bank customer transaction behavior, and provides a method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD, which is an effective semi-supervised algorithm. The abnormal-behavior data is expanded with the SMOTE oversampling algorithm, and patterns in the customer transaction data are then analyzed to judge whether a transaction behavior is abnormal, thereby identifying abnormal transaction behavior of bank customers. Finally, the identified abnormal transaction behavior is reported to a verification module for further security verification, so as to better guard against bank transaction risk.
To solve the technical problem, the invention adopts the following technical scheme: a method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD, characterized in that the method comprises the following steps:
S1, given the raw data of bank customer transaction behavior, performing a consistency check on the raw data, removing invalid and duplicate data, filling in missing values, converting categorical variables into numerical variables, and dividing the raw data into two classes according to the results recorded in it: behavior with an abnormality and behavior with no abnormality so far; behavior with no abnormality so far is treated as behavior whose abnormality status is unknown;
S2, setting a value p, and using the SMOTE oversampling algorithm to expand the abnormal-behavior data p-fold;
S3, building an SVDD model on the p-fold expanded abnormal-behavior data, and calculating the center a and the radius R of the SVDD model;
S4, calculating the distance from data whose abnormality status is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R of the SVDD model to be abnormal, and otherwise judging it to be not abnormal.
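Step S1 can be illustrated with a small sketch. The patent does not specify a data layout, so the record format (a list of dicts), the label field name `abnormal`, mean and most-frequent-value imputation, and integer coding of categories are all assumptions made for illustration only.

```python
def preprocess(records, label_key="abnormal"):
    """Sketch of step S1 (hypothetical record layout: one dict per transaction)."""
    seen, clean = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        # Drop repeated rows and rows whose label is not 0/1 (treated as invalid).
        if key in seen or r.get(label_key) not in (0, 1):
            continue
        seen.add(key)
        clean.append(dict(r))

    features = sorted(k for k in clean[0] if k != label_key)
    for f in features:
        values = [r[f] for r in clean if r[f] is not None]
        if values and all(isinstance(v, str) for v in values):
            # Categorical column: fill gaps with the most frequent value,
            # then map each category to a numeric code.
            mode = max(set(values), key=values.count)
            codes = {v: float(i) for i, v in enumerate(sorted(set(values)))}
            for r in clean:
                r[f] = codes[r[f] if r[f] is not None else mode]
        else:
            # Numeric column: fill gaps with the column mean.
            mean = sum(values) / len(values) if values else 0.0
            for r in clean:
                r[f] = float(r[f]) if r[f] is not None else mean

    # Label 1 = confirmed abnormal; label 0 = no abnormality so far (status unknown).
    abnormal = [[r[f] for f in features] for r in clean if r[label_key] == 1]
    unknown = [[r[f] for f in features] for r in clean if r[label_key] == 0]
    return abnormal, unknown
```

The two returned lists correspond to the two classes of step S1: the first feeds the oversampling of step S2, and the second is what step S4 later scores.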
Further refinements of the technical scheme are as follows:
Step S2 comprises:
The data set with abnormal behavior is denoted Q and contains q samples in total;
For each sample $x_i$ ($i = 1, 2, \ldots, q$) in the data set Q, its m nearest neighbors are computed, a sample point $x_{it}$ is randomly selected from the m neighbors, and a random number $\lambda_j$ between 0 and 1 is generated; the j-th new sample point generated from $x_i$ is $x_{ij}^{new} = x_i + \lambda_j (x_{it} - x_i)$;
This operation is performed p times for each $x_i$, yielding the abnormal-behavior data set expanded p-fold by the SMOTE oversampling algorithm.
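The interpolation in step S2 can be sketched in a few lines of plain Python. This is a minimal illustration, not the patent's implementation: the function name `smote`, the Euclidean distance used to find neighbors, and the `seed` parameter are assumptions, and the sketch returns only the synthetic points so that the caller decides how to combine them with Q.

```python
import random

def smote(Q, p, m, seed=0):
    """Generate p synthetic points per sample of Q by linear interpolation
    toward a randomly chosen one of its m nearest neighbors (needs len(Q) >= 2)."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(Q):
        # m nearest neighbors of x_i by squared Euclidean distance (assumed metric).
        others = [y for j, y in enumerate(Q) if j != i]
        others.sort(key=lambda y: sum((a - b) ** 2 for a, b in zip(x, y)))
        neighbors = others[:m]
        for _ in range(p):
            x_it = rng.choice(neighbors)   # random neighbor x_it
            lam = rng.random()             # random number lambda_j in [0, 1)
            # New point on the segment between x_i and x_it.
            synthetic.append([a + lam * (b - a) for a, b in zip(x, x_it)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing abnormal samples, the expanded set stays inside the region already covered by Q.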
Step S3 comprises:
The data set with abnormal behavior is denoted (x, y), where x is the feature vector and y is the abnormality label. A hypersphere is constructed for the data set (x, y); the hypersphere is described by $\min_{R,a,\xi} R^2 + C\sum_i \xi_i$ subject to $(x_i-a)^T(x_i-a) \le R^2 + \xi_i$ and $\xi_i \ge 0$, where C is a penalty parameter and $\xi_i$ are slack variables.
The description of the hypersphere is converted into the dual form $L = \sum_i \alpha_i K(x_i,x_i) - \sum_i\sum_j \alpha_i\alpha_j K(x_i,x_j)$, where K is a kernel function and $\alpha_i$ are the Lagrange multipliers, which are computed by convex optimization;
The radius of the hypersphere is calculated as $R^2 = K(x_k,x_k) - 2\sum_i \alpha_i K(x_i,x_k) + \sum_i\sum_j \alpha_i\alpha_j K(x_i,x_j)$, where $x_k$ is a support vector on the boundary of the hypersphere, and the center of the sphere is $a = \sum_i \alpha_i x_i$;
The distance from data whose abnormality status is unknown to the center a is calculated; behaviors whose distance is less than the radius R of the hypersphere are judged to be abnormal, and the identified abnormal behaviors are reported to a verification module for further security verification.
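The following sketch shows one way steps S3 and S4 could be computed. The patent says only that the multipliers are found by "convex optimization", so the solver here (a simple Frank-Wolfe iteration) is an assumption, as are the function names and the default linear kernel. The sketch also assumes C ≥ 1, in which case the bound α_i ≤ C never binds, every training point lies inside the sphere, and R² can be taken as the largest training distance.

```python
def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_svdd(X, kernel=linear_kernel, iters=2000):
    """Maximize L = sum_i a_i K_ii - sum_ij a_i a_j K_ij over the simplex
    (sum a_i = 1, a_i >= 0), i.e. the SVDD dual with C >= 1 (assumed)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [1.0 / n] * n
    for t in range(iters):
        # Gradient of L: g_i = K_ii - 2 * sum_j alpha_j K_ij.
        grad = [K[i][i] - 2.0 * sum(alpha[j] * K[i][j] for j in range(n))
                for i in range(n)]
        s = max(range(n), key=grad.__getitem__)   # best simplex vertex
        gamma = 2.0 / (t + 2.0)                   # standard Frank-Wolfe step
        alpha = [(1.0 - gamma) * a for a in alpha]
        alpha[s] += gamma
    const = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))

    def dist2(z):
        # Squared kernel distance from z to the center a = sum_i alpha_i x_i.
        return kernel(z, z) - 2.0 * sum(alpha[i] * kernel(X[i], z)
                                        for i in range(n)) + const

    R2 = max(dist2(x) for x in X)   # boundary support vectors attain this value
    return alpha, R2, dist2

def is_abnormal(z, dist2, R2):
    # Step S4: the model is trained on abnormal data, so points inside the
    # sphere are judged abnormal.
    return dist2(z) < R2
```

Note the direction of the decision: unlike the usual one-class setup where the sphere encloses normal data, here the sphere encloses known abnormal transactions, so falling inside it is what triggers a report to the verification module.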
The beneficial effects of the invention are as follows. The oversampling-improved SVDD used by the invention is an effective semi-supervised method for class-imbalanced data. In terms of data availability, usually only the abnormal transaction behaviors of bank customers can be confirmed, whereas it is hard to guarantee that a given transaction behavior is free of abnormality. SVDD is an efficient semi-supervised method that needs only the known abnormal data: a model is built on that data and then used to analyze transaction behaviors whose abnormality status is unknown. This matches the actual situation of bank customer transaction data well and yields accurate results. In terms of class balance, SVDD is more accurate when a large amount of labeled data is available; to ensure accuracy, the SMOTE oversampling algorithm is used to expand the abnormal-behavior data before modeling, and the larger expanded data set is then used for modeling, which yields more accurate results.
Drawings
Fig. 1 is a flow chart of the method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD according to the invention.
Detailed Description
To make the invention easier to understand, specific embodiments are described in detail below with reference to the accompanying drawing.
Referring to Fig. 1, the invention discloses a method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD, comprising the following steps:
S1, given the raw data of bank customer transaction behavior, performing a consistency check on the raw data, removing invalid and duplicate data, filling in missing values, converting categorical variables into numerical variables, and dividing the raw data into two classes according to the results recorded in it: behavior with an abnormality and behavior with no abnormality so far; behavior with no abnormality so far is treated as behavior whose abnormality status is unknown;
S2, setting a value p, and using the SMOTE oversampling algorithm to expand the abnormal-behavior data p-fold;
S3, building an SVDD model on the p-fold expanded abnormal-behavior data, and calculating the center a and the radius R of the SVDD model;
S4, calculating the distance from data whose abnormality status is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R of the SVDD model to be abnormal, and otherwise judging it to be not abnormal.
Further refinements of the technical scheme are as follows:
Step S2 comprises:
The data set with abnormal behavior is denoted Q and contains q samples in total;
For each sample $x_i$ ($i = 1, 2, \ldots, q$) in the data set Q, its m nearest neighbors are computed, a sample point $x_{it}$ is randomly selected from the m neighbors, and a random number $\lambda_j$ between 0 and 1 is generated; the j-th new sample point generated from $x_i$ is $x_{ij}^{new} = x_i + \lambda_j (x_{it} - x_i)$;
This operation is performed p times for each $x_i$, yielding the abnormal-behavior data set expanded p-fold by the SMOTE oversampling algorithm.
Step S3 comprises:
The data set with abnormal behavior is denoted (x, y), where x is the feature vector and y is the abnormality label. A hypersphere is constructed for the data set (x, y); the hypersphere is described by $\min_{R,a,\xi} R^2 + C\sum_i \xi_i$ subject to $(x_i-a)^T(x_i-a) \le R^2 + \xi_i$ and $\xi_i \ge 0$, where C is a penalty parameter and $\xi_i$ are slack variables.
The description of the hypersphere is converted into the dual form $L = \sum_i \alpha_i K(x_i,x_i) - \sum_i\sum_j \alpha_i\alpha_j K(x_i,x_j)$, where K is a kernel function and $\alpha_i$ are the Lagrange multipliers, which are computed by convex optimization;
The radius of the hypersphere is calculated as $R^2 = K(x_k,x_k) - 2\sum_i \alpha_i K(x_i,x_k) + \sum_i\sum_j \alpha_i\alpha_j K(x_i,x_j)$, where $x_k$ is a support vector on the boundary of the hypersphere, and the center of the sphere is $a = \sum_i \alpha_i x_i$;
The distance from data whose abnormality status is unknown to the center a is calculated; behaviors whose distance is less than the radius R of the hypersphere are judged to be abnormal, and the identified abnormal behaviors are reported to a verification module for further security verification.
The method uses the SMOTE oversampling algorithm to expand the abnormal data, and then builds the SVDD model on the expanded abnormal data to identify abnormal behavior.
The foregoing shows and describes the principles, general features, and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above, which, together with the specification, serve only to illustrate its principle. Various changes and modifications may be made without departing from the spirit and scope of the invention, and all such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.
Claims (3)
1. A method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD, characterized in that the method comprises the following steps:
S1, given the raw data of bank customer transaction behavior, performing a consistency check on the raw data, removing invalid and duplicate data, filling in missing values, converting categorical variables into numerical variables, and dividing the raw data into two classes according to the results recorded in it: behavior with an abnormality and behavior with no abnormality so far; behavior with no abnormality so far is treated as behavior whose abnormality status is unknown;
S2, setting a value p, and using the SMOTE oversampling algorithm to expand the abnormal-behavior data p-fold;
S3, building an SVDD model on the p-fold expanded abnormal-behavior data, and calculating the center a and the radius R of the SVDD model;
S4, calculating the distance from data whose abnormality status is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R of the SVDD model to be abnormal, and otherwise judging it to be not abnormal.
2. The method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD as claimed in claim 1, wherein step S2 comprises:
the data set with abnormal behavior is denoted Q and contains q samples in total;
for each sample $x_i$ ($i = 1, 2, \ldots, q$) in the data set Q, its m nearest neighbors are computed, a sample point $x_{it}$ is randomly selected from the m neighbors, and a random number $\lambda_j$ between 0 and 1 is generated; the j-th new sample point generated from $x_i$ is $x_{ij}^{new} = x_i + \lambda_j (x_{it} - x_i)$;
for each sample $x_i$, this linear interpolation operation is performed p times, generating one new sample each time, to obtain the abnormal-behavior data set expanded p-fold by the SMOTE oversampling algorithm.
3. The method for identifying abnormal bank customer transaction behavior based on oversampling-improved SVDD as claimed in claim 1, wherein step S3 comprises:
the data set with abnormal behavior is denoted (x, y), where x is the feature vector and y is the abnormality label; a hypersphere is constructed for the data set (x, y), the hypersphere being described by $\min_{R,a,\xi} R^2 + C\sum_i \xi_i$ subject to $(x_i-a)^T(x_i-a) \le R^2 + \xi_i$ and $\xi_i \ge 0$, where C is a penalty parameter and $\xi_i$ are slack variables;
the description of the hypersphere is converted into the dual form $L = \sum_i \alpha_i K(x_i,x_i) - \sum_i\sum_j \alpha_i\alpha_j K(x_i,x_j)$, where K is a kernel function and $\alpha_i$ are the Lagrange multipliers, from which the center a and the radius R of the hypersphere are calculated;
the radius of the hypersphere is calculated as $R^2 = K(x_k,x_k) - 2\sum_i \alpha_i K(x_i,x_k) + \sum_i\sum_j \alpha_i\alpha_j K(x_i,x_j)$, where $x_k$ is a support vector on the boundary of the hypersphere, and the center of the sphere is $a = \sum_i \alpha_i x_i$;
the distance from data whose abnormality status is unknown to the center a is calculated; behaviors whose distance is less than the radius R of the hypersphere are judged to be abnormal, and the identified abnormal behaviors are reported to a verification module for further security verification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010137063.7A CN111369339A (en) | 2020-03-02 | 2020-03-02 | Over-sampling improved svdd-based bank client transaction behavior abnormity identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111369339A (en) | 2020-07-03 |
Family
ID=71206532
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010137063.7A Pending CN111369339A (en) | 2020-03-02 | 2020-03-02 | Over-sampling improved svdd-based bank client transaction behavior abnormity identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111369339A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130097103A1 (en) * | 2011-10-14 | 2013-04-18 | International Business Machines Corporation | Techniques for Generating Balanced and Class-Independent Training Data From Unlabeled Data Set |
CN104091073A (en) * | 2014-07-11 | 2014-10-08 | 中国人民解放军国防科学技术大学 | Sampling method for unbalanced transaction data of fictitious assets |
CN107563431A (en) * | 2017-08-28 | 2018-01-09 | 西南交通大学 | A kind of image abnormity detection method of combination CNN transfer learnings and SVDD |
CN108848068A (en) * | 2018-05-29 | 2018-11-20 | 上海海事大学 | Based on deepness belief network-Support Vector data description APT attack detection method |
CN109766956A (en) * | 2018-07-19 | 2019-05-17 | 西北工业大学 | Method for detecting abnormality based on express delivery big data |
CN110825545A (en) * | 2019-08-31 | 2020-02-21 | 武汉理工大学 | Cloud service platform anomaly detection method and system |
Non-Patent Citations (1)
Title |
---|
Zhang Hao et al.: "Abnormal traffic detection technology based on data augmentation and model updating", Netinfo Security (《信息网络安全》) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112306835A (en) * | 2020-11-02 | 2021-02-02 | 平安科技(深圳)有限公司 | User data monitoring and analyzing method, device, equipment and medium |
CN112306835B (en) * | 2020-11-02 | 2024-05-28 | 平安科技(深圳)有限公司 | User data monitoring and analyzing method, device, equipment and medium |
CN113191409A (en) * | 2021-04-20 | 2021-07-30 | 国网江苏省电力有限公司营销服务中心 | Method for detecting abnormal electricity consumption behaviors of residents through tag data expansion and deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7102344B2 (en) | Machine learning model modeling methods and devices | |
CN111798312B (en) | Financial transaction system anomaly identification method based on isolated forest algorithm | |
CN110998608B (en) | Machine learning system for various computer applications | |
WO2018103456A1 (en) | Method and apparatus for grouping communities on the basis of feature matching network, and electronic device | |
US8543522B2 (en) | Automatic rule discovery from large-scale datasets to detect payment card fraud using classifiers | |
CN102291392B (en) | Hybrid intrusion detection method based on Bagging algorithm | |
Klerx et al. | Model-based anomaly detection for discrete event systems | |
CN117155706B (en) | Network abnormal behavior detection method and system | |
CN110084609B (en) | Transaction fraud behavior deep detection method based on characterization learning | |
CN111369339A (en) | Over-sampling improved svdd-based bank client transaction behavior abnormity identification method | |
CN116400168A (en) | Power grid fault diagnosis method and system based on depth feature clustering | |
CN114818999A (en) | Account identification method and system based on self-encoder and generation countermeasure network | |
Sun et al. | Intrusion detection system based on in-depth understandings of industrial control logic | |
CN115330368A (en) | Block chain abnormal transaction identification method and system integrating unsupervised machine learning | |
CN117853226A (en) | Anti-fraud feature variable screening method for e-commerce scene admission | |
CN113283901A (en) | Byte code-based fraud contract detection method for block chain platform | |
Ezeme et al. | An imputation-based augmented anomaly detection from large traces of operating system events | |
CN116805245A (en) | Fraud detection method and system based on graph neural network and decoupling representation learning | |
CN115907954A (en) | Account identification method and device, computer equipment and storage medium | |
CN115567224A (en) | Method for detecting abnormal transaction of block chain and related product | |
CN114792007A (en) | Code detection method, device, equipment, storage medium and computer program product | |
CN114462510A (en) | Equipment classification method and system for precise protection of Internet of things | |
CN114140246A (en) | Model training method, fraud transaction identification method, device and computer equipment | |
CN118333763B (en) | Financial transaction risk control method based on financial sequence generation technology | |
Balne et al. | Credit card fraud detection using autoencoders |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200703 |