CN111833174A - Internet financial application anti-fraud identification method based on LOF algorithm - Google Patents
- Publication number
- CN111833174A
- Authority
- CN
- China
- Prior art keywords
- data
- lof
- abnormal
- local
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Artificial Intelligence (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Technology Law (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides an Internet financial application anti-fraud identification method based on the LOF algorithm, which comprises: collecting and preprocessing data; selecting data features to obtain a data set for the LOF algorithm and randomly dividing the data set into different data subsets; calculating the local reachable distance, the local reachable density, and the local outlier factor (LOF) value of each data point; and using the LOF value to determine whether the data point is an outlier, and hence whether the application behavior is fraudulent. By implementing the technical scheme of the invention, the running time of outlier detection is effectively shortened, the efficiency of outlier detection on high-dimensional large data sets is improved, Internet application behavior can be monitored in real time, abnormal fraudulent applications can be detected promptly and accurately, credit loss is reduced, and the method is better suited to the current requirements of big-data risk control.
Description
Technical Field
The invention relates to the technical field of risk control in the Internet financial industry, in particular to a risk control system.
Background
With the development of Internet finance, the types and modes of fraud, such as gray-market and black-market activity, keep multiplying. According to incomplete statistics, the loss caused by fraud can reach 500 to 1000 billion every year, and fraud risk has become a key risk for Internet finance to guard against. Statistically, fraud is an outlier relative to normal behavior: in a scatter plot of the data, the attribute values of fraudulent records lie far from other data points and deviate significantly from expected or common attribute values. Outlier detection is therefore a common method for financial anti-fraud, and detecting fraud effectively and with high probability has become the main anti-fraud task of large financial institutions.
In the prior art, there are three main families of outlier detection methods: statistics-based methods (such as HBOS, the histogram-based outlier score), distance-based methods (such as k-nearest neighbors, KNN), and clustering-based methods (such as K-means and DBSCAN). However, these prior-art algorithms are complex, computationally expensive, and high in time complexity, have low precision, and detect outliers inefficiently on high-dimensional, large data sets. How to reduce the computation and running time of outlier detection has become a technical problem to be solved urgently.
The LOF (Local Outlier Factor) algorithm is a density-based method for detecting abnormal data. It introduces the concepts of the reachable distance and the reachable density of each data object to judge whether the object is an outlier, and calculates a local outlier factor (LOF) for each data object in the data set to reflect its degree of abnormality. Because the LOF algorithm computes density from the k-th neighborhood of a point, it mines outliers only in the boundary units where outliers are likely to appear rather than computing globally, and it can find outliers accurately even when the data in the sample space is unevenly distributed. This effectively reduces the amount of data to be examined, the computation, and the running time, gives higher detection efficiency on high-dimensional large data, and better suits current big-data risk control requirements.
Disclosure of Invention
In order to solve the technical problem, the invention discloses an internet financial application anti-fraud identification method based on an LOF algorithm, and the technical scheme of the invention is implemented as follows:
An Internet financial application anti-fraud identification method based on the LOF algorithm comprises the following steps:
step one: collecting the operation tracking (buried-point) data, personal basic information, and client-authorized third-party data submitted by the customer on the client;
step two: data preprocessing, including abnormal value processing and normalization processing;
step three: selecting data features according to the behavior feature types of credit fraud to obtain a data set for the LOF algorithm, and randomly dividing the data set into different data subsets;
step four: based on the data subset, calculating the k-distance neighborhood of the object p in the data subset through the LOF algorithm, and then calculating the local reachable distance of the object p;
step five: calculating the local reachable density of the object p according to the local reachable distance;
step six: calculating the local outlier factor (LOF) value of the object p according to the local reachable density;
step seven: repeating steps one to six, wherein in the loop calculation the obtained LOF value is compared with a set threshold ψ; an object whose LOF value is smaller than the threshold ψ is judged to be a normal point and is removed, and an object whose LOF value is larger than the threshold ψ is judged to be an abnormal point and is output.
Further, the outlier processing includes culling data of the extraneous dimension and deleting outliers in the data.
Further, the normalization process adopts a dispersion normalization method.
Further, the k-distance neighborhood, the local reachable distance, and the local reachable density are calculated only within the data subset where the object p is located.
Further, the threshold ψ is set and adjusted dynamically according to empirical values or changes in the actual business.
According to this technical scheme, in LOF-based anti-fraud identification of Internet financial applications, the outlier threshold ψ is set according to experience and the actual business. In the recursive computation, high-density non-outliers are continuously removed and the points most likely to be outliers are output. This effectively shortens the running time of outlier detection, improves the efficiency of outlier detection on high-dimensional large data sets, allows Internet application behavior to be monitored in real time, and enables abnormal fraudulent applications to be detected promptly and accurately, reducing credit loss.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawing described below shows only one embodiment of the present invention; those skilled in the art can obtain other drawings from it without creative effort.
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An Internet financial application anti-fraud identification method based on the LOF algorithm comprises the following steps:
step one: collecting the operation tracking (buried-point) data, personal basic information, and client-authorized third-party data submitted by the customer on the client;
step two: data preprocessing, including abnormal value processing and normalization processing;
step three: selecting data features according to the behavior feature types of credit fraud to obtain a data set for the LOF algorithm, and randomly dividing the data set into different data subsets;
step four: based on the data subset, calculating the k-distance neighborhood of the object p in the data subset through the LOF algorithm, and then calculating the local reachable distance of the object p;
step five: calculating the local reachable density of the object p according to the local reachable distance;
step six: calculating the local outlier factor (LOF) value of the object p according to the local reachable density;
step seven: repeating steps one to six, wherein in the loop calculation the obtained LOF value is compared with a set threshold ψ; an object whose LOF value is smaller than the threshold ψ is judged to be a normal point and is removed, and an object whose LOF value is larger than the threshold ψ is judged to be an abnormal point and is output.
In this embodiment, data can be acquired through traffic acquisition equipment deployed on network nodes, and the acquired data features comprehensively reflect the applicant's repayment capacity and repayment willingness. The personal basic information includes traditional data such as personal and family status, work, and income level.
In this embodiment, the data set for the LOF algorithm is divided into different data sets, including a training set and a verification set. In the high-dimensional data set, some of the data dimensions are each divided into n segments, and the data set is partitioned along the lines connecting the division points marked on each dimension. The resulting irregular sections are bounded by grid boundaries, whose specific boundary values need to be determined from the dimensions and size of the data set and the given number of division intervals n.
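As an illustration of the grid division described above, the following is a minimal sketch, not the patented implementation. It assumes equal-width segments between the observed minimum and maximum of each chosen dimension, which the text does not specify; the function name `grid_partition` and its parameters are hypothetical.

```python
from collections import defaultdict


def grid_partition(points, dims, n):
    """Group point indices into grid cells.

    Each dimension listed in `dims` is cut into `n` segments.
    Assumption (not stated in the source): segments have equal width
    between the observed minimum and maximum of that dimension.
    """
    lows = {d: min(p[d] for p in points) for d in dims}
    highs = {d: max(p[d] for p in points) for d in dims}
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = []
        for d in dims:
            width = (highs[d] - lows[d]) / n
            if width == 0:
                seg = 0  # constant dimension: everything falls in one segment
            else:
                # The maximum value is clamped into the last segment.
                seg = min(int((p[d] - lows[d]) / width), n - 1)
            key.append(seg)
        cells[tuple(key)].append(idx)
    return dict(cells)


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.1, 0.9), (0.9, 0.1), (1.0, 1.0)]
    print(grid_partition(sample, dims=(0, 1), n=2))
```

Each key of the returned dictionary identifies one grid cell, and each value lists the indices of the points in that cell, so later neighborhood computations can be restricted to a single cell.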
In this embodiment, the data subset in which the object p is located is denoted $P_i$. Let $d_k(p)$ be the distance between the object p and its k-th nearest neighbor $o_k$, that is, $d_k(p) = d(o_k, p)$. Then there are at least k objects $o_i$ satisfying $d(o_i, p) \le d(o_k, p)$, and at most k-1 objects $o_j$ satisfying $d(o_j, p) < d(o_k, p)$. The k-neighbors of the object p, written $N_k(p)$, are all objects whose distance from p is not greater than $d_k(p)$. Averaging the distances from the object p to its k-neighbors gives the m-distance of p, calculated as:

$$\text{m-dist}(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} d(p, o)$$

The m-neighbors of the object p are the set of all objects whose distance from p is less than the m-distance of p. The reachable distance of the object p with respect to the object o, $\mathrm{reach\_dist}_m(o, p)$, is the maximum of the m-distance of the object p and the distance between the objects p and o. The local reachable density of the object p, $lrd_m(p)$, is the inverse of the average reachable distance from the points within the k-distance neighborhood of the object p to p, so the value of $lrd_m(p)$ is:

$$lrd_m(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \mathrm{reach\_dist}_m(o, p) \right)^{-1}$$

The local anomaly factor of the object p is then:
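The formula for the local anomaly factor is not reproduced in this text. The sketch below computes the quantities defined above in plain Python and, as an assumption, uses the standard LOF ratio (the mean of lrd(o)/lrd(p) over the k-neighbors o of p) for the final value. The function names and the choice of Euclidean distance are illustrative and are not taken from the source.

```python
import math


def k_neighbors(points, idx, k):
    """Return (distance, index) pairs for all points within the k-distance
    of points[idx]; ties at the k-th distance are included."""
    dists = sorted(
        (math.dist(points[idx], q), j) for j, q in enumerate(points) if j != idx
    )
    d_k = dists[k - 1][0]  # distance to the k-th nearest neighbor
    return [(d, j) for d, j in dists if d <= d_k]


def m_distance(points, idx, k):
    """Mean distance from points[idx] to its k-neighbors (the m-distance)."""
    neigh = k_neighbors(points, idx, k)
    return sum(d for d, _ in neigh) / len(neigh)


def local_reach_density(points, idx, k):
    """Inverse of the mean reachable distance over the k-neighbors.

    reach_dist(o, p) = max(m-distance(p), d(p, o)), following the text.
    Assumes the neighbors do not all coincide with the point itself,
    otherwise the mean reachable distance would be zero.
    """
    neigh = k_neighbors(points, idx, k)
    m_dist = sum(d for d, _ in neigh) / len(neigh)
    mean_reach = sum(max(m_dist, d) for d, _ in neigh) / len(neigh)
    return 1.0 / mean_reach


def lof(points, idx, k):
    """Assumed standard LOF ratio: mean of lrd(o) / lrd(p) over k-neighbors o."""
    neigh = k_neighbors(points, idx, k)
    lrd_p = local_reach_density(points, idx, k)
    total = sum(local_reach_density(points, j, k) for _, j in neigh)
    return total / (len(neigh) * lrd_p)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
    for i in range(len(pts)):
        print(pts[i], round(lof(pts, i, k=2), 3))
```

In the small example, the four corner points of the unit square have equal densities, so their LOF values sit at 1, while the isolated point (5, 5) has a much lower density than its neighbors and scores well above 1.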
In a preferred embodiment, the outlier processing includes culling data of irrelevant dimensions and removing outliers from the data.
In a preferred embodiment, the normalization process uses a dispersion (min-max) normalization method, which maps the data into the interval [0, 1]. The dispersion normalization formula is:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x'$ is the normalized value, $x$ is the value before normalization, $x_{\min}$ is the minimum value of the feature, and $x_{\max}$ is the maximum value of the feature.
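A minimal sketch of dispersion normalization for a single feature column follows. The function name `min_max_normalize` is hypothetical, and mapping a constant feature to 0.0 is an assumption, since the text does not say how that case is handled.

```python
def min_max_normalize(values):
    """Map a list of numbers into [0, 1] with x' = (x - min) / (max - min)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))
```

Normalizing each feature separately keeps features with large numeric ranges, such as income, from dominating the distance computations used later by the LOF steps.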
In a preferred embodiment, the k-distance neighborhood, the local reachable distance, and the local reachable density are calculated only within the data subset where the object p is located.
In a preferred embodiment, the threshold ψ is adjusted dynamically according to empirical values or changes in the actual business. The threshold ψ is 1 by default in this embodiment.
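The threshold comparison can be sketched as below. The function name `classify_by_lof` and the dictionary input are hypothetical, and treating an LOF value exactly equal to ψ as normal is an assumption, because the text only covers the strictly smaller and strictly larger cases.

```python
def classify_by_lof(lof_values, psi=1.0):
    """Split object ids into (normal, abnormal) lists by comparing LOF to psi.

    lof_values: mapping from object id to its LOF value.
    psi: threshold, 1.0 by default as in the embodiment.
    Assumption: a value equal to psi is treated as normal.
    """
    normal, abnormal = [], []
    for obj_id, score in lof_values.items():
        if score > psi:
            abnormal.append(obj_id)
        else:
            normal.append(obj_id)
    return normal, abnormal


if __name__ == "__main__":
    scores = {"app_001": 0.9, "app_002": 1.0, "app_003": 2.3}
    print(classify_by_lof(scores))
```

Raising `psi` flags fewer applications as abnormal, and lowering it flags more, which is the knob the embodiment describes adjusting from experience or business conditions.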
It should be understood that the above-described embodiments are merely exemplary of the present invention, and are not intended to limit the present invention, and that any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (5)
1. An Internet financial application anti-fraud identification method based on an LOF algorithm is characterized by comprising the following steps:
step one: collecting the operation tracking (buried-point) data, personal basic information, and client-authorized third-party data submitted by the customer on the client;
step two: data preprocessing, including abnormal value processing and normalization processing;
step three: selecting data features according to the behavior feature types of credit fraud to obtain a data set for the LOF algorithm, and randomly dividing the data set into different data subsets;
step four: based on the data subset, calculating the k-distance neighborhood of the object p in the data subset through the LOF algorithm, and then calculating the local reachable distance of the object p;
step five: calculating the local reachable density of the object p according to the local reachable distance;
step six: calculating the local outlier factor (LOF) value of the object p according to the local reachable density;
step seven: repeating steps one to six, wherein in the loop calculation the obtained LOF value is compared with a set threshold ψ; an object whose LOF value is smaller than the threshold ψ is judged to be a normal point and is removed, and an object whose LOF value is larger than the threshold ψ is judged to be an abnormal point and is output.
2. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the abnormal value processing includes removing data of irrelevant dimensions and deleting abnormal values in the data.
3. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the normalization process adopts a dispersion normalization method.
4. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the k-distance neighborhood, the local reachable distance, and the local reachable density are calculated only within the data subset where the object p is located.
5. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the threshold ψ is set and adjusted dynamically according to empirical values or changes in the actual business.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010493203.4A CN111833174A (en) | 2020-06-03 | 2020-06-03 | Internet financial application anti-fraud identification method based on LOF algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010493203.4A CN111833174A (en) | 2020-06-03 | 2020-06-03 | Internet financial application anti-fraud identification method based on LOF algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111833174A (en) | 2020-10-27 |
Family
ID=72897546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010493203.4A Pending CN111833174A (en) | 2020-06-03 | 2020-06-03 | Internet financial application anti-fraud identification method based on LOF algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111833174A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326862A (en) * | 2021-01-12 | 2021-08-31 | 南京审计大学 | Audit big data fusion clustering and risk data detection method, medium and equipment |
CN117575675A (en) * | 2023-11-17 | 2024-02-20 | 中电智恒信息科技服务有限公司 | Telecommunication user loss prediction method, device, equipment and medium |
CN118228131A (en) * | 2024-05-24 | 2024-06-21 | 暨南大学 | KNN missing value filling model-oriented data poisoning detection method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160132886A1 (en) * | 2013-08-26 | 2016-05-12 | Verafin, Inc. | Fraud detection systems and methods |
CN106330624A (en) * | 2016-11-07 | 2017-01-11 | 国网江苏省电力公司南京供电公司 | Method for detecting power information network traffic abnormality |
CN109102028A (en) * | 2018-08-20 | 2018-12-28 | 南京邮电大学 | Based on improved fast density peak value cluster and LOF outlier detection algorithm |
CN109284371A (en) * | 2018-09-03 | 2019-01-29 | 平安证券股份有限公司 | Anti- fraud method, electronic device and computer readable storage medium |
CN109948724A (en) * | 2019-03-28 | 2019-06-28 | 山东浪潮云信息技术有限公司 | A kind of electric business brush single act detection method based on improvement LOF algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201027 |