CN111209299A - Real-time judgment method for anti-fraud of finance - Google Patents

Publication number
CN111209299A
Authority
CN
China
Prior art keywords
data, time, real, fraud, rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010312133.8A
Other languages
Chinese (zh)
Inventor
谭巍
陈思成
李烨
陈卫
张奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan XW Bank Co Ltd
Original Assignee
Sichuan XW Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan XW Bank Co Ltd
Priority to CN202010312133.8A
Publication of CN111209299A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Abstract

The invention relates to a real-time judgment method for financial anti-fraud, comprising the following steps: A. Heterogeneous data processing: acquiring externally generated real-time data from a message queue cluster, converting the data into the same data structure, and associating the data according to user ID; B. Rule cleaning: classifying and aggregating the associated data in sequence according to the cascade relation of the rules, and writing the data into a temporary data storage structure; C. Time marking: marking each piece of data in the temporary data storage structure with its time; D. Real-time judgment: performing calculations according to the defined rules and the time marked on each piece of data, notifying the corresponding anti-fraud contact of any data whose calculation result reaches the trigger threshold, and discarding data whose calculation result does not reach the trigger threshold. By cascading multiple rules and letting the rules cooperate with one another, the invention can detect abnormal data, greatly improving the ability of anti-fraud systems to detect concealed data.

Description

Real-time judgment method for anti-fraud of finance
Technical Field
The invention relates to a data processing method in the financial field, in particular to a real-time judgment method for financial anti-fraud.
Background
With the accelerating adoption of technologies such as big data, the Internet of Things, and AI (artificial intelligence) in the financial industry, finance is rapidly becoming digital, mobile, and real-time, and the financial industry chain is lengthening. Financial institutions' accounts, channels, and data are increasingly interlinked, which makes business continuity steadily harder to manage. At the same time, large volumes of customer behavior data and transaction data are generated, and illegal behavior such as transaction fraud and card theft must be identified promptly within this data to protect users' property. In current financial anti-fraud identification, risks are increasingly difficult to identify and transaction fraud methods are increasingly concealed.
To strengthen security control in financial transactions, a real-time analysis method is needed that supports multi-dimensional detection, multi-level rule detection, and low latency. Traditional real-time analysis methods mostly rely on technologies such as Storm, Spark, and Flink. These technologies are based on sliding time windows; they can take the influence of data into account and achieve one-dimensional anomaly detection, but memory overflow and fine-granularity performance problems can occur when the time window is very long and the sliding interval is small. In addition, in rule configuration, the data stream is polled against the rules: an anomaly alarm is generated once a rule is triggered, and the next rule is then evaluated statelessly.
These technologies have an obvious drawback: they cannot perform longitudinal comparison of data over a large window, and so cannot detect data in a three-dimensional setting. They also do not support cooperation between multi-level rules, for example detecting abnormal data when the rules form a cascade. Furthermore, traditional real-time computation requires building an efficient and stable real-time computing cluster, which strains both a company's initial construction costs and its later maintenance costs.
Disclosure of Invention
The invention provides a real-time judgment method for financial anti-fraud that detects abnormal data through the cascading of multiple rules and the cooperation among them, improving the ability of anti-fraud systems to detect concealed data.
The invention discloses a real-time judgment method for financial anti-fraud, which comprises the following steps:
A. Heterogeneous data processing: acquiring externally generated real-time data from a message queue cluster, converting the data of multiple message sources in different formats into data of the same data structure, and then associating the data according to user ID;
B. Rule cleaning: extracting rule keywords from the defined rules, classifying and aggregating the associated data obtained in step A in sequence according to the cascade relation of the rules, and then, under the control of a timer, writing the processed data into a temporary data storage structure implemented on an HBase database cluster;
C. Time marking: marking each piece of stored data in the temporary data storage structure with its corresponding time;
D. Real-time judgment: performing calculations on the time marked on each piece of data using the defined data calculation rules and the fields of the time dimension; if the calculation result for a piece of data reaches the trigger threshold, sending the corresponding data to the corresponding anti-fraud contact; and discarding data whose calculation result does not reach the trigger threshold.
Here, heterogeneous data processing means acquiring externally generated real-time data from a message queue cluster such as a Kafka cluster. The different data formats of the multiple external message sources, such as JSON, XML, and txt, are then converted into a unified Protocol Buffers (PB for short) format before subsequent processing. Protocol Buffers is a data structure with very high transmission performance and a very small space footprint. The data are then classified and aggregated through the cascaded rules, which widens the range of judgments the rules can make and allows abnormal data to be detected through cooperation among the rules. The temporary data storage structure is implemented on an HBase database cluster, making full use of the cluster's fast writes and low-latency reads.
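The format conversion described above can be illustrated with a minimal sketch. It is not the patent's implementation: the `Event` dataclass merely stands in for a Protocol Buffers message, and the field names (`user_id`, `product_no`, `amount`), the pipe-delimited txt layout, and the function names are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Unified record; a stand-in for a Protocol Buffers message (assumption)."""
    user_id: str
    product_no: str
    amount: float


def from_json(payload: str) -> Event:
    d = json.loads(payload)
    return Event(str(d["user_id"]), str(d["product_no"]), float(d["amount"]))


def from_xml(payload: str) -> Event:
    root = ET.fromstring(payload)
    return Event(root.findtext("user_id"), root.findtext("product_no"),
                 float(root.findtext("amount")))


def from_txt(payload: str) -> Event:
    # Assumed layout for plain-text sources: user_id|product_no|amount
    user_id, product_no, amount = payload.strip().split("|")
    return Event(user_id, product_no, float(amount))


# One parser per source format; every parser yields the same structure.
PARSERS = {"json": from_json, "xml": from_xml, "txt": from_txt}


def normalize(fmt: str, payload: str) -> Event:
    """Convert a message in any supported format into the unified Event."""
    return PARSERS[fmt](payload)
```

Once every source is mapped to the same structure, later stages (association, rule matching, aggregation) can be written once against `Event` rather than once per format.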
Furthermore, because data from different sources arrive with different delays, step A uses a cache: converted data that can be associated directly by user ID are associated and output to the subsequent step, while data that cannot yet be associated are placed in the cache, where they are associated with other unassociated data that subsequently enter the cache. Once data in the cache have been associated, they are removed from the cache.
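One possible reading of this cache-based association is sketched below. The patent does not say what counts as a complete association, so the sketch assumes that a user's record is complete once one message from each required source has arrived; the class name `AssociationCache` and the source names are hypothetical.

```python
from typing import Dict, Iterable, Optional


class AssociationCache:
    """Holds records that cannot yet be associated, keyed by user ID."""

    def __init__(self, required_sources: Iterable[str]):
        # Assumption: association is complete when every required source
        # has contributed one record for the user.
        self.required = frozenset(required_sources)
        self.pending: Dict[str, Dict[str, dict]] = {}

    def offer(self, user_id: str, source: str, record: dict) -> Optional[Dict[str, dict]]:
        """Add a record; return the associated bundle once it is complete."""
        parts = self.pending.setdefault(user_id, {})
        parts[source] = record
        if self.required.issubset(parts):
            # Associated data leave the cache and go to the next step.
            return self.pending.pop(user_id)
        return None  # still waiting for the delayed counterpart
```

For example, a login message for a user stays in `pending` until the matching transfer message arrives, at which point both are returned together and the cache entry disappears.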
Further, in step B, the associated data obtained in step A first enter a lock-free ring buffer queue (RingBuffer), and each piece of associated data is then taken out of the queue one by one for classification and aggregation by the cascaded rules. Combining the Protocol Buffers data structure with the lock-free ring buffer queue effectively reduces pressure on the server's GC (garbage collector) under heavy data flow and reduces the latency caused by GC.
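The sketch below shows only the ring buffer's core idea: a fixed array of preallocated slots that is reused as read and write positions advance, so no queue nodes are allocated per message. It is a single-threaded illustration and does not demonstrate lock-free coordination between threads, which in practice relies on atomic sequence counters; the class and method names are assumptions, not taken from the patent.

```python
class RingBuffer:
    """Fixed-capacity FIFO over a preallocated slot array (single-threaded sketch)."""

    def __init__(self, capacity: int):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity   # allocated once, reused forever
        self._mask = capacity - 1         # cheap modulo for power-of-two sizes
        self._head = 0                    # sequence number of the next read
        self._tail = 0                    # sequence number of the next write

    def offer(self, item) -> bool:
        """Store an item; return False when the buffer is full."""
        if self._tail - self._head == len(self._slots):
            return False
        self._slots[self._tail & self._mask] = item
        self._tail += 1
        return True

    def poll(self):
        """Remove and return the oldest item, or None when empty."""
        if self._head == self._tail:
            return None
        index = self._head & self._mask
        item = self._slots[index]
        self._slots[index] = None         # release the reference for reuse
        self._head += 1
        return item
```

Because the slot array never grows or shrinks, steady-state operation creates no per-message container objects, which is the property the text credits with lowering GC pressure.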
Further, the rule keywords in step B include a rule ID, a product number, and a dimension field.
Further, in step B, the classified and aggregated data flow into a HashMap data structure in which the key is the combination of rule ID and product number and the value is the aggregation of data with the same product number and dimension field; the data in the HashMap then flow into the temporary data storage structure for storage, under the control of a set timer.
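A minimal sketch of this keyed aggregation follows, using a Python dictionary in place of the HashMap. The `sink` callable is a hypothetical stand-in for the write to the HBase-based temporary storage, and `flush()` is what a timer would invoke; here it is called directly so that the example stays self-contained.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Key = Tuple[str, str]            # (rule ID, product number)
Entry = Tuple[str, float]        # (dimension field value, measured amount)


class RuleAggregator:
    def __init__(self, sink: Callable[[Key, List[Entry]], None]):
        self._sink = sink                      # assumed storage writer
        self._buckets = defaultdict(list)

    def add(self, rule_id: str, product_no: str, dimension_value: str, amount: float) -> None:
        """Aggregate one classified record under its (rule ID, product number) key."""
        self._buckets[(rule_id, product_no)].append((dimension_value, amount))

    def flush(self) -> int:
        """Drain all buckets to the sink; intended to be driven by a timer."""
        drained, self._buckets = self._buckets, defaultdict(list)
        for key, entries in drained.items():
            self._sink(key, entries)
        return len(drained)
```

Batching writes behind a timer trades a small, bounded delay for far fewer storage round trips than writing each record individually.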
Further, in the temporary data storage structure implemented on the HBase database cluster in step C, the time marked on each piece of stored data includes: week: the week of the year; day: the day of the week; hour: the hour within the 24-hour day; quarter-hour: the 15-minute interval within the hour. In addition, an expiration time is set for data in the temporary data storage structure, and expired data are cleaned up automatically by the structure, so that redundant data do not occupy excessive space. The temporary data storage structure is a cluster of multiple servers running HBase databases, making full use of the cluster's fast writes and low-latency reads.
Specifically, the calculations performed in step D according to the time marked on each piece of data include: sum, difference, mean, variance, first quartile (1/4 quantile), and third quartile (3/4 quantile).
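The listed calculations can be sketched with Python's standard `statistics` module. Several details are assumptions, since the text does not define them: "difference" is taken here as the last value minus the first, the variance is the population variance, and the quartiles use the inclusive interpolation method.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Compute the six statistics over values sharing the same time marks."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "sum": sum(values),
        "difference": values[-1] - values[0],      # assumed: last minus first
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),  # assumed: population variance
        "q1": q1,
        "q3": q3,
    }
```

For the values 10, 20, 30, 40, 50 this yields a sum of 150, a difference of 40, a mean of 30, a variance of 200, and quartiles of 20 and 40.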
Further, in step D, after the corresponding data are sent to the corresponding anti-fraud contact, the alarm information is also displayed visually.
Specifically, the visual display uses a distributed full-text real-time search engine as the back-end storage medium, and a front-end Web system retrieves the alarm information from that search engine for display.
The real-time judgment method for financial anti-fraud can detect abnormal data through the cascading of multiple rules and the cooperation among them, greatly improving the ability of anti-fraud systems to detect concealed data. In addition, the invention remedies the inability to perform longitudinal comparison of data in real-time financial anti-fraud, so that fraud risk can be detected more effectively through longitudinal comparison. Moreover, the conventional approach requires building a real-time cluster for real-time computation, which is costly and troublesome to maintain; the invention provides real-time financial anti-fraud detection capability at lower cost.
The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. Various substitutions and alterations according to the general knowledge and conventional practice in the art are intended to be included within the scope of the present invention without departing from the technical spirit of the present invention as described above.
Drawings
FIG. 1 is a flow chart of a real-time financial fraud detection method according to the present invention.
Detailed Description
The real-time judgment method for financial anti-fraud of the invention, as shown in FIG. 1, comprises the following steps:
A. Heterogeneous data processing: the main function is to process, clean, and convert the real-time heterogeneous data sent from external sources to a message queue cluster, where the message queue cluster consists of multiple servers running Kafka. It comprises the following steps:
Data conversion: externally generated real-time data are obtained from the Kafka message queue cluster, and data from multiple message sources in different formats such as JSON, XML, and txt are converted into data in a unified Protocol Buffers (PB for short) format.
Data association: the data in the unified format are associated according to user ID. Because data from different message sources arrive with different delays, a cache is used: converted data that can be associated directly by user ID are associated and output to the subsequent step, while data that cannot yet be associated are placed in the cache, where they are associated with other unassociated data that subsequently enter the cache. Once data in the cache have been associated, they are removed from the cache.
B. Rule cleaning: rule keywords such as rule ID, product number, and dimension field are extracted from the defined rules, and the rules are cascaded according to their cascade relation, giving a structure such as: rule A - rule B - rule C. The associated data obtained in step A enter a lock-free ring buffer queue (RingBuffer) and are then classified and aggregated in sequence by the cascaded rules; for example, the associated data are processed by rule A, then by rule B, then by rule C.
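The rule A - rule B - rule C cascade can be sketched as follows. The text does not specify how one rule hands data to the next, so this sketch assumes that a record reaches a later rule only if every earlier rule in the chain accepted it; the `Rule` fields mirror the three rule keywords, while the `matches` predicate and the sample record fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str
    product_no: str
    dimension: str                    # record field used for grouping
    matches: Callable[[dict], bool]   # hypothetical classification predicate


def run_cascade(rules: List[Rule], record: dict) -> List[Tuple[str, str, str]]:
    """Pass a record through the rules in order; stop at the first rejection.

    Returns one (rule ID, product number, dimension value) entry per rule
    that accepted the record, ready to be aggregated downstream.
    """
    hits = []
    for rule in rules:
        if record.get("product_no") != rule.product_no or not rule.matches(record):
            break  # later rules only see data accepted by earlier ones
        hits.append((rule.rule_id, rule.product_no, str(record[rule.dimension])))
    return hits
```

A record that individually looks harmless to each rule can still surface when it passes several cascaded rules in a row, which is the cooperation between rules that the method relies on.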
The classified and aggregated data flow into a HashMap data structure in which the key is the combination of rule ID and product number and the value is the aggregation of data with the same product number and dimension field; the data in the HashMap then flow into the temporary data storage structure for storage, under the control of a set timer.
Cascading the rules widens the range of judgments the rules can make, and abnormal data are detected through cooperation among the rules. Combining the Protocol Buffers data structure with the lock-free ring buffer queue effectively reduces pressure on the server's GC (garbage collector) under heavy data flow and reduces the latency caused by GC.
Then, under the control of the timer, the processed data flow into the temporary data storage structure implemented on the HBase database cluster for storage.
C. Time marking: the temporary data storage structure is a cluster of multiple servers running HBase databases, making full use of the cluster's fast writes and low-latency reads. In the temporary data storage structure, each piece of stored data is marked with its corresponding time, including:
week: the week of the year;
day: the day of the week;
hour: the hour within the 24-hour day;
quarter-hour: the 15-minute interval within the hour.
In addition, an expiration time is set for data in the temporary data storage structure, and expired data are cleaned up automatically by the structure, so that redundant data do not occupy excessive space.
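The four time marks and the expiration check can be sketched as below. The numbering conventions are assumptions: ISO week numbers, ISO weekdays (Monday = 1), and quarter-hours numbered 1 to 4. The `is_expired` helper only illustrates the expiry test; how the storage cluster actually purges data is not described in the text.

```python
from datetime import datetime, timedelta


def time_marks(ts: datetime) -> dict:
    """Derive the week, day, hour, and quarter-hour marks for a timestamp."""
    iso = ts.isocalendar()            # (ISO year, ISO week, ISO weekday)
    return {
        "week": iso[1],               # week of the year
        "day": iso[2],                # day of the week, Monday = 1
        "hour": ts.hour,              # hour within the 24-hour day
        "quarter": ts.minute // 15 + 1,  # 15-minute interval within the hour
    }


def is_expired(stored_at: datetime, now: datetime, ttl: timedelta) -> bool:
    """True once a stored record has outlived its expiration time."""
    return now - stored_at >= ttl
```

Storing these coarse marks alongside each record lets a later query select, for instance, every record from the same weekday and hour in previous weeks without scanning a long sliding window.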
D. Real-time judgment: calculations are performed on the time marked on each piece of data using the defined data calculation rules and the fields of the time dimension. The calculation logic includes sum, difference, mean, variance, first quartile (1/4 quantile), third quartile (3/4 quantile), and the like. The time ranges compared include the same time of day, the same time of week, and so on.
If the calculation result for a piece of data reaches the trigger threshold, the corresponding data are sent in real time to the corresponding anti-fraud contact by SMS or telephone, and the alarm information is stored in a back-end distributed full-text real-time search engine (Elasticsearch); a front-end Web system retrieves the alarm information from the search engine for visual display.
Data whose calculation result does not reach the trigger threshold are discarded.
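A minimal sketch of the judgment step, under stated assumptions, is given below. It performs a longitudinal comparison by taking the mean of values recorded at the same weekday and hour in earlier weeks, then compares the current value against that baseline using a ratio threshold. The ratio-based trigger, the `judge` function, and the `notify` callback (standing in for the SMS or telephone alert) are illustrative choices, not details from the patent.

```python
import statistics
from typing import Callable, Dict, Tuple

Mark = Tuple[int, int, int]  # (week of year, day of week, hour)


def judge(history: Dict[Mark, float], current_mark: Mark, current_value: float,
          ratio_threshold: float, notify: Callable[[str], None]) -> bool:
    """Return True and notify if the current value reaches the trigger threshold."""
    week, day, hour = current_mark
    # Longitudinal comparison: same weekday and hour in earlier weeks.
    # (Wrap-around at the year boundary is ignored in this sketch.)
    peers = [v for (w, d, h), v in history.items()
             if d == day and h == hour and w < week]
    if not peers:
        return False                      # nothing to compare against
    baseline = statistics.fmean(peers)
    if baseline > 0 and current_value / baseline >= ratio_threshold:
        notify(f"value {current_value} is {current_value / baseline:.1f}x "
               f"the baseline {baseline}")
        return True
    return False                          # below threshold: data is discarded
```

Because the comparison reads a handful of pre-marked records instead of maintaining a weeks-long sliding window in memory, the baseline can span a long period without the window-size problems described in the Background.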

Claims (9)

1. A real-time judgment method for financial anti-fraud, characterized by comprising the following steps:
A. Heterogeneous data processing: acquiring externally generated real-time data from a message queue cluster, converting the data of multiple message sources in different formats into data of the same data structure, and then associating the data according to user ID;
B. Rule cleaning: extracting rule keywords from the defined rules, classifying and aggregating the associated data obtained in step A in sequence according to the cascade relation of the rules, and then, under the control of a timer, writing the processed data into a temporary data storage structure implemented on an HBase database cluster;
C. Time marking: marking each piece of stored data in the temporary data storage structure with its corresponding time;
D. Real-time judgment: performing calculations on the time marked on each piece of data using the defined data calculation rules and the fields of the time dimension; if the calculation result for a piece of data reaches the trigger threshold, sending the corresponding data to the corresponding anti-fraud contact; and discarding data whose calculation result does not reach the trigger threshold.
2. The real-time judgment method for financial anti-fraud of claim 1, characterized in that: in step A, converted data of the same data structure that can be associated directly by user ID are associated and output to the subsequent step, while data that cannot yet be associated are placed in a cache and associated with other unassociated data that subsequently enter the cache; once data in the cache have been associated, they are removed from the cache.
3. The real-time judgment method for financial anti-fraud of claim 1, characterized in that: in step B, the associated data obtained in step A first enter a lock-free ring buffer queue, and each piece of associated data is then taken out of the queue one by one for classification and aggregation by the cascaded rules.
4. The real-time judgment method for financial anti-fraud of claim 1, characterized in that: the rule keywords in step B include a rule ID, a product number, and a dimension field.
5. The real-time judgment method for financial anti-fraud of claim 1, characterized in that: in step B, the classified and aggregated data flow into a HashMap data structure in which the key is the combination of rule ID and product number and the value is the aggregation of data with the same product number and dimension field, and the data in the HashMap then flow into the temporary data storage structure for storage under the control of a set timer.
6. The real-time judgment method for financial anti-fraud of claim 1, characterized in that: in the temporary data storage structure implemented on the HBase database cluster in step C, the time marked on each piece of stored data includes: week: the week of the year; day: the day of the week; hour: the hour within the 24-hour day; quarter-hour: the 15-minute interval within the hour; and an expiration time is set for data in the temporary data storage structure, with expired data cleaned up automatically by the structure.
7. The real-time judgment method for financial anti-fraud of claim 1, characterized in that: the calculations performed in step D according to the time marked on each piece of data include: sum, difference, mean, variance, first quartile (1/4 quantile), and third quartile (3/4 quantile).
8. The real-time judgment method for financial anti-fraud of claim 1, characterized in that: in step D, after the corresponding data are sent to the corresponding anti-fraud contact, the alarm information is also displayed visually.
9. The real-time judgment method for financial anti-fraud of claim 8, characterized in that: the visual display uses a distributed full-text real-time search engine as the back-end storage medium, and a front-end Web system retrieves the alarm information from that search engine for display.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010312133.8A CN111209299A (en) 2020-04-20 2020-04-20 Real-time judgment method for anti-fraud of finance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010312133.8A CN111209299A (en) 2020-04-20 2020-04-20 Real-time judgment method for anti-fraud of finance

Publications (1)

Publication Number Publication Date
CN111209299A (en) 2020-05-29

Family

ID=70784746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010312133.8A Pending CN111209299A (en) 2020-04-20 2020-04-20 Real-time judgment method for anti-fraud of finance

Country Status (1)

Country Link
CN (1) CN111209299A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113094741A (en) * 2021-03-15 2021-07-09 北京懿医云科技有限公司 Role information processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016183507A1 (en) * 2015-05-14 2016-11-17 Alibaba Group Holding Limited Stream computing system and method
CN106372185A (en) * 2016-08-31 2017-02-01 广东京奥信息科技有限公司 Data preprocessing method for heterogeneous data sources
CN107145587A (en) * 2017-05-11 2017-09-08 成都四方伟业软件股份有限公司 A kind of anti-fake system of medical insurance excavated based on big data
CN107194281A (en) * 2017-05-25 2017-09-22 成都知道创宇信息技术有限公司 A kind of anti-fake system based on block chain technology
CN110209507A (en) * 2019-05-16 2019-09-06 厦门市美亚柏科信息股份有限公司 Data processing method, device, system and storage medium based on message queue
CN110727922A (en) * 2019-10-11 2020-01-24 集奥聚合(北京)人工智能科技有限公司 Anti-fraud decision model construction method based on multi-dimensional data flow
CN110795574A (en) * 2019-11-07 2020-02-14 北京集奥聚合科技有限公司 Knowledge graph construction method based on finance anti-fraud
CN110956547A (en) * 2019-11-28 2020-04-03 广州及包子信息技术咨询服务有限公司 Search engine-based method and system for identifying cheating group in real time

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200529