CN111209299A

CN111209299A - Real-time judgment method for anti-fraud of finance

Info

Publication number: CN111209299A
Application number: CN202010312133.8A
Authority: CN
Inventors: 谭巍; 陈思成; 李烨; 陈卫; 张奎
Original assignee: Sichuan XW Bank Co Ltd
Current assignee: Sichuan XW Bank Co Ltd
Priority date: 2020-04-20
Filing date: 2020-04-20
Publication date: 2020-05-29

Abstract

The invention relates to a real-time judgment method for financial anti-fraud, comprising the following steps: A. heterogeneous data processing: acquiring externally generated real-time data from a message queue cluster, converting it into a common data structure, and associating the data by user ID; B. rule-based cleaning: classifying and aggregating the associated data in sequence according to the cascade relationship of the rules, and passing the result into a temporary data storage structure for storage; C. time marking: marking a time for each record in the temporary data storage structure; D. real-time judgment: performing calculations according to the defined rules and the time marked on each record, notifying the corresponding anti-fraud contact of any data whose calculation result reaches the trigger threshold, and discarding data whose calculation result does not reach it. By cascading multiple rules and letting them cooperate, the invention can detect abnormal data, greatly improving the ability of anti-fraud systems to detect concealed data.

Description

Real-time judgment method for anti-fraud of finance

Technical Field

The invention relates to a data processing method in the financial field, and in particular to a real-time judgment method for financial anti-fraud.

Background

With the rapid adoption of big data, the Internet of Things and AI (artificial intelligence) in the financial industry, finance is becoming increasingly digital, mobile and real-time, and the financial industry chain keeps lengthening. Financial institution accounts, channels, data and other elements are ever more closely interlinked, which makes business continuity increasingly difficult to manage. At the same time, large volumes of customer behavior data and transaction data are generated, and illegal activities such as transaction fraud and card theft must be identified promptly within this data to protect users' property. Financial anti-fraud identification is increasingly characterized by risks that are harder to identify and by more concealed fraud methods.

To strengthen security control in financial transactions, a real-time analysis method is needed that supports multi-dimensional detection, multi-level rule detection and low data latency. Traditional real-time analysis generally relies on technologies such as Storm, Spark and Flink. These technologies are based on sliding time windows: they can take into account the influence of data within a window and perform one-dimensional anomaly detection, but when the window is very long and the sliding interval is short, they can suffer from memory overflow and poor fine-grained performance. In addition, for rule configuration, the data stream is polled against the rules, an anomaly alarm is raised once a rule is triggered, and the next rule is then evaluated statelessly.

These technologies have clear shortcomings. They cannot perform longitudinal comparison of data over a large window (for example, comparing data with the same period in history), and they cannot detect anomalies in a multi-dimensional setting. They also do not support cooperation between multi-level rules, for example detecting abnormal data when several rules form a cascade. Furthermore, traditional real-time computing requires building an efficient and stable real-time computing cluster, which burdens a company with both the initial construction cost and the later maintenance cost.

Disclosure of Invention

The invention provides a real-time judgment method for financial anti-fraud that detects abnormal data by cascading multiple rules and letting them cooperate, improving the ability of anti-fraud systems to detect concealed data.

The invention discloses a real-time judgment method for financial anti-fraud, which comprises the following steps:

A. Heterogeneous data processing: acquiring externally generated real-time data from the message queue cluster, converting data in different formats from multiple message sources into a single common data structure, and then associating the data by user ID;

B. Rule-based cleaning: extracting rule keywords from the defined rules, classifying and aggregating the associated data obtained in step A in sequence according to the cascade relationship of the rules, and then, under the control of a timer, passing the processed data into a temporary data storage structure built on an HBase database cluster for storage;

C. Time marking: marking each record stored in the temporary data storage structure with its corresponding time;

D. Real-time judgment: performing calculations on each record according to its marked time, using the defined data calculation rules and the fields of the calculation time dimension; if the calculation result for a record reaches the trigger threshold, sending the corresponding data to the corresponding anti-fraud contact; and discarding data whose calculation result does not reach the trigger threshold.

In the heterogeneous data processing step, externally generated real-time data is acquired from a message queue cluster such as a Kafka cluster. Data from multiple external message sources in different formats, such as JSON, XML and txt, is then converted into a unified Protocol Buffers (PB for short) format before further processing. Protocol Buffers is a compact binary serialization format that is fast to transmit and occupies little space. The data is then classified and aggregated through the cascaded rules, which widens the scope of rule-based judgment and allows abnormal data to be detected through cooperation among the rules. Because the temporary data storage structure is built on an HBase database cluster, it can take full advantage of HBase's fast writes and low-latency reads.

Further, because data from different sources arrives with different delays, step A uses a cache. After conversion into the common data structure, any record that can be associated with data for the same user ID is associated directly and passed on to the subsequent step. A record that cannot yet be associated is placed in the cache, where it is matched against other unassociated records that arrive later; once a cached record has been associated, it is removed from the cache.

Further, in step B, the associated data obtained in step A is first placed in a lock-free ring buffer queue (RingBuffer), and the records are then taken out of the queue one at a time to be classified and aggregated through the cascaded rules. Combining the Protocol Buffers data format with a lock-free ring buffer queue effectively reduces garbage collection (GC) pressure on the server under heavy data flow and reduces the latency caused by GC.
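The ring buffer hand-off between step A and the cascaded rules can be pictured with the minimal single-producer/single-consumer sketch below. It is an illustration only: the class name `SpscRingBuffer` is hypothetical, and because this is pure Python it shows the fixed-slot, head/tail index logic of a ring buffer rather than a production lock-free queue. In a JVM implementation, reusing preallocated slots is what limits per-message allocation and therefore GC pressure.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class SpscRingBuffer(Generic[T]):
    """Fixed-capacity ring buffer for one producer and one consumer (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        # Slots are allocated once and reused; in a JVM implementation this
        # reuse is what keeps allocation and GC pressure low.
        self._slots: List[Optional[T]] = [None] * capacity
        self._capacity = capacity
        self._head = 0  # next position to read; only the consumer advances it
        self._tail = 0  # next position to write; only the producer advances it

    def offer(self, item: T) -> bool:
        """Producer side: store the item, or return False if the buffer is full."""
        if self._tail - self._head == self._capacity:
            return False
        self._slots[self._tail % self._capacity] = item
        self._tail += 1  # publish only after the slot has been written
        return True

    def poll(self) -> Optional[T]:
        """Consumer side: take the oldest item, or return None if the buffer is empty."""
        if self._head == self._tail:
            return None
        index = self._head % self._capacity
        item = self._slots[index]
        self._slots[index] = None  # drop the reference to the consumed record
        self._head += 1
        return item
```

A producer calls `offer` and retries or backs off when it returns False; the consumer loop calls `poll` and hands each record to the rule cascade.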

Further, the keywords in the rule in step B include a rule ID, a product number, and a dimension field.
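Extracting the three rule keywords can be illustrated with a rule definition held as a plain mapping. The definition layout and the key names `rule_id`, `product_no` and `dimensions` are assumptions for illustration; the source names only the rule ID, the product number and the dimension field, and `RuleKeywords` and `extract_keywords` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class RuleKeywords:
    rule_id: str
    product_no: str
    dimension_fields: Tuple[str, ...]


def extract_keywords(rule_definition: Dict[str, Any]) -> RuleKeywords:
    """Pull the rule ID, product number and dimension fields out of a rule (hypothetical layout)."""
    return RuleKeywords(
        rule_id=str(rule_definition["rule_id"]),
        product_no=str(rule_definition["product_no"]),
        # Dimension fields name the record attributes the rule groups data by.
        dimension_fields=tuple(rule_definition.get("dimensions", ())),
    )
```

Other entries in a rule definition, such as a threshold, are ignored here; only the keywords used for classification and aggregation are kept.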

Further, in step B, the classified and aggregated data flows into a HashMap data structure whose key is the combination of the rule ID and the product number and whose value is the aggregation of data sharing the same product number and dimension field; a set timer then moves the data in the HashMap into the temporary data storage structure for storage.

Further, in the temporary data storage structure built on the HBase database cluster in step C, each stored record is marked with the following times: the week mark, the week of the year; the day mark, the day of the week; the hour mark, the hour of the day on a 24-hour clock; and the quarter mark, the 15-minute quarter of the hour in which the record falls. In addition, data in the temporary data storage structure is given an expiration time, and expired data is automatically cleaned up by the storage structure, so that redundant data does not occupy excessive space. The temporary data storage structure is a cluster of multiple servers running HBase, which makes full use of HBase's fast writes and low-latency reads.

Specifically, the calculations performed in step D according to the time marked on each record include summation, difference, mean, variance, the first quartile (1/4 quantile) and the third quartile (3/4 quantile).

Further, in step D, after the corresponding data has been sent to the corresponding anti-fraud contact, the alarm information is also displayed visually.

Specifically, the visual display uses a distributed full-text real-time search engine as the back-end storage medium, and a front-end Web system retrieves the alarm information from the search engine for display.

By cascading multiple rules and letting them cooperate, the real-time judgment method for financial anti-fraud can detect abnormal data, greatly improving the ability of anti-fraud systems to detect concealed data. The invention also brings longitudinal comparison of data, previously unavailable, to real-time financial anti-fraud, and this comparison allows fraud risk to be detected more effectively. Finally, traditional approaches require building a dedicated real-time computing cluster, which is costly and troublesome to maintain; the invention instead provides real-time financial anti-fraud detection at lower cost.

The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. Various substitutions and alterations according to the general knowledge and conventional practice in the art are intended to be included within the scope of the present invention without departing from the technical spirit of the present invention as described above.

Drawings

FIG. 1 is a flow chart of a real-time financial fraud detection method according to the present invention.

Detailed Description

The real-time judgment method for financial anti-fraud of the invention as shown in fig. 1 comprises the following steps:

A. Heterogeneous data processing: this step processes, cleans and converts the real-time heterogeneous data sent from outside to a message queue cluster, which consists of multiple servers running Kafka. It comprises the following steps:

Data conversion: externally generated real-time data is acquired from the Kafka message queue cluster, and data from multiple message sources in different formats, such as JSON, XML and txt, is converted into a unified Protocol Buffers (PB for short) format.
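Converting heterogeneous messages into one structure can be sketched as follows. A Python dataclass stands in for the Protocol Buffers message, because generating real protobuf classes requires a `.proto` schema compiled with `protoc`, which the source does not give. The field names `user_id`, `source` and `payload`, the `key=value;` layout assumed for txt messages, and all function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UnifiedRecord:
    """Stand-in for the unified Protobuf message (hypothetical fields)."""
    user_id: str
    source: str
    payload: Dict[str, str] = field(default_factory=dict)


def from_json(source: str, raw: str) -> UnifiedRecord:
    data = json.loads(raw)
    user_id = str(data.pop("user_id"))
    return UnifiedRecord(user_id, source, {k: str(v) for k, v in data.items()})


def from_xml(source: str, raw: str) -> UnifiedRecord:
    root = ET.fromstring(raw)
    # Each child element becomes one payload field, e.g. <ip>10.0.0.1</ip>.
    fields = {child.tag: (child.text or "") for child in root}
    user_id = fields.pop("user_id")
    return UnifiedRecord(user_id, source, fields)


def from_txt(source: str, raw: str) -> UnifiedRecord:
    # Assumed layout: "user_id=u1;device_id=d9"
    fields = dict(part.split("=", 1) for part in raw.strip().split(";") if part)
    user_id = fields.pop("user_id")
    return UnifiedRecord(user_id, source, fields)


PARSERS = {"json": from_json, "xml": from_xml, "txt": from_txt}


def convert(source: str, fmt: str, raw: str) -> UnifiedRecord:
    """Dispatch a raw message to the parser for its declared format."""
    return PARSERS[fmt](source, raw)
```

Adding a new message source then means registering one more parser, while everything downstream sees only the unified record.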

Data association: the data in the unified format is associated by user ID. Because data from different message sources is acquired with different delays, step A uses a cache. Each converted record that can be associated with data for the same user ID is associated directly and passed to the subsequent step; a record that cannot yet be associated is placed in the cache and matched against unassociated records that arrive later; once a cached record has been associated, it is removed from the cache.
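A minimal sketch of the cache-based association follows. It assumes that a user's data counts as associated once records from every required message source have arrived for the same user ID. The set of required sources, the merged output shape and the `UserAssociator` name are hypothetical, because the source does not specify the join condition.

```python
from typing import Dict, FrozenSet, Iterable, Optional


class UserAssociator:
    """Joins records from several sources by user ID, caching unmatched ones."""

    def __init__(self, required_sources: Iterable[str]) -> None:
        self.required: FrozenSet[str] = frozenset(required_sources)
        # user_id -> {source -> record}: records still waiting for a partner.
        self.cache: Dict[str, Dict[str, dict]] = {}

    def accept(self, source: str, record: dict) -> Optional[dict]:
        """Return the associated data when complete; otherwise cache the record and return None."""
        user_id = record["user_id"]
        pending = self.cache.setdefault(user_id, {})
        pending[source] = record
        if not self.required.issubset(pending.keys()):
            return None  # still waiting for a delayed source
        # Association succeeded: remove the matched records from the cache.
        del self.cache[user_id]
        return {"user_id": user_id, "parts": dict(sorted(pending.items()))}
```

In practice the cache would also need an eviction policy for records that never find a partner; the source does not say how such records are handled.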

B. Rule-based cleaning: rule keywords such as the rule ID, product number and dimension field are extracted from the defined rules, and the rules are chained according to their cascade relationship, for example rule A, then rule B, then rule C. The associated data obtained in step A enters a lock-free ring buffer queue (RingBuffer) and is then classified and aggregated by the cascaded rules in sequence; for example, the associated data is first processed by rule A, then by rule B and finally by rule C.
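The cascade "rule A, then rule B, then rule C" can be sketched as a chain in which each rule examines only the records that satisfied every earlier rule. This is one plausible reading, not the patented logic: rules are modelled as simple predicates, and the `Rule` type and `run_cascade` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    rule_id: str
    matches: Callable[[dict], bool]


def run_cascade(chain: List[Rule], records: List[dict]) -> Dict[str, List[dict]]:
    """Pass records through the chain in order, keeping each stage's survivors.

    Returns rule_id -> records that satisfied that rule and every rule before it.
    """
    stages: Dict[str, List[dict]] = {}
    survivors = records
    for rule in chain:
        survivors = [r for r in survivors if rule.matches(r)]
        stages[rule.rule_id] = survivors
        if not survivors:
            break  # later rules in the cascade have nothing to examine
    return stages
```

Because each stage keeps its own result set, a downstream rule can act on a pattern that only becomes visible once earlier rules have narrowed the data, which is the kind of cooperation between rules the description refers to.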

The classified and aggregated data flows into a HashMap data structure whose key is the combination of the rule ID and the product number and whose value is the aggregation of data sharing the same product number and dimension fields; a set timer then moves the data in the HashMap into the temporary data storage structure for storage.
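The intermediate HashMap can be sketched with a Python dict. The key is the (rule ID, product number) pair described above, and the value groups records by their dimension-field values. Using a count and an amount total as the aggregate, the `amount` field, the `RuleAggregator` name and the `flush` hook that a timer would call are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MapKey = Tuple[str, str]   # (rule_id, product_no)
DimKey = Tuple[str, ...]   # values of the rule's dimension fields


class RuleAggregator:
    def __init__(self) -> None:
        # (rule_id, product_no) -> {dimension values -> [count, amount_total]}
        self._map: Dict[MapKey, Dict[DimKey, List[float]]] = defaultdict(dict)

    def add(self, rule_id: str, product_no: str,
            dimension_fields: Iterable[str], record: dict) -> None:
        dims = tuple(str(record[f]) for f in dimension_fields)
        bucket = self._map[(rule_id, product_no)].setdefault(dims, [0, 0.0])
        bucket[0] += 1
        bucket[1] += float(record.get("amount", 0))

    def flush(self) -> Dict[MapKey, Dict[DimKey, List[float]]]:
        """Called by the timer: return the current contents for storage and start afresh."""
        snapshot, self._map = dict(self._map), defaultdict(dict)
        return snapshot
```

A scheduler such as `threading.Timer` could call `flush()` periodically and write the snapshot to the HBase-backed staging store. If `add` and `flush` run on different threads, they would need to be synchronized.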

Cascading the rules widens the scope of rule-based judgment, and cooperation among the rules allows abnormal data to be detected. Combining the Protocol Buffers data format with a lock-free ring buffer queue effectively reduces GC (garbage collection) pressure on the server under heavy data flow and reduces GC-induced latency.

Then, under the control of the timer, the processed data flows into the temporary data storage structure built on the HBase database cluster for storage.

C. Time marking: the temporary data storage structure is a cluster of multiple servers running HBase, which makes full use of HBase's fast writes and low-latency reads. In the temporary data storage structure, each stored record is marked with its corresponding times:

week mark: the week of the year;

day mark: the day of the week;

hour mark: the hour of the day, on a 24-hour clock;

quarter mark: the quarter of the hour in which the record falls, with each hour divided into four 15-minute quarters.
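The four marks can be computed from a record's timestamp as shown below. This is a sketch under assumed conventions: the source does not state a week-numbering scheme, so ISO 8601 weeks from `datetime.isocalendar()` are assumed, as are Monday = 1 for the day mark and quarters numbered 1 to 4. The function name `time_marks` is hypothetical.

```python
from datetime import datetime
from typing import Dict


def time_marks(ts: datetime) -> Dict[str, int]:
    """Compute the week, day, hour and quarter marks for one record (assumed conventions).

    week:    ISO 8601 week of the year (1-53)
    day:     ISO day of the week (1 = Monday ... 7 = Sunday)
    hour:    hour of the day on a 24-hour clock (0-23)
    quarter: 15-minute quarter within the hour (1-4)
    """
    _iso_year, iso_week, iso_weekday = ts.isocalendar()
    return {
        "week": iso_week,
        "day": iso_weekday,
        "hour": ts.hour,
        "quarter": ts.minute // 15 + 1,
    }
```

For example, 2020-04-20 14:37 is a Monday in ISO week 17, so it receives week 17, day 1, hour 14 and quarter 3. Records sharing the same marks can then be compared across days or weeks.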

In addition, data in the temporary data storage structure is given an expiration time, and expired data is automatically cleaned up by the storage structure, so that redundant data does not occupy excessive space.
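In HBase, this kind of automatic expiry is normally configured through a column family's TTL attribute. Expired cells stop being returned by reads and are physically removed during compaction. The in-memory stand-in below mimics that behaviour so that the idea can be run without a cluster. The `ExpiringStore` class, the row-key format and the injectable clock are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    """In-memory stand-in for the staging table: every row carries an expiry time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._rows: Dict[str, Tuple[float, object]] = {}  # row_key -> (expires_at, value)

    def put(self, row_key: str, value: object) -> None:
        self._rows[row_key] = (self.clock() + self.ttl, value)

    def get(self, row_key: str) -> Optional[object]:
        entry = self._rows.get(row_key)
        if entry is None or entry[0] <= self.clock():
            return None  # expired rows are invisible to readers, as with an HBase TTL
        return entry[1]

    def purge_expired(self) -> int:
        """Physically remove expired rows (HBase does this during compaction)."""
        now = self.clock()
        expired = [k for k, (exp, _) in self._rows.items() if exp <= now]
        for k in expired:
            del self._rows[k]
        return len(expired)
```

Choosing the TTL is a trade-off: it must cover the longest look-back period the calculation rules compare against, such as the same weekday several weeks earlier, but no more than that.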

D. Real-time judgment: calculations are performed on each record according to its marked times, using the defined data calculation rules and the fields of the calculation time dimension. The calculation logic includes summation, difference, mean, variance, the first quartile (1/4 quantile), the third quartile (3/4 quantile) and the like. The time ranges compared include the same time of day, the same time of week, and so on.
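The calculation step can be illustrated by comparing a current value with historical values from the same time slot, for example the same hour on previous days. The sketch uses Python's `statistics` module. Using population variance, the "inclusive" quartile method and a difference measured against the historical mean are assumptions, because the source does not specify them. The function name `slot_statistics` is hypothetical.

```python
import statistics
from typing import Dict, List


def slot_statistics(history: List[float], current: float) -> Dict[str, float]:
    """Summaries of same-slot history plus the difference of the current value.

    `history` holds values for the same time slot (for example the same hour of
    day on previous days); quartiles need at least two data points.
    """
    q1, _median, q3 = statistics.quantiles(history, n=4, method="inclusive")
    mean = statistics.fmean(history)
    return {
        "sum": sum(history),
        "mean": mean,
        "variance": statistics.pvariance(history),
        "q1": q1,
        "q3": q3,
        "diff_from_mean": current - mean,
    }
```

With history `[10, 20, 30, 40, 50]` and a current value of 65, the mean is 30, the quartiles are 20 and 40, and the current value lies 35 above the mean. A rule could then compare any of these figures with its trigger threshold.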

If the calculation result for a record reaches the trigger threshold, the corresponding data is sent in real time to the corresponding anti-fraud contact by SMS or telephone, the alarm information is stored in a back-end distributed full-text real-time search engine (Elasticsearch), and a front-end Web system retrieves the alarm information from the search engine for visual display.

Data whose calculation result does not reach the trigger threshold is discarded.
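The judgment itself reduces to comparing each computed value with its rule's trigger threshold. In this sketch, a result at or above the threshold produces an alert document of the kind that could be sent to the contact and indexed in Elasticsearch, and every other result is dropped. The `Alert` fields, the `judge` function and the `>=` reading of "reaches" are assumptions. The notification channel (SMS or telephone) and the indexing call are not shown.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    rule_id: str
    user_id: str
    value: float
    threshold: float
    contact: str


def judge(results: Iterable[Tuple[str, str, float]],
          thresholds: Dict[str, float],
          contacts: Dict[str, str]) -> List[dict]:
    """Keep results that reach their rule's threshold and discard the rest.

    `results` yields (rule_id, user_id, computed_value); the returned dicts
    are ready to be serialized as alert documents.
    """
    alerts: List[dict] = []
    for rule_id, user_id, value in results:
        threshold = thresholds[rule_id]
        if value >= threshold:  # the result "reaches" the trigger threshold
            alerts.append(asdict(Alert(rule_id, user_id, value, threshold, contacts[rule_id])))
        # results below the threshold are simply not kept
    return alerts
```

Keeping the alert as a flat document makes it straightforward to store in a search engine and to filter on the front end by rule, user or contact.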

Claims

1. The real-time judgment method for the financial anti-fraud is characterized by comprising the following steps:

2. The real-time financial anti-fraud determination method of claim 1, characterized by: in step A, converted data having the same data structure that can be associated by user ID is associated directly and then output to the subsequent step; data that cannot be associated is output to the cache and associated with other unassociated data subsequently input to the cache; and once associated, the data in the cache is removed from the cache.

3. The real-time financial anti-fraud determination method of claim 1, characterized by: in step B, the associated data obtained in step A is first placed into a lock-free ring buffer queue, and each associated record is then taken out of the queue one at a time to be classified and aggregated through the cascaded rules.

4. The real-time financial anti-fraud determination method of claim 1, characterized by: the keywords in the rule in step B include a rule ID, a product number and a dimension field.

5. The real-time financial anti-fraud determination method of claim 1, characterized by: in step B, the classified and aggregated data flows into a HashMap data structure, in which the key is the combination of the rule ID and the product number and the value is the aggregation of data having the same product number and the same dimension field; the data in the HashMap then flows into the temporary data storage structure through a set timer for storage.

6. The real-time financial anti-fraud determination method of claim 1, characterized by: in the temporary data storage structure built on the HBase database cluster in step C, the times marked for each stored record include: the week mark, the week of the year; the day mark, the day of the week; the hour mark, the hour of the day on a 24-hour clock; and the quarter mark, the 15-minute quarter of the hour in which the record falls; and an expiration time is set for data in the temporary data storage structure, which automatically cleans up expired data.

7. The real-time financial anti-fraud determination method of claim 1, characterized by: the calculations performed in step D according to the time marked on each record include summation, difference, mean, variance, the first quartile (1/4 quantile) and the third quartile (3/4 quantile).

8. The real-time financial anti-fraud determination method of claim 1, characterized by: in step D, after the corresponding data is sent to the corresponding anti-fraud contact, the alarm information is also displayed visually.

9. The real-time financial anti-fraud determination method of claim 8, characterized by: the visual display uses a distributed full-text real-time search engine as the back-end storage medium, and a front-end Web system retrieves the alarm information from the search engine for display.