CN115423542B

CN115423542B - Anti-fraud identification method and system for old-brings-new (customer referral) activities

Info

Publication number: CN115423542B
Application number: CN202211381968.4A
Authority: CN
Inventors: 韩柳; 李远鑫; 郑宇晟; 黄文辉; 钟佳; 邹健娣
Original assignee: China Post Consumer Finance Co ltd
Current assignee: China Post Consumer Finance Co ltd
Priority date: 2022-11-07
Filing date: 2022-11-07
Publication date: 2023-03-24
Anticipated expiration: 2042-11-07
Also published as: CN115423542A

Abstract

The invention relates to an anti-fraud identification method and system for old-brings-new (customer referral) activities, comprising the following steps. S1: define black, white and gray samples and determine the number of samples; when the ratio of black samples to white samples is a first preset value, proceed according to the white sample rule, where the white sample rule is: (1) a confirmed loan with no overdue payments and white-list membership; (2) the repayment rate is a second preset value; (3) the review has been passed. S2: clean the behavior data of the fission customer-acquisition activity, establish a four-dimensional tensor for each user, and perform a recoding operation. S3: based on a dynamic time warping model, calculate the time-series similarity of the tensors generated by users. S4: establish graph data with user IDs as nodes, and establish edges between users and between nodes. S5: modify the sampling strategy of the GraphSage model, train the model, carry out online iteration when the recall rate is a third preset value, push the cases to the case investigation post, and provide a call script design for the pre-loan case investigation post.

Description

Anti-fraud identification method and system for old-brings-new (customer referral) activities

Technical Field

The invention relates to the field of computer technology, and in particular to an anti-fraud identification method and system for old-brings-new activities.

Background

FinTech continues to expand across the financial ecosystem, and financial institutions are rapidly migrating a large number of services to the internet. Providing convenient online services, however, also exposes them to serious new challenges from transaction fraud and security threats. MGM (member-get-member) fission customer acquisition is an important part of internet finance customer operations. Because the participation threshold is low, it attracts not only ordinary bonus hunters ("wool pullers") but also black-market operators, who cause serious damage: operating costs run out of control, data become distorted, and later operating strategies are affected.

The traditional anti-fraud technology has the following limitations:

(1) The fraud occurs before any loan is made, so preventive measures cannot apply rules as strict as those used for in-loan anti-fraud; otherwise the experience of ordinary customers and the customer-acquisition effect would suffer;

(2) The profit from a single fraudulent act is small, while the volume of customer behavior data to be processed is huge, and traditional feature engineering struggles to extract relational pattern features between people and behaviors (technical level);

(3) The business purpose of fission is to acquire new customers, so it is not permissible to add too much identity verification or to set too many activity thresholds (activity level).

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides an anti-fraud identification method and system for old-brings-new activities. It addresses the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network (GraphSage) can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and their combinations can be mined and expressed as graph structure features, node features and edge features.

To achieve this purpose, the invention provides an anti-fraud identification method for old-brings-new activities, comprising the following steps:

S1: define black, white and gray samples and determine the number of samples; when the ratio of black samples to white samples is a first preset value, proceed according to the white sample rule;

the white sample rule is:

(1) A confirmed loan with no overdue payments, or white-list membership;

(2) The repayment rate is a second preset value;

(3) The review has been passed;

S2: clean the behavior data of the fission customer-acquisition activity, establish a four-dimensional tensor for each user, and perform a recoding operation;

S3: based on a dynamic time warping model, calculate the time-series similarity of the unequal-length behavior tensors generated by users;

S4: establish graph data with user IDs as nodes, and establish edges between users and between nodes respectively;

S5: modify the sampling strategy of the GraphSage model, train the model, carry out online iteration when the recall rate is a third preset value, push the cases to the case investigation post, and provide a call script design for the pre-loan case investigation post.

Preferably, the specific step of defining and determining the black, white and gray samples in step S1 includes:

when the number of gray samples needs to be increased or decreased, the proportion of gray samples is controlled through the hard-label proportion returned by the grayscale iteration of the online model.

Preferably, the specific step of establishing a four-dimensional tensor for each user and performing a recoding operation in step S2 includes:

establishing a four-dimensional tensor for each user, in which the dimensions respectively represent dwell time, the event_title vector, the degree vector and the timestamp, and performing a recoding operation.

Preferably, the specific steps of step S3 include:

based on a dynamic time warping model, the time-series similarity of the unequal-length behavior tensors generated by users is calculated, and a similarity value is computed for the behavior of each user against every other user. The calculation is as follows: sequences Q and C are matched starting from (0, 0); at each point reached, the distances calculated at all previous points are accumulated; once the end point (n, m) is reached, the total accumulated distance is the similarity of sequences Q and C.

Preferably, the specific steps of establishing edges between users and between nodes in step S4 are as follows:

edges between users are established according to whether an invitation relationship exists between them, the similarity is used as a further edge connecting the nodes, and the two kinds of edge are weighted and then normalized.
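The edge construction described here can be illustrated with a minimal sketch. The function names, the equal default weights `w_invite` and `w_sim`, and the choice of normalizing by the largest combined weight are assumptions made for illustration; the source only states that the two kinds of edge are weighted and then normalized. The sketch also assumes that similarity values are already scaled to [0, 1] with larger values meaning more similar, so a raw dynamic time warping distance would first need to be converted.

```python
def edge_key(u, v):
    """Canonical key for an undirected edge between two user IDs."""
    return (u, v) if u <= v else (v, u)


def build_weighted_edges(invites, similarity, w_invite=0.5, w_sim=0.5):
    """Combine invitation edges and similarity edges into normalized weights.

    invites:    iterable of (inviter_id, invitee_id) pairs.
    similarity: dict mapping (user_a, user_b) to a similarity in [0, 1].
    The weights w_invite and w_sim are hypothetical; the source gives no values.
    """
    invited = {edge_key(u, v) for u, v in invites}
    sims = {edge_key(u, v): s for (u, v), s in similarity.items()}

    raw = {}
    for key in invited | set(sims):
        invite_part = 1.0 if key in invited else 0.0
        raw[key] = w_invite * invite_part + w_sim * sims.get(key, 0.0)

    if not raw:
        return {}
    top = max(raw.values())
    if top <= 0:
        return {key: 0.0 for key in raw}
    # Assumed normalization: divide by the largest combined weight.
    return {key: value / top for key, value in raw.items()}
```

A pair that is linked by both an invitation and a high similarity ends up with the largest weight, which is what a weight-driven neighbor sampling step would then favor.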

Preferably, the specific step of modifying the sampling strategy of the GraphSage model in step S5 includes:

the sampling method of each layer of the GraphSage model is modified so that the top k neighbors with the largest weighted average of the edges are taken as the sample.
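A minimal sketch of this top-k rule follows, written in plain Python rather than against any graph library. It assumes each edge already carries a single combined weight (the weighted average of the invitation edge and the similarity edge), and the names `sample_top_k` and `sample_layers` are hypothetical. Standard GraphSage draws neighbors uniformly at random; here the draw is replaced by a deterministic choice of the k heaviest edges.

```python
def sample_top_k(adj, node, k):
    """Return the k neighbors of `node` with the largest edge weight.

    adj: dict mapping node -> {neighbor: combined_edge_weight}.
    Ties are broken by neighbor ID so the result is deterministic.
    """
    neighbours = adj.get(node, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [nb for nb, _ in ranked[:k]]


def sample_layers(adj, seeds, ks):
    """Expand the seed nodes layer by layer, taking ks[l] neighbors per node."""
    layers = [list(seeds)]
    frontier = list(seeds)
    for k in ks:
        next_frontier, seen = [], set()
        for node in frontier:
            for nb in sample_top_k(adj, node, k):
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        layers.append(next_frontier)
        frontier = next_frontier
    return layers
```

Because the selection is driven by edge weight, strongly connected users are aggregated without adding an attention layer to the model.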

Preferably, the call script design provided in step S5 specifically includes:

asking about the customer's loan needs and asking the customer to rate the activity.

Preferably, the invention further provides an anti-fraud identification system for old-brings-new activities, comprising:

a configuration and determination module, used for defining black, white and gray samples and determining the number of samples, and for proceeding according to the white sample rule when the ratio of black samples to white samples is a first preset value;

the white sample rule is:

(1) A confirmed loan with no overdue payments and white-list membership;

(2) The repayment rate is a second preset value;

(3) The review has been passed;

a data module, used for cleaning the behavior data of the fission customer-acquisition activity, establishing a four-dimensional tensor for each user, and performing a recoding operation;

a calculation module, used for calculating, based on a dynamic time warping model, the time-series similarity of the unequal-length behavior tensors generated by users;

an editing and control module, used for establishing graph data with user IDs as nodes and establishing edges between users and between nodes respectively; and for modifying the sampling strategy of the GraphSage model, training the model, carrying out online iteration when the recall rate is a third preset value, pushing the cases to the case investigation post, and providing a call script design for the pre-loan case investigation post.

Preferably, the configuration and determination module specifically operates as follows:

when the ratio of black samples to total samples is a fourth preset value, the number of black samples is supplemented using the conversion rate within a sliding time window; when the number of gray samples needs to be increased or decreased, the proportion of gray samples is controlled through the hard-label proportion returned by the grayscale iteration of the online model.

Preferably, the calculation module specifically operates as follows:

based on a dynamic time warping model, the time-series similarity of the unequal-length behavior tensors generated by users is calculated, and a similarity value is computed for the behavior of each user against every other user. The calculation is as follows: sequences Q and C are matched starting from (0, 0); at each point reached, the distances calculated at all previous points are accumulated; once the end point (n, m) is reached, the total accumulated distance is the similarity of sequences Q and C.

The beneficial effects of the invention are as follows. The anti-fraud identification method and system for old-brings-new activities address the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network (GraphSage) can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and their combinations can be mined and expressed as graph structure features, node features and edge features. The approach is also suited to scenarios in which black samples of black-market operators or individual bonus hunters are scarce while gray samples (users who have not yet defaulted and whose risk is hard to determine) are abundant; that is, a high-precision model can be trained with only a small number of labeled samples.

Drawings

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings. Like reference numerals refer to like parts throughout the drawings, and the drawings are not intended to be drawn to scale in actual dimensions, emphasis instead being placed upon illustrating the principles of the invention.

FIG. 1 is a schematic diagram of the general steps of the anti-fraud identification method and system for old-brings-new activities according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of the anti-fraud identification method and system for old-brings-new activities according to an embodiment of the present invention;

FIG. 3 is a business flow diagram of a fission activity provided by the present invention.

Detailed Description

The present invention will be better understood and implemented by those skilled in the art by the following detailed description of the embodiments taken in conjunction with the accompanying drawings, which are not intended to limit the scope of the present invention.

Referring to fig. 1-3, an embodiment of the present invention provides an anti-fraud identification method and system for old-brings-new activities, including the following steps:

S1: define black, white and gray samples (black samples are customers with poor credit or a score below a preset score, where the preset score is 60-70 points if the full credit score is 100 points; white samples are customers with good credit or a score above the preset score; gray samples are customers between the black and white samples), determine the number of samples, and proceed according to the white sample rule when the ratio of black samples to white samples is a first preset value (the ratio of black samples to white samples is not more than 1:6);

the white sample rule is as follows (the three rules are applied in order of priority from high to low, stopping once the ratio of black samples to white samples does not exceed 1:6):

(1) A confirmed loan with no overdue payments, or white-list membership;

(2) The repayment rate is a second preset value (>= 2);

(3) The review has been passed;

S3: based on a dynamic time warping model, calculate the time-series similarity of the unequal-length behavior tensors generated by users (this method is used because it suits time series of different lengths and different rhythms; a similarity value is finally calculated for the behavior of each user against every other user, and this value is later used as an edge type when the graph is built and is also given a weight);

S5: modify the sampling strategy of the GraphSage model and train the model; when the trained inference model is expected to reach the third preset value of the recall rate (70%), carry out online iteration (the aim is to investigate suspected fraudulent customers in practice after going online, thereby augmenting the data for subsequent training). The process design must take into account each company's case investigation throughput, for example that the suspected cases cannot exceed a few hundred per day. The cases are pushed to the case investigation post, and a call script design is provided for the pre-loan case investigation post that differs from that of the traditional credit case investigation post.
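The release gate and the daily push limit described in step S5 can be sketched as follows. This is only an illustration under stated assumptions: labels use 1 for fraud and 0 otherwise, "reaching" the third preset value is read as recall greater than or equal to 0.70, and the names `ready_for_online` and `select_cases_for_investigation`, together with the `daily_cap` parameter, are hypothetical.

```python
def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def ready_for_online(y_true, y_pred, recall_threshold=0.70):
    """Gate the online iteration on the recall threshold (70% in the embodiment)."""
    return recall(y_true, y_pred) >= recall_threshold


def select_cases_for_investigation(scores, daily_cap):
    """Pick at most `daily_cap` users with the highest fraud scores.

    scores:    dict mapping user ID -> model fraud score.
    daily_cap: assumed stand-in for the investigation team's daily throughput.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [user_id for user_id, _ in ranked[:daily_cap]]
```

Capping the pushed list keeps the workload within what the case investigation post can handle, while the investigated cases feed back as labeled data for later training rounds.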

The beneficial effects of the invention are as follows. The method addresses the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network (GraphSage) can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and their combinations can be mined and expressed as graph structure features, node features and edge features. The method is also suited to scenarios in which black samples of black-market operators or individual bonus hunters are scarce while gray samples (users who have not yet defaulted and whose risk is hard to determine) are abundant; that is, a high-precision model can be trained with only a small number of labeled samples.

Referring to fig. 1-2, in a preferred embodiment, the specific steps of defining and determining the black, white and gray samples in step S1 are as follows:

when the ratio of black samples to total samples is a fourth preset value (less than 0.2%), the number of black samples is supplemented using the conversion rate within a sliding time window; when the number of gray samples needs to be increased or decreased, the proportion of gray samples is controlled through the hard-label proportion returned by the grayscale iteration of the online model, so that excessive noisy data does not impair model convergence.

Black samples are customers with poor credit, and also include samples confirmed through manual case investigation. In the financial industry the number of black samples is usually small. Therefore, when the ratio of black samples to total samples is less than 0.2%, the black sample data must be analyzed to fit other effective, correlated features within a certain period; that is, for fission customer-acquisition activities, in combination with the Vintage curve, the number of black samples can be supplemented using the conversion rate within a certain sliding time window. For gray samples: if the proportion of gray samples is too small, other in-credit data are added as supplementary background data, and these data may fall outside the time window.

Referring to fig. 1-2, in a preferred embodiment, the specific step of establishing a four-dimensional tensor for each user and performing a recoding operation in step S2 includes:

establishing a four-dimensional tensor for each user, in which the dimensions respectively represent dwell time, the event_title vector, the time vector and the timestamp, and performing a recoding operation (features of the timestamp such as season or morning, noon and evening are not retained; node labels and their weights on the graph can later be added according to each company's data ecology and customer group characteristics, which belongs to the fine-tuning stage of the graph model, whereas the method steps here retain only the most basic and general approach).
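One possible reading of this step is sketched below: each user's behavior log becomes a time-ordered sequence of rows with four fields. The source does not specify the tensor layout or the recoding scheme, so everything here is an assumption: the field names `dwell_time`, `event_title`, `degree` and `timestamp`, the integer vocabulary used to recode event titles, and the choice to express timestamps as offsets from the user's first event (which drops calendar features such as season or time of day).

```python
def encode_user_events(events, vocab):
    """Turn one user's raw events into rows of four numeric features.

    events: list of dicts with the hypothetical keys
            'dwell_time', 'event_title', 'degree', 'timestamp'.
    vocab:  shared dict mapping event titles to integer codes; it is
            extended in place when a new title is seen.
    Returns rows of [dwell_time, event_code, degree, seconds_since_first_event].
    """
    ordered = sorted(events, key=lambda e: e["timestamp"])
    if not ordered:
        return []
    first_ts = ordered[0]["timestamp"]
    rows = []
    for event in ordered:
        # Recode the event title as an integer index (assumed scheme).
        code = vocab.setdefault(event["event_title"], len(vocab))
        rows.append([
            float(event["dwell_time"]),
            float(code),
            float(event["degree"]),
            float(event["timestamp"] - first_ts),
        ])
    return rows
```

Because users generate different numbers of events, the resulting sequences have unequal lengths, which is the situation dynamic time warping is meant to handle in the following step.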

Referring to fig. 1-2, in a further preferred embodiment, the specific steps of step S3 include:

based on a dynamic time warping model, the time-series similarity of the unequal-length behavior tensors generated by users is calculated, and a similarity value is computed for the behavior of each user against every other user. The calculation is as follows: sequences Q and C are matched starting from (0, 0); at each point reached, the distances calculated at all previous points are accumulated; once the end point (n, m) is reached, the accumulated value (the cumulative distance) is the total distance, namely the similarity of sequences Q and C.

The cumulative distance γ(i, j) is the sum of the current grid point distance d(q_i, c_j), i.e. the distance between points q_i and c_j, and the cumulative distance of the smallest neighboring element from which that point can be reached. It is calculated as follows:

$$\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$
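The accumulation described here, in which each grid point adds its own distance to the smallest cumulative distance among its neighboring predecessors, can be sketched in a few lines. This is a minimal illustration of the standard dynamic time warping recurrence, not the patent's implementation: the function name `dtw_distance` is hypothetical, the default point distance is the absolute difference of scalars (a vector distance can be passed through `dist`), and indices run from 0 to n-1 and 0 to m-1 rather than to (n, m).

```python
import math


def dtw_distance(q, c, dist=lambda a, b: abs(a - b)):
    """Cumulative DTW distance between sequences q and c of any lengths."""
    n, m = len(q), len(c)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # gamma[i][j] is the cumulative distance of the best path ending at (i, j).
    gamma = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best_prev = 0.0  # starting point has no predecessor
            else:
                best_prev = min(
                    gamma[i - 1][j - 1] if i > 0 and j > 0 else math.inf,
                    gamma[i - 1][j] if i > 0 else math.inf,
                    gamma[i][j - 1] if j > 0 else math.inf,
                )
            gamma[i][j] = dist(q[i], c[j]) + best_prev
    return gamma[n - 1][m - 1]
```

A smaller result means the two behavior sequences are more alike, so a value used as an edge weight would normally be converted first, for example with 1 / (1 + distance); that conversion is an assumption and is not stated in the source.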

Referring to fig. 1-2, in a further preferred embodiment, the specific steps of establishing edges between users and between nodes in step S4 are as follows:

Referring to fig. 1-2, in a preferred embodiment, the specific step of modifying the sampling strategy of the GraphSage model (mainly modifying the sampling rule) in step S5 includes:

the sampling method of each layer of the GraphSage model is modified so that the top k neighbors with the largest weighted average of the edges are taken as the sample (no attention layer is added; this strategy is used instead).

Referring to fig. 1-2, in a further preferred embodiment, the call script design provided in step S5 specifically includes:

Referring to fig. 1-3, in the anti-fraud identification method and system for old-brings-new activities provided by the invention, behavior data and financial transaction data are first connected across the systems, the activity time window is then divided, and it is determined whether the ratio of black samples to white samples exceeds 1:6 (not exceeding 1.

Referring to fig. 1-3, in a preferred embodiment, the present invention further provides an anti-fraud identification system for old-brings-new activities, comprising:

the white sample rule is:

(1) A confirmed loan with no overdue payments, or white-list membership;

(2) The repayment rate is a second preset value;

(3) The review has been passed;
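The priority-ordered use of the three white sample rules can be sketched as below. The embodiment states that the rules are applied from high to low priority and that application stops once the black-to-white ratio no longer exceeds the threshold; this sketch reads that as "keep admitting white samples, rule by rule, until there are at least six white samples per black sample". The candidate field names, the `whites_per_black` parameter and the function name are hypothetical, and the threshold of 2 for the second rule follows the value given in the embodiment.

```python
# Rules in priority order; each takes a candidate and the second preset value.
WHITE_RULES = [
    lambda c, preset: c["no_overdue_or_whitelisted"],   # rule (1)
    lambda c, preset: c["repayment_rate"] >= preset,    # rule (2)
    lambda c, preset: c["review_passed"],               # rule (3)
]


def select_white_samples(candidates, n_black, second_preset=2, whites_per_black=6):
    """Admit white samples rule by rule until the black:white ratio is met.

    candidates: list of dicts with the hypothetical keys used in WHITE_RULES.
    n_black:    number of black samples already labeled.
    """
    target = n_black * whites_per_black
    chosen, white = set(), []
    for rule in WHITE_RULES:
        if len(white) >= target:
            break  # ratio already satisfied; lower-priority rules are skipped
        for idx, cand in enumerate(candidates):
            if idx not in chosen and rule(cand, second_preset):
                chosen.add(idx)
                white.append(cand)
    return white
```

Stopping early keeps the white set dominated by the most reliable rule, and lower-priority rules are only used when the higher ones do not supply enough samples.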

Referring to fig. 1-3, in a preferred embodiment, the configuration and determination module specifically includes:

Referring to fig. 1-3, in a preferred embodiment, the computing module specifically includes:

The anti-fraud identification method and system for old-brings-new activities provided by the invention also have the following characteristics:

1. Outliers can be identified based on the patterns of profit-making rules in financial credit fission activities: fission customer-acquisition activities are launched over short periods and at high frequency, so activity loopholes arise easily. For example, rewards are claimed again by deregistering an account, or the reward rules are exploited by posting tasks on a task platform, a crowdsourcing-like approach that can mobilize large numbers of people who are purely after profit.

2. Outliers can be identified based on the behavior patterns of the customer groups in financial credit fission activities: groups organized by black-market operators behave in regular patterns and operate skilfully, follow a fixed operating procedure as if trained, act within concentrated time periods, have observable platform dwell times, follow a clear path to the reward, and mostly show no subsequent business activity after taking the benefit.

The beneficial effects of the invention are as follows. The invention provides an anti-fraud identification method and system for old-brings-new activities that address the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and their combinations can be mined and expressed as graph structure features, node features and edge features. Being based on the GraphSage graph neural network model, the approach is also suited to scenarios in which black samples of black-market operators or individual bonus hunters are scarce while gray samples (users who have not yet defaulted and whose risk is hard to determine) are abundant; that is, a high-precision model can be trained with only a small number of labeled samples.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. An anti-fraud identification method for old-brings-new activities, characterized by comprising the following steps:

the white sample rule is:

(1) A confirmed loan with no overdue payments, or white-list membership;

(2) The repayment rate is a second preset value;

(3) The review has been passed;

S4: establish graph data with user IDs as nodes, establish edges between users according to whether an invitation relationship exists between them, use the time-series similarity as a further edge connecting the nodes, weight the two kinds of edge, and then normalize;

2. The anti-fraud identification method for old-brings-new activities according to claim 1, characterized in that the specific steps of defining and determining the black, white and gray samples in step S1 are as follows:

3. The anti-fraud identification method for old-brings-new activities according to claim 1, characterized in that the specific step of establishing a four-dimensional tensor for each user and performing a recoding operation in step S2 includes:

4. The anti-fraud identification method for old-brings-new activities according to claim 1, characterized in that the specific steps of step S3 include:

5. The anti-fraud identification method for old-brings-new activities according to claim 1, characterized in that the specific step of modifying the sampling strategy of the GraphSage model in step S5 includes:

6. The anti-fraud identification method for old-brings-new activities according to claim 1, characterized in that the call script design provided in step S5 specifically includes:

7. An anti-fraud identification system for old-brings-new activities, comprising:

the white sample rule is:

(1) A confirmed loan with no overdue payments, or white-list membership;

(2) The repayment rate is a second preset value;

(3) The review has been passed;

an editing and control module, used for establishing graph data with user IDs as nodes, establishing edges between users according to whether an invitation relationship exists between them, using the time-series similarity as a further edge connecting the nodes, and weighting the two kinds of edge and then normalizing them; and for modifying the sampling strategy of the GraphSage model, training the model, carrying out online iteration when the recall rate is a third preset value, pushing the cases to the case investigation post, and providing a call script design for the pre-loan case investigation post.

8. The system according to claim 7, characterized in that the configuration and determination module specifically operates as follows:

9. The system according to claim 7, characterized in that the calculation module specifically operates as follows: