CN115423542B - Anti-fraud identification method and system for old-brings-new referral activities - Google Patents

Publication number
CN115423542B
Authority
CN
China
Prior art keywords
samples
white
model
similarity
black
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211381968.4A
Other languages
Chinese (zh)
Other versions
CN115423542A (en)
Inventor
韩柳
李远鑫
郑宇晟
黄文辉
钟佳
邹健娣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Post Consumer Finance Co ltd
Original Assignee
China Post Consumer Finance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Post Consumer Finance Co ltd
Priority to CN202211381968.4A
Publication of CN115423542A
Application granted
Publication of CN115423542B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention relates to an anti-fraud identification method and system for old-brings-new referral activities, comprising the following steps: S1: defining black, white and gray samples and checking the sample counts, and applying the white sample rule when the ratio of black samples to white samples is a first preset value; the white sample rule is: (1) a loan has been granted with no overdue payment, or the customer is on a whitelist; (2) the repayment rate is a second preset value; (3) the review and approval have been passed; S2: cleaning the behavior data of the fission customer-acquisition activity, building a four-dimensional tensor for each user, and recoding it; S3: computing the time-series similarity of the tensors generated by users, based on a dynamic time warping model; S4: building graph data with user IDs as nodes, and creating edges between users and between nodes; S5: modifying the sampling strategy of the GraphSage model, training the model, iterating it online when the recall rate is a third preset value, pushing cases to the case-investigation post, and providing a call-script design for the pre-loan case-investigation post.

Description

Anti-fraud identification method and system for old-brings-new referral activities
Technical Field
The invention relates to the field of computer technology, and in particular to an anti-fraud identification method and system for old-brings-new referral activities.
Background
FinTech keeps expanding within the financial ecosystem, and financial institutions are rapidly migrating large numbers of services to the internet; but while offering convenient online services they also face serious challenges from new kinds of transaction fraud and security threats. MGM (member-get-member) fission customer acquisition is one of the key links in operating internet-finance customers. Because the threshold for participating is low, it attracts not only ordinary bonus-hunting customers ("wool pullers") but also black-market operators, whose damage is severe: operating costs run out of control, data are distorted, and later operating strategy is affected.
Traditional anti-fraud technology has the following limitations:
(1) Because the fraud occurs before the loan, preventive measures cannot use rules as strict as those of in-loan anti-fraud; otherwise the experience of ordinary customers and the customer-acquisition effect suffer;
(2) The gain from a single fraudulent act is small, yet with traditional technical means the volume of customer behavior data to be processed is huge, and traditional feature engineering struggles to extract the relational pattern features between people and behaviors (technical level);
(3) Fission is a business whose purpose is acquiring new customers, so it does not allow adding too much identity verification or setting too many activity thresholds (activity level).
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides an anti-fraud identification method and system for old-brings-new referral activities, which address the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network (GraphSage) can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and combinations of these can be mined and expressed as graph-structure features, node features and edge features.
To achieve this purpose, the invention provides an anti-fraud identification method for old-brings-new referral activities, comprising the following steps:
S1: defining black, white and gray samples and checking the sample counts, and applying the white sample rule when the ratio of black samples to white samples is a first preset value;
the white sample rule is:
(1) a loan has been granted with no overdue payment, or the customer is on a whitelist;
(2) the repayment rate is a second preset value;
(3) the review and approval have been passed;
S2: cleaning the behavior data of the fission customer-acquisition activity, building a four-dimensional tensor for each user, and recoding it;
S3: computing the time-series similarity of the unequal-length behavior tensors generated by users, based on a dynamic time warping model;
S4: building graph data with user IDs as nodes, and creating edges between users and between nodes respectively;
S5: modifying the sampling strategy of the GraphSage model, training the model, iterating it online when the recall rate is a third preset value, pushing cases to the case-investigation post, and providing a call-script design for the pre-loan case-investigation post.
Preferably, the specific steps of defining and determining the black, white and gray samples in step S1 include:
when the number of gray samples needs to be increased or decreased, controlling the proportion of gray samples through the proportion of hard labels returned by the gray-release iteration of the online model.
Preferably, the specific steps of building a four-dimensional tensor for each user and recoding it in step S2 include:
building a four-dimensional tensor for each user, in which the dimensions respectively represent dwell time, the event_title vector, the degree vector and the timestamp, and recoding it.
Preferably, the specific steps of step S3 include:
based on a dynamic time warping model, computing the time-series similarity of the unequal-length behavior tensors generated by users, yielding a similarity value for the behavior of each user against every other user, computed as follows: sequences Q and C are matched starting from (0, 0); at each point reached, the distances computed at all previous points are accumulated; on reaching the end point (n, m), the accumulated total distance is the similarity of sequences Q and C.
Preferably, the specific steps of creating edges between users and between nodes in step S4 are:
creating edges between users according to whether one user invited the other, using the similarity as a second type of edge connecting the nodes, weighting the two edge types, and then normalizing.
Preferably, the specific steps of modifying the sampling strategy of the GraphSage model in step S5 include:
modifying the sampling method of each layer of the GraphSage model so that the top k with the largest weighted average of edge weights are taken as the samples for computation.
Preferably, the call-script design provided in step S5 specifically includes:
asking about the customer's loan needs and the customer's evaluation of the activity.
Preferably, the invention further provides an anti-fraud identification system for old-brings-new referral activities, comprising:
a configuration and determination module: used for defining black, white and gray samples and checking the sample counts, and applying the white sample rule when the ratio of black samples to white samples is a first preset value;
the white sample rule is:
(1) a loan has been granted with no overdue payment, or the customer is on a whitelist;
(2) the repayment rate is a second preset value;
(3) the review and approval have been passed;
a data module: used for cleaning the behavior data of the fission customer-acquisition activity, building a four-dimensional tensor for each user, and recoding it;
a calculation module: used for computing the time-series similarity of the unequal-length behavior tensors generated by users, based on a dynamic time warping model;
an editing and control module: used for building graph data with user IDs as nodes and creating edges between users and between nodes respectively; and for modifying the sampling strategy of the GraphSage model, training the model, iterating it online when the recall rate is a third preset value, pushing cases to the case-investigation post, and providing a call-script design for the pre-loan case-investigation post.
Preferably, the configuration and determination module specifically includes:
when the proportion of black samples in the total samples is a fourth preset value, supplementing the number of black samples using the conversion rate within a sliding time window; and when the number of gray samples needs to be increased or decreased, controlling the proportion of gray samples through the proportion of hard labels returned by the gray-release iteration of the online model.
Preferably, the calculation module specifically includes:
based on a dynamic time warping model, computing the time-series similarity of the unequal-length behavior tensors generated by users, yielding a similarity value for the behavior of each user against every other user, computed as follows: sequences Q and C are matched starting from (0, 0); at each point reached, the distances computed at all previous points are accumulated; on reaching the end point (n, m), the accumulated total distance is the similarity of sequences Q and C.
The beneficial effects of the invention are as follows: the anti-fraud identification method and system for old-brings-new referral activities address the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network (GraphSage) can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and combinations of these can be mined and expressed as graph-structure features, node features and edge features. The approach also suits scenarios in which black samples (black-market operators or individual bonus hunters) are few and gray samples (users who have not yet defaulted and whose risk is hard to determine) are too many; that is, a high-precision model can be trained with only a small number of labeled samples.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings. Like reference numerals refer to like parts throughout the drawings, and the drawings are not intended to be drawn to scale in actual dimensions, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 is a schematic diagram of the overall steps of the anti-fraud identification method and system for old-brings-new referral activities according to an embodiment of the invention;
FIG. 2 is a schematic flowchart of the anti-fraud identification method and system for old-brings-new referral activities according to an embodiment of the invention;
FIG. 3 is a business flow diagram of a fission activity provided by the invention.
Detailed Description
The present invention will be better understood and implemented by those skilled in the art by the following detailed description of the embodiments taken in conjunction with the accompanying drawings, which are not intended to limit the scope of the present invention.
Referring to fig. 1-3, an embodiment of the invention provides an anti-fraud identification method and system for old-brings-new referral activities, including the following steps:
S1: defining black, white and gray samples (black samples are customers with poor credit or a score below a preset score (60-70 points if the total credit score is 100 points); white samples are customers with good credit or a score above the preset score; gray samples are customers between the black and white samples), checking the sample counts, and applying the white sample rule when the ratio of black samples to white samples is a first preset value (the ratio of black samples to white samples is not more than 1;
the white sample rule is (the three rules are applied in priority order from high to low, stopping once the ratio of black to white samples does not exceed 1:
(1) a loan has been granted with no overdue payment, or the customer is on a whitelist;
(2) the repayment rate is a second preset value (>= 2);
(3) the review and approval have been passed;
S2: cleaning the behavior data of the fission customer-acquisition activity, building a four-dimensional tensor for each user, and recoding it;
S3: computing the time-series similarity of the unequal-length behavior tensors generated by users, based on a dynamic time warping model (this method is chosen because it suits time series of different lengths and different rhythms; it yields a similarity value for the behavior of each user against every other user, which is later used as an edge type when constructing the graph and is also weighted);
S4: building graph data with user IDs as nodes, and creating edges between users and between nodes respectively;
S5: modifying the sampling strategy of the GraphSage model and training the model; when the trained inference model is expected to reach the third preset value of the recall rate (70%), iterating it online (the aim is to augment the data for subsequent training through actual investigation of suspected fraudulent customers after going online; the process design must take into account each company's investigation capacity, for example suspected cases cannot exceed hundreds per day) and pushing cases to the case-investigation post, while also providing the pre-loan case-investigation post with a call-script design that differs from that of the traditional credit case-investigation post.
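The go-online condition in S5 can be illustrated with a minimal sketch. It assumes binary labels (1 = fraud), a per-user fraud score from the trained model, and a daily investigation cap chosen by the operator; the function names `recall`, `ready_for_online` and `cases_to_push` are hypothetical and do not come from the patent. Only the 70% recall threshold is taken from the text.

```python
def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def ready_for_online(y_true, y_pred, min_recall=0.70):
    """Gate online iteration on the recall threshold (70% in the description)."""
    return recall(y_true, y_pred) >= min_recall


def cases_to_push(scores, daily_cap):
    """Return at most daily_cap user IDs, highest fraud score first.

    scores: dict mapping user ID -> model score (assumed: higher = more suspicious).
    daily_cap: investigation capacity per day (an operator-chosen assumption).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [user_id for user_id, _ in ranked[:daily_cap]]
```

Capping the pushed cases by score is one way to respect the investigation capacity mentioned in S5; the description does not say how cases are prioritized.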
The beneficial effects of the invention are as follows: the method addresses the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network (GraphSage) can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and combinations of these can be mined and expressed as graph-structure features, node features and edge features. The method also suits scenarios in which black samples (black-market operators or individual bonus hunters) are few and gray samples (users who have not yet defaulted and whose risk is hard to determine) are too many; that is, a high-precision model can be trained with only a small number of labeled samples.
Referring to fig. 1-2, in a preferred embodiment, the specific steps of defining and determining the black, white and gray samples in step S1 include:
when the proportion of black samples in the total samples is a fourth preset value (less than 0.2%), the number of black samples is supplemented using the conversion rate within a sliding time window; when the number of gray samples needs to be increased or decreased, the proportion of gray samples is controlled through the proportion of hard labels returned by the gray-release iteration of the online model, so that excessive noisy data does not impair model convergence.
Black samples are customers with poor credit and also include samples confirmed through manual case investigation. In the financial industry the number of black samples is usually small, so when the proportion of black samples in the total samples is below 0.2%, the black-sample data must be analyzed to fit other effective, correlated features within a certain time period; namely, for fission customer-acquisition activities, in combination with the vintage curve, the number of black samples can be supplemented using the conversion rate within a certain sliding time window. For gray samples: if the proportion of gray samples is too small, other in-loan credit data is added as supplementary background data, and this data may fall outside the time window.
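The sample checks of step S1 can be sketched as follows. This is one possible reading, not the patented procedure: the white sample rules are tried in priority order and white samples are accumulated until the black-to-white ratio no longer exceeds a caller-supplied limit. The names `black_ratio_too_low` and `select_white_samples` and the representation of rules as predicates are assumptions for illustration; only the 0.2% threshold comes from the text.

```python
def black_ratio_too_low(n_black, n_total, threshold=0.002):
    """True when black samples are under 0.2% of all samples (fourth preset value)."""
    return n_total > 0 and n_black / n_total < threshold


def select_white_samples(candidates, rules, n_black, max_black_to_white):
    """Apply white-sample rules from highest to lowest priority.

    candidates: list of customer records (any objects the rules understand).
    rules: list of predicates, highest priority first (assumed representation).
    max_black_to_white: the first preset value as a fraction, supplied by the caller.
    Stops as soon as n_black / n_white <= max_black_to_white.
    """
    white, chosen = [], set()
    for rule in rules:
        for idx, cand in enumerate(candidates):
            if idx not in chosen and rule(cand):
                chosen.add(idx)
                white.append(cand)
        if white and n_black / len(white) <= max_black_to_white:
            break
    return white
```

Passing the ratio limit as a parameter keeps the sketch independent of any particular preset value.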
Referring to fig. 1-2, in a preferred embodiment, the specific steps of building a four-dimensional tensor for each user and recoding it in step S2 include:
building a four-dimensional tensor for each user, in which the dimensions respectively represent dwell time, the event_title vector, the time vector and the timestamp, and recoding it (timestamp-derived features such as season or time of day are not retained; node labels and their weights on the graph can later be added according to each company's data ecosystem and customer-group characteristics, which belongs to the fine-tuning stage of the graph model; the method steps here retain only the most basic and general approach).
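A minimal sketch of how a per-user behavior sequence with four fields per event might be assembled is shown below. The patent does not define the recoding, so everything here is an assumption: event titles are integer-coded through a vocabulary shared across users, timestamps are recoded as seconds since the user's first event, and the third field (the one the claims call the degree vector) is passed through as a number. The function name `encode_user_events` and the dictionary keys are hypothetical.

```python
from datetime import datetime


def encode_user_events(events, title_vocab):
    """Turn one user's raw events into a time-ordered list of 4-tuples.

    events: list of dicts with keys "dwell_time", "event_title", "degree",
            "timestamp" (ISO 8601 string); these key names are illustrative.
    title_vocab: dict shared across users, mapping event title -> integer code;
                 unseen titles are added on the fly.
    """
    parsed = sorted(
        ((datetime.fromisoformat(e["timestamp"]), e) for e in events),
        key=lambda pair: pair[0],
    )
    if not parsed:
        return []
    t0 = parsed[0][0]
    rows = []
    for ts, e in parsed:
        code = title_vocab.setdefault(e["event_title"], len(title_vocab))
        rows.append((
            float(e["dwell_time"]),        # dwell time
            float(code),                   # recoded event title
            float(e["degree"]),            # third field, passed through
            (ts - t0).total_seconds(),     # timestamp as offset from first event
        ))
    return rows
```

Because users generate different numbers of events, the resulting sequences have unequal lengths, which is the situation the dynamic time warping step is meant to handle.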
Referring to fig. 1-2, in a further preferred embodiment, the specific steps of step S3 include:
based on a dynamic time warping model, computing the time-series similarity of the unequal-length behavior tensors generated by users, yielding a similarity value for the behavior of each user against every other user, computed as follows: sequences Q and C are matched starting from (0, 0); at each point reached, the distances computed at all previous points are accumulated; on reaching the end point (n, m), the cumulative distance is the total distance, namely the similarity of sequences Q and C.
The cumulative distance γ(i, j) is the sum of the current grid-point distance d(q_i, c_j), i.e. the distance between points q_i and c_j, and the smallest cumulative distance among the neighboring elements from which that point can be reached:
$$\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$
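The cumulative-distance recurrence can be sketched in a few lines of Python. This is a plain dynamic-programming illustration, assuming each sequence element is a tuple of numbers and using Euclidean distance as the point distance d; the patent does not state which point distance is used, and the function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(q, c, dist=None):
    """Cumulative DTW distance between sequences q (length n) and c (length m).

    q, c: sequences of equal-dimension numeric tuples; lengths may differ.
    dist: point distance d(q_i, c_j); defaults to Euclidean (an assumption).
    """
    if dist is None:
        dist = math.dist
    n, m = len(q), len(c)
    inf = float("inf")
    # gamma[i][j] holds the cumulative distance for prefixes q[:i] and c[:j].
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(q[i - 1], c[j - 1])
            gamma[i][j] = d + min(
                gamma[i - 1][j - 1],  # diagonal step
                gamma[i - 1][j],      # advance in q only
                gamma[i][j - 1],      # advance in c only
            )
    return gamma[n][m]
```

Note that the value returned is a distance, so a smaller value means more similar behavior; any conversion into a similarity score for edge weights would be a separate design choice.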
Referring to fig. 1-2, in a further preferred embodiment, the specific steps of creating edges between users and between nodes in step S4 are:
creating edges between users according to whether one user invited the other, using the similarity as a second type of edge connecting the nodes, weighting the two edge types, and then normalizing.
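The edge construction can be sketched as below. The patent says only that the two edge types are weighted and then normalized; the weighted sum and the division by the maximum used here are assumptions, as are the function name `build_weighted_edges` and the parameters `w_invite` and `w_sim`. Similarity scores are taken as given inputs.

```python
def build_weighted_edges(invites, similarities, w_invite=1.0, w_sim=1.0):
    """Combine invitation edges and similarity edges into one normalized weight.

    invites: iterable of (inviter_id, invitee_id) pairs.
    similarities: dict mapping (user_a, user_b) -> similarity score.
    Returns a dict mapping an undirected edge (sorted ID pair) -> weight in [0, 1].
    """
    raw = {}
    for u, v in invites:
        key = tuple(sorted((u, v)))
        raw[key] = raw.get(key, 0.0) + w_invite
    for (u, v), score in similarities.items():
        key = tuple(sorted((u, v)))
        raw[key] = raw.get(key, 0.0) + w_sim * score
    max_weight = max(raw.values(), default=0.0)
    if max_weight <= 0:
        return {key: 0.0 for key in raw}
    # Normalize by the largest combined weight (one of several possible schemes).
    return {key: weight / max_weight for key, weight in raw.items()}
```

Treating edges as undirected is a simplification; if the direction of an invitation matters, the keys could stay ordered instead of sorted.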
Referring to fig. 1-2, in a preferred embodiment, the specific steps of modifying the sampling strategy of the GraphSage model in step S5 (mainly modifying the sampling rule) include:
modifying the sampling method of each layer of the GraphSage model so that the top k with the largest weighted average of edge weights are taken as the samples for computation (no attention layer is added; this sampling strategy is used instead).
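A minimal sketch of the modified sampling rule follows. Standard GraphSAGE samples a fixed number of neighbors uniformly at random at each layer; here the k neighbors with the largest edge weight are taken instead. The adjacency format, the function names `sample_top_k_neighbors` and `sample_layers`, and the use of a single combined weight per edge are assumptions for illustration, not the patent's implementation.

```python
import heapq


def sample_top_k_neighbors(adj, node, k):
    """Return the k neighbors of node with the largest edge weight.

    adj: dict mapping node -> {neighbor: combined edge weight}.
    """
    neighbors = adj.get(node, {})
    top = heapq.nlargest(k, neighbors.items(), key=lambda kv: kv[1])
    return [neighbor for neighbor, _ in top]


def sample_layers(adj, seeds, fanouts):
    """Layer-wise neighbor sampling; fanouts[i] is the k used at hop i + 1.

    Returns a list of node lists: the seeds, then the sampled nodes per hop.
    """
    layers = [list(seeds)]
    frontier = list(seeds)
    for k in fanouts:
        seen, next_frontier = set(), []
        for node in frontier:
            for neighbor in sample_top_k_neighbors(adj, node, k):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        layers.append(next_frontier)
        frontier = next_frontier
    return layers
```

Deterministic top-k selection concentrates aggregation on the strongest relations, which plays a role loosely comparable to attention without adding an attention layer.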
Referring to fig. 1-2, in a further preferred embodiment, the call-script design provided in step S5 specifically includes:
asking about the customer's loan needs and the customer's evaluation of the activity.
Referring to fig. 1-3, the anti-fraud identification method and system for old-brings-new referral activities of the invention first link behavior data and financial transaction data across the systems, then divide the activity time window and determine whether the ratio of black to white samples exceeds 1:6 (not exceeding 1.
Referring to fig. 1-3, in a preferred embodiment, the invention further provides an anti-fraud identification system for old-brings-new referral activities, comprising:
a configuration and determination module: used for defining black, white and gray samples and checking the sample counts, and applying the white sample rule when the ratio of black samples to white samples is a first preset value;
the white sample rule is:
(1) a loan has been granted with no overdue payment, or the customer is on a whitelist;
(2) the repayment rate is a second preset value;
(3) the review and approval have been passed;
a data module: used for cleaning the behavior data of the fission customer-acquisition activity, building a four-dimensional tensor for each user, and recoding it;
a calculation module: used for computing the time-series similarity of the unequal-length behavior tensors generated by users, based on a dynamic time warping model;
an editing and control module: used for building graph data with user IDs as nodes and creating edges between users and between nodes respectively; and for modifying the sampling strategy of the GraphSage model, training the model, iterating it online when the recall rate is a third preset value, pushing cases to the case-investigation post, and providing a call-script design for the pre-loan case-investigation post.
Referring to fig. 1-3, in a preferred embodiment, the configuration and determination module specifically includes:
when the proportion of black samples in the total samples is a fourth preset value, supplementing the number of black samples using the conversion rate within a sliding time window; and when the number of gray samples needs to be increased or decreased, controlling the proportion of gray samples through the proportion of hard labels returned by the gray-release iteration of the online model.
Referring to fig. 1-3, in a preferred embodiment, the calculation module specifically includes:
based on a dynamic time warping model, computing the time-series similarity of the unequal-length behavior tensors generated by users, yielding a similarity value for the behavior of each user against every other user, computed as follows: sequences Q and C are matched starting from (0, 0); at each point reached, the distances computed at all previous points are accumulated; on reaching the end point (n, m), the accumulated total distance is the similarity of sequences Q and C.
The anti-fraud identification method and system for old-brings-new referral activities provided by the invention also have the following characteristics:
1. Outliers can be identified from the regularities in how rewards are exploited in financial credit fission activities: fission customer-acquisition campaigns run for short periods and at high frequency and are prone to loopholes; for example, rewards are claimed again by deregistering and re-registering an account, or tasks exploiting the reward rules are posted on task platforms, a crowdsourcing-like mode that can mobilize large numbers of people who are simply after the reward.
2. Outliers can be identified from the regularities in the behavior patterns of the customer groups in financial credit fission activities: the behavior of black-market groups is regular and practiced, follows a fixed operating pattern as if trained, and is concentrated in time; platform dwell time is observable, the path to obtaining the reward is clear-cut, and in most cases there is no follow-up business behavior after the benefit is collected.
The beneficial effects of the invention are as follows: the invention provides an anti-fraud identification method and system for old-brings-new referral activities that address the pain point that most traditional anti-fraud methods can only analyze the risk information of a single sample. A graph neural network can incorporate the association information between samples into model training as prior knowledge; that is, deep social relations, node relations, operating habits and combinations of these can be mined and expressed as graph-structure features, node features and edge features. Being based on the GraphSage graph neural network model, the approach also suits scenarios in which black samples (black-market operators or individual bonus hunters) are few and gray samples (users who have not yet defaulted and whose risk is hard to determine) are too many; that is, a high-precision model can be trained with only a small number of labeled samples.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. An anti-fraud identification method for old and new activities is characterized by comprising the following steps:
s1: defining black, white and gray samples and judging the number of the samples, and executing according to a white sample rule when the ratio of the black and white samples to the white samples is a first preset value;
the white sample rule is:
(1) Determining the loan without overdue or white list;
(2) The repayment rate is a second preset value;
(3) And the examination and verification are passed;
s2: cleaning the behavior data of the fission refreshing activity, establishing a four-dimensional tensor for each user, and performing recoding operation;
s3: calculating the similarity of the time sequence of the unequal length behavior tensors generated by the user based on a dynamic time warping model;
s4: establishing graph data, taking user ID as a node, establishing edges between users through the relation of whether the users are invited or not, taking the similarity of time series as the edges of connection between the nodes, weighting the two edges, and then carrying out normalization processing;
s5: and modifying the sampling strategy of the GraphSage model, training the model, performing online iteration when the recall rate is a third preset value, pushing to the case and post adjustment, and providing a conversational design for the case and post adjustment before loan.
2. The method for anti-fraud recognition of old and new activities according to claim 1, wherein the step S1 of defining and determining black, white and gray samples comprises the following specific steps:
and when the ratio of the black samples to the total samples is a fourth preset value, supplementing the number of the black samples by the conversion rate of the sliding time window, and when the number of the gray samples needs to be increased or decreased, controlling the ratio of the gray samples by the hard label ratio returned by the gray iteration of the online model.
3. The method for anti-fraud recognition of old new activities according to claim 1, wherein the step S2 of establishing a four-dimensional tensor for each user and performing the recoding operation comprises the following specific steps:
establishing a four-dimensional tensor for each user, wherein each dimension respectively represents dwell time, event _ title vector, degree vector and time stamp, and performing a recoding operation.
4. The method for anti-fraud recognition of old and new activities according to claim 1, characterized in that the specific steps of step S3 include:
the method comprises the following steps of calculating similarity of time sequences of unequal length behavior tensors generated by users based on a dynamic time warping model, and calculating a similarity numerical value according to behavior similarity between each user and other users, wherein the calculation method comprises the following steps: and matching the sequences Q and C from (0, 0), accumulating the distances calculated by all the previous points when each point is reached, and obtaining the total distance after the end point (n, m) is reached, namely the similarity of the sequences Q and C.
5. The old-bring-new activity anti-fraud identification method according to claim 1, wherein modifying the sampling strategy of the GraphSage model in step S5 comprises:
modifying the sampling method of each layer of the GraphSage model so that the top k neighbors, ranked by the weighted average of their edges, are used for the sampling calculation.
6. The old-bring-new activity anti-fraud identification method according to claim 1, wherein the conversation script design provided in step S5 specifically comprises:
asking about the loan requirements and rating the activity.
7. An old-bring-new activity anti-fraud identification system, comprising:
a configuration and determination module, used for defining black, white and gray samples and judging the number of samples, wherein when the ratio of black samples to white samples is a first preset value, execution proceeds according to a white sample rule;
the white sample rule is:
(1) the loan is confirmed to have no overdue record, or the user is on a white list;
(2) the repayment rate is a second preset value;
(3) the review has been passed;
a data module, used for cleaning the behavior data of the fission-style user-acquisition activity, establishing a four-dimensional tensor for each user, and performing a recoding operation;
a calculation module, used for calculating, based on a dynamic time warping model, the similarity of the time series of unequal-length behavior tensors generated by users;
an editing and control module, used for building the graph data, wherein user IDs serve as nodes, edges between users are established according to whether an invitation relationship exists between them, the time-series similarity serves as a second kind of edge connecting the nodes, and the two kinds of edges are weighted and then normalized; and for modifying the sampling strategy of the GraphSage model, training the model, iterating the model online when the recall rate reaches a third preset value, pushing the results to the case investigation post, and providing a conversation script design for the pre-loan case investigation post.
8. The system according to claim 7, wherein the configuration and determination module is specifically used for:
when the ratio of black samples to total samples is a fourth preset value, supplementing the number of black samples according to the conversion rate over a sliding time window; and when the number of gray samples needs to be increased or decreased, controlling the proportion of gray samples through the hard-label ratio returned by the gray-release iteration of the online model.
9. The system according to claim 7, wherein the calculation module is specifically used for:
calculating, based on a dynamic time warping model, the similarity of the time series of unequal-length behavior tensors generated by users, and computing a similarity value from the behavior similarity between each user and the other users, wherein the calculation method is as follows: the sequences Q and C are matched starting from point (0, 0); at each point reached, the distances calculated at all previous points are accumulated; and the total distance obtained on reaching the end point (n, m) is the similarity of the sequences Q and C.
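
The dynamic time warping calculation recited in claims 4 and 9 can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the patented implementation: the sequences are plain lists of numbers rather than per-user behavior tensors, and the point distance defaults to an absolute difference through a hypothetical `dist` parameter. The function name `dtw_distance` is likewise illustrative. Because the accumulated total is a distance, a smaller value indicates more similar sequences.

```python
from math import inf


def dtw_distance(q, c, dist=lambda a, b: abs(a - b)):
    """Accumulated DTW distance between sequences q and c.

    The sequences may have different lengths. `dist` is an assumed
    point-to-point distance; replace it for vector-valued events.
    """
    n, m = len(q), len(c)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # acc[i][j] holds the minimal accumulated distance for aligning
    # the first i points of q with the first j points of c.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(q[i - 1], c[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            acc[i][j] = cost + min(
                acc[i - 1][j],      # advance in q only
                acc[i][j - 1],      # advance in c only
                acc[i - 1][j - 1],  # advance in both
            )

    # Total distance at the end point (n, m).
    return acc[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` evaluates to zero because the repeated `2` is absorbed by the warping path, which is the property that makes the measure usable on behavior sequences of unequal length.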
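
The graph construction in claim 7 and the modified sampling rule in claim 5 can be sketched in the same spirit. The code below is an illustration only: the function names `combine_edges` and `top_k_neighbors`, the equal default weights `w_invite` and `w_sim`, and the choice of min-max normalization are all assumptions, since the claims do not state the weighting coefficients or the normalization formula. It also assumes the similarity scores have already been converted so that a higher score means more similar behavior. Standard GraphSAGE samples a fixed number of neighbors uniformly at random; here that step is replaced by a deterministic choice of the k neighbors with the largest combined edge weight.

```python
def combine_edges(invite_edges, similarity, w_invite=0.5, w_sim=0.5):
    """Merge invitation edges and similarity edges into normalized weights.

    invite_edges: set of frozenset({u, v}) pairs linked by an invitation
    similarity:   dict mapping frozenset({u, v}) to a similarity score
    Returns a dict mapping frozenset({u, v}) to a weight in [0, 1].
    """
    pairs = set(invite_edges) | set(similarity)
    if not pairs:
        return {}

    # Weighted average of the two edge types (assumed coefficients).
    raw = {
        p: w_invite * (1.0 if p in invite_edges else 0.0)
        + w_sim * similarity.get(p, 0.0)
        for p in pairs
    }

    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    # Min-max normalization; if every weight is equal, map all to 1.0.
    return {p: (w - lo) / span if span else 1.0 for p, w in raw.items()}


def top_k_neighbors(node, weights, k):
    """Return up to k neighbors of `node` with the largest edge weights."""
    scored = []
    for pair, w in weights.items():
        if node in pair and len(pair) == 2:  # skip self-loops
            (other,) = pair - {node}
            scored.append((w, other))
    # Highest weight first; ties broken by neighbor ID for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [other for _, other in scored[:k]]
```

In a full pipeline, the list returned by `top_k_neighbors` would stand in for the random neighbor sample at each GraphSAGE layer before aggregation.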
CN202211381968.4A 2022-11-07 2022-11-07 Old belt new activity anti-fraud identification method and system Active CN115423542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211381968.4A CN115423542B (en) 2022-11-07 2022-11-07 Old belt new activity anti-fraud identification method and system

Publications (2)

Publication Number Publication Date
CN115423542A (en) 2022-12-02
CN115423542B (en) 2023-03-24

Family

ID=84207459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211381968.4A Active CN115423542B (en) 2022-11-07 2022-11-07 Old belt new activity anti-fraud identification method and system

Country Status (1)

Country Link
CN (1) CN115423542B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272201B (en) * 2023-09-15 2024-03-12 中邮消费金融有限公司 Financial behavior anomaly detection method and system based on 4W1H language model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395466A (en) * 2020-11-27 2021-02-23 上海交通大学 Fraud node identification method based on graph embedded representation and recurrent neural network
CN114861746A (en) * 2021-12-15 2022-08-05 平安科技(深圳)有限公司 Anti-fraud identification method and device based on big data and related equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11403643B2 (en) * 2020-01-24 2022-08-02 Adobe Inc. Utilizing a time-dependent graph convolutional neural network for fraudulent transaction identification
CN112053221A (en) * 2020-08-14 2020-12-08 百维金科(上海)信息科技有限公司 Knowledge graph-based internet financial group fraud detection method
CN112396160A (en) * 2020-11-02 2021-02-23 北京大学 Transaction fraud detection method and system based on graph neural network
CN114117248A (en) * 2021-10-28 2022-03-01 北京百度网讯科技有限公司 Data processing method and device and electronic equipment
CN114513791A (en) * 2022-01-13 2022-05-17 浙江鸿程计算机系统有限公司 Telecom anti-fraud method based on machine learning
CN114782161A (en) * 2022-03-31 2022-07-22 度小满科技(北京)有限公司 Method, device, storage medium and electronic device for identifying risky users

Also Published As

Publication number Publication date
CN115423542A (en) 2022-12-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant