CN111105303B

CN111105303B - Network lending fraud detection method based on incremental network characterization learning

Info

Publication number: CN111105303B
Application number: CN201911101580.2A
Authority: CN
Inventors: Wang Cheng (王成); Zhu Hangyu (朱航宇); Hu Ruixin (胡瑞鑫)
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2019-11-12
Filing date: 2019-11-12
Publication date: 2023-05-12
Anticipated expiration: 2039-11-12
Also published as: CN111105303A

Abstract

A network lending fraud detection method based on incremental network characterization (representation) learning. The principle of the invention is as follows: real-world loan data are modeled as a heterogeneous information network, which has strong representational power, and a relational lending network is built from the loan data in this form. Specific relations are extracted from the multi-type heterogeneous relational lending network to form a homogeneous lending network that retains only one node type. The relational and homogeneous lending networks are updated in turn as each batch of lending data arrives, and an incremental network representation learning algorithm promptly updates the vector representations of the nodes in the homogeneous lending network to capture the latest associations among the data. Based on the learned vector representations and new time-sequence features (such as the relation between each application number and the n application numbers preceding it), a classifier is combined into a classification model that detects and identifies fraudulent lending data.

Description

Network lending fraud detection method based on incremental network characterization learning

Technical Field

The invention relates to anti-fraud detection for online loans in internet finance.

Background

With the rapid development of the internet, many conventional services have moved online, and online lending in internet finance has grown rapidly. Online lending generates large volumes of electronic transaction data, and at the same time the amount of online lending fraud has increased greatly [1]. In recent years B2C online lending has grown rapidly worldwide, especially in China, and B2C lending institutions have suffered from large numbers of bad debts, resulting in heavy economic losses [2]. Fraudsters commit large-scale online lending fraud by forging borrower information and even by organizing gangs of fake borrowers. To protect the business of investment institutions and legitimate users in online lending, a practical and effective online lending fraud detection system is needed.

In a B2C lending scenario, an individual may obtain credit through fake applications, falsified documents, fake contacts, multi-head lending (borrowing from several platforms at once) and similar means; funds are further obtained with the help of the gray and black industry, for example loan-packaging agents and organized group fraud. Such fraudulent lending data often contain latent associations, and network representation learning has shown strong capability in mining latent links between data [3]. However, most current fraud detection systems are built on static lending data networks that are updated only periodically, so they cannot keep up with fast-changing online fraud. For example, the gray and black industry can generate large amounts of associated lending data in a short period, and because a static lending network does not learn these associations in time, the fraud cannot be prevented effectively. In addition, B2C online lending generates large amounts of loan data in a very short time; the data keep growing and fraud techniques keep changing, so new data must be added and old data deleted dynamically. Fraud detection methods based on static network representation learning cannot adapt to such changes in the lending network structure.

So far, research on online lending has focused mainly on how to build an efficient fraud detection model on static data [4], and little research has addressed updating the model dynamically. Talavera et al. [5] train a radial basis function network to distinguish whether a customer commits lending fraud, and apply fuzzy c-means clustering to group the data points, building customer profiles from the data within each cluster. Babaev et al. [6] apply neural networks to fine-grained cross-country data to process loan data, and propose a new method, ET-RNN, that relies on transaction data alone to automate decisions on loan applications.

These studies show that a major problem in B2C online lending fraud detection is the lack of a method that can respond to new fraud techniques in the short term. Traditional detection methods have long update cycles, while many fraud techniques change over time, so traditional methods lack good generalization capability.

Disclosure of Invention

Fraudulent loan applications often pass audit systems through forged applications, falsified data, multi-head lending and similar means, and the latent associations among such fraudulent information are often especially pronounced, particularly in loan packaging and group fraud organized by the gray and black industry. The invention discloses an online lending fraud detection method that analyzes and builds on the rich lending data generated by current online lending, protecting the security of users and enterprises.

The principle of the invention is as follows: real-world loan data are analyzed as a heterogeneous information network, which has strong representational power, and the loan data are organized into a relational lending network in this form (with nodes and edges of various types, such as loan application number, license plate number, telephone number and address). Specific relations are extracted from the multi-type heterogeneous relational lending network to form a homogeneous lending network that retains only one node type (the generation of the homogeneous network from lending data is shown in Fig. 1). The relational and homogeneous lending networks are updated in turn as each batch of lending data arrives, and an incremental network representation learning algorithm promptly updates the vector representations of the nodes in the homogeneous lending network to capture the latest associations among the data. Based on the learned vector representations and new time-sequence features (such as the relation between each application number and the n application numbers preceding it), a classifier is combined into a classification model that detects and identifies fraudulent lending data.

The technical scheme of the method is as follows:

A network lending fraud detection method based on incremental network characterization learning, characterized by comprising the following steps:

step 1, establishing the relational lending network and completing homogenization

Collect the rich lending data generated by historical online loans and build a heterogeneous relational lending network; taking application numbers as nodes and the attribute values shared by different lending records as edges, derive the homogeneous lending network; provide it to step 2;

step 2, constructing a training sample set

Collect the original static data and build an initial static data set; based on this initial lending data set, apply a network representation learning algorithm to vectorize the network structure and obtain the vector representation of each node; the learned vectors form the training sample set; provide it to step 3;

step 3, feature construction

Construct features from the vector data in the training sample set in preparation for input to the fraud detection model; provide them to step 4;

step 4, training the fraud detection model

Use an XGBoost classifier [8] (from the Python XGBoost library, which offers a scikit-learn-compatible interface) as the fraud detection model, and input the features constructed in step 3 into the classifier to train it; the trained model is used in step 8;

step 5, updating the relational lending network and the homogeneous lending network

Collect the lending data currently generated by online loans; as incremental streaming lending data arrive in time order, update the relational lending network and the homogeneous lending network and provide the updated networks to step 6;

step 6, updating the current test data set

Build the current test data set from the training sample set constructed in step 2 and the streaming lending data arriving in time order, namely: add the k newly arrived lending records and delete the k earliest records of the initial data set, so that the current test data set is updated in real time;

as in step 2, apply the network representation learning algorithm to vectorize the network structure, obtain the vector representations of the nodes in the current test data set, and update the current test data set with the learned vectors; provide it to step 7;
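The window update of step 6 (add the k newly arrived records, evict the k earliest ones) can be sketched as follows. This is a minimal illustration, assuming each record is reduced to a hypothetical (application_number, timestamp) pair and that the window is already sorted by time; the name slide_window is ours, not the patent's.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

# Hypothetical record layout: (application_number, timestamp).
Record = Tuple[str, int]


def slide_window(window: Deque[Record],
                 new_batch: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Add the k newly arrived records and evict the k earliest ones.

    `window` must already be ordered by timestamp (oldest on the left).
    Returns (added, expired) so that later steps can update the relational
    and homogeneous lending networks incrementally.
    """
    added = sorted(new_batch, key=lambda r: r[1])  # keep arrivals in time order
    k = min(len(added), len(window))
    expired = [window.popleft() for _ in range(k)]  # the k earliest records
    window.extend(added)
    return added, expired


w = deque([("A1", 1), ("A2", 2), ("A3", 3)])
added, expired = slide_window(w, [("A5", 5), ("A4", 4)])
print(expired)  # [('A1', 1), ('A2', 2)]
print(list(w))  # [('A3', 3), ('A4', 4), ('A5', 5)]
```

Returning the expired records alongside the added ones is a design choice of the sketch: both lists are exactly what an incremental update of the networks needs, so nothing has to be recomputed from the full window.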

step 7, feature construction

As in step 3, construct features from the vector data in the test data set in preparation for input to the fraud detection model; provide them to step 8;

step 8, testing the fraud detection model

Input the current test data set from step 7 into the fraud detection model trained in step 4 to obtain the model's decision.

Then determine whether the time corresponding to the current test data set exceeds the model update period: if not, return to step 5; if so, return to step 1. The algorithm ends when fraud detection has been completed for all test data sets.
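The outer loop described above (incremental updates until the model update period expires, then a full rebuild) is sketched below. It is an assumption of this sketch that the period check compares the time elapsed since the most recent full build with the update period; plan_updates and its arguments are hypothetical names.

```python
from typing import List


def plan_updates(t0: float, batch_times: List[float], period: float) -> List[str]:
    """Return, for each arriving batch, which branch of the method handles it.

    t0 is the time of the initial build (steps 1-4). A batch arriving more than
    `period` after the most recent full build triggers a rebuild (return to
    step 1); any other batch is processed incrementally (return to step 5).
    """
    actions: List[str] = []
    last_build = t0
    for t in batch_times:
        if t - last_build > period:
            actions.append("rebuild")      # rebuild networks and retrain the model
            last_build = t
        else:
            actions.append("incremental")  # update networks and embeddings only
    return actions


print(plan_updates(0, [3, 6, 11, 14, 22], period=10))
# ['incremental', 'incremental', 'rebuild', 'incremental', 'rebuild']
```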

The invention aims to overcome the weakness of static fraud detection methods against rapidly changing online lending fraud, to improve the adaptability of the fraud detection system to a changing environment, and to better ensure that fraudulent loans are detected and intercepted, protecting the funds of users and enterprises.

The invention discloses an online lending fraud detection method based on incremental network characterization that updates the lending data network dynamically: through incremental network representation learning, representations with strong generalization capability are mined from the lending data network, improving the real-time performance, accuracy and robustness with which the model intercepts fraudulent loans.

Drawings

Fig. 1: example of the homogeneous network generation process for lending data in an online lending scenario;

Fig. 2: flowchart of the network lending fraud detection method based on incremental network characterization learning;

Fig. 3: diagram of the lending data of the embodiment transformed into vector representations;

Fig. 4: partitioning scheme of the incremental lending data set at one time step.

Detailed Description

The technical solution of the invention is further described below with reference to the embodiment and the accompanying drawings.

The flowchart of the network lending fraud detection method based on incremental network characterization learning is shown in Fig. 2; the process is as follows:

step 1, establishing the relational lending network and completing homogenization

step 2, constructing a training sample set

step 3, feature construction

step 4, training the fraud detection model

step 5, updating the relational lending network and the homogeneous lending network

step 6, updating the current test data set

step 7, feature construction

step 8, testing the fraud detection model

A detailed example is given below.

Example 1

The example is divided into four parts.

The first part generates the initial network representation, as follows:

Input:

the network lending data B of the users,

the parameters W_e of the network representation learning method.

Output:

the mapping γ = F_t(v) between each node v and its vector γ at the initial time t.

In detail, the initial network representation is generated as follows:

Step 1.1: select the usable raw fields (Table 1) from the original lending data, perform preprocessing such as field type conversion and removal or filling of null values, define a discretization rule for each field, and discretize the values to reduce their precision. For example, in this embodiment the amount is divided into a limited number of categories by value range, and the address is discretized at the coarse granularity of the street.
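The two discretization rules mentioned above (amount ranges, street-level addresses) might look like the following. The cut points, the bin labels and the assumption that an address contains an explicit "Street" token are all hypothetical; real rules would be defined per field from Table 1.

```python
import bisect
from typing import Sequence


def bucket_amount(amount: float, cut_points: Sequence[float]) -> str:
    """Map a loan amount to the index of the range it falls in.

    cut_points must be ascending; an amount equal to a cut point falls into
    the range that starts at that point.
    """
    return f"amount_bin_{bisect.bisect_right(cut_points, amount)}"


def coarsen_address(address: str, marker: str = "Street") -> str:
    """Keep an address only up to (and including) its street token."""
    idx = address.find(marker)
    return address if idx < 0 else address[: idx + len(marker)]


cuts = [1000, 5000, 20000]  # hypothetical range boundaries
print(bucket_amount(500, cuts), bucket_amount(5000, cuts), bucket_amount(25000, cuts))
# amount_bin_0 amount_bin_2 amount_bin_3
print(coarsen_address("Siping Street No. 1239, Room 5"))
# Siping Street
```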

The original lending data are divided into an application-number (applync) type and an ATTRIBUTE type, where ATTRIBUTE covers all data in a lending record other than the application number. A lending record is written as (b_i, ATT(b_i)), where b_i is the application number of the record, ATT(b_i) is its attribute set, and att_k(b_i) is the k-th element of ATT(b_i).

Establishing a relational loan network N based on original loan data _r = (V, E), V is a node set, E is an edge set, where edge e= (u, V), u and V belong to a nodePoint set V (contains multiple types of nodes). For each item of data b in the loan data b _i First b _i Adding node set V, adding ATT (b) _i ) Each element of the list is added to the node set V in turn, and the edge (b) _i ，att _k (b _i ) Add edge set E, att _k (b _i ) Is ATT (b) _i ) Is the kth element in (c). Step 1.2 is performed. The left part of FIG. 1 is a relational lending network N _r Is shown in the drawings.

Step 1.2: build the homogeneous lending network N_h = (V^h, E^h) from the relational lending network, where V^h is the node set (containing only nodes of the application-number type) and E^h is the edge set, each edge e = (u, v, w) having u, v ∈ V^h. When att_k(b_i) = att_k(b_j) for two different records b_i and b_j, the pair of edges (b_i, att_k(b_i)) and (b_j, att_k(b_j)) is regarded as an edge (b_i, b_j) of E^h, and the number of times (b_i, b_j) occurs is used as its weight w in N_h. Add all application-number nodes in the node set V of N_r to V^h. For each pair of edges (b_i, att_k(b_i)) and (b_j, att_k(b_j)) with att_k(b_i) = att_k(b_j), add the edge (b_i, b_j) to E^h. This yields N_h = (V^h, E^h). Then execute step 1.3.

The right part of Fig. 1 shows an example of the homogeneous lending network N_h generated from the relational lending network N_r in the left part.
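The projection of step 1.2 can be sketched as below: two applications are linked once for every attribute value they share, and the weight w counts those shared values. The sketch assumes the edges of N_r are given as (application number, field-prefixed attribute value) pairs and returns only the weighted edges of E^h; application numbers that share no attribute remain isolated nodes of V^h.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def project_homogeneous(relational_edges: Iterable[Pair]) -> Dict[Pair, int]:
    """Derive the weighted edge set E^h of N_h from the edges (b_i, att_k(b_i)) of N_r.

    Keys are application-number pairs (u, v) with u < v; values are weights w.
    """
    holders: Dict[str, Set[str]] = defaultdict(set)  # attribute value -> applications
    for b_i, att in relational_edges:
        holders[att].add(b_i)
    weights: Dict[Pair, int] = defaultdict(int)
    for apps in holders.values():
        # every pair of applications sharing this value adds one occurrence of (b_i, b_j)
        for u, v in combinations(sorted(apps), 2):
            weights[(u, v)] += 1
    return dict(weights)


edges = [("A1", "phone:p1"), ("A2", "phone:p1"), ("A1", "device:d1"),
         ("A2", "device:d1"), ("A1", "addr:s1"), ("A3", "addr:s1"), ("A3", "plate:c1")]
print(project_homogeneous(edges))  # {('A1', 'A2'): 2, ('A1', 'A3'): 1}
```

Grouping by attribute value first avoids comparing every pair of applications. Note that a value shared by m applications still produces m(m-1)/2 pairs, so extremely common values may need filtering in practice.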

Step 1.3: based on the constructed homogeneous lending network N_h, this embodiment uses the existing network representation learning method NetWalk [7] to learn the vector representations of all nodes in N_h, so that feature information is extracted automatically and manual feature extraction is avoided. The main parameters of NetWalk are listed in Table 2, and their settings depend on the network structure. In general, walk-length and number_walks grow with the number of nodes and edges: the larger the network, the larger these two parameters. The parameter learning_rate affects how well NetWalk learns the network representation; a value that is too large may cause overfitting and one that is too small underfitting, and this embodiment sets it to 0.01. The parameter dim is the dimension of the output vector representation; a larger dimension can capture more latent correlations but at a higher computational cost, and this embodiment sets it to 128. In this embodiment, init is the edge set of the homogeneous lending network generated from the initial lending data, and snap is the set of edges added to or deleted from the homogeneous lending network as streaming lending data arrive. Then execute step 1.4.
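NetWalk [7] itself is not reproduced here. As a rough illustration of what walk-length and number_walks control, the sketch below samples weight-proportional random walks from a weighted homogeneous network, the kind of walk corpus that walk-based embedding methods train on. The function name and the dictionary input format are assumptions, and the step that turns walks into dim-dimensional vectors is omitted.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_walks(weights: Dict[Tuple[str, str], int], walk_length: int,
                   number_walks: int, seed: int = 0) -> List[List[str]]:
    """Start number_walks walks of walk_length nodes from every non-isolated node.

    At each step the next node is drawn with probability proportional to the
    edge weight w of the homogeneous lending network.
    """
    rng = random.Random(seed)
    adj: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for (u, v), w in weights.items():   # undirected: store both directions
        adj[u].append((v, w))
        adj[v].append((u, w))
    walks: List[List[str]] = []
    for _ in range(number_walks):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                nxt = rng.choices([n for n, _ in neighbours],
                                  weights=[w for _, w in neighbours], k=1)[0]
                walk.append(nxt)
            walks.append(walk)
    return walks


walks = weighted_walks({("A1", "A2"): 2, ("A1", "A3"): 1}, walk_length=4, number_walks=2)
print(len(walks), walks[0])
```

The corpus size scales as number_walks times the number of nodes, and each walk has walk-length nodes, which is why both parameters are raised for larger networks.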

Step 1.4: apply the NetWalk method of step 1.3 to the homogeneous lending network N_h to obtain the vector representation γ of each node v in the network at the initial time t, and establish the mapping γ = F_t(v). Using this mapping, express the initial lending data as vector representations: as shown in Fig. 4, each lending record, consisting of several specific field values, is converted into a vector representation of fixed dimension (the vector dimension dim in Fig. 4 is determined by the parameter dim of NetWalk).

Table 1: Available raw fields

Table 2: Main parameters of NetWalk

The second part builds the fraud detection model, as follows:

Classifier environment: Python, XGBoost classifier

Input:

the mapping between each node v and its vector γ at time t_k,

the classifier parameter set W_c,

the number h of features input to the classifier,

the lending data set B_train(t_k) used for model training.

Output:

the fraud detection model.

In detail, the fraud detection model is built as follows:

Step 2.1: the lending data B_train(t_k), containing n records with the usable raw fields, correspond to n nodes in the homogeneous lending network. As described in step 1.4, based on the nodes at time t_k and the mapping γ = F_{t_k}(v), the lending data are converted into vectors of dimension dim, one per application number. These vectors can be fed directly into a classification model for the subsequent node classification task ("method one").

This embodiment goes further and discloses "method two": based on the obtained vector representations, for each lending record, compute in sequence the Euclidean distance (a measure of vector similarity) between its application number and each of the h preceding application numbers in the data set (application numbers being ordered by generation time), and sort the h distances in ascending order to form the time-sequence features of that application number. The similarities between the vector of the application number under test and the vectors of the h preceding application numbers thus serve as the input of the fraud detection model.

Comparison:

Method one considers only the absolute position of each vector in the embedding space and performs poorly on lending data.

Compared with method one, method two is better suited to detecting batch (gang) fraud in lending: instead of absolute positions it uses vector similarity, which strengthens the generalization capability of the downstream fraud detection model. For vectors X = (x_1, ..., x_dim) and Y = (y_1, ..., y_dim), the Euclidean distance is computed as

$$d(X,Y)=\sqrt{\sum_{i=1}^{\mathrm{dim}}\left(x_i-y_i\right)^2}$$
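Method two can be sketched as follows, assuming the node vectors are already listed in order of application generation time. The code applies the Euclidean distance d(X, Y) to each record's h predecessors and sorts the distances in ascending order. How the first records, which have fewer than h predecessors, are handled is not stated; padding with NaN (which XGBoost treats as a missing value by default) is our assumption, as are the function names.

```python
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """d(X, Y) = sqrt(sum_i (x_i - y_i)^2) over the dim components."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def time_sequence_features(vectors: List[Sequence[float]], h: int) -> List[List[float]]:
    """For each record (vectors sorted by generation time), return the distances
    to its h preceding records in ascending order, padded with NaN."""
    features: List[List[float]] = []
    for i, v in enumerate(vectors):
        previous = vectors[max(0, i - h):i]
        dists = sorted(euclidean(v, p) for p in previous)
        dists += [math.nan] * (h - len(dists))  # fewer than h predecessors
        features.append(dists)
    return features


feats = time_sequence_features([[0, 0], [3, 4], [0, 1], [6, 8]], h=2)
print(feats[1])  # [5.0, nan]
```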

Step 2.2: based on the time-sequence features constructed in step 2.1, configure the classifier according to the parameter set W_c; taking the time-sequence features of the lending data at time t_k as input and whether each record is a fraudulent transaction as the label, train the classifier. The trained binary classification model is regarded as the fraud detection model.

The third part generates the incremental network representation, as follows:

Input:

the mapping γ = F_{t_k}(v) between each node v and its vector γ at time t_k,

the data set B_train(t_k) used for network representation learning at time t_k,

the online lending data set B_test(t_{k+1}) arriving as a stream at time t_{k+1}.

Output:

the mapping between each node v and its vector γ at time t_{k+1}.

In detail, the incremental network representation is generated as follows:

Step 3.1: following the timestamp order of data set B_train(t_k), select the earliest records, equal in number to those in B_test(t_{k+1}), and place them into data set B'_test(t_{k+1}). Apply the same preprocessing as in step 1.1 to B_test(t_{k+1}) and B'_test(t_{k+1}), and use the processed sets to update the relational lending network built from B_train(t_k). Following the definitions of step 1.1, process B_test(t_{k+1}) and B'_test(t_{k+1}) separately to obtain the node sets V_test(t_{k+1}) and V'_test(t_{k+1}) and the edge sets E_test(t_{k+1}) and E'_test(t_{k+1}) of the relational lending network, where E_test(t_{k+1}) is the set of edges linking the application numbers in the newly arrived streaming data to nodes already present in N_r at the previous time step, and E'_test(t_{k+1}) is the set of expired edges to be deleted from N_r. Let V = (V ∪ V_test(t_{k+1})) − V'_test(t_{k+1}) and E = (E ∪ E_test(t_{k+1})) − E'_test(t_{k+1}), and update the relational lending network N_r = (V, E). Then execute step 3.2.
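The set update of step 3.1 can be illustrated as follows. This sketch keeps edges as (application number, attribute value) pairs and, as an assumption of ours, re-derives V from the surviving edges instead of subtracting V'_test(t_{k+1}) directly, so that an attribute node still used by a record inside the window is not removed; records without any attribute would therefore not appear in V. The function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (application number, attribute value)


def edges_of(records: Dict[str, List[str]]) -> Set[Edge]:
    """Edges (b_i, att_k(b_i)) contributed by a batch of records."""
    return {(b_i, att) for b_i, atts in records.items() for att in atts}


def update_relational(edges: Set[Edge], arrived: Dict[str, List[str]],
                      expired: Dict[str, List[str]]) -> Tuple[Set[str], Set[Edge]]:
    """E = (E ∪ E_test) - E'_test; V is re-derived from the surviving edges."""
    new_edges = (edges | edges_of(arrived)) - edges_of(expired)
    nodes = {n for edge in new_edges for n in edge}  # a node lives while any edge uses it
    return nodes, new_edges


E = {("A1", "phone:p1"), ("A1", "addr:s1"), ("A2", "phone:p1")}
V, E = update_relational(E, arrived={"A3": ["addr:s1"]},
                         expired={"A1": ["phone:p1", "addr:s1"]})
print(sorted(V))  # ['A2', 'A3', 'addr:s1', 'phone:p1']
```

In the example, "addr:s1" belongs to the expiring record A1 but survives because the new record A3 also uses it.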

Step 3.2: based on the updated relational lending network N_r = (V, E), obtain the updated homogeneous lending network N_h = (V^h, E^h) by step 1.2. Then execute step 3.3.

Step 3.3: starting from the mapping γ = F_{t_k}(v) between each node v and its vector γ at time t_k, take E_test(t_{k+1}) as the set of newly arrived edges and E'_test(t_{k+1}) as the set of edges to be deleted, and apply NetWalk to perform incremental network representation learning on the nodes and edges involved in E_test(t_{k+1}) and E'_test(t_{k+1}), obtaining the mapping γ = F_{t_{k+1}}(v) between each node v and its vector γ at time t_{k+1}.

Then execute step 3.4.
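Step 1.3 describes snap as the set of edges added to or deleted from the homogeneous network. Assuming N_h is stored as a dictionary of weighted application pairs, that delta can be computed by diffing the network before and after the update, as in this sketch; edges whose weight w changed are reported separately, and how NetWalk consumes these sets is not shown. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def homogeneous_delta(old: Dict[Pair, int], new: Dict[Pair, int]
                      ) -> Tuple[List[Pair], List[Pair], Dict[Pair, int]]:
    """Return (added, deleted, reweighted) edges between two versions of N_h."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    reweighted = {p: w for p, w in new.items() if p in old and old[p] != w}
    return added, deleted, reweighted


old = {("A1", "A2"): 2, ("A1", "A3"): 1}
new = {("A1", "A2"): 1, ("A2", "A4"): 1}
print(homogeneous_delta(old, new))
# ([('A2', 'A4')], [('A1', 'A3')], {('A1', 'A2'): 1})
```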

Step 3.4: using the mapping γ = F_{t_{k+1}}(v) obtained in step 3.3 between each node v of the homogeneous lending network N_h and its vector representation γ, re-express the streaming lending data as vector representations: as shown in Fig. 4, each lending record, consisting of specific field values, is transformed into a vector representation of fixed dimension.

The fourth part tests the fraud detection model, as follows:

Classifier environment: Python, XGBoost classifier

Input:

the model update period T,

the fraud detection model,

the lending data set B_test(t_k) used for model testing at time t_k.

Output:

the probability P that the test data are fraudulent.

In detail, the fraud detection model is tested as follows:

Step 4.1: the lending data B_test(t_k), containing n records with the usable raw fields, correspond to n nodes in the homogeneous lending network. As described in step 3.4, based on the nodes at time t_k and the mapping γ = F_{t_k}(v), the lending data are converted into vectors of dimension dim, one per application number. Based on these vector representations, for each lending record compute in sequence the Euclidean distance between its application number and each of the h preceding application numbers in the data set (ordered by generation time), and sort the h distances in ascending order as the time-sequence features of that application number. Then execute step 4.2.

Step 4.2: load the fraud detection model obtained in step 2.2, input the time-sequence features of the test data at time t_k into it, and obtain the fraud probability p(b_i) of each lending record in the test set B_test(t_k), outputting the set P of fraud probabilities of the test data, where p(b_i) ∈ P. Determine whether t_{k+1} + t_0 is greater than the period T. If it is, the lending data set B_train(t_k) at time t_k is regarded as the initial lending data set and step 1.1 of the first part is executed to rebuild the relational lending network. If it is smaller, then at time t_{k+1} step 3.1 of the third part is executed to update the network representation incrementally from the newly arrived streaming lending data.

Experiments on a real lending data set from an internet finance platform yield the recall (interception rate, true positive rate) at different disturbance rates (false interception rate, false positive rate), and the KS value (the maximum, over all thresholds, of recall minus disturbance rate) is computed to evaluate system performance.
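The KS value can be computed from the model's fraud probabilities and the true labels as the largest gap between recall (TPR) and disturbance rate (FPR) over all score thresholds. A plain-Python sketch, assuming labels are 1 for fraud and 0 for normal; the function name is ours.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """KS = max over score thresholds of (TPR - FPR); labels: 1 = fraud, 0 = normal."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both fraud and normal samples")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # lower the threshold past every sample with this score (ties move together)
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        best = max(best, tp / pos - fp / neg)  # recall minus disturbance rate
    return best


print(ks_statistic([0.9, 0.8, 0.7, 0.6, 0.2], [1, 1, 0, 1, 0]))  # 0.6666666666666666
```

Where scikit-learn is available, the same value is the maximum of tpr - fpr over the arrays returned by sklearn.metrics.roc_curve.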

Innovation points of the project

1. An associated lending network is built from the recorded lending data and a homogeneous lending network is derived from it, expressing the relationships among lending records in network form; latent association features are extracted automatically from the data by network representation learning on the homogeneous network, reducing the system's dependence on business knowledge.

2. The associated lending network and the homogeneous lending network are updated dynamically as streaming lending data arrive; incremental network representation learning accurately updates the representations of the continually changing lending network; new features of the lending data are constructed from the node vectors and input into the trained model, which returns the fraud probability of each record. Compared with traditional methods, the representations in the model are updated more promptly, which suits the need for rapid data auditing in online lending, with higher accuracy and robustness.

Note: for the relevant terms used in the invention, see the following prior art.

[1]Chen Y Q,Zhang J,Ng W W Y.Loan Default Prediction Using Diversified Sensitivity Undersampling[C]//2018 International Conference on Machine Learning and Cybernetics(ICMLC).IEEE,2018,1:240-245.

[2]Shi Y F,Song P P.Improvement Research on the Project Loan Evaluation of Commercial Bank Based on the Risk Analysis[C]//2017 10th International Symposium on Computational Intelligence and Design(ISCID).IEEE,2017,1:3-6.

[3]Cui P,Wang X,Pei J,et al.A survey on network embedding[J].IEEE Transactions on Knowledge and Data Engineering,2018,31(5):833-852.

[4]Saha P,Bose I,Mahanti A.A knowledge based scheme for risk assessment in loan processing by banks[J].Decision Support Systems,2016,84:78-88.

[5]Talavera A,Cano L,Paredes D,et al.Data Mining Algorithms for Risk Detection in Bank Loans[C]//Annual International Symposium on Information Management and Big Data.Springer,Cham,2018:151-159.

[6]Babaev D,Savchenko M,Tuzhilin A,et al.ET-RNN:Applying Deep Learning to Credit Loan Applications[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery&Data Mining.ACM,2019:2183-2190.

[7]Yu W,Cheng W,Aggarwal C C,et al.Netwalk:A flexible deep embedding approach for anomaly detection in dynamic networks[C]//Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery&Data Mining.ACM,2018:2672-2681.

[8]Chen T,Guestrin C.XGBoost:A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2016:785-794.

Claims

1. A network lending fraud detection method based on incremental network characterization learning, characterized by comprising the following steps:

step 1, establishing the relational lending network and completing homogenization

step 2, constructing a training sample set

step 3, feature construction

step 4, training the fraud detection model

step 5, updating the relational lending network and the homogeneous lending network

step 6, updating the current test data set

step 7, feature construction

step 8, testing the fraud detection model

2. The method of claim 1, comprising the following steps:

Step 1.1: select the raw fields from the original lending data and perform the preprocessing operations of field type conversion and removal or filling of null values;

divide the original lending data into two types, application number (applync) and ATTRIBUTE, where ATTRIBUTE covers all data in a lending record other than the application number; a lending record is written as (b_i, ATT(b_i)), where b_i is the application number of the record, ATT(b_i) is its attribute set, and att_k(b_i) is the k-th element of ATT(b_i);

build the relational lending network N_r = (V, E) from the original lending data, where V is the node set, which contains nodes of multiple types, and E is the edge set, each edge e = (u, v) having u, v ∈ V; for each record b_i in the lending data B, first add b_i to V, then add each element of ATT(b_i) to V in turn, and add each edge (b_i, att_k(b_i)) to E, att_k(b_i) being the k-th element of ATT(b_i); execute step 1.2;

Step 1.2: build the homogeneous lending network N_h = (V^h, E^h) from the relational lending network, where V^h is the node set, containing only nodes of the application-number type, and E^h is the edge set, each edge e = (u, v, w) having u, v ∈ V^h; when att_k(b_i) = att_k(b_j), the pair of edges (b_i, att_k(b_i)) and (b_j, att_k(b_j)) is regarded as an edge (b_i, b_j) of E^h, and the number of occurrences of (b_i, b_j) is taken as its weight w in N_h; add all application-number nodes in the node set V of N_r to V^h; for each pair of edges (b_i, att_k(b_i)) and (b_j, att_k(b_j)) with att_k(b_i) = att_k(b_j), add the edge (b_i, b_j) to E^h; obtain N_h = (V^h, E^h); execute step 1.3;

Step 1.3: based on the constructed homogeneous lending network N_h, use the network representation learning method NetWalk to learn the vector representations of all nodes in N_h; execute step 1.4;

Step 1.4: apply the NetWalk method of step 1.3 to N_h to obtain the vector representation γ of each node v in the network at the initial time t, and establish the mapping γ = F_t(v); using this mapping, express the initial lending data as vector representations, converting each lending record consisting of several specific field values into a vector representation of fixed dimension;

step 2.1: based on time $t_k$ and the mapping $\gamma = F_{t_k}(v)$, the lending data are transformed into vectors of dimension dim, one per loan ID;

based on the resulting vector representations, order the loan IDs in the data set by generation time; for each lending record, compute the Euclidean distance between its loan ID and each of the preceding $h$ loan IDs, sort these $h$ distances in ascending order, and take the sorted list as the time-series feature constructed for that loan ID. In this way the similarity between the vector of the loan ID under test and the vectors of the preceding $h$ loan IDs is introduced as an input of the fraud detection model. For vectors $X=(x_1,\dots,x_{dim})$ and $Y=(y_1,\dots,y_{dim})$ the Euclidean distance is computed as

$$d(X,Y)=\sqrt{\sum_{i=1}^{dim}(x_i-y_i)^2}$$
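The time-series feature of step 2.1 can be sketched as follows, assuming the mapping $\gamma = F_t(v)$ is available as a plain dictionary from loan ID to vector and that each loan carries a comparable creation timestamp. The claim does not say how to handle the first loans, which have fewer than $h$ predecessors; padding them with the sentinel `pad_value=-1.0` is a hypothetical choice made here so that every feature vector has length $h$. Function names are illustrative.

```python
import math


def euclidean(x, y):
    # d(X, Y) = sqrt(sum_i (x_i - y_i)^2); assumes len(x) == len(y) == dim
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def time_series_features(loans, embedding, h, pad_value=-1.0):
    """loans: list of (loan_id, created_at); embedding: dict standing in for
    gamma = F_t(v). Returns {loan_id: h distances in ascending order}."""
    # order loan IDs by generation time
    ordered = [lid for lid, _ in sorted(loans, key=lambda r: r[1])]
    features = {}
    for idx, lid in enumerate(ordered):
        previous = ordered[max(0, idx - h):idx]  # up to h earlier loan IDs
        dists = sorted(euclidean(embedding[lid], embedding[p]) for p in previous)
        # hypothetical padding for loans with fewer than h predecessors;
        # -1.0 sorts before any real distance, so the list stays ascending
        features[lid] = [pad_value] * (h - len(dists)) + dists
    return features
```

With $h=2$, a loan whose two predecessors lie at distances 4 and 3 from it receives the feature `[3.0, 4.0]`.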

Step 3.1: following the timestamp order of data set $B_{train}(t_k)$, select the earliest records, equal in number to those in data set $B_{test}(t_{k+1})$, and put them into data set $B'_{test}(t_{k+1})$. Process $B_{test}(t_{k+1})$ and $B'_{test}(t_{k+1})$ with the same preprocessing as in step 1.1, and use the processed sets to update the relational lending network built from $B_{train}(t_k)$. Following the definitions of step 1.1, processing the lending data $B_{test}(t_{k+1})$ and $B'_{test}(t_{k+1})$ yields the node sets $V_{test}(t_{k+1})$ and $V'_{test}(t_{k+1})$ and the edge sets $E_{test}(t_{k+1})$ and $E'_{test}(t_{k+1})$ of the relational lending network. $E_{test}(t_{k+1})$ is the set of edges expressing relationships between the loan IDs in the newly arrived streaming lending data and the nodes already present in the relational lending network $N_r$ of the previous moment, and $E'_{test}(t_{k+1})$ is the set of expired edges of $N_r$ to be deleted. Let $V = (V \cup V_{test}(t_{k+1})) \setminus V'_{test}(t_{k+1})$ and $E = (E \cup E_{test}(t_{k+1})) \setminus E'_{test}(t_{k+1})$, and update the relational lending network $N_r=(V,E)$; execute step 3.2;
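The set update of step 3.1 can be sketched as below, assuming each preprocessed record is a hypothetical dictionary of field values keyed by loan ID and that attribute nodes are `(field, value)` tuples. For simplicity the sketch takes all edges of the new records as $E_{test}(t_{k+1})$, which also covers attribute values not yet present in $N_r$. Function names are illustrative.

```python
def records_to_graph(records):
    """records: {loan_id: {field: value}} -> (nodes, edges) of N_r.
    Attribute nodes are encoded as (field, value) tuples (hypothetical)."""
    nodes, edges = set(), set()
    for lid, fields in records.items():
        nodes.add(lid)
        for field, value in fields.items():
            attr = (field, value)
            nodes.add(attr)
            edges.add((lid, attr))
    return nodes, edges


def update_relational_network(V, E, new_records, expired_records):
    """V = (V | V_test) - V'_test and E = (E | E_test) - E'_test."""
    V_test, E_test = records_to_graph(new_records)            # from B_test(t_{k+1})
    V_expired, E_expired = records_to_graph(expired_records)  # from B'_test(t_{k+1})
    return (V | V_test) - V_expired, (E | E_test) - E_expired
```

Applied literally, the set difference also removes an attribute node that an expired loan shares with a loan still in the window; a production version might instead recompute attribute nodes from the surviving edges.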

step 3.2: based on the updated relational lending network $N_r=(V,E)$, obtain the updated homogeneous lending network $N_h=(V^h,E^h)$ by the procedure of step 1.2; execute step 3.3;

execute step 3.4;

According to the mapping $\gamma = F_t(v)$, re-express the streaming lending data as vectors: each lending record, made up of several specific field values, is converted into a vector of fixed dimension;

step 4.1: the lending data $B_{train}(t_k)$, containing $n$ available original fields, have $n$ corresponding nodes in the homogeneous lending network. As obtained in step 3.4, based on time $t_k$ and the mapping $\gamma = F_{t_k}(v)$, the lending data are transformed into vectors of dimension dim, one per loan ID. Based on these vector representations, order the loan IDs in the data set by generation time; for each lending record, compute the Euclidean distance between its loan ID and each of the preceding $h$ loan IDs, sort the $h$ distances in ascending order, and take them as the time-series feature of that loan ID; execute step 4.2;

step 4.2: load the fraud detection model obtained in step 2.2, obtain the fraud probability $p(b_i)$ of each record in the test lending data set $B_{test}(t_k)$, and output the set $P$ of fraud probabilities of the test data, where $p(b_i) \in P$.
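The claims do not name the classifier used in step 4.2, so the sketch below stands in a hypothetical logistic model with fixed `weights` and `bias` purely to show how the set $P$ is assembled. It further assumes the model input is the node vector concatenated with the sorted distance features from step 4.1; both the concatenation and the function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fraud_probabilities(loan_ids, embedding, ts_features, weights, bias):
    """Return P = {loan_id: p(b_i)} for the test loans.
    The linear-logistic scorer is a hypothetical stand-in classifier."""
    P = {}
    for lid in loan_ids:
        # assumed model input: node vector followed by sorted distance features
        x = list(embedding[lid]) + list(ts_features[lid])
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        P[lid] = sigmoid(z)
    return P
```

Any trained classifier exposing a probability output could replace `sigmoid(z)` here; only the shape of the output, one probability per test loan, follows the claim.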

3. The method according to claim 1, wherein it is determined whether the time corresponding to the current test data set exceeds the model update period; if not, step 5 is repeated, and if so, step 1 is repeated, until fraud detection has been completed for all test data sets, at which point the algorithm ends.

4. The method according to claim 3, wherein it is determined whether $t_{k+1}+t_0$ is greater than the period $T$. If it is greater, the lending data set $B_{train}(t_k)$ at time $t_k$ is regarded as the initial lending data set, and step 1.1 of the first part is executed to rebuild the relational lending network. If it is smaller, let

$$B_{train}(t_{k+1}) = (B_{train}(t_k) \cup B_{test}(t_{k+1})) \setminus B'_{test}(t_{k+1})$$

and at time $t_{k+1}$ execute step 3.1 to incrementally update the network representations based on the incoming streaming lending data.
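The period check and sliding-window update of claims 3 and 4 can be sketched as follows, assuming both data sets are lists of `(loan_id, timestamp)` records sorted by timestamp, with every record in $B_{test}(t_{k+1})$ newer than those in $B_{train}(t_k)$. Under that assumption $B'_{test}(t_{k+1})$ is simply the first $|B_{test}(t_{k+1})|$ records of $B_{train}(t_k)$, so the window keeps a constant size. The meaning of $t_0$ is taken as given by the claims, and the function name is hypothetical.

```python
def next_training_window(B_train, B_test, t_next, t0, T):
    """Return (rebuild, training set for t_{k+1}).

    B_train, B_test: lists of (loan_id, timestamp), each sorted by timestamp,
    with B_test strictly newer than B_train (assumption).
    """
    if t_next + t0 > T:
        # update period exceeded: rebuild from B_train(t_k) starting at step 1.1
        return True, B_train
    # B'_test(t_{k+1}) = the len(B_test) earliest records of B_train(t_k),
    # so (B_train | B_test) - B'_test is the remainder of B_train plus B_test
    kept = B_train[len(B_test):]
    return False, kept + B_test
```

When `rebuild` is `False`, the caller would continue with the incremental path starting at step 3.1; when it is `True`, the networks are rebuilt from scratch.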