CN115953163A

CN115953163A - Fraud risk detection method, apparatus, device and medium

Info

Publication number: CN115953163A
Application number: CN202211660696.1A
Authority: CN
Inventors: 陈强; 杨晓烨; 李弘宇; 陶俊宇
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-12-22
Filing date: 2022-12-22
Publication date: 2023-04-11

Abstract

The disclosure provides a fraud risk detection method and relates to the technical field of artificial intelligence. The method comprises: constructing a transaction association graph for a transaction to be detected, wherein the transaction association graph is constructed from at least two types of nodes among a client node, an event node and an associated entity node of the transaction; inputting the transaction association graph and N negative samples from a negative sample pool into a risk detection model, wherein each negative sample comprises a risk transaction association graph constructed based on historical transaction data, and the risk detection model comprises an end-to-end model pre-trained based on a graph neural network; calculating, by using the risk detection model, a similarity between the transaction association graph and each of the N negative samples; and performing risk detection according to the similarity result calculated by the risk detection model. The present disclosure also provides a fraud risk detection apparatus, device, storage medium and program product.

Description

Fraud risk detection method, apparatus, device and medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a fraud risk detection method, apparatus, device, medium, and program product.

Background

Financial fraud refers to acts that, for the purpose of illegal possession, use fabricated facts or conceal the truth to defraud public or private property or the credit of financial institutions, thereby disrupting the order of financial management. With the rapid development of big data technology and internet finance, financial fraud techniques are becoming increasingly concealed, and fraud risks cause financial institutions not only direct economic loss but also serious reputational damage.

In the related art, there are financial anti-fraud detection schemes based on black and white lists, rule strategies or artificial intelligence techniques. However, these schemes focus on the risk of a single transaction, ignore the correlation between transactions, and cannot detect unknown types of fraudulent transactions in time.

Disclosure of Invention

In view of the above, the present disclosure provides a fraud risk detection method, apparatus, device, medium, and program product.

In one aspect of the embodiments of the present disclosure, a fraud risk detection method is provided, including: constructing a transaction association graph for a transaction to be detected, wherein the transaction association graph is constructed from at least two types of nodes among a client node, an event node and an associated entity node of the transaction; inputting the transaction association graph and N negative samples from a negative sample pool into a risk detection model, wherein each negative sample comprises a risk transaction association graph constructed based on historical transaction data, and the risk detection model comprises an end-to-end model pre-trained based on a graph neural network; calculating, by using the risk detection model, the similarity between the transaction association graph and each of the N negative samples, wherein N is greater than or equal to 1; and performing risk detection according to the similarity result calculated by the risk detection model.

According to an embodiment of the present disclosure, the constructing a transaction association graph for the transaction to be detected includes: constructing a transaction relationship graph according to the relationship chains between the nodes of the at least two types of nodes; and converting the transaction relationship graph into bipartite graph form to obtain the transaction association graph.

According to an embodiment of the present disclosure, the method further includes obtaining the negative sample in advance, specifically including: establishing a historical transaction association graph based on historical transaction data, wherein the historical transaction association graph is established through at least two types of nodes in client nodes, event nodes and association entity nodes of various historical transactions; determining M risk nodes in the historical transaction association graph, wherein the risk nodes comprise nodes related to risk transactions in the at least two types of nodes, and M is greater than or equal to 1; and obtaining at least one negative sample from the historical transaction association graph according to the M risk nodes.

According to an embodiment of the present disclosure, the obtaining at least one negative sample comprises: taking the M risk nodes as M seeds, and calculating the local community of each seed; and obtaining the at least one negative sample according to the local community of each seed.

According to an embodiment of the present disclosure, the risk transaction association graph comprises a risk community graph, and the obtaining at least one negative sample further comprises: pruning the local communities of the seeds according to S merging rules to obtain at least one risk community graph, wherein the pruning comprises merging at least two local communities that satisfy any one of the merging rules, and S is greater than or equal to 1; and obtaining the negative sample according to the risk community graph.
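The merging of local communities can be illustrated with a minimal Python sketch. It assumes a single merging rule (S = 1) under which two local communities are merged whenever they share at least one node; that rule, the function name `merge_communities` and the node labels are illustrative assumptions and are not specified by the disclosure.

```python
from typing import Iterable, List, Set


def merge_communities(communities: Iterable[Set[str]]) -> List[Set[str]]:
    """Merge local communities that share at least one node.

    Assumed merging rule (illustrative only): node overlap.
    """
    merged: List[Set[str]] = []
    for community in communities:
        current = set(community)
        remaining: List[Set[str]] = []
        for existing in merged:
            if existing & current:
                # Overlap found: absorb the existing community.
                current |= existing
            else:
                remaining.append(existing)
        remaining.append(current)
        merged = remaining
    return merged


# Hypothetical local communities computed around three risk seeds.
local_communities = [
    {"acct_A", "phone_1"},
    {"acct_B", "phone_1"},
    {"acct_C", "addr_9"},
]
risk_community_graphs = merge_communities(local_communities)
```

In this sketch the first two communities share `phone_1` and collapse into one risk community, while the third stays separate. Other merging rules, such as overlap in edges, could be substituted for the node-overlap test.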

According to an embodiment of the present disclosure, the method further includes training the risk detection model in advance, specifically including: obtaining K positive samples from the historical transaction association graph, wherein each positive sample comprises a local graph in the historical transaction association graph that does not involve the risk nodes, and K is greater than or equal to 1; and training the risk detection model using the K positive samples and the N negative samples.

According to an embodiment of the present disclosure, the risk detection model includes a graph vectorization layer, a tensor network layer and a similarity calculation layer, and training the risk detection model includes: simultaneously training the graph vectorization layer, the tensor network layer and the similarity calculation layer to obtain the end-to-end risk detection model.

According to an embodiment of the present disclosure, wherein: the graph vectorization layer is used for obtaining vectorization characteristics of each training sample by utilizing the graph neural network, wherein the training sample comprises the positive sample or the negative sample; the tensor network layer is used for processing vectorized features from the graph vectorization layer, and the tensor network layer is configured to learn the relation between graphs in a training process; the similarity calculation layer is used for calculating the similarity between each pair of training samples according to the output result of the tensor network layer.

According to an embodiment of the present disclosure, the method further comprises: if the transaction to be detected is a risk transaction, adding the transaction association graph to the negative sample pool as a new negative sample.

According to an embodiment of the present disclosure, the performing risk detection according to the similarity result calculated by the risk detection model includes: when the similarity result is in a first threshold value interval, determining that the transaction to be detected is risk-free; and/or when the similarity result is within a second threshold interval, determining that the transaction to be detected is at risk; and/or when the similarity result is in a third threshold interval, manually judging the transaction to be detected, or carrying out risk detection based on a preset detection strategy, wherein the preset detection strategy comprises at least one detection rule predefined according to the service of the transaction to be detected.
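The three-interval decision can be sketched as follows. The boundary values 0.3 and 0.7, the choice to aggregate the N similarity results by taking their maximum, and the function name `decide` are all assumptions made for illustration; the disclosure does not fix the interval boundaries or the aggregation.

```python
from typing import Sequence

# Hypothetical interval boundaries; the disclosure does not fix their values.
LOW_THRESHOLD = 0.3
HIGH_THRESHOLD = 0.7


def decide(similarities: Sequence[float]) -> str:
    """Map the similarity results to a decision using three intervals."""
    if not similarities:
        raise ValueError("at least one similarity result is required (N >= 1)")
    # Assumption: the closest negative sample drives the decision.
    score = max(similarities)
    if score < LOW_THRESHOLD:
        return "no_risk"        # first threshold interval
    if score >= HIGH_THRESHOLD:
        return "risk"           # second threshold interval
    return "manual_review"      # third threshold interval
```

For example, `decide([0.1, 0.9])` falls in the second interval and yields `"risk"`, while `decide([0.5])` falls in the third interval and is routed to manual judgment or a preset detection strategy.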

Another aspect of the embodiments of the present disclosure provides a fraud risk detection apparatus, including: a graph construction module, configured to construct a transaction association graph for a transaction to be detected, wherein the transaction association graph is constructed from at least two types of nodes among a client node, an event node and an associated entity node of the transaction; a model input module, configured to input the transaction association graph and N negative samples from a negative sample pool into a risk detection model, wherein each negative sample comprises a risk transaction association graph constructed based on historical transaction data, and the risk detection model comprises an end-to-end model pre-trained based on a graph neural network; a similarity calculation module, configured to calculate, by using the risk detection model, the similarity between the transaction association graph and each of the N negative samples, wherein N is greater than or equal to 1; and a risk detection module, configured to perform risk detection according to the similarity result calculated by the risk detection model.

The apparatus comprises means for performing each of the steps of the method as described in any one of the above.

Another aspect of the disclosed embodiments provides an electronic device, including: one or more processors; a storage device to store one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method as described above.

Yet another aspect of the embodiments of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions, which when executed by a processor, cause the processor to perform the method as described above.

Yet another aspect of the disclosed embodiments provides a computer program product comprising a computer program that when executed by a processor implements the method as described above.

One or more of the above embodiments have the following advantageous effects: an end-to-end risk detection model based on a graph neural network is provided from the viewpoint of improving identification accuracy and efficiency. The client node, the event node and the associated entity node are utilized to construct a transaction associated graph comprising a transaction object and a transaction behavior from two angles, and a risk detection model is utilized to calculate similarity results between graph data of a transaction to be detected and each negative sample, so that the accuracy and the real-time performance of financial fraud risk detection can be improved. Even if a novel fraud means appears, some nodes involved may be unchanged and can be found in time by calculating the similarity of the graph.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:

FIG. 1 schematically shows an application scenario of a fraud risk detection method according to an embodiment of the present disclosure;

FIG. 2 schematically shows a flow chart of a fraud risk detection method according to an embodiment of the present disclosure;

FIG. 3 schematically shows a flow chart of constructing a transaction association graph according to an embodiment of the present disclosure;

FIG. 4 schematically shows a diagram of determining neighbor nodes based on a relationship chain according to an embodiment of the present disclosure;

FIG. 5 schematically shows a transaction association graph in bipartite graph form according to an embodiment of the present disclosure;

FIG. 6 schematically shows a flow chart of obtaining a negative sample in advance according to an embodiment of the present disclosure;

FIG. 7 schematically shows a flow chart of obtaining a negative sample in advance according to another embodiment of the present disclosure;

FIG. 8 schematically shows a flow chart of obtaining a negative sample in advance according to another embodiment of the present disclosure;

FIG. 9 schematically shows a flow chart of pre-training a risk detection model according to an embodiment of the present disclosure;

FIG. 10 schematically shows a model structure diagram of a risk detection model according to an embodiment of the present disclosure;

FIG. 11 schematically shows an architecture diagram of a fraud risk detection system according to an embodiment of the present disclosure;

FIG. 12 schematically shows a block diagram of a fraud risk detection apparatus according to an embodiment of the present disclosure; and

FIG. 13 schematically shows a block diagram of an electronic device adapted to implement a fraud risk detection method according to an embodiment of the present disclosure.

Detailed Description

In order to facilitate understanding of technical solutions of the embodiments of the present disclosure, some technical terms related to the present disclosure are first introduced.

A "transaction association graph" refers to graph structure data, consisting of nodes and edges, that describes the customers, events and associated entities involved in a transaction and the relationships among them.

A "customer node" refers to the information of each customer involved in a transaction, represented as a node in the graph structure.

An "event node" refers to a key business event in a transaction, such as an order, deposit, withdrawal or transfer, represented as a node in the graph structure.

An "associated entity node" refers to an entity that one transaction may share with other transactions, such as a mailing address, a mobile phone number or a payment means, represented as a node in the graph structure.

A "graph neural network" refers to an algorithm that uses a neural network to learn from graph structure data, extracting and discovering the features and patterns in the data to serve graph learning tasks such as clustering, classification, prediction, segmentation and generation.

A "bipartite graph" is a special model in graph theory in which the vertex set can be divided into two mutually disjoint subsets such that the two vertices of every edge belong to different subsets, and vertices within the same subset are not adjacent.

A "local community" refers to the local information of the network region in which a risk node is located, obtained through a local community discovery algorithm.

A "risk community graph" refers to graph structure data obtained by merging a plurality of local communities that have overlapping nodes or edges.

"GAT" (Graph Attention Network) is a graph neural network for processing graph structure data. It aggregates neighbor nodes through an attention mechanism, which adaptively assigns different weights to different neighbors and thereby greatly improves the expressive power of the graph neural network.

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of customers' transaction information are all authorized by the customers, comply with relevant laws and regulations, are subject to necessary security measures, and do not violate public order and good customs.

The risks currently faced in different scenarios on the C-side (the consumer side, i.e., clients typically used by consumers or individual end customers) are as follows. In account security scenarios, there are machine registrations and batch logins. In marketing scenarios, scalpers snap up scarce commodities (such as commemorative coins) and resell them for profit. In credit scenarios, some fraudulent customers maliciously apply for credit products and deliberately default. In trading scenarios, cash-out transactions and fake transactions are common. In the payment link there is a risk of account theft (a fraudster illegally obtains a customer's normal login information and transacts with the payment card bound to the account), where gang cases account for as much as 50-80% of fraud. Compared with the C-side, the B-side (the business side, i.e., systems or platforms typically used by enterprises or merchants) also faces fraud risks such as fake transactions, cash-out transactions, gambling and money laundering.

Taking account theft as an example of these fraud risks, the current difficulties in detecting it can be summarized as follows. First, data leakage is serious in the internet era; information such as bank card numbers and passwords is easily lost, which significantly increases the probability and risk of account theft. Second, account theft techniques are increasingly concealed, and methods such as database theft, credential stuffing and phishing are continuously updated and iterated, making identification and prevention difficult. Third, the large size of a bank's customer base increases the cost of manually detecting fraud risks. In addition, the data classes are imbalanced, with the proportion of positive samples often far greater than that of negative samples. Fourth, the development of internet finance places high demands on timeliness, and banks make some trade-offs in stability and security when pursuing speed and customer experience, so that fraud risk detection technology cannot keep up with business innovation. Other fraud risks have their own difficulties, which likewise prevent effective fraud detection.

For example, in the black-and-white list scheme, fraudulent customers, devices and mobile phone numbers that have been discovered are added to a blacklist and are intercepted directly when they appear again. This approach is very simple, but it is slow to update, and the blacklist is often created only after a certain loss has been incurred, so it is very costly. As another example, in a rule strategy scheme, certain features of fraudulent customers are extracted to generate rules for identifying a particular type of fraudulent customer. However, unknown types of fraudulent customers are difficult to predict from the characteristics of existing ones. As a further example, current artificial intelligence schemes often focus on the risk of a single transaction and neglect the correlation between transactions, for example the use of the same credit card, IP address, UA (user agent) or other information across transactions; such models cannot learn this correlation information well. This kind of group fraud is more common in ATO (Account Takeover) scenarios, and existing artificial intelligence techniques have difficulty solving this type of problem.

Existing detection techniques mainly mine similar patterns from isolated samples without considering the correlation between samples, which is the key to identifying individual or group fraud. Furthermore, supervised learning models based on existing samples have difficulty quickly identifying unknown, novel risk patterns.

According to the embodiments of the present disclosure, a graph model is constructed from two perspectives, transaction objects and transaction behaviors, to express the association between samples: risk transactions between objects are identified through the relationship chain of a transaction (such as a transfer), and the association between transactions is identified through information such as the mailing address, payment means and email address of a transaction (such as shopping on an e-commerce platform), thereby improving the accuracy of fraud risk detection. Similarity search based on the graph neural network can improve the real-time performance of fraud risk detection.

Fig. 1 schematically shows an application scenario of a fraud risk detection method according to an embodiment of the present disclosure.

As shown in FIG. 1, an application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The client may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software (by way of example only).

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers and desktop computers.

The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by clients using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data such as client requests, and feed back a processing result (e.g., a web page, information, or data obtained or generated according to the client request) to the terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.

In some embodiments, the terminal devices 101, 102, 103 may be held by trading customers, the server 105 may be hardware used by a financial institution to process financial transactions, and an associated processing system is deployed in the server 105. The server 105 may perform fraud risk detection for each transaction to be detected. The fraud risk detection method according to the embodiments of the present disclosure will be described in detail below with reference to FIGS. 2 to 11, based on the scenario described in FIG. 1.

Fig. 2 schematically shows a flow chart of a fraud risk detection method according to an embodiment of the present disclosure.

As shown in fig. 2, the fraud risk detection method of this embodiment includes operations S210 to S240.

In operation S210, a transaction association graph is constructed for the transaction to be detected, wherein the transaction association graph is constructed by at least two types of nodes among the client node, the event node and the associated entity node of the transaction.

The following exemplarily explains a process of constructing the transaction association graph.

Firstly, customer object information and transaction information are collected for the transaction to be detected. The customer object information is used by the system to learn the association between customers, and the transaction information is used to learn the association between transactions. For example, the customer object information includes a customer account, a mobile phone number, a customer portrait, an account email address, and the like. The transaction information includes a transaction ID, transaction time, total transaction amount, transaction type, mailing address, payment means, payment instrument, IP address, device fingerprint, and the like.

Second, the data is cleaned. The collected data needs to be cleaned before it is used by the graph neural network model of the AI platform, and the cleaning may include at least one of the following:

Data filling: data that may be missing after acquisition needs to be filled; integer-type fields, for example, are filled with the mean, the median, or the like.

Data deduplication: to ensure the uniqueness and correctness of the data as far as possible, redundant data is deduplicated, and incomplete data may be deleted directly.

Data conversion: raw information usually cannot be used directly and needs to be converted. For example, the customer portrait can describe the login location and the number of logins; the mobile phone number can be classified as normal or abnormal; the mailing address can be classified as common or abnormal; and information such as the IP address and device fingerprint can be used to determine whether the calling device is a simulator, whether the IP is malicious, and so on.
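The filling and deduplication steps can be sketched in Python as follows. The record layout, the use of `trans_id` as the deduplication key, the median as the fill value and the function name `clean_records` are assumptions made for illustration only.

```python
from statistics import median
from typing import Any, Dict, List

Record = Dict[str, Any]


def clean_records(records: List[Record], key: str, numeric_field: str) -> List[Record]:
    """Deduplicate records, drop incomplete ones, and fill missing values."""
    # Deduplication: keep the first record seen for each key; records that
    # lack the key are treated here as incomplete and dropped.
    seen = set()
    unique: List[Record] = []
    for record in records:
        record_key = record.get(key)
        if record_key is None or record_key in seen:
            continue
        seen.add(record_key)
        unique.append(dict(record))

    # Filling: replace missing numeric values with the median of the observed
    # values (falling back to 0.0 when nothing was observed, an assumption).
    observed = [r[numeric_field] for r in unique if r.get(numeric_field) is not None]
    fill_value = median(observed) if observed else 0.0
    for record in unique:
        if record.get(numeric_field) is None:
            record[numeric_field] = fill_value
    return unique


raw = [
    {"trans_id": "t1", "trans_total_money": 100.0},
    {"trans_id": "t1", "trans_total_money": 100.0},  # duplicate
    {"trans_id": "t2", "trans_total_money": None},   # missing amount
    {"trans_id": "t3", "trans_total_money": 300.0},
    {"trans_id": None, "trans_total_money": 50.0},   # incomplete record
]
cleaned = clean_records(raw, key="trans_id", numeric_field="trans_total_money")
```

Here the duplicate of `t1` and the record without an ID are removed, and the missing amount of `t2` is filled with the median of the remaining observed amounts.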

Next, feature engineering is performed, for example by constructing features from direct information, one-hot encoding and statistical information. In some embodiments, the features involved are shown in Table 1.

TABLE 1 feature mapping Table

The first construction mode builds features directly from the customer object and transaction information; see the features numbered 1-4 in Table 1.

trans_dest: records the mailing address of the transaction as a string.

trans_phone: records the mobile phone number of the transaction as a string.

trans_time: records the time period of the transaction as a string.

trans_total_money: records the total amount of the transaction as a floating-point number.

The second construction mode builds features by one-hot encoding; see the features numbered 5-12 in Table 1.

label: the result label, indicating whether risk exists. It takes values in {0,1,2}: 0 indicates normal (the decision is to pass the transaction directly), 1 indicates possibly abnormal (the decision is that strong verification is needed), and 2 indicates abnormal (the decision is to reject the transaction). The one-hot codes corresponding to the values {0,1,2} are 00, 01 and 10. The result label may be assigned after risk detection.

is_secure_account: indicates whether the customer account is abnormal. It takes values in {0,1}: 1 indicates normal and 0 indicates abnormal.

is_secure_locations: indicates whether the customer's login location is a common location. It takes values in {0,1}: 1 indicates that the current login address is a common address, and 0 indicates that the login address is abnormal.

is_secure_phone: indicates whether the mobile phone number bound to the customer is abnormal. It takes values in {0,1}: 1 indicates that the bound mobile phone number is currently in a normal state, and 0 indicates that the mobile phone number is abnormal.

is_simulator: indicates whether the transaction device is a simulator. It takes values in {0,1}: 1 indicates a simulator and 0 indicates a normal device.

is_secure_ip: indicates whether the transaction IP address is malicious. It takes values in {0,1}: 1 indicates that the transaction IP address is normal, and 0 indicates a malicious IP.

is_common_locations: indicates whether the mailing address is a common location. It takes values in {0,1}: 1 indicates that the mailing address filled in for the transaction is a common one, and 0 indicates an abnormal address.

is_common_pay: indicates whether the payment means and the payment instrument are commonly used ones. It takes values in {0,1,2,3}: 0 indicates a common payment means with a common payment instrument, 1 indicates a common payment means with an uncommon payment instrument, 2 indicates an uncommon payment means with a common payment instrument, and 3 indicates an uncommon payment means with an uncommon payment instrument.

transaction_type: indicates the type of the transaction. It takes values in {0,1,2,3}: 0 indicates an online consumption transaction, 1 indicates an online transfer transaction, 2 indicates an offline withdrawal transaction, and 3 indicates an offline transfer transaction.

The third construction mode builds features from statistical information; see the features numbered 13-18 in Table 1.

day_trans_count: the total number of transactions on the account during the current day; integer-valued.

week_trans_count: the total number of transactions on the account during the last week; integer-valued.

month_trans_count: the total number of transactions on the account during the last month; integer-valued.

day_region_count: the number of regions in which the account had active transactions during the current day; integer-valued.

week_region_count: the number of regions in which the account had active transactions during the last week; integer-valued.

trans_credit: the credibility of the account's transactions; integer-valued.
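The count-based statistical features can be computed as in the following Python sketch. It assumes that "the last week" and "the last month" mean rolling windows of 7 and 30 days, and that each historical transaction is available as a (timestamp, region) pair; the function name `statistical_features` is hypothetical. The trans_credit feature is omitted because the text does not describe how credibility is computed.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (timestamp, region) pairs for one account; this layout is illustrative.
Transaction = Tuple[datetime, str]


def statistical_features(history: List[Transaction], now: datetime) -> Dict[str, int]:
    """Compute count-based features for one account as of `now`."""

    def within(days: int) -> List[Transaction]:
        # Assumed rolling window: (now - days, now].
        start = now - timedelta(days=days)
        return [t for t in history if start < t[0] <= now]

    same_day = [t for t in history if t[0].date() == now.date() and t[0] <= now]
    week, month = within(7), within(30)
    return {
        "day_trans_count": len(same_day),
        "week_trans_count": len(week),
        "month_trans_count": len(month),
        "day_region_count": len({region for _, region in same_day}),
        "week_region_count": len({region for _, region in week}),
    }
```

Calendar-based windows (for example, the current calendar week) would be an equally plausible reading of the feature definitions and would only change the `within` helper.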

Finally, the transaction association graph is constructed by taking each customer (for example, identified by a bank card account) as a node, taking at least some of the features numbered 1 to 18 in Table 1 as node features, and determining neighbor-node relationships from any relationship chain defined among the transaction objects (customers, merchants and websites), where the various roles form relationships through actions such as transfers and transactions. The construction is described with reference to FIGS. 3 to 5.
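As a rough illustration of how relationship chains determine neighbor nodes, the following Python sketch builds an undirected adjacency map. The (source, action, target) triple format, the node names and the function name `build_association_graph` are assumptions; the bipartite conversion and node features are not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A relationship chain link: (source node, action, target node). Illustrative.
Chain = Tuple[str, str, str]


def build_association_graph(relation_chains: Iterable[Chain]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from relationship chain links."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for source, _action, target in relation_chains:
        # Each link makes the two roles neighbors of one another.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


chains = [
    ("cust_A", "transfer", "cust_B"),
    ("cust_B", "purchase", "merchant_X"),
    ("cust_C", "purchase", "merchant_X"),
]
graph = build_association_graph(chains)
```

In this example `cust_B` and `cust_C` become linked through `merchant_X`, which is the kind of cross-transaction association that a per-transaction model would not see.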

In operation S220, the transaction association graph and N negative samples from the negative sample pool are input into a risk detection model, where each negative sample includes a risk transaction association graph constructed based on historical transaction data, and the risk detection model includes an end-to-end model pre-trained based on a graph neural network.

Illustratively, during model inference, a transaction association graph is constructed in real time from the current transaction as one input of the risk detection model, and the other input is taken from the negative sample pool. The graph constructed for the current transaction is compared in turn with every sample in the negative sample pool, and each similarity result is expressed as a probability P_i, where P_i ∈ [0, 1], i = 1, 2, 3, ..., N, and N is the number of samples in the negative sample pool.

In operation S230, a similarity between the transaction correlation graph and each of N negative examples is calculated using a risk detection model, where N is greater than or equal to 1.

In operation S240, risk detection is performed according to the similarity result calculated by the risk detection model.

According to the embodiment of the disclosure, an end-to-end risk detection model based on a graph neural network is provided with a view to improving identification accuracy and efficiency. The client nodes, event nodes and associated entity nodes are used to construct a transaction association graph that covers two perspectives, the transaction objects and the transaction behavior, and the risk detection model calculates the similarity between the graph data of the transaction to be detected and each negative sample, so that the accuracy and real-time performance of financial fraud risk detection can be improved. Even if a novel fraud method appears, some of the nodes involved may remain unchanged, so the fraud can still be discovered in time by calculating graph similarity.

In some embodiments, if the transaction to be detected is a risk transaction, the transaction correlation map is updated to the negative example pool as a new negative example.

After the model is trained, it is deployed online for real-time business decisions. Real-time inference mainly performs fraud risk detection (for example, account theft risk detection), so a negative sample pool is set up, and the pool is continuously expanded as new fraud means are identified. The samples in the negative sample pool can thus be expanded in real time, novel fraud means are added promptly through the updated negative samples, and fraud means related to the updated negative samples can subsequently be detected.

In some embodiments, the similarity result includes the maximum of the similarity values between the transaction association graph and the negative samples, or the mean of those similarity values. Performing risk detection according to the similarity result calculated by the risk detection model includes: determining that the transaction to be detected is risk-free when the similarity result is within a first threshold interval; and/or determining that the transaction to be detected is risky when the similarity result is within a second threshold interval; and/or, when the similarity result is within a third threshold interval, manually judging the transaction to be detected or performing risk detection based on a preset detection strategy, where the preset detection strategy includes at least one detection rule predefined according to the business of the transaction to be detected.

Illustratively, three transaction decisions, namely active pass, strong verification and rejection, are given according to the inference result (that is, the similarity result) of the model to realize fraud risk detection. The value of each P_i lies in [0, 1]. If the maximum similarity value among the P_i is greater than 0.9, or the mean value lies in the interval [0.8, 0.9], the transaction is determined to be risky and is rejected by direct feedback; the range greater than 0.9 or the interval [0.8, 0.9] is the second threshold interval. If the mean value of the P_i lies in the interval [0.7, 0.8), strong verification is performed; the interval [0.7, 0.8) is the third threshold interval. If the mean value of the P_i lies in the interval [0, 0.7), the transaction is determined to be risk-free and is actively passed; the interval [0, 0.7) is the first threshold interval.
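The threshold logic of this example can be sketched as a small Python function. The function name `decide` and the returned labels are hypothetical; the thresholds are the illustrative values given above. Because the maximum is never smaller than the mean, a mean above 0.9 is already covered by the maximum test.

```python
def decide(similarities):
    """Map similarity scores P_i (each in [0, 1]) to a transaction decision."""
    if not similarities:
        raise ValueError("at least one negative sample is required (N >= 1)")
    highest = max(similarities)
    mean = sum(similarities) / len(similarities)
    if highest > 0.9 or mean >= 0.8:
        return "reject"                  # second threshold interval: risky
    if mean >= 0.7:
        return "strong_verification"     # third threshold interval: [0.7, 0.8)
    return "pass"                        # first threshold interval: [0, 0.7)
```

For instance, `decide([0.95, 0.10])` rejects the transaction on the maximum alone, even though the mean is low.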

Strong verification means that a manual secondary review is conducted or a preset detection strategy is invoked for automatic secondary judgment. The detection rules can be specified in advance from expert experience according to the characteristics of the business. For example, a scenario in which scalpers snap up scarce goods (such as commemorative coins) is characterized by repeated IP addresses, repeated devices and similar information. In a malicious overdue scenario, some fraudulent customers maliciously apply for credit products, and their credit records are problematic or their customer information is inconsistent. One or more targeted rules may be specified for each of the scenario characteristics described above.

According to the embodiment of the disclosure, transactions subject to secondary judgment can be further analyzed through manual verification or by invoking the preset detection strategy, possibly multiple times, so that novel fraud means can be discovered and the model can be further updated through training, improving accuracy and security.

Figure 3 schematically shows a flow chart for building a transaction correlation diagram according to an embodiment of the present disclosure. Fig. 4 schematically illustrates a schematic diagram of determining neighbor nodes based on a relationship chain according to an embodiment of the disclosure. Figure 5 schematically illustrates a transaction correlation diagram in the form of a bipartite graph according to an embodiment of the disclosure.

As shown in fig. 3, the building of the transaction correlation diagram in operation S210 includes operations S310 to S320.

In operation S310, a transaction relationship graph is constructed according to a relationship chain between each of at least two types of nodes.

Illustratively, the transaction relationship graph includes a mesh graph of nodes and edges between the nodes, as in FIG. 3.

In some embodiments, the characteristics of consumption-type transactions and the association characteristics between customer groups can be learned from a User-Merchant-User (UMU) relationship chain. For example, when Zhang San and Li Si have both made purchases at the Tmall supermarket, Zhang San and Li Si are neighbor nodes based on the UMU chain.
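As a minimal sketch of how UMU neighbors might be derived, the Python function below pairs up users who transacted with the same merchant. The input layout (a list of `(user, merchant)` records) and the function name `umu_neighbors` are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def umu_neighbors(purchases):
    """Return pairs of users linked through a shared merchant (UMU chain).

    `purchases` is a hypothetical list of (user, merchant) records.
    """
    users_by_merchant = defaultdict(set)
    for user, merchant in purchases:
        users_by_merchant[merchant].add(user)
    pairs = set()
    for users in users_by_merchant.values():
        # Every two users of the same merchant become neighbor nodes.
        for a, b in combinations(sorted(users), 2):
            pairs.add((a, b))
    return pairs
```

The same grouping idea would apply to a UBU chain by keying on the bank instead of the merchant.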

In some embodiments, the characteristics of the relationship between customers may be learned from a UU (User-User) relationship chain that may result from bank card account transfer activity.

In some embodiments, the characteristics of customer transactions may be learned from a UBU (User-Bank-User) relationship chain, typically the characteristics of offline transaction activities such as customers withdrawing cash or making transfers offline.

In other embodiments, not only the transaction objects but also the transactions themselves are related. Most notably, some transactions share associated entities, such as the mailing address, the payment means and the mobile phone number of the transaction, so these features are separated out from the customer node features to serve as new graph nodes. The relationship between the characteristics of a risk account and its transactions is learned from information such as the mailing address, payment means and mobile phone number of the transaction, which improves the accuracy of account theft risk detection.

In operation S320, the transaction relationship graph is converted into the form of a bipartite graph to obtain the transaction association graph.

Referring to fig. 5, the transaction relationship graph is converted into the form of a bipartite graph. One part of the graph consists mainly of events, such as customers, orders, and the deposits and withdrawals of transactions. The other part consists of the entities associated with the orders, such as some of the features in table 1, for example the mailing address (address), phone number (phone), mailbox (email) and payment means (for example, mobile banking, WeChat or Alipay). This composition reduces the redundancy of edge relationships; in addition, because most event nodes correspond to orders, the bipartite graph makes it easy to restore from logs the traditional model features corresponding to the orders.
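The bipartite composition can be illustrated with a short Python sketch in which every edge runs from an event node (here, an order) to an associated entity node. The order dictionary layout, the field names and the function name `build_bipartite_graph` are assumptions, not part of the disclosure.

```python
def build_bipartite_graph(orders, entity_fields=("address", "phone", "email", "payment")):
    """Build event-to-entity edges; orders that share an entity are linked through it.

    `orders` is a hypothetical list of dicts with an "order_id" key and
    optional entity fields.
    """
    event_nodes, entity_nodes, edges = set(), set(), set()
    for order in orders:
        event = ("order", order["order_id"])
        event_nodes.add(event)
        for field in entity_fields:
            value = order.get(field)
            if value is None:
                continue                 # this order carries no such entity
            entity = (field, value)
            entity_nodes.add(entity)
            edges.add((event, entity))   # edges never join two nodes of the same part
    return event_nodes, entity_nodes, edges
```

Two orders that use the same phone number each need only one edge to the shared phone node, rather than a direct edge between every pair of such orders, which is one way to read the reduced edge redundancy described above.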

Fig. 6 schematically shows a flow chart of obtaining a negative example in advance according to an embodiment of the present disclosure.

As shown in fig. 6, obtaining negative samples in advance in this embodiment includes operations S610 to S630.

In operation S610, a historical transaction association graph is constructed based on the historical transaction data, wherein the historical transaction association graph is constructed by at least two types of nodes among the client nodes, the event nodes and the associated entity nodes of each transaction in history.

For this step, reference may be made to the graph building process of operation S210 and fig. 3 and 4, for example performing information acquisition, data cleaning, feature engineering and construction of the historical transaction association graph in sequence, which is not described again here.

In operation S620, M risk nodes in the historical transaction association graph are determined, where the risk nodes include nodes related to risk transactions in at least two types of nodes, and M is greater than or equal to 1.

Illustratively, if a certain bank card account is involved in an illegal transaction, that bank card account is a risk node. Similarly, an event node or an associated entity node is also a risk node if it is involved in an illegal transaction. Involvement in an illegal transaction may include direct association, such as the transferring party and the receiving party of the transaction, and indirect association, such as a party that receives a transfer from the receiving party where the funds originate from the illegal transaction. The scope of involvement can be set flexibly.
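One way to make the scope of involvement configurable is a hop limit in a breadth-first search starting from the direct parties to an illegal transaction. The sketch below assumes an adjacency dictionary and a `max_hops` parameter; both, along with the function name `risk_nodes`, are hypothetical.

```python
from collections import deque

def risk_nodes(adjacency, illegal_parties, max_hops=1):
    """Return nodes within `max_hops` of any direct party to an illegal transaction.

    max_hops=0 keeps only direct association; larger values add indirect association.
    """
    distance = {node: 0 for node in illegal_parties}
    queue = deque(illegal_parties)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue                     # do not expand beyond the configured scope
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)
```

A real system would presumably also check that the transferred funds actually originate from the illegal transaction; that check is outside this sketch.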

In operation S630, at least one negative example is obtained from the historical transaction correlation graph according to the M risk nodes.

Fig. 7 schematically shows a flow chart of pre-obtaining a negative example according to another embodiment of the present disclosure.

As shown in fig. 7, obtaining the negative sample in operation S630 includes operations S710 to S720.

In operation S710, the M risk nodes are used as M seeds, and a local community of each of the seeds is calculated.

In some embodiments, an ASJ local community discovery algorithm may be used. The seed node is taken as the initial community; the neighbor node that yields the largest increase in fitness is added to the community, the local community is updated, and the fitness of all neighbor nodes of the updated local community is recalculated. This is repeated until all neighbor nodes have been traversed and no node that improves the fitness can be found, at which point the final local community of the seed node is obtained.
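The greedy expansion described above can be sketched as follows. The text does not give the fitness function of the ASJ algorithm, so the sketch substitutes a simple assumed one (the share of edge endpoints that stay inside the community); the function names are likewise hypothetical. It illustrates seed expansion in general, not the ASJ algorithm itself.

```python
def fitness(adjacency, community):
    """Assumed stand-in fitness: internal edge endpoints / all edge endpoints."""
    internal = boundary = 0
    for node in community:
        for neighbor in adjacency[node]:
            if neighbor in community:
                internal += 1            # each internal edge is seen from both ends
            else:
                boundary += 1
    total = internal + boundary
    return internal / total if total else 0.0

def local_community(adjacency, seed):
    """Grow a community from `seed`, adding the neighbor with the largest fitness gain."""
    community = {seed}
    while True:
        frontier = {n for node in community for n in adjacency[node]} - community
        current = fitness(adjacency, community)
        best_gain, best_node = 0.0, None
        for candidate in sorted(frontier):
            gain = fitness(adjacency, community | {candidate}) - current
            if gain > best_gain:
                best_gain, best_node = gain, candidate
        if best_node is None:            # no neighbor improves the fitness: stop
            return community
        community.add(best_node)
```

On a graph made of two triangles joined by a single bridge edge, a seed in either triangle recovers exactly that triangle, because crossing the bridge lowers the fitness.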

In other embodiments, an SAA local community discovery algorithm may be used with a specified threshold. A seed node is first taken as the initial community, and the bridge degree of each edge between the community and its neighbor nodes is compared with the threshold. When the bridge degree is smaller than the threshold, the neighbor node is added to the community; otherwise it is not added. The process is repeated until none of the community's neighbor nodes can be added.

In operation S720, at least one negative example is obtained according to the local community of each seed.

In some embodiments, local communities may be cut out of the historical transaction correlation graph, each as an independent negative example.

According to the embodiment of the disclosure, independent negative samples can be obtained, and the whole historical transaction association graph does not need to be input into the model, so that the calculation complexity is reduced, and the calculation efficiency is improved. In addition, the number of samples is increased (for example, a plurality of negative samples can be obtained from one full graph), and the condition that the distribution of the positive samples and the negative samples is not balanced can be improved.

In other embodiments, local communities with overlapping portions may be merged to reduce redundancy of the negative pools, as shown in FIG. 8.

Fig. 8 schematically shows a flow chart of obtaining a negative example in advance according to another embodiment of the present disclosure.

As shown in fig. 8, obtaining the negative sample in advance in operation S720 includes operations S810 to S820.

In operation S810, the local communities of each seed are pruned according to S merging rules to obtain at least one risk community map, where the pruning includes merging at least two local communities that meet any one of the merging rules, and S is greater than or equal to 1.

For example, the S merging rules may include a rule that at least two local communities share the same node, such as the same mailing address, mobile phone number, bank card number or IP address. After merging, the originally duplicated nodes are pruned into a single node, and the local communities that include that node are merged. A local community that is not merged with any other local community is itself used as a risk community map.
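For the shared-node merging rule, a minimal Python sketch is given below. It represents each local community as a set of node identifiers, so taking the set union collapses duplicated nodes into one automatically. The function name `merge_overlapping` and the node naming are assumptions.

```python
def merge_overlapping(communities):
    """Merge local communities that share at least one node.

    Returns a list of pairwise disjoint node sets (one per risk community map).
    """
    merged = []
    for community in communities:
        current = set(community)
        remaining = []
        for existing in merged:
            if existing & current:
                current |= existing      # shared node found: absorb that community
            else:
                remaining.append(existing)
        merged = remaining + [current]
    return merged
```

A community that overlaps with nothing passes through unchanged, matching the case where an unmerged local community is used directly as a risk community map.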

In operation S820, a negative example is obtained according to the risk community map.

Illustratively, each risk community map may be directly treated as a negative example.

According to the embodiment of the disclosure, each risk node is used as a seed, the local community of each seed is calculated, and a certain number of local communities are then merged and pruned according to merging rules (for example, the same region) to form risk community maps. The risk community maps are taken as negative samples to capture the key information for identifying fraud risk. Because the merging rules and the pruning method offer a high degree of freedom, the imbalance between positive and negative samples can be greatly reduced when the negative-sample graphs are constructed, and the number of negative samples can be increased by configuring the merging rules.

FIG. 9 schematically shows a flow diagram of pre-training a risk detection model according to an embodiment of the disclosure.

As shown in fig. 9, pre-training the risk detection model in this embodiment includes operations S910 to S920.

In operation S910, K positive samples are obtained from the historical transaction correlation graph, where each positive sample includes a local graph in the historical transaction correlation graph that does not involve a risk node, and K is greater than or equal to 1.

Illustratively, each positive sample is a subgraph of the full graph constructed from historical transactions, namely a local graph that contains no risk node, and the boundary of the local graph can be set flexibly. The historical transaction association graph of this embodiment may be constructed from all historical transactions. Alternatively, after the risk nodes are removed, a historical transaction association graph is constructed from the normal transactions, the local community of each client node is calculated, and the local communities are merged and pruned to serve as positive samples.

In operation S920, a risk detection model is trained using K positive samples and N negative samples.

According to the embodiment of the disclosure, the positive and negative samples are both partial graphs, and the calculation amount of the model is far less than that of a full graph constructed by all data. In addition, the problem of uneven distribution in the case of the whole graph can be avoided by controlling the number distribution of the positive and negative samples.

FIG. 10 schematically shows a model structure diagram of a risk detection model according to an embodiment of the disclosure.

Referring to fig. 10, the risk detection model includes a graph vectorization layer (the Graph-Level Embedding part in fig. 10), a tensor network layer (the Graph-Graph Interactions part in fig. 10) and a similarity calculation layer (the Full Connection layer in fig. 10). Training the risk detection model includes training the graph vectorization layer, the tensor network layer and the similarity calculation layer to obtain an end-to-end risk detection model.

In some embodiments, the graph vectorization layer uses a graph neural network (for example, GAT in fig. 10) to obtain the vectorized features of each training sample, where the training samples include positive samples or negative samples and are input, for example, one pair at a time. The tensor network layer (including a Neural Tensor Network) processes the vectorized features from the graph vectorization layer and is configured to learn the relationships between graphs during training. The similarity calculation layer calculates the similarity between each pair of training samples from the output of the tensor network layer. For example, the last layer of the fully connected layers includes a softmax function and outputs a probability value; like the similarity probability described above, this value lies in the interval 0 to 1 and represents the similarity of two transactions.

Referring to fig. 10, in operation S1010, the heterogeneous graphs of a pair of training samples (shown filled with different patterns and in bipartite form) are converted into homogeneous graphs (shown filled with the same pattern). In operation S1020, the homogeneous graphs are input to the GAT model. In operation S1030, the output of the GAT model is used as the input of graph convolutional networks (GCNs), and the vectorized features of each training sample (shown as rectangles) are obtained. In operation S1040, the vectorized features of each training sample are input into a Neural Tensor Network. In operation S1050, the output of the Neural Tensor Network (shown as a rectangle in the Graph-Graph Interactions part) is input to the fully connected layer. In operation S1060, the fully connected layer outputs the similarity result.

Illustratively, the GCNs module of the model may use formula (1), where u_n denotes the feature representation of node n and m ∈ N(n) ranges over its adjacent nodes. N(n) denotes the set of first-order neighbors of node n plus node n itself, d_n is the degree of node n plus 1, the learnable parameters are those of the l-th layer, and f_1 denotes the ReLU activation function.
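Formula (1) itself does not survive in this text. The sketch below assumes it follows the standard graph convolution propagation rule, which is consistent with the symbols defined above: each node sums the representations of N(n), scaled by 1/sqrt(d_n · d_m), multiplies by the layer weight matrix, and applies ReLU. The function name, the pure-Python data layout and the absence of a bias term are assumptions.

```python
import math

def gcn_layer(adjacency, features, weight):
    """One graph-convolution step (assumed standard GCN rule, no bias term).

    adjacency: dict node -> set of neighbors (undirected, no self-loops)
    features:  dict node -> list of floats (input representation u_n)
    weight:    list of rows, shape (in_dim, out_dim), parameters of this layer
    """
    degree = {n: len(adjacency[n]) + 1 for n in adjacency}    # d_n: degree plus 1
    out_dim = len(weight[0])
    output = {}
    for n in adjacency:
        aggregated = [0.0] * len(features[n])
        for m in adjacency[n] | {n}:                          # N(n): neighbors plus n itself
            scale = 1.0 / math.sqrt(degree[n] * degree[m])
            for i, value in enumerate(features[m]):
                aggregated[i] += scale * value
        projected = [sum(aggregated[i] * weight[i][j] for i in range(len(aggregated)))
                     for j in range(out_dim)]
        output[n] = [max(0.0, x) for x in projected]          # f_1: ReLU
    return output
```

A practical implementation would use a tensor library and sparse matrices; the dictionary form is chosen here only to keep each symbol visible.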

The neural tensor network (NTN) module, with which the model learns the relationships between graphs, may use formula (2), where W_2^[1:K], V and b_2 denote the weights in the NTN, K is a hyper-parameter that controls the number of interaction (similarity) scores generated for each pair of graph vectors, h denotes the input vectorized features, and f_2 denotes an activation function.
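Formula (2) is likewise missing from this text. Assuming it takes the usual neural tensor network form, in which each of the K outputs combines a bilinear term between the two graph vectors, a linear term on their concatenation, and a bias, a pure-Python sketch looks as follows. The function name, argument layout and the default `tanh` activation are assumptions; the text only says that f_2 is an activation function.

```python
import math

def ntn(h_i, h_j, W, V, b, activation=math.tanh):
    """Assumed NTN form: f_2(h_i^T W[k] h_j + V[k] . [h_i; h_j] + b[k]) for each k.

    W: K slices, each a (d x d) matrix; V: K rows of length 2d; b: K biases.
    Returns a list of K interaction scores.
    """
    concat = list(h_i) + list(h_j)
    scores = []
    for k in range(len(W)):
        bilinear = sum(h_i[p] * W[k][p][q] * h_j[q]
                       for p in range(len(h_i)) for q in range(len(h_j)))
        linear = sum(V[k][t] * concat[t] for t in range(len(concat)))
        scores.append(activation(bilinear + linear + b[k]))
    return scores
```

The K scores would then be passed to the fully connected layer, which reduces them to a single similarity value.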

The fully connected layer of the model predicts a numerical value as the graph similarity score and uses the mean square error as the loss function, as expressed by formula (3), where D is the set of training graph pairs, S_ij is the label value, s() is the predicted value, and the parentheses enclose a pair of training samples.

An objective function Loss(θ) is used during model training to measure the effect of the parameters θ. The objective function comprises two parts, a training loss function and a regularization term Ω(θ), as shown in formula (4). The training loss function measures how well the model fits the training data, and Ω(θ) denotes a regularization term, such as an L1 regularization function, an L2 regularization function or another regularization function, which controls the complexity of the model and prevents overfitting.
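The two-part objective can be sketched in Python as a mean squared error over the graph pairs plus a regularization term. The sketch assumes that the two parts are simply added and that Ω(θ) is an L2 penalty with a coefficient `strength`; the coefficient and all function names are hypothetical, since formulas (3) and (4) are not reproduced in this text.

```python
def mse_loss(predicted, labels):
    """Mean squared error between predicted similarities and label values over D."""
    return sum((p - s) ** 2 for p, s in zip(predicted, labels)) / len(labels)

def l2_penalty(parameters, strength):
    """L2 regularization term; `strength` is a hypothetical coefficient."""
    return strength * sum(w * w for w in parameters)

def objective(predicted, labels, parameters, strength=0.01):
    """Training loss plus regularization term (assumed additive combination)."""
    return mse_loss(predicted, labels) + l2_penalty(parameters, strength)
```

Swapping `l2_penalty` for a sum of absolute values would give the L1 variant mentioned above.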

It can be seen that the input data of the model consists of two graphs. During training, the two graphs are a pair of training samples; when the model is in use, they are the transaction association graph constructed for the transaction to be detected and a negative sample from the negative sample pool, respectively. The general structure of the input data is shown in fig. 5. In addition, a loss function (for example, a Focal Loss function based on formula (3)) may be designed for the end-to-end model to reduce the influence of the imbalance between positive and negative samples. Moreover, the input data are pruned graphs, so the amount of computation of the model is far smaller than for a full graph constructed from all data.

Illustratively, after the model is trained, accuracy is selected as the evaluation index of the model. Specifically, the model is used to predict the test set, and the accuracy is calculated from the predicted categories and the labels. Through ten-fold cross-validation, the accuracy and the model of each fold are recorded, and the training samples and model parameters are adjusted continuously so that the accuracy and generalization performance of the model on the test set reach an optimal level.
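The fold bookkeeping behind ten-fold cross-validation can be sketched as below. The interleaved assignment of samples to folds and the helper names are assumptions; shuffling and stratification, which a real evaluation would likely need given the class imbalance, are left out.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def accuracy(predicted, labels):
    """Share of predictions that match the labels."""
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)
```

Each sample appears in exactly one test fold, so the ten recorded accuracies together cover the whole data set once.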

FIG. 11 schematically shows an architecture diagram of a fraud risk detection system according to an embodiment of the disclosure.

As shown in fig. 11, the fraud risk detection system of this embodiment can achieve real-time acquisition, real-time analysis, real-time construction and real-time decision-making.

Referring to fig. 1 to 10, the development of internet finance places higher demands on the real-time performance of fraud detection technology, and the present disclosure therefore proposes a similarity search technology based on a graph neural network. Traditional graph search usually measures the similarity of graphs by the graph edit distance or the maximum common subgraph; however, computing either metric is an NP-complete problem, which makes it difficult to meet the real-time decision requirements of the internet finance era.

Some embodiments of the disclosure produce a vectorized representation of a graph, which lies in non-Euclidean space, and then calculate similarity from the vectors corresponding to the graphs; the whole process is integrated into an end-to-end system using a neural network model. Compared with an approach that first computes graph embeddings of the association graph and then feeds them into a model, the end-to-end process is more efficient and easier to integrate with other systems or AI platforms. The efficiency of fraud risk detection is thereby effectively improved, and interaction with the bank's existing systems is facilitated.

Referring to fig. 11, the AI platform includes data processing flows such as big data processing, scenarization processing, data analysis, score card calculation, and graph neural network model training and deployment. Because financial institutions have many customers and a large transaction volume, a big data batch processing platform can be used to collect data in real time: each transaction is collected, transmitted and stored, and the transactions are cleaned, converted and associated. Index calculation and feature extraction can then be performed to obtain results such as those in table 1, and the data are analyzed in real time. Scenarization processing is then performed to construct the transaction association graph in real time. Finally, the trained graph neural network model is used as the risk detection model for real-time decision-making.

Illustratively, the scenarization processing includes constructing, from data in a relational database, the graph structure input required by the graph neural network model. A graph database such as Neo4j could also be used to store the data directly; however, in view of the bank's existing stock data, a data scenarization step needs to be added. The score card maps the output of the model (a probability) into a variable representing one of three results: pass, reject and strong verification.

According to the embodiment of the disclosure, the fraud risk detection system obtains the relevant features and builds a graph-based association model to detect fraud means, on the premise of complying with relevant laws and regulations, protecting the privacy of personal data from leakage and safeguarding the security of data and information, and has the following effects and advantages:

1. Effective identification of group fraud. Isolated sample data are linked in the form of a graph that establishes the associations between samples, including the relationships between customers and the relationships between transactions, which effectively improves the accuracy of identifying financial fraud risk and facilitates the identification of group fraud risk.

2. Real-time inference. Traditional graph search methods are computationally expensive and cannot support real-time inference. By vectorizing graphs in non-Euclidean space and calculating similarity from the corresponding vectors, the NP-complete computation is replaced by a simple one, so that real-time inference for graph similarity search is achieved.

3. Discovery and learning of multiple relationship patterns. The associations between transaction objects can be expressed by relationship chains defined by the transaction objects, so as to discover risk at the customer relationship level. The associations and risk at the transaction level are expressed by the associated entities shared among the transactions themselves.

4. Simple graph construction. The bipartite graph composition reduces the redundancy of edge relationships and keeps the amount of computation small; moreover, because most event nodes correspond to orders, the bipartite graph makes it easy to restore from logs the traditional model features corresponding to the orders.

5. Data security. The model is trained on safe, non-sensitive features, so sensitive customer information does not need to be used or stored; there is therefore no concern about leakage of customer data, and the requirements of laws and regulations are met.

6. Improved fraud risk prevention capability of the financial system. For example, a customer's account can be prevented from being stolen and used for transactions, such as consumption and transfers, that do not reflect the customer's true intent, which genuinely protects the security of the customer's funds.

Based on the fraud risk detection method, the disclosure also provides a fraud risk detection device. The apparatus will be described in detail below with reference to fig. 12.

Fig. 12 schematically shows a block diagram of the structure of a fraud risk detection apparatus according to an embodiment of the present disclosure.

As shown in fig. 12, the fraud risk detection apparatus 1200 of this embodiment includes a graph construction module 1210, a model input module 1220, a similarity calculation module 1230, and a risk detection module 1240.

The graph building module 1210 may perform operation S210 to build a transaction association graph for the transaction to be detected, where the transaction association graph is built by at least two types of nodes among the client node, the event node, and the associated entity node of the transaction.

The model input module 1220 may perform operation S220, to input the transaction association graph and N negative examples in the negative example pool to the risk detection model, where the negative examples include a risk transaction association graph constructed based on historical transaction data, and the risk detection model includes an end-to-end model pre-trained based on a graph neural network.

The similarity calculation module 1230 may perform operation S230 for calculating a similarity between the transaction correlation diagram and each of the N negative examples using the risk detection model, where N is greater than or equal to 1.

The risk detection module 1240 may perform operation S240 for performing risk detection according to the similarity result calculated by the risk detection model.

It should be noted that the fraud risk detection apparatus 1200 includes modules for performing the steps of any one of the embodiments described in fig. 2 to fig. 11. The implementation, solved technical problems, implemented functions, and achieved technical effects of each module/unit/subunit and the like in the apparatus part embodiment are respectively the same as or similar to the implementation, solved technical problems, implemented functions, and achieved technical effects of each corresponding step in the method part embodiment, and are not described herein again.

According to an embodiment of the present disclosure, any plurality of the graph building module 1210, the model input module 1220, the similarity calculation module 1230, and the risk detection module 1240 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module.

According to an embodiment of the present disclosure, at least one of the graph construction module 1210, the model input module 1220, the similarity calculation module 1230 and the risk detection module 1240 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, at least one of the graph construction module 1210, the model input module 1220, the similarity calculation module 1230 and the risk detection module 1240 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.

As shown in fig. 13, an electronic device 1300 according to an embodiment of the present disclosure includes a processor 1301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1302 or a program loaded from a storage section 1308 into a Random Access Memory (RAM) 1303. The processor 1301 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1301 may also include onboard memory for caching purposes. Processor 1301 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 1303, various programs and data necessary for the operation of the electronic apparatus 1300 are stored. The processor 1301, the ROM 1302, and the RAM 1303 are connected to each other via a bus 1304. The processor 1301 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 1302 and/or the RAM 1303. Note that the programs may also be stored in one or more memories other than the ROM 1302 and the RAM 1303. The processor 1301 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 1300 may also include an input/output (I/O) interface 1305, which is also connected to the bus 1304. The electronic device 1300 may also include one or more of the following components connected to the I/O interface 1305: an input section 1306 including a keyboard, a mouse, and the like; an output section 1307 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1308 including a hard disk and the like; and a communication section 1309 including a network interface card such as a LAN card or a modem. The communication section 1309 performs communication processing via a network such as the Internet. A drive 1310 is also connected to the I/O interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as needed, so that a computer program read from it can be installed into the storage section 1308 as needed.

The present disclosure also provides a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments, or may exist separately without being assembled into that device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1302 and/or the RAM 1303 described above, and/or one or more memories other than the ROM 1302 and the RAM 1303.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the method provided by the embodiment of the disclosure.

The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 1301. The above described systems, devices, modules, units, etc. may be implemented by computer program modules according to embodiments of the present disclosure.

In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device, a magnetic storage device, and the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, downloaded and installed via the communication section 1309, and/or installed from the removable medium 1311. The program code contained in the computer program may be transmitted using any suitable network medium, including but not limited to wireless media, wired media, or any suitable combination of the foregoing.

In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1309 and/or installed from the removable medium 1311. The computer program, when executed by the processor 1301, performs the functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the client computing device, partly on the client device, partly on the remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the client computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, such combinations and/or incorporations may be made without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A fraud risk detection method, comprising:

constructing a transaction association graph for the transaction to be detected, wherein the transaction association graph is constructed by at least two types of nodes of a client node, an event node and an associated entity node of the transaction;

inputting the transaction association graph and N negative samples in a negative sample pool into a risk detection model, wherein the negative samples comprise a risk transaction association graph constructed based on historical transaction data, and the risk detection model comprises an end-to-end model pre-trained based on a graph neural network;

calculating the similarity between the transaction association graph and each negative sample in the N negative samples by using the risk detection model, wherein N is greater than or equal to 1;

and carrying out risk detection according to the similarity result calculated by the risk detection model.
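
As an informal illustration of the flow recited in claim 1 (not part of the claimed subject matter), the Python sketch below draws N negative samples from a pool and scores the transaction to be detected against each of them. Everything named here is hypothetical: graphs are represented as sets of edges, `edge_jaccard` is a simple stand-in for the pre-trained graph neural network model, and `score_against_negatives` is an illustrative helper.

```python
import random


def edge_jaccard(graph_a, graph_b):
    """Stand-in similarity: Jaccard overlap of two edge sets (NOT the trained model)."""
    union = graph_a | graph_b
    if not union:
        return 0.0
    return len(graph_a & graph_b) / len(union)


def score_against_negatives(graph, negative_pool, n, similarity_fn, rng):
    """Draw up to n negative samples from the pool and score the graph against each."""
    negatives = rng.sample(negative_pool, min(n, len(negative_pool)))
    return [similarity_fn(graph, negative) for negative in negatives]


# Hypothetical data: each graph is a frozenset of (node, node) edges.
negative_pool = [
    frozenset({("client:1", "device:X"), ("device:X", "card:9")}),
    frozenset({("client:2", "card:5")}),
]
transaction_graph = frozenset({("client:7", "device:X"), ("device:X", "card:9")})

scores = score_against_negatives(
    transaction_graph,
    negative_pool,
    n=2,
    similarity_fn=edge_jaccard,
    rng=random.Random(0),
)
print(sorted(scores))
```

In a real system the `similarity_fn` argument would be the trained risk detection model; the rest of the flow would stay the same.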

2. The method of claim 1, wherein the constructing a transaction association graph for the transaction to be detected comprises:

constructing a transaction relation graph according to the relation chain between each node in the at least two types of nodes;

and converting the transaction relation graph into a bipartite graph form to obtain the transaction association graph.

3. The method according to claim 1, wherein the method further comprises obtaining the negative samples in advance, which specifically comprises:

establishing a historical transaction association graph based on historical transaction data, wherein the historical transaction association graph is established through at least two types of nodes in client nodes, event nodes and associated entity nodes of various historical transactions;

determining M risk nodes in the historical transaction association graph, wherein the risk nodes comprise nodes related to risk transactions in the at least two types of nodes, and M is greater than or equal to 1;

and obtaining at least one negative sample from the historical transaction association graph according to the M risk nodes.

4. The method of claim 3, wherein the obtaining at least one negative sample comprises:

taking the M risk nodes as M seeds, and calculating the local community of each seed;

and obtaining at least one negative sample according to the local community of each seed.

5. The method of claim 4, wherein the risk transaction association graph comprises a risk community graph, and the obtaining at least one negative sample further comprises:

pruning the local communities of the seeds according to S merging rules to obtain at least one risk community graph, wherein the pruning comprises merging at least two local communities that satisfy any one of the merging rules, and S is greater than or equal to 1;

and obtaining the negative sample according to the risk community graph.
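
The seed-and-merge procedure of claims 4 and 5 can be sketched roughly as follows. This is only an illustration under stated assumptions: the claims do not fix a local community algorithm or the merging rules, so the sketch uses a k-hop neighborhood as the "local community" and a single hypothetical merging rule (two communities are merged when they share at least one node). The function names and the sample graph are invented for the example.

```python
from collections import deque


def local_community(adjacency, seed, max_hops=1):
    """Assumed definition: all nodes within max_hops of the seed (breadth-first search)."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def merge_overlapping(communities):
    """Hypothetical merging rule: merge communities that share at least one node."""
    merged = []
    for community in communities:
        community = set(community)
        kept = []
        for existing in merged:
            if existing & community:
                community |= existing      # absorb every overlapping community
            else:
                kept.append(existing)
        kept.append(community)
        merged = kept
    return merged


# Hypothetical historical graph: clients linked through shared cards and devices.
adjacency = {
    "client:1": ["card:A"],
    "client:2": ["card:A"],
    "card:A": ["client:1", "client:2"],
    "client:3": ["device:X"],
    "device:X": ["client:3"],
}
risk_seeds = ["client:1", "client:2", "client:3"]

communities = [local_community(adjacency, seed) for seed in risk_seeds]
risk_communities = merge_overlapping(communities)
print([sorted(c) for c in risk_communities])
```

Here the communities of `client:1` and `client:2` overlap on `card:A` and are merged into one risk community, while the community of `client:3` stays separate.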

6. The method according to claim 5, wherein the method further comprises pre-training the risk detection model, in particular comprising:

obtaining K positive samples from the historical transaction association graph, wherein each positive sample comprises a local graph in the historical transaction association graph that does not involve the risk nodes, and K is greater than or equal to 1;

training the risk detection model using the K positive samples and the N negative samples.

7. The method of claim 6, wherein the risk detection model comprises a graph vectorization layer, a tensor network layer, and a similarity calculation layer, and the training the risk detection model comprises:

and simultaneously training the graph vectorization layer, the tensor network layer and the similarity calculation layer to obtain the end-to-end risk detection model.

8. The method of claim 7, wherein:

the graph vectorization layer is used for obtaining vectorization characteristics of each training sample by utilizing the graph neural network, wherein the training sample comprises the positive sample or the negative sample;

the tensor network layer is used for processing vectorized features from the graph vectorization layer, and the tensor network layer is configured to learn the relation between graphs in a training process;

the similarity calculation layer is used for calculating the similarity between each pair of training samples according to the output result of the tensor network layer.
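
To make the three-layer structure of claims 7 and 8 more concrete, the sketch below shows one possible forward pass in plain Python. It is an assumption-laden illustration, not the disclosed model: the graph vectorization layer is replaced by a single neighbor-averaging step followed by mean pooling, the tensor network layer uses the common neural tensor network form tanh(h1ᵀ W_k h2 + V_k·[h1; h2] + b_k), the similarity layer is a single logistic unit, and all weights are random rather than trained. All names and dimensions are hypothetical.

```python
import math
import random


def embed_graph(features, edges):
    """Graph vectorization stand-in: one neighbor-averaging step, then mean pooling."""
    n, dim = len(features), len(features[0])
    neighbors = {i: [i] for i in range(n)}  # each node also sees itself
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    hidden = [
        [sum(features[j][d] for j in neighbors[i]) / len(neighbors[i]) for d in range(dim)]
        for i in range(n)
    ]
    return [sum(h[d] for h in hidden) / n for d in range(dim)]


def tensor_layer(h1, h2, W, V, b):
    """Tensor network stand-in: tanh(h1^T W_k h2 + V_k . [h1; h2] + b_k) for each slice k."""
    concat = h1 + h2
    out = []
    for W_k, V_k, b_k in zip(W, V, b):
        bilinear = sum(
            h1[i] * W_k[i][j] * h2[j] for i in range(len(h1)) for j in range(len(h2))
        )
        linear = sum(v * x for v, x in zip(V_k, concat))
        out.append(math.tanh(bilinear + linear + b_k))
    return out


def similarity(slice_scores, w, c):
    """Similarity layer stand-in: logistic unit mapping slice scores into (0, 1)."""
    z = sum(wi * si for wi, si in zip(w, slice_scores)) + c
    return 1.0 / (1.0 + math.exp(-z))


# Untrained, randomly initialized weights (hypothetical sizes).
rng = random.Random(0)
DIM, SLICES = 2, 3
W = [[[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)] for _ in range(SLICES)]
V = [[rng.uniform(-1, 1) for _ in range(2 * DIM)] for _ in range(SLICES)]
b = [0.0] * SLICES
w_out = [rng.uniform(-1, 1) for _ in range(SLICES)]

# Two toy graphs: (node feature vectors, undirected edges).
graph_a = ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)])
graph_b = ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)])

h_a, h_b = embed_graph(*graph_a), embed_graph(*graph_b)
score = similarity(tensor_layer(h_a, h_b, W, V, b), w_out, 0.0)
print(h_a, score)
```

End-to-end training in the sense of claim 7 would update the parameters of all three stages jointly from pairs of training samples; the sketch covers only inference with fixed weights.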

9. The method of claim 1, wherein the method further comprises:

if the transaction to be detected is a risk transaction, adding the transaction association graph to the negative sample pool as a new negative sample.

10. The method of claim 1, wherein the carrying out risk detection according to the similarity result calculated by the risk detection model comprises:

when the similarity result is located in a first threshold interval, determining that the transaction to be detected is risk-free; and/or

when the similarity result is located in a second threshold interval, determining that the transaction to be detected is risky; and/or

when the similarity result is located in a third threshold interval, manually judging the transaction to be detected, or carrying out risk detection based on a preset detection strategy, wherein the preset detection strategy comprises at least one detection rule predefined according to the service of the transaction to be detected.
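
The three-interval decision of claim 10 might look like the following Python sketch. The interval boundaries are assumptions chosen only for illustration: the similarity is taken to lie in [0, 1], with higher values meaning closer to a known risk graph, and the hypothetical cut-offs `low=0.3` and `high=0.7` split that range into a no-risk interval, a review interval, and a risk interval. The claim itself does not specify any of these values.

```python
from enum import Enum


class Decision(Enum):
    NO_RISK = "no_risk"   # first threshold interval
    RISK = "risk"         # second threshold interval
    REVIEW = "review"     # third interval: manual judgment or rule-based strategy


def classify(similarity, low=0.3, high=0.7):
    """Map a similarity in [0, 1] to a decision using hypothetical cut-offs."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if similarity < low:
        return Decision.NO_RISK
    if similarity >= high:
        return Decision.RISK
    return Decision.REVIEW


for value in (0.1, 0.5, 0.9):
    print(value, classify(value).value)
```

Transactions that land in the review interval would then be passed to a human reviewer or to the predefined detection rules for the relevant service.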

11. A fraud risk detection apparatus comprising:

a graph construction module for constructing a transaction association graph for a transaction to be detected, wherein the transaction association graph is constructed from at least two types of nodes among a client node, an event node and an associated entity node of the transaction;

a model input module for inputting the transaction association graph and N negative samples in a negative sample pool into a risk detection model, wherein the negative samples comprise a risk transaction association graph constructed based on historical transaction data, and the risk detection model comprises an end-to-end model pre-trained based on a graph neural network;

a similarity calculation module for calculating the similarity between the transaction association graph and each negative sample of the N negative samples by using the risk detection model, wherein N is greater than or equal to 1;

and the risk detection module is used for carrying out risk detection according to the similarity result calculated by the risk detection model.

12. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-10.

13. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 10.

14. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 10.