CN107730262B - Fraud identification method and device

Info

Publication number: CN107730262B
Application number: CN201710992155.1A
Authority: CN
Inventors: 朱逢豪; 肖凯; 王维强
Original assignee: Advanced New Technologies Co Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2017-10-23
Filing date: 2017-10-23
Publication date: 2021-09-24
Anticipated expiration: 2037-10-23
Also published as: CN107730262A

Abstract

The embodiments of this specification provide a fraud identification method and device. The method comprises: obtaining a transaction data sample set, each piece of transaction data in the sample set including at least fund flow direction information, transaction party association information, and a fraud risk attribute of the transaction data; constructing a fraud propagation relation graph from the sample set, wherein each node in the graph is established according to the transaction party association information, and directed edges among the nodes are established according to the fund flow direction information and the transaction party association information; assigning an initial node score to each node in the fraud propagation relation graph; iteratively updating the node score of each node according to a PageRank iterative update algorithm until the iteration converges and a final node score is obtained; and, if the final node score is higher than a preset threshold, determining that the corresponding node is a fraudster node and that the transaction data associated with the fraudster node is fraud risk data.

Description

Fraud identification method and device

Technical Field

The present disclosure relates to the field of network technologies, and in particular, to a method and an apparatus for fraud identification.

Background

The financial field places high demands on transaction risk control, and the security of fund transactions must be ensured. In practice, fraud does occur. For example, a fraudster may try to profit by luring many ordinary consumers into transferring money to the fraudster without giving them anything in return. To identify high-risk fraud and take measures that limit consumers' losses as far as possible, a transaction model may be used: for example, a payment account is identified as a fraud account, and fund transactions performed by that account are identified as risk transactions.

However, some fraud may still go unidentified by the transaction model. Such hidden, undetected fraud may be referred to as "hidden cases". Hidden cases are numerous and pose a high risk to security control, so mining them is necessary.

Disclosure of Invention

In view of the above, the present disclosure provides a fraud identification method and apparatus to realize identification of hidden fraud cases.

Specifically, one or more embodiments of the present disclosure are implemented by the following technical solutions:

In a first aspect, a fraud identification method is provided, the method including:

obtaining a transaction data sample set, each piece of transaction data in the transaction data sample set including at least: fund flow direction information, transaction party association information, and a fraud risk attribute of the transaction data, wherein the transaction party association information comprises a transaction account or a transaction medium;

constructing a fraud propagation relation graph according to the transaction data sample set, wherein each node in the fraud propagation relation graph is established according to the transaction party association information, directed edges among the nodes are established according to the fund flow direction information and the transaction party association information, and the directed edges represent fraud propagation relations among the nodes;

assigning an initial node score to each node in the fraud propagation relation graph;

iteratively updating the node scores of the nodes of the fraud propagation relation graph according to a PageRank iterative update algorithm until the iteration converges and final node scores are obtained, wherein, during the iterative updating, the fraud propagation weight associated with each node in the fraud propagation relation graph is determined according to the fraud risk attribute of the transaction data to which the node belongs, and the higher the level of the fraud risk attribute, the higher the fraud propagation weight;

and, if the final node score is higher than a preset threshold, determining that the corresponding node is a fraudster node, wherein the transaction data associated with the fraudster node is fraud risk data.

In a second aspect, there is provided a fraud identification apparatus, the apparatus comprising:

a data obtaining module, configured to obtain a transaction data sample set, each piece of transaction data in the transaction data sample set including at least: fund flow direction information, transaction party association information, and a fraud risk attribute of the transaction data, wherein the transaction party association information comprises a transaction account or a transaction medium;

a graph construction module, configured to construct a fraud propagation relation graph according to the transaction data sample set, wherein each node in the fraud propagation relation graph is established according to the transaction party association information, directed edges among the nodes are established according to the fund flow direction information and the transaction party association information, and the directed edges represent fraud propagation relations among the nodes;

an initial score module, configured to assign an initial node score to each node in the fraud propagation relation graph;

a score iteration module, configured to iteratively update the node scores of the nodes of the fraud propagation relation graph according to a PageRank iterative update algorithm until the iteration converges and final node scores are obtained, wherein, during the iterative updating, the fraud propagation weight associated with each node in the fraud propagation relation graph is determined according to the fraud risk attribute of the transaction data to which the node belongs, and the higher the level of the fraud risk attribute, the higher the fraud propagation weight;

and an identification processing module, configured to determine that the corresponding node is a fraudster node if the final node score is higher than a preset threshold, wherein the transaction data associated with the fraudster node is fraud risk data.

In a third aspect, there is provided a data processing apparatus comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the instructions:

According to the fraud identification method and device of one or more embodiments of this specification, a PageRank topological relation graph is constructed from transaction data that includes fund flow direction information and transaction party association information. By setting fraud propagation weights, the scores of fraudsters accumulate rapidly during the PageRank iterative update, which accelerates convergence, so that fraudsters are mined quickly and their scores end up higher than the preset threshold. The method thereby identifies hidden fraud cases. Because score propagation between PageRank nodes matches the way fraud spreads through associations, fraudsters are mined more accurately.

Drawings

In order to illustrate more clearly the technical solutions in one or more embodiments of this specification or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some of the embodiments of this specification, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow diagram of a fraud identification method provided in one or more embodiments of the present specification;

FIG. 2 is a simple PageRank topology graph provided by one or more embodiments of the present disclosure;

FIG. 3 is a data preparation and preprocessing flow provided by one or more embodiments of the present disclosure;

FIG. 4 is a flow diagram of a PageRank algorithm operation provided by one or more embodiments of the present disclosure;

FIG. 5 is a block diagram of a fraud identification apparatus provided in one or more embodiments of the present specification;

FIG. 6 is a block diagram of a fraud identification apparatus provided in one or more embodiments of the present specification.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in one or more embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments, and not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from one or more embodiments of the disclosure without making any creative effort shall fall within the scope of protection of the disclosure.

Some financial fraud cases are reported by the victim. For example, after a fraudster has tricked a victim into transferring money, the victim may report the fraudster, including the fraudster's account that received the transfer. However, many fraud cases are never uncovered, because the victim does not report them or they otherwise go undetected. These undiscovered fraud cases may be referred to as "hidden cases". Hidden cases pose a great risk to financial security prevention and control: for example, a hidden fraudster account can go on to defraud other victims, causing users to lose funds. Locating fraudsters and mining and identifying hidden fraud cases is therefore of great significance for financial security prevention and control.

Examples of the present disclosure provide a fraud identification method that may be applied to hidden case mining in the payment account transfer scenario. The method can make use of existing big data and mine hidden fraudsters from the relations within that data by means of big data analysis.

First, the characteristics of fraud cases in the payment account transfer scenario are briefly explained.

In the payment account transfer scenario, a confirmed fraudster may be considered to have a higher "fraud degree" than normal (non-fraudulent) payment accounts; that is, the higher the fraud degree, the more likely it is that the payment account belongs to a confirmed fraudster or carries a high risk of being a fraudster. Furthermore, in the payment account transfer scenario, the fraud degree can propagate through transfers.

For example, assume that payment account A is a confirmed fraudster and therefore has a high fraud degree. If A transfers money to payment account B, A in effect propagates its high fraud degree to B through the transfer: B is also likely to be a hidden real fraudster, so the fraud degree of B rises as well.

As another example, still assume that payment account A is a confirmed fraudster with a high fraud degree. The media that A used in known fraud cases, such as identity documents, bank cards, and transaction devices, then also have a high fraud degree; this is the propagation of the fraud degree from the payment account to the media it uses. Moreover, a medium can in turn propagate a high fraud degree to another payment account through transfer transactions. For example, if payment account A and payment account C use the same identity document, the identity document can propagate the fraud degree to payment account C, and transfers performed by payment account C are also very likely to be hidden real fraud cases.

Given these characteristics of fraud propagation in the payment account transfer scenario, the fraud identification method of the examples of the present disclosure applies PageRank to hidden case mining in the payment account scenario, because the PageRank algorithm resembles the fraud propagation described above.

PageRank is a technique with which a search engine computes, from the hyperlinks between web pages, one of the factors used in ranking those pages. PageRank determines the rank of a page from the vast network of hyperlinks: a link from page A to page B is equivalent to page A casting a vote for page B. The more in-links a page node receives, and the higher the quality of the linking pages, the more important the page is.
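For reference, the textbook form of the PageRank update can be written as below. The symbols are not taken from this specification, which does not state the formula it uses: N is assumed to be the number of pages, d a damping factor (0.85 is a common choice), In(v) the set of pages that link to page v, and L(u) the number of out-links of page u.

```latex
PR(v) = \frac{1-d}{N} + d \sum_{u \in \mathrm{In}(v)} \frac{PR(u)}{L(u)}
```

Each term of the sum is the vote that page u casts for page v: the score of u, shared equally among the out-links of u.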

Applying these PageRank characteristics to the payment account scenario: if a payment account has many in-links, that is, many source accounts transfer money to it, and those source accounts have high fraud degrees, the source accounts propagate their high fraud degrees to the payment account, whose fraud degree therefore rises rapidly. In other words, multiple source nodes with high fraud degrees "vote" that the payment account is also a fraudster, so the payment account is very probably a fraudster, and fraud mining is achieved.

For example, in a PageRank graph, a high fraud degree may be represented by a high node score. Over the iterations of the PageRank algorithm, different nodes in the graph obtain different node scores, and a node with a higher score can be considered to have a higher fraud degree. The fact that a source account can propagate a high fraud degree to a payment account is represented in the PageRank graph by a one-way edge from the source account node to the payment account node.

As in PageRank, fraud mining in the payment account scenario relies on the fraud degree "votes" propagated by transfer transactions: across a large volume of historical data, known fraudsters vote for the fraud degree of hidden fraudsters, so that payment accounts with high fraud degrees are mined.

Fig. 1 illustrates a flow of a fraud identification method, which illustrates the main processing steps of the fraud identification method used in the present disclosure, including:

In step 100, a transaction data sample set is obtained, each piece of transaction data in the sample set including at least: fund flow direction information, transaction party association information, and a fraud risk attribute of the transaction data.

In this step, the transaction data sample set may consist of multiple pieces of transaction data from which hidden cases are to be mined. The sample set may include transaction data that has already been identified as fraudulent as well as transaction data that has not.

The fund flow direction information in the transaction data indicates the direction in which the funds of a transfer move. For example, payment account A transfers money to payment account B, or payment account C transfers money to payment account D.

The transaction party association information in the transaction data may indicate the accounts of the two parties to a transfer and the transaction media those accounts use in the transaction. Examples are payment account A, the transaction device used by payment account A in the transfer transaction, and the identity document of the user to whom payment account A belongs.

The fraud risk attribute of a piece of transaction data may, in an illustrative example, be a property assigned to the transaction data through scoring by the transaction model. For example, the transaction model scores "black samples" high and "white samples" low; black samples are confirmed fraud, and white samples are samples that the transaction model deems normal. The fraud risk attribute of a piece of transaction data is therefore a predetermined property of that data (determined, for example, by model scoring), such as "confirmed fraud" or "considered a normal transaction". Other attributes may of course be included, such as a "gray sample" whose score falls within a certain range; a gray sample may have been reported by a user without there yet being sufficient evidence to confirm fraud.

As described above, in one example, "black," "gray," and "white" may be referred to as fraud risk attributes of the transaction data, and the "black" may be considered the highest level, the "white" the lowest level, and the "gray" the intermediate level. The fraud risk attribute may be used as a basis for setting fraud propagation weights associated with nodes in the graph in a subsequent step. Details will be described later.

In step 102, a fraud propagation relationship graph is constructed from the transaction data sample set.

In this step, the constructed fraud propagation relation graph may be a PageRank topological relation graph, and the establishment of the PageRank topological relation graph includes determination of nodes in the graph and determination of relations between the nodes.

Each node in the PageRank topological relation graph can be transaction party associated information in transaction data. For example, the transaction account in the transaction party association information may be used as a node in the graph, and the transaction device used by the transaction account in performing the transfer transaction may also be used as a node in the graph.

Directed edges may be established between the nodes in the PageRank topological relation graph. The directed edges represent fraud propagation relationships between nodes.

For example, a unidirectional edge between transaction accounts may be constructed according to the fund flow direction information, with the direction of the edge following the fund flow. If payment account A transfers money to payment account B, a unidirectional edge pointing from A to B may be established between node A and node B. Through the transfer transaction, such an edge between accounts passes the high fraud degree of one account to the counterparty account.

As another example, a bidirectional edge between a transaction account and a transaction medium may be constructed according to the transaction party association information. For example, between payment account A and a transaction device that payment account A uses in a transfer transaction, a unidirectional edge pointing from payment account A to the transaction device and a unidirectional edge pointing from the transaction device to payment account A may both be established; that is, the bidirectional edge consists of two unidirectional edges pointing in opposite directions. Such an edge between an account and a medium transfers fraud degree through their mutual influence: an account that uses a problematic medium may itself be problematic, and vice versa.
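The edge rules of step 102 can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the dictionary keys `payer`, `payee`, `payer_media` and `payee_media` and the function name `build_graph` are hypothetical, and the graph is stored as a plain adjacency mapping.

```python
from collections import defaultdict


def build_graph(transactions):
    """Return {node: set of nodes it points to} for a fraud propagation graph.

    Each transaction is a dict with the hypothetical keys "payer", "payee",
    and optional "payer_media" / "payee_media" lists.
    """
    edges = defaultdict(set)
    for tx in transactions:
        payer, payee = tx["payer"], tx["payee"]
        # One-way edge between accounts, following the fund flow direction.
        edges[payer].add(payee)
        edges[payee]  # touch the key so a payee without out-links is still a node
        # Two opposite one-way edges between each account and each of its media.
        for key, account in (("payer_media", payer), ("payee_media", payee)):
            for medium in tx.get(key, []):
                edges[account].add(medium)
                edges[medium].add(account)
    return dict(edges)


if __name__ == "__main__":
    graph = build_graph([
        {"payer": "A", "payee": "B", "payee_media": ["b1"]},
        {"payer": "C", "payee": "D", "payer_media": ["b1"]},
    ])
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

In this sketch a medium shared by two accounts (here `b1`) links them in both directions, which is what lets fraud degree pass from one account to the other through the medium.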

In step 104, each node in the fraud propagation relationship graph is assigned an initial node score.

In this step, each node in the PageRank topological relation graph may be assigned with an initial value, for example, assuming that there are 5 nodes in the PageRank topological relation graph, the initial node score of each node may be set to 1/5. The node score is continuously updated during subsequent iterations.

In step 106, the node scores of the nodes of the fraud propagation relation graph are iteratively updated according to a PageRank iterative update algorithm until the iteration converges and the final node scores are obtained.

This step iteratively updates the scores of all nodes of the PageRank topological relation graph.

During the iterative updating, the fraud propagation weight associated with each node in the fraud propagation relation graph is determined according to the fraud risk attribute of the transaction data to which the node belongs: the higher the level of the fraud risk attribute, the higher the fraud propagation weight.

For example, if one piece of transaction data is a black sample and another is a white sample, the fraud propagation weight associated with the passive party account in the black sample may be higher than that of the passive party account in the white sample. The fraud propagation weight may be the edge weight of an edge pointing to a node or the Topic vector weight of the node. Take two nodes in the graph, a first node and a second node, and assume that both are passive party transaction account nodes. If the fraud risk attribute of the transaction data to which the first node belongs is "black" and that of the transaction data to which the second node belongs is "white", the fraud propagation weight associated with the first node may be set higher than that associated with the second node. Setting a higher fraud propagation weight for a node helps the fraud degree of a real fraudster rise quickly, so that the iteration converges quickly and hidden fraudsters are mined.
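One way to picture steps 104 and 106 is a topic-sensitive PageRank in which the Topic vector gives more weight to nodes with a higher-level fraud risk attribute. The sketch below is an illustration under stated assumptions, not the formula of the disclosure: the weights in `LEVEL_WEIGHT`, the damping factor 0.85, the convergence tolerance, the handling of nodes without out-links, and the function name `fraud_pagerank` are all hypothetical, and only the Topic vector option is shown (the edge weight option is not).

```python
# Assumed Topic vector weights per fraud risk attribute (illustrative values only).
LEVEL_WEIGHT = {"white": 1.0, "gray": 2.0, "black": 5.0}


def fraud_pagerank(edges, node_level, damping=0.85, tol=1e-9, max_iter=200):
    """Power iteration for a topic-sensitive PageRank over {node: targets}."""
    nodes = sorted(set(edges) | {t for targets in edges.values() for t in targets})
    n = len(nodes)
    # Topic vector: normalised weights, higher for higher-level risk attributes.
    raw = {v: LEVEL_WEIGHT[node_level.get(v, "white")] for v in nodes}
    total = sum(raw.values())
    topic = {v: raw[v] / total for v in nodes}
    # Step 104: every node starts with the same initial node score 1/n.
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        nxt = {v: (1.0 - damping) * topic[v] for v in nodes}
        for u in nodes:
            targets = edges.get(u) or ()
            if targets:
                share = damping * score[u] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Assumption: a node without out-links spreads its score
                # according to the Topic vector.
                for v in nodes:
                    nxt[v] += damping * score[u] * topic[v]
        delta = sum(abs(nxt[v] - score[v]) for v in nodes)
        score = nxt
        if delta < tol:  # iteration has converged
            break
    return score


if __name__ == "__main__":
    edges = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
    plain = fraud_pagerank(edges, {})
    boosted = fraud_pagerank(edges, {"B": "black"})
    print(round(plain["B"], 4), round(boosted["B"], 4))
```

In the usage example, marking node B as "black" raises its converged score relative to the run in which every node is "white", which is the accumulation effect the paragraph above describes.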

In step 108, if the final node score is higher than a preset threshold, it is determined that the corresponding node is a fraudster node, and the transaction data associated with the fraudster node is fraud risk data.

In this step, the final node scores are compared with a preset threshold to determine which nodes in the PageRank topological relation graph are fraudster nodes. The active party account nodes and the media nodes may be filtered out so that only the passive party account nodes are checked against the preset threshold, and a passive party account node whose score is above the threshold is determined to be a fraudster node. The fund transactions associated with that passive party account node are determined to be fraud risk data, that is, transactions that may involve fraud; these are the hidden cases obtained by mining.
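Step 108 can be sketched as a simple filter. The function names, the `payee` key, and the reading of "associated transactions" as "transactions received by the flagged account" are assumptions made for illustration; the threshold value in the usage example is arbitrary.

```python
def find_fraudster_nodes(final_scores, passive_accounts, threshold):
    """Keep only passive party account nodes whose final score exceeds the threshold."""
    return {
        node
        for node, score in final_scores.items()
        if node in passive_accounts and score > threshold
    }


def flag_risk_transactions(transactions, fraudster_nodes):
    """Return the transactions whose receiving account is a fraudster node."""
    return [tx for tx in transactions if tx["payee"] in fraudster_nodes]


if __name__ == "__main__":
    scores = {"A": 0.05, "B": 0.40, "b1": 0.50, "C": 0.30}
    fraudsters = find_fraudster_nodes(scores, {"B", "C"}, threshold=0.35)
    risky = flag_risk_transactions(
        [{"payer": "A", "payee": "B"}, {"payer": "D", "payee": "C"}], fraudsters
    )
    print(fraudsters, risky)
```

Note that the media node `b1` is ignored even though its score is the highest, because only passive party account nodes are compared with the threshold.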

According to this fraud identification method, a PageRank topological relation graph is constructed from transaction data that includes fund flow direction information and transaction party association information. By setting fraud propagation weights, the scores of fraudsters accumulate rapidly during the PageRank iterative update, which accelerates convergence, so that fraudsters are mined quickly and their scores end up higher than the preset threshold. The method thereby identifies hidden fraud cases. Because score propagation between PageRank nodes matches the way fraud spreads through associations, fraudsters are mined more accurately.

The process of mining hidden cases from a transaction data sample set is described in further detail below. It may include four main steps: data preparation, preprocessing, running PageRank, and algorithm output.

Data preparation

In this step, a sample set of transaction data over a period of time may be obtained, where the sample set may include a plurality of pieces of transaction data. Each piece of transaction data may include: fund flow information, transaction party associated information and fraud risk attributes of the transaction data, the transaction party associated information comprising a transaction account or a transaction medium.

The fund flow direction information may be, for example, that account A transfers money to account B; it indicates the direction of the fund flow.

The transaction party association information may include a transaction account and a transaction medium. The transaction account and the fund flow direction information can be recorded and stored by the server when the transfer transaction is executed; the transaction medium can likewise be recorded by the server or acquired in other ways.

For example, when account A transfers money to account B, account A may be referred to as the active party transaction account, that is, the paying account, and account B as the passive party transaction account, that is, the receiving account. The transaction medium may include: the identity document of the user to whom the transaction account belongs, the telephone number of that user, the transaction device used for the fund transaction, or a bank card used for the fund transaction. For example, the identity document of the user to whom account A belongs and the mobile phone used for the transaction may be referred to as active party transaction media, and the identity document of the user to whom account B belongs and the bank card used for the transaction as passive party transaction media. That is, the transaction accounts include the active party transaction account and the passive party transaction account of a fund transfer transaction; the active party transaction media are the documents and devices used in the transaction by the user to whom the active party transaction account belongs, and the passive party transaction media are the documents and devices used in the transaction by the user to whom the passive party transaction account belongs.
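The fields of one piece of transaction data described in this step could be held in a record such as the one below. The class and field names are hypothetical, as are the identifier strings in the usage example; the disclosure does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransactionRecord:
    """One piece of transaction data (hypothetical field names)."""

    active_account: str    # paying account; funds flow from here ...
    passive_account: str   # ... to the receiving account
    active_media: List[str] = field(default_factory=list)   # e.g. identity document, phone
    passive_media: List[str] = field(default_factory=list)  # e.g. identity document, bank card
    risk_attribute: str = "white"  # "black", "gray" or "white"


if __name__ == "__main__":
    record = TransactionRecord(
        active_account="A",
        passive_account="B",
        active_media=["id_doc_of_A", "phone_of_A"],
        passive_media=["id_doc_of_B", "bank_card_of_B"],
        risk_attribute="gray",
    )
    print(record)
```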

Each piece of transaction data in the sample set may include its fraud risk attribute. Illustratively, the fraud risk attribute may be expressed as "black", "gray", or "white". When the fraud risk attribute indicates that a fraud propagation weight associated with the transaction data should be set high, it is generally the fraud propagation weight of the passive party transaction account in that transaction data that is set. For example, if a piece of transaction data is a black sample, whose fraud risk attribute has a high level, the Topic vector weight of the passive party transaction account node in that transaction data may be set higher than that of other nodes.

In one example, the three attributes "black", "gray", and "white" may be determined by having the transaction model score the transaction data: transaction data in one score range is set as a black sample, transaction data in another score range as a gray sample, and transaction data in a further score range as a white sample.
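The mapping from a model score to an attribute can be sketched as below. The cut-off values 0.5 and 0.9, the assumption that scores lie between 0 and 1, and the function name are illustrative only; the disclosure does not give the score ranges.

```python
def risk_attribute(model_score, gray_min=0.5, black_min=0.9):
    """Map a transaction model score to "black", "gray" or "white".

    gray_min and black_min are assumed cut-offs, not values from the disclosure.
    """
    if model_score >= black_min:
        return "black"
    if model_score >= gray_min:
        return "gray"
    return "white"


if __name__ == "__main__":
    for s in (0.10, 0.60, 0.95):
        print(s, risk_attribute(s))
```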

Although the nature of a piece of transaction data, such as whether it is a black sample or a white sample, can be determined in advance by the transaction model, some fraud cases are hidden and cannot be identified by the transaction model. The fraud identification method of the examples of the present disclosure is therefore needed to mine these hidden cases.

For example, if the transaction model identifies some of the samples in a transaction data sample set as black samples and the remaining samples are gray or white, the method of this example aims to find out whether real hidden "black samples" exist among the gray and white samples.

To this end, the method may set a higher fraud propagation weight for the nodes in the samples already confirmed as black, for example by giving them a higher Topic vector weight or by setting a higher weight on the edges pointing to them. The PageRank algorithm is then iterated repeatedly.

At the end of the iteration, on the one hand, the node scores of the confirmed fraudster nodes are higher than the score threshold: because the fraud propagation weights of these nodes were set higher, their node scores accumulate rapidly and finally exceed the threshold.

On the other hand, some nodes in the gray or white samples are also mined: their node scores are likewise higher than the score threshold, so they are also judged to be fraudster nodes, and these mined nodes are the hidden fraudsters. The score of a hidden fraudster node is high precisely because it has a propagation relation with confirmed fraudster nodes. For example, a confirmed fraudster node can propagate a high score to the hidden fraudster node through an out-link, so that the node score of the hidden fraudster node also rises rapidly. This is equivalent to the confirmed fraudster node "voting" for the hidden fraudster node, indicating that the confirmed fraudster node also regards it as a fraudster; and the vote carries considerable weight, because the score that the confirmed fraudster node propagates through its out-link is high.

Preprocessing

Preprocessing in this step may include several aspects: hotspot exclusion, passive party attribute determination, weight setting for points and edges, and label propagation clustering.

First, hotspot exclusion.

In the transaction data sample set, some transaction data, or some accounts and media in the transaction data, can be determined with reasonable certainty to be non-fraudster nodes, or are nodes that cannot propagate fraud degree in the PageRank topological relation graph (for example, isolated nodes). Such nodes need to be excluded and do not take part in the subsequent graph construction.

For example, hotspot merchants in the transaction data sample set may be excluded: they are involved in frequent transfer transactions but are not fraudsters.

As another example, the passive party account of a piece of transaction data may have been reported, but if the proportion of reported transactions for that account is small, for example only 1 of the 10,000 transactions performed by the account was reported and the rest were normal, the transaction data can essentially be confirmed as normal and the passive party account need not be monitored.

As another example, the transaction medium may be a device used for transactions, such as a public computer. If 100 fund transactions were made on that computer and only two were reported, the computer may be considered normal; it is not a transaction medium that needs to be monitored for fraud and may be excluded.

As a further example, if a transaction medium is isolated and is not linked to any other account, it need not be placed as a node in the subsequent PageRank topological relation graph, because it cannot propagate fraud degree.

The above are only examples; practical implementations are not limited to them, and sample filtering may be performed in other ways.
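The two report-ratio examples above (1 report in 10,000 transactions, 2 reports in 100 transactions) can be expressed as a simple rule. The thresholds `min_transactions=100` and `max_report_ratio=0.02`, the function name, and the decision to apply the rule only to nodes that have been reported at least once are assumptions chosen so that both examples are excluded; the disclosure does not state concrete values.

```python
def should_exclude(total_transactions, reported_transactions,
                   min_transactions=100, max_report_ratio=0.02):
    """Return True if a reported account or medium looks normal enough to exclude.

    Assumed rule: the node has been reported at least once, has a large enough
    transaction volume, and only a small share of its transactions was reported.
    """
    if reported_transactions == 0 or total_transactions < min_transactions:
        return False
    return reported_transactions / total_transactions <= max_report_ratio


if __name__ == "__main__":
    print(should_exclude(10_000, 1))  # account from the first example
    print(should_exclude(100, 2))     # public computer from the second example
    print(should_exclude(10, 5))      # half of a small volume was reported
```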

Second, passive party attribute determination.

If a passive party transaction account participates in multiple fund transactions, and the transaction model labeled these fund transactions during data preparation as black, gray, and white samples respectively, the highest-level attribute may be taken as the final attribute of the passive party transaction account, with the levels ascending in the order white, gray, black. That is, if the passive party transaction account took part in fund transactions with all three attributes, black, gray, and white, the final attribute of the account is determined to be black.
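The "highest level wins" rule can be sketched as follows. The input format, a sequence of (passive account, attribute) pairs, and the function name are assumptions for illustration.

```python
# Levels in ascending order: white < gray < black.
LEVEL_ORDER = {"white": 0, "gray": 1, "black": 2}


def final_attributes(labelled_transfers):
    """Map each passive party account to the highest-level attribute it received.

    labelled_transfers: iterable of (passive_account, attribute) pairs.
    """
    result = {}
    for account, attribute in labelled_transfers:
        current = result.get(account)
        if current is None or LEVEL_ORDER[attribute] > LEVEL_ORDER[current]:
            result[account] = attribute
    return result


if __name__ == "__main__":
    print(final_attributes([("B", "white"), ("B", "black"), ("B", "gray"), ("C", "gray")]))
```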

Third, weight setting of points and edges.

The points and edges refer to nodes in the PageRank topological relation graph and directed edges connected between the nodes when the graph is built in the subsequent steps. The weights of the points and the edges may include the edge weights of the above-mentioned directed edges, or the Topic vector weights of the nodes.

When the points and the edges are constructed, each node in the fraud propagation relation graph can be constructed according to the transaction party correlation information, for example, an active party transaction account, an active party transaction medium, a passive party transaction account and a passive party transaction medium can be respectively used as each node in the fraud propagation relation graph.

The directed edges between the nodes may be constructed according to the fund flow direction information and the transaction party association information. For example, a unidirectional edge may be established between the active party transaction account and the passive party transaction account according to the fund flow direction information, and the direction of the unidirectional edge indicates the direction of fund flow, such as a transfer from account A to account B. In addition, a bidirectional edge may be established between the active party transaction medium and the active party transaction account, and a bidirectional edge may be established between the passive party transaction account and the passive party transaction medium.

Fig. 2 illustrates a simple PageRank topological relation graph. The graph is used here only to illustrate points and edges schematically; the actual graph construction may be executed in the subsequent PageRank algorithm execution part.

Fig. 2 illustrates only part of a PageRank topological relation graph, which includes transaction accounts and transaction media. Accounts A through D are transaction accounts and may be referred to as account nodes in the graph. Certificate B1 and device B2 are transaction media and may be referred to as media nodes in the graph; they are the media used by the user of account B in the transaction in which account A transfers to account B. These media may also be associated with other accounts; for example, certificate B1 is also the medium used when account C transacts.

Accounts A through D are transaction accounts that participate in fund transactions. For example, in the transaction in which account A transfers to account B, account A may be referred to as the active party transaction account and account B as the passive party transaction account; in the transaction in which account D transfers to account A, account A is the passive party transaction account and account D is the active party transaction account. Because the passive party transaction account is the focus of monitoring, the fraud risk attribute of account A may be based on the transaction in which account D transfers to account A.

One-way directed edges may be established between accounts. For example, according to the transaction in which account A transfers to account B, a directed edge pointing from account A to account B may be established, and the direction of the arrow is the direction of fund flow. Bidirectional edges may be established between an account and its media, for example, between account B and certificate B1, and between account B and device B2.

Directed edges between nodes may be used to represent fraud propagation relationships between nodes.
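A minimal sketch of how the points and edges described above could be assembled is given below. The record keys `payer`, `payee`, `payer_media` and `payee_media`, and the function name `build_graph`, are assumptions for illustration; a bidirectional edge is represented as two directed edges.

```python
def build_graph(transactions):
    """Build nodes and directed edges from transaction records (illustrative).

    Accounts are linked by a one-way edge in the direction of fund flow;
    each account is linked to its media by a pair of opposite edges.
    """
    nodes, edges = set(), set()
    for t in transactions:
        payer, payee = t["payer"], t["payee"]
        nodes.update((payer, payee))
        edges.add((payer, payee))  # one-way edge: funds flow payer -> payee
        for account, key in ((payer, "payer_media"), (payee, "payee_media")):
            for medium in t.get(key, ()):
                nodes.add(medium)
                edges.add((account, medium))  # bidirectional edge stored
                edges.add((medium, account))  # as two directed edges
    return nodes, edges


# The partial graph of fig. 2, expressed with the assumed record layout.
fig2 = [
    {"payer": "A", "payee": "B", "payee_media": ["B1", "B2"]},
    {"payer": "D", "payee": "A"},
    {"payer": "D", "payee": "C", "payee_media": ["B1"]},
]
nodes, edges = build_graph(fig2)
```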

For example, account D participates in multiple transactions, transferring money both to account A and to account C. In the iterative update process of PageRank, the node score of account D is then distributed over two directed edges (i.e., outgoing links): one directed edge is "account D points to account A" (which may be referred to as edge D-A), and the other is "account D points to account C" (which may be referred to as edge D-C). For example, assuming that the node score of account D at a certain iteration is 8 and the score is distributed evenly, edge D-A and edge D-C each receive 4. If the node score of account D is regarded as an indication of the fraud degree of account D (the higher the score, the higher the fraud degree and the more likely the account is a fraudster), then distributing the score over the outgoing links is equivalent to account D propagating its fraud degree along the directed edges. For account A, for example, the score of account A includes the 4 propagated from account D over edge D-A. Similarly, the directed edges between accounts and media also propagate fraud degree during the iteration: the fraud degree (node score) of one node is propagated to another node through a directed edge.

In this example, the proportions in which a node distributes its score over its different outgoing links are adjusted. For example, assuming that account A is a confirmed fraudster and account C is undetermined, account D may, according to the fraud risk attribute, favor account A when distributing its score, so that the score of account A accumulates quickly and account A can be determined to be a fraudster node in the subsequent threshold determination (a node is a fraudster node if its score is higher than the threshold). Again taking a score of 8 for account D as an example, account D may assign 6 to edge D-A and only 2 to edge D-C when the score is iteratively updated. This unbalanced score distribution is analogous to account D voting that account A is more likely to be a fraudster node than account C, with each iteration being equivalent to one round of voting.

Because the iterative update process of PageRank comprises multiple iterations and the score of a node differs from iteration to iteration, the scores that the same node distributes to its outgoing links also differ from iteration to iteration. This example therefore sets an "edge weight" that fixes the proportion in which the score is distributed in every iteration.

Specifically, the edge weights may be set as follows: if it is determined, according to the fraud risk attribute of a piece of transaction data, that the node corresponding to the transaction party association information of that transaction data needs a higher fraud propagation weight, the edge weight of a target edge may be set higher than the edge weights of the other edges of the same source node. The target edge is an edge pointing to the node corresponding to the transaction party association information, and the edge weight of the target edge is used as the fraud propagation weight associated with that node.

For example, according to the fraud risk attribute, the edge weight of the one-way edge pointing to the passive party transaction account of a piece of transaction data may be set higher. In fig. 2, for instance, the edge weight of the one-way edge from account D to account A is set to 3.8, and this edge may be referred to as the target edge. For the source node of the target edge, namely account D, the edge weight of the one-way edge D-A (3.8) is clearly higher than the edge weight of the one-way edge D-C pointing to account C. That is, when account D distributes its score over the outgoing links originating from it, it favors D-A and assigns more of its score to D-A, to be propagated through D-A to the account A node.

In practical implementation, the following formula can be used:

$$PR(p_j)\cdot\frac{W(p_j)}{Z} \tag{1}$$

Formula (1) gives, for a node in the PageRank graph that may have multiple incoming links, the score propagated over one of those incoming links. The source node of that incoming link is $p_j$, $PR(p_j)$ is the score of node $p_j$ in the current iteration, and $W(p_j)/Z$ is a ratio, namely the proportion of the score $PR(p_j)$ of node $p_j$ allocated to that incoming link, which is a number smaller than 1. For example, for account A in fig. 2, the score propagated over incoming link D-A may be the score of the account D node multiplied by 3.8/Z, while for account C, the score propagated over incoming link D-C may be the score of the account D node multiplied by 1.7/Z. It can be seen that, once the edge weights are set, the score of the source node is not distributed evenly in the score iteration process but is propagated differentially according to the edge weights.
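The distribution described by formula (1) can be illustrated with a short sketch. It assumes that the normalization factor Z is the sum of the edge weights of all outgoing links of the source node; the function name `propagate` and the weights 3 and 1 used to reproduce the 6/2 split are likewise assumptions for illustration.

```python
def propagate(score, out_weights):
    """Split a source node's score over its outgoing links by edge weight.

    Assumption: Z is the sum of the source node's outgoing edge weights,
    so each link carries score * W / Z.
    """
    z = sum(out_weights.values())
    return {target: score * w / z for target, w in out_weights.items()}


# Even split of account D's score of 8 over edges D-A and D-C.
print(propagate(8, {"A": 1, "C": 1}))  # {'A': 4.0, 'C': 4.0}
# Skewed split: with assumed weights 3 and 1, D-A carries 6 and D-C carries 2.
print(propagate(8, {"A": 3, "C": 1}))  # {'A': 6.0, 'C': 2.0}
# Edge weights from the fig. 2 example: D-A = 3.8, D-C = 1.7.
shares = propagate(8, {"A": 3.8, "C": 1.7})
```

Under this assumption the shares always add up to the source node's score, so no score is lost or created when it is propagated.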

In addition, when the passive party transaction account corresponding to the target edge has a higher fraud propagation weight and a higher edge weight is therefore set, the edge weights of the bidirectional edges between that passive party transaction account and the passive party transaction media may also be set higher, based on the association between the account and the media. The edge weights in the two directions of a bidirectional edge may be set equal; for example, the two edge weights between account B and certificate B1 in fig. 2 are set equal. In one example, the edge weight of the bidirectional edge may be less than or equal to the edge weight of the target edge, where the bidirectional edge and the target edge correspond to the same node.

The above description illustrates that for nodes with higher fraud risk attributes, the edge weight of the target edge may be set higher than the edge weights of other edges of the same source node. In setting the specific amount of the side weight, various methods can be flexibly selected, and two side weight amount setting modes are listed as follows:

In one example, the correspondence between fraud risk attributes and edge weights may be preset. For example, the edge weight of the target edge corresponding to the passive party transaction account of a black sample is uniformly set to 5, and the edge weight of the target edge corresponding to the passive party transaction account of a gray sample is uniformly set to 4. According to this correspondence, the edge weight of the node corresponding to the transaction party association information that needs fraud monitoring can be set to the fraud propagation weight corresponding to its fraud risk attribute; the transaction party association information may include the passive party transaction account or the passive party transaction medium. When the edge weight of the target edge is set higher, this indicates that the node corresponding to the target edge has a higher level of fraud propagation attribute than the nodes corresponding to the other edges.

In another example, the edge weight corresponding to a node may be set in a more refined way. For example, although the edge weights of the target edges corresponding to the passive party transaction accounts of black samples are uniformly set to 5, the samples in the black sample set differ in how closely they need to be monitored, so the uniform value 5 can be corrected by taking other factors into account, such as the number of transfers and the transfer amount.

For example, for the passive party transaction account of a black sample, another value can be obtained from the initially set value 5 by a formula that depends on the transfer amount or the number of transfers. Different black samples whose passive party transaction accounts differ in the number or amount of transfers may then have different edge weights. If a node receives transfers frequently or the transfer amount is large, the calculated edge weight can be relatively high.

For another example, this more refined way of setting the edge weight is not limited to differentiating the weights among black samples; the fraud propagation weight of a node in a gray sample or a white sample may also be raised. For example, suppose the edge weight of the target edge corresponding to the passive party transaction account node of a black sample is 5, and the edge weight corresponding to the passive party transaction account node of a white sample is 2, but a one-way edge exists between the two, that is, the passive party transaction account node of the black sample (black node) points to the passive party transaction account node of the white sample (white node), and the black node frequently transfers large amounts of money to the white node. In that case, the originally uniform edge weight 2 may be corrected to 3.8 by the above calculation that takes the number of transfers or the amount into account.
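The description does not give the correction formula, so the following sketch only illustrates the idea with a purely hypothetical one: the base weight grows logarithmically with the number of transfers and with the total amount. The function name `refined_edge_weight` and the parameters `amount_scale` and `gain` are invented for this example.

```python
import math


def refined_edge_weight(base, n_transfers, total_amount,
                        amount_scale=10_000.0, gain=0.1):
    """Correct a preset edge weight by transfer activity (hypothetical formula).

    More transfers or a larger total amount give a higher weight; with no
    activity the preset base weight is returned unchanged.
    """
    boost = math.log1p(n_transfers) + math.log1p(total_amount / amount_scale)
    return base * (1 + gain * boost)


# Two black samples with the same preset weight 5 but different activity.
quiet = refined_edge_weight(5, n_transfers=1, total_amount=100)
busy = refined_edge_weight(5, n_transfers=50, total_amount=200_000)
```

Any monotonically increasing correction would serve the same purpose; the logarithm is chosen here only to keep very active accounts from dominating.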

In this step, besides the edge weights described above, the Topic vector weights of the corresponding nodes may also be set. It should be noted that, in actual implementation, both the edge weights and the Topic vector weights may be modified, or only one of the two may be changed.

Topic vector weights are set to address the dead-end (termination point) problem and the trap problem in PageRank: when a terminal node is reached, the walk can continue to a next node through a random jump. In the fraud identification method of this example, in order for the node scores of real fraudsters to accumulate quickly during iteration, fraudster nodes need to be given a high random-jump probability.

Based on this, the nodes in the fraud propagation relation graph can be used to form a Topic vector. The Topic vector represents the set of fraudsters to be judged, each node is used as a vector factor, and the Topic vector weight of each node can be used as the fraud propagation weight associated with the node corresponding to the vector factor. The Topic vector weights of the vector factors represent the jump probabilities in the PageRank iteration, and the probability of jumping between potential fraudster nodes is larger than that of jumping between a fraudster node and an ordinary node. Therefore, according to the fraud risk attribute, the Topic vector weight of the node corresponding to transaction party association information with a higher attribute level can be set higher than that of the other vector factors.

For example, in fig. 2, assuming account A is a confirmed fraudster node, the Topic vector weight of the account A node may be set to the highest value, 5, while account B and account D may have a lower level of fraud risk attribute than account A, and their Topic vector weights may be set slightly lower.

The part of a node's score that is input through random jumps can be expressed as in formula (2):

$$(1-a)\cdot\frac{t_i}{Z} \tag{2}$$

where $1-a$ represents the probability of a random jump, $t_i$ is the Topic vector weight of the node, and $t_i/Z$ is a ratio. In conventional PageRank the ratio is distributed evenly; for example, if there are N nodes in the PageRank relation graph, the ratio may be 1/N. In the method of this example the ratio is not distributed evenly, and the jump probability to fraudster nodes is higher; for example, the ratio 5/Z of account A in fig. 2 is higher than the ratio 3/Z of account B. In addition, to distinguish this Z from the Z in formula (1), the Z in formula (1) may be denoted $Z_1$ and the Z in formula (2) may be denoted $Z_2$; both $Z_1$ and $Z_2$ are normalization factors. According to the fraud risk attribute, the higher the attribute level of a node, the higher its Topic vector weight is set; and the higher the Topic vector weight of a node, the larger the random-jump input score expressed by formula (2), so that the score accumulates faster during iteration.
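The random-jump part of formula (2) can be sketched as follows. The value `a=0.85`, the function name `jump_scores`, and the Topic vector weights chosen for accounts C and D are assumptions for illustration; the normalization factor is taken to be the sum of all Topic vector weights.

```python
def jump_scores(topic_weights, a=0.85):
    """Random-jump score (1 - a) * t_i / Z for every node.

    Assumption: Z is the sum of the Topic vector weights of all nodes,
    so the jump scores of all nodes add up to 1 - a.
    """
    z = sum(topic_weights.values())
    return {node: (1 - a) * t / z for node, t in topic_weights.items()}


# Account A (weight 5) receives a larger jump score than account B (weight 3).
weighted = jump_scores({"A": 5, "B": 3, "C": 1, "D": 3})
# Equal weights reduce to the conventional even distribution (1 - a) / N.
uniform = jump_scores({"A": 1, "B": 1, "C": 1, "D": 1})
```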

There are also a number of ways to set the specific amount of weight of the Topic vector. For example, according to the corresponding relation between the preset fraud risk attribute and the Topic vector weight, setting the Topic vector weight of the node corresponding to the transaction party associated information as the fraud propagation weight corresponding to the fraud risk attribute of the node; and, as the fraud risk attribute level is higher, the corresponding Topic vector weight is larger. Alternatively, the Topic vector weights may be adjusted in combination with other factors.

In this step, the edge weight or the Topic vector weight of a node may be set according to the fraud risk attribute that has been determined for each node. Once the weights of the points and edges have been set, they can be used directly in the iterative process of PageRank in the subsequent PageRank operation stage.

Fourth, label propagation clustering.

This step may use a classification method to find nodes that are closely related to each other and put them into one PageRank relation graph, so that mutually independent transaction data (for example, two fund transactions that are completely independent, with neither a direct nor an indirect connection) are kept out of the same PageRank graph as far as possible.

The label propagation clustering algorithm is briefly explained as follows. Label propagation is a semi-supervised classification algorithm whose principle is to use the label information of labeled nodes to predict the label information of unlabeled nodes. During execution, the label of each node is propagated to its adjacent nodes according to similarity, and at each propagation step every node updates its own label according to the labels of its adjacent nodes. The greater the similarity between a node and an adjacent node, the greater the weight of that adjacent node's influence on the node's label, the more the labels of similar nodes tend to agree, and the more easily the label propagates. During label propagation, the labels of the labeled data are kept unchanged, so that they act as sources from which labels are transmitted to the unlabeled data.
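The label propagation principle described above can be sketched as below. This is a simplified illustration rather than the specific clustering used by the method: the function name `propagate_labels`, the similarity values, and the tie-breaking rule are assumptions. Labeled nodes are kept fixed, and each unlabeled node adopts the label with the largest total similarity among its neighbors.

```python
def propagate_labels(neighbors, seeds, max_rounds=100):
    """Semi-supervised label propagation on a weighted graph (illustrative).

    neighbors: node -> {neighbor: similarity}
    seeds:     node -> label; these labels are never changed.
    """
    labels = dict(seeds)
    for _ in range(max_rounds):
        updated = dict(labels)
        for node in sorted(neighbors):
            if node in seeds:
                continue  # labeled data keeps its label and acts as a source
            votes = {}
            for nb, sim in neighbors[node].items():
                if nb in labels:
                    votes[labels[nb]] = votes.get(labels[nb], 0.0) + sim
            if votes:
                # Largest total similarity wins; ties go to the smaller label.
                updated[node] = max(sorted(votes), key=votes.get)
        if updated == labels:
            break  # no label changed in this round
        labels = updated
    return labels


graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 0.2},
    "C": {"B": 0.2, "D": 0.9},
    "D": {"C": 0.9, "E": 1.0},
    "E": {"D": 1.0},
}
clusters = propagate_labels(graph, seeds={"A": "g1", "E": "g2"})
```

In this small graph the weak link between B and C separates the two groups: B follows A, while C and D follow E.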

As mentioned above, the flow shown in fig. 3 may be executed before the PageRank algorithm runs. It is only briefly described below; for the detailed processing, refer to the data preparation and preprocessing steps above:

In step 300, a transaction data sample set is obtained, including fund flow direction information, transaction party association information, and fraud risk attributes of the transaction data, the transaction party association information including a transaction account or a transaction medium.

In step 302, hotspot exclusion and passive party attribute determination are performed on the transaction data sample set.

In step 304, edge weights and Topic vector weights are set for the nodes in the PageRank graph.

In step 306, label propagation clustering is performed to find the transaction data that belong in the same PageRank graph.

PageRank operation

In this algorithm operation stage, the scores of all nodes in the PageRank topological relation graph are iteratively updated until the iteration converges and the final node scores are obtained. The edge weights and Topic vector weights determined in the preprocessing stage above are used in the iteration. This stage is described in five steps below, see the example of fig. 4:

In step 400, a fraud propagation relationship graph is constructed from the transaction data sample set.

The transaction account or the transaction medium in the transaction party association information may be used as each node in the fraud propagation relationship graph. And, one-way edges may be established between accounts, the pointing of the one-way edges indicating the direction of funds transfer between accounts, and two-way edges may be established between accounts and media, for example, a two-way edge link is established between an active side transaction medium and an active side transaction account, and a two-way edge link is established between a passive side transaction account and a passive side transaction medium.

In step 402, each node in the fraud propagation relationship graph is assigned an initial node score.

In step 404, the node scores of the nodes of the fraud propagation relation graph are iteratively updated according to a PageRank iterative update algorithm until a final node score is obtained when iteration converges.

In the iterative updating process of this step, the fraud propagation weight associated with each node in the fraud propagation relation graph in the iterative updating process is determined according to the fraud risk attribute of the transaction data where the node is located, and if the level of the fraud risk attribute is higher, the fraud propagation weight is higher. And if the fraud risk attribute of the transaction data where the first node is located is higher than the fraud risk attribute of the transaction data where the second node is located, the fraud propagation weight associated with the first node is higher than the fraud propagation weight associated with the second node. The fraud propagation weights herein may include the edge weights or the Topic vector weights described above.

In one example, if the PageRank iterative update algorithm only improves the edge weights of the directed edges between nodes, the node scores can be iteratively updated according to the following formula:

$$PR(p_i) = a\sum_{p_j \in M(p_i)} PR(p_j)\cdot\frac{W(p_j)}{Z} + \frac{1-a}{N} \tag{3}$$

In formula (3), $PR(p_i)$ represents the score of node $p_i$ in the PageRank topological relation graph, $M(p_i)$ denotes the set of nodes that have a link pointing to $p_i$, and $N$ is the number of nodes in the graph. The score of node $p_i$ consists of two parts. One part, before the "+" in formula (3), is the score propagated normally through links, where $a$ represents the probability that a node follows a link normally. The other part, after the "+" in formula (3), is the score from random jumps, where $1-a$ is the probability of a random jump.

Here, $PR(p_j)$ is the current score of node $p_j$, and $W(p_j)/Z$ is a ratio, namely the proportion of the score of node $p_j$ assigned to the edge "node $p_j$ points to node $p_i$". $W(p_j)$ may be called the edge weight, specifically the edge weight corresponding to node $p_i$. On every incoming link of node $p_i$, the score is distributed according to the ratio $W(p_j)/Z$.

The summation term $\sum_{p_j \in M(p_i)} PR(p_j)\cdot W(p_j)/Z$ is the sum of the scores propagated over all incoming links of node $p_i$.

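In order to ensure that the state transition matrix is a probability (stochastic) matrix, the normalization factor $Z$ in formula (4) may be chosen such that the ratios $W(p_j)/Z$ assigned to all outgoing links of the same source node sum to 1.

Also, $Z$ may be updated in each iteration.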
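One iteration of the edge-weighted update can be sketched as follows, to show why the normalization matters. The sketch assumes that Z is computed per source node as the sum of its outgoing edge weights, that random jumps are distributed evenly as (1 - a)/N, and that `a=0.85`; the function name `pagerank_step` and the unit weights on the account-media edges are illustrative.

```python
def pagerank_step(scores, edge_weights, a=0.85):
    """One score update with edge weights and an even random jump (sketch).

    edge_weights: (source, target) -> weight. Assumption: Z for a source
    node is the sum of its outgoing edge weights, so the proportions
    leaving each node sum to 1.
    """
    z = {}
    for (src, _), w in edge_weights.items():
        z[src] = z.get(src, 0.0) + w
    n = len(scores)
    new = {node: (1 - a) / n for node in scores}  # random-jump part
    for (src, dst), w in edge_weights.items():
        new[dst] += a * scores[src] * w / z[src]  # link-propagation part
    return new


# Fig. 2 style graph: D-A = 3.8, D-C = 1.7, all other edges assumed 1.0.
edges = {
    ("D", "A"): 3.8, ("D", "C"): 1.7, ("A", "B"): 1.0,
    ("B", "B1"): 1.0, ("B1", "B"): 1.0, ("B", "B2"): 1.0,
    ("B2", "B"): 1.0, ("C", "B1"): 1.0, ("B1", "C"): 1.0,
}
start = {node: 1 / 6 for node in ["A", "B", "C", "D", "B1", "B2"]}
after = pagerank_step(start, edges)
```

Because every node in this graph has at least one outgoing link and the proportions leaving each node sum to 1, the total score is still 1 after the update, which is the property a probability matrix guarantees.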

In another example, if the PageRank iterative update algorithm only improves the Topic vector weights of the nodes, the node scores can be iteratively updated according to the following formula:

$$PR(p_i) = a\sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)} + (1-a)\cdot\frac{t_i}{Z} \tag{5}$$

where $PR(p_i)$ represents the score of node $p_i$ in the PageRank topological relation graph, $M(p_i)$ denotes the set of nodes that have a link pointing to $p_i$, and $L(p_j)$ denotes the number of outgoing links of node $p_j$, as in conventional PageRank. The score of node $p_i$ again consists of two parts: the part before the "+" is the score propagated normally through links, and the part after the "+" is the score from random jumps. In formula (5), $t_i$ is the Topic vector weight of node $p_i$, and $(1-a)\cdot t_i/Z$ is the score that node $p_i$ obtains from random jumps in each iteration. The normalization factor $Z$ is the sum of the Topic vector weights of all nodes.

In the transfer transaction scenario of this example, in order for confirmed fraudsters and potential fraudsters among the passive party transaction accounts to obtain higher node scores (PR values), larger Topic vector weights may be set for confirmed fraudster nodes and nodes at risk of fraud, while smaller Topic vector weights may be set for ordinary nodes.

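In yet another example, the PageRank iterative update algorithm may improve both the Topic vector weights of the nodes and the edge weights of the directed edges. The iterative update algorithm for the node scores may then be as follows, where $Z_1$ is the normalization factor of the edge weights and $Z_2$ is the normalization factor of the Topic vector weights:

$$PR(p_i) = a\sum_{p_j \in M(p_i)} PR(p_j)\cdot\frac{W(p_j)}{Z_1} + (1-a)\cdot\frac{t_i}{Z_2} \tag{6}$$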
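An iteration that combines edge weights and Topic vector weights can be sketched as below. It is a minimal sketch under stated assumptions: the edge-weight normalization is taken per source node as the sum of its outgoing edge weights, the Topic normalization is the sum of all Topic vector weights, and `a=0.85`, `tol` and `max_iter` are illustrative defaults; the function name `fraud_pagerank` is hypothetical.

```python
def fraud_pagerank(nodes, edge_weights, topic_weights,
                   a=0.85, tol=1e-10, max_iter=1000):
    """Iterate node scores with edge weights and Topic vector weights (sketch).

    edge_weights:  (source, target) -> weight
    topic_weights: node -> Topic vector weight
    """
    z1 = {}  # per-source normalization of edge weights (assumption)
    for (src, _), w in edge_weights.items():
        z1[src] = z1.get(src, 0.0) + w
    z2 = sum(topic_weights[node] for node in nodes)  # Topic normalization

    scores = {node: 1.0 / len(nodes) for node in nodes}  # initial node scores
    for _ in range(max_iter):
        new = {node: (1 - a) * topic_weights[node] / z2 for node in nodes}
        for (src, dst), w in edge_weights.items():
            new[dst] += a * scores[src] * w / z1[src]
        delta = sum(abs(new[node] - scores[node]) for node in nodes)
        scores = new
        if delta < tol:
            break  # iteration has converged
    return scores


nodes = ["A", "C", "D"]
edges = {("D", "A"): 3.8, ("D", "C"): 1.7, ("A", "D"): 1.0, ("C", "D"): 1.0}
topic = {"A": 5.0, "C": 1.0, "D": 1.0}
final_scores = fraud_pagerank(nodes, edges, topic)
```

In this toy graph account A ends up with a higher final score than account C, both because edge D-A carries more of account D's score and because account A receives a larger share of the random jumps.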

In step 406, the final node score is compared with a preset threshold, and if the final node score is higher than the preset threshold, the corresponding node is determined to be a fraudster node.

In this step, the fraudster nodes can be obtained through threshold determination. In addition, among the nodes whose scores are higher than the preset threshold, the transaction medium nodes and the active party transaction account nodes are filtered out, and only passive party transaction account nodes are regarded as fraudster nodes.
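The threshold determination and the subsequent filtering can be sketched as follows. The role labels, the function name `fraudster_nodes`, and the treatment of an account that appears as both active and passive party (here it counts as a passive party account) are assumptions for illustration.

```python
def fraudster_nodes(final_scores, node_roles, threshold):
    """Return passive-party account nodes whose score exceeds the threshold.

    node_roles: node -> set of roles, using the hypothetical labels
    "passive_account", "active_account" and "medium".
    """
    return sorted(
        node for node, score in final_scores.items()
        if score > threshold and "passive_account" in node_roles.get(node, ())
    )


scores = {"A": 0.40, "B": 0.10, "D": 0.35, "B1": 0.30}
roles = {
    "A": {"passive_account", "active_account"},
    "B": {"passive_account"},
    "D": {"active_account"},
    "B1": {"medium"},
}
print(fraudster_nodes(scores, roles, threshold=0.25))  # ['A']
```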

In step 408, transaction data associated with the fraudster node is determined as fraud risk data.

In this step, the transactions associated with a passive party transaction account node that has been determined to be a fraudster node may be determined to be hidden cases, and these transactions are fraud risk data. For example, the transaction in which account A transfers to account B is determined to be a fraud case, and account B may also be determined to be a fraudster account.

Algorithm output

The list of fraudsters obtained through threshold determination can be used to supplement the transaction model used for security prevention and control, and the transaction data associated with the fraudster nodes, namely the hidden cases, can be added to the black samples of the transaction model. These supplementary black samples are hidden fraud cases that were not originally recognized by the transaction model and that were mined by the flow of the present method. The transaction model may be trained and refined based on the updated set of black samples.

In the fraud identification method described above, a PageRank graph model is built from fund transactions and transaction party association information, and edge weights and Topic vector weights are set in the PageRank iterative update algorithm, so that the node scores of confirmed or hidden fraudsters accumulate quickly during the PageRank iterative update. Nodes with high scores are thereby identified as fraudster nodes, and hidden fraud cases are mined. The method identifies hidden cases more comprehensively and accurately and converges quickly, enabling fast and accurate hidden case identification.
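Supplementing the black samples with the mined hidden cases can be sketched as below. The record keys `id` and `payee` and the function name `supplement_black_samples` are assumptions made for the example.

```python
def supplement_black_samples(transactions, fraudster_accounts, black_ids):
    """Find hidden cases and add them to the black sample set (illustrative).

    A hidden case is a transaction whose passive-party account is a
    fraudster node but which is not yet among the known black samples.
    """
    hidden = [
        t for t in transactions
        if t["payee"] in fraudster_accounts and t["id"] not in black_ids
    ]
    updated = set(black_ids) | {t["id"] for t in hidden}
    return hidden, updated


txs = [
    {"id": 1, "payer": "A", "payee": "B"},
    {"id": 2, "payer": "C", "payee": "B"},
    {"id": 3, "payer": "D", "payee": "C"},
]
hidden, black = supplement_black_samples(txs, {"B"}, black_ids={1})
```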

To implement the above method, one or more embodiments of the present specification further provide a fraud identification apparatus, as shown in fig. 5, which may include: a data acquisition module 51, a graph construction module 52, an initialization score module 53, a score iteration module 54 and an identification processing module 55.

A data obtaining module 51, configured to obtain a transaction data sample set, where each piece of transaction data in the transaction data sample set at least includes: fund flow direction information, transaction party associated information and fraud risk attributes of the transaction data, wherein the transaction party associated information comprises a transaction account or a transaction medium;

a graph construction module 52, configured to construct a fraud propagation relationship graph according to the transaction data sample set; establishing each node in the fraud propagation relation graph according to the transaction party correlation information, and establishing directed edges among the nodes according to the fund flow direction information and the transaction party correlation information, wherein the directed edges are used for representing fraud propagation relations among the nodes;

an initialization score module 53, configured to assign an initial node score to each node in the fraud propagation relation graph;

the score iteration module 54 is configured to iteratively update the node scores of the nodes of the fraud propagation relation graph according to a PageRank iterative update algorithm until a final node score is obtained when iteration converges; determining fraud propagation weights associated with each node in the fraud propagation relation graph in an iterative updating process according to fraud risk attributes of transaction data where the node is located, wherein the fraud propagation weights are higher if the level of the fraud risk attributes is higher;

and an identification processing module 55, configured to determine that the corresponding node is a fraudster node if the final node score is higher than a preset threshold, where the transaction data associated with the fraudster node is fraud risk data.

In one example, graph building module 52 is configured to: acquiring an active party transaction account, an active party transaction medium, a passive party transaction account and a passive party transaction medium in the transaction party correlation information; respectively taking the transaction account of the active party, the transaction medium of the active party, the transaction account of the passive party and the transaction medium of the passive party as each node in the fraud propagation relation graph; according to the fund flow direction information, establishing a one-way side link between the transaction account of the active party and the transaction account of the passive party; and establishing a bidirectional side link between the active side transaction medium and the active side transaction account, and establishing a bidirectional side link between the passive side transaction account and the passive side transaction medium.

As shown in fig. 6, the apparatus may further include at least one of the following modules:

a first weight setting module 56 for: according to the fraud risk attribute of one piece of transaction data, setting the edge weight of a target edge associated with a node corresponding to the transaction party associated information of the transaction data, wherein the edge weight is higher than the edge weights of other edges of the same source node; the target edge is an edge which points to a node corresponding to the related information of the transaction party; and the edge weight of the target edge is used as a fraud propagation weight associated with the node corresponding to the transaction party associated information.

A second weight setting module 57, configured to: forming a Topic vector by each node in the fraud propagation relation graph, wherein the Topic vector represents a set of fraud to be judged, and each node is used as a vector factor; setting a Topic vector weight of each vector factor in the Topic vector, wherein the Topic vector weight is used as a fraud propagation weight associated with a corresponding node of the vector factor; and setting the Topic vector weight of the node corresponding to the transaction party associated information of the transaction data to be higher than the Topic vector weight of other vector factors according to the fraud risk attribute of the transaction data.

For convenience of description, the above apparatus is described as being divided into various modules by function, and the modules are described separately. Of course, when implementing one or more embodiments of the present description, the functions of the modules may be implemented in one or more pieces of software and/or hardware.

The execution sequence of the steps in the flows shown in the above method embodiments is not limited to the sequence in the flowcharts. Furthermore, each step may be implemented in software, hardware or a combination thereof; for example, a person skilled in the art may implement it in the form of software code, as computer-executable instructions capable of implementing the logical function corresponding to the step. When implemented in software, the executable instructions may be stored in a memory and executed by a processor in the device.

For example, corresponding to the fraud identification method described above, one or more embodiments of the present specification also provide a data processing device, which may include a processor, a memory, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the following steps by executing the instructions:

The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

One skilled in the art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the other embodiments. In particular, because the embodiment of the fraud identification apparatus is substantially similar to the method embodiment, its description is relatively brief, and reference may be made to the corresponding parts of the method embodiment for the relevant details.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The above description covers only preferred embodiments of one or more embodiments of the present disclosure and is not intended to limit the present disclosure; any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.

Claims

1. A fraud identification method, the method comprising:

2. The method of claim 1, wherein constructing each node in the fraud propagation relationship graph according to transaction party associated information, and constructing directed edges between each node according to the fund flow direction information and transaction party associated information comprises:

acquiring an active party transaction account, an active party transaction medium, a passive party transaction account and a passive party transaction medium in the transaction party correlation information;

respectively taking the transaction account of the active party, the transaction medium of the active party, the transaction account of the passive party and the transaction medium of the passive party as each node in the fraud propagation relation graph;

establishing, according to the fund flow direction information, a unidirectional edge between the active party transaction account and the passive party transaction account;

and establishing a bidirectional edge between the active party transaction medium and the active party transaction account, and establishing a bidirectional edge between the passive party transaction account and the passive party transaction medium.

3. The method of claim 1, the transaction medium comprising at least one of:

the identity document of the user to which the transaction account belongs;

the telephone number of the user to which the transaction account belongs;

a transaction device for use in conducting a funds transaction;

or a bank card used for conducting a fund transaction.

4. The method of claim 1, further comprising:

according to the fraud risk attribute of one piece of transaction data, setting the edge weight of a target edge associated with a node corresponding to the transaction party associated information of the transaction data, wherein the edge weight is higher than the edge weights of other edges of the same source node;

the target edge is an edge pointing to a node corresponding to the transaction party associated information; and the edge weight of the target edge is used as the fraud propagation weight associated with the node corresponding to the transaction party associated information.

5. The method of claim 4, wherein the transaction party associated information corresponding to the target edge comprises: a passive party transaction account and a passive party transaction medium associated with the passive party transaction account;

the target edge comprises: a unidirectional edge directed to the passive party transaction account and a bidirectional edge between the passive party transaction account and the passive party transaction medium;

the edge weights in the two directions of the bidirectional edge between the passive party transaction account and the passive party transaction medium are set to be equal, and the edge weight of the bidirectional edge is smaller than or equal to the edge weight of the unidirectional edge.

6. The method of claim 4, wherein setting the edge weight of the target edge to be higher than the edge weights of other edges of the same source node comprises:

setting the edge weight associated with the node corresponding to the transaction party associated information as the fraud propagation weight corresponding to the fraud risk attribute, according to a preset correspondence between fraud risk attributes and edge weights; and determining, according to the fraud risk attribute, that the edge weight of the target edge is larger than the edge weights of the other edges.

7. The method of claim 1, further comprising:

forming a Topic vector by each node in the fraud propagation relation graph, wherein the Topic vector represents a set of fraud to be judged, and each node is used as a vector factor;

setting a Topic vector weight of each vector factor in the Topic vector, wherein the Topic vector weight is used as a fraud propagation weight associated with a corresponding node of the vector factor; and setting the Topic vector weight of the node corresponding to the transaction party associated information of the transaction data to be higher than the Topic vector weight of other vector factors according to the fraud risk attribute of the transaction data.

8. The method of claim 7, wherein setting the Topic vector weight of the node corresponding to the transaction party associated information to be higher than the Topic vector weights of other vector factors comprises:

setting the Topic vector weight of the node corresponding to the transaction party associated information as the fraud propagation weight corresponding to the fraud risk attribute, according to a preset correspondence between fraud risk attributes and Topic vector weights; wherein the higher the level of the fraud risk attribute of the transaction data, the larger the Topic vector weight of the node corresponding to the transaction party associated information of the transaction data.

9. The method of claim 1, wherein, after determining that the final node score is higher than the preset threshold, the method further comprises: filtering out the transaction medium nodes and the active party transaction account nodes from among the nodes whose scores are higher than the preset threshold.

10. An apparatus for fraud identification, the apparatus comprising:

11. The apparatus of claim 10, wherein

the graph building module is configured to: acquire an active party transaction account, an active party transaction medium, a passive party transaction account and a passive party transaction medium from the transaction party associated information; take the active party transaction account, the active party transaction medium, the passive party transaction account and the passive party transaction medium respectively as nodes in the fraud propagation relation graph; establish, according to the fund flow direction information, a unidirectional edge between the active party transaction account and the passive party transaction account; and establish a bidirectional edge between the active party transaction medium and the active party transaction account, and a bidirectional edge between the passive party transaction account and the passive party transaction medium.

12. The apparatus of claim 10, the apparatus further comprising:

a first weight setting module configured to: set, according to the fraud risk attribute of a piece of transaction data, the edge weight of a target edge associated with the node corresponding to the transaction party associated information of the transaction data, wherein the edge weight is higher than the edge weights of other edges of the same source node; the target edge is an edge pointing to the node corresponding to the transaction party associated information; and the edge weight of the target edge is used as the fraud propagation weight associated with the node corresponding to the transaction party associated information.

13. The apparatus of claim 10, the apparatus further comprising:

a second weight setting module to: forming a Topic vector by each node in the fraud propagation relation graph, wherein the Topic vector represents a set of fraud to be judged, and each node is used as a vector factor; setting a Topic vector weight of each vector factor in the Topic vector, wherein the Topic vector weight is used as a fraud propagation weight associated with a corresponding node of the vector factor; and setting the Topic vector weight of the node corresponding to the transaction party associated information of the transaction data to be higher than the Topic vector weight of other vector factors according to the fraud risk attribute of the transaction data.

14. A data processing apparatus comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor when executing the instructions implementing the steps of: