CN112785423A - Method, device, equipment and storage medium for mining fraud risk node

Info

Publication number: CN112785423A
Application number: CN202110169128.0A
Authority: CN
Inventors: 何浪; 郭亚萌; 张炫
Original assignee: Rocking Digital Chongqing Technology Co ltd
Current assignee: Rocking Digital Chongqing Technology Co ltd
Priority date: 2021-02-07
Filing date: 2021-02-07
Publication date: 2021-05-11

Abstract

The invention provides a method, an apparatus, a device and a storage medium for mining fraud risk nodes. All nodes are analyzed comprehensively, and the nodes are classified and screened by computing their intermediary centrality, so that detailed analysis by various methods is applied only to nodes that may carry fraud risk. This greatly improves the efficiency of mining fraud risk nodes while preserving comprehensiveness.

Description

Method, device, equipment and storage medium for mining fraud risk node

Technical Field

The present invention relates to the field of information technologies, and in particular, to a method, an apparatus, a device, and a storage medium for mining fraud risk nodes.

Background

Fraud risk is one of the main risks in consumer finance. It refers to the risk that a borrower has no intention of repaying at all, and the bad debts of many financial institutions are caused by fraud. An effective risk control system is therefore needed, and pre-loan anti-fraud is the first risk control step that helps financial institutions filter out poor-quality users and screen out fraudsters. Traditional anti-fraud methods rely mainly on personal behavior data collected through an SDK, consumption and communication data gathered by web crawlers, and user data from third-party institutions. A scoring model is built with machine learning methods such as logistic regression, random forest and neural networks to quantitatively evaluate the fraud risk of new customers. In today's big data era, anti-fraud systems built by integrating massive data have opened a new window for risk control at financial institutions.

However, fraud is increasingly organized as an industrial chain, with specialized industries forming around it: a technology development industry (for example, virtual simulation data and forged bills); an identity and credit packaging and false identity industry (for example, supplying identity cards of elderly people in rural areas); and an industry that discovers business vulnerabilities and teaches fraud methods. People working in these industries are highly skilled: they probe financial institutions' anti-fraud rules in various ways and then commit fraud using the techniques and materials supplied by the first two industries. Such group fraud is often hard to detect. Identification by artificial intelligence algorithms may let some cases slip through the net, whereas manual screening is slow.

Disclosure of Invention

In view of the foregoing, it is necessary to provide a fraud risk node mining method, apparatus, device and storage medium.

A method of mining fraud risk nodes, the method comprising: receiving enterprise data, and establishing an enterprise graph according to the enterprise data, wherein the enterprise graph is composed of a plurality of enterprise groups, each enterprise group is a sub-graph of the graph, the graph comprises nodes and the relations and weights among the nodes, and the nodes are enterprises and investors; marking the nodes belonging to the blacklist according to a known blacklist; calculating the intermediary centrality of the nodes which do not belong to the blacklist according to the enterprise graph, and taking the nodes with the intermediary centrality larger than a preset value as complex nodes; performing feature analysis on the complex node based on the enterprise data to judge whether a fraud risk exists; and when a fraud risk is judged to exist, including the complex node in the blacklist to obtain an updated target blacklist.

In one embodiment, after the step of calculating the intermediary centrality of the nodes not belonging to the blacklist according to the enterprise graph and taking the nodes with the intermediary centrality greater than a preset value as complex nodes, the method further includes: taking the nodes with the intermediary centrality smaller than the preset value as common nodes; and outputting the common nodes and archiving them in a normal list.

In one embodiment, after the step of performing feature analysis on the complex node based on the enterprise data and judging whether a fraud risk exists, the method further includes: when the judgment result shows that no fraud risk exists, outputting the complex node and archiving it in a normal list.

In one embodiment, after the step of including the complex node in the blacklist when a fraud risk is judged to exist to obtain the updated target blacklist, the method further includes: taking the group containing a complex node that belongs to the target blacklist as a target group, performing feature analysis on the target group, and judging whether a fraud risk exists; and when the target group has a fraud risk, treating the target group as a fraud risk group and storing it in a group blacklist.

In one embodiment, after the step of including the complex node in the blacklist when a fraud risk is judged to exist to obtain the updated target blacklist, the method further includes: updating the enterprise graph according to new enterprise data to obtain an updated target enterprise graph; and calculating, based on the target enterprise graph, the association relationship between a new node in the target enterprise graph and the fraud risk group to obtain the fraud tendency value of the new node.

A fraud risk node mining apparatus includes a graph establishing module, a node marking module, an intermediary calculation module, a feature analysis module and a blacklist updating module, wherein: the graph establishing module is used for receiving enterprise data and establishing an enterprise graph according to the enterprise data, wherein the enterprise graph is composed of a plurality of sub-graphs, each sub-graph is a group, each sub-graph comprises nodes and the relations and weights among the nodes, and the nodes are specifically enterprises and investors; the node marking module is used for marking the nodes belonging to the blacklist according to a known blacklist; the intermediary calculation module is used for calculating the intermediary centrality of the nodes which do not belong to the blacklist according to the enterprise graph, and taking the nodes with the intermediary centrality larger than a preset value as complex nodes; the feature analysis module is used for performing feature analysis on the complex node based on the enterprise data and judging whether a fraud risk exists; and the blacklist updating module is used for including the complex node in the blacklist to obtain an updated target blacklist when a fraud risk is judged to exist.

In one embodiment, the apparatus further includes a group risk module, which specifically includes a group analysis unit and a list establishing unit, wherein: the group analysis unit is used for taking the group containing a complex node that belongs to the target blacklist as a target group, performing feature analysis on the target group, and judging whether a fraud risk exists; and the list establishing unit is used for, when the target group has a fraud risk, treating the target group as a fraud risk group and storing it in a group blacklist.

In one embodiment, the apparatus further includes a fraud tendency calculation module, which specifically includes a graph updating unit and a fraud tendency calculation unit, wherein: the graph updating unit is used for updating the enterprise graph according to new enterprise data to obtain an updated target enterprise graph; and the fraud tendency calculation unit is used for calculating, based on the target enterprise graph, the association relationship between a new node in the target enterprise graph and the fraud risk group to obtain the fraud tendency value of the new node.

An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of a method for mining fraud risk nodes as described in the various embodiments above when executing the program.

A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of a method of mining a fraud risk node as described in the various embodiments above.

According to the method, apparatus, device and storage medium for mining fraud risk nodes, an enterprise graph is established from enterprise data, and the nodes in the enterprise graph are screened against the known blacklist. Intermediary centrality is calculated for the nodes that do not belong to the blacklist, and the nodes whose intermediary centrality is greater than the preset value are taken as complex nodes and subjected to fraud risk analysis. If a fraud risk exists, the complex node is added to the blacklist to obtain the target blacklist, which achieves the mining of nodes with fraud risk. Meanwhile, fraud risk assessment is carried out on the group containing the complex node, which achieves the mining of groups with fraud risk. All nodes are analyzed comprehensively, and the nodes are classified and screened by computing their intermediary centrality, so that detailed analysis by various methods is applied only to nodes that may carry fraud risk. This greatly improves the efficiency of mining fraud risk nodes while preserving comprehensiveness.

Drawings

FIG. 1 is an application scenario diagram of a fraud risk node mining method in one embodiment;

FIG. 2 is a flow chart of a fraud risk node mining method in one embodiment;

FIG. 3 is a block diagram of a fraud risk node mining apparatus in one embodiment;

FIG. 4 is an internal structural diagram of the device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings by way of specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The fraud risk node mining method provided by the application can be applied to the application environment shown in FIG. 1. The method is executed on the terminal 1, which interacts with the server 2 over a network: the terminal 1 receives enterprise data from the server 2, and the finally obtained target risk degree of a target enterprise can be transmitted back to the server 2 through the network. The terminal 1 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer or portable wearable device, and the server 2 may be implemented as an independent server or as a server cluster formed by a plurality of servers. First, various user data, such as the mobile phone device number, data network and geographical position, can be collected in real time through the SDK. In addition, the user's communication records can be obtained by capturing data from the user's communication carrier. These data are then transmitted to the terminal 1 via the server 2. With the data, a large complex network, namely the enterprise graph, can be established through association relationships: for example, the mobile phone devices or phone numbers of some accounts are the same, the accounts use the same WIFI or the same base station, or communication records exist between their mobile phone numbers. After the network is established, it can be analyzed and mined in depth.

Risk source: a risk source may be an individual, an enterprise, or a group. When an enterprise's financial transaction capital chain breaks or the enterprise meets a serious business dispute, it becomes a risk source, and its risk degree is recorded at its node in the financial transaction graph.

Risk carrier: risk cannot be conducted in a vacuum; it needs a medium. In this model, because risk transmission between groups is involved, the risk carriers extend to regions, industries, upstream and downstream relationships, and the like.

In one embodiment, as shown in fig. 2, there is provided a method for mining fraud risk nodes, including the following steps:

S110, receiving enterprise data, and establishing an enterprise graph according to the enterprise data, wherein the enterprise graph is composed of a plurality of enterprise groups, each enterprise group is a sub-graph of the graph, the graph comprises nodes and the relations and weights among the nodes, and the nodes are enterprises and investors.

Specifically, the method presented by the present disclosure is based on a financial transaction graph with relationship weights. First, financial transaction data is input into the graph. The transaction parties, including enterprises and investors, form the nodes of the graph, and each node records attributes of the enterprise, such as registered capital, region, main business and group affiliation. The investing and invested relationships between enterprises and investors form the edges of the graph, and each edge records attributes of the financial transaction relationship, such as investment amount and equity. The nodes are connected to one another through the edges to form the whole graph. Second, the model involves the concept of a group in the graph. A financial transaction group is a sub-graph formed by some nodes of the financial transaction graph, in which the nodes inside the sub-graph are more strongly correlated with one another than with nodes outside it. Enterprises within the same group typically share certain associations and similarities, such as holding equity in one another or belonging to the same region or industry. Mature algorithms such as LPA and Louvain can divide the graph into groups. Third, enterprise risk refers to the effect of future uncertainty on an enterprise's achievement of its business objectives, and is generally divided into systematic and non-systematic risk. In the enterprise graph established in this scheme, the target risk of a target enterprise consists of three parts: initial risk, associated node risk and associated group risk.
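The graph construction and group division described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the record layout `(investor, enterprise, weight)`, the node names, the treatment of edges as undirected, and the deterministic tie-breaking in the label propagation (LPA) loop are all assumptions made for the example.

```python
from collections import defaultdict


def build_graph(records):
    """Build an undirected weighted adjacency map from
    (party_a, party_b, weight) records (hypothetical layout)."""
    graph = defaultdict(dict)
    for a, b, weight in records:
        # Accumulate the weight if the same pair appears more than once.
        graph[a][b] = graph[a].get(b, 0.0) + weight
        graph[b][a] = graph[b].get(a, 0.0) + weight
    return dict(graph)


def label_propagation(graph, max_rounds=20):
    """Deterministic LPA sketch: each node adopts the label carrying the
    largest total edge weight among its neighbours (ties: smallest label)."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            totals = defaultdict(float)
            for neighbour, weight in graph[node].items():
                totals[labels[neighbour]] += weight
            if not totals:
                continue  # isolated node keeps its own label
            best = min(totals, key=lambda lab: (-totals[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Two small investment clusters with no edges between them.
records = [
    ("investor_A", "firm_1", 0.6),
    ("investor_A", "firm_2", 0.4),
    ("firm_1", "firm_2", 0.3),
    ("investor_B", "firm_3", 0.7),
    ("investor_B", "firm_4", 0.5),
    ("firm_3", "firm_4", 0.2),
]
graph = build_graph(records)
groups = label_propagation(graph)
```

Nodes that end up with the same label form one group (sub-graph). A production system would more likely rely on a graph library's LPA or Louvain routine; the hand-written loop is used here only to keep the example self-contained.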

S120, marking the nodes belonging to the blacklist according to a known blacklist.

Specifically, the nodes in the enterprise graph that correspond to entries in a known blacklist are labeled. The blacklist consists of nodes already confirmed to carry fraud risk, and includes enterprises and individuals, such as a deceased person and the like.
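A minimal sketch of the marking step, assuming the blacklist is available as a plain collection of node identifiers (the function and node names below are hypothetical):

```python
def mark_blacklisted(nodes, blacklist):
    """Return a {node: bool} map flagging nodes found in the known blacklist."""
    known = set(blacklist)
    return {node: node in known for node in nodes}


def remaining_nodes(marks):
    """Nodes that are not blacklisted and so proceed to the centrality step."""
    return sorted(node for node, flagged in marks.items() if not flagged)


nodes = ["firm_1", "firm_2", "investor_A", "investor_B"]
# "firm_9" is on the blacklist but absent from the graph; it is simply ignored.
marks = mark_blacklisted(nodes, blacklist=["investor_B", "firm_9"])
candidates = remaining_nodes(marks)
```

Keeping the marks as a separate map, rather than deleting nodes from the graph, leaves the blacklisted nodes available for the later group-level analysis.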

S130, calculating the intermediary centrality of the nodes which do not belong to the blacklist according to the enterprise graph, and taking the nodes with the intermediary centrality larger than a preset value as complex nodes.

Specifically, the nodes belonging to the known blacklist in step S120 are removed, and the intermediary centrality is calculated for the remaining nodes. Intermediary centrality, also called betweenness centrality or simply betweenness, measures how often a vertex appears on the shortest paths between pairs of other vertices: the more often a vertex lies on such shortest paths, the greater its betweenness centrality. The first step of the algorithm is to find the shortest paths between every pair of vertices; the second is to count how many times each intermediate vertex occurs on those shortest paths. Many indexes are used to measure the centrality of nodes in a complex network, and improved centrality methods have been proposed to raise the accuracy of the classical indexes and reduce their computational complexity. The normalized closeness centrality is calculated as follows:

$$C(i) = \frac{N - 1}{\sum_{n \neq i} d(i, n)}$$

where $N$ represents the number of vertices in the graph and $d(i, n)$ represents the shortest distance from vertex $i$ to vertex $n$. It is common practice to normalize this score so that it reflects the average length of the shortest paths rather than their sum. If the shortest distances from a node to the other nodes in the graph are all small, the node is considered to have high centrality.
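The two quantities discussed above can be sketched in plain Python. The betweenness routine uses Brandes' algorithm, which is a choice made for this example and is not named in the disclosure; it assumes an unweighted, undirected graph stored as `{node: [neighbours]}`. The closeness routine assumes the normalized form (N - 1) divided by the sum of shortest distances.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search finds all shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


def closeness_centrality(graph, node):
    """(N - 1) / sum of shortest distances; 0.0 if some node is unreachable."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    if total == 0 or len(dist) < len(graph):
        return 0.0
    return (len(graph) - 1) / total


# A hub with three neighbours, one of which leads on to a leaf node.
graph = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}
bc = betweenness_centrality(graph)
hub_closeness = closeness_centrality(graph, "hub")
```

In this toy graph the hub lies on the shortest paths of five node pairs and node `c` on three, so both would exceed a preset value of, say, 2, while the leaf nodes score zero.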

The intermediary centrality is used to further characterize the nodes in the financial transaction graph. Complex nodes, whose intermediary centrality is greater than the preset value, are generally super nodes in the network: they have very large degree, act as "hubs" of the network, and are members of fraud groups with high probability.

In one embodiment, after step S130, the method further includes: taking the nodes with the intermediary centrality smaller than the preset value as common nodes; and outputting the common nodes and archiving them in a normal list. Specifically, common nodes, whose intermediary centrality is smaller than the preset value, are generally isolated nodes in the network that are not connected with other nodes and are usually relatively safe. Such nodes can be output directly and archived in the normal list.
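The split into complex and common nodes can be sketched as a simple threshold test. The scores and the threshold below are made-up values, and placing a node whose score exactly equals the preset value among the common nodes is an assumption, since the text only specifies "greater than" and "smaller than".

```python
def split_by_centrality(scores, threshold):
    """Partition nodes into (complex_nodes, common_nodes) by centrality score."""
    complex_nodes = sorted(n for n, s in scores.items() if s > threshold)
    # Assumption: a score equal to the threshold counts as common.
    common_nodes = sorted(n for n, s in scores.items() if s <= threshold)
    return complex_nodes, common_nodes


scores = {"hub": 5.0, "c": 3.0, "a": 0.0, "b": 0.0, "d": 0.0}
complex_nodes, normal_list = split_by_centrality(scores, threshold=2.0)
```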

S140, based on the enterprise data, performing characteristic analysis on the complex nodes, and judging whether fraud risks exist.

Specifically, the complex nodes obtained through the calculation are subjected to internal analysis and feature statistics according to the various enterprise data. The analysis includes, but is not limited to, common analysis methods based on reputation, business situation and the like, and it is used to judge whether a fraud risk exists.

In one embodiment, after step S140, the method further includes: and when the judgment result shows that the fraud risk does not exist, outputting the complex node and archiving the complex node as a normal list. Specifically, when the result of the judgment is that the fraud risk does not exist, the result is transmitted to a normal list for storage.

S150, when the fraud risk is judged to exist, the complex node is included in the blacklist, and the updated target blacklist is obtained.

Specifically, when the judgment result shows that the node has the fraud risk, the complex node needs to be included in a blacklist, and the obtained updated blacklist is the target blacklist, so that mining of the node with the fraud risk is realized.
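Steps S140 and S150 can be sketched together as below. The disclosure does not specify the feature analysis, so `has_fraud_risk` is a purely hypothetical rule: the profile fields `overdue_count` and `registered_capital` and both cut-off values are invented for illustration and stand in for whatever reputation or business-situation analysis is actually used.

```python
def has_fraud_risk(profile, max_overdue=2, min_registered_capital=100_000):
    """Hypothetical feature rule standing in for the unspecified analysis."""
    return (
        profile["overdue_count"] > max_overdue
        or profile["registered_capital"] < min_registered_capital
    )


def update_blacklist(blacklist, complex_nodes, profiles):
    """Return (target_blacklist, normal_list) after analysing complex nodes."""
    target_blacklist = set(blacklist)
    normal_list = []
    for node in complex_nodes:
        if has_fraud_risk(profiles[node]):
            target_blacklist.add(node)  # S150: include in the blacklist
        else:
            normal_list.append(node)    # no risk found: archive as normal
    return target_blacklist, normal_list


profiles = {
    "hub": {"overdue_count": 5, "registered_capital": 500_000},
    "c": {"overdue_count": 0, "registered_capital": 2_000_000},
}
target_blacklist, normal_list = update_blacklist(
    {"investor_B"}, ["c", "hub"], profiles
)
```

The original blacklist is copied rather than mutated, so the known blacklist and the updated target blacklist remain distinguishable.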

In one embodiment, after step S150, the method further includes: taking the group containing a complex node that belongs to the target blacklist as a target group, performing feature analysis on the target group, and judging whether a fraud risk exists; and when the target group has a fraud risk, treating the target group as a fraud risk group and storing it in a group blacklist. Specifically, consider the clustering coefficient of a node: nodes with a larger clustering coefficient often sit inside a small community, and such a community has a high probability of being a fraud group. In the enterprise graph a community appears as a group, so the group containing a complex node on the target blacklist is also likely to carry fraud risk and needs to be analyzed as well.
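The clustering coefficient mentioned above can be computed per node as the fraction of its neighbour pairs that are themselves connected. The sketch below uses the standard local clustering coefficient on an unweighted, undirected graph stored as `{node: set of neighbours}`; the graph itself is a made-up example.

```python
def clustering_coefficient(graph, node):
    """Local clustering coefficient: linked neighbour pairs / possible pairs."""
    neighbours = sorted(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to connect
    links = 0
    for i in range(k):
        for j in range(i + 1, k):
            if neighbours[j] in graph[neighbours[i]]:
                links += 1
    return 2.0 * links / (k * (k - 1))


# A triangle a-b-c with one extra node d hanging off a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficients = {node: clustering_coefficient(graph, node) for node in graph}
```

Here `b` and `c` sit fully inside the triangle and score 1.0, while `a` scores one third because only one of its three neighbour pairs is connected.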

In one embodiment, after step S150, the method further includes: updating the enterprise graph according to new enterprise data to obtain an updated target enterprise graph; and calculating, based on the target enterprise graph, the association relationship between a new node in the target enterprise graph and the fraud risk group to obtain the fraud tendency value of the new node. Specifically, because new users join the financial institution every day, the fraud tendency of a new user can be calculated in real time from the association relationship between the new user and the blacklisted groups.
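The disclosure does not define how the fraud tendency value is computed. One possible reading, shown here only as an assumption, is the share of a new node's total edge weight that connects it to members of a fraud risk group, giving a value between 0 and 1. The graph layout `{node: {neighbour: weight}}` and all names are hypothetical.

```python
def fraud_tendency(graph, new_node, risk_group):
    """Assumed measure: weighted fraction of new_node's edges that
    lead into the fraud risk group (0.0 when the node has no edges)."""
    edges = graph.get(new_node, {})
    total = sum(edges.values())
    if total == 0:
        return 0.0
    risky = sum(w for neighbour, w in edges.items() if neighbour in risk_group)
    return risky / total


# The new user has a strong link to a blacklisted hub and a weak link elsewhere.
graph = {"new_user": {"hub": 3.0, "firm_2": 1.0}}
risk_group = {"hub", "c"}
tendency = fraud_tendency(graph, "new_user", risk_group)
```

Because the value depends only on the new node's own edges, it can be recomputed whenever the graph is updated without rerunning the centrality step for the whole graph.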

In this embodiment, an enterprise graph is established from enterprise data, and the nodes in the enterprise graph are screened against a known blacklist. Intermediary centrality is calculated for the nodes that do not belong to the blacklist, and the nodes whose intermediary centrality is greater than a preset value are taken as complex nodes and subjected to fraud risk analysis. If a fraud risk exists, the complex node is added to the blacklist to obtain a target blacklist, which achieves the mining of nodes with fraud risk. Meanwhile, fraud risk assessment is carried out on the group containing the complex node, which achieves the mining of groups with fraud risk. All nodes are analyzed comprehensively, and the nodes are classified and screened by computing their intermediary centrality, so that detailed analysis by various methods is applied only to nodes that may carry fraud risk. This greatly improves the efficiency of mining fraud risk nodes while preserving comprehensiveness.

In one embodiment, as shown in FIG. 3, there is provided a fraud risk node mining apparatus 200, which includes a graph establishing module 210, a node labeling module 220, an intermediary calculating module 230, a feature analysis module 240, and a blacklist updating module 250, wherein:

the graph establishing module 210 is configured to receive enterprise data, and establish an enterprise graph according to the enterprise data, where the enterprise graph is formed by a plurality of sub-graphs, each sub-graph is a group, and each sub-graph includes nodes and the relations and weights between the nodes, where the nodes are specifically enterprises and investors;

the node labeling module 220 is configured to label the nodes belonging to the blacklist according to a known blacklist;

the intermediary calculating module 230 is configured to calculate an intermediary centrality of a node not belonging to the blacklist according to the enterprise graph, and use the node with the intermediary centrality greater than a preset value as a complex node;

the feature analysis module 240 is configured to perform feature analysis on the complex node based on the enterprise data, and determine whether a fraud risk exists;

the blacklist updating module 250 is configured to, when it is determined that the fraud risk exists, include the complex node in a blacklist to obtain an updated target blacklist.

In one embodiment, the apparatus further comprises a group risk module, which specifically includes a group analysis unit and a list establishing unit, wherein: the group analysis unit is used for taking the group containing a complex node that belongs to the target blacklist as a target group, performing feature analysis on the target group, and judging whether a fraud risk exists; and the list establishing unit is used for, when the target group has a fraud risk, treating the target group as a fraud risk group and storing it in a group blacklist.

In one embodiment, the apparatus further includes a fraud tendency calculation module, which specifically includes a graph updating unit and a fraud tendency calculation unit, wherein: the graph updating unit is used for updating the enterprise graph according to new enterprise data to obtain an updated target enterprise graph; and the fraud tendency calculation unit is used for calculating, based on the target enterprise graph, the association relationship between a new node in the target enterprise graph and the fraud risk group to obtain the fraud tendency value of the new node.

In one embodiment, a device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 4. The device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the device is configured to provide computing and control capabilities. The memory of the device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the device is used for storing configuration templates and also can be used for storing target webpage data. The network interface of the device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of mining fraud risk nodes.

Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation on the devices to which the present application applies, and that a particular device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In an embodiment, there is further provided a storage medium storing a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method according to the preceding embodiments. The computer may be part of the fraud risk node mining apparatus mentioned above.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented in program code executable by a computing device, such that they may be stored on a computer storage medium (ROM/RAM, magnetic disks, optical disks) and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The foregoing is a more detailed description of the present invention that is presented in conjunction with specific embodiments, and the practice of the invention is not to be considered limited to those descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A method for mining fraud risk nodes is characterized by comprising the following steps:

receiving enterprise data, and establishing an enterprise graph according to the enterprise data, wherein the enterprise graph is composed of a plurality of enterprise groups, each enterprise group is a sub-graph of the graph, the graph comprises nodes and the relations and weights among the nodes, and the nodes are enterprises and investors;

marking the nodes belonging to the blacklist according to a known blacklist;

calculating the intermediary centrality of the nodes which do not belong to the blacklist according to the enterprise graph, and taking the nodes with the intermediary centrality larger than a preset value as complex nodes;

performing characteristic analysis on the complex node based on the enterprise data to judge whether fraud risk exists;

and when the fraud risk is judged to exist, the complex node is included in the blacklist to obtain an updated target blacklist.

2. The method of claim 1, wherein after the step of calculating the intermediary centrality of the nodes not belonging to the blacklist according to the enterprise graph and taking the nodes with the intermediary centrality larger than a preset value as complex nodes, the method further comprises:

taking the node with the centrality of the intermediary being smaller than the preset value as a common node;

and outputting the common nodes and archiving the common nodes as a normal list.

3. The method of claim 1, wherein after the step of performing feature analysis on the complex node based on the enterprise data to judge whether a fraud risk exists, the method further comprises:

and when the judgment result shows that the fraud risk does not exist, outputting the complex node and archiving the complex node as a normal list.

4. The method of claim 1, wherein after the step of including the complex node in the blacklist when a fraud risk is judged to exist to obtain an updated target blacklist, the method further comprises:

taking the group containing a complex node that belongs to the target blacklist as a target group, performing feature analysis on the target group, and judging whether a fraud risk exists;

when the target group has a fraud risk, treating the target group as a fraud risk group and storing it in a group blacklist.

5. The method of claim 4, wherein after the step of including the complex node in the blacklist when a fraud risk is judged to exist to obtain an updated target blacklist, the method further comprises:

updating the enterprise map according to the new enterprise data to obtain an updated target enterprise map;

and calculating the incidence relation between the new node and the fraud risk group in the target enterprise map based on the target enterprise map to obtain the fraud tendency value of the new node.

6. A fraud risk node mining apparatus, characterized by comprising a graph establishing module, a node marking module, an intermediary calculation module, a feature analysis module and a blacklist updating module, wherein:

the graph establishing module is used for receiving enterprise data and establishing an enterprise graph according to the enterprise data, wherein the enterprise graph is composed of a plurality of sub-graphs, each sub-graph is a group, each sub-graph comprises nodes and the relations and weights among the nodes, and the nodes are specifically enterprises and investors;

the node marking module is used for marking the nodes belonging to the blacklist according to a known blacklist;

the intermediary calculation module is used for calculating the intermediary centrality of the nodes which do not belong to the blacklist according to the enterprise graph, and taking the nodes with the intermediary centrality larger than a preset value as complex nodes;

the characteristic analysis module is used for carrying out characteristic analysis on the complex node based on the enterprise data and judging whether fraud risks exist or not;

and the blacklist updating module is used for bringing the complex node into the blacklist to obtain an updated target blacklist when the fraud risk is judged to exist.

7. The apparatus of claim 6, further comprising a group risk module, which specifically comprises a group analysis unit and a list establishing unit, wherein:

the group analysis unit is used for taking the group containing a complex node that belongs to the target blacklist as a target group, performing feature analysis on the target group, and judging whether a fraud risk exists;

the list establishing unit is used for, when the target group has a fraud risk, treating the target group as a fraud risk group and storing it in a group blacklist.

8. The apparatus according to claim 7, further comprising a fraud tendency calculation module, which specifically comprises a graph updating unit and a fraud tendency calculation unit, wherein:

the graph updating unit is used for updating the enterprise graph according to new enterprise data to obtain an updated target enterprise graph;

and the fraud tendency calculation unit is used for calculating the incidence relation between the new node in the target enterprise map and the fraud risk group based on the target enterprise map to obtain the fraud tendency value of the new node.

9. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 5 are implemented when the computer program is executed by the processor.

10. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, realizing the steps of the method of any one of claims 1 to 5.