CN116957329A

CN116957329A - Risk identification method, risk identification device, risk identification equipment and readable storage medium

Info

Publication number: CN116957329A
Application number: CN202310868025.2A
Authority: CN
Inventors: 许小龙; 刘腾飞; 张天翼; 王维强
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-07-14
Filing date: 2023-07-14
Publication date: 2023-10-27

Abstract

The specification discloses a risk identification method, apparatus, device, and readable storage medium. The associated business objects of a business object to be identified are selected from the candidate business objects according to the similarity between the features of the business object to be identified and the features of each candidate business object. A target topological graph is then constructed, with the business object to be identified and each associated object as nodes and the business relationships between them as edges. The target topological graph and the features of each node are input into a risk identification model to obtain the predicted risk type of the business object to be identified. Because feature similarity is used to screen out the associated objects that matter most for predicting the risk type of the business object to be identified, the accuracy of risk identification and the security of private data are improved, the input dimension of the risk identification model is reduced, and the consumption of computing resources is reduced.

Description

Risk identification method, risk identification device, risk identification equipment and readable storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a risk identification method, apparatus, device, and readable storage medium.

Background

With growing public attention to private data and the rapid development of internet technology, online business has expanded rapidly and attracted wide attention. Through a business platform, a user can conduct online business with other business objects. However, if the user conducts online business with a business object that carries business risk, the user's online business may be affected. The business platform can therefore perform risk identification on business objects in advance, so that business objects with business risk are identified in time and the user receives a risk warning.

Based on this, the present specification provides a risk identification method.

Disclosure of Invention

The present specification provides a risk identification method, apparatus, device, and readable storage medium, to partially solve the above-mentioned problems in the prior art.

The technical scheme adopted in the specification is as follows:

The specification provides a risk identification method, comprising the following steps:

in response to a risk identification request, acquiring a business object to be identified and each candidate business object;

determining the features of the business object to be identified and the features of each candidate business object, respectively;

screening out each associated object of the business object to be identified from the candidate business objects according to the similarity between the features of the business object to be identified and the features of each candidate business object;

acquiring the business relationships between the business object to be identified and each associated object, and constructing a target topological graph with the business object to be identified and each associated object as nodes and the business relationships as edges; and

inputting the target topological graph and the features of each node contained in it into a pre-trained risk identification model, and obtaining the predicted risk type of the business object to be identified output by the risk identification model.

The present specification provides a risk identification apparatus comprising:

an acquisition module, configured to acquire, in response to a risk identification request, a business object to be identified and each candidate business object;

a feature determination module, configured to determine the features of the business object to be identified and the features of each candidate business object, respectively;

an associated object screening module, configured to screen out each associated object of the business object to be identified from the candidate business objects according to the similarity between the features of the business object to be identified and the features of each candidate business object;

a target topological graph construction module, configured to acquire the business relationships between the business object to be identified and each associated object, and to construct a target topological graph with the business object to be identified and each associated object as nodes and the business relationships as edges; and

a risk identification module, configured to input the target topological graph and the features of each node contained in it into a pre-trained risk identification model, and to obtain the predicted risk type of the business object to be identified output by the risk identification model.

The present specification provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the risk identification method described above.

The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the risk identification method described above when executing the program.

At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:

In the risk identification method provided by this specification, each associated object of the business object to be identified is determined from the candidate business objects according to the similarity between the features of the business object to be identified and the features of each candidate business object. A target topological graph is then constructed with the business object to be identified and each associated object as nodes and the business relationships between them as edges, and the target topological graph and the features of each node are input into a risk identification model to obtain the predicted risk type of the business object to be identified. Because feature similarity is used to screen out the associated objects that matter most for predicting the risk type of the business object to be identified, the accuracy of risk identification is improved, the input dimension of the risk identification model is reduced, and the consumption of computing resources is reduced.

Drawings

The accompanying drawings, which are included to provide a further understanding of the specification, illustrate exemplary embodiments of the specification and, together with their description, serve to explain the specification; they are not intended to limit the specification unduly.

In the drawings:

FIG. 1 is a schematic flow chart of a risk identification method in the present specification;

FIG. 2 is a schematic diagram of a topological graph provided in the present specification;

FIG. 3 is a schematic diagram of a topological graph provided in the present specification;

FIG. 4 is a schematic flow chart of a risk identification method in the present specification;

FIG. 5 is a schematic flow chart of a risk identification method in the present specification;

FIG. 6 is a schematic diagram of a risk identification apparatus provided in the present specification;

FIG. 7 is a schematic diagram of the electronic device corresponding to FIG. 1 provided in the present specification.

Detailed Description

To make the objectives, technical solutions, and advantages of the present specification clearer, the technical solutions of the present specification are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present specification without creative effort fall within the scope of the present disclosure.

In addition, all actions of acquiring signals, information, or data in this specification are performed in compliance with the applicable local data protection regulations and policies and with the authorization of the owner of the corresponding device.

Currently, with the rapid development of internet technology, online transaction business is flourishing. Through a business platform, a user can conduct online business with other business objects. However, if the user conducts online business with a business object that carries business risk, the user may encounter risks such as fraud. The business platform can therefore perform risk identification on business objects in advance, so that business objects with business risk are identified in time and the user receives a risk warning. To identify such business objects in time, a risk identification model can be trained to determine whether a business object is risky or which type of risk it carries.

In risk identification scenarios, a common earlier approach trained a risk identification model using business objects as training samples and their risk types as labels. However, that approach cannot exploit the business relationships between different business objects and may therefore generalize poorly and have low accuracy. To exploit these relationships, a graph neural network (GNN) can be used: business objects are taken as nodes and the business relationships between them as edges to construct a topological graph over multiple business objects, node features are aggregated over the topological graph and refined iteratively, and the risk type of each node is identified from its resulting features.

In practice, a topological graph whose nodes are business objects may contain a node that is connected by edges to a very large number of neighbor nodes; such a node is called a hot spot. When the topological graph contains a hot spot, determining the hot spot's features requires aggregating the features of its many neighbor nodes, which consumes a large amount of computing resources. As a result, training the risk identification model on such topological graphs consumes more computing resources and training time, and identifying the risk type of a business object to be identified incurs higher latency. Therefore, before the topological graph is input into the risk identification model, the neighbor nodes of the hot spot are sampled: for example, if a hot spot has 1000 neighbor nodes, only 50 of them may be selected to participate in determining the hot spot's features. The neighbor nodes may be sampled by random sampling, random walks, or the like. Whatever sampling method is used, however, reducing the number of the hot spot's neighbor nodes causes some loss of information in the hot spot's features, and the extent of that loss cannot be determined. How to reduce the information loss caused by reducing the number of neighbor nodes is therefore a problem to be solved.
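The following minimal Python sketch illustrates the conventional hot-spot sampling described above, using uniform random sampling without replacement to keep 50 of 1000 neighbors. The function name `sample_neighbors` and the adjacency-list representation are illustrative assumptions, not part of the specification.

```python
import random
from typing import Dict, List, Optional


def sample_neighbors(
    adjacency: Dict[int, List[int]],
    node: int,
    max_neighbors: int,
    seed: Optional[int] = None,
) -> List[int]:
    """Uniformly sample at most `max_neighbors` neighbors of `node`.

    Nodes with few neighbors keep all of them; hot spots are down-sampled
    without replacement, so every dropped neighbor's features are lost.
    """
    neighbors = adjacency.get(node, [])
    if len(neighbors) <= max_neighbors:
        return list(neighbors)
    rng = random.Random(seed)  # seeded generator for reproducible sampling
    return rng.sample(neighbors, max_neighbors)


# Hypothetical hot spot: node 0 with 1000 neighbors, reduced to 50.
adjacency = {0: list(range(1, 1001)), 1: [0]}
kept = sample_neighbors(adjacency, 0, max_neighbors=50, seed=7)
print(len(kept))  # 50
```

The 950 discarded neighbors are chosen without regard to their features, which is why the degree of information loss cannot be controlled under this kind of sampling.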

Based on the above, this specification provides a risk identification method that uses feature similarity to screen out the associated objects that are most important for predicting the risk type of the business object to be identified and constructs a target topological graph from them, thereby improving the accuracy of risk identification and the security of private data, reducing the input dimension of the risk identification model, and reducing the consumption of computing resources.

The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.

Fig. 1 is a schematic flow chart of a risk identification method provided in the present specification.

S100: In response to a risk identification request, acquire the business object to be identified and each candidate business object.

An embodiment of this specification provides a risk identification method, which may be executed by an electronic device such as a server on which the risk identification model is deployed. In addition, the electronic device that pre-trains the risk identification model used for identifying the risk of the business object to be identified and the electronic device that executes the risk identification method may be the same or different; this specification does not limit this.

In computer technology, a topological graph can represent the relationships among nodes. Through feature aggregation, the information of each node's neighbor nodes is aggregated and combined with the node's own information to determine the node's features, so topological graphs can be applied in various business scenarios such as user risk control. For example, a topological graph can represent fund-flow relationships between accounts, device-sharing relationships between accounts, relationships between account holders, and the like.

In this specification, a business object may be a user or a user's account. A topological graph can be constructed from business objects, with different nodes corresponding to different business objects and the edges between nodes representing business relationships between business objects, such as fund transactions between accounts or the sharing of operating devices between accounts. In the embodiments of this specification, the technical solution is described by taking a user's account as the business object and a fund transaction relationship as the business relationship.

When the server receives a risk identification request, it can determine the business object to be identified based on the request. The business object to be identified may be a user or account that requires risk identification. The server can also acquire attribute information of the business object to be identified, such as the account's fund flows, the bank at which the account was opened, and information about the account holder.

In addition, each candidate business object is also acquired in this step. A candidate business object may have a direct business relationship with the business object to be identified, an indirect one, or both. That is, a direct and/or indirect business relationship exists between the business object to be identified and each candidate business object.

For example, in the topological graph shown in FIG. 2, take node 1. Node 2 is directly connected to node 1 by an edge, indicating a direct business relationship between the business objects corresponding to node 1 and node 2. Node 1 and node 5 are not directly connected by an edge but are connected through the intermediate node 3; in this specification, an indirect business relationship exists between the business objects corresponding to node 1 and node 5. Two paths exist between node 1 and node 6, one directly through an edge and the other through node 3, indicating that the business objects corresponding to node 1 and node 6 have both a direct and an indirect business relationship. No edge or intermediate node connects node 1 and node 9, indicating that no business relationship exists between them.

The candidate business objects may be acquired in either of two ways. One is to traverse the business relationships of the business object to be identified and take the business objects having a direct and/or indirect business relationship with it as the candidate business objects. The other is to construct an original topological graph from multiple business objects, including the business object to be identified, and the business relationships among them, with the business objects as nodes and the business relationships as edges, and then determine the candidate business objects from the connections among the nodes: taking the node corresponding to the business object to be identified as the central node, the business objects corresponding to the nodes whose hop count from the central node does not exceed a preset hop-count threshold are taken as the candidate business objects.
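The second acquisition method above amounts to a breadth-first search bounded by the hop-count threshold. The sketch below assumes an undirected graph held as an adjacency map; the function names and the edge list (loosely following the FIG. 2 description) are hypothetical illustrations.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from (u, v) business-relationship edges."""
    adjacency: Dict[int, Set[int]] = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def candidate_objects(
    adjacency: Dict[int, Set[int]], center: int, max_hops: int
) -> Dict[int, int]:
    """Return {node: hop count} for every node within `max_hops` of `center`.

    BFS records the shortest hop count, so a node with both a direct and an
    indirect relationship (node 6 below) is reported at hop 1.
    """
    hops = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop-count threshold
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[center]  # the object to be identified is not its own candidate
    return hops


# Illustrative edges loosely following FIG. 2; node 9 has no edges.
edges = [(1, 2), (1, 3), (1, 6), (3, 5), (3, 6)]
adjacency = build_adjacency(edges)
adjacency.setdefault(9, set())
print(sorted(candidate_objects(adjacency, center=1, max_hops=2).items()))
# [(2, 1), (3, 1), (5, 2), (6, 1)]
```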

S102: Determine the features of the business object to be identified and the features of each candidate business object, respectively.

The features of the business object to be identified can be determined from the information of the business object to be identified. Likewise, the features of each candidate business object can be determined from that candidate business object's information.

The information of a business object may be attribute information of the business object itself, or data of businesses related to the business object. For example, when the business object is an account, its information may be attributes of the account itself, such as the bank at which the account was opened, the type of electronic device using the account, the identity information of the account holder, and the registered location of the account. It may also be data specific to businesses related to the account, such as corporate business, personal wealth-management business, social security accounts, and payroll accounts.

The features of the business object to be identified may be determined from its information by any existing method, such as manually constructing structured features or extracting features through a pre-trained neural network; this specification does not limit this. The features of each candidate business object are determined in the same way. In addition, in the embodiments of this specification, the features of the business object to be identified and of each candidate business object are fixed before being input into the risk identification model.

S104: Screen out each associated object of the business object to be identified from the candidate business objects according to the similarity between the features of the business object to be identified and the features of each candidate business object.

Specifically, for each candidate business object, the similarity between the features of the business object to be identified and the features of that candidate business object is determined. Any existing measure of feature similarity may be used, such as one based on the Jensen-Shannon (JS) divergence, the Kullback-Leibler (KL) divergence, or the L1 or L2 distance; this specification does not limit this.
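The measures named above are distances or divergences (smaller means more alike), so some mapping to a similarity is needed. The sketch below assumes the mapping 1 / (1 + d), which is an illustrative choice not stated in the specification, and assumes features are non-negative when KL or JS is used, since those treat the normalized feature vector as a probability distribution.

```python
import math
from typing import Callable, Dict, List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _to_distribution(v: Sequence[float]) -> List[float]:
    """Normalize a non-negative feature vector so it sums to 1."""
    total = sum(v)
    return [x / total for x in v]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); asymmetric, and undefined if q is 0 where p > 0."""
    p, q = _to_distribution(p), _to_distribution(q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    p, q = _to_distribution(p), _to_distribution(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


DISTANCES: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "l1": l1_distance,
    "l2": l2_distance,
    "kl": kl_divergence,
    "js": js_divergence,
}


def similarity(a: Sequence[float], b: Sequence[float], metric: str = "l2") -> float:
    """Map a distance or divergence d >= 0 to a similarity in (0, 1] via 1 / (1 + d)."""
    return 1.0 / (1.0 + DISTANCES[metric](a, b))


# Hypothetical feature vectors for the object to be identified and two candidates.
target = [0.2, 0.5, 0.3]
candidates = {2: [0.2, 0.5, 0.3], 5: [0.6, 0.1, 0.3]}
scores = {c: similarity(target, f, metric="js") for c, f in candidates.items()}
```

JS is often preferred over KL here because it is symmetric and always finite, so the similarity between two objects does not depend on which one is treated as the reference.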

Each associated object of the business object to be identified is then screened out based on these similarities.

Because the similarity characterizes the distance between the features of the business object to be identified and the features of the candidate business object in the feature space, a higher similarity means that the information of the two objects shares more identical or similar parts, and a lower similarity means that it shares fewer. Generally, several candidate business objects with high similarity are taken as the associated objects of the business object to be identified; however, depending on the risk identification scenario, several candidate business objects with low similarity may instead be selected as the associated objects.

Optionally, depending on the risk identification scenario, the associated objects may be candidate business objects whose features are similar to those of the business object to be identified, or candidate business objects whose features are dissimilar. A specific scheme may be as follows:

First, the risk identification task is determined according to the risk identification request.

In actual risk identification scenarios there may be different risk identification tasks, such as determining whether the business object to be identified is a gang fraud account or an independent money laundering account. Different risk identification tasks may correspond to different schemes for screening associated objects.

For example, consider a task that determines whether the business object to be identified is a gang fraud account. Multiple fraud accounts belonging to the same gang may transfer funds among themselves frequently, so if the business object to be identified is one of the gang's fraud accounts, the candidate business objects that have business relationships with it may also be other fraud accounts of the same gang. Fraud accounts of the same gang generally share common characteristics, that is, their features are highly similar. Therefore, for this task, several candidate business objects with high similarity may be taken as the associated objects of the business object to be identified.

As another example, consider a task that identifies money laundering accounts. Because money laundering may involve many types of fund transfers, a money laundering account generally exchanges funds with normal personal accounts, whose information may differ greatly from that of the money laundering account. The features of a money laundering account and of the personal accounts exchanging funds with it are therefore weakly similar, so for this task several candidate business objects with low similarity may be taken as the associated objects of the business object to be identified.

Second, when the risk identification task is a first task, the candidate business objects whose similarity is higher than a preset first similarity threshold are taken as the associated objects of the business object to be identified.

Specifically, the first task may be to determine whether a risky transaction involving the business object to be identified is a gang-based illicit transaction, such as gang fraud or gang gambling. In the first task, the business object to be identified needs to aggregate the features of other business objects whose features are highly similar to its own, in order to enrich its features and thereby improve the accuracy of risk identification. Therefore, when the risk identification task is determined to be the first task based on the risk identification request, the candidate business objects with similarity higher than the preset first similarity threshold are taken as associated objects. After the business object to be identified and its associated objects are input into the risk identification model in subsequent steps, the model can then aggregate part of the information of these highly similar associated objects into the business object to be identified.

Third, when the risk identification task is a second task, the candidate business objects whose similarity is lower than a preset second similarity threshold are taken as the associated objects of the business object to be identified.

Specifically, the second task may be to determine whether a risky transaction involving the business object to be identified is an independent illicit transaction, such as independent money laundering or independent lending. In the second task, the business object to be identified needs to aggregate the features of other business objects with low feature similarity; because their features differ from its own, aggregating them brings a larger information gain, enriching the features of the business object to be identified and thereby improving the accuracy of risk identification. Therefore, when the risk identification task is determined to be the second task based on the risk identification request, the candidate business objects with similarity lower than the preset second similarity threshold are taken as associated objects, so that in subsequent steps the risk identification model can aggregate part of the information of these weakly similar associated objects into the business object to be identified.
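The three-step, task-dependent screening above can be sketched as follows. The task labels and the default threshold values are hypothetical placeholders; the specification only states that both thresholds are preset.

```python
from typing import Dict, List

# Hypothetical task labels for the first and second tasks.
FIRST_TASK = "gang_illicit"         # e.g. gang fraud, gang gambling
SECOND_TASK = "independent_illicit"  # e.g. independent money laundering


def screen_associated_objects(
    similarities: Dict[int, float],
    task: str,
    first_threshold: float = 0.8,   # hypothetical preset first similarity threshold
    second_threshold: float = 0.3,  # hypothetical preset second similarity threshold
) -> List[int]:
    """Select associated objects from {candidate id: similarity} according to the task."""
    if task == FIRST_TASK:
        # Gang-style risk: keep candidates whose features resemble the target's.
        selected = [c for c, s in similarities.items() if s > first_threshold]
    elif task == SECOND_TASK:
        # Independent risk: keep dissimilar candidates for a larger information gain.
        selected = [c for c, s in similarities.items() if s < second_threshold]
    else:
        raise ValueError(f"unknown risk identification task: {task!r}")
    return sorted(selected)


sims = {2: 0.95, 3: 0.55, 5: 0.10, 6: 0.85}
print(screen_associated_objects(sims, FIRST_TASK))   # [2, 6]
print(screen_associated_objects(sims, SECOND_TASK))  # [5]
```

Candidates with intermediate similarity (node 3 here) are excluded under both tasks, which is how the screening shrinks the graph regardless of the task.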

S106: Acquire the business relationships between the business object to be identified and each associated object, and construct a target topological graph with the business object to be identified and each associated object as nodes and the business relationships as edges.

The business relationship between the business object to be identified and an associated object may be a fund transaction relationship between accounts, a device-sharing relationship between accounts, a relationship between account holders, and the like. Different nodes in the target topological graph correspond to different business objects (the business object to be identified and each associated object), and the edges between nodes correspond to the business relationships between those objects.

In addition, the target topological graph contains not only the edges between the node corresponding to the business object to be identified and the nodes corresponding to the associated objects, but also the edges among the nodes corresponding to the associated objects. As shown in FIG. 3, where node 1 corresponds to the business object to be identified and nodes 2 to 6 each correspond to an associated object, the graph includes not only the edges between node 1 and nodes 2 to 6 but also edges among the associated objects' nodes, such as the edge between node 3 and node 6 and the edge between node 3 and node 5.
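Keeping every business-relationship edge whose two endpoints are both retained, including edges among associated objects, amounts to taking the induced subgraph on the retained nodes. The sketch below assumes undirected edges as (u, v) pairs; `build_target_graph` and the edge list (following the FIG. 3 description, plus one hypothetical edge to a non-associated node 7) are illustrative.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def build_target_graph(
    target: int, associated: Iterable[int], relations: Iterable[Edge]
) -> Tuple[List[int], List[Edge]]:
    """Induced subgraph on the object to be identified plus its associated objects."""
    nodes: Set[int] = {target, *associated}
    edges = {
        (min(u, v), max(u, v))  # canonical undirected form, deduplicated by the set
        for u, v in relations
        if u in nodes and v in nodes and u != v
    }
    return sorted(nodes), sorted(edges)


relations = [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (3, 6), (5, 3), (5, 7)]
nodes, edges = build_target_graph(1, [2, 3, 4, 5, 6], relations)
print(edges)  # [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (3, 5), (3, 6)]
```

The edge (5, 7) is dropped because node 7 was not selected as an associated object.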

S108: Input the target topological graph and the features of each node contained in it into the pre-trained risk identification model, and obtain the predicted risk type of the business object to be identified output by the risk identification model.

The risk identification model identifies the transaction risk of the business object to be identified, which amounts to determining whether the business executed by the business object to be identified includes illicit transactions. Because the input of the model includes the target topological graph and the features of each node in it, the risk identification model may be built on a graph neural network (GNN). It includes at least a neural network layer for feature aggregation and a neural network layer that performs risk identification (classification) based on the aggregated features of the business object to be identified. The GNN technique underlying the model may be any existing graph neural network technique, such as graph convolutional neural networks (GCNNs) or graph attention networks (GATs); this specification does not limit this.
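To show the two kinds of layers the model needs, the sketch below runs one round of mean aggregation over each node and its neighbors (a simplified, unnormalized variant of GCN-style aggregation with self-loops) followed by a linear layer with a softmax over risk types. The weights are hand-set and untrained, and all names and labels are hypothetical; a real GCN or GAT would use learned weights, degree normalization or attention, nonlinearities, and usually several layers.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_aggregate(
    features: Dict[int, Vector], edges: Sequence[Tuple[int, int]]
) -> Dict[int, Vector]:
    """One aggregation round: each node averages its own and its neighbors' features."""
    neighborhoods: Dict[int, List[int]] = {n: [n] for n in features}  # self-loops
    for u, v in edges:
        neighborhoods[u].append(v)
        neighborhoods[v].append(u)
    dim = len(next(iter(features.values())))
    return {
        n: [sum(features[m][d] for m in members) / len(members) for d in range(dim)]
        for n, members in neighborhoods.items()
    }


def softmax(logits: Vector) -> Vector:
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_risk_type(
    features: Dict[int, Vector],
    edges: Sequence[Tuple[int, int]],
    target: int,
    weights: List[Vector],
    labels: List[str],
) -> Tuple[str, Vector]:
    """Aggregate, apply a linear classification layer, and pick the most likely label."""
    h = mean_aggregate(features, edges)[target]
    logits = [sum(w * x for w, x in zip(row, h)) for row in weights]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical 2-D node features and hand-set (untrained) weights.
features = {1: [1.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 1.0]}
edges = [(1, 2), (1, 3)]
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
labels = ["no_risk", "fraud", "money_laundering"]
label, probs = predict_risk_type(features, edges, 1, weights, labels)
print(label)  # fraud
```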

The risk identification model may be a binary or a multi-class classification model, depending on which type of output layer it uses for risk identification (classification), for example a sigmoid (or two-way softmax) output for a binary model and a softmax output for a multi-class model; this specification does not limit this.

Accordingly, the predicted risk type output by the risk identification model may simply indicate whether or not the business object to be identified is at risk, or it may be either "no risk" or a specific risk type (fraud, money laundering, gambling, etc.).

Optionally, if the target topological graph is still large before it and the features of its nodes are input into the risk identification model in S108, its scale may be further reduced as follows: acquire the strength index of the business relationship between the business object to be identified and each associated object; determine the edge weights between the nodes of the target topological graph from these strength indexes; and delete the edges whose edge weight is smaller than a preset weight threshold to obtain an updated target topological graph.

The strength index of the business relationship between the business object to be identified and an associated object can be determined from the business information of the business executed between them. Examples are the amount of funds transferred, or the number of transfers within a certain period. Take fund transfers as the business and the transferred amount as the business information: the larger the amount transferred between the business object to be identified and the associated object, the stronger the strength index, and the smaller the amount, the weaker the index. This specification does not limit the statistical method used to derive the strength index from the business information. It may be an average, a weighted average, a variance, etc.
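The statistics listed above can be computed with Python's standard `statistics` module. The sketch below is a minimal example that assumes the business information for one associated object is a list of transfer amounts within a time window. The `method` names and the optional per-transfer `weights` (for example, recency weights) are hypothetical choices.

```python
from statistics import mean, pvariance
from typing import List, Optional

def strength_index(amounts: List[float], method: str = "mean",
                   weights: Optional[List[float]] = None) -> float:
    """Summarise the transfers between the object to be identified and ONE
    associated object (e.g. amounts within a time window) as a strength index."""
    if not amounts:
        return 0.0                          # no business between the two objects
    if method == "mean":
        return float(mean(amounts))
    if method == "weighted_mean":
        if weights is None or len(weights) != len(amounts):
            raise ValueError("weights must have one entry per amount")
        return sum(a * w for a, w in zip(amounts, weights)) / sum(weights)
    if method == "variance":
        return float(pvariance(amounts))    # population variance of the amounts
    if method == "count":
        return float(len(amounts))          # number of transfers in the window
    raise ValueError(f"unknown method: {method}")
```

For transfers of 100 and 300, the mean gives 200, and a weighted mean with weights 1 and 3 gives 250.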

There are two ways to determine the edge weights between the nodes of the target topological graph from these strength indexes. The strength index of the business relationship between the business object to be identified and an associated object may be used directly as the weight of the edge between their two nodes. Alternatively, the strength index may first be normalized, and the normalized value used as the edge weight.
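The specification does not say which normalization to use. The sketch below assumes division by the largest strength index, which keeps the ordering of the indexes and maps non-negative values into [0, 1]. The node label `"target"` is a hypothetical name for the node of the business object to be identified.

```python
from typing import Dict, Tuple

def edge_weights_from_strength(strength: Dict[str, float], target: str = "target",
                               normalize: bool = True) -> Dict[Tuple[str, str], float]:
    """Use each strength index as the weight of the (target, associated object) edge,
    optionally dividing by the largest index (max-scaling, an assumed normalization)."""
    scale = max(strength.values(), default=0.0) if normalize else 1.0
    if scale <= 0.0:
        scale = 1.0                         # avoid division by zero
    return {(target, obj): s / scale for obj, s in strength.items()}
```

With strength indexes of 200 and 50, the normalized edge weights are 1.0 and 0.25. Passing `normalize=False` uses the raw indexes directly, as in the first option above.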

The edge weights represent how close the business relationships between nodes are. The closer the relationship, the larger the weight of the edge between the nodes; the looser the relationship, the smaller the weight. The reason is that, in an actual risk-control scenario, a normal personal account may have only a few (e.g., one) fund transactions with a fraudulent account. Compared with the other personal accounts connected to the fraudulent account by edges, this suggests the normal account is not a main target of the fraud. The features of such a normal account can therefore be ignored during feature aggregation. This reduces the computing resources consumed by feature aggregation to some extent, without ignoring the features of too many important neighbor nodes.

In this case, in S108 the updated target topological graph and the features of each node it contains are used as the input of the risk identification model, and the model outputs the predicted risk type of the business object to be identified.

It should be noted that, when the edge weights are used to reduce the target topological graph further, the edges whose weights are below the preset weight threshold are deleted. In addition, any nodes left isolated after edge deletion may also be deleted. An isolated node is one that no other node in the target topological graph connects to through an edge. In general, however, the node corresponding to the business object to be identified is not among the nodes deleted during this reduction.
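The isolated-node cleanup described above can be sketched as follows. It keeps the node of the business object to be identified even if it has lost all its edges. The node labels are hypothetical.

```python
from typing import Iterable, Set, Tuple

def drop_isolated_nodes(nodes: Iterable[str], edges: Iterable[Tuple[str, str]],
                        target: str) -> Set[str]:
    """Keep nodes that still touch at least one edge after pruning. The node of the
    business object to be identified is always kept, even if it became isolated."""
    connected = {n for edge in edges for n in edge}
    return {n for n in nodes if n in connected or n == target}
```

For instance, if only the edge `("target", "a")` survives pruning, node `"b"` is dropped, and `"target"` and `"a"` remain.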

In the risk identification method provided by this specification, each associated object of the business object to be identified is selected from the candidate business objects according to the similarity between the features of the business object to be identified and the features of each candidate business object. The business object to be identified and each associated object are then taken as nodes, and the business relationships between them as edges, to construct a target topological graph. The target topological graph and the features of each node are taken as input, and the risk identification model outputs the predicted risk type of the business object to be identified.

In this way, feature similarity selects the associated objects that matter most for predicting the risk type of the business object to be identified, and only these objects are used to construct the target topological graph. This improves the accuracy of risk identification and the security of private data. It also reduces the input dimension of the risk identification model and the consumption of computing resources.

In an optional embodiment of this specification, step S100 of fig. 1 can be carried out as follows. A candidate business object may have a direct or indirect business relationship with the business object to be identified. The candidate business objects can therefore be determined directly from an original topological graph containing the business object to be identified, according to the business relationships between nodes represented in that graph. As shown in fig. 4, the scheme is as follows:

S200: Obtain an original topological graph that contains the node corresponding to the business object to be identified.

Besides the node corresponding to the business object to be identified, the original topological graph contains nodes corresponding to other business objects. These objects may have a direct business relationship, an indirect business relationship, or no business relationship with the business object to be identified. Feature aggregation could be achieved by inputting the original topological graph directly into the risk identification model, so that the features of the neighbor nodes are aggregated into the features of the node corresponding to the business object to be identified. However, doing so would consume excessive computing resources and cause high identification latency, so this approach cannot be applied directly in a practical risk identification scenario.

For this reason, the scheme shown in fig. 4 partitions a sub-topological graph containing the node corresponding to the business object to be identified out of the original topological graph. This reduces the scale of the risk identification model's input.

S202: Partition the original topological graph to obtain a sub-topological graph containing the node corresponding to the business object to be identified.

Specifically, an original adjacency matrix can be determined from the edges contained in the original topological graph. This matrix records which nodes in the original topological graph are connected by edges. Locating the node corresponding to the business object to be identified in the original adjacency matrix reveals the candidate business objects that have a direct and/or indirect business relationship with it.
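The lookup described above is a minimal sketch here. It assumes nodes are indexed 0..n-1, business relationships are undirected, and the adjacency matrix is a list of 0/1 rows. It builds the matrix from an edge list, then follows matrix rows outward from the target node to separate direct relationships (one edge) from indirect ones (reachable through other nodes). The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

def adjacency_matrix(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Undirected 0/1 adjacency matrix: a[i][j] == 1 iff nodes i and j share an edge."""
    a = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a

def related_nodes(a: List[List[int]], center: int) -> Tuple[Set[int], Set[int]]:
    """Split the nodes reachable from `center` into direct (one edge away) and
    indirect (reachable only through other nodes) business relationships."""
    direct = {j for j, v in enumerate(a[center]) if v and j != center}
    seen, frontier = {center} | direct, set(direct)
    while frontier:                         # breadth-first expansion over matrix rows
        frontier = {k for j in frontier for k, v in enumerate(a[j]) if v and k not in seen}
        seen |= frontier
    return direct, seen - direct - {center}

# Node 0 is the object to be identified; nodes 3 and 4 are unrelated to it.
a = adjacency_matrix(5, [(0, 1), (1, 2), (3, 4)])
direct, indirect = related_nodes(a, center=0)
```

Here node 1 is a direct relation and node 2 an indirect one. Nodes 3 and 4 have no business relationship with node 0 and are not candidates.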

Besides using the original adjacency matrix, the candidate business objects can also be obtained by partitioning the original topological graph into the sub-topological graph. Specifically, the node corresponding to the business object to be identified is taken as the central node. Given a preset hop-count threshold, the business objects corresponding to the nodes whose hop count from the central node does not exceed the threshold are taken as candidate business objects. The preset hop-count threshold may be a fixed value set in advance from prior experience, or may be adjusted flexibly for different risk identification tasks, which is not limited in this specification.
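Partitioning by a hop-count threshold is essentially a breadth-first search that stops expanding at the threshold. The sketch below assumes the original topological graph is stored as an adjacency list (a dictionary from node to neighbours). It returns the candidate business objects, with the central node excluded as in S204. The names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

def k_hop_candidates(adj: Dict[str, List[str]], center: str, max_hops: int) -> Set[str]:
    """Nodes whose hop count from `center` is at most `max_hops`, excluding the
    center itself; these correspond to the candidate business objects."""
    hops = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue                        # do not expand beyond the hop-count threshold
        for nb in adj.get(node, []):
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    return set(hops) - {center}

# Hypothetical chain t - a - b - c.
adj = {"t": ["a"], "a": ["t", "b"], "b": ["a", "c"], "c": ["b"]}
```

With a threshold of 2, `k_hop_candidates(adj, "t", 2)` yields `{"a", "b"}`. Node `"c"`, three hops away, is left out of the sub-topological graph.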

S204: Determine each candidate business object according to the business objects corresponding to the nodes contained in the sub-topological graph.

The candidate business objects are the business objects corresponding to all nodes of the partitioned sub-topological graph except the node of the business object to be identified.

In one or more embodiments of the present disclosure, the risk recognition model used for determining the predicted risk type of the business object to be recognized in step S108 of fig. 1 may be iteratively trained according to the following method, as shown in fig. 5:

S300: Acquire a plurality of business objects and the business relationships among them, and construct a first topological graph with the business objects as nodes and the business relationships as edges.

In this embodiment, the first topological graph contains nodes corresponding to a plurality of business objects, with different nodes corresponding to different business objects. An edge may or may not exist between any two nodes; that is, a given node need not have a business relationship with every other node in the first topological graph. For example, consider two accounts belonging to two different fraud gangs. The gangs use different fraud methods and therefore have different victims. In the first topological graph, there may be no edge between the two account nodes, and no connection between the nodes adjacent to each of them either.

In addition, this specification does not limit whether the first topological graph is a homogeneous graph or a heterogeneous graph. A first topological graph of the corresponding type can be constructed according to the specific application scenario.

S302: For each node in the first topological graph, partition from the first topological graph a second topological graph centered on that node.

Specifically, for each node of the first topological graph that is connected to other nodes by edges, a second topological graph centered on that node can be partitioned from the first topological graph. The partitioning may be based on a hop-count threshold, on edge weights, or on node feature similarity, which is not limited in this specification. In any case, the node is the central node of its second topological graph, and every other node in that graph has a direct or indirect business relationship with it.

S304: Determine the features of each node contained in the node's second topological graph. Input the second topological graph together with those node features into the risk identification model to be trained, and obtain the model's predicted risk type for the business object corresponding to the node.

The scheme for determining the node characteristics is similar to S102 described above, and will not be described here.

S306: Train the risk identification model with the objective of minimizing the difference between the predicted risk type of each business object and its risk type label.

In this embodiment, each training sample consists of the second topological graph of a node in the first topological graph and the features of the nodes contained in that second topological graph. The label of the sample is the risk type of the business object corresponding to that node. The risk recognition model is thus trained by minimizing the difference between the predicted risk types output by the model and the risk type labels of the business objects. Any existing type of loss function may be used in training, which is not limited in this specification.
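The specification leaves the loss function open. Purely as an illustration, the sketch below assumes a two-class task and treats the already-aggregated feature of each center node as fixed input. It fits only a sigmoid classification layer by batch gradient descent on mean binary cross-entropy; a real GNN would also train its aggregation layers end to end. All names and hyperparameters are hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]   # (aggregated feature of a center node, label 0/1)

def train_logistic(samples: List[Sample], lr: float = 0.5,
                   epochs: int = 200) -> Tuple[List[float], float]:
    """Fit weights w and bias b by gradient descent on mean binary cross-entropy."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in samples:
            logit = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-logit))
            err = p - y                     # gradient of BCE with respect to the logit
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        n = len(samples)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w: List[float], b: float, x: List[float]) -> int:
    """Predicted label: 1 (at risk) if the logit is positive, else 0."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

# Hypothetical, linearly separable toy samples.
samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], 0), ([0.1, 0.9], 0)]
w, b = train_logistic(samples)
```

After training, every toy sample is classified according to its label. For the multi-class case, the same loop would use a softmax output with categorical cross-entropy.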

In step S302, an optional scheme for partitioning the node's second topological graph from the first topological graph is as follows:

Determine the adjacency matrix of the first topological graph according to the edges between the nodes it contains.

According to the adjacency matrix of the first topological graph, take the other nodes of the first topological graph whose hop count from the node does not exceed a preset hop-count threshold as the reference nodes of the node.

The preset hop-count threshold may be a fixed value set in advance from prior experience, or may be adjusted flexibly for different risk identification tasks, which is not limited in this specification.

Using the nodes within the preset hop-count threshold as the node's reference nodes selects the nodes that have direct or indirect business relationships with it. It also prevents nodes far from it in the first topological graph from being included in its second topological graph, which would make feature aggregation more difficult.

Construct the second topological graph of the node according to the node and its reference nodes.

The second topological graph of the node is constructed from the node, its reference nodes, the edges between the node and its reference nodes, and the edges among the reference nodes.
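The steps above (hop counts from the adjacency matrix, reference nodes within the threshold, then the graph on the node and its reference nodes) can be sketched as extracting an induced sub-matrix. This assumes a 0/1 adjacency matrix as a list of rows, and places the center node first in the returned index list. The function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

def hop_counts(a: List[List[int]], src: int) -> List[int]:
    """Breadth-first hop count from src to every node via the adjacency matrix (-1 = unreachable)."""
    dist = [-1] * len(a)
    dist[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        for v, connected in enumerate(a[u]):
            if connected and dist[v] == -1:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def second_topology(a: List[List[int]], center: int,
                    max_hops: int) -> Tuple[List[int], List[List[int]]]:
    """Return [center] + reference nodes within max_hops, and the induced adjacency
    sub-matrix, which keeps both center-reference and reference-reference edges."""
    dist = hop_counts(a, center)
    refs = [v for v, d in enumerate(dist) if 0 < d <= max_hops]
    keep = [center] + refs
    sub = [[a[i][j] for j in keep] for i in keep]
    return keep, sub

# Hypothetical first topological graph with edges (0,1), (0,2), (1,2), (2,3).
a = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
```

With node 0 as center and a threshold of 1, the reference nodes are 1 and 2, and the edge between them is kept in the second topological graph.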

Further, constructing the second topological graph of the node from the node and its reference nodes may also be implemented as follows:

Determine the features of the node and the features of each of its reference nodes.

In practice, screening reference nodes by the preset hop-count threshold alone may yield a large number of reference nodes. This would make the second topological graph too large and the risk identification model harder to train. To avoid this, in an optional embodiment of this specification, the associated nodes of the node are selected from its reference nodes according to the similarity between the node's features and those of each reference node. The second topological graph is then constructed from the node and its associated nodes.

The features of the node and of its reference nodes are determined in the same way as in S102 above, and this is not repeated here.

Determine the similarity between the features of the node and the features of each of its reference nodes, and select the associated nodes of the node from the reference nodes according to the similarity.
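The specification does not fix a similarity measure. The sketch below assumes cosine similarity between feature vectors and keeps the reference nodes whose similarity exceeds a threshold. The threshold and names are hypothetical, and the opposite rule (keeping low-similarity nodes) may apply for some tasks.

```python
import math
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def screen_by_similarity(center_feat: List[float], ref_feats: Dict[str, List[float]],
                         threshold: float) -> List[str]:
    """Associated nodes: reference nodes whose similarity to the center exceeds threshold."""
    return sorted(n for n, f in ref_feats.items() if cosine(center_feat, f) > threshold)

# Hypothetical features of the center node and three reference nodes.
refs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
```

With `center_feat = [1.0, 0.0]` and a threshold of 0.5, nodes `"a"` (similarity 1.0) and `"c"` (about 0.71) are kept, and `"b"` (0.0) is dropped.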

The associated nodes are determined from the similarity in the same way as the associated objects in S104, and this is not repeated here.

Acquire the business relationships between the node and its associated nodes, and construct the second topological graph of the node according to these business relationships.

Fig. 6 is a schematic diagram of a risk identification device provided in the present specification, specifically including:

an obtaining module 400, configured to obtain a service object to be identified and each candidate service object in response to the risk identification request;

a feature determining module 402, configured to determine features of the service objects to be identified and features of the candidate service objects respectively;

an associated object screening module 404, configured to screen each associated object of the service object to be identified from each candidate service object according to a similarity between the feature of the service object to be identified and the feature of each candidate service object;

the target topological graph construction module 406 is configured to obtain a service relationship between the service object to be identified and each associated object, and construct a target topological graph by using the service object to be identified and each associated object as nodes and the service relationship as an edge;

The risk recognition module 408 is configured to input the target topological graph and the features of each node it contains into a pre-trained risk recognition model, and obtain the predicted risk type of the business object to be recognized output by the risk recognition model.

Optionally, the associated object screening module 404 is specifically configured to determine a risk identification task according to the risk identification request. When the risk identification task is a first task, the candidate business objects whose similarity is higher than a preset first similarity threshold are used as the associated objects of the business object to be identified. When the risk identification task is a second task, the candidate business objects whose similarity is lower than a preset second similarity threshold are used as the associated objects.
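The task-dependent rule above can be sketched as a threshold whose direction depends on the task. The task names `"first"` and `"second"` and the threshold values are hypothetical placeholders for the first and second tasks and the preset thresholds.

```python
from typing import Dict, List

def select_associated(similarities: Dict[str, float], task: str,
                      first_threshold: float, second_threshold: float) -> List[str]:
    """First task: keep candidates MORE similar than first_threshold.
    Second task: keep candidates LESS similar than second_threshold."""
    if task == "first":
        return sorted(c for c, s in similarities.items() if s > first_threshold)
    if task == "second":
        return sorted(c for c, s in similarities.items() if s < second_threshold)
    raise ValueError(f"unknown task: {task}")

# Hypothetical similarities between the object to be identified and three candidates.
sims = {"a": 0.9, "b": 0.2, "c": 0.6}
```

With a first threshold of 0.5, the first task keeps `"a"` and `"c"`. With a second threshold of 0.3, the second task keeps only `"b"`.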

Optionally, the obtaining module 400 is specifically configured to obtain an original topology map, where the original topology map includes a node corresponding to the service object to be identified; dividing the original topological graph to obtain a sub-topological graph containing the nodes corresponding to the business objects to be identified; and determining each candidate service object according to the service objects respectively corresponding to each node contained in the sub-topology map.

Optionally, the apparatus further comprises:

the training module 410 is specifically configured to:
- obtain a plurality of business objects and the business relationships among them, and construct a first topological graph with the business objects as nodes and the business relationships as edges;
- for each node in the first topological graph, partition from the first topological graph a second topological graph centered on the node;
- determine the features of each node contained in the node's second topological graph, input the second topological graph and those features into the risk identification model to be trained, and obtain the model's predicted risk type for the business object corresponding to the node;
- train the risk identification model with the objective of minimizing the difference between the predicted risk type of each business object and its risk type label.

Optionally, the training module 410 is specifically configured to:
- determine the adjacency matrix of the first topological graph according to the edges between the nodes it contains;
- according to the adjacency matrix, take the other nodes of the first topological graph whose hop count from the node does not exceed a preset hop-count threshold as the reference nodes of the node;
- construct the second topological graph of the node according to the node and its reference nodes.

Optionally, the training module 410 is specifically configured to determine a feature of the node and a feature of each reference node corresponding to the node; respectively determining the similarity between the characteristics of the node and the characteristics of each reference node corresponding to the node, and screening the associated node of the node from the reference nodes according to the similarity; and acquiring a business relation between the node and the associated node of the node, and constructing a second topological graph of the node according to the business relation between the node and the associated node of the node.

The apparatus further comprises:

the updating module 412 is specifically configured to:
- obtain the strength index of the business relationship between the business object to be identified and each associated object;
- determine the edge weights between the nodes of the target topological graph according to these strength indexes;
- delete the edges whose edge weights are smaller than a preset weight threshold to obtain an updated target topological graph.

Optionally, the risk recognition module 408 is specifically configured to input the updated target topological graph and the features of each node it contains into a pre-trained risk recognition model, and obtain the predicted risk type of the business object to be recognized output by the risk recognition model.

The present specification also provides a computer readable storage medium storing a computer program operable to perform the risk identification method shown in fig. 1 described above.

This specification also provides the schematic structural diagram of an electronic device shown in fig. 7. At the hardware level, as shown in fig. 7, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and runs it to implement the risk identification method shown in fig. 1. Of course, this specification does not exclude other implementations, such as logic devices or combinations of hardware and software. That is, the execution subject of the following processing flows is not limited to the logic units, and may also be hardware or logic devices.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). However, with the development of technology, many improvements to current method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, such programming is nowadays mostly done with "logic compiler" software. This software is similar to the compilers used in program development, and the source code before compilation is also written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language). VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained. It is enough to lightly logic-program the method flow in one of the above hardware description languages and program it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by that (micro)processor. It may also take the form of logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such microcontrollers include, but are not limited to, ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that the controller need not be implemented purely as computer-readable program code. The method steps can instead be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may therefore be regarded as a hardware component, and the means it contains for implementing various functions may also be regarded as structures within that hardware component. The means for implementing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when this specification is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, e.g., read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical storage;
- magnetic cassettes, magnetic tape or magnetic disk storage, or other magnetic storage devices;
- any other non-transmission medium that can store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

This specification may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices.

The embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

The foregoing describes merely embodiments of this specification and is not intended to limit it. Various modifications and variations of this specification will be apparent to those skilled in the art. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of this specification shall fall within the scope of the claims of this specification.

Claims

1. A risk identification method, the method comprising:

2. The method of claim 1, wherein screening the associated objects of the business object to be identified from the candidate business objects according to the similarity specifically comprises:

determining a risk identification task according to the risk identification request;

when the risk identification task is a first task, taking the candidate business objects whose similarity is higher than a preset first similarity threshold as the associated objects of the business object to be identified;

and when the risk identification task is a second task, taking the candidate business objects whose similarity is lower than a preset second similarity threshold as the associated objects of the business object to be identified.
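By way of illustration only, the task-dependent screening of claim 2 can be sketched in Python. The claim fixes neither the threshold values nor how the task is encoded, so the names `FIRST_TASK`, `SECOND_TASK`, and `screen_associated_objects`, and the default thresholds of 0.8 and 0.2, are hypothetical; the similarity scores are assumed to have been computed beforehand.

```python
from typing import Dict, List

# Hypothetical task identifiers; the claim only speaks of a "first task" and a "second task".
FIRST_TASK = "first_task"
SECOND_TASK = "second_task"


def screen_associated_objects(
    similarities: Dict[str, float],
    task: str,
    first_threshold: float = 0.8,   # assumed value of the preset first similarity threshold
    second_threshold: float = 0.2,  # assumed value of the preset second similarity threshold
) -> List[str]:
    """Select associated objects from the candidates according to the task type.

    `similarities` maps each candidate business object ID to the similarity between
    its features and those of the business object to be identified.
    """
    if task == FIRST_TASK:
        # First task: keep candidates whose similarity is above the first threshold.
        return [cid for cid, s in similarities.items() if s > first_threshold]
    if task == SECOND_TASK:
        # Second task: keep candidates whose similarity is below the second threshold.
        return [cid for cid, s in similarities.items() if s < second_threshold]
    raise ValueError(f"unknown risk identification task: {task!r}")
```

For example, with similarities `{"a": 0.9, "b": 0.5, "c": 0.1}`, the first task keeps only `"a"` and the second task keeps only `"c"`.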

3. The method of claim 1, wherein acquiring the business object to be identified and each candidate business object specifically comprises:

acquiring an original topological graph, wherein the original topological graph comprises a node corresponding to the business object to be identified;

partitioning the original topological graph to obtain a sub-topological graph containing the node corresponding to the business object to be identified;

and determining each candidate business object according to the business objects respectively corresponding to the nodes contained in the sub-topological graph.
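Claim 3 leaves the partitioning method open. The sketch below assumes, purely for illustration, that the original topological graph is divided into its connected components and that the candidate business objects are all other objects in the component that contains the object to be identified. The adjacency-dict representation and the function names `partition_components` and `candidate_objects` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def partition_components(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Split an undirected graph into connected components.

    Assumption: this is one possible way to "divide" the original topological graph.
    `adj` must list every node as a key, with symmetric neighbour sets.
    """
    seen: Set[str] = set()
    parts: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        parts.append(component)
    return parts


def candidate_objects(adj: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the candidates: other nodes in the sub-graph that contains `target`."""
    for part in partition_components(adj):
        if target in part:
            return part - {target}
    raise KeyError(target)
```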

4. The method of claim 1, wherein pre-training the risk identification model specifically comprises:

acquiring a plurality of business objects and the business relationships among them, and constructing a first topological graph that takes the business objects as nodes and the business relationships as edges;

for each node in the first topological graph, partitioning, from the first topological graph, a second topological graph centered on the node;

respectively determining the characteristics of each node contained in the second topological graph corresponding to the node, and inputting the second topological graph corresponding to the node, together with the characteristics of each node it contains, into a risk identification model to be trained, to obtain the predicted risk type, output by the risk identification model, of the business object corresponding to the node;

and training the risk identification model with minimizing the difference between the predicted risk type of each business object and the risk type label of that business object as the training target.
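The training procedure of claim 4 can be sketched without any deep-learning library. Because the model architecture is not disclosed here, this sketch substitutes a deliberately simple stand-in: the features of the nodes in each node's 1-hop second topological graph are mean-pooled and fed to a logistic-regression classifier trained by full-batch gradient descent on binary cross-entropy. Binary labels (1 = risky, 0 = not risky) replace the general risk types, and all function names, the learning rate, and the epoch count are assumptions.

```python
import math
from typing import Dict, List, Set, Tuple


def ego_nodes(adj: Dict[int, Set[int]], center: int) -> List[int]:
    # Assumed 1-hop "second topological graph": the center node plus its neighbours.
    return [center] + sorted(adj[center])


def pooled_features(adj: Dict[int, Set[int]], feats: Dict[int, List[float]], center: int) -> List[float]:
    # Stand-in graph encoder: mean of the features of all nodes in the center's ego graph.
    nodes = ego_nodes(adj, center)
    dim = len(feats[center])
    return [sum(feats[n][d] for n in nodes) / len(nodes) for d in range(dim)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(adj, feats, labels: Dict[int, int], epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    # Minimise binary cross-entropy between predicted risk and the risk labels.
    xs = {n: pooled_features(adj, feats, n) for n in labels}
    dim = len(next(iter(xs.values())))
    w, b = [0.0] * dim, 0.0
    m = len(labels)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for n, y in labels.items():
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xs[n])) + b)
            err = p - y  # derivative of binary cross-entropy w.r.t. the logit
            gw = [g + err * xi for g, xi in zip(gw, xs[n])]
            gb += err
        # Average the gradients over the batch and take one descent step.
        w = [wi - lr * g / m for wi, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b


def predict(adj, feats, w: List[float], b: float, node: int) -> float:
    # Predicted probability that the business object behind `node` is risky.
    x = pooled_features(adj, feats, node)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

A graph neural network could take the place of the mean-pooling step; the training target, minimizing the difference between predicted and labeled risk, stays the same.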

5. The method of claim 4, wherein partitioning, from the first topological graph, the second topological graph centered on the node specifically comprises:

determining an adjacency matrix of the first topological graph according to edges among nodes contained in the first topological graph;

determining, according to the adjacency matrix of the first topological graph, the other nodes in the first topological graph whose hop count from the node does not exceed a preset hop-count threshold, and taking these nodes as reference nodes corresponding to the node;

and constructing a second topological graph corresponding to the node according to the node and each reference node corresponding to the node.
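Claim 5 derives the reference nodes from the adjacency matrix and a hop-count threshold. One way to do this, sketched below, is to propagate a boolean reachability vector k times through (I + A), so that after k steps it marks every node within k hops of the central node. Building the second topological graph as the subgraph induced by the node and its reference nodes is an assumption, as are the function names.

```python
from typing import List, Tuple


def k_hop_reference_nodes(adj: List[List[int]], node: int, k: int) -> List[int]:
    """Nodes other than `node` whose hop count from it is at most k."""
    n = len(adj)
    # reach[j] is True if j is within the current number of hops of `node` (0 hops: itself).
    reach = [j == node for j in range(n)]
    for _ in range(k):
        # One more hop: boolean product of the row vector `reach` with (I + A).
        reach = [reach[j] or any(reach[i] and adj[i][j] for i in range(n)) for j in range(n)]
    return [j for j in range(n) if reach[j] and j != node]


def second_topological_graph(adj: List[List[int]], node: int, k: int) -> Tuple[List[int], List[List[int]]]:
    # Assumption: the second graph is the subgraph induced by the node and its reference nodes.
    keep = [node] + k_hop_reference_nodes(adj, node, k)
    return keep, [[adj[i][j] for j in keep] for i in keep]
```

On the path graph 0-1-2-3, for instance, node 0 has reference node 1 at a threshold of one hop and reference nodes 1 and 2 at a threshold of two hops.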

6. The method of claim 5, wherein constructing the second topological graph corresponding to the node according to the node and each reference node corresponding to the node specifically comprises:

determining the characteristics of the node and the characteristics of each reference node corresponding to the node;

respectively determining the similarity between the characteristics of the node and the characteristics of each reference node corresponding to the node, and screening associated nodes of the node from the reference nodes according to the similarity;
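Claim 6 specifies neither the similarity measure nor the screening rule. The sketch below assumes cosine similarity between feature vectors and keeps the reference nodes whose similarity to the central node is at least a hypothetical threshold of 0.5; `cosine` and `associated_nodes` are illustrative names.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Assumed similarity measure; returns 0.0 when either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def associated_nodes(center_feat: Sequence[float], ref_feats: Dict[int, List[float]], threshold: float = 0.5) -> List[int]:
    # Keep reference nodes whose features are sufficiently similar to the central node's.
    return [r for r, f in ref_feats.items() if cosine(center_feat, f) >= threshold]
```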

7. The method of claim 1, further comprising, before taking the target topological graph and the characteristics of each node contained in the target topological graph as inputs to the pre-trained risk identification model:

acquiring a strength index of the business relationship between the business object to be identified and each associated object;

determining edge weights between the nodes contained in the target topological graph according to the strength indices of the business relationships between the business object to be identified and the associated objects;

deleting edges whose edge weights are smaller than a preset weight threshold to obtain an updated target topological graph;

wherein taking the target topological graph and the characteristics of each node contained in the target topological graph as inputs to the pre-trained risk identification model specifically comprises:

taking the updated target topological graph and the characteristics of each node contained in the updated target topological graph as inputs to the pre-trained risk identification model, to obtain the predicted risk type of the business object to be identified output by the risk identification model.
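Claim 7 converts relationship-strength indices into edge weights and deletes weak edges. The mapping from strength to weight is not specified, so the sketch below assumes division by the largest strength (for non-negative strengths this puts every weight in [0, 1]) and a hypothetical weight threshold of 0.3. The strength index itself, for example a count of interactions between two objects, is likewise an assumption.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def prune_target_graph(strength: Dict[Edge, float], weight_threshold: float = 0.3) -> Dict[Edge, float]:
    """Turn relationship-strength indices into edge weights and drop weak edges.

    Assumption: weight = strength / max(strength); edges with weight below
    `weight_threshold` are deleted, and the surviving edges form the updated graph.
    """
    if not strength:
        return {}
    max_s = max(strength.values())
    weights = {e: (s / max_s if max_s > 0 else 0.0) for e, s in strength.items()}
    return {e: w for e, w in weights.items() if w >= weight_threshold}
```

With strengths of 10, 2, and 5 on edges (t, a), (t, b), and (t, c), the weights are 1.0, 0.2, and 0.5, so the edge to b is deleted.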

8. A risk identification device, the device comprising:

9. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.

10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the program.