CN113222609B - Risk identification method and device - Google Patents

Info

Publication number
CN113222609B
Authority
CN
China
Prior art keywords
risk
graph
information
nodes
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110493571.3A
Other languages
Chinese (zh)
Other versions
CN113222609A (en)
Inventor
管楚
付子圣
陈红
巩金慧
周绪刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AlipayCom Co ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110493571.3A
Publication of CN113222609A
Application granted
Publication of CN113222609B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

Embodiments of this specification provide a risk identification method and device. In the method, a risk graph is first obtained; the risk graph comprises nodes and edges between the nodes, where the nodes are entities and the edges are association relationships between the entities. The risk graph is then input into a risk identification model to obtain a risk score for a node to be identified in the risk graph. The risk identification model is obtained in advance by learning knowledge of nodes with known risk scores in the risk graph, where the knowledge comprises feature information, path information, and neighbor information.

Description

Risk identification method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of computer application technologies, and in particular, to a risk identification method and apparatus.
Background
With the rapid development of internet technology, people increasingly use the internet to communicate, study, and work, and even to carry out economic activities such as transactions, payments, transfers, and investments. On the one hand, these activities may themselves carry certain risks; on the other hand, some lawbreakers can easily commit illegal acts by exploiting technical and legal loopholes of the internet. Both threaten the security of online activity. A method for identifying risks automatically and accurately is therefore needed.
Disclosure of Invention
One or more embodiments of the present specification describe a risk identification method that enables automatic and accurate risk identification of an object to be identified.
According to a first aspect, there is provided a risk identification method comprising:
acquiring a risk graph, wherein the risk graph comprises nodes and edges between the nodes, the nodes are entities, and the edges are association relationships between the entities;
inputting the risk graph into a risk identification model to obtain a risk score of a node to be identified in the risk graph;
wherein the risk identification model is obtained in advance by learning knowledge of nodes with known risk scores in the risk graph, the knowledge comprising feature information, path information, and neighbor information.
In one embodiment, the risk identification model is trained as follows:
determining nodes marked with risk scores in the risk graph;
training a graph neural network (GNN) using the nodes labeled with risk scores in the risk graph to obtain the risk identification model;
wherein the GNN learns weight information of feature information, path information and neighbor information of the node labeled with the risk score in a training process to minimize a difference between a risk score output by the GNN for the node labeled with the risk score and the labeled risk score.
In another embodiment, the risk identification model acquires the feature information, path information, and neighbor information of the node to be identified from the risk graph, and performs attention processing on them using the weight information obtained by pre-learning; the vector obtained after attention processing is then mapped to the risk score.
In one embodiment, the method further comprises:
determining the combination of features and risk subgraph that has the greatest influence on the risk score of the node to be identified, wherein the risk subgraph is a subgraph of the risk graph that includes the node to be identified.
In another embodiment, the determining the combination of features and risk sub-graphs that most affect the risk score comprises:
varying the combination formed by features of the node to be identified and a risk subgraph;
determining the mutual information between each combination and the risk score;
and determining the combination of features and risk subgraph for which the mutual information satisfies a preset condition.
In one embodiment, the number of nodes contained in the risk sub-graph is less than or equal to a preset number threshold.
In another embodiment, the entity comprises an account or a group of accounts, and the association relationship comprises a funding relationship, a transaction relationship, a media relationship, or an address book relationship.
According to a second aspect, there is provided a risk identification apparatus comprising:
a risk graph acquisition unit configured to acquire a risk graph, wherein the risk graph comprises nodes and edges between the nodes, the nodes are entities, and the edges are association relationships between the entities;
a risk scoring unit configured to input the risk graph into a risk identification model to obtain a risk score of a node to be identified in the risk graph;
wherein the risk identification model is obtained in advance by learning knowledge of nodes with known risk scores in the risk graph, the knowledge comprising feature information, path information, and neighbor information.
In one embodiment, the apparatus further comprises:
a model training unit configured to determine nodes in the risk graph labeled with risk scores, and to train a graph neural network (GNN) using the nodes labeled with risk scores in the risk graph to obtain the risk identification model; wherein the GNN learns weight information of the feature information, path information, and neighbor information of the nodes labeled with risk scores during training, so as to minimize the difference between the risk score output by the GNN for a labeled node and the labeled risk score.
In another embodiment, the risk identification model is configured to acquire the feature information, path information, and neighbor information of the node to be identified from the risk graph, to perform attention processing on them using the weight information obtained by pre-learning, and to map the vector obtained after attention processing to the risk score.
In one embodiment, the apparatus further comprises:
an attribution interpretation unit configured to determine the combination of features and risk subgraph that has the greatest influence on the risk score of the node to be identified, wherein the risk subgraph is a subgraph of the risk graph that includes the node to be identified.
In another embodiment, the attribution interpretation unit is specifically configured to vary the combination formed by features of the node to be identified and a risk subgraph; to determine the mutual information between each combination and the risk score; and to determine the combination of features and risk subgraph for which the mutual information satisfies a preset condition.
In one embodiment, the number of nodes contained in the risk sub-graph is less than or equal to a preset number threshold.
In another embodiment, the entity comprises an account or a group of accounts, and the association relationship comprises a funding relationship, a transaction relationship, a media relationship, or an address book relationship.
According to a third aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
As the above technical solutions show, the embodiments of this specification start from a risk graph formed by the association relationships between entities, and learn a risk identification model from nodes with known risk scores based on three aspects of the entities (feature information, path information, and neighbor information), so as to score the risk of an entity to be identified in the risk graph and thereby improve the accuracy of risk identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 illustrates a main method flow diagram according to one embodiment;
FIG. 2 illustrates a detailed method flow diagram according to one embodiment;
FIG. 3 illustrates a flow diagram of a feature screening method according to one embodiment;
FIG. 4 illustrates a schematic diagram of determining a risk condition according to one embodiment;
FIG. 5 shows a schematic block diagram of the risk assessment arrangement according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
Most traditional risk identification methods analyze the risk of an entity based on the entity's attributes and behavior. For example, the risk of an account is analyzed from the account type, length of use, transfer frequency, transfer amounts, and so on. However, as risk tactics keep growing more diverse and complex, traditional approaches can no longer keep up with them. Judging whether an account is risky purely from entity attributes and behaviors is not very accurate.
Analysis of various business scenarios shows that many risk tactics rely on relationships with other entities, and these may even be indirect multi-hop relationships. For example, one account transfers funds to a second account through an investment relationship, and the second account then transfers funds to a third account through a transaction relationship. Embodiments of this specification therefore identify risk starting from the association relationships between entities.
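The multi-hop example above can be illustrated with a small sketch. It is not part of the patent's method; the edge dictionary, the account names, and the helper `multi_hop_paths` are hypothetical and only show how an indirect two-hop relationship could be enumerated from typed edges.

```python
from collections import deque

def multi_hop_paths(edges, start, max_hops):
    """Enumerate simple paths of 1..max_hops edges starting at `start`.

    edges: dict mapping a node to a list of (neighbor, relation) pairs.
    Each path is a list of (node, relation_used_to_reach_it) pairs.
    """
    paths = []
    queue = deque([[(start, None)]])
    while queue:
        path = queue.popleft()
        node = path[-1][0]
        if len(path) - 1 == max_hops:      # hop budget used up
            continue
        visited = {n for n, _ in path}
        for nxt, relation in edges.get(node, []):
            if nxt in visited:             # keep paths simple (no cycles)
                continue
            new_path = path + [(nxt, relation)]
            paths.append(new_path)
            queue.append(new_path)
    return paths

# Hypothetical accounts mirroring the investment-then-transaction example.
edges = {
    "account_1": [("account_2", "investment")],
    "account_2": [("account_3", "transaction")],
}
two_hop = [p for p in multi_hop_paths(edges, "account_1", 2) if len(p) == 3]
```

Here `two_hop` holds the single indirect path from `account_1` to `account_3`, which attribute-only analysis of `account_1` would not reveal.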
Specific implementations of the above concepts are described below.
FIG. 1 shows a flow diagram of a risk identification method according to an embodiment of the present description. It is to be appreciated that the method can be performed by any apparatus, device, platform, cluster of devices having computing and processing capabilities. As shown in fig. 1, the method includes:
Step 101, obtaining a risk graph, wherein the risk graph comprises nodes and edges between the nodes, the nodes are entities, and the edges are association relationships between the entities.
Step 103, inputting the risk graph into a risk identification model to obtain a risk score of a node to be identified in the risk graph; the risk identification model is obtained in advance by learning knowledge of nodes with known risk scores in the risk graph, wherein the knowledge comprises feature information, path information, and neighbor information.
As the above embodiment shows, the embodiments of this specification start from a risk graph formed by the association relationships between entities, and learn a risk identification model from nodes with known risk scores based on three aspects of the entities (feature information, path information, and neighbor information), so as to score the risk of entities to be identified in the risk graph and thereby improve the accuracy of risk identification.
The steps in the above examples are described in detail below.
First, step 101, "acquiring a risk graph", is described in detail with reference to an embodiment.
The risk graph can be constructed in advance from the entities and the association relationships among them. As shown in fig. 2, the nodes in the constructed risk graph are entities, and the edges are association relationships between the entities. The entities can be determined according to the risk identification scenario. For example, the entity may be an account, such as a bank account, a financial platform account, or a transaction platform account. An account may be a physical account such as a bank card, a virtual account of a financial platform, or a transaction platform account bound to a virtual account of a financial platform or to a bank account.
Edges are association relationships between entities and may include, for example, funding relationships, transaction relationships, media relationships, and address book relationships. A funding relationship may be, for example, an investment relationship or a transfer relationship. A transaction relationship may involve trading goods, stocks, services, and so on. A media relationship may be, for example, whether two accounts used the same local area network, the same device, the same network segment, or the same Wi-Fi coverage, for instance when a funding or transaction relationship occurs between the two accounts. An address book relationship may be, for example, whether the user corresponding to one entity is in the address book of the user corresponding to another entity, or whether the users corresponding to two entities are friends of each other in a social network.
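As an illustration of the structure just described, the following is a minimal sketch of a risk graph with typed edges. The class name `RiskGraph`, its methods, the node identifiers, and the feature values are all hypothetical; the patent does not prescribe any particular data structure.

```python
from collections import defaultdict

class RiskGraph:
    """Toy risk graph: nodes are entities, edges are typed association relationships."""

    def __init__(self):
        self.features = {}               # node id -> feature list
        self.edges = defaultdict(list)   # node id -> [(neighbor id, relation type)]

    def add_node(self, node_id, features):
        self.features[node_id] = list(features)

    def add_edge(self, src, dst, relation):
        # Directed edge; call twice for a symmetric relation such as shared Wi-Fi.
        self.edges[src].append((dst, relation))

    def neighbors(self, node_id, relation=None):
        # Optionally filter neighbors by relation type.
        return [dst for dst, rel in self.edges[node_id]
                if relation is None or rel == relation]


graph = RiskGraph()
graph.add_node("bank_A", [0.2, 1.0])
graph.add_node("platform_B", [0.9, 0.1])
graph.add_node("platform_C", [0.4, 0.4])
graph.add_edge("bank_A", "platform_B", "funding")
graph.add_edge("platform_B", "platform_C", "transaction")
graph.add_edge("bank_A", "platform_C", "media")
```

Keeping the relation type on each edge is what later allows paths to be grouped by type, for example "bank account to transaction platform account".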
The type, number, direction, scale, etc. of the nodes and edges in fig. 2 are illustrative and not intended to limit the present application.
The step 103 of inputting the risk graph into the risk identification model to obtain the risk score of the node to be identified in the risk graph is described in detail below with reference to the embodiment.
In this step, the risk score of the node to be identified is estimated by a risk identification model. The input of the risk identification model is the risk graph, and its output is the risk score of the node to be identified. The node to be identified can be any node in the risk graph.
The idea of this step is to label the nodes with known risk scores in the risk graph and to learn a risk scoring mechanism from three aspects of those nodes: feature information, path information, and neighbor information. The learned risk scoring mechanism can then be extended to score other nodes whose risk scores are unknown.
The learning process is actually a training process of the risk recognition model. As shown in fig. 3, the following steps may be included:
In step 301, nodes labeled with risk scores in the risk graph are determined.
In this step, nodes with known risk scores in the risk graph may be labeled. The risk scores of these nodes may be obtained from official information, for example from information provided by institutions with public credibility.
For example, if information provided by an official agency shows that a citizen has engaged in criminal behavior, the account corresponding to that citizen may be labeled with a higher risk score.
As another example, if information provided by the media shows that a company frequently commits violations, the company's account may be labeled with a higher risk score.
As another example, if information provided by an official agency shows that a company has been rated a star corporation for a number of consecutive years, the company's account may be labeled with a lower risk score.
In addition, financial professionals can perform manual professional risk scoring and labeling on some entities according to some attributes, behaviors and other information of the entities. Other risk score labeling methods may also be used, which are not exhaustive.
In step 303, training a GNN (Graph Neural Networks) by using the nodes labeled with the risk scores in the risk Graph to obtain a risk identification model; the GNN learns the weights of the characteristic information, the path information and the neighbor information of the node marked with the risk score in the training process so as to minimize the difference between the risk score output by the GNN to the node marked with the risk score and the marked risk score.
In the embodiments of the present specification, the risk identification model may be implemented using a GNN. GNNs are a class of deep-learning-based methods for processing graph-domain information. During training, when the GNN learns from the nodes labeled with risk scores in the risk graph, it mainly learns the following three aspects:
In a first aspect, the weights of the neighbor information of a node.
Considering the influence of different neighbor nodes, a layer of neighbor attention mechanism is designed to learn the importance, namely the weight, of each neighbor node. As one way that can be achieved, the vector representation of each neighbor node can be expressed as:
[Formula (1): rendered as an image in the source and not reproduced here.]
In formula (1), the symbols (also rendered as images in the source) denote, respectively: the vector representation of the neighbor nodes of account u on the path ρ; the set of neighbor nodes on the path ρ; and the feature vector of the neighbor node j. α_ρ(u, j) is a preset mapping function that represents the weight of the neighbor node j with respect to the account u. The mapping function α_ρ(u, j) may be implemented using, for example, an additive function, a dot product function, a scaled dot product function, a bilinear function, or another custom function. The guiding rule is that the more similar u and j are, the higher the weight.
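The neighbor attention described above can be sketched as follows. Because formula (1) is only available as an image, this is an assumption-laden illustration rather than the patent's exact computation: it assumes the neighbor representation is a weighted sum of neighbor feature vectors, picks the scaled dot product (one of the options the text lists) as the similarity, and assumes softmax normalization of the weights. The function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def neighbor_attention(x_u, neighbor_feats):
    """Return (weights, aggregated vector) for the neighbors of u on one path.

    Assumption: weights are a softmax over scaled dot products, so neighbors
    more similar to u receive higher weight, as the text requires.
    """
    d = len(x_u)
    scores = [sum(a * b for a, b in zip(x_u, x_j)) / math.sqrt(d)
              for x_j in neighbor_feats]
    alphas = softmax(scores)
    h = [sum(alpha * x_j[i] for alpha, x_j in zip(alphas, neighbor_feats))
         for i in range(d)]
    return alphas, h

x_u = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]      # first neighbor is more similar to u
alphas, h = neighbor_attention(x_u, neighbors)
```

In this toy input the first neighbor, which points in the same direction as `x_u`, receives the larger weight.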
In the embodiments of the present specification, the paths in the risk graph may be divided and defined in advance according to their type and length (that is, the number of nodes they contain). Path types may include, for example, a path from a financial platform account to a bank account, a path from a bank account to a financial platform account, a path from a bank account to a transaction platform account, a path from a transaction platform account to a transaction platform account, and the like. Because different types of paths also have some influence on account risk, the embodiments of the present specification introduce a path factor into the weight learning of both the neighbor information and the feature information of nodes.
In a second aspect, the weights of the feature information of a node.
Considering that the feature information of a node may have a great decisive effect on whether the node has a risk, a feature attention mechanism is adopted in the embodiment of the present specification, which may be expressed as:
[Formula (2): rendered as an image in the source and not reproduced here.]
In formula (2), the symbols rendered as images in the source denote the known feature vector of the account u and the feature vector of the account u after neighbor influence is taken into account. To distinguish between different paths, the formula is marked with the path ρ only. B_ρ is the state transition matrix after being affected by the neighbors.
The feature information of a node may include: personal attributes of the user corresponding to the account, such as gender, age, occupation, and region; transaction features, such as the transaction amount and the number of transaction counterparties over the last month, the last week, and so on; and environmental features, such as the type of device and the type of network used.
In a third aspect, a weight of path information for a node.
For each node, its attention to each meta-path in the path set P may be calculated, which may be expressed as:
[Formula (3): rendered as an image in the source and not reproduced here.]
where z_ρ is the attention vector of the path ρ, and γ(u, ρ) denotes a normalization process; the result can be regarded as a vector representation of each path ρ of the node.
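A rough sketch of path-level attention follows. Formula (3) is not available in the text, so the specifics here are assumptions: each path's score is taken as the dot product of its attention vector z_ρ with the node's vector on that path, and the normalization γ(u, ρ) is assumed to be a softmax over the path set. The function name, path names, and values are hypothetical.

```python
import math

def path_attention(path_vectors, z):
    """Weight each path of node u and fuse the per-path vectors.

    path_vectors: dict path name -> vector of node u on that path.
    z: dict path name -> attention vector z_rho (learned in training; fixed here).
    Assumption: gamma(u, rho) is a softmax over dot-product scores.
    """
    scores = {p: sum(a * b for a, b in zip(z[p], v))
              for p, v in path_vectors.items()}
    m = max(scores.values())
    exps = {p: math.exp(s - m) for p, s in scores.items()}
    total = sum(exps.values())
    gammas = {p: e / total for p, e in exps.items()}
    dim = len(next(iter(path_vectors.values())))
    fused = [sum(gammas[p] * path_vectors[p][i] for p in path_vectors)
             for i in range(dim)]
    return gammas, fused

path_vectors = {"bank->platform": [1.0, 0.0], "platform->platform": [0.0, 1.0]}
z = {"bank->platform": [2.0, 0.0], "platform->platform": [0.0, 1.0]}
gammas, fused = path_attention(path_vectors, z)
```

With these toy values the "bank->platform" path scores higher and therefore dominates the fused vector.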
The GNN integrates the vector representation of each neighbor of the node, the vector representation of each path, and the feature vector of the node, and maps the integrated vector to a specific score to obtain the risk score of the node. In training the GNN, the objective is for the GNN to predict the risk scores of the nodes labeled with risk scores while minimizing the difference between the predicted and labeled risk scores. A loss function can be set according to this objective, and in each iteration the value of the loss function is used to update the model parameters, including the parameters of α_ρ(u, j) in formula (1), B_ρ in formula (2), and z_ρ in formula (3), until a preset iteration stop condition is met. The iteration stop condition may be, for example, that the value of the loss function is less than or equal to a preset threshold, or that the number of iterations reaches a preset threshold. That is, the three kinds of weights are updated in every iteration of model training, so that in the end the risk score predicted by the model for each labeled node is as consistent as possible with its labeled risk score.
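The iterative update and the two stop conditions can be sketched with a deliberately simplified stand-in. The code below fits a plain linear scorer with gradient descent on a mean squared error loss; it is not a GNN, and the loss choice, learning rate, thresholds, and the `train` helper are all assumptions made only to show the loop structure.

```python
def train(samples, labels, lr=0.1, loss_threshold=1e-4, max_iters=10000):
    """Fit a linear scorer as a stand-in for the GNN parameters.

    Stops when the loss is <= loss_threshold or after max_iters iterations,
    mirroring the two stop conditions named in the text.
    """
    dim = len(samples[0])
    w = [0.0] * dim
    loss = float("inf")
    for it in range(1, max_iters + 1):
        preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in samples]
        errors = [p - y for p, y in zip(preds, labels)]
        loss = sum(e * e for e in errors) / len(errors)   # mean squared error
        if loss <= loss_threshold:                         # stop condition 1
            return w, loss, it
        grad = [2 * sum(e * x[i] for e, x in zip(errors, samples)) / len(samples)
                for i in range(dim)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w, loss, max_iters                              # stop condition 2

# Hypothetical labeled nodes: fused vectors and their labeled risk scores.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0.9, 0.1, 1.0]
w, loss, iterations = train(samples, labels)
```

In an actual implementation the parameters updated in this loop would be the attention parameters named in the text rather than a single weight vector.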
After the learning process, the resulting risk identification model can be used to score the risk of nodes in the risk graph. As in training, the risk identification model acquires the feature information, path information, and neighbor information of the node to be identified from the risk graph, performs attention processing on them using the pre-learned weight information, and maps the vector obtained after attention processing to a specific risk score.
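The final mapping from the attended vector to a score is not specified in the text. As one possible reading, the sketch below concatenates the three attended vectors and applies a linear layer followed by a sigmoid; the concatenation, the sigmoid, the `risk_score` helper, and all numeric values are assumptions for illustration only.

```python
import math

def risk_score(feature_vec, neighbor_vec, path_vec, weights, bias=0.0):
    """Concatenate the three attended vectors and map them to a score in (0, 1)."""
    fused = list(feature_vec) + list(neighbor_vec) + list(path_vec)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    logit = sum(w * v for w, v in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))     # sigmoid squashes to (0, 1)

# Hypothetical attended vectors and learned output weights.
score = risk_score([1.0], [0.5], [0.0], weights=[2.0, -1.0, 0.3])
```

Any monotone mapping to the desired score range would serve the same purpose; the sigmoid is simply a common choice.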
It should be noted that the node to be identified may be a node that is not yet labeled with a risk score in the risk graph, or may be a node that is labeled with a risk score.
In an experimental verification, a risk graph was constructed based on an analysis of virtual currency transaction scenarios. It contains 520,000 accounts, including both transaction platform accounts and bank card accounts. The edges include transaction relationships and media relationships. In addition, 78-dimensional statistical features and 128-dimensional behavioral features were extracted for each account. Risk identification was then performed in the manner described in the above embodiments, and tests were run on two data sets. The accuracy reached 98.6%, and the coverage rates on the two test sets reached 82.1% and 95%.
Furthermore, after risk identification yields a risk score for the node to be identified, the combination of features and risk subgraph that has the greatest influence on that risk score can be determined, so that the risk score can be explained and the reliability and privacy of the model improved. Two interpretability concepts are introduced in the embodiments of this specification: core features and the core subgraph, that is, the features and the risk subgraph that have the greatest influence on the risk score. The risk subgraph is a subgraph of the risk graph that contains the node to be identified.
A preferred implementation is provided herein to determine the combination of features and risk sub-graphs that most affect the risk score. Referring to fig. 4, the following steps may be included:
Step 401, varying the combination formed by the features of the node to be identified and the risk subgraph.
In this step, the features and the risk subgraph may be varied randomly to form combinations, or all combinations of features and risk subgraphs may be enumerated exhaustively.
Step 403, determining the mutual information between each combination and the risk score.
This step uses mutual information, a useful measure in information theory that can be regarded as the amount of information one random variable contains about another, or as the reduction in the uncertainty of one random variable given knowledge of another. Here it is the reduction in the uncertainty of the risk score that results from the above combination, and the mutual information may take the following form:
MI(Y, (g, x)) = H(Y) - H(Y | G = g, X = x)    (4)
where g represents a risk subgraph, x represents a feature, Y represents the risk score, and MI(Y, (g, x)) is the mutual information between Y and the combination consisting of g and x. H(·) is the entropy function.
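To make formula (4) concrete, the sketch below estimates H(Y) and H(Y | G = g, X = x) from a small set of samples. Using empirical frequencies over discretized risk scores is an assumption, as are the helper names and the toy data; the patent does not say how the entropies are estimated.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy in bits of a list of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information_term(scores, combos, combo):
    """H(Y) - H(Y | combination == combo), mirroring formula (4).

    scores: discretized risk scores, one per sample (assumption).
    combos: the (g, x) combination observed for each sample.
    """
    conditioned = [y for y, c in zip(scores, combos) if c == combo]
    return entropy(scores) - entropy(conditioned)

scores = ["high", "high", "low", "low"]
combos = [("g1", "x1"), ("g1", "x1"), ("g2", "x1"), ("g1", "x2")]
gain = mutual_information_term(scores, combos, ("g1", "x1"))
```

In this toy data, knowing the combination ("g1", "x1") removes all uncertainty about the score, so the term equals the full entropy of Y, which is one bit.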
Step 405, determining the combination of features and risk subgraph for which the mutual information satisfies a preset condition.
In this step, the preset condition may be that the mutual information value is maximum, or that the mutual information value is greater than or equal to a preset threshold value. Taking the maximum mutual information value as an example, finding the combination of x and g corresponding to the maximum mutual information value can be expressed as:
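[Formula (5): rendered as an image in the source and not reproduced here; per the surrounding text, it selects the combination of g and x that maximizes MI(Y, (g, x)).]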
In addition, to reduce the space complexity of the computation, the size of the risk subgraph may be limited; that is, the number of nodes in the risk subgraph must be less than or equal to a preset threshold k, i.e., |g| ≤ k. The value of k can be an empirical value or an experimental value.
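The selection in step 405 under the size limit can be sketched as an exhaustive search. Everything specific here is hypothetical: subgraphs are simplified to node tuples containing the target node, `toy_mi` is a made-up scoring function standing in for a real mutual information estimate, and the node and feature names are invented.

```python
from itertools import combinations

def best_combination(target, neighbors, features, k, mi):
    """Search (subgraph, feature) pairs with at most k nodes per subgraph.

    mi: callable (subgraph, feature) -> mutual information estimate (hypothetical).
    Returns the pair with the largest estimate, together with that estimate.
    """
    best, best_mi = None, float("-inf")
    for size in range(0, k):                  # target plus `size` neighbors, so |g| <= k
        for extra in combinations(neighbors, size):
            g = (target,) + extra
            for x in features:
                value = mi(g, x)
                if value > best_mi:
                    best, best_mi = (g, x), value
    return best, best_mi

def toy_mi(g, x):
    # Made-up score: neighbor n1 and feature txn_amount are informative,
    # and larger subgraphs are slightly penalized.
    return 0.5 * ("n1" in g) + 0.3 * (x == "txn_amount") - 0.05 * len(g)

best, best_mi = best_combination("u", ["n1", "n2", "n3"], ["age", "txn_amount"], 2, toy_mi)
```

Exhaustive search grows quickly with k and the number of neighbors, which is one reason a size limit or a sampling strategy is attractive.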
Since H(Y) is a constant, maximizing the mutual information is equivalent to minimizing the conditional entropy H(Y | G = G_s, X = X_s), where G_s is the set of risk subgraphs sampled when the combinations are varied in step 401, and X_s is the sampled feature set.
[Formula rendered as an image in the source and not reproduced here.]
P_φ denotes the sampling strategy; that is, G_s and X_s are obtained by sampling with a preset sampling strategy.
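The equivalence between maximizing the mutual information and minimizing the conditional entropy follows directly from formula (4), because H(Y) does not depend on g or x. Written out, with g* and x* as assumed notation for the selected subgraph and feature:

```latex
\begin{aligned}
(g^{*}, x^{*})
  &= \arg\max_{g,\,x} \; MI\bigl(Y, (g, x)\bigr) \\
  &= \arg\max_{g,\,x} \; \bigl[\, H(Y) - H(Y \mid G = g,\, X = x) \,\bigr] \\
  &= \arg\min_{g,\,x} \; H(Y \mid G = g,\, X = x),
  \qquad \text{subject to } |g| \le k .
\end{aligned}
```

The size constraint is the one stated in the text; how the minimization is carried out over the sampled sets is not recoverable from the source.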
After the risk identification model gives a specific score y ∈ Y, the goal is to find the features and risk subgraph that have the greatest influence on that score. The idea above therefore amounts to collecting different "feature + risk subgraph" combinations by sampling, that is, by varying the combination of x and g. If removing certain features or neighbors changes the score only slightly, the mutual information between that combination of features and risk subgraph and the risk score is small, which indicates that those features and neighbors have little influence on the account node; conversely, a large change indicates that those features and neighbors are the core features and the core risk subgraph of the account.
It can be seen that the embodiments of the present specification further provide attribution interpretation information for the risk score that the risk identification model assigns to the node to be identified, which reduces the cost of manual verification and provides a basis for subsequent optimization and adjustment of the risk control system.
In addition, due to the attribution and interpretation mechanism, the risk score of the model can be verified, and the reliability of the risk identification model is increased. And the attribution interpretation mechanism does not need to leak the identification mechanism of the risk identification model, so that the privacy of the risk identification model is ensured.
Besides risk assessment and interpretation for a single account, risk assessment and interpretation may also be performed for sets of accounts, such as risk groups. The approach is similar to that described above, except that each such account set is treated as one node in the risk graph.
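Treating an account set as a single node can be sketched as a graph contraction. The `merge_group` helper, the edge-triple representation, and the account names are hypothetical, and how the features of the merged node are formed is not specified in the text and is left out here.

```python
def merge_group(edges, group, group_id):
    """Collapse the accounts in `group` into a single node `group_id`.

    edges: iterable of (src, dst, relation) triples.
    Edges between two members of the group become internal and are dropped.
    """
    merged = set()
    for src, dst, relation in edges:
        s = group_id if src in group else src
        d = group_id if dst in group else dst
        if s != d:
            merged.add((s, d, relation))
    return sorted(merged)

edges = [
    ("a", "b", "transaction"),   # internal to the group, will be dropped
    ("b", "c", "funding"),
    ("d", "a", "media"),
]
contracted = merge_group(edges, {"a", "b"}, "group_1")
```

After contraction the scoring and attribution steps can run unchanged, with `group_1` as the node to be identified.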
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
According to an embodiment of another aspect, a risk identification device is provided. FIG. 5 shows a schematic block diagram of the risk identification device according to one embodiment. It is to be appreciated that the device can be implemented by any apparatus, device, platform, or cluster of devices having computing and processing capabilities. As shown in FIG. 5, the device 500 includes a graph obtaining unit 501 and a risk scoring unit 502, and may further include a model training unit 503 and an attribution interpretation unit 504. The main functions of each component unit are as follows:
The graph obtaining unit 501 is configured to obtain a risk graph, where the risk graph includes nodes and edges between the nodes, the nodes are entities, and the edges are association relationships between the entities.
The risk scoring unit 502 is configured to input the risk graph into the risk identification model to obtain the risk score of the node to be identified in the risk graph.
The risk identification model is obtained by learning the knowledge of the nodes with known risk scores in the risk graph in advance, wherein the knowledge comprises characteristic information, path information and neighbor information.
The model training unit 503 is configured to determine the nodes labeled with risk scores in the risk graph, and to train a graph neural network (GNN) using those nodes to obtain the risk identification model. During training, the GNN learns attention weights for the feature information, path information and neighbor information of the labeled nodes, so as to minimize the difference between the risk score that the GNN outputs for each labeled node and that node's labeled risk score.
The risk identification model is used to acquire the feature information, path information and neighbor information of a node to be identified from the risk graph, to perform attention processing on the feature information, the path information and the neighbor information using the attention weight information learned in advance, and to map the vector obtained after attention processing to the risk score.
Still further, the attribution interpretation unit 504 is configured to determine a combination of the feature that most affects the risk score of the node to be identified and a risk sub-graph, wherein the risk sub-graph is a sub-graph of the risk graph including the node to be identified.
As a preferred embodiment, the attribution interpretation unit 504 is specifically configured to transform the combination formed by the features of the node to be identified and the risk subgraph; to determine the mutual information between each resulting combination and the risk score; and to determine the combination of features and risk subgraph for which the mutual information meets a preset condition.
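For the model training unit 503 and the attention processing described above, the following pure-Python sketch shows the general shape of the computation: three information vectors per node are fused with learned attention weights, the fused vector is mapped to a score, and the parameters are adjusted to reduce the gap to the labeled scores. It is a heavily simplified stand-in, not the GNN of the specification: there is no message passing, the gradient is estimated by finite differences, and the data, vector sizes, loss (mean squared error) and hyperparameters are all assumptions chosen for illustration.

```python
import math

# Each labeled node: (feature vector, path vector, neighbor vector, labeled score).
LABELED_NODES = [
    ([0.2, 0.1], [0.1, 0.0], [1.0, 0.9], 0.9),
    ([0.1, 0.3], [0.0, 0.2], [0.9, 1.0], 0.9),
    ([0.3, 0.2], [0.2, 0.1], [0.0, 0.1], 0.1),
    ([0.2, 0.2], [0.1, 0.1], [0.1, 0.0], 0.1),
]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(params, feature, path, neighbor):
    """params = [a_feature, a_path, a_neighbor, w0, w1, bias]."""
    alpha = softmax(params[:3])  # attention weights over the three kinds of information
    sources = (feature, path, neighbor)
    fused = [sum(a * vec[i] for a, vec in zip(alpha, sources)) for i in range(2)]
    logit = params[3] * fused[0] + params[4] * fused[1] + params[5]
    return 1.0 / (1.0 + math.exp(-logit))  # map the fused vector to a score in (0, 1)


def loss(params):
    """Mean squared difference between output scores and labeled scores."""
    errors = [(predict(params, f, p, n) - y) ** 2 for f, p, n, y in LABELED_NODES]
    return sum(errors) / len(errors)


def numeric_gradient(params, eps=1e-5):
    """Central finite-difference estimate of the loss gradient."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


def train(steps=500, learning_rate=0.5):
    params = [0.0] * 6
    for _ in range(steps):
        grad = numeric_gradient(params)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


initial_loss = loss([0.0] * 6)
trained = train()
final_loss = loss(trained)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
print("attention weights:", [round(a, 3) for a in softmax(trained[:3])])
```

In practice the same objective would be optimized with automatic differentiation in a graph learning framework; the finite-difference gradient is used here only to keep the example dependency-free.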
In order to reduce the space complexity of the computation, the number of nodes contained in the risk subgraph is limited to be less than or equal to a preset number threshold.
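The effect of the node-count threshold can be seen in a small sketch that lists candidate risk subgraphs. It assumes, beyond what the text states, that a candidate is a connected set of nodes containing the node to be identified and that the graph is given as an adjacency mapping; the function name `candidate_subgraphs` and the account identifiers are hypothetical.

```python
def candidate_subgraphs(adjacency, target, max_nodes):
    """List connected node sets that contain `target` and have at most `max_nodes` nodes."""
    start = frozenset([target])
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        if len(current) >= max_nodes:
            continue  # the size threshold stops further growth
        for node in current:
            for neighbor in adjacency.get(node, ()):
                grown = current | {neighbor}
                if grown not in seen:
                    seen.add(grown)
                    stack.append(grown)
    return sorted(seen, key=lambda s: (len(s), sorted(s)))


adjacency = {
    "acct_A": ["acct_B", "acct_C"],
    "acct_B": ["acct_A"],
    "acct_C": ["acct_A", "acct_D"],
    "acct_D": ["acct_C"],
}
small = candidate_subgraphs(adjacency, "acct_A", max_nodes=2)
larger = candidate_subgraphs(adjacency, "acct_A", max_nodes=3)
print(len(small), len(larger))
```

Because the number of connected subgraphs grows quickly with the allowed size, a small threshold keeps both the memory used for candidates and the number of model evaluations manageable.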
As a typical application scenario, the entities may include accounts or account groups, and the association relationship includes a fund relationship, a transaction relationship, a media relationship or an address book relationship.
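A minimal in-memory representation of such a risk graph might look like the following Python sketch. The class name `RiskGraph`, the feature dictionaries, and the decision to store each edge in both directions are assumptions made for illustration; the specification does not prescribe a data structure or say whether edges are directed.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Relation types named in the text; the identifiers themselves are hypothetical.
RELATION_TYPES = {"fund", "transaction", "media", "address_book"}


@dataclass
class RiskGraph:
    """Entities as nodes; typed association relationships as edges."""
    node_features: dict = field(default_factory=dict)
    neighbors: dict = field(default_factory=lambda: defaultdict(set))

    def add_entity(self, entity_id, features):
        self.node_features[entity_id] = features

    def add_relation(self, src, dst, relation):
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if src not in self.node_features or dst not in self.node_features:
            raise KeyError("both endpoints must be added as entities first")
        # Stored in both directions here; edge direction is not specified in the text.
        self.neighbors[src].add((dst, relation))
        self.neighbors[dst].add((src, relation))


graph = RiskGraph()
graph.add_entity("acct_A", {"age_days": 12})
graph.add_entity("acct_B", {"age_days": 840})
graph.add_relation("acct_A", "acct_B", "fund")
print(sorted(graph.neighbors["acct_A"]))
```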
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 1, fig. 3 or fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method of fig. 1, 3 or 4.
With the development of time and technology, computer readable storage media are more and more widely used, and the propagation path of computer programs is not limited to tangible media any more, and the computer programs can be directly downloaded from a network and the like. Any combination of one or more computer-readable storage media may be employed. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present specification, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The processors described above may include one or more single-core processors or multi-core processors. The processor may comprise any combination of general purpose processors or dedicated processors (e.g., image processors, application processors, baseband processors, etc.).
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments further describe in detail the objects, technical solutions and advantages of the present invention. It should be understood that the above embodiments are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall be included in the scope of the present invention.

Claims (11)

1. A risk identification method, comprising:
acquiring a risk graph, wherein the risk graph comprises nodes and edges between the nodes, the nodes are entities, and the edges are association relationships between the entities;
inputting the risk graph into a risk identification model to obtain a risk score of a node to be identified in the risk graph;
the risk identification model is obtained by learning the knowledge of nodes with known risk scores in the risk graph in advance, wherein the knowledge comprises characteristic information, path information and neighbor information;
the method further comprises the following steps:
determining a combination of a feature and a risk subgraph which has the greatest influence on the risk score of the node to be identified, wherein the risk subgraph is a subgraph which comprises the node to be identified in the risk graph;
wherein the determining the combination of the feature and the risk subgraph that most impacts the risk score comprises:
transforming the combination formed by the characteristics of the nodes to be identified and the risk subgraph;
respectively determining mutual information between each combination and the risk score;
and determining the combination of the corresponding characteristics and the risk subgraph when the mutual information meets the preset conditions.
2. The method of claim 1, wherein the risk identification model is trained using:
determining nodes marked with risk scores in the risk graph;
training a graph neural network (GNN) with the nodes labeled with risk scores in the risk graph to obtain the risk identification model;
wherein the GNN learns weight information of feature information, path information and neighbor information of the node labeled with the risk score in a training process to minimize a difference between a risk score output by the GNN for the node labeled with the risk score and the labeled risk score.
3. The method according to claim 1, wherein the risk identification model acquires feature information, path information and neighbor information of the node to be identified from the risk graph, and performs attention processing on the feature information, the path information and the neighbor information by using weight information obtained by pre-learning; and mapping the vector obtained after attention processing to the risk score.
4. The method of claim 1, wherein the number of nodes contained in the risk sub-graph is less than or equal to a preset number threshold.
5. The method of any of claims 1-4, wherein the entity comprises an account or group of accounts, and the association comprises a funding relationship, a transaction relationship, an intermediary relationship, or an address book relationship.
6. A risk identification device comprising:
a risk graph acquisition unit configured to acquire a risk graph, wherein the risk graph comprises nodes and edges between the nodes, the nodes are entities, and the edges are association relationships between the entities;
the risk scoring unit is configured to input the risk graph into a risk identification model to obtain a risk score of a node to be identified in the risk graph;
the risk identification model is obtained by learning the knowledge of nodes with known risk scores in the risk graph in advance, wherein the knowledge comprises characteristic information, path information and neighbor information;
further comprising:
an attribution interpretation unit configured to determine a combination of a feature and a risk subgraph having the greatest influence on the risk score of the node to be identified, wherein the risk subgraph is a subgraph in the risk graph including the node to be identified;
wherein the attribution interpretation unit is specifically configured to transform a combination of features and risk subgraphs of the nodes to be identified; respectively determining mutual information between each combination and the risk score; and determining the combination of the corresponding characteristics and the risk subgraph when the mutual information meets the preset conditions.
7. The apparatus of claim 6, further comprising:
a model training unit configured to determine nodes in the risk graph labeled with risk scores; and to train a graph neural network (GNN) with the nodes labeled with risk scores in the risk graph to obtain the risk identification model; wherein the GNN learns weight information of feature information, path information and neighbor information of the node labeled with the risk score in a training process to minimize a difference between a risk score output by the GNN for the node labeled with the risk score and the labeled risk score.
8. The device according to claim 6, wherein the risk identification model is configured to acquire feature information, path information, and neighbor information of the node to be identified from the risk graph, and perform attention processing on the feature information, the path information, and the neighbor information by using weight information obtained through pre-learning; and mapping the vector obtained after attention processing to the risk score.
9. The apparatus of claim 6, wherein a number of nodes contained in the risk sub-graph is less than or equal to a preset number threshold.
10. The apparatus of any one of claims 6 to 9, wherein the entity comprises an account or group of accounts, and the association comprises a funding relationship, a transaction relationship, an intermediary relationship, or an address book relationship.
11. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-5.
CN202110493571.3A 2021-05-07 2021-05-07 Risk identification method and device Active CN113222609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110493571.3A CN113222609B (en) 2021-05-07 2021-05-07 Risk identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110493571.3A CN113222609B (en) 2021-05-07 2021-05-07 Risk identification method and device

Publications (2)

Publication Number Publication Date
CN113222609A CN113222609A (en) 2021-08-06
CN113222609B true CN113222609B (en) 2022-05-06

Family

ID=77091245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110493571.3A Active CN113222609B (en) 2021-05-07 2021-05-07 Risk identification method and device

Country Status (1)

Country Link
CN (1) CN113222609B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091208B (en) * 2023-01-16 2023-10-27 张一超 Credit risk enterprise identification method and device based on graph neural network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019514148A (en) * 2016-04-07 2019-05-30 ホワイト・アンヴィル・イノベーションズ,エルエルシー Method for analyzing digital data
EP3794511A1 (en) * 2018-05-18 2021-03-24 BenevolentAI Technology Limited Graph neutral networks with attention
CN110263227B (en) * 2019-05-15 2023-07-18 创新先进技术有限公司 Group partner discovery method and system based on graph neural network
CN111292195A (en) * 2020-02-28 2020-06-16 中国工商银行股份有限公司 Risk account identification method and device
CN111460170B (en) * 2020-03-27 2024-02-13 深圳价值在线信息科技股份有限公司 Word recognition method, device, terminal equipment and storage medium
CN112215487B (en) * 2020-10-10 2023-05-23 吉林大学 Vehicle running risk prediction method based on neural network model
CN112257959A (en) * 2020-11-12 2021-01-22 上海优扬新媒信息技术有限公司 User risk prediction method and device, electronic equipment and storage medium
CN112580780A (en) * 2020-12-14 2021-03-30 深圳前海微众银行股份有限公司 Model training processing method, device, equipment and storage medium
CN112669143A (en) * 2021-01-08 2021-04-16 上海优扬新媒信息技术有限公司 Risk assessment method, device and equipment based on associated network and storage medium

Also Published As

Publication number Publication date
CN113222609A (en) 2021-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230105

Address after: 201208 Floor 15, No. 447, Nanquan North Road, Free Trade Pilot Zone, Pudong New Area, Shanghai

Patentee after: Alipay.com Co.,Ltd.

Address before: 310000 801-11 section B, 8th floor, 556 Xixi Road, Xihu District, Hangzhou City, Zhejiang Province

Patentee before: Alipay (Hangzhou) Information Technology Co.,Ltd.
