CN116957770A - Method and device for identifying financial fraud - Google Patents


Info

Publication number
CN116957770A
Authority
CN
China
Prior art keywords
community
node
user
communities
fraud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210386438.2A
Other languages
Chinese (zh)
Inventor
汪海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Shanghai ICT Co Ltd
CM Intelligent Mobility Network Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Shanghai ICT Co Ltd
CM Intelligent Mobility Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Shanghai ICT Co Ltd, CM Intelligent Mobility Network Co Ltd
Priority to CN202210386438.2A
Publication of CN116957770A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for identifying financial fraud, relating to the technical field of Internet applications. The method comprises: acquiring user data; preprocessing the user data to obtain a user relationship network in which each user is a node; and inputting the user relationship network into a model for identifying financial fraud, which outputs the fraud probability of each user. Using the change in modularity as the evaluation index, the model determines from the user relationship network the community to which each user belongs, and from that community information determines the fraud probability of the users in each community. The modularity is the fraction of edges falling within the same community minus the expected fraction if those edges were placed at random, corrected by a weighted average of the relationship strength between pairs of nodes in different communities. The invention takes large and small communities into account simultaneously, requires no additional hyperparameters, achieves high algorithmic accuracy, and is simple and convenient to use.

Description

Method and device for identifying financial fraud
Technical Field
The invention relates to the technical field of Internet applications, and in particular to a method and a device for identifying financial fraud.
Background
In the prior art, community discovery algorithms based on modularity optimization suffer from two opposite problems. In some cases, where common sense suggests there should be several small communities, they tend to merge these small communities into one large community; in other cases, where common sense suggests that small communities should be combined into a large one, they tend not to merge them. The problem is especially pronounced for telecom operators and large Internet companies, for which relationships at the level of companies or schools cannot be mined.
Disclosure of Invention
The embodiments of the invention provide a method and a device for identifying financial fraud, intended to solve the prior-art problem that, for complex parts in industrial environments or patients in the medical field, rigid constraints easily cause parts to wear or people to be injured.
In order to solve the above problems, the present invention is achieved as follows:
In a first aspect, the present invention provides a method for identifying financial fraud, comprising:
acquiring user data;
preprocessing the user data to obtain a user relationship network, wherein each user in the user relationship network is used as a node;
inputting the user relationship network into a model for identifying financial fraud, and outputting the fraud probability of each user; using the change in modularity as the evaluation index, the model determines from the user relationship network the community to which each user belongs, and determines the fraud probability of the users in each community from that community information; the modularity is the fraction of edges falling within the same community minus the expected fraction if those edges were placed at random, corrected by a weighted average of the relationship strength between pairs of nodes in different communities.
Optionally, the modularity is calculated using a formula in which $A_{ij}$ is the weight of the edge between node $i$ and node $j$; $k_i$ is the sum of the weights of all edges connected to node $i$; $k_j$ is the sum of the weights of all edges connected to node $j$; $c_i$ is the community to which node $i$ belongs; $m$ is the sum of the weights of all edges; and a further coefficient expresses the tightness of the connection between two communities.
Optionally, inputting the user relationship network into the model for identifying financial fraud and outputting the fraud probability of the user comprises:
step 1: constructing a first network graph from the user relationship network, in which each user is a node;
step 2: calculating the community to which each node in the first network graph belongs;
step 3: obtaining the change in modularity when each node i moves from its current community to each community adjacent to node i;
step 4: calculating the community that maximizes the change in modularity when node i is assigned to it, $c'_i = \arg\max \Delta Q(c_i)$, and taking that community as the current community of node i;
step 5: for each node i, collecting the community to which node i currently belongs and the communities to which all nodes connected to node i belong, as a set S;
step 6: repeating steps 3 to 5 a preset number of times; after the preset number of times, dividing the iteration rounds into odd and even rounds; node i is connected to node $v_j$, where $v_j$ belongs to the community that gives node i the largest change; in an odd round, the community information is updated when the change for the current community is smaller than the change for the target community; in an even round, the community information is updated when the change for the current community is larger than the change for the target community;
step 7: determining whether no node changes its community any more;
step 8: if no node changes its community, constructing a second network graph from the current community information, in which each vertex corresponds to one community of the first network graph and the internal weight of each vertex is the sum of the weights of all edges inside the corresponding community of the first network graph;
step 9: repeating steps 3 to 8 until no node changes its community any more;
step 10: calculating the fraud probability of each community from the users in the community information, and evaluating the fraud risk of each individual user from the community information.
Optionally, obtaining the change in modularity when each node i moves from one community to another comprises:
calculating the change in modularity using a formula in which $\Sigma_{tot}$ is the sum of the weights of the edges incident to the nodes of community $c_i$, including edges to nodes inside and outside the community; $c_i$ is the $i$-th community; $k_i$ is the sum of the weights of all edges connected to node $i$; $k_j$ is the sum of the weights of all edges connected to node $j$; $k_{i,in}$ is the sum of the weights of the edges between node $i$ and the community; $k_{j,in}$ is the sum of the weights of the edges between node $j$ and the community; and $m$ is the sum of the weights of all edges in the graph.
Optionally, calculating the fraud probability of a community from the users in the community information comprises:
calculating the fraud probability of the community as $P(c_i) = \frac{1}{|c_i|}\sum_{j \in c_i} y_j$, where $|c_i|$ is the total number of users in community $c_i$ and $y_j$ is the manually labelled data of the $j$-th user stored in the database.
Optionally, evaluating the fraud risk of an individual user from the community information comprises:
identifying ordinary fraudulent users and core fraudulent users with a betweenness centrality algorithm, and evaluating the fraud risk of each individual user.
In a second aspect, the present invention provides an apparatus for identifying financial fraud, comprising:
the acquisition module is used for acquiring user data;
the preprocessing module is used for preprocessing the user data to obtain a user relationship network, and each user in the user relationship network is used as a node;
the identification module is used for inputting the user relationship network into the model for identifying financial fraud and outputting the fraud probability of each user; using the change in modularity as the evaluation index, the model determines from the user relationship network the community to which each user belongs, and determines the fraud probability of the users in each community from that community information; the modularity is the fraction of edges falling within the same community minus the expected fraction if those edges were placed at random, corrected by a weighted average of the relationship strength between pairs of nodes in different communities.
Optionally, the modularity is calculated using a formula in which $A_{ij}$ is the weight of the edge between node $i$ and node $j$; $k_i$ is the sum of the weights of all edges connected to node $i$; $k_j$ is the sum of the weights of all edges connected to node $j$; $c_i$ is the community to which node $i$ belongs; $m$ is the sum of the weights of all edges; and a further coefficient expresses the tightness of the connection between two communities.
Optionally, inputting the user relationship network into the model for identifying financial fraud and outputting the fraud probability of the user comprises:
step 1: constructing a first network graph from the user relationship network, in which each user is a node;
step 2: calculating the community to which each node in the first network graph belongs;
step 3: obtaining the change in modularity when each node i moves from its current community to each community adjacent to node i;
step 4: calculating the community that maximizes the change in modularity when node i is assigned to it, $c'_i = \arg\max \Delta Q(c_i)$, and taking that community as the current community of node i;
step 5: for each node i, collecting the community to which node i currently belongs and the communities to which all nodes connected to node i belong, as a set S;
step 6: repeating steps 3 to 5 a preset number of times; after the preset number of times, dividing the iteration rounds into odd and even rounds; node i is connected to node $v_j$, where $v_j$ belongs to the community that gives node i the largest change; in an odd round, the community information is updated when the change for the current community is smaller than the change for the target community; in an even round, the community information is updated when the change for the current community is larger than the change for the target community;
step 7: determining whether no node changes its community any more;
step 8: if no node changes its community, constructing a second network graph from the current community information, in which each vertex corresponds to one community of the first network graph and the internal weight of each vertex is the sum of the weights of all edges inside the corresponding community of the first network graph;
step 9: repeating steps 3 to 8 until no node changes its community any more;
step 10: calculating the fraud probability of each community from the users in the community information, and evaluating the fraud risk of each individual user from the community information.
Optionally, obtaining the change in modularity when each node i moves from one community to another comprises:
calculating the change in modularity using a formula in which $\Sigma_{tot}$ is the sum of the weights of the edges incident to the nodes of community $c_i$, including edges to nodes inside and outside the community; $c_i$ is the $i$-th community; $k_i$ is the sum of the weights of all edges connected to node $i$; $k_j$ is the sum of the weights of all edges connected to node $j$; $k_{i,in}$ is the sum of the weights of the edges between node $i$ and the community; $k_{j,in}$ is the sum of the weights of the edges between node $j$ and the community; and $m$ is the sum of the weights of all edges in the graph.
Optionally, calculating the fraud probability of a community from the users in the community information comprises:
calculating the fraud probability of the community as $P(c_i) = \frac{1}{|c_i|}\sum_{j \in c_i} y_j$, where $|c_i|$ is the total number of users in community $c_i$ and $y_j$ is the manually labelled data of the $j$-th user stored in the database.
Optionally, evaluating the fraud risk of an individual user from the community information comprises:
identifying ordinary fraudulent users and core fraudulent users with a betweenness centrality algorithm, and evaluating the fraud risk of each individual user.
In a third aspect, the present invention provides a server comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for identifying financial fraud according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for identifying financial fraud according to the first aspect.
According to the invention, the community discovery algorithm is improved by modifying the definition of modularity, which solves the problem that large and small communities cannot both be taken into account: large communities such as companies and small communities such as families can be identified at the same time, with no additional hyperparameters and no manual intervention, high algorithmic accuracy, and simple, convenient use.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of a method for identifying financial fraud according to an embodiment of the present invention;
FIG. 2 is a schematic general flow chart of a method for identifying financial fraud according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an apparatus for identifying financial fraud according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIG. 1, an embodiment of the present invention provides a method for identifying financial fraud, including:
step 11: acquiring user data;
step 12: preprocessing the user data to obtain a user relationship network, wherein each user in the user relationship network is used as a node;
step 13: inputting the user relationship network into a model for identifying financial fraud, and outputting the fraud probability of each user; using the change in modularity as the evaluation index, the model determines from the user relationship network the community to which each user belongs, and determines the fraud probability of the users in each community from that community information; the modularity is the fraction of edges falling within the same community minus the expected fraction if those edges were placed at random, corrected by a weighted average of the relationship strength between pairs of nodes in different communities.
In the embodiment of the invention, the community discovery algorithm is improved by modifying the definition of modularity, which solves the problem that large and small communities cannot both be taken into account: large communities such as companies and small communities such as families can be identified at the same time, with no additional hyperparameters and no manual intervention, high algorithmic accuracy, and simple, convenient use.
In the embodiment of the invention, in step 11, an embedded data collection device at the base-station side collects data such as the user's calls, SMS messages, devices, app usage and website visits. In step 12, the collected data are classified, checked, corrected or deleted, unreliable data are flagged, and the data are imported into a production database; the collected data are then analyzed and processed and brought to a common scale by normalization, standardization and similar methods, yielding a user relationship network in which each user is a node.
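The preprocessing of steps 11 and 12 can be illustrated with a minimal Python sketch that turns cleaned interaction records into a weighted, undirected user relationship network. The record format `(user_a, user_b, count)`, the function name `build_relationship_network`, and the choice of max-normalization (as one instance of the "normalization, standardization and similar methods" mentioned) are assumptions for illustration only; the patent does not specify them. User identifiers are assumed to be mutually comparable (for example, all strings).

```python
from collections import defaultdict


def build_relationship_network(records):
    """Aggregate interaction records into a weighted, undirected user graph.

    records: iterable of (user_a, user_b, count) tuples, e.g. call or SMS
    counts (hypothetical schema). Returns adj: dict user -> dict neighbour ->
    weight, with all weights scaled into (0, 1].
    """
    raw = defaultdict(float)
    for a, b, count in records:
        if a == b or count <= 0:
            continue  # cleaning: drop self-interactions and invalid rows
        key = (a, b) if a < b else (b, a)  # one key per unordered user pair
        raw[key] += count
    if not raw:
        return {}
    top = max(raw.values())
    adj = defaultdict(dict)
    for (a, b), total in raw.items():
        w = total / top  # max-normalization brings weights to a common scale
        adj[a][b] = w
        adj[b][a] = w
    return dict(adj)
```

For example, the records `("u1", "u2", 3)`, `("u2", "u1", 1)` and `("u2", "u3", 2)` give an edge u1-u2 of weight 1.0 and an edge u2-u3 of weight 0.5.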
In the embodiment of the invention, in step 13, once the user relationship network is obtained, the model for identifying financial fraud mines user communities with strong association relationships, analyzes the fraud status of the users within each community, and obtains a fraud probability for each user. A Nebula+Plato deployment is used so that external financial institutions can conveniently call the results output by the model; this deployment accommodates large data volumes, offers fast search, and suits large-data environments. The model is called as follows: the external financial institution invokes the model for identifying financial fraud through an interface provided by the Nebula database, and when it inputs a user's name, mobile phone number and ID card number, the corresponding fraud probability of that user is output.
In the embodiment of the present invention, optionally, the modularity is calculated using a formula in which $A_{ij}$ is the weight of the edge between node $i$ and node $j$; $k_i$ is the sum of the weights of all edges connected to node $i$; $k_j$ is the sum of the weights of all edges connected to node $j$; $c_i$ is the community to which node $i$ belongs; $m$ is the sum of the weights of all edges; and a further coefficient expresses the tightness of the connection between two communities.
In the embodiment of the invention, modularity, also called the modularity metric, measures the strength of the community structure of a network. The modularity used here adds a correction term, i.e. a penalty term, to the prior-art modularity formula $Q_0 = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, where $\delta(c_i, c_j)$ is 1 when nodes $i$ and $j$ are in the same community and 0 otherwise. The prior-art formula is the fraction of edges that fall within the same community minus the expected fraction if those edges were assigned at random; the correction term is a weighted average of the relationship strength between pairs of nodes in different communities. One factor of the correction term is the average relationship strength between two nodes of different communities: if there are many edges between two small communities, the correction term between them is small, i.e. a negative number with a large absolute value, so in the next iteration the two small communities tend to merge into one large community. The weight applied to this average ensures that when two small communities are only loosely connected, the weight is small and the correction term tends to 0; merging then barely changes the correction term while $Q_0$ decreases slightly, so the two small communities are not merged in the next iteration. In this way, closely related communities are merged, loosely related communities are not, and large and small communities are handled at the same time.
In the embodiment of the invention, the modularity formula is rewritten in terms of community-level quantities: $\Sigma_{in}$, the sum of the edge weights inside community $c_i$; $\Sigma_{tot}$, the sum of the weights of the edges incident to the nodes of community $c_i$, including edges to nodes inside and outside the community; $c_i$, the $i$-th community; $m$, the sum of the weights of all edges in the user relationship network; $E_{ij}$, the sum of the edge weights between the nodes of community $c_i$ and the nodes of community $c_j$; and $\rho_{ij}$, the link closeness between the nodes of $c_i$ and $c_j$, given by $\rho_{ij} = E_{ij}/(|c_i|\,|c_j|)$, where $|c_i|$ and $|c_j|$ are the numbers of nodes in $c_i$ and $c_j$;
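The following Python sketch computes the prior-art modularity $Q_0$ together with one possible form of the correction term. The closeness is taken as $\rho_{ij} = E_{ij}/(|c_i|\,|c_j|)$, the mean edge weight per node pair between two communities. The weighting of the penalty used here, $\rho_{ij} E_{ij}/2m$ summed over ordered pairs of distinct communities, is an assumption made for illustration, not the patent's exact formula; the function name `modularity_with_penalty` is hypothetical.

```python
from collections import defaultdict


def modularity_with_penalty(adj, community):
    """Return (q0, q): prior-art modularity and an ASSUMED corrected modularity.

    adj: dict node -> dict neighbour -> weight (symmetric, no self-loops).
    community: dict node -> community id.
    """
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # equals 2m
    m = two_m / 2
    internal = defaultdict(float)   # weight inside each community, each edge once
    sigma_tot = defaultdict(float)  # Sigma_tot: summed node strengths per community
    between = defaultdict(float)    # E_ij for ordered pairs of distinct communities
    size = defaultdict(int)         # |c_i|
    for u, nbrs in adj.items():
        cu = community[u]
        size[cu] += 1
        for v, w in nbrs.items():
            sigma_tot[cu] += w
            cv = community[v]
            if cu == cv:
                internal[cu] += w / 2  # each undirected edge is visited twice
            else:
                between[(cu, cv)] += w
    # Prior-art term: edge fraction inside communities minus its random expectation.
    q0 = sum(internal[c] / m - (sigma_tot[c] / two_m) ** 2 for c in size)
    # ASSUMED penalty: closeness rho_ij weighted by the inter-community edge share.
    penalty = 0.0
    for (ci, cj), e_ij in between.items():
        rho_ij = e_ij / (size[ci] * size[cj])
        penalty += rho_ij * e_ij / two_m
    return q0, q0 - penalty
```

On two unit-weight triangles joined by a single edge and split into the two triangles, this gives $Q_0 = 5/14$ and a corrected value of $43/126$; the penalty shrinks as the link between the two groups weakens relative to their sizes.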
the first change in modularity, when a node joins a community, is calculated from $\Sigma_{in}$, the sum of the edge weights inside community $c_i$; $\Sigma_{tot}$, the sum of the weights of the edges incident to the nodes of $c_i$, including edges to nodes inside and outside the community; $k_i$, the sum of the weights of all edges connected to node $i$; $k_{i,in}$, the sum of the weights of the edges between the node and the community; and $m$, the sum of the weights of all edges in the user relationship network; this expression is then simplified;
the second change in modularity, when a node moves from one community to another, is calculated in the same way; in actual computation, it is only necessary to determine whether the value in brackets is greater than 0.
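Under the standard (uncorrected) modularity, moving node $i$ from community $A$ to community $B$ changes $Q$ by $\Delta Q = \frac{1}{m}\left[(k_{i,B} - k_{i,A}) - \frac{k_i\,(\Sigma_{tot,B} - \Sigma_{tot,A \setminus i})}{2m}\right]$, where $k_{i,A}$ and $k_{i,B}$ are the weights from $i$ to the other nodes of $A$ and to the nodes of $B$. Since $m > 0$, only the sign of the bracket matters, which is consistent with checking whether the value in brackets is greater than 0. The sketch below assumes this standard Louvain gain, not the corrected modularity, and uses hypothetical function names; it also checks the gain against a direct modularity computation.

```python
def modularity(adj, community):
    """Prior-art modularity Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j)."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    strength = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                q += adj[i].get(j, 0.0) - strength[i] * strength[j] / two_m
    return q / two_m


def move_gain_bracket(k_i, k_i_to_src, k_i_to_dst, sigma_tot_src_wo_i, sigma_tot_dst, m):
    """Bracketed term of Delta Q for moving node i from community src to dst.

    Delta Q = bracket / m, so the move improves Q exactly when bracket > 0.
    k_i_to_src: weight from i to the *other* nodes of src.
    sigma_tot_src_wo_i: Sigma_tot of src with node i removed.
    """
    return (k_i_to_dst - k_i_to_src) - k_i * (sigma_tot_dst - sigma_tot_src_wo_i) / (2 * m)
```

For two unit-weight triangles joined by one edge, moving the bridge node out of its own triangle gives a bracket of $-23/14$, so the node stays where it is.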
In the embodiment of the present invention, optionally, inputting the user relationship network into the model for identifying financial fraud and outputting the fraud probability of the user comprises:
step 1: constructing a first network graph G(V, E) from the user relationship network, in which each user is a node, V is the set of vertices and E is the set of edges, and storing G(V, E) in a distributed manner;
step 2: calculating the community to which each node in the first network graph belongs;
step 3: using the second modularity-change formula to obtain the change in modularity when each node i moves from its current community to each community adjacent to node i;
step 4: calculating the community that maximizes the change in modularity when node i is assigned to it, $c'_i = \arg\max \Delta Q(c_i)$, and taking that community as the current community of node i, where $\Delta Q(c_i)$ is the change in modularity when node i is assigned to community $c_i$;
step 5: for each node i, collecting the community to which node i currently belongs and the communities to which all nodes connected to node i belong, as a set S;
step 6: repeating steps 3 to 5 a preset number of times; after the preset number of times, dividing the iteration rounds into odd and even rounds; node i is connected to node $v_j$, where $v_j$ belongs to the community that gives node i the largest change; in an odd round, the community information is updated when the change for the current community is smaller than the change for the target community; in an even round, the community information is updated when the change for the current community is larger than the change for the target community;
step 7: determining whether no node changes its community any more;
step 8: if no node changes its community, constructing a second network graph G'(V', E') from the current community information, where V' is the set of vertices and E' is the set of edges; each vertex of G'(V', E') corresponds to one community of the first network graph G(V, E) and takes that community's identifier as its vertex identifier; the internal weight of each vertex of G'(V', E') is the sum of the weights of all edges inside the corresponding community of G(V, E); and the edges of G'(V', E') are the edges between communities of G(V, E);
step 9: repeating steps 3 to 8 until no node changes its community any more;
step 10: calculating the fraud probability of each community from the users in the community information, and evaluating the fraud risk of each individual user from the community information.
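Steps 2 to 9 follow the pattern of the Louvain method: local node moves until nothing changes, then collapsing each community into one vertex and repeating. Below is a minimal single-machine sketch in Python, assuming the standard Louvain modularity gain rather than the corrected modularity, and omitting the distributed storage and the odd/even update rule of step 6. The function name `louvain` and its parameters are illustrative only.

```python
from collections import defaultdict


def louvain(adj, max_levels=10, eps=1e-12):
    """Sequential Louvain sketch using the standard modularity gain.

    adj: dict node -> dict neighbour -> weight (symmetric, no self-loops).
    Returns dict original node -> community id.
    """
    graph = {u: dict(nbrs) for u, nbrs in adj.items()}
    self_w = {u: 0.0 for u in graph}      # internal weight of each (super-)vertex
    membership = {u: u for u in graph}    # original node -> current super-vertex
    two_m = sum(sum(nbrs.values()) for nbrs in graph.values())
    for _ in range(max_levels):
        # Vertex strength: external edges plus twice the internal weight.
        k = {u: sum(graph[u].values()) + 2 * self_w[u] for u in graph}
        comm = {u: u for u in graph}      # start from singleton communities
        tot = dict(k)                     # Sigma_tot of each community
        moved_any, improved = False, True
        while improved:                   # step 7: loop until no vertex moves
            improved = False
            for u in graph:
                w_to = defaultdict(float)  # weight from u to each neighbouring community
                for v, w in graph[u].items():
                    w_to[comm[v]] += w
                cu = comm[u]
                tot[cu] -= k[u]            # take u out of its community
                best = cu
                best_gain = w_to.get(cu, 0.0) - tot[cu] * k[u] / two_m
                for c, w in w_to.items():  # step 4: argmax of the (scaled) gain
                    gain = w - tot[c] * k[u] / two_m
                    if gain > best_gain + eps:
                        best, best_gain = c, gain
                tot[best] += k[u]
                if best != cu:
                    comm[u] = best
                    improved = moved_any = True
        if not moved_any:                  # step 9: stop when nothing changes
            break
        # Step 8: collapse each community into one vertex of a new graph.
        new_graph = defaultdict(lambda: defaultdict(float))
        new_self = defaultdict(float)
        for u in graph:
            cu = comm[u]
            new_self[cu] += self_w[u]
            for v, w in graph[u].items():
                if comm[v] == cu:
                    new_self[cu] += w / 2  # internal edge is seen from both ends
                else:
                    new_graph[cu][comm[v]] += w
        graph = {c: dict(new_graph[c]) for c in new_self}
        self_w = dict(new_self)
        membership = {o: comm[s] for o, s in membership.items()}
    return membership
```

On two triangles joined by a single edge, this sketch returns the two triangles as separate communities. A distributed version would replace the sequential inner loop with synchronous rounds, which is where an update rule such as the odd/even scheme of step 6 becomes relevant.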
In the embodiment of the invention, a new community discovery algorithm based on a redefinition of modularity is provided: communities are divided by iterative computation and combined with existing black-sample data and the community partition results, so the method can be applied both to the millions of users of small financial companies and to the billions of records of large Internet companies, while avoiding severe over-aggregation of communities when the data volume is large.
In the embodiment of the present invention, optionally, obtaining the change in modularity when each node i moves from one community to another comprises:
calculating the change in modularity using a formula in which $\Sigma_{tot}$ is the sum of the weights of the edges incident to the nodes of community $c_i$, including edges to nodes inside and outside the community; $c_i$ is the $i$-th community; $k_i$ is the sum of the weights of all edges connected to node $i$; $k_j$ is the sum of the weights of all edges connected to node $j$; $k_{i,in}$ is the sum of the weights of the edges between node $i$ and the community; $k_{j,in}$ is the sum of the weights of the edges between node $j$ and the community; and $m$ is the sum of the weights of all edges in the graph.
In the embodiment of the present invention, optionally, calculating the fraud probability of a community from the users in the community information comprises:
calculating the fraud probability of the community as $P(c_i) = \frac{1}{|c_i|}\sum_{j \in c_i} y_j$, where $|c_i|$ is the total number of users in community $c_i$ and $y_j$ is the manually labelled data of the $j$-th user stored in the database.
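The community fraud probability is the share of labelled fraudulent users in the community. A minimal Python sketch, assuming binary labels ($y_j = 1$ for a known fraudulent user, 0 otherwise) and treating users without a label as 0; both conventions, and the function name, are assumptions for illustration:

```python
from collections import defaultdict


def community_fraud_probability(community, labels):
    """P(c_i) = (1 / |c_i|) * sum over j in c_i of y_j.

    community: dict user -> community id.
    labels: dict user -> y_j in {0, 1}; unlabelled users count as 0 (assumption).
    """
    size = defaultdict(int)
    fraud = defaultdict(int)
    for user, c in community.items():
        size[c] += 1
        fraud[c] += labels.get(user, 0)
    return {c: fraud[c] / size[c] for c in size}
```

A community of three users, two of whom are labelled fraudulent, gets $P(c_i) = 2/3$.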
In the embodiment of the present invention, optionally, evaluating the fraud risk of an individual user from the community information comprises:
identifying ordinary fraudulent users and core fraudulent users with a betweenness centrality algorithm, and evaluating the fraud risk of each individual user.
In the embodiment of the invention, the users and relationships within each fraudulent community are extracted to form a separate subgraph, and the risk of each individual user in the fraudulent community is calculated with a betweenness centrality algorithm. First, the importance of each user in the fraudulent community is calculated as $C_B(v_j) = \sum_{s \neq v_j \neq t,\ s, t \in c_i} \frac{r(v_j)}{r}$, where $r$ is the number of shortest paths between node $s$ and node $t$, $r(v_j)$ is the number of those shortest paths that pass through $v_j$, and $c_i$ is the community to which node $v_j$ belongs;
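The importance score is the betweenness centrality of each user within the fraud-community subgraph. The Python sketch below uses Brandes' algorithm on an unweighted, undirected subgraph and counts each unordered pair $\{s, t\}$ once; treating edges as unweighted (path length in hops) is an assumption, since the text does not say whether edge weights define path length. The function name is illustrative.

```python
from collections import deque


def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    adj: dict node -> iterable of neighbours (the fraud-community subgraph).
    Returns dict node -> sum over unordered pairs {s, t} of r_st(v) / r_st.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths (r)
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                 # BFS from s, counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                 # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice
```

In a star-shaped community where one user links three otherwise unconnected users, the hub scores 3 and the others score 0, which is the pattern that singles out a core fraudulent user.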
the fraud probability of each specific user is then obtained from the fraud probability of the whole community and the user's importance within it, using $|c_i|$, the number of users in the community; $P(c_i)$, the risk level of the community as a whole; and a normalization coefficient.
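The exact per-user formula is not recoverable from this text, so the sketch below shows only one plausible combination, which is an assumption: scale $P(c_i)$ by the user's betweenness divided by its maximum possible value $(|c_i|-1)(|c_i|-2)/2$ in an undirected graph, which keeps the result within $[0, P(c_i)]$. The function name and the fallback for communities with fewer than three users are hypothetical.

```python
def user_fraud_probability(p_community, bc, size):
    """ASSUMED combination of community risk and normalized betweenness.

    p_community: P(c_i), the fraud probability of the user's community.
    bc: betweenness of the user in the community subgraph (unordered pairs).
    size: |c_i|, the number of users in the community.
    """
    if size < 3:
        return p_community  # betweenness is 0 for every node; fall back to P(c_i)
    norm = 2.0 / ((size - 1) * (size - 2))  # 1 / maximum undirected betweenness
    return p_community * bc * norm
```

With $P(c_i) = 0.5$ in a four-user star, the hub scores 0.5 and the peripheral users score 0, so monitoring effort concentrates on the hub.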
In the embodiment of the invention, ordinary and core fraudulent users are further distinguished by the betweenness centrality algorithm, so the fraud risk of each individual user is assessed accurately, and risk is prevented by monitoring the core fraudulent users, which improves the precision of risk control.
In the embodiment of the invention, a new community discovery algorithm based on a redefinition of modularity is provided; communities are divided by iterative computation, and, in combination with existing black-sample data and the community partition results, the risk of every user in each community is evaluated individually using betweenness centrality, enabling industrial deployment in the field of financial risk control.
Referring to fig. 2, in the embodiment of the present invention, an iterative algorithm that maximizes modularity is derived from the new definition of modularity, and this algorithm forms the core of financial fraud identification. First, edge device data are collected: data such as users' calls, short messages, devices, APP usage and website visits are collected on the base station side by embedded data collection equipment. Second, the data are cleaned and loaded into storage: erroneous data are classified, checked, corrected or deleted, unreliable data are flagged, and the data are imported into a production database. Third, the data are preprocessed: the relationship data are analyzed and processed, and brought to a common scale by normalization, standardization and similar methods to obtain a relationship network of users for subsequent computation. Fourth, the model is trained: once the relationship network of users is obtained, a community discovery algorithm is used to mine user communities with strong association relationships, the fraud status of the users in each community is analyzed, and a fraud probability is obtained for each user; the trained model is the model for identifying financial fraud. Finally, the model is deployed: a Nebula+Plato deployment is adopted so that external financial institutions can conveniently call the results output by the model; this deployment accommodates large data volumes and offers fast search, which makes it suitable for big-data environments.
The model is called as follows: an external financial institution invokes the model for identifying financial fraud through an interface provided by the Nebula database; when the external financial institution inputs a user's name, mobile phone number and identity card number, the corresponding fraud probability of that user is output.
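The preprocessing step turns raw interaction records into a weighted user relationship network. The following minimal sketch is hypothetical. It assumes that each record is simply a (user_a, user_b) pair for one call, SMS or similar interaction. It also assumes that edge weights are interaction counts divided by the largest count, which is one simple form of normalization. The source does not specify the record format or the normalization formula.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_relationship_network(
        records: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Hypothetical preprocessing: count interactions per unordered user pair and
    scale the counts by the maximum count so every edge weight lies in (0, 1]."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for a, b in records:
        if a == b:
            continue  # self-interactions carry no relationship information
        first, second = sorted((a, b))
        counts[(first, second)] += 1
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: c / top for pair, c in counts.items()}
```

Because each pair is sorted before counting, a call from u1 to u2 and a call from u2 to u1 strengthen the same undirected edge.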
In the embodiment of the invention, compared with the prior art, the new definition of modularity penalizes small communities, while adjusting the penalty coefficient retains small communities that are relatively distinct. This effectively solves the problem that modularity-based community discovery algorithms cannot handle both large and small communities. The new community discovery algorithm derived from this definition produces more accurate communities. The method applies to a range of large-data scenarios and can handle anything from the millions of users of a small financial company to the billions of records of a large Internet company. Further, the core fraudulent users and peripheral fraudulent users in a fraudulent community are explored through betweenness centrality, so that fraud organizers are identified accurately and risk control becomes more precise, convenient and fast.
Referring to fig. 3, the present invention provides an apparatus for identifying financial fraud, comprising:
an acquisition module 31 for acquiring user data;
a preprocessing module 32, configured to preprocess the user data to obtain a user relationship network, where each user in the user relationship network is used as a node;
an identification module 33, configured to input the user relationship network into a model for identifying financial fraud and output the fraud probability of the user. Using the change in modularity as the evaluation criterion, the model determines from the user relationship network the community to which each user belongs. It then determines the fraud probability of the users in each community from the resulting community information. The modularity is a weighted average of two quantities: the proportion of edges whose endpoints fall in the same community minus the expected value of that proportion, and the strength of the relationship between two nodes in different communities.
In the embodiment of the present invention, optionally, the modularity is calculated by adopting the following formula:
where A_ij is the weight of the edge between node i and node j; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; c_i is the community to which node i belongs; m is the sum of the weights of all edges; and the remaining term is the tightness of the connection between two communities.
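The patent's modularity adds a term for the tightness of connection between communities, and that formula is not reproduced here. For reference, the following minimal sketch implements the standard weighted (Newman-Girvan) modularity, Q = (1/2m) Σ_ij [A_ij - k_i k_j / 2m] δ(c_i, c_j), which uses the same symbols A_ij, k_i, k_j, c_i and m. The sketch assumes an undirected graph without self-loops, with each edge listed once in a dict keyed by node pairs. That format is illustrative.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def modularity(weights: Dict[Edge, float], community: Dict[Hashable, Hashable]) -> float:
    """Standard weighted modularity, computed per community as
    Q = sum_c [ in_c / m - (tot_c / 2m)^2 ], which equals the pairwise definition."""
    m = sum(weights.values())
    if m == 0:
        return 0.0
    degree: Dict[Hashable, float] = {}
    internal: Dict[Hashable, float] = {}  # weight of edges inside each community (counted once)
    for (u, v), w in weights.items():
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0.0) + w
    total: Dict[Hashable, float] = {}     # sum of degrees k_i of the nodes in each community
    for node, k in degree.items():
        total[community[node]] = total.get(community[node], 0.0) + k
    return sum(internal.get(c, 0.0) / m - (tot / (2 * m)) ** 2 for c, tot in total.items())
```

For two triangles joined by a single edge, splitting them into their natural two communities gives Q = 5/14 ≈ 0.357, and putting every node in one community gives Q = 0.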
In the embodiment of the present invention, optionally, inputting the user relationship network into the model for identifying financial fraud and outputting the fraud probability of the user includes:
step 1: constructing a first network graph from the user relationship network, in which each user is a node;
step 2: calculating the community to which each node in the first network graph belongs;
step 3: obtaining the change in modularity when each node i is transferred from its community to each community adjacent to node i;
step 4: calculating the community $c'_i=\arg\max\Delta Q(c_i)$ that gives the largest change in modularity when node i is assigned to it, and taking that community as the current community of node i;
step 5: for each node i, taking the community to which node i currently belongs and the communities of all nodes connected to node i as a set S;
step 6: repeating steps 3 to 5 a preset number of times; after the preset number of times, the iteration rounds are divided into odd rounds and even rounds, and node i is connected to node v_j, where node v_j belongs to the community that gives the largest change in modularity for node i; in an odd round, the community information is updated when the change for the current community is smaller than the change for the community being transferred to; in an even round, the community information is updated when the change for the current community is larger than the change for the community being transferred to;
step 7: determining whether no node has changed its community;
step 8: if no node has changed its community, constructing a second network graph from the current community information, in which each vertex corresponds to one community of the first network graph, and the internal weight of each vertex is the sum of the weights of all edges within the corresponding community of the first network graph;
step 9: repeating steps 3 to 8 to update iteratively until no node changes its community any more;
step 10: calculating the probability of each fraudulent community according to the number of users in the community information, and evaluating the fraud risk of each individual user according to the community information.
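Steps 1 to 9 follow the overall shape of the Louvain method: nodes are moved greedily between neighbouring communities, and then each community is collapsed into a single vertex whose self-loop carries the community's internal weight. The following minimal sketch implements the standard Louvain procedure with the standard modularity gain. It does not reproduce the patent's modified modularity or the odd/even-round update rule of step 6, whose formulas are not given here. The adjacency format and function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable

# Symmetric weighted adjacency; adj[u][u], if present, is a self-loop weight.
Adj = Dict[Hashable, Dict[Hashable, float]]


def _degrees(adj: Adj) -> Dict[Hashable, float]:
    # k_i; a self-loop counts twice toward its node's degree (standard convention).
    return {u: sum(w for v, w in nbrs.items() if v != u) + 2 * nbrs.get(u, 0.0)
            for u, nbrs in adj.items()}


def local_moving(adj: Adj, max_passes: int = 100) -> Dict[Hashable, int]:
    """Standard local-moving phase: start from singletons and move each node to the
    neighbouring community with the largest modularity gain until nothing moves."""
    k = _degrees(adj)
    m = sum(k.values()) / 2
    comm = {u: idx for idx, u in enumerate(adj)}
    tot = {comm[u]: k[u] for u in adj}                # Sigma_tot of each community
    if m == 0:
        return comm
    for _ in range(max_passes):
        moved = False
        for i in adj:
            links: Dict[int, float] = defaultdict(float)  # weight from i to each community
            for j, w in adj[i].items():
                if j != i:
                    links[comm[j]] += w
            old = comm[i]
            tot[old] -= k[i]                          # take i out of its community
            best = old
            best_gain = links.get(old, 0.0) / m - tot[old] * k[i] / (2 * m * m)
            for c, k_in in links.items():
                gain = k_in / m - tot[c] * k[i] / (2 * m * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k[i]
            comm[i] = best
            moved = moved or best != old
        if not moved:
            break
    return comm


def aggregate(adj: Adj, comm: Dict[Hashable, int]) -> Adj:
    """Build the next-level graph: one vertex per community, internal weight as a self-loop."""
    new: Adj = {c: {} for c in set(comm.values())}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = comm[u], comm[v]
            if u == v:                                # existing self-loop, seen once
                new[cu][cu] = new[cu].get(cu, 0.0) + w
            elif cu == cv:                            # internal edge, seen from both ends
                new[cu][cu] = new[cu].get(cu, 0.0) + w / 2
            else:                                     # edge between two communities
                new[cu][cv] = new[cu].get(cv, 0.0) + w
    return new


def louvain(adj: Adj) -> Dict[Hashable, int]:
    """Alternate local moving and aggregation until no node changes community."""
    membership: Dict[Hashable, Hashable] = {u: u for u in adj}
    while True:
        comm = local_moving(adj)
        if len(set(comm.values())) == len(adj):       # nothing merged: converged
            break
        membership = {u: comm[s] for u, s in membership.items()}
        adj = aggregate(adj, comm)
    labels = {c: i for i, c in enumerate(sorted(set(membership.values()), key=str))}
    return {u: labels[c] for u, c in membership.items()}
```

On two triangles joined by one edge, this sketch puts each triangle in its own community. The strict `>` comparison means a node moves only when modularity strictly increases, and `max_passes` guards against floating-point ties.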
In the embodiment of the present invention, optionally, obtaining the change in modularity when each node i is transferred from one community to another community includes:
calculating the change in modularity as follows:
where the total edge weight of community c_i is the sum of the edge weights of all nodes in the community, including edges to nodes inside the community and edges to nodes outside it; c_i is the i-th community; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; k_{i,in} is the sum of the weights of the edges between node i and the community; k_{j,in} is the sum of the weights of the edges between node j and the community; and m is the sum of the weights of all edges in the graph.
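The patent's ΔQ formula, which also involves k_j and k_{j,in}, is not given in the text. For comparison only, the standard Louvain gain for moving an isolated node i into a community C is derived below. Here Σ_in is the sum of A_ij over node pairs inside C, so each internal edge is counted twice. Σ_tot is the total edge weight of the nodes in C, which is the quantity described above for community c_i. k_{i,in} is the weight from i to C.

```latex
\Delta Q
= \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^{2}\right]
- \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^{2} - \left(\frac{k_i}{2m}\right)^{2}\right]
= \frac{k_{i,in}}{m} - \frac{\Sigma_{tot}\, k_i}{2m^{2}}
```

Only k_{i,in} and Σ_tot depend on the candidate community. Comparing candidates in step 4 therefore needs just these two quantities for each neighbouring community.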
In the embodiment of the present invention, optionally, calculating the probability that a community is fraudulent according to the number of users in the community information includes:
calculating the fraud probability of the community from the following quantities: |c_i|, the total number of users in community c_i; and y_j, the manually annotated label of the j-th user stored in the database.
In an embodiment of the present invention, optionally, evaluating the fraud risk of an individual user according to the community information includes:
identifying ordinary fraudulent users and core fraudulent users by a betweenness centrality algorithm, and evaluating the fraud risk of each individual user.
The network side device provided by the embodiment of the present invention can implement each process implemented by the method for identifying financial fraud in the method embodiment of fig. 1, and in order to avoid repetition, a description is omitted here.
Referring to fig. 4, the embodiment of the present invention further provides a server 40, which includes a processor 41, a memory 42, and a computer program stored in the memory 42 and capable of running on the processor 41, where the computer program, when executed by the processor 41, implements the processes of the above-mentioned method embodiment for identifying financial fraud, and can achieve the same technical effects, and for avoiding repetition, will not be repeated here.
The embodiment of the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the above method embodiment for identifying financial fraud and achieves the same technical effects; to avoid repetition, the description is omitted here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, another terminal, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the claims, which are to be protected by the present invention.

Claims (10)

1. A method of identifying financial fraud, comprising:
acquiring user data;
preprocessing the user data to obtain a user relationship network, wherein each user in the user relationship network is used as a node;
inputting the user relationship network into a model for identifying financial fraud, and outputting the fraud probability of the user. Using the change in modularity as the evaluation criterion, the model determines from the user relationship network the community to which each user belongs. It then determines the fraud probability of the users in each community from the resulting community information. The modularity is a weighted average of two quantities: the proportion of edges whose endpoints fall in the same community minus the expected value of that proportion, and the strength of the relationship between two nodes in different communities.
2. The method of identifying financial fraud as defined in claim 1, wherein,
the modularity is calculated by the following formula:
wherein A_ij is the weight of the edge between node i and node j; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; c_i is the community to which node i belongs; m is the sum of the weights of all edges; and the remaining term is the tightness of the connection between two communities.
3. The method of identifying financial fraud as defined in claim 2, wherein said inputting the user relationship network into a model identifying financial fraud and outputting a fraud probability for the user includes:
step 1: constructing a first network graph from the user relationship network, in which each user is a node;
step 2: calculating the community to which each node in the first network graph belongs;
step 3: obtaining the change in modularity when each node i is transferred from its community to each community adjacent to node i;
step 4: calculating the community $c'_i=\arg\max\Delta Q(c_i)$ that gives the largest change in modularity when node i is assigned to it, and taking that community as the current community of node i;
step 5: for each node i, taking the community to which node i currently belongs and the communities of all nodes connected to node i as a set S;
step 6: repeating steps 3 to 5 a preset number of times; after the preset number of times, the iteration rounds are divided into odd rounds and even rounds, and node i is connected to node v_j, where node v_j belongs to the community that gives the largest change in modularity for node i; in an odd round, the community information is updated when the change for the current community is smaller than the change for the community being transferred to; in an even round, the community information is updated when the change for the current community is larger than the change for the community being transferred to;
step 7: determining whether no node has changed its community;
step 8: if no node has changed its community, constructing a second network graph from the current community information, in which each vertex corresponds to one community of the first network graph, and the internal weight of each vertex is the sum of the weights of all edges within the corresponding community of the first network graph;
step 9: repeating steps 3 to 8 to update iteratively until no node changes its community any more;
step 10: calculating the probability of each fraudulent community according to the number of users in the community information, and evaluating the fraud risk of each individual user according to the community information.
4. A method of identifying financial fraud as defined in claim 3, wherein said obtaining a change in modularity for each node i as it transitions from one community to another community comprises:
calculating the change in modularity, wherein the total edge weight of community c_i is the sum of the edge weights of all nodes in the community, including edges to nodes inside the community and edges to nodes outside it; c_i is the i-th community; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; k_{i,in} is the sum of the weights of the edges between node i and the community; k_{j,in} is the sum of the weights of the edges between node j and the community; and m is the sum of the weights of all edges in the graph.
5. A method of identifying financial fraud as defined in claim 3, wherein said calculating a probability of a fraudulent community from the number of users in said community information comprises:
the fraud probability of the community is calculated from |c_i|, the total number of users in community c_i, and y_j, the manually annotated label of the j-th user stored in the database.
6. A method of identifying financial fraud as defined in claim 3, wherein said evaluating fraud risk for individual users based on said community information comprises:
ordinary fraudulent users and core fraudulent users are identified by a betweenness centrality algorithm, and the fraud risk of each individual user is evaluated.
7. An apparatus for identifying financial fraud, comprising:
the acquisition module is used for acquiring user data;
the preprocessing module is used for preprocessing the user data to obtain a user relationship network, and each user in the user relationship network is used as a node;
the identification module is configured to input the user relationship network into the model for identifying financial fraud and output the fraud probability of the user. Using the change in modularity as the evaluation criterion, the model determines from the user relationship network the community to which each user belongs. It then determines the fraud probability of the users in each community from the resulting community information. The modularity is a weighted average of two quantities: the proportion of edges whose endpoints fall in the same community minus the expected value of that proportion, and the strength of the relationship between two nodes in different communities.
8. The apparatus for identifying financial fraud as defined in claim 7,
the modularity is calculated by the following formula:
wherein A_ij is the weight of the edge between node i and node j; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; c_i is the community to which node i belongs; m is the sum of the weights of all edges; and the remaining term is the tightness of the connection between two communities.
9. A server, comprising: a processor, a memory and a program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the method of identifying financial fraud as defined in any of claims 1 to 6.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of the method of identifying financial fraud as defined in any of claims 1-6.
CN202210386438.2A 2022-04-11 2022-04-11 Method and device for identifying financial fraud Pending CN116957770A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210386438.2A CN116957770A (en) 2022-04-11 2022-04-11 Method and device for identifying financial fraud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210386438.2A CN116957770A (en) 2022-04-11 2022-04-11 Method and device for identifying financial fraud

Publications (1)

Publication Number Publication Date
CN116957770A true CN116957770A (en) 2023-10-27

Family

ID=88455182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210386438.2A Pending CN116957770A (en) 2022-04-11 2022-04-11 Method and device for identifying financial fraud

Country Status (1)

Country Link
CN (1) CN116957770A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117455660A (en) * 2023-12-25 2024-01-26 浙江邦盛科技股份有限公司 Financial real-time safety detection system, method, equipment and storage medium
CN117455660B (en) * 2023-12-25 2024-05-24 浙江邦盛科技股份有限公司 Financial real-time safety detection system, method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination