CN116957770A

CN116957770A - Method and device for identifying financial fraud

Info

Publication number: CN116957770A
Application number: CN202210386438.2A
Authority: CN
Inventors: 汪海涛
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Shanghai ICT Co Ltd; CM Intelligent Mobility Network Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Shanghai ICT Co Ltd; CM Intelligent Mobility Network Co Ltd
Priority date: 2022-04-11
Filing date: 2022-04-11
Publication date: 2023-10-27

Abstract

The invention provides a method for identifying financial fraud, relating to the technical field of Internet applications. The method comprises: acquiring user data; preprocessing the user data to obtain a user relationship network in which each user is a node; and inputting the user relationship network into a model for identifying financial fraud, which outputs the fraud probability of each user. Based on the user relationship network, the model determines the community to which each user belongs, using the change in modularity as the evaluation index, and then determines the fraud probability of the users in each community from the resulting community information. The modularity is the fraction of edges whose endpoints fall in the same community, minus the expected fraction obtained when edges are assigned at random, plus a weighted average of the relationship strength between pairs of nodes in different communities. The algorithm handles large and small communities at the same time, requires no additional hyperparameters, is highly accurate, and is simple and convenient to use.

Description

A method and device for identifying financial fraud

Technical field

The present invention relates to the field of Internet application technology, and in particular to a method and device for identifying financial fraud.

Background art

In the prior art, community detection algorithms based on modularity optimization behave inconsistently in two ways. In some cases where common sense suggests there should be several small communities, the algorithm tends to merge them into one large community; in other cases where common sense suggests the nodes should form one large community, it tends not to merge them. This problem becomes more pronounced for telecom operators and large Internet companies, and it makes relationship mining at the level of companies or schools impossible.

Summary of the invention

Embodiments of the present invention provide a method and device for identifying financial fraud, intended to solve the problem in the prior art that modularity-based community detection cannot handle large and small communities at the same time.

To solve the above problem, the present invention is implemented as follows:

In a first aspect, the present invention provides a method for identifying financial fraud, comprising:

acquiring user data;

preprocessing the user data to obtain a user relationship network, in which each user is a node;

inputting the user relationship network into a model for identifying financial fraud, and outputting the fraud probability of each user; wherein, based on the user relationship network, the model determines the community to which each user belongs using the change in modularity as the evaluation index, and determines the fraud probability of the users in each community from the determined community information; and wherein the modularity is the fraction of edges whose endpoints fall in the same community, minus the expected fraction obtained by assigning edges at random, plus a weighted average of the relationship strength between pairs of nodes in different communities.

Optionally, the modularity is calculated with the following formula:

where A_ij is the weight of the edge between node i and node j; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; c_i is the community to which node i belongs; m is the sum of the weights of all edges; and the closeness coefficient expresses how tightly the two communities are connected.
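As a hedged reference point, the Python sketch below computes only the prior-art part of this formula, the standard Newman modularity (1/2m)·Σ_ij [A_ij − k_i·k_j/2m]·δ(c_i, c_j), with A_ij, k_i, m and c_i as defined above. The closeness-weighted correction term is not implemented, since its exact form cannot be recovered from the definitions alone. The function name and the edge-list input format are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Prior-art (Newman) modularity of an undirected, weighted graph without self-loops.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    community: dict mapping every node to its community id c_i.
    Computes sum_c [L_c / m - (d_c / 2m)^2], which equals
    (1 / 2m) * sum_ij [A_ij - k_i * k_j / 2m] * delta(c_i, c_j).
    The patent's closeness-weighted correction term is NOT included.
    """
    m = 0.0                            # total edge weight
    internal = defaultdict(float)      # L_c: weight of edges inside community c
    degree_sum = defaultdict(float)    # d_c: sum of weighted degrees k_i in c
    for u, v, w in edges:
        m += w
        degree_sum[community[u]] += w
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
print(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))  # 5/14, about 0.357
```

Putting every node in one community gives 0 for any graph, which is why the prior-art term alone rewards splitting the two triangles.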

Optionally, inputting the user relationship network into the model for identifying financial fraud and outputting the user's fraud probability comprises:

Step 1: construct a first network graph from the user relationship network, with each user as a node;

Step 2: compute the community to which each node in the first network graph belongs;

Step 3: obtain the change in modularity when each node i moves from its community to each of its adjacent communities;

Step 4: find the community that gives the largest change in modularity when node i is assigned to it, c′_i = argmax ΔQ(c_i), and take that community as the current community of node i;

Step 5: for each node i, collect the information of its current community and of the communities of all nodes connected to node i, as a set S;

Step 6: repeat Steps 3 to 5 a preset number of times; after the preset number of repetitions, split the iteration rounds into odd and even rounds; connect node i to node v_j, where v_j belongs to the community that maximizes the change for node i; in an odd round, update the community information if the change for the current community is smaller than the change for the target community; in an even round, update the community information if the change for the current community is greater than the change for the target community;

Step 7: determine whether no node changes its community any longer;

Step 8: if no node changes its community any longer, construct a second network graph from the current community information, where each vertex of the second network graph corresponds to one community of the first network graph, and the internal weight of a vertex is the sum of the weights of all edges inside the corresponding community of the first network graph;

Step 9: repeat Steps 3 to 8 to update iteratively until no node changes its community any longer;

Step 10: calculate the probability that a community is a fraud community from the number of users in the community information, and evaluate the fraud risk of individual users from the community information.
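Steps 2 to 7 resemble the local-moving phase of the standard Louvain method. The following is a minimal Python sketch under that assumption: it scores each candidate move with the standard gain k_i,in − Σ_tot·k_i/(2m) (which is m·ΔQ, so only its sign and size relative to other candidates matter), processes nodes one at a time, and does not implement the patent's correction term or the odd/even-round update rule of Step 6. The adjacency-dict input format and the function name are illustrative.

```python
from collections import defaultdict


def local_moving(adj, max_rounds=100):
    """Sketch of Steps 2-7: move nodes greedily until no node changes community.

    adj: dict node -> dict neighbour -> weight (undirected, symmetric, no self-loops).
    Returns dict node -> community id. Uses the standard Louvain gain only.
    """
    community = {v: v for v in adj}                          # each node starts alone
    k = {v: sum(nbrs.values()) for v, nbrs in adj.items()}   # weighted degree k_i
    m = sum(k.values()) / 2                                  # total edge weight
    if m == 0:
        return community
    sigma_tot = dict(k)                                      # Sigma_tot per community
    for _ in range(max_rounds):
        moved = False
        for i in adj:
            current = community[i]
            links = defaultdict(float)                       # k_{i,in} per neighbouring community
            for j, w in adj[i].items():
                links[community[j]] += w
            sigma_tot[current] -= k[i]                       # take i out of its community
            # Steps 3-4: keep the community with the largest gain (m * Delta Q).
            best = current
            best_gain = links[current] - sigma_tot[current] * k[i] / (2 * m)
            for c, k_in in links.items():
                gain = k_in - sigma_tot[c] * k[i] / (2 * m)
                if gain > best_gain:
                    best, best_gain = c, gain
            sigma_tot[best] += k[i]                          # put i into the chosen community
            if best != current:
                community[i] = best
                moved = True
        if not moved:                                        # Step 7: nothing changed
            break
    return community


adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
print(local_moving(adj))  # two groups: {0, 1, 2} and {3, 4, 5}
```

Ties keep the node in its current community, which avoids pointless moves between equally good communities.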

Optionally, obtaining the change in modularity when each node i moves from one community to another comprises:

calculating the change in modularity as follows:

where the total weight of community c_i is the sum of the edge weights of its nodes, including edges to nodes inside the community and edges to nodes outside it; c_i is the i-th community; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; k_i,in is the sum of the weights of the edges between node i and the community; k_j,in is the sum of the weights of the edges between node j and the community; and m is the sum of the weights of all edges in the graph.
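For comparison, the standard Louvain method (which has no correction term) computes this change as follows. Let Σ_in be the sum of A_uv over ordered node pairs inside a community C (each internal edge counted twice), Σ_tot the sum of k_u over C, and k_i,C the total weight from node i to the members of C. Inserting an isolated node i into C, and moving i from community A to community B (a removal followed by an insertion), then give the expressions below. They are shown only as the textbook baseline, not as the patent's corrected formula.

```latex
\begin{aligned}
\Delta Q_{\text{insert}} &= \left[\frac{\Sigma_{in}+2k_{i,C}}{2m}-\left(\frac{\Sigma_{tot}+k_i}{2m}\right)^{2}\right]
 -\left[\frac{\Sigma_{in}}{2m}-\left(\frac{\Sigma_{tot}}{2m}\right)^{2}-\left(\frac{k_i}{2m}\right)^{2}\right]
 = \frac{1}{m}\left[k_{i,C}-\frac{\Sigma_{tot}\,k_i}{2m}\right],\\
\Delta Q_{A\to B} &= \frac{1}{m}\left[\bigl(k_{i,B}-k_{i,A\setminus i}\bigr)-\frac{k_i\bigl(\Sigma_{tot}^{B}-\Sigma_{tot}^{A\setminus i}\bigr)}{2m}\right].
\end{aligned}
```

Here Σ_tot^{A∖i} is the total weight of A after node i has left it. Because m > 0, only the sign of the bracket decides whether a move increases modularity.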

Optionally, calculating the probability of a fraud community from the number of users in the community information comprises:

calculating the fraud-community probability with the following formula, where |c_i| is the total number of users in community c_i, and y_j is the labeled data of the j-th user stored in the database.
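One plausible reading of this formula, assumed here rather than confirmed, is the share of labeled fraud users in the community: P(c_i) = (1/|c_i|)·Σ_{j∈c_i} y_j with y_j ∈ {0, 1}. A minimal Python sketch of that reading follows; the function name and the label dictionary are illustrative.

```python
def community_fraud_probability(members, labels):
    """Assumed reading: P(c_i) = (1 / |c_i|) * sum of y_j over the users j in c_i.

    members: user ids in community c_i.
    labels: dict user id -> y_j (1 = labeled fraud, 0 = labeled normal).
    Users without a stored label count as 0 here, which is also an assumption.
    """
    if not members:
        return 0.0
    return sum(labels.get(u, 0) for u in members) / len(members)


print(community_fraud_probability(["u1", "u2", "u3", "u4"], {"u1": 1, "u2": 0, "u3": 1}))  # 0.5
```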

Optionally, evaluating the fraud risk of an individual user from the community information comprises:

identifying ordinary fraudulent users and core fraudulent users with the betweenness centrality algorithm, and evaluating the fraud risk of individual users.

In a second aspect, the present invention provides a device for identifying financial fraud, comprising:

an acquisition module, configured to acquire user data;

a preprocessing module, configured to preprocess the user data to obtain a user relationship network, in which each user is a node;

an identification module, configured to input the user relationship network into a model for identifying financial fraud and output the fraud probability of each user; wherein, based on the user relationship network, the model determines the community to which each user belongs using the change in modularity as the evaluation index, and determines the fraud probability of the users in each community from the determined community information; and wherein the modularity is the fraction of edges whose endpoints fall in the same community, minus the expected fraction obtained by assigning edges at random, plus a weighted average of the relationship strength between pairs of nodes in different communities.

Here, the total weight of community c_i is the sum of the edge weights of its nodes, including edges to nodes inside the community and edges to nodes outside it; c_i is the i-th community; k_i is the sum of the weights of all edges connected to node i; k_j is the sum of the weights of all edges connected to node j; k_i,in is the sum of the weights of the edges between node i and the community; k_j,in is the sum of the weights of the edges between node j and the community; and m is the sum of the weights of all edges in the graph.

In a third aspect, the present invention provides a server, comprising a processor, a memory, and a program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the method for identifying financial fraud according to any implementation of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program; when executed by a processor, the computer program implements the steps of the method for identifying financial fraud according to any implementation of the first aspect.

In the present invention, the modularity definition is modified to improve the community detection algorithm. This solves the problem of not being able to handle large and small communities at the same time: large communities such as companies and small communities such as families can both be identified, without additional hyperparameters or manual intervention, and the algorithm is accurate and easy to use.

附图说明Description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings only illustrate the preferred embodiments and are not to be construed as limiting the invention. Throughout the drawings, the same reference characters designate the same components. In the drawings:

Figure 1 is a schematic flowchart of a method for identifying financial fraud according to an embodiment of the present invention;

Figure 2 is a schematic diagram of the overall flow of a method for identifying financial fraud according to an embodiment of the present invention;

Figure 3 is a schematic structural diagram of a device for identifying financial fraud according to an embodiment of the present invention;

Figure 4 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Referring to Figure 1, an embodiment of the present invention provides a method for identifying financial fraud, comprising:

Step 11: acquire user data;

Step 12: preprocess the user data to obtain a user relationship network, in which each user is a node;

Step 13: input the user relationship network into a model for identifying financial fraud, and output the fraud probability of each user; based on the user relationship network, the model determines the community to which each user belongs using the change in modularity as the evaluation index, and determines the fraud probability of the users in each community from the determined community information; the modularity is the fraction of edges whose endpoints fall in the same community, minus the expected fraction obtained by assigning edges at random, plus a weighted average of the relationship strength between pairs of nodes in different communities.

In this embodiment of the present invention, the modularity definition is modified to improve the community detection algorithm. This solves the problem of not being able to handle large and small communities at the same time: large communities such as companies and small communities such as families can both be identified, without additional hyperparameters or manual intervention, and the algorithm is accurate and easy to use.

In this embodiment, in Step 11, embedded data collection devices on the base-station side collect data on users' calls, text messages, devices, app usage, website visits, and so on. In Step 12, the collected data is classified and checked, erroneous data is corrected or deleted, unreliable data is flagged, and the data is imported into the production database; the collected data is then analyzed and processed, brought to a common scale by normalization, standardization, and similar methods, and turned into the user relationship network, in which each user is a node.
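The normalization in Step 12 is named but not specified. The Python sketch below shows one hypothetical way to turn per-pair interaction counts into edge weights of the user relationship network: min-max normalize each feature to [0, 1] and average the normalized features. The feature names (`calls`, `sms`), the record format, and the averaging rule are all assumptions, and the sketch assumes one record per unordered user pair.

```python
def build_relationship_network(interactions):
    """Hypothetical preprocessing: raw pairwise interaction counts -> edge weights.

    interactions: list of (user_a, user_b, features) with features such as
    {"calls": 12, "sms": 3}; one record per unordered user pair is assumed.
    Each feature is min-max normalized to [0, 1] over all records, and the edge
    weight is the mean of the normalized features.
    """
    names = sorted({f for _, _, feats in interactions for f in feats})
    lo = {f: min(feats.get(f, 0) for _, _, feats in interactions) for f in names}
    hi = {f: max(feats.get(f, 0) for _, _, feats in interactions) for f in names}
    graph = {}
    for a, b, feats in interactions:
        if a == b:
            continue                                   # ignore self-interactions
        scaled = [
            (feats.get(f, 0) - lo[f]) / (hi[f] - lo[f]) if hi[f] > lo[f] else 0.0
            for f in names
        ]
        weight = sum(scaled) / len(scaled) if scaled else 0.0
        graph.setdefault(a, {})[b] = weight            # undirected: store both directions
        graph.setdefault(b, {})[a] = weight
    return graph


records = [
    ("u1", "u2", {"calls": 10, "sms": 0}),
    ("u2", "u3", {"calls": 0, "sms": 4}),
    ("u1", "u3", {"calls": 5, "sms": 4}),
]
graph = build_relationship_network(records)
print(graph["u1"])  # {'u2': 0.5, 'u3': 0.75}
```

Min-max scaling keeps every weight in [0, 1], so a single very active channel cannot dominate the relationship strength; z-score standardization would be an equally plausible choice.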

In this embodiment, in Step 13, after the user relationship network is obtained, the model for identifying financial fraud mines user communities with strong associations, analyzes the fraud situation of the users in each community, and assigns each user a fraud probability. The model is deployed with Nebula + Plato so that external financial institutions can conveniently retrieve its output; this deployment accommodates large data volumes with fast search and suits big-data environments. An external financial institution calls the model through the interface provided by the Nebula database: given a user's name, mobile phone number, and ID card number, it returns that user's fraud probability.

In this embodiment, optionally, the modularity is calculated with the following formula:

In this embodiment, the modularity (also called the modularity metric) measures the strength of a network's community structure. The proposed modularity adds a correction term, that is, a penalty term, to the prior-art modularity formula. The prior-art formula means the fraction of edges that fall inside communities minus the expected fraction obtained when those edges are assigned at random; the correction term is the weighted average of the relationship strength between pairs of nodes in different communities. Its unweighted part, the average relationship strength between pairs of nodes in different communities, works as follows: if two small communities share many edges, the correction term between them is very small, that is, a negative number with a large absolute value, so the next iteration tends to merge the two small communities into one large community. The added weight (the closeness of the two communities) ensures that when two small communities are only loosely connected, the weight is very small and the correction term tends to 0; merging then barely changes the correction term, while the prior-art term would decrease slightly, so the next iteration does not merge the two small communities. As a result, closely connected communities are merged and loosely connected ones are not, so both large and small communities are handled.

In this embodiment, the modularity formula is rewritten as follows, where the internal weight of community c_i is the sum of the edge weights inside c_i; the total weight of community c_i is the sum of the edge weights of its nodes, including edges to nodes inside the community and edges to nodes outside it; c_i is the i-th community; m is the sum of the weights of all edges in the user relationship network; the cross weight of c_i and c_j is the sum of the edge weights between nodes in community c_i and nodes in community c_j; and the closeness of c_i and c_j measures how tightly the nodes of c_i are connected to the nodes of c_j, where, in the closeness formula, |c_i| is the number of nodes in community c_i and |c_j| is the number of nodes in community c_j;

The first change in modularity, when a node joins a community, is calculated as follows, where the internal weight of community c_i is the sum of the edge weights inside c_i; the total weight of c_i is the sum of the edge weights of its nodes, including edges to nodes inside the community and edges to nodes outside it; c_i is the i-th community; k_i is the sum of the weights of all edges connected to node i; k_i,in is the sum of the weights of the edges between the node and the community; and m is the sum of the weights of all edges in the user relationship network. The first change in modularity simplifies to:

The second change in modularity, when a node moves from one community to another, is given by the following formula and is then simplified further. In practice, it is only necessary to check whether the value in the square brackets is greater than 0.
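A minimal Python sketch of this move test is shown below, assuming the standard Louvain form of the second change without the correction term. It returns ΔQ for moving node i from its community A to community B, and the sign of the bracketed value decides whether the move is worthwhile. The function name and the adjacency-dict format are illustrative.

```python
def move_gain(adj, community, i, target):
    """Standard (uncorrected) modularity change when node i moves from its community A to B.

    adj: dict node -> dict neighbour -> weight (undirected, symmetric, no self-loops).
    community: dict node -> community id; target is the id of community B.
    """
    source = community[i]
    if target == source:
        return 0.0
    k = {v: sum(nbrs.values()) for v, nbrs in adj.items()}   # weighted degrees
    m = sum(k.values()) / 2                                  # total edge weight
    # Sigma_tot of B, and of A after node i has been removed from it
    sigma_b = sum(k[v] for v in adj if community[v] == target)
    sigma_a = sum(k[v] for v in adj if community[v] == source) - k[i]
    # weight from i to the members of B and to the other members of A
    k_i_b = sum(w for j, w in adj[i].items() if community[j] == target)
    k_i_a = sum(w for j, w in adj[i].items() if community[j] == source)
    bracket = (k_i_b - k_i_a) - k[i] * (sigma_b - sigma_a) / (2 * m)
    return bracket / m   # the move increases modularity only if bracket > 0


adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
communities = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(move_gain(adj, communities, 2, "B"))  # -23/98, about -0.2347: node 2 should stay in A
```

The closed form only needs community totals and the node's own links, which is why a distributed implementation can evaluate it locally for every candidate community.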

In this embodiment, optionally, inputting the user relationship network into the model for identifying financial fraud and outputting the user's fraud probability comprises:

Step 1: construct a first network graph G(V,E) from the user relationship network, with each user as a node, where V is the set of vertices and E is the set of edges; and store the first network graph G(V,E) in a distributed manner;

Step 3: apply the formula for the second change in modularity to obtain the change in modularity when each node i moves from its community to each of its adjacent communities;

Step 4: find the community that gives the largest change in modularity when node i is assigned to it, c′_i = argmax ΔQ(c_i), and take it as the current community of node i, where ΔQ(c_i) is the change in modularity when node i is assigned to community c_i;

Step 8: if no node changes its community any longer, construct a second network graph G′(V′,E′) from the current community information, where V′ is the set of vertices and E′ is the set of edges. Each vertex of G′(V′,E′) corresponds to one community of the first network graph G(V,E), and the community identifier in G(V,E) is used as the vertex identifier in G′(V′,E′); the internal weight of a vertex in G′(V′,E′) is the sum of the weights of all edges inside the corresponding community of G(V,E); and the edges of G′(V′,E′) are the edges between communities in G(V,E);
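Step 8 matches the aggregation phase of the standard Louvain method. The minimal Python sketch below, assuming an adjacency-dict graph without self-loops and sortable community ids, computes the internal weight of each vertex of G′ and the weights of the edges between communities; the function name and output format are illustrative.

```python
from collections import defaultdict


def aggregate(adj, community):
    """Sketch of Step 8: collapse each community of G into one vertex of G'.

    adj: dict node -> dict neighbour -> weight (undirected, symmetric, no self-loops).
    community: dict node -> community id, reused as the vertex id in G'.
    Returns (internal, edges): internal[c] is the internal weight of vertex c
    (total weight of edges inside community c); edges[(c, d)] is the total
    weight of edges between communities c and d, keyed with c < d.
    """
    internal = {c: 0.0 for c in set(community.values())}
    edges = defaultdict(float)
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = community[u], community[v]
            half = w / 2                    # each undirected edge is seen from both ends
            if cu == cv:
                internal[cu] += half
            else:
                edges[(cu, cv) if cu < cv else (cv, cu)] += half  # ids must be sortable
    return internal, dict(edges)


adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
internal, edges = aggregate(adj, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
print(internal, edges)  # internal weights 3.0 and 3.0, one cross edge of weight 1.0
```

The next pass of Steps 3 to 7 then runs on this much smaller graph, which is what keeps the method tractable at large data volumes.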

In this embodiment, a new community detection algorithm is proposed on the basis of the redefined modularity. It partitions communities through iterative computation and combines the existing black-sample data (known fraud cases) with the community partition, so the method can be applied both to a small financial company with millions of users and to a large Internet company with over a billion records, without the severe over-aggregation of communities that otherwise occurs at large data volumes.

In this embodiment, optionally, obtaining the change in modularity when each node i moves from one community to another comprises:

In this embodiment, optionally, calculating the probability of a fraud community from the number of users in the community information comprises:

In this embodiment, optionally, evaluating the fraud risk of an individual user from the community information comprises:

In this embodiment, the users and relationships of a fraud community are extracted into a separate subgraph, and the betweenness centrality algorithm is used to compute the risk of each individual user in that community. First, the importance of each user in the fraud community is calculated as follows, where r is the number of shortest paths between node s and node t; r(v_j) is the number of shortest paths between node s and node t that pass through v_j; and c_i is the community to which node v_j belongs;
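Betweenness centrality, B(v) = Σ_{s≠v≠t} r_st(v)/r_st, is usually computed with Brandes' algorithm. The Python sketch below implements it for an unweighted, undirected subgraph; whether the patent's importance score adds further per-community normalization is not specified, so this computes plain betweenness only. The function name and adjacency format are illustrative.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm: betweenness centrality of an unweighted, undirected graph.

    adj: dict node -> iterable of neighbours (the fraud-community subgraph).
    B(v) = sum over unordered pairs {s, t} with s != v != t of r_st(v) / r_st,
    where r_st counts shortest s-t paths and r_st(v) those passing through v.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                          # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}        # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}         # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}       # dependency of s on each node
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice


star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(betweenness(star))  # the hub lies on all 3 leaf-to-leaf paths; leaves score 0.0
```

A user who bridges many otherwise separate members, like the hub here, gets the highest score, which matches the idea of a core fraudulent user.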

The fraud probability of each individual user is then obtained from the overall fraud probability of the community and the importance of that user within it, where |c_i| is the number of users in the community, P(c_i) is the overall risk level of the community, and the remaining factor is a normalization coefficient.
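The exact combination is not spelled out, so the Python sketch below assumes one plausible form: P(v) = min(1, P(c_i)·|c_i|·B(v)/Σ_u B(u)), taking |c_i|/Σ_u B(u) as the normalization coefficient so that a user of average importance receives exactly P(c_i). Both the formula and the function name are assumptions for illustration.

```python
def user_fraud_probabilities(community_prob, importance):
    """Hypothetical per-user score inside one fraud community c_i.

    community_prob: P(c_i), the overall fraud probability of the community.
    importance: dict user -> betweenness B(v) within the community subgraph.
    Assumed form: P(v) = min(1, P(c_i) * |c_i| * B(v) / sum_u B(u)), i.e. the
    normalization coefficient is taken to be |c_i| / sum_u B(u), so a user of
    average importance receives exactly P(c_i).
    """
    size = len(importance)                  # |c_i|
    total = sum(importance.values())
    if total == 0:                          # nobody lies on a shortest path
        return {v: community_prob for v in importance}
    return {v: min(1.0, community_prob * size * b / total) for v, b in importance.items()}


scores = user_fraud_probabilities(0.3, {"a": 0.0, "b": 1.0, "c": 0.0})
print(scores)  # "b" gets about 0.9, "a" and "c" get 0.0
```

Capping at 1 keeps the output a probability even when a single user carries almost all of the community's betweenness.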

In this embodiment, the betweenness centrality algorithm further separates ordinary fraudulent users from core fraudulent users and accurately evaluates the fraud risk of individual users; monitoring the core fraudulent users helps prevent risk and improves the precision of risk control.

In this embodiment, a new community detection algorithm is proposed on the basis of the redefined modularity. It partitions communities through iterative computation, combines the existing black-sample data with the community partition, and uses betweenness centrality to evaluate the risk of each user in a community, which enables industrial deployment in financial risk control.

Referring to Figure 2, in this embodiment an iterative algorithm that maximizes the modularity is derived from the new modularity definition, and financial fraud is identified with this algorithm at its core. First, edge-device data collection: embedded data collection devices on the base-station side collect data on users' calls, text messages, devices, app usage, website visits, and so on. Second, data cleaning and loading: the collected data is classified and checked, erroneous data is corrected or deleted, unreliable data is flagged, and the data is imported into the production database. Third, data preprocessing: the relationship data is analyzed and processed and brought to a common scale by normalization, standardization, and similar methods to obtain the user relationship network for further computation. Fourth, model training: after the user relationship network is obtained, the community detection algorithm mines user communities with strong associations, analyzes the fraud situation of the users in each community, and assigns each user a fraud probability; here, "model training" means applying the model for identifying financial fraud. Finally, model deployment: the model is deployed with Nebula + Plato so that external financial institutions can conveniently retrieve its output; this deployment accommodates large data volumes with fast search and suits big-data environments. An external financial institution calls the model through the interface provided by the Nebula database: given a user's name, mobile phone number, and ID card number, it returns that user's fraud probability.

In this embodiment, the new modularity definition penalizes small communities relative to the prior art, while adjusting the penalty coefficient preserves clearly defined small communities; this effectively solves the inability of modularity-based community detection to handle both large and small communities. A new community detection algorithm is derived from it, yielding more accurate communities. It applies to scenarios with very different data volumes, from a small financial company with millions of users to a large Internet company with over a billion records. Betweenness centrality is further used to distinguish core fraudulent users from peripheral fraudulent users in a fraud community and to identify fraud organizers accurately, which improves the precision of risk control and is convenient and fast.

Referring to Figure 3, the present invention provides a device for identifying financial fraud, comprising:

An acquisition module 31, configured to acquire user data;

A preprocessing module 32, configured to preprocess the user data to obtain a user relationship network, in which each user serves as a node;

An identification module 33, configured to input the user relationship network into a model for identifying financial fraud and output each user's fraud probability. Based on the user relationship network, the model uses the change in modularity as an evaluation index to determine the community to which each user belongs, and then determines the fraud probability of the users in each community from the determined community information. Here, the modularity is the proportion of edges whose two endpoints fall in the same community, minus the expected proportion obtained when edges are assigned at random, plus a weighted average of the relationship strength between pairs of nodes in different communities.

In the modularity-change formula, the total weight of community $c_i$ is the sum of the edge weights of all nodes in $c_i$, including edges to nodes inside the community and edges to nodes outside it; $c_i$ is the i-th community; $k_i$ is the sum of the weights of all edges connected to node i; $k_j$ is the sum of the weights of all edges connected to node j; $k_{i,in}$ is the sum of the weights of the edges between node i and the community; $k_{j,in}$ is the sum of the weights of the edges between node j and the community; and $m$ is the sum of the weights of all edges in the graph.

The network-side device provided in this embodiment of the present invention can implement each process of the method for identifying financial fraud in the method embodiment of Figure 1; to avoid repetition, the details are not repeated here.

Referring to Figure 4, an embodiment of the present invention further provides a server 40, comprising a processor 41, a memory 42, and a computer program stored in the memory 42 and executable on the processor 41. When executed by the processor 41, the computer program implements each process of the above method embodiment for identifying financial fraud and achieves the same technical effects; to avoid repetition, the details are not repeated here.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above method embodiment for identifying financial fraud and achieves the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

It should be noted that, in this document, the terms "comprise", "include" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.

From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or another terminal) to execute the methods described in the embodiments of the present invention.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific implementations described above, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims

1. A method for identifying financial fraud, characterized by comprising:

Acquiring user data;

Preprocessing the user data to obtain a user relationship network, in which each user serves as a node;

Inputting the user relationship network into a model for identifying financial fraud, and outputting the user's fraud probability; wherein the model for identifying financial fraud uses the change in modularity as an evaluation index to determine, based on the user relationship network, the community to which each user belongs, and determines the fraud probability of the users in each community based on the determined community information; and the modularity is the proportion of edges whose endpoints fall in the same community, minus the expected proportion obtained by assigning edges at random, plus a weighted average of the relationship strength between pairs of nodes in different communities.
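The claim does not specify how user data is turned into a relationship network. The following is a minimal Python sketch under an assumed scheme: two users are linked when they share a value in a hypothetical set of linking fields (here `phone` and `device_id`), and the edge weight counts the shared values. The field names and the function `build_relationship_network` are illustrative and are not taken from the source.

```python
from collections import defaultdict
from itertools import combinations


def build_relationship_network(records, link_fields=("phone", "device_id")):
    """Hypothetical preprocessing: one node per user and an undirected edge
    between two users for every linking value they share. The edge weight is
    the number of shared values. Returns a symmetric dict-of-dicts."""
    # Group users by (field, value), e.g. all users seen with device "d1".
    groups = defaultdict(set)
    for rec in records:
        for field in link_fields:
            value = rec.get(field)
            if value:  # skip missing or empty values
                groups[(field, value)].add(rec["user_id"])
    # Every user becomes a node, even if it has no edges.
    weights = {rec["user_id"]: {} for rec in records}
    for users in groups.values():
        for u, v in combinations(sorted(users), 2):
            weights[u][v] = weights[u].get(v, 0) + 1
            weights[v][u] = weights[v].get(u, 0) + 1
    return weights
```

For example, two users sharing both a phone number and a device get an edge of weight 2, while a user with only unique values remains an isolated node.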

2. The method for identifying financial fraud according to claim 1, characterized in that,

The modularity is calculated using the following formula:

where $A_{ij}$ is the weight of the edge between node i and node j; $k_i$ is the sum of the weights of all edges connected to node i; $k_j$ is the sum of the weights of all edges connected to node j; $c_i$ is the community to which node i belongs; $m$ is the sum of the weights of all edges; and a further term represents how closely the two communities are connected.
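Claim 1 describes the first two terms of this modularity as the classical (Newman) modularity, $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$. The sketch below computes only that classical part; the additional inter-community term is not described in enough detail to reproduce, so it is omitted. The graph is assumed to be a symmetric dict-of-dicts of edge weights, and the function name is illustrative.

```python
def modularity(weights, community):
    """Classical weighted modularity
    Q = 1/(2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j).
    weights: symmetric dict-of-dicts of edge weights; community: node -> label.
    O(n^2) for clarity rather than speed."""
    degree = {i: sum(nbrs.values()) for i, nbrs in weights.items()}  # k_i
    two_m = sum(degree.values())  # equals 2m for an undirected graph
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in weights:
        for j in weights:
            if community[i] == community[j]:  # delta(c_i, c_j)
                q += weights[i].get(j, 0) - degree[i] * degree[j] / two_m
    return q / two_m
```

On two triangles joined by a single edge, splitting the graph into the two triangles gives $Q = 5/14$, while putting every node in one community gives $Q = 0$.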

3. The method for identifying financial fraud according to claim 2, characterized in that said inputting the user relationship network into a model for identifying financial fraud and outputting the user's fraud probability includes:

Step 1: Construct a first network graph according to the user relationship network, with each user in the first network graph serving as a node;

Step 2: Calculate the community to which each node in the first network graph belongs;

Step 3: Obtain the change in modularity of each node i when it moves from one community to each of its adjacent communities;

Step 4: Find the community that yields the largest change in modularity when node i is assigned to it, $c'_i = \arg\max \Delta Q(c_i)$, and take that community as the community to which node i currently belongs;

Step 5: For each node i, collect the information on the community to which it currently belongs and on the communities to which all nodes connected to node i belong, as a set S;

Step 6: Repeat steps 3 to 5 a preset number of times; after the preset number of times, divide the iteration rounds into odd-numbered and even-numbered rounds; connect node i to node $v_j$, where node $v_j$ belongs to the community that maximizes the change in modularity for node i; in an odd-numbered round, the community information is updated when the change for the current community is less than the change for the community after the transfer; in an even-numbered round, the community information is updated when the change for the current community is greater than the change for the community after the transfer;

Step 7: Determine whether any node still changes the community to which it belongs;

Step 8: If no node changes its community any longer, construct a second network graph from the current community information, where each vertex in the second network graph corresponds to a community in the first network graph, and the internal weight of a vertex in the second network graph is the sum of the weights of all edges within the corresponding community in the first network graph;

Step 9: Repeat steps 3 to 8 for iterative updating until no node changes its community any longer;

Step 10: Calculate the probability of community fraud based on the number of users in the community information, and evaluate the fraud risk of a single user based on the community information.
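Steps 3 to 9 follow the structure of the Louvain method: local moving of nodes, then aggregation of communities into a new graph, repeated until nothing changes. The Python sketch below is a plain Louvain implementation using the classical modularity gain $\Delta Q = k_{i,in}/m - \Sigma_{tot} k_i / (2m^2)$. It does not implement the patent's penalized modularity or the odd/even update rule of Step 6. `one_level` roughly corresponds to Steps 3 to 7, `aggregate` to Step 8, and `louvain` to Step 9; all names are illustrative.

```python
from collections import defaultdict


def one_level(weights):
    """Local moving (roughly Steps 3-7): move each node to the neighbouring
    community with the largest classical modularity gain until no node moves."""
    k = {i: sum(nbrs.values()) for i, nbrs in weights.items()}
    m = sum(k.values()) / 2
    community = {i: i for i in weights}  # every node starts in its own community
    if m == 0:
        return community
    sigma_tot = dict(k)  # total degree of each community
    improved = True
    while improved:
        improved = False
        for i in weights:
            old = community[i]
            links = defaultdict(float)  # weight from i to each neighbouring community
            for j, w in weights[i].items():
                if j != i:
                    links[community[j]] += w
            sigma_tot[old] -= k[i]  # take node i out of its community
            best = old
            best_gain = links[old] / m - sigma_tot[old] * k[i] / (2 * m * m)
            for c, k_in in links.items():
                gain = k_in / m - sigma_tot[c] * k[i] / (2 * m * m)
                if gain > best_gain:  # strict: only move on a real improvement
                    best, best_gain = c, gain
            sigma_tot[best] += k[i]
            community[i] = best
            improved = improved or best != old
    return community


def aggregate(weights, community):
    """Step 8: one vertex per community; edges inside a community become a
    self-loop and edges between communities are summed."""
    label = {c: n for n, c in enumerate(dict.fromkeys(community.values()))}
    graph = {n: defaultdict(float) for n in label.values()}
    for i, nbrs in weights.items():
        for j, w in nbrs.items():
            graph[label[community[i]]][label[community[j]]] += w
    vertex_of = {i: label[c] for i, c in community.items()}
    return {n: dict(nbrs) for n, nbrs in graph.items()}, vertex_of


def louvain(weights):
    """Step 9: alternate local moving and aggregation until nothing changes.
    Returns a mapping from each original node to its final community id."""
    membership = {i: i for i in weights}
    graph = weights
    while True:
        community = one_level(graph)
        if len(set(community.values())) == len(graph):  # no node moved
            return membership
        graph, vertex_of = aggregate(graph, community)
        membership = {i: vertex_of[v] for i, v in membership.items()}
```

Because a node only moves when the gain strictly exceeds that of staying, each pass increases modularity, and each aggregation shrinks the graph, so the loop terminates. On two triangles joined by one edge, the sketch returns one community per triangle.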

4. The method for identifying financial fraud according to claim 3, characterized in that obtaining the change in modularity of each node i when it moves from one community to another community includes:

Calculating the change in modularity, where: the total weight of community $c_i$ is the sum of the edge weights of all nodes in $c_i$, including edges to nodes within the community and edges to nodes outside it; $c_i$ is the i-th community; $k_i$ is the sum of the weights of all edges connected to node i; $k_j$ is the sum of the weights of all edges connected to node j; $k_{i,in}$ is the sum of the weights of the edges between node i and the community; $k_{j,in}$ is the sum of the weights of the edges between node j and the community; and $m$ is the sum of the weights of all edges in the graph.
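The claimed formula, which also involves $k_j$ and $k_{j,in}$, is not available in the text, so it cannot be checked here. For comparison, the sketch below computes the classical Louvain gain for moving an isolated node i into a community C, $\Delta Q = k_{i,in}/m - \Sigma_{tot}\,k_i/(2m^2)$, where $\Sigma_{tot}$ is the total degree of C and $k_{i,in}$ is the edge weight between i and C, and verifies it against a direct difference of classical modularities. The function names are illustrative.

```python
def modularity(weights, community):
    """Classical weighted modularity over a symmetric dict-of-dicts graph."""
    k = {i: sum(nbrs.values()) for i, nbrs in weights.items()}
    two_m = sum(k.values())
    if two_m == 0:
        return 0.0
    return sum(
        weights[i].get(j, 0) - k[i] * k[j] / two_m
        for i in weights
        for j in weights
        if community[i] == community[j]
    ) / two_m


def delta_q(weights, community, i, target):
    """Classical Louvain gain for moving node i, currently alone in its own
    community, into community `target`: k_i_in / m - sigma_tot * k_i / (2 m^2)."""
    k = {n: sum(nbrs.values()) for n, nbrs in weights.items()}
    m = sum(k.values()) / 2
    # Weight of edges from i into the target community (self-loops ignored).
    k_i_in = sum(w for j, w in weights[i].items() if j != i and community[j] == target)
    # Total degree of the target community, excluding i itself.
    sigma_tot = sum(k[n] for n in weights if n != i and community[n] == target)
    return k_i_in / m - sigma_tot * k[i] / (2 * m * m)
```

On two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3, moving node 2 from its own community into {0, 1} gives $\Delta Q = 2/7 - 4 \cdot 3/98 = 8/49$, which equals the difference between the two partitions' modularities.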

5. The method for identifying financial fraud according to claim 3, wherein calculating the probability of a fraudulent community based on the number of users in the community information includes:

The probability of a fraud community is calculated by a formula in which $|c_i|$ is the total number of users in community $c_i$, and $y_j$ is the labeled data of the j-th user stored in the database.
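The formula itself is not given in the text. Given only $|c_i|$ and $y_j$, one natural reading, assumed here rather than confirmed, is the share of labeled fraudulent users in the community, $P(c_i) = \frac{1}{|c_i|}\sum_{j \in c_i} y_j$ with $y_j \in \{0, 1\}$. A minimal sketch of that assumed form, with an illustrative function name:

```python
from collections import defaultdict


def community_fraud_probability(membership, labels):
    """Assumed form: P(c) = (1/|c|) * sum of y_j over the members j of c,
    where y_j = 1 for a user labelled fraudulent and 0 otherwise.
    Users with no stored label are treated as y_j = 0 (also an assumption)."""
    size = defaultdict(int)
    flagged = defaultdict(int)
    for user, c in membership.items():
        size[c] += 1
        flagged[c] += labels.get(user, 0)
    return {c: flagged[c] / size[c] for c in size}
```

For example, a community of three users in which two carry a fraud label gets probability 2/3 under this assumption.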

6. The method for identifying financial fraud according to claim 3, characterized in that said assessing the fraud risk of a single user based on the community information includes:

Identifying ordinary fraudulent users and core fraudulent users with a betweenness centrality algorithm, and assessing the fraud risk of each individual user.
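Claim 6 names betweenness centrality but not the variant used or the threshold that separates core from ordinary fraudulent users. The sketch below uses Brandes' algorithm on an unweighted, undirected graph given as adjacency lists; treating users with high scores (for example, those bridging sub-groups) as candidate core users is an assumption for illustration.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.
    adj: node -> iterable of neighbours. Returns unnormalised scores in which
    each unordered pair of endpoints is counted once."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        stack = []
        pred = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}
```

In two triangles joined by the edge 2-3, nodes 2 and 3 each score 6 (they lie on every path between the triangles), while the other nodes score 0, which is the kind of bridging position the claim associates with core users.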

7. A device for identifying financial fraud, characterized by including:

An acquisition module, configured to acquire user data;

A preprocessing module, configured to preprocess the user data to obtain a user relationship network, in which each user serves as a node;

An identification module, configured to input the user relationship network into a model for identifying financial fraud and output the user's fraud probability; wherein the model for identifying financial fraud uses the change in modularity as an evaluation index to determine, based on the user relationship network, the community to which each user belongs, and determines the fraud probability of the users in each community based on the determined community information; and the modularity is the proportion of edges whose endpoints fall in the same community, minus the expected proportion obtained by assigning edges at random, plus a weighted average of the relationship strength between pairs of nodes in different communities.

8. The device for identifying financial fraud according to claim 7, characterized in that,

The modularity is calculated using the following formula:

9. A server, characterized by comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for identifying financial fraud according to any one of claims 1-6.

10. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program, when executed by a processor, implements the steps of the method for identifying financial fraud according to any one of claims 1-6.