CN110046929B

CN110046929B - Fraudulent party identification method and device, readable storage medium and terminal equipment

Info

Publication number: CN110046929B
Application number: CN201910184810.XA
Authority: CN
Inventors: 毕文智; 谢波
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-03-12
Filing date: 2019-03-12
Publication date: 2023-06-20
Anticipated expiration: 2039-03-12
Also published as: CN110046929A

Abstract

The invention belongs to the technical field of computers, and particularly relates to a method and a device for identifying fraud groups, a computer readable storage medium and terminal equipment. The method comprises: extracting data interaction records among users from a user database; constructing a relation graph among the users according to the data interaction records, wherein each user is a vertex of the relation graph and each data interaction relationship between users is an edge of the relation graph; carrying out community division on the relation graph to obtain user communities; calculating the data unbalance degree of each user community according to the data interaction records; and selecting, from the user communities, each user community whose data unbalance degree is larger than a preset unbalance degree threshold as a fraud group. The embodiments of the invention make full use of the distinctive behavior of fraud groups and identify them from the data interactions among users, which greatly improves identification efficiency compared with identification aimed at single users.

Description

Fraudulent party identification method and device, readable storage medium and terminal equipment

Technical Field

The invention belongs to the technical field of computers, and particularly relates to a method and a device for identifying fraudulent groups, a computer readable storage medium and terminal equipment.

Background

With the continued development of the internet, internet technology and the service industry have become increasingly intertwined, giving rise to a wide variety of internet services. Among them, the growth of internet financial services is particularly notable, and with the development of the mobile internet, internet finance has brought great convenience to users' lives.

As internet finance has grown, a number of lawbreakers, especially organized groups of fraudulent users, have imposed extremely high financial costs and heavy losses on internet finance companies. Targeting products such as small loans and installment plans, along with the related promotional campaigns, lawbreakers have gradually shifted from individual fraud to organized group fraud and engage in large-scale bonus hunting (so-called "wool-pulling"), causing heavy losses to internet finance companies.

At present, internet finance companies usually identify fraud risk and apply risk control based on the information of individual users. That is, identification targets single users; there is no means of identifying fraud groups, and identification efficiency is low.

Disclosure of Invention

In view of the above, the embodiments of the present invention provide a method, an apparatus, a computer readable storage medium, and a terminal device for identifying a fraud group, so as to solve the problem that the prior art identifies only single users, lacks a means of identifying fraud groups, and has low identification efficiency.

A first aspect of an embodiment of the present invention provides a method for identifying a fraudulent group, which may include:

respectively extracting data interaction records among all users from a preset user database;

constructing a relation graph among the users according to the data interaction records, wherein each user is used as a vertex of the relation graph, and the data interaction relation among the users is used as an edge of the relation graph;

carrying out community division on the relation graph to obtain the user communities;

respectively calculating the data unbalance degree of each user community according to the data interaction records, wherein the data unbalance degree is the degree of difference between the received data and the sent data of the user;

and selecting a user community with the data unbalance degree larger than a preset unbalance degree threshold from all the user communities as a fraud group.

A second aspect of an embodiment of the present invention provides a fraud group identification apparatus, which may include:

the data interaction record extraction module is used for respectively extracting data interaction records among all users in a preset user database;

the relationship diagram construction module is used for constructing a relationship diagram among the users according to the data interaction records, wherein each user is used as the vertex of the relationship diagram, and the data interaction relationship among the users is used as the edge of the relationship diagram;

the user community dividing module is used for carrying out community division on the relation graph to obtain each user community;

the data unbalance degree calculation module is used for calculating the data unbalance degree of each user community according to the data interaction records, wherein the data unbalance degree is the degree of difference between the received data and the sent data of the user;

and the fraud group selection module is used for selecting the user communities with the data unbalance degree larger than the preset unbalance degree threshold value from the user communities as fraud groups.

A third aspect of embodiments of the present invention provides a computer readable storage medium storing computer readable instructions which, when executed by a processor, perform the steps of the fraud group identification method of the first aspect.

A fourth aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor and computer readable instructions stored in the memory and executable on the processor, the processor executing the computer readable instructions to perform the steps of the fraud group identification method of the first aspect.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects. Data interaction records among users are first extracted from a preset user database, and a relation graph among the users is constructed from those records, with each user as a vertex of the relation graph and each data interaction relationship between users as an edge. The relation graph is then divided into communities to obtain the user communities, and the data unbalance degree of each user community, that is, the degree of difference between the data a user receives and the data the user sends, is calculated from the data interaction records. A fraud group typically pools the reward funds obtained in an activity by sending red packets to a single account, and finally transfers the illegal gains out through that account. An aggregation pattern therefore forms inside a fraud group: many accounts send large numbers of red packets to one or a few accounts, while the receiving accounts rarely or never send red packets to other accounts. Among normal users or teams, by contrast, the direction in which red packets are sent is random. This pattern can be exploited by selecting, from the user communities, those whose data unbalance degree is greater than a preset unbalance degree threshold as fraud groups. The embodiments thus make full use of the distinctive behavior of fraud groups and identify them by analyzing the data interactions among users, which greatly improves identification efficiency compared with identifying single users.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of one embodiment of a method for fraudulent party identification in an embodiment of the present invention;

FIG. 2 is a schematic illustration of a relationship diagram between users;

FIG. 3 is a schematic flow chart of community partitioning of a relationship graph resulting in communities of users;

FIG. 4 is a block diagram of one embodiment of a fraud group identification device in accordance with an embodiment of the present invention;

FIG. 5 is a schematic block diagram of a terminal device in an embodiment of the present invention.

Detailed Description

In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, an embodiment of a method for identifying fraudulent groups according to an embodiment of the present invention may include:

step S101, respectively extracting data interaction records among all users from a preset user database.

During the operation of the system, the behavior of each user in the system is recorded in detail and stored in the user database. When user behavior needs to be analyzed, all users in the system and their associated behavior data can be obtained from the user database. Each user's behavior data includes the user's data interaction records in the system, which may be the user's records of sending and receiving red packets.

Each data interaction record may include the following:

a sender identifier, namely the identifier of the user who sends the data;

a receiver identifier, namely the identifier of the user who receives the data;

and a data attribute value, namely the numerical value carried by the data; in particular, if the data interaction record is a user's record of sending and receiving red packets, the attribute value is the red packet amount.

The registered account of a user can serve as the unique identifier of that user. Since a given sender may send data to a given receiver multiple times, the transfers from the same sender to the same receiver are preferably merged into a single data interaction record whose attribute value is the sum of the attribute values of the individual transfers.
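The merging of repeated transfers described above can be sketched as follows. This is a minimal illustration only; the record type `InteractionRecord`, the function `merge_records`, and the sample values are hypothetical names and data introduced here, not part of the embodiment.

```python
from collections import defaultdict
from typing import NamedTuple


class InteractionRecord(NamedTuple):
    sender: str    # identifier of the user who sent the data
    receiver: str  # identifier of the user who received the data
    value: float   # data attribute value, e.g. a red packet amount


def merge_records(records):
    """Collapse repeated sender -> receiver transfers into one record each."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec.sender, rec.receiver)] += rec.value
    # dicts keep insertion order, so pairs appear in first-seen order
    return [InteractionRecord(s, r, v) for (s, r), v in totals.items()]


# Hypothetical raw records: user1 sends to user6 twice.
raw = [
    InteractionRecord("user1", "user6", 2.0),
    InteractionRecord("user1", "user6", 3.0),
    InteractionRecord("user6", "user4", 1.0),
]
merged = merge_records(raw)
```

After merging, each ordered sender and receiver pair appears at most once, which is the form assumed by the relation graph construction in the next step.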

Step S102, constructing a relation graph among the users according to the data interaction records.

A graph is a mathematical object that represents relationships between objects and is the basic object of study in graph theory. If each edge of a graph is given a direction, the resulting graph is called a directed graph. In a directed graph, the edges incident to a vertex are divided into outgoing edges (edges that start at the vertex) and incoming edges (edges that end at the vertex). The relation graph in this embodiment is a directed graph.

Each user can be taken as a vertex of the relation graph, and each data interaction relationship between users can be taken as an edge of the relation graph. For example, if user 1 sends data with an attribute value of 5 to user 2, an edge pointing from user 1 to user 2 may be constructed. Other edges can be constructed in the same way from the relationships between users, which completes the construction of the vertices and edges of the relation graph.

For example, for a data interaction record as shown in the following table, a relationship diagram as shown in fig. 2 may be constructed:

Sender	Receiver	Attribute value
User 1	User 6	5
User 2	User 6	7
User 3	User 6	4
User 4	User 6	6
User 5	User 6	9
User 6	User 4	1
User 4	User 7	2
User 7	User 4	3
User 8	User 9	1
User 9	User 8	3
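As a minimal sketch, the rows of the table above can be turned into a directed relation graph stored as nested dictionaries. The names `RECORDS` and `build_relation_graph` are hypothetical, and the nested-dictionary representation is only one possible choice.

```python
from collections import defaultdict

# (sender, receiver, attribute value) rows from the table above
RECORDS = [
    (1, 6, 5), (2, 6, 7), (3, 6, 4), (4, 6, 6), (5, 6, 9),
    (6, 4, 1), (4, 7, 2), (7, 4, 3), (8, 9, 1), (9, 8, 3),
]


def build_relation_graph(records):
    """Return (vertices, out_edges); out_edges[u][v] is the value sent u -> v."""
    vertices = set()
    out_edges = defaultdict(dict)
    for sender, receiver, value in records:
        vertices.update((sender, receiver))
        # repeated pairs are summed, matching the merging of records
        out_edges[sender][receiver] = out_edges[sender].get(receiver, 0) + value
    return vertices, out_edges


vertices, out_edges = build_relation_graph(RECORDS)

# Number of incoming edges of user 6 (users that sent data to user 6).
in_degree_of_6 = sum(1 for targets in out_edges.values() if 6 in targets)
```

In this example user 6 has five incoming edges and a single outgoing edge, which is the kind of aggregation the later steps look for.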

Step S103, carrying out community division on the relation graph to obtain the user communities.

Community division is an important technique for analyzing network structure. Given a graph containing vertices and edges, it clusters the vertices into communities such that connections among vertices within a community are relatively dense and connections between vertices of different communities are relatively sparse.

As shown in fig. 3, step S103 may specifically include the following procedures:

step S1031, each vertex in the relation graph is used as a community, and initial modularity of the relation graph is calculated.

In practical applications, modularity is a common measure of community division quality. It will be appreciated that, before any division has been performed, the relation graph is in an undivided initial state. In this embodiment, each vertex of the undivided relation graph may therefore be taken as its own community, and the initial modularity of the relation graph may be calculated.

Wherein, the initial modularity can be calculated as follows:

The sum of the numbers of edges between all connected vertices within a community is taken as the internal feature data sum of that community, and the sum of the internal feature data sums of all communities is taken as the community feature data sum. Because each community in the relation graph contains only one vertex when the initial modularity is calculated, the internal feature data sum of each community is 0, and the community feature data sum is also 0. The sum of the numbers of edges between all pairs of connected vertices that belong to different communities is taken as the inter-community feature data sum. The difference between the community feature data sum and the inter-community feature data sum is taken as the initial modularity of the relation graph.

Step S1032, for each vertex, assigning the vertex to each community in turn, and calculating the target modularity of the test community structure formed after the vertex is assigned to each community.

Specifically, for each vertex, the vertex is tentatively assigned to each community in turn. For each such assignment, the sum of the numbers of edges between all connected vertices within a community is taken as the internal feature data sum of that community; the sum of the internal feature data sums of all communities is taken as the community feature data sum; the sum of the numbers of edges between all pairs of connected vertices that belong to different communities is taken as the inter-community feature data sum; and the difference between the community feature data sum and the inter-community feature data sum is taken as the target modularity of the test community structure formed after the vertex is assigned to that community.

The denser the connections among vertices within a community and the sparser the connections between vertices of different communities, the higher the quality of the community division. Therefore, to assess division quality, the feature data within communities is compared with the feature data between communities, and the modularity is defined from the community feature data sum and the inter-community feature data sum, so that the community division better matches the actual situation and is more accurate.
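The modularity measure described in this embodiment (edges inside communities minus edges between communities) can be sketched as follows. This is an illustration under stated assumptions: each directed edge is counted once, the function name `simple_modularity` is hypothetical, and the measure is the simplified one described in the text rather than the normalized modularity commonly used in the graph literature.

```python
# Directed edges (sender, receiver) taken from the example table.
EDGES = [(1, 6), (2, 6), (3, 6), (4, 6), (5, 6),
         (6, 4), (4, 7), (7, 4), (8, 9), (9, 8)]


def simple_modularity(edges, community_of):
    """Community feature data sum minus inter-community feature data sum."""
    # an edge is internal when both endpoints carry the same community label
    intra = sum(1 for u, v in edges if community_of[u] == community_of[v])
    inter = len(edges) - intra
    return intra - inter


# Step S1031: every vertex is its own community, so no edge is internal.
singletons = {v: v for v in range(1, 10)}
initial_modularity = simple_modularity(EDGES, singletons)

# A test community structure in which user 1 joins user 6's community.
trial = dict(singletons)
trial[1] = 6
target_modularity = simple_modularity(EDGES, trial)
```

With all ten edges crossing communities the initial value is the negative of the edge count; moving user 1 next to user 6 turns one inter-community edge into an internal edge and raises the value by 2.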

Step S1033, for each vertex, calculating the difference between the target modularity of each test community structure corresponding to the vertex and the initial modularity, and assigning the vertex to the community corresponding to the maximum difference.

The value of the modularity reflects the quality of a community division: the larger the modularity, the more reasonable the division. Therefore, in this embodiment, to assign each vertex to the community with which it is most closely related, the difference between the target modularity of each test community structure corresponding to the vertex and the initial modularity may be calculated, and the vertex may be assigned to the community for which the difference is largest.

Dividing the vertices of the relation graph into communities by modularity makes it possible to evaluate quantitatively which community each vertex most reasonably belongs to, so that each vertex is assigned to the community with which it is most closely related and the community division is more accurate.
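Steps S1031 to S1033 can be sketched as a single greedy pass over the vertices. Several details are assumptions made only for this illustration: vertices are visited in ascending order, the baseline is the modularity just before each move, a vertex stays where it is unless a move strictly increases the modularity, and ties go to the smallest community label. The names `modularity` and `greedy_partition` are hypothetical.

```python
EDGES = [(1, 6), (2, 6), (3, 6), (4, 6), (5, 6),
         (6, 4), (4, 7), (7, 4), (8, 9), (9, 8)]


def modularity(edges, community_of):
    """Intra-community edge count minus inter-community edge count."""
    intra = sum(1 for u, v in edges if community_of[u] == community_of[v])
    return intra - (len(edges) - intra)


def greedy_partition(vertices, edges):
    # S1031: start with every vertex in its own community
    community_of = {v: v for v in vertices}
    for v in sorted(vertices):  # ASSUMPTION: fixed visiting order
        baseline = modularity(edges, community_of)
        best_label, best_gain = community_of[v], 0
        # S1032: try assigning v to each existing community
        for label in sorted(set(community_of.values())):
            trial = dict(community_of)
            trial[v] = label
            gain = modularity(edges, trial) - baseline
            if gain > best_gain:  # ASSUMPTION: move only on strict improvement
                best_label, best_gain = label, gain
        # S1033: keep the assignment with the largest difference
        community_of[v] = best_label
    return community_of


assignment = greedy_partition(range(1, 10), EDGES)
groups = {}
for vertex, label in assignment.items():
    groups.setdefault(label, set()).add(vertex)
communities = {frozenset(members) for members in groups.values()}
```

On the example table this pass groups users 1 to 7 into one community and users 8 and 9 into another. With this simplified measure, any move that turns an inter-community edge into an internal edge raises the score, so on this small graph the result coincides with the two connected components.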

Step S104, calculating the data unbalance degree of each user community according to the data interaction records, wherein the data unbalance degree is the degree of difference between the data a user receives and the data the user sends.

First, the data unbalance degree of each user can be calculated according to the following equation:

wherein c is the index of the user community, 1 ≤ c ≤ ComNam, and ComNam is the total number of user communities; u is the index of the user, 1 ≤ u ≤ UN_c, and UN_c is the total number of users in the c-th user community; r is the index of the first associated user, 1 ≤ r ≤ RN_{c,u}, and RN_{c,u} is the total number of first associated users of the u-th user of the c-th user community, a first associated user being a user who has sent data to the current user; the received-data term indexed by (c, u, r) is the attribute value of the data that the u-th user of the c-th user community received from the r-th first associated user; s is the index of the second associated user, 1 ≤ s ≤ SN_{c,u}, and SN_{c,u} is the total number of second associated users of the u-th user of the c-th user community, a second associated user being a user to whom the current user has sent data; the sent-data term indexed by (c, u, s) is the attribute value of the data that the u-th user of the c-th user community sent to the s-th second associated user; and UnbalDeg_{c,u} is the data unbalance degree of the u-th user of the c-th user community.

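Then, the data unbalance degree of each user community can be calculated according to the following formula:

ComUnbalDeg_c = Max(UnbalDeg_{c,1}, UnbalDeg_{c,2}, …, UnbalDeg_{c,u}, …, UnbalDeg_{c,UN_c})

wherein Max is the maximum function and ComUnbalDeg_c is the data unbalance degree of the c-th user community; that is, the maximum value among the data unbalance degrees of all users in a user community is taken as the data unbalance degree of that user community.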
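The community-level step, taking the maximum user-level value within a community, can be sketched as follows. The user-level measure used here (total received value divided by total sent value plus one) is purely an illustrative assumption and is not the user-level formula of this embodiment; only the maximum over users follows the text. The names `user_imbalance` and `community_imbalance` are hypothetical.

```python
# (sender, receiver, attribute value) rows from the example table.
RECORDS = [
    (1, 6, 5), (2, 6, 7), (3, 6, 4), (4, 6, 6), (5, 6, 9),
    (6, 4, 1), (4, 7, 2), (7, 4, 3), (8, 9, 1), (9, 8, 3),
]


def user_imbalance(received_total, sent_total):
    # ASSUMPTION: illustrative stand-in, not the embodiment's formula.
    # The +1 only avoids division by zero for users who never send.
    return received_total / (sent_total + 1.0)


def community_imbalance(members, records):
    """Maximum user-level imbalance among the members of one community."""
    received = {u: 0.0 for u in members}
    sent = {u: 0.0 for u in members}
    for sender, receiver, value in records:
        if sender in sent:
            sent[sender] += value
        if receiver in received:
            received[receiver] += value
    return max(user_imbalance(received[u], sent[u]) for u in members)


community_a = community_imbalance({1, 2, 3, 4, 5, 6, 7}, RECORDS)
community_b = community_imbalance({8, 9}, RECORDS)
```

Under this assumed measure, the community containing user 6 scores far higher than the community of users 8 and 9, because user 6 receives 31 in total while sending only 1.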

Step S105, selecting, from the user communities, each user community whose data unbalance degree is greater than a preset unbalance degree threshold as a fraud group.

Through the above process, a number of user communities are obtained and the data unbalance degree of each is calculated. A fraud group typically pools the funds obtained in an activity by sending red packets to a single account and finally transfers the illegal gains out through that account. An aggregation pattern therefore forms inside a fraud group: many accounts send large numbers of red packets to one or a few accounts, while the receiving accounts rarely or never send red packets to other accounts. Among normal users or teams, by contrast, the direction in which red packets are sent is random. This pattern can therefore be exploited by selecting, from the user communities, those whose data unbalance degree is greater than the preset unbalance degree threshold as fraud groups.
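The selection in step S105 reduces to a strict comparison against the threshold. The sketch below uses hypothetical community labels, unbalance values, and a hypothetical threshold of 5.0; none of these numbers come from the embodiment.

```python
def select_fraud_groups(imbalance_by_community, threshold):
    """Return the communities whose unbalance degree exceeds the threshold."""
    return [
        community
        for community, degree in imbalance_by_community.items()
        if degree > threshold  # strictly greater, as stated in step S105
    ]


# Hypothetical community-level unbalance degrees.
imbalance_by_community = {"A": 15.5, "B": 1.5, "C": 5.0}
fraud_groups = select_fraud_groups(imbalance_by_community, threshold=5.0)
```

Community C sits exactly at the threshold and is therefore not selected, since the comparison is "greater than" rather than "greater than or equal to".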

The process of setting the unbalance threshold may include:

and acquiring each historical fraud partner from a preset database, wherein the historical fraud partner is a user community which is identified as fraud partner.

Calculating the data unbalance degree of each historical fraud group, and constructing a sample set as follows:

SampleSet = {HsUbDeg_1, HsUbDeg_2, …, HsUbDeg_h, …, HsUbDeg_HN}

wherein h is the index of each historical fraud group, 1 ≤ h ≤ HN, HN is the total number of historical fraud groups, HsUbDeg_h is the data unbalance degree of the h-th historical fraud group, and SampleSet is the sample set.

Selecting the samples with the largest values from the sample set according to a preset first selection proportion, and constructing the selected samples into a maximum sample set as follows:

MaxSet = {HsUbDegMax_1, HsUbDegMax_2, …, HsUbDegMax_hmax, …, HsUbDegMax_MaxNum}

wherein MaxSet is the maximum sample set, MaxNum is the number of samples in the maximum sample set, MaxNum = HN × η_1, η_1 is the first selection proportion, which may be set according to the actual situation, for example to 0.1, 0.2, 0.3 or another value, hmax is the index of a sample in the maximum sample set, 1 ≤ hmax ≤ MaxNum, and HsUbDegMax_hmax is the hmax-th sample of the maximum sample set;

selecting the samples with the smallest values from the sample set according to a preset second selection proportion, and constructing the selected samples into a minimum sample set as follows:

MinSet = {HsUbDegMin_1, HsUbDegMin_2, …, HsUbDegMin_hmin, …, HsUbDegMin_MinNum}

wherein MinSet is the minimum sample set, MinNum is the number of samples in the minimum sample set, MinNum = HN × η_2, η_2 is the second selection proportion, which may be set according to the actual situation, for example to 0.1, 0.2, 0.3 or another value, hmin is the index of a sample in the minimum sample set, 1 ≤ hmin ≤ MinNum, and HsUbDegMin_hmin is the hmin-th sample of the minimum sample set;

constructing a median sample set as follows:

MidSet = {HsUbDegMid_1, HsUbDegMid_2, …, HsUbDegMid_hmid, …, HsUbDegMid_MidNum}

wherein MidSet is the median sample set, MidSet = SampleSet − MaxSet − MinSet, MidNum is the number of samples in the median sample set, MidNum = HN × (1 − η_1 − η_2), hmid is the index of a sample in the median sample set, 1 ≤ hmid ≤ MidNum, and HsUbDegMid_hmid is the hmid-th sample of the median sample set;

calculating the imbalance threshold according to:

wherein Coef is a preset coefficient, which may be set according to the actual situation, for example to 0.5, 1, 2 or another value, and UbDegThresh is the unbalance degree threshold.
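The partition of the sample set into MaxSet, MidSet and MinSet described above can be sketched as follows. The sketch stops at the partition and does not compute the threshold. Rounding HN × η down to an integer is an assumption made here, since the text does not state how non-integer counts are handled; the function name `split_samples` and the sample values are hypothetical.

```python
def split_samples(samples, eta1, eta2):
    """Split historical unbalance degrees into (MaxSet, MidSet, MinSet)."""
    ordered = sorted(samples, reverse=True)  # largest first
    hn = len(ordered)
    max_num = int(hn * eta1)  # ASSUMPTION: round down
    min_num = int(hn * eta2)  # ASSUMPTION: round down
    max_set = ordered[:max_num]
    min_set = ordered[hn - min_num:]
    # MidSet = SampleSet - MaxSet - MinSet
    mid_set = ordered[max_num:hn - min_num]
    return max_set, mid_set, min_set


# Hypothetical unbalance degrees of ten historical fraud groups.
history = [3.0, 9.5, 1.2, 7.7, 4.4, 6.1, 2.8, 8.3, 5.0, 0.9]
max_set, mid_set, min_set = split_samples(history, eta1=0.2, eta2=0.1)
```

Setting aside the largest and smallest samples leaves a median set that is less sensitive to extreme historical cases, which is presumably why the embodiment separates the three sets before deriving the threshold.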

In summary, in the embodiments of the present invention, data interaction records among users are first extracted from a preset user database, and a relation graph among the users is constructed from those records, with each user as a vertex of the relation graph and each data interaction relationship between users as an edge. The relation graph is then divided into communities to obtain the user communities, and the data unbalance degree of each user community, that is, the degree of difference between the data a user receives and the data the user sends, is calculated from the data interaction records. A fraud group typically pools the reward funds obtained in an activity by sending red packets to a single account and finally transfers the illegal gains out through that account, so that many accounts send large numbers of red packets to one or a few accounts while the receiving accounts rarely or never send red packets to other accounts. Among normal users or teams, by contrast, the direction in which red packets are sent is random. This pattern can be exploited by selecting, from the user communities, those whose data unbalance degree is greater than the preset unbalance degree threshold as fraud groups. The embodiments thus make full use of the distinctive behavior of fraud groups and identify them by analyzing the data interactions among users, which greatly improves identification efficiency compared with identifying single users.

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present invention in any way.

Corresponding to a fraudulent party identification method described in the above embodiment, fig. 4 shows a block diagram of an embodiment of a fraudulent party identification apparatus according to an embodiment of the present invention.

In this embodiment, a fraud group identification apparatus may include:

the data interaction record extracting module 401 is configured to extract data interaction records between users respectively in a preset user database;

a relationship diagram construction module 402, configured to construct a relationship diagram between users according to the data interaction record, where each user is used as a vertex of the relationship diagram, and a data interaction relationship between each user is used as an edge of the relationship diagram;

the user community dividing module 403 is configured to perform community division on the relationship graph to obtain each user community;

the data unbalance degree calculation module 404 is configured to calculate, according to the data interaction records, data unbalance degrees of the user communities, where the data unbalance degrees are degrees of difference between the received data and the transmitted data of the users;

and the fraud group selection module 405 is configured to select, from among the user communities, a user community with a data imbalance degree greater than a preset imbalance degree threshold as a fraud group.

Further, the user community dividing module may include:

an initial modularity calculation unit, configured to take each vertex in the relation graph as a community and calculate the initial modularity of the relation graph;

a target modularity calculation unit, configured to, for each vertex, assign the vertex to each community in turn and calculate the target modularity of the test community structure formed after the vertex is assigned to each community;

and a vertex dividing unit, configured to, for each vertex, calculate the difference between the target modularity of each test community structure corresponding to the vertex and the initial modularity, and assign the vertex to the community corresponding to the maximum difference.

Further, the target modularity calculating unit may include:

a first computing subunit, configured to, for each vertex, assign the vertex to each community in turn, and take the sum of the numbers of edges between all connected vertices within each community as the internal feature data sum of that community;

a second computing subunit, configured to take the sum of the internal feature data sums of all communities as the community feature data sum;

a third computing subunit, configured to take the sum of the numbers of edges between all pairs of connected vertices that belong to different communities as the inter-community feature data sum;

and a fourth computing subunit, configured to take the difference between the community feature data sum and the inter-community feature data sum as the target modularity of the test community structure formed after the vertex is assigned to any community.

Further, the data imbalance calculation module may include:

a first unbalance degree calculation unit for calculating the data unbalance degree of each user according to the following formula:

wherein c is the index of the user community, 1 ≤ c ≤ ComNam, and ComNam is the total number of user communities; u is the index of the user, 1 ≤ u ≤ UN_c, and UN_c is the total number of users in the c-th user community; r is the index of the first associated user, 1 ≤ r ≤ RN_{c,u}, and RN_{c,u} is the total number of first associated users of the u-th user of the c-th user community, a first associated user being a user who has sent data to the current user; the received-data term indexed by (c, u, r) is the attribute value of the data that the u-th user of the c-th user community received from the r-th first associated user; s is the index of the second associated user, 1 ≤ s ≤ SN_{c,u}, and SN_{c,u} is the total number of second associated users of the u-th user of the c-th user community, a second associated user being a user to whom the current user has sent data; the sent-data term indexed by (c, u, s) is the attribute value of the data that the u-th user of the c-th user community sent to the s-th second associated user; and UnbalDeg_{c,u} is the data unbalance degree of the u-th user of the c-th user community;

a second unbalance degree calculation unit, configured to calculate the data unbalance degree of each user community according to the following formula:

ComUnbalDeg_c = Max(UnbalDeg_{c,1}, UnbalDeg_{c,2}, …, UnbalDeg_{c,u}, …, UnbalDeg_{c,UN_c})

wherein Max is the maximum function, and ComUnbalDeg_c is the data unbalance degree of the c-th user community.

Further, the fraudulent party identification apparatus may further include:

a historical fraud group acquisition module for acquiring each historical fraud group from a preset database, wherein the historical fraud group is a community of users which are identified as fraud groups;

a sample set construction module, configured to calculate the data unbalance degree of each historical fraud group and construct a sample set as follows:

SampleSet = {HsUbDeg_1, HsUbDeg_2, …, HsUbDeg_h, …, HsUbDeg_HN}

wherein h is the index of each historical fraud group, 1 ≤ h ≤ HN, HN is the total number of historical fraud groups, HsUbDeg_h is the data unbalance degree of the h-th historical fraud group, and SampleSet is the sample set;

the maximum sample set constructing module is used for selecting a sample with the maximum value from the sample sets according to a preset first selection proportion, and constructing the selected sample into the maximum sample set shown as follows:

MaxSet = {HsUbDegMax_1, HsUbDegMax_2, …, HsUbDegMax_hmax, …, HsUbDegMax_MaxNum}

wherein MaxSet is the maximum sample set, MaxNum is the number of samples in the maximum sample set, MaxNum = HN × η_1, η_1 is the first selection proportion, hmax is the index of a sample in the maximum sample set, 1 ≤ hmax ≤ MaxNum, and HsUbDegMax_hmax is the hmax-th sample of the maximum sample set;

a minimum sample set construction module, configured to select the samples with the smallest values from the sample set according to a preset second selection proportion, and construct the selected samples into a minimum sample set as follows:

MinSet = {HsUbDegMin_1, HsUbDegMin_2, …, HsUbDegMin_hmin, …, HsUbDegMin_MinNum}

wherein MinSet is the minimum sample set, MinNum is the number of samples in the minimum sample set, MinNum = HN × η_2, η_2 is the second selection proportion, hmin is the index of a sample in the minimum sample set, 1 ≤ hmin ≤ MinNum, and HsUbDegMin_hmin is the hmin-th sample of the minimum sample set;

a median sample set construction module for constructing a median sample set as shown below:

MidSet = {HsUbDegMid_1, HsUbDegMid_2, …, HsUbDegMid_hmid, …, HsUbDegMid_MidNum}

wherein MidSet is the median sample set, MidSet = SampleSet − MaxSet − MinSet, MidNum is the number of samples in the median sample set, MidNum = HN × (1 − η₁ − η₂), hmid is the serial number of a sample in the median sample set, 1 ≤ hmid ≤ MidNum, and HsUbDegMid_hmid is the hmid-th sample of the median sample set;

an imbalance threshold calculation module, configured to calculate the imbalance threshold according to the following formula:

wherein Coef is a preset coefficient, and UbDegThresh is the imbalance threshold.
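The partition performed by the sample set modules above can be illustrated with a minimal Python sketch. It covers only the construction of MaxSet, MinSet and MidSet and stops before the threshold formula, which is not reproduced in this passage. The function name partition_samples and the truncation of HN × η₁ and HN × η₂ to integers are assumptions made for illustration and are not taken from the embodiment.

```python
def partition_samples(sample_set, eta1, eta2):
    """Split historical data unbalance degrees into (MaxSet, MinSet, MidSet).

    sample_set: list of HsUbDeg values, one per historical fraud group.
    eta1, eta2: first and second selection proportions.
    """
    hn = len(sample_set)          # HN, total number of historical fraud groups
    max_num = int(hn * eta1)      # assumption: MaxNum is truncated to an integer
    min_num = int(hn * eta2)      # assumption: MinNum is truncated to an integer
    if max_num + min_num > hn:
        raise ValueError("eta1 + eta2 must not exceed 1")

    ordered = sorted(sample_set)            # ascending order
    min_set = ordered[:min_num]             # smallest MinNum samples
    max_set = ordered[hn - max_num:]        # largest MaxNum samples
    mid_set = ordered[min_num:hn - max_num] # SampleSet - MaxSet - MinSet
    return max_set, min_set, mid_set


if __name__ == "__main__":
    degrees = [0.9, 0.1, 0.5, 0.7, 0.3, 0.6, 0.4, 0.8, 0.2, 1.0]
    max_set, min_set, mid_set = partition_samples(degrees, 0.2, 0.2)
    print(max_set, min_set, mid_set)
```

Removing the extreme samples at both ends before the threshold is computed keeps a few unusually high or low historical values from dominating the result, which appears to be the purpose of the two selection proportions.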

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described apparatus, modules and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of the other embodiments.

Fig. 5 shows a schematic block diagram of a terminal device according to an embodiment of the present invention, and for convenience of explanation, only a portion related to the embodiment of the present invention is shown.

In this embodiment, the terminal device 5 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device 5 may include: a processor 50, a memory 51, and computer readable instructions 52 stored in the memory 51 and executable on the processor 50, such as computer readable instructions for performing the fraud group identification method described above. The processor 50, when executing the computer readable instructions 52, implements the steps of the various embodiments of the fraud group identification method described above, such as steps S101 through S105 shown in Fig. 1. Alternatively, the processor 50, when executing the computer readable instructions 52, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of modules 401 through 405 shown in Fig. 4.

Illustratively, the computer readable instructions 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to accomplish the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer readable instructions 52 in the terminal device 5.

The processor 50 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), field programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing the computer readable instructions as well as other instructions and data required by the terminal device 5. The memory 51 may also be used to temporarily store data that has been output or is to be output.

The functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises a number of computer readable instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing computer readable instructions.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
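As a purely illustrative aid, the final selection step of the method (keeping the user communities whose data unbalance degree exceeds the threshold) might be sketched in Python as follows. The exact formulas for the user-level and community-level data unbalance degree are not reproduced in this passage. The sketch therefore assumes, hypothetically, that a user's degree is the absolute difference between the total received and total sent attribute values divided by the larger of the two, and that a community's degree is the mean over its users. All function names are likewise hypothetical.

```python
def user_imbalance(received, sent):
    """Assumed user-level unbalance degree: |sum(R) - sum(S)| / max(sum(R), sum(S))."""
    total_in = sum(received)    # attribute values received from first associated users
    total_out = sum(sent)       # attribute values sent to second associated users
    largest = max(total_in, total_out)
    if largest == 0:
        return 0.0              # no interaction at all: treat as balanced
    return abs(total_in - total_out) / largest


def community_imbalance(users):
    """Assumed community-level degree: mean of the user-level degrees."""
    if not users:
        return 0.0
    return sum(user_imbalance(r, s) for r, s in users) / len(users)


def select_fraud_groups(communities, threshold):
    """Return ids of communities whose unbalance degree exceeds the threshold."""
    return [cid for cid, users in communities.items()
            if community_imbalance(users) > threshold]


if __name__ == "__main__":
    # Each user is a (received values, sent values) pair; values are made up.
    communities = {
        "a": [([100.0], [100.0]), ([50.0], [50.0])],
        "b": [([1000.0, 500.0], []), ([], [10.0])],
    }
    print(select_fraud_groups(communities, 0.5))
```

In the made-up data, community "a" exchanges equal amounts in both directions while community "b" is entirely one-directional, so only "b" exceeds the threshold under the assumed formulas.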

Claims

1. A method for identifying fraudulent groups, comprising:

taking each vertex in the relation graph as a user community respectively, and calculating the initial modularity of the relation graph;

for each vertex, tentatively dividing the vertex into each user community to form a test user community structure, and, for each test user community structure: taking the sum of the numbers of edges between connected vertices within each user community as the internal feature data sum of that user community; taking the sum of the internal feature data sums of all user communities as the user community feature data sum; taking the sum of the numbers of edges whose two vertices belong to different user communities as the inter-community feature data sum; and taking the difference between the user community feature data sum and the inter-community feature data sum as the target modularity of the test user community structure formed by dividing the vertex into that user community;

for each vertex, calculating the difference value between the target modularity and the initial modularity of each test user community structure corresponding to the vertex, and dividing the vertex into user communities corresponding to the maximum difference value;

respectively calculating the data unbalance degree of each user community according to the data interaction records:

wherein c is the serial number of the user community, 1 ≤ c ≤ ComNam, ComNam is the total number of user communities, u is the serial number of the user, 1 ≤ u ≤ UN_c, UN_c is the total number of users in the c-th user community, r is the serial number of the first associated user, 1 ≤ r ≤ RN_c,u, RN_c,u is the total number of first associated users of the u-th user of the c-th user community, the first associated users being users who send data to the current user, Receive_c,u,r is the attribute value of the data that the u-th user of the c-th user community receives from the r-th first associated user, s is the serial number of the second associated user, 1 ≤ s ≤ SN_c,u, SN_c,u is the total number of second associated users of the u-th user of the c-th user community, the second associated users being users to whom the current user sends data, Send_c,u,s is the attribute value of the data that the u-th user of the c-th user community sends to the s-th second associated user, UnbalDeg_c,u is the data unbalance degree of the u-th user of the c-th user community, Max is the maximum function, and ComUnbalDeg_c is the data unbalance degree of the c-th user community;

2. A fraudulent party identification method according to claim 1, wherein the process of setting the unbalance threshold includes:

acquiring each historical fraud group from a preset database, wherein a historical fraud group is a user community that has already been identified as a fraud group;

MaxSet = {HsUbDegMax_1, HsUbDegMax_2, …, HsUbDegMax_hmax, …, HsUbDegMax_MaxNum}

wherein MaxSet is the maximum sample set, MaxNum is the number of samples in the maximum sample set, MaxNum = HN × η₁, η₁ is the first selection proportion, hmax is the serial number of a sample in the maximum sample set, 1 ≤ hmax ≤ MaxNum, and HsUbDegMax_hmax is the hmax-th sample of the maximum sample set;

MinSet = {HsUbDegMin_1, HsUbDegMin_2, …, HsUbDegMin_hmin, …, HsUbDegMin_MinNum}

wherein MinSet is the minimum sample set, MinNum is the number of samples in the minimum sample set, MinNum = HN × η₂, η₂ is the second selection proportion, hmin is the serial number of a sample in the minimum sample set, 1 ≤ hmin ≤ MinNum, and HsUbDegMin_hmin is the hmin-th sample of the minimum sample set;

a median sample set is constructed as follows:

MidSet = {HsUbDegMid_1, HsUbDegMid_2, …, HsUbDegMid_hmid, …, HsUbDegMid_MidNum}

wherein MidSet is the median sample set, MidSet = SampleSet − MaxSet − MinSet, MidNum is the number of samples in the median sample set, MidNum = HN × (1 − η₁ − η₂), hmid is the serial number of a sample in the median sample set, 1 ≤ hmid ≤ MidNum, and HsUbDegMid_hmid is the hmid-th sample of the median sample set;

calculating the imbalance threshold according to:

wherein Coef is a preset coefficient, and UbDegThresh is the imbalance threshold.

3. A fraudulent party identification apparatus comprising:

a user community dividing module for: taking each vertex in the relation graph as a user community respectively, and calculating the initial modularity of the relation graph; for each vertex, tentatively dividing the vertex into each user community to form a test user community structure, and, for each test user community structure, taking the sum of the numbers of edges between connected vertices within each user community as the internal feature data sum of that user community, taking the sum of the internal feature data sums of all user communities as the user community feature data sum, taking the sum of the numbers of edges whose two vertices belong to different user communities as the inter-community feature data sum, and taking the difference between the user community feature data sum and the inter-community feature data sum as the target modularity of the test user community structure formed by dividing the vertex into that user community; and, for each vertex, calculating the difference between the target modularity of each test user community structure corresponding to the vertex and the initial modularity, and dividing the vertex into the user community corresponding to the maximum difference;

a data unbalance degree calculation module for respectively calculating the data unbalance degree of each user community according to the data interaction records:

wherein c is the serial number of the user community, 1 ≤ c ≤ ComNam, ComNam is the total number of user communities, u is the serial number of the user, 1 ≤ u ≤ UN_c, UN_c is the total number of users in the c-th user community, r is the serial number of the first associated user, 1 ≤ r ≤ RN_c,u, RN_c,u is the total number of first associated users of the u-th user of the c-th user community, the first associated users being users who send data to the current user, Receive_c,u,r is the attribute value of the data that the u-th user of the c-th user community receives from the r-th first associated user, s is the serial number of the second associated user, 1 ≤ s ≤ SN_c,u, SN_c,u is the total number of second associated users of the u-th user of the c-th user community, the second associated users being users to whom the current user sends data, Send_c,u,s is the attribute value of the data that the u-th user of the c-th user community sends to the s-th second associated user, UnbalDeg_c,u is the data unbalance degree of the u-th user of the c-th user community, Max is the maximum function, and ComUnbalDeg_c is the data unbalance degree of the c-th user community;

4. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the fraudulent party identification method according to claim 1 or 2.

5. A terminal device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer readable instructions, implements the steps of the fraudulent party identification method according to claim 1 or 2.