CN110046929B - Fraudulent party identification method and device, readable storage medium and terminal equipment - Google Patents


Info

Publication number
CN110046929B
Authority
CN
China
Prior art keywords
user
data
users
community
user community
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910184810.XA
Other languages
Chinese (zh)
Other versions
CN110046929A (en)
Inventor
毕文智
谢波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910184810.XA
Publication of CN110046929A
Application granted
Publication of CN110046929B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0225 Avoiding frauds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of computers, and in particular relates to a fraud group identification method and apparatus, a computer-readable storage medium, and a terminal device. The method comprises: extracting the data interaction records between users from a user database; constructing a relationship graph of the users from the data interaction records, in which each user is a vertex and each data interaction relationship between users is an edge; performing community division on the relationship graph to obtain user communities; calculating the data imbalance degree of each user community from the data interaction records; and selecting, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold. The embodiments of the invention exploit the distinctive behavior of fraud groups and identify them from users' data interactions, which greatly improves identification efficiency compared with examining users one at a time.

Description

Fraudulent party identification method and device, readable storage medium and terminal equipment
Technical Field
The invention belongs to the technical field of computers, and in particular relates to a fraud group identification method and apparatus, a computer-readable storage medium, and a terminal device.
Background
With the continued growth of the internet, internet technology has become increasingly combined with the services industry, giving rise to a wide variety of internet services. Internet financial services have developed especially rapidly, and with the spread of the mobile internet, internet finance has brought great convenience to users' lives.
As internet finance grows, lawbreakers, and especially organized groups of rogue users, impose very high financial costs and heavy losses on internet finance companies. Targeting products such as small loans and installment plans, and the promotional campaigns around them, these lawbreakers have gradually shifted from individual fraud to organized group fraud, carrying out large-scale "wool-pulling" (bonus hunting, i.e. systematically harvesting promotional rewards) that causes heavy losses to internet finance companies.
At present, internet finance companies usually identify fraud risk from the information of individual users and apply risk control accordingly. That is, they examine users one at a time, lack any means of identifying fraud groups, and identify fraud inefficiently.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a fraud group identification method and apparatus, a computer-readable storage medium, and a terminal device, to solve the problem that the prior art examines single users, lacks a means of identifying fraud groups, and has low identification efficiency.
A first aspect of the embodiments of the present invention provides a fraud group identification method, which may include:
extracting the data interaction records between users from a preset user database;
constructing a relationship graph of the users from the data interaction records, in which each user is a vertex and each data interaction relationship between users is an edge;
performing community division on the relationship graph to obtain user communities;
calculating the data imbalance degree of each user community from the data interaction records, the data imbalance degree being the degree of difference between the data a user receives and the data the user sends; and
selecting, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold.
A second aspect of an embodiment of the present invention provides a fraud group identification apparatus, which may include:
a data interaction record extraction module, configured to extract the data interaction records between users from a preset user database;
a relationship graph construction module, configured to construct a relationship graph of the users from the data interaction records, in which each user is a vertex and each data interaction relationship between users is an edge;
a user community division module, configured to perform community division on the relationship graph to obtain user communities;
a data imbalance degree calculation module, configured to calculate the data imbalance degree of each user community from the data interaction records, the data imbalance degree being the degree of difference between the data a user receives and the data the user sends; and
a fraud group selection module, configured to select, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold.
A third aspect of embodiments of the present invention provides a computer readable storage medium storing computer readable instructions which when executed by a processor perform the steps of:
extracting the data interaction records between users from a preset user database;
constructing a relationship graph of the users from the data interaction records, in which each user is a vertex and each data interaction relationship between users is an edge;
performing community division on the relationship graph to obtain user communities;
calculating the data imbalance degree of each user community from the data interaction records, the data imbalance degree being the degree of difference between the data a user receives and the data the user sends; and
selecting, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold.
A fourth aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor and computer readable instructions stored in the memory and executable on the processor, the processor executing the computer readable instructions to perform the steps of:
extracting the data interaction records between users from a preset user database;
constructing a relationship graph of the users from the data interaction records, in which each user is a vertex and each data interaction relationship between users is an edge;
performing community division on the relationship graph to obtain user communities;
calculating the data imbalance degree of each user community from the data interaction records, the data imbalance degree being the degree of difference between the data a user receives and the data the user sends; and
selecting, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold.
Compared with the prior art, the embodiments of the invention have the following beneficial effects. The embodiments first extract the data interaction records between users from a preset user database and construct a relationship graph of the users from those records, with each user as a vertex and each data interaction relationship between users as an edge. The relationship graph is then divided into user communities, and the data imbalance degree of each user community, that is, the degree of difference between the data its users receive and send, is calculated from the data interaction records. A fraud group typically pools the reward funds its accounts obtain in a promotion by sending them as red packets to a single account, and finally cashes out the pooled funds through that account. Fraud groups therefore show an aggregation pattern: many accounts send large numbers of red packets to one or a few accounts, and the receiving accounts send few or no red packets to other accounts. Among normal users or teams, by contrast, red packets flow in no particular direction. This pattern makes it possible to select, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold. The embodiments thus exploit the distinctive behavior of fraud groups and identify them by analyzing users' data interactions, which greatly improves identification efficiency compared with examining users one at a time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an embodiment of a fraud group identification method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a relationship graph between users;
FIG. 3 is a schematic flowchart of performing community division on the relationship graph to obtain user communities;
FIG. 4 is a structural diagram of an embodiment of a fraud group identification apparatus in an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a terminal device in an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, an embodiment of the fraud group identification method in the embodiments of the present invention may include the following steps.
Step S101: extract the data interaction records between users from a preset user database.
While the system is running, each user's behavior in the system is recorded in detail and stored in the user database. When user behavior needs to be analyzed, the full set of users and their associated behavior data can be retrieved from the user database. Each user's behavior data includes that user's data interaction records in the system, for example the user's red packet sending and receiving records.
Each data interaction record may include the following:
a sender identifier, namely the identifier of the user who sent the data;
a receiver identifier, namely the identifier of the user who received the data; and
a data attribute value, namely the numerical value carried by the data; in particular, if the data interaction records are a user's red packet sending and receiving records, the attribute value is the red packet amount.
The user's registered account can serve as the user's unique identifier. Because the same sender may send data to the same receiver several times, such transfers are preferably merged into a single data interaction record whose attribute value is the sum of the individual attribute values.
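The merging rule above can be sketched in Python as follows. This is a minimal illustration, assuming each record is a `(sender, receiver, attribute_value)` tuple keyed by the user's registered account; the function name `merge_records` is hypothetical and not part of the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A raw record: (sender account, receiver account, attribute value),
# e.g. the amount of one red packet.
Record = Tuple[str, str, float]


def merge_records(records: Iterable[Record]) -> List[Record]:
    """Collapse repeated sender->receiver transfers into one record whose
    attribute value is the sum of the individual attribute values."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for sender, receiver, value in records:
        totals[(sender, receiver)] += value
    # Sort by (sender, receiver) so the output order is deterministic.
    return [(s, r, v) for (s, r), v in sorted(totals.items())]


if __name__ == "__main__":
    raw = [("user1", "user6", 2.0), ("user1", "user6", 3.0), ("user6", "user4", 1.0)]
    print(merge_records(raw))  # [('user1', 'user6', 5.0), ('user6', 'user4', 1.0)]
```

Keying on the ordered pair keeps the direction of each transfer, so a transfer from A to B and one from B to A remain separate records, as the directed graph in step S102 requires.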
Step S102: construct a relationship graph of the users from the data interaction records.
A graph is a mathematical object that represents relationships between objects and is the basic object of study in graph theory. If every edge of a graph is given a direction, the result is called a directed graph. In a directed graph, the edges incident to a vertex are divided into outgoing edges (edges that start at the vertex) and incoming edges (edges that end at the vertex). The relationship graph in this embodiment is a directed graph.
Each user serves as a vertex of the relationship graph, and each data interaction relationship between users serves as an edge. For example, if user 1 sends data with an attribute value of 5 to user 2, an edge pointing from user 1 to user 2 is constructed; the other edges are constructed in the same way from the relationships between users, which completes the vertices and edges of the relationship graph.
For example, from the data interaction records in the following table, the relationship graph shown in FIG. 2 can be constructed:
Sender | Receiver | Attribute value
User 1 | User 6 | 5
User 2 | User 6 | 7
User 3 | User 6 | 4
User 4 | User 6 | 6
User 5 | User 6 | 9
User 6 | User 4 | 1
User 4 | User 7 | 2
User 7 | User 4 | 3
User 8 | User 9 | 1
User 9 | User 8 | 3
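A minimal Python sketch of step S102 for the table above, assuming the records have already been merged as in step S101 so that each sender/receiver pair appears at most once. The adjacency-map representation and the name `build_graph` are illustrative choices, not taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Records from the table: (sender, receiver, attribute value).
RECORDS: List[Tuple[int, int, float]] = [
    (1, 6, 5), (2, 6, 7), (3, 6, 4), (4, 6, 6), (5, 6, 9),
    (6, 4, 1), (4, 7, 2), (7, 4, 3), (8, 9, 1), (9, 8, 3),
]


def build_graph(records: List[Tuple[int, int, float]]
                ) -> Tuple[Set[int], Dict[int, Dict[int, float]]]:
    """Return (vertices, out_edges), where out_edges maps
    sender -> {receiver: attribute value}.

    Every user that appears in a record becomes a vertex; each record becomes
    one directed edge from sender to receiver, labelled with its value.
    Assumes duplicates were already merged, so no edge is overwritten.
    """
    out_edges: Dict[int, Dict[int, float]] = defaultdict(dict)
    vertices: Set[int] = set()
    for sender, receiver, value in records:
        vertices.update((sender, receiver))
        out_edges[sender][receiver] = value
    return vertices, dict(out_edges)


if __name__ == "__main__":
    vertices, out_edges = build_graph(RECORDS)
    print(sorted(vertices))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(out_edges[4])      # {6: 6, 7: 2}
```

User 4 has two outgoing edges (to users 6 and 7) and two incoming edges (from users 6 and 7), which matches the mixed send/receive pattern of the table.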
Step S103: perform community division on the relationship graph to obtain user communities.
Community division is an important technique for analyzing network structure: the vertices of a graph are clustered into communities such that connections among vertices within a community are relatively dense, while connections between vertices in different communities are relatively sparse.
As shown in fig. 3, step S103 may specifically include the following procedures:
Step S1031: treat each vertex in the relationship graph as its own community, and calculate the initial modularity of the relationship graph.
In practice, modularity is a common measure of the quality of a community division. Because the relationship graph has not yet been divided, in this embodiment each vertex of the undivided graph is first treated as its own community so that communities can subsequently be formed, and the initial modularity of the relationship graph is calculated.
The initial modularity can be calculated as follows:
For each community, the number of edges connecting vertices within that community is taken as the community's internal feature data sum, and the sum of the internal feature data sums of all communities is taken as the community feature data sum. In the relationship graph corresponding to the initial modularity, every community contains only one vertex, so each community's internal feature data sum is 0 and the community feature data sum is also 0. The number of edges whose two endpoints lie in different communities is taken as the inter-community feature data sum. The difference between the community feature data sum and the inter-community feature data sum is taken as the initial modularity of the relationship graph.
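The following Python sketch computes the modularity as this embodiment defines it (edges inside communities minus edges between communities) for the example graph of FIG. 2. It assumes that "the number of edges" is an unweighted count in which each directed edge is counted once; the helper name `simplified_modularity` is hypothetical. Note that this measure differs from the Newman-Girvan modularity commonly used in community detection.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

# Directed edges of the example relationship graph (FIG. 2).
EDGES: List[Edge] = [(1, 6), (2, 6), (3, 6), (4, 6), (5, 6),
                     (6, 4), (4, 7), (7, 4), (8, 9), (9, 8)]


def simplified_modularity(edges: List[Edge], community: Dict[int, int]) -> int:
    """Community feature data sum (edges whose endpoints share a community)
    minus inter-community feature data sum (edges crossing communities)."""
    intra = inter = 0
    for u, v in edges:
        if community[u] == community[v]:
            intra += 1
        else:
            inter += 1
    return intra - inter


if __name__ == "__main__":
    vertices = {x for edge in EDGES for x in edge}
    singleton = {v: v for v in vertices}  # step S1031: one vertex per community
    print(simplified_modularity(EDGES, singleton))  # -10
```

With every vertex in its own community, all 10 edges cross community boundaries, so the initial modularity is 0 - 10 = -10.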
Step S1032: for each vertex, tentatively assign the vertex to each community in turn, and calculate the target modularity of the trial community structure formed after the vertex is assigned to that community.
Specifically, for each vertex and each trial assignment, the number of edges connecting vertices within each community is taken as that community's internal feature data sum; the sum of the internal feature data sums of all communities is taken as the community feature data sum; the number of edges whose two endpoints lie in different communities is taken as the inter-community feature data sum; and the difference between the community feature data sum and the inter-community feature data sum is taken as the target modularity of the trial community structure formed after the vertex is assigned to that community.
The denser the connections among vertices within communities and the sparser the connections between communities, the higher the quality of the community division. To assess division quality, the feature data within communities can therefore be compared with the feature data between communities, and modularity is defined from the community feature data sum and the inter-community feature data sum, so that the division better matches the actual situation and is more accurate.
Step S1033: for each vertex, calculate the difference between the target modularity of each trial community structure corresponding to the vertex and the initial modularity, and assign the vertex to the community corresponding to the largest difference.
The value of the modularity reflects the quality of a community division: the larger the modularity, the more reasonable the division. Therefore, in this embodiment, to assign each vertex to the community most closely related to it, the difference between the target modularity of each trial community structure corresponding to the vertex and the initial modularity is calculated, and the vertex is assigned to the community for which the difference is largest.
Using modularity to divide the vertices of the relationship graph into communities makes it possible to evaluate quantitatively which community assignment is most reasonable for each vertex, so that each vertex is placed in the community most closely related to it and the community division is more accurate.
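Steps S1031 to S1033 can be sketched as a single greedy pass in Python. This is an illustration under stated assumptions, not the patent's implementation: vertices are visited in ascending order, candidate communities are tried in ascending label order, leaving a vertex where it is counts as one trial structure, and ties keep the earlier choice. Because the initial modularity is a constant, subtracting it does not change which community wins; it is kept only to mirror step S1033. The names `simplified_modularity` and `greedy_partition` are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

# Directed edges of the example relationship graph (FIG. 2).
EDGES: List[Edge] = [(1, 6), (2, 6), (3, 6), (4, 6), (5, 6),
                     (6, 4), (4, 7), (7, 4), (8, 9), (9, 8)]


def simplified_modularity(edges: List[Edge], community: Dict[int, int]) -> int:
    """Edges inside communities minus edges between communities."""
    return sum(1 if community[u] == community[v] else -1 for u, v in edges)


def greedy_partition(edges: List[Edge]) -> Dict[int, int]:
    """One greedy pass: move each vertex to the community whose trial
    structure has the largest gain over the initial modularity."""
    vertices = sorted({x for edge in edges for x in edge})
    community = {v: v for v in vertices}          # S1031: singleton communities
    initial_q = simplified_modularity(edges, community)
    for v in vertices:
        # Leaving the vertex where it is counts as one trial structure.
        best_label = community[v]
        best_gain = simplified_modularity(edges, community) - initial_q
        for label in sorted(set(community.values())):
            if label == community[v]:
                continue
            trial = dict(community)
            trial[v] = label                      # S1032: trial assignment
            gain = simplified_modularity(edges, trial) - initial_q
            if gain > best_gain:                  # S1033: keep the largest gain
                best_label, best_gain = label, gain
        community[v] = best_label
    return community


if __name__ == "__main__":
    groups: Dict[int, List[int]] = {}
    for vertex, label in greedy_partition(EDGES).items():
        groups.setdefault(label, []).append(vertex)
    print(sorted(groups.values()))  # [[1, 2, 3, 4, 5, 6, 7], [8, 9]]
```

On the example graph the pass groups users 1 to 7 together and users 8 and 9 together. Recomputing the full modularity for every trial move costs O(V·C·E) time, which is fine for a sketch; a large graph would call for an incremental gain computation instead.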
Step S104: calculate the data imbalance degree of each user community from the data interaction records, the data imbalance degree being the degree of difference between the data a user receives and the data the user sends.
First, the data imbalance degree of each user can be calculated according to the following formula:
[Formula image: per-user data imbalance degree $UnbalDeg_{c,u}$]
where $c$ is the index of a user community, $1 \le c \le ComNum$, and $ComNum$ is the total number of user communities; $u$ is the index of a user, $1 \le u \le UN_c$, and $UN_c$ is the total number of users in the $c$-th user community; $r$ is the index of a first associated user, $1 \le r \le RN_{c,u}$, and $RN_{c,u}$ is the total number of first associated users of the $u$-th user of the $c$-th user community, a first associated user being a user who has sent data to the current user; the received term for $(c,u,r)$ is the attribute value of the data that the $u$-th user of the $c$-th user community received from its $r$-th first associated user; $s$ is the index of a second associated user, $1 \le s \le SN_{c,u}$, and $SN_{c,u}$ is the total number of second associated users of the $u$-th user of the $c$-th user community, a second associated user being a user to whom the current user has sent data; the sent term for $(c,u,s)$ is the attribute value of the data that the $u$-th user of the $c$-th user community sent to its $s$-th second associated user; and $UnbalDeg_{c,u}$ is the data imbalance degree of the $u$-th user of the $c$-th user community.
Then, the data imbalance degree of each user community can be calculated according to the following formula:
$$ComUnbalDeg_c = \max\left(UnbalDeg_{c,1}, UnbalDeg_{c,2}, \ldots, UnbalDeg_{c,UN_c}\right)$$
where $\max$ is the maximum function and $ComUnbalDeg_c$ is the data imbalance degree of the $c$-th user community; that is, the largest of the data imbalance degrees of all users in a user community is taken as the data imbalance degree of that community.
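The per-user formula above survives only as an image, so the Python sketch below substitutes a HYPOTHETICAL stand-in, `(received - sent) / (received + sent)`, purely to show how the received and sent totals and the community-level maximum fit together. The received and sent totals and the maximum follow the text; the stand-in metric, the community assignment ({1, ..., 7} and {8, 9}) and the 0.8 threshold used to illustrate step S105 are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Example records from the table in step S102: (sender, receiver, value).
RECORDS: List[Tuple[int, int, float]] = [
    (1, 6, 5), (2, 6, 7), (3, 6, 4), (4, 6, 6), (5, 6, 9),
    (6, 4, 1), (4, 7, 2), (7, 4, 3), (8, 9, 1), (9, 8, 3),
]
# Assumed community assignment, for illustration only.
COMMUNITIES: Dict[int, List[int]] = {1: [1, 2, 3, 4, 5, 6, 7], 2: [8, 9]}


def user_imbalance(received: float, sent: float) -> float:
    """HYPOTHETICAL stand-in for UnbalDeg_{c,u}, not the patent's formula.
    Returns a value in [-1, 1]; +1 means the user only receives."""
    total = received + sent
    return 0.0 if total == 0 else (received - sent) / total


def community_imbalance(records: List[Tuple[int, int, float]],
                        communities: Dict[int, List[int]]) -> Dict[int, float]:
    """ComUnbalDeg_c: the largest per-user imbalance within each community."""
    received: Dict[int, float] = defaultdict(float)
    sent: Dict[int, float] = defaultdict(float)
    for sender, receiver, value in records:
        sent[sender] += value        # data sent to a second associated user
        received[receiver] += value  # data received from a first associated user
    return {c: max(user_imbalance(received[u], sent[u]) for u in members)
            for c, members in communities.items()}


def select_fraud_groups(com_imbalance: Dict[int, float],
                        threshold: float) -> List[int]:
    """Step S105: communities strictly above the imbalance threshold."""
    return sorted(c for c, deg in com_imbalance.items() if deg > threshold)


if __name__ == "__main__":
    degrees = community_imbalance(RECORDS, COMMUNITIES)
    print(degrees)                            # {1: 0.9375, 2: 0.5}
    print(select_fraud_groups(degrees, 0.8))  # [1]
```

User 6 receives 31 and sends 1, so under the stand-in metric its imbalance is 30 / 32 = 0.9375, which dominates its community; in community {8, 9} transfers flow both ways and the maximum is only 0.5. Taking the maximum rather than an average lets a single collecting account flag the whole community.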
Step S105: select, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold.
The above process divides the users into a number of user communities and calculates the data imbalance degree of each one. A fraud group typically pools the reward funds obtained in a promotion by sending them as red packets to a single account, which finally cashes them out to an illicit account. Fraud groups therefore show an aggregation pattern: many accounts send large numbers of red packets to one or a few accounts, and the receiving accounts send few or no red packets to other accounts. Among normal users or teams, red packets flow in no particular direction. This pattern makes it possible to select, as fraud groups, the user communities whose data imbalance degree is greater than the preset imbalance threshold.
The imbalance threshold may be set as follows:
Obtain each historical fraud group from a preset database, a historical fraud group being a user community that has already been identified as a fraud group.
Calculate the data imbalance degree of each historical fraud group, and construct the following sample set:
$$SampleSet = \{HsUbDeg_1, HsUbDeg_2, \ldots, HsUbDeg_h, \ldots, HsUbDeg_{HN}\}$$
where $h$ is the index of a historical fraud group, $1 \le h \le HN$, $HN$ is the total number of historical fraud groups, $HsUbDeg_h$ is the data imbalance degree of the $h$-th historical fraud group, and $SampleSet$ is the sample set.
Select the largest-valued samples from the sample set according to a preset first selection ratio, and construct them into the following maximum sample set:
$$MaxSet = \{HsUbDegMax_1, HsUbDegMax_2, \ldots, HsUbDegMax_{hmax}, \ldots, HsUbDegMax_{MaxNum}\}$$
where $MaxSet$ is the maximum sample set, $MaxNum$ is the number of samples in it, and $MaxNum = HN \times \eta_1$; $\eta_1$ is the first selection ratio, which can be set according to the actual situation, for example to 0.1, 0.2, 0.3 or another value; $hmax$ is the index of a sample in the maximum sample set, $1 \le hmax \le MaxNum$; and $HsUbDegMax_{hmax}$ is the $hmax$-th sample of the maximum sample set;
Select the smallest-valued samples from the sample set according to a preset second selection ratio, and construct them into the following minimum sample set:
$$MinSet = \{HsUbDegMin_1, HsUbDegMin_2, \ldots, HsUbDegMin_{hmin}, \ldots, HsUbDegMin_{MinNum}\}$$
where $MinSet$ is the minimum sample set, $MinNum$ is the number of samples in it, and $MinNum = HN \times \eta_2$; $\eta_2$ is the second selection ratio, which can be set according to the actual situation, for example to 0.1, 0.2, 0.3 or another value; $hmin$ is the index of a sample in the minimum sample set, $1 \le hmin \le MinNum$; and $HsUbDegMin_{hmin}$ is the $hmin$-th sample of the minimum sample set;
construct the following median sample set:
$$MidSet = \{HsUbDegMid_1, HsUbDegMid_2, \ldots, HsUbDegMid_{hmid}, \ldots, HsUbDegMid_{MidNum}\}$$
where $MidSet$ is the median sample set and $MidSet = SampleSet - MaxSet - MinSet$; $MidNum$ is the number of samples in it, and $MidNum = HN \times (1 - \eta_1 - \eta_2)$; $hmid$ is the index of a sample in the median sample set, $1 \le hmid \le MidNum$; and $HsUbDegMid_{hmid}$ is the $hmid$-th sample of the median sample set;
calculate the imbalance threshold according to the following formula:
[Formula image: the imbalance threshold $UbDegThresh$]
where $Coef$ is a preset coefficient that can be set according to the actual situation, for example to 0.5, 1, 2 or another value, and $UbDegThresh$ is the imbalance threshold.
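The sample-set split described above can be sketched in Python as below. The threshold formula that combines these sets with `Coef` is not reproduced in this text, so the sketch stops at the split. It assumes that $HN \times \eta_1$ and $HN \times \eta_2$ are rounded down to whole sample counts and that $\eta_1 + \eta_2 \le 1$; the function name `split_samples` and the sample values are hypothetical.

```python
import math
from typing import List, Tuple


def split_samples(samples: List[float], eta1: float, eta2: float
                  ) -> Tuple[List[float], List[float], List[float]]:
    """Split historical fraud-group imbalance degrees into
    MaxSet (largest HN*eta1), MinSet (smallest HN*eta2) and MidSet (the rest)."""
    if eta1 < 0 or eta2 < 0 or eta1 + eta2 > 1:
        raise ValueError("require eta1, eta2 >= 0 and eta1 + eta2 <= 1")
    hn = len(samples)
    max_num = math.floor(hn * eta1)   # assumption: round down to a whole count
    min_num = math.floor(hn * eta2)
    ordered = sorted(samples)
    min_set = ordered[:min_num]
    max_set = ordered[hn - max_num:]
    mid_set = ordered[min_num:hn - max_num]   # SampleSet - MaxSet - MinSet
    return max_set, min_set, mid_set


if __name__ == "__main__":
    history = [0.9, 0.5, 0.7, 0.95, 0.6, 0.8, 0.85, 0.75, 0.65, 0.55]
    max_set, min_set, mid_set = split_samples(history, eta1=0.2, eta2=0.1)
    print(max_set)       # [0.9, 0.95]
    print(min_set)       # [0.5]
    print(len(mid_set))  # 7
```

Sorting once and slicing guarantees the three sets are disjoint and together cover the whole sample set, matching $MidSet = SampleSet - MaxSet - MinSet$ and $MidNum = HN - MaxNum - MinNum$.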
In summary, the embodiments of the invention first extract the data interaction records between users from a preset user database and construct a relationship graph of the users from them, with each user as a vertex and each data interaction relationship between users as an edge. The relationship graph is then divided into user communities, and the data imbalance degree of each user community, that is, the degree of difference between the data its users receive and send, is calculated from the data interaction records. A fraud group typically pools the reward funds obtained in a promotion by sending them to a single account, which finally cashes them out to an illicit account, whereas among normal users or teams red packets flow in no particular direction. This pattern makes it possible to select, as fraud groups, the user communities whose data imbalance degree is greater than the preset imbalance threshold. The embodiments thus exploit the distinctive behavior of fraud groups and identify them by analyzing users' data interactions, which greatly improves identification efficiency compared with examining users one at a time.
It should be understood that the step numbers in the above embodiment do not imply an order of execution: the execution order of each process should be determined by its function and internal logic, and the numbering should not limit the implementation of the embodiments of the present invention.
Corresponding to the fraud group identification method described in the above embodiment, FIG. 4 shows a structural diagram of an embodiment of a fraud group identification apparatus according to an embodiment of the present invention.
In this embodiment, a fraud group identification apparatus may include:
a data interaction record extraction module 401, configured to extract the data interaction records between users from a preset user database;
a relationship graph construction module 402, configured to construct a relationship graph of the users from the data interaction records, in which each user is a vertex and each data interaction relationship between users is an edge;
a user community division module 403, configured to perform community division on the relationship graph to obtain user communities;
a data imbalance degree calculation module 404, configured to calculate the data imbalance degree of each user community from the data interaction records, the data imbalance degree being the degree of difference between the data a user receives and the data the user sends; and
a fraud group selection module 405, configured to select, as fraud groups, the user communities whose data imbalance degree is greater than a preset imbalance threshold.
Further, the user community dividing module may include:
the initial modularity calculation unit is used for taking each vertex in the relation graph as a community respectively and calculating the initial modularity of the relation graph;
the target modularity calculation unit is used for dividing the vertexes into communities respectively aiming at each vertex, and calculating the target modularity of a test community structure formed after the vertexes are divided into any community respectively;
and the vertex dividing unit is used for calculating the difference value between the target modularity and the initial modularity of each test community structure corresponding to each vertex aiming at each vertex, and dividing the vertex into communities corresponding to the communities with the maximum difference value.
Further, the target modularity calculation unit may include:
a first calculation subunit, configured to, for each vertex and each tentative assignment of that vertex to a community, take the number of edges connecting vertices within each community as the internal feature data sum of that community;
a second calculation subunit, configured to take the sum of the internal feature data sums of all communities as the community feature data sum;
a third calculation subunit, configured to take the number of edges whose two endpoints lie in different communities as the inter-community feature data sum;
and a fourth calculation subunit, configured to take the difference between the community feature data sum and the inter-community feature data sum as the target modularity of the test community structure formed after the vertex is assigned to the given community.
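The four subunits can be traced on a toy graph. The Python sketch below is a hypothetical illustration (target_modularity and the edge-list input are assumed names, not taken from this document): it counts the edges inside each community (internal feature data sums), adds them up (community feature data sum), counts the edges between communities (inter-community feature data sum) and returns the difference.

```python
from typing import Dict, List, Tuple


def target_modularity(edges: List[Tuple[str, str]], community_of: Dict[str, int]) -> int:
    internal: Dict[int, int] = {}  # internal feature data sum per community
    between = 0  # inter-community feature data sum
    for u, v in edges:
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal[c] = internal.get(c, 0) + 1
        else:
            between += 1
    community_sum = sum(internal.values())  # community feature data sum
    return community_sum - between


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(target_modularity(edges, {"A": 0, "B": 1, "C": 2, "D": 3}))  # all singletons: 0 - 4 = -4
print(target_modularity(edges, {"A": 0, "B": 0, "C": 0, "D": 1}))  # 3 - 1 = 2
print(target_modularity(edges, {"A": 0, "B": 0, "C": 0, "D": 0}))  # 4 - 0 = 4
```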
Further, the data imbalance degree calculation module may include:
a first imbalance degree calculation unit, configured to calculate the data imbalance degree of each user according to the following formula:
[Formula image BDA0001992502100000121 (not reproduced in the text): per-user data imbalance degree UnbalDeg_{c,u} in terms of Receive_{c,u,r} and Send_{c,u,s}]
where c is the serial number of a user community, 1 ≤ c ≤ ComNum, and ComNum is the total number of user communities; u is the serial number of a user, 1 ≤ u ≤ UN_c, and UN_c is the total number of users in the c-th user community; r is the serial number of a first associated user, 1 ≤ r ≤ RN_{c,u}, and RN_{c,u} is the total number of first associated users of the u-th user of the c-th user community, a first associated user being a user who has sent data to the current user; Receive_{c,u,r} is the attribute value of the data that the u-th user of the c-th user community received from the r-th first associated user; s is the serial number of a second associated user, 1 ≤ s ≤ SN_{c,u}, and SN_{c,u} is the total number of second associated users of the u-th user of the c-th user community, a second associated user being a user to whom the current user has sent data; Send_{c,u,s} is the attribute value of the data that the u-th user of the c-th user community sent to the s-th second associated user; and UnbalDeg_{c,u} is the data imbalance degree of the u-th user of the c-th user community;
a second imbalance degree calculation unit, configured to calculate the data imbalance degree of each user community according to the following formula:
[Formula image BDA0001992502100000131 (not reproduced in the text): community data imbalance degree ComUnbalDeg_c computed from the per-user values UnbalDeg_{c,u} using the maximum function Max]
where Max is the maximum function and ComUnbalDeg_c is the data imbalance degree of the c-th user community.
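Because the formula images are not reproduced in the text, the Python sketch below assumes a per-user formula, UnbalDeg_{c,u} = |ΣReceive − ΣSend| / (ΣReceive + ΣSend), and takes ComUnbalDeg_c as the Max over the community's users, as the mention of the maximum function suggests. Both formulas, the (sender, receiver, value) record layout and the function names are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (sender, receiver, attribute value of the data).
Record = Tuple[str, str, float]


def user_imbalance(user: str, records: Sequence[Record]) -> float:
    """Assumed UnbalDeg_{c,u} = |sum(Receive) - sum(Send)| / (sum(Receive) + sum(Send))."""
    received = sum(value for _s, receiver, value in records if receiver == user)
    sent = sum(value for sender, _r, value in records if sender == user)
    total = received + sent
    return abs(received - sent) / total if total else 0.0  # 0 for a user with no traffic


def community_imbalance(members: List[str], records: Sequence[Record]) -> float:
    """Assumed ComUnbalDeg_c = Max of the members' UnbalDeg_{c,u}."""
    return max(user_imbalance(u, records) for u in members)


records = [("A", "B", 100.0), ("B", "C", 90.0), ("C", "A", 10.0)]
for u in ("A", "B", "C"):
    print(u, round(user_imbalance(u, records), 3))
print(round(community_imbalance(["A", "B", "C"], records), 3))
```

Normalizing by total traffic keeps the value in [0, 1], so communities with very different transaction volumes can share one threshold; this is a design choice of the sketch, not a statement about the patented formula.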
Further, the fraud group identification apparatus may further include:
a historical fraud group acquisition module, configured to acquire the historical fraud groups from a preset database, a historical fraud group being a user community that has previously been identified as a fraud group;
a sample set construction module, configured to calculate the data imbalance degree of each historical fraud group and construct the following sample set:
SampleSet = {HsUbDeg_1, HsUbDeg_2, …, HsUbDeg_h, …, HsUbDeg_HN}
where h is the serial number of a historical fraud group, 1 ≤ h ≤ HN, HN is the total number of historical fraud groups, HsUbDeg_h is the data imbalance degree of the h-th historical fraud group, and SampleSet is the sample set;
a maximum sample set construction module, configured to select the largest samples from the sample set according to a preset first selection proportion and form them into the following maximum sample set:
MaxSet = {HsUbDegMax_1, HsUbDegMax_2, …, HsUbDegMax_hmax, …, HsUbDegMax_MaxNum}
where MaxSet is the maximum sample set, MaxNum is the number of samples in the maximum sample set, MaxNum = HN × η_1, η_1 is the first selection proportion, hmax is the serial number of a sample in the maximum sample set, 1 ≤ hmax ≤ MaxNum, and HsUbDegMax_hmax is the hmax-th sample of the maximum sample set;
a minimum sample set construction module, configured to select the smallest samples from the sample set according to a preset second selection proportion and form them into the following minimum sample set:
MinSet = {HsUbDegMin_1, HsUbDegMin_2, …, HsUbDegMin_hmin, …, HsUbDegMin_MinNum}
where MinSet is the minimum sample set, MinNum is the number of samples in the minimum sample set, MinNum = HN × η_2, η_2 is the second selection proportion, hmin is the serial number of a sample in the minimum sample set, 1 ≤ hmin ≤ MinNum, and HsUbDegMin_hmin is the hmin-th sample of the minimum sample set;
a median sample set construction module, configured to construct the following median sample set:
MidSet = {HsUbDegMid_1, HsUbDegMid_2, …, HsUbDegMid_hmid, …, HsUbDegMid_MidNum}
where MidSet is the median sample set, MidSet = SampleSet − MaxSet − MinSet, MidNum is the number of samples in the median sample set, MidNum = HN × (1 − η_1 − η_2), hmid is the serial number of a sample in the median sample set, 1 ≤ hmid ≤ MidNum, and HsUbDegMid_hmid is the hmid-th sample of the median sample set;
and an imbalance threshold calculation module, configured to calculate the imbalance threshold according to the following formula:
[Formula image BDA0001992502100000141 (not reproduced in the text): imbalance threshold UbDegThresh in terms of the preset coefficient Coef]
where Coef is a preset coefficient and UbDegThresh is the imbalance threshold.
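The threshold setting and the final selection can be sketched in Python as follows, for illustration only. Rounding MaxNum and MinNum down to whole samples and the threshold formula UbDegThresh = Coef × mean(MidSet) are assumptions, since the formula image is not reproduced; the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def split_samples(samples: List[float], eta1: float, eta2: float) -> Tuple[List[float], List[float], List[float]]:
    """Split SampleSet into MaxSet, MinSet and MidSet by the two selection proportions."""
    if eta1 < 0 or eta2 < 0 or eta1 + eta2 >= 1:
        raise ValueError("expected eta1, eta2 >= 0 and eta1 + eta2 < 1")
    hn = len(samples)
    max_num = int(hn * eta1)  # MaxNum = HN x eta_1 (rounding down is an assumption)
    min_num = int(hn * eta2)  # MinNum = HN x eta_2
    ordered = sorted(samples)
    min_set = ordered[:min_num]
    max_set = ordered[hn - max_num:]
    mid_set = ordered[min_num:hn - max_num]  # MidSet = SampleSet - MaxSet - MinSet
    return max_set, min_set, mid_set


def imbalance_threshold(mid_set: List[float], coef: float) -> float:
    """Hypothetical UbDegThresh = Coef x mean(MidSet)."""
    return coef * mean(mid_set)


def select_fraud_groups(com_unbal_deg: Dict[str, float], threshold: float) -> List[str]:
    """Communities whose ComUnbalDeg_c exceeds the threshold are reported as fraud groups."""
    return sorted(c for c, deg in com_unbal_deg.items() if deg > threshold)


history = [0.9, 0.1, 0.5, 0.7, 0.3, 0.6, 0.2, 0.8, 0.4, 0.95]
max_set, min_set, mid_set = split_samples(history, eta1=0.1, eta2=0.2)
thresh = imbalance_threshold(mid_set, coef=1.0)
print(select_fraud_groups({"c1": 0.85, "c2": 0.3, "c3": 0.55}, thresh))  # ['c1']
```

Trimming the largest and smallest historical values before averaging keeps a few extreme historical fraud groups from dominating the threshold.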
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding procedures in the foregoing method embodiments for the specific working procedures of the apparatus, modules and units described above; they are not repeated here.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or illustrated in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Fig. 5 shows a schematic block diagram of a terminal device according to an embodiment of the present invention, and for convenience of explanation, only a portion related to the embodiment of the present invention is shown.
In this embodiment, the terminal device 5 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The terminal device 5 may include a processor 50, a memory 51, and computer readable instructions 52 stored in the memory 51 and executable on the processor 50, such as computer readable instructions for performing the fraud group identification method described above. When executing the computer readable instructions 52, the processor 50 implements the steps of the embodiments of the fraud group identification method described above, such as steps S101 to S105 shown in Fig. 1. Alternatively, when executing the computer readable instructions 52, the processor 50 implements the functions of the modules/units of the apparatus embodiments described above, such as the functions of modules 401 to 405 shown in Fig. 4.
Illustratively, the computer readable instructions 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to implement the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, the segments describing the execution process of the computer readable instructions 52 in the terminal device 5.
The processor 50 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSPs), application specific integrated circuits (Application Specific Integrated Circuit, ASICs), field programmable gate arrays (Field-Programmable Gate Array, FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or internal memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used to store the computer readable instructions and other instructions and data required by the terminal device 5, and may also be used to temporarily store data that has been or is about to be output.
The functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises a number of computer readable instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing computer readable instructions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (5)

1. A fraud group identification method, comprising:
extracting the data interaction records between users from a preset user database;
constructing a relation graph among the users according to the data interaction records, wherein each user is used as a vertex of the relation graph, and the data interaction relation among the users is used as an edge of the relation graph;
treating each vertex in the relation graph as a separate user community, and calculating the initial modularity of the relation graph;
for each vertex, tentatively assigning the vertex to each user community in turn and, for each resulting test user community structure: taking the number of edges connecting vertices within each user community as the internal feature data sum of that user community; taking the sum of the internal feature data sums of all user communities as the user community feature data sum; taking the number of edges whose two endpoints lie in different user communities as the inter-community feature data sum; and taking the difference between the user community feature data sum and the inter-community feature data sum as the target modularity of the test user community structure formed after the vertex is assigned to that user community;
for each vertex, calculating the difference between the target modularity of each test user community structure corresponding to the vertex and the initial modularity, and assigning the vertex to the user community with the largest difference;
calculating the data imbalance degree of each user community according to the data interaction records:
[Formula image FDA0004220590170000011 (not reproduced in the text): per-user data imbalance degree UnbalDeg_{c,u}]
[Formula image FDA0004220590170000012 (not reproduced in the text): community data imbalance degree ComUnbalDeg_c]
where c is the serial number of a user community, 1 ≤ c ≤ ComNum, and ComNum is the total number of user communities; u is the serial number of a user, 1 ≤ u ≤ UN_c, and UN_c is the total number of users in the c-th user community; r is the serial number of a first associated user, 1 ≤ r ≤ RN_{c,u}, and RN_{c,u} is the total number of first associated users of the u-th user of the c-th user community, a first associated user being a user who has sent data to the current user; Receive_{c,u,r} is the attribute value of the data that the u-th user of the c-th user community received from the r-th first associated user; s is the serial number of a second associated user, 1 ≤ s ≤ SN_{c,u}, and SN_{c,u} is the total number of second associated users of the u-th user of the c-th user community, a second associated user being a user to whom the current user has sent data; Send_{c,u,s} is the attribute value of the data that the u-th user of the c-th user community sent to the s-th second associated user; UnbalDeg_{c,u} is the data imbalance degree of the u-th user of the c-th user community; Max is the maximum function; and ComUnbalDeg_c is the data imbalance degree of the c-th user community;
and selecting, from the user communities, each user community whose data imbalance degree is greater than a preset imbalance threshold as a fraud group.
2. The fraud group identification method according to claim 1, wherein the process of setting the imbalance threshold comprises:
acquiring the historical fraud groups from a preset database, a historical fraud group being a user community that has previously been identified as a fraud group;
calculating the data imbalance degree of each historical fraud group, and constructing the following sample set:
SampleSet = {HsUbDeg_1, HsUbDeg_2, …, HsUbDeg_h, …, HsUbDeg_HN}
where h is the serial number of a historical fraud group, 1 ≤ h ≤ HN, HN is the total number of historical fraud groups, HsUbDeg_h is the data imbalance degree of the h-th historical fraud group, and SampleSet is the sample set;
selecting the largest samples from the sample set according to a preset first selection proportion, and forming the selected samples into the following maximum sample set:
MaxSet = {HsUbDegMax_1, HsUbDegMax_2, …, HsUbDegMax_hmax, …, HsUbDegMax_MaxNum}
where MaxSet is the maximum sample set, MaxNum is the number of samples in the maximum sample set, MaxNum = HN × η_1, η_1 is the first selection proportion, hmax is the serial number of a sample in the maximum sample set, 1 ≤ hmax ≤ MaxNum, and HsUbDegMax_hmax is the hmax-th sample of the maximum sample set;
selecting the smallest samples from the sample set according to a preset second selection proportion, and forming the selected samples into the following minimum sample set:
MinSet = {HsUbDegMin_1, HsUbDegMin_2, …, HsUbDegMin_hmin, …, HsUbDegMin_MinNum}
where MinSet is the minimum sample set, MinNum is the number of samples in the minimum sample set, MinNum = HN × η_2, η_2 is the second selection proportion, hmin is the serial number of a sample in the minimum sample set, 1 ≤ hmin ≤ MinNum, and HsUbDegMin_hmin is the hmin-th sample of the minimum sample set;
constructing the following median sample set:
MidSet = {HsUbDegMid_1, HsUbDegMid_2, …, HsUbDegMid_hmid, …, HsUbDegMid_MidNum}
where MidSet is the median sample set, MidSet = SampleSet − MaxSet − MinSet, MidNum is the number of samples in the median sample set, MidNum = HN × (1 − η_1 − η_2), hmid is the serial number of a sample in the median sample set, 1 ≤ hmid ≤ MidNum, and HsUbDegMid_hmid is the hmid-th sample of the median sample set;
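calculating the imbalance threshold according to the following formula:
[Formula image FDA0004220590170000031 (not reproduced in the text): imbalance threshold UbDegThresh in terms of the preset coefficient Coef]
where Coef is a preset coefficient and UbDegThresh is the imbalance threshold.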
3. A fraud group identification apparatus, comprising:
a data interaction record extraction module, configured to extract the data interaction records between users from a preset user database;
a relationship graph construction module, configured to construct a relationship graph between the users according to the data interaction records, wherein each user is a vertex of the relationship graph and the data interaction relationship between users is an edge of the relationship graph;
a user community division module, configured to: treat each vertex in the relationship graph as a separate user community and calculate the initial modularity of the relationship graph; for each vertex, tentatively assign the vertex to each user community in turn and, for each resulting test user community structure, take the number of edges connecting vertices within each user community as the internal feature data sum of that user community, take the sum of the internal feature data sums of all user communities as the user community feature data sum, take the number of edges whose two endpoints lie in different user communities as the inter-community feature data sum, and take the difference between the user community feature data sum and the inter-community feature data sum as the target modularity of the test user community structure formed after the vertex is assigned to that user community; and, for each vertex, calculate the difference between the target modularity of each test user community structure corresponding to the vertex and the initial modularity, and assign the vertex to the user community with the largest difference;
a data imbalance degree calculation module, configured to calculate the data imbalance degree of each user community according to the data interaction records:
[Formula image FDA0004220590170000041 (not reproduced in the text): per-user data imbalance degree UnbalDeg_{c,u}]
[Formula image FDA0004220590170000042 (not reproduced in the text): community data imbalance degree ComUnbalDeg_c]
where c is the serial number of a user community, 1 ≤ c ≤ ComNum, and ComNum is the total number of user communities; u is the serial number of a user, 1 ≤ u ≤ UN_c, and UN_c is the total number of users in the c-th user community; r is the serial number of a first associated user, 1 ≤ r ≤ RN_{c,u}, and RN_{c,u} is the total number of first associated users of the u-th user of the c-th user community, a first associated user being a user who has sent data to the current user; Receive_{c,u,r} is the attribute value of the data that the u-th user of the c-th user community received from the r-th first associated user; s is the serial number of a second associated user, 1 ≤ s ≤ SN_{c,u}, and SN_{c,u} is the total number of second associated users of the u-th user of the c-th user community, a second associated user being a user to whom the current user has sent data; Send_{c,u,s} is the attribute value of the data that the u-th user of the c-th user community sent to the s-th second associated user; UnbalDeg_{c,u} is the data imbalance degree of the u-th user of the c-th user community; Max is the maximum function; and ComUnbalDeg_c is the data imbalance degree of the c-th user community;
and a fraud group selection module, configured to select, from the user communities, each user community whose data imbalance degree is greater than a preset imbalance threshold as a fraud group.
4. A computer readable storage medium storing computer readable instructions which, when executed by a processor, implement the steps of the fraud group identification method according to claim 1 or 2.
5. A terminal device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer readable instructions, implements the steps of the fraud group identification method according to claim 1 or 2.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910184810.XA CN110046929B (en) 2019-03-12 2019-03-12 Fraudulent party identification method and device, readable storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN110046929A 2019-07-23
CN110046929B 2023-06-20

Family

ID=67274664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910184810.XA Active CN110046929B (en) 2019-03-12 2019-03-12 Fraudulent party identification method and device, readable storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN110046929B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457893B (en) * 2019-07-24 2023-05-05 阿里巴巴集团控股有限公司 Method and equipment for acquiring account group
CN110490730B (en) * 2019-08-21 2022-07-26 北京顶象技术有限公司 Abnormal fund aggregation behavior detection method, device, equipment and storage medium
CN111311276B (en) * 2020-02-07 2023-08-29 北京明略软件系统有限公司 Identification method and device for abnormal user group and readable storage medium
CN113313505B (en) * 2020-02-25 2023-07-25 中国移动通信集团浙江有限公司 Abnormality positioning method and device and computing equipment
CN111489190A (en) * 2020-03-16 2020-08-04 上海趣蕴网络科技有限公司 Anti-cheating method and system based on user relationship
CN111401959B (en) * 2020-03-18 2023-09-29 多点(深圳)数字科技有限公司 Risk group prediction method, apparatus, computer device and storage medium
CN111428217B (en) * 2020-04-12 2023-07-28 中信银行股份有限公司 Fraudulent party identification method, apparatus, electronic device and computer readable storage medium
CN111784502A (en) * 2020-06-30 2020-10-16 中国工商银行股份有限公司 Abnormal transaction account group identification method and device
CN114003648B (en) * 2021-10-20 2024-04-26 支付宝(杭州)信息技术有限公司 Identification method and device for risk transaction group partner, electronic equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN106056455A (en) * 2016-06-02 2016-10-26 南京邮电大学 Group and place recommendation method based on location and social relationship
CN107194623A (en) * 2017-07-20 2017-09-22 深圳市分期乐网络科技有限公司 A kind of discovery method and device of clique's fraud
CN108009915A (en) * 2017-12-21 2018-05-08 连连银通电子支付有限公司 A kind of labeling method and relevant apparatus of fraudulent user community
WO2018103456A1 (en) * 2016-12-06 2018-06-14 中国银联股份有限公司 Method and apparatus for grouping communities on the basis of feature matching network, and electronic device
CN109191281A (en) * 2018-08-21 2019-01-11 重庆富民银行股份有限公司 A kind of group's fraud identifying system of knowledge based map

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20170161745A1 (en) * 2015-12-03 2017-06-08 Mastercard International Incorporated Payment account fraud detection using social media heat maps

Also Published As

Publication number Publication date
CN110046929A (en) 2019-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant