CN109063966B - Risk account identification method and device - Google Patents

Publication number
CN109063966B
Authority
CN
China
Prior art keywords
account
operator
group
operator type
risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810717644.0A
Other languages
Chinese (zh)
Other versions
CN109063966A (en
Inventor
李超
陈帅
王立
陈弢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201810717644.0A priority Critical patent/CN109063966B/en
Publication of CN109063966A publication Critical patent/CN109063966A/en
Application granted granted Critical
Publication of CN109063966B publication Critical patent/CN109063966B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present specification provides a method for identifying a risk account, including: acquiring account behavior records that meet preset statistical conditions, wherein each record comprises the account on which a behavior occurred and a plurality of media used when the behavior occurred; taking N media types as one operator type, where N is a natural number, and generating a plurality of connected subsets of at least one operator type; dividing each connected subset, by clustering, into a number of groups corresponding to the operator type; and judging the risk degree of an account based on the group to which the account belongs for at least one operator type.

Description

Risk account identification method and device
Technical Field
The present specification relates to the field of data processing technologies, and in particular, to a method and an apparatus for identifying a risk account.
Background
With the development of communication technology, people are increasingly accustomed to handling work and everyday matters over the Internet. Typically, a user registers an account in the system of the network service provider that runs the corresponding service, and then uses the account to represent his or her identity when executing the relevant business logic.
The Internet is anonymous, fast, and convenient. These characteristics greatly facilitate people's lives, but they also create favorable conditions for various illegal behaviors. Lawbreakers and black-market gangs use registered accounts to harvest marketing resources, to carry out gray-area behaviors such as fake credit boosting and fake-order brushing, and even to commit crimes such as fraud and money laundering, thereby harming the interests of legitimate users. How to accurately identify these risk accounts has become an issue of increasing concern for network service providers.
Disclosure of Invention
In view of the above, the present specification provides a method for identifying a risk account, including:
acquiring account behavior records meeting preset statistical conditions, wherein each record comprises an account where a behavior occurs and a plurality of media used when the behavior occurs;
taking N media types as one operator type, and generating a plurality of connected subsets of at least one operator type; each connected subset comprises at least one member account and at least one member operator, and one connected subset comprises all accounts that have used the at least one member operator in the account behavior records and all operators belonging to the operator type that have been used by the at least one member account; an operator belonging to a given operator type includes N media corresponding to the N media types constituting the operator type; N is a natural number;
dividing each connected subset into a number of groups corresponding to the operator type by clustering;
the risk degree of an account is judged based on a group which belongs to the account and corresponds to at least one operator type.
The present specification also provides an apparatus for identifying a risk account, including:
a record acquisition unit, configured to acquire account behavior records that meet preset statistical conditions, wherein each record comprises the account on which a behavior occurred and a plurality of media used when the behavior occurred;
a connected subset unit, configured to take N media types as one operator type and generate a plurality of connected subsets of at least one operator type; each connected subset comprises at least one member account and at least one member operator, and one connected subset comprises all accounts that have used the at least one member operator in the account behavior records and all operators belonging to the operator type that have been used by the at least one member account; an operator belonging to a given operator type includes N media corresponding to the N media types constituting the operator type; N is a natural number;
a group clustering unit for dividing each connected subset into a number of groups corresponding to the operator type by clustering;
and the risk judging unit is used for judging the risk degree of the account based on a group which belongs to the certain account and corresponds to at least one operator type.
This specification provides a computer device comprising: a memory and a processor; the memory having stored thereon a computer program executable by the processor; and when the processor runs the computer program, executing the steps of the risk account identification method.
The present specification also provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of the method for identifying a risk account as described above.
According to the above technical solution, in the embodiments of the present specification, a media type or a combination of media types is taken as an operator type, a plurality of connected subsets corresponding to the operator type are generated according to the media used in the account behavior records, the connected subsets are clustered to obtain the groups to which accounts belong, and risk accounts are judged according to the group to which an account belongs for at least one operator type. This greatly reduces the possibility that a legitimate account is judged to be a member of a black-market gang because one of its media was fraudulently used, and improves the accuracy of identification.
Drawings
FIG. 1 is a flow chart of a method for identifying risk accounts in an embodiment of the present disclosure;
FIG. 2 is an example of an overall media bipartite graph in an application example of the present specification;
FIG. 3 is an example of the account and operator bipartite graph of operator type OP_a in an application example of the present specification;
FIG. 4 is an example of the account and operator bipartite graph of operator type OP_b in an application example of the present specification;
FIG. 5 is an example of the account and operator bipartite graph of operator type OP_c in an application example of the present specification;
FIG. 6 is a hardware block diagram of an apparatus for carrying out embodiments of the present description;
FIG. 7 is a logical block diagram of an identification apparatus for risk accounts in an embodiment of the present disclosure.
Detailed Description
The embodiments of the present specification provide a new method for identifying risk accounts. A media type, or a combination of media types, used by users in business behaviors is taken as an operator type. For each operator type, accounts that have used the same operator of that type, and operators of that type that have been used by the same account, are placed in one connected subset, so that a plurality of connected subsets corresponding to the operator type are generated. Each connected subset is then clustered to obtain the groups to which accounts belong, and risk accounts are judged based on those groups. Clustering the connected subsets greatly reduces the probability that an account is wrongly associated with a black-market gang, which lowers the possibility that a legitimate account is judged to be a risk account because one of its media was fraudulently used, and makes the identification of risk accounts more accurate.
Embodiments of the present description may be implemented on any device with computing and storage capabilities, such as a mobile phone, a tablet Computer, a PC (Personal Computer), a notebook, a server, and so on; the functions in the embodiments of the present specification may also be implemented by a logical node operating in two or more devices.
In the embodiment of the present specification, the flow of the risk account identification method is shown in fig. 1.
Step 110, account behavior records meeting preset statistical conditions are obtained, and each record comprises an account where a behavior occurs and a plurality of media used when the behavior occurs.
In the embodiments of the present specification, the system of a network service provider identifies risk accounts based on the network behaviors that users perform with their accounts. When a user performs a behavior with an account and interacts with the service provider's system, various types of resources (referred to as media in this specification) are inevitably used. Examples include the device identifier of the device used to perform the account behavior (i.e., a unique identifier of the device, such as the Device-ID of an Android device or the unique device identifier of an Apple device), the IMEI (International Mobile Equipment Identity) of the device, the MAC (Media Access Control) address of the device, the IP address, the WiFi (Wireless Fidelity) identifier of the network the device accesses, and the mobile terminal number. In some specific business processes, the user's identity card number, bank card number, and the like may also be used. For normal users, some of these media types change little and can represent the identity of the user to a certain extent, while others reflect the usual environment in which the user performs account behaviors; both have important reference value for identifying risk accounts.
The behaviors performed by a user with an account (referred to as account behaviors) include registering the account, logging in to the account, and, after login, various behaviors related to the service provider's business (such as browsing, publishing, shopping, and transferring money). During the interaction of these account behaviors, the server of the service provider can collect the various types of media used when each behavior occurs. The server can generate records of these account behaviors, each record including the account on which a behavior occurred and one or more media of different types used when the behavior occurred. In this specification, a medium is a specific value of the media type to which it belongs. For example, the mobile phone number 13912345678 is one medium and 13987654321 is another medium; the media type of both is the mobile terminal number.
It should be noted that different media types may be used in different account behaviors. For example, there would be no WiFi identification of this media type in a record of account behavior that would be conducted if the network were accessed over a network cable.
Account behavior is constantly being generated during the operation of the network service provider's system. A predetermined statistical condition may be used to select a portion of the account behavior for use in identifying risk accounts. The predetermined statistical conditions can be determined according to the service characteristics of the service provider in the actual application scene, the accuracy of identification, the timeliness requirement and other factors. For example, all account behaviors for a particular transaction or transactions (e.g., transfers) within a month may be considered as account behaviors that meet a predetermined statistical condition.
The device operating the embodiment of the present specification may extract account behavior records satisfying the predetermined statistical condition from a system log of a service provider, may read account behavior records satisfying the predetermined statistical condition from a plurality of databases recording account behaviors, may read data tables generated by other subsystems and recording account behavior records satisfying the predetermined statistical condition, and may also use other manners without limitation.
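As an illustration of step 110, the following minimal sketch shows one possible in-memory shape for an account behavior record and one possible statistical condition (a set of behaviors within a time window). The class name `BehaviorRecord`, the function `select_records`, the field names, and the sample values are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BehaviorRecord:
    account: str         # account on which the behavior occurred
    behavior: str        # hypothetical label, e.g. "login" or "transfer"
    timestamp: datetime  # when the behavior occurred
    media: dict          # media type -> medium; types may be missing


def select_records(records, behaviors, start, end):
    """Keep records of the given behaviors with start <= timestamp < end."""
    return [r for r in records
            if r.behavior in behaviors and start <= r.timestamp < end]


records = [
    BehaviorRecord("acct_1", "transfer", datetime(2018, 6, 3),
                   {"mobile_number": "13912345678", "wifi_id": "wifi_A"}),
    BehaviorRecord("acct_2", "login", datetime(2018, 6, 5),
                   {"mobile_number": "13987654321"}),
    BehaviorRecord("acct_3", "transfer", datetime(2018, 4, 20),
                   {"mobile_number": "13912345678"}),
]

# Assumed condition: all transfers during June 2018
selected = select_records(records, {"transfer"},
                          datetime(2018, 6, 1), datetime(2018, 7, 1))
print([r.account for r in selected])
```

Storing the media as a mapping keyed by media type makes it straightforward to represent records in which some media types are absent, such as a record without a WiFi identifier.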
Step 120, taking N media types (N is a natural number) as one operator type, and generating a plurality of connected subsets of at least one operator type. A connected subset includes at least one member account and at least one member operator, and a connected subset includes all accounts that have used the at least one member operator in the account behavior records and all operators belonging to the operator type that have been used by the at least one member account.
In the embodiments of the present specification, one operator type is constituted by a certain media type or a combination of several media types. For example, the mobile terminal number may be taken as one operator type (denoted OP_1), the identity card number as another operator type (denoted OP_2), and the combination of the mobile terminal number and the identity card number as a third operator type (denoted OP_3). Each operator belongs to an operator type. If the operator type is composed of N media types, the operator is composed of N media belonging respectively to those N media types; that is, the operator includes N media corresponding to the N media types constituting the operator type to which it belongs. For example, 13912345678 is an operator of operator type OP_1, 110101200001011234 is an operator of operator type OP_2, and the pair 13912345678 and 110101200001011234 is an operator of operator type OP_3.
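The relationship between media types, operator types, and operators can be sketched as follows. This is only an illustration under stated assumptions: an operator type is modeled as a tuple of N media type names, an operator as the tuple of the N corresponding media, and the function name `make_operator` and the key names `mobile_number` and `id_number` are hypothetical. Returning `None` when a record lacks one of the required media types is also an assumption.

```python
def make_operator(media, operator_type):
    """Build the operator of the given type from one record's media.

    media: dict mapping media type -> medium for a single record.
    operator_type: tuple of N media type names.
    Returns a tuple of N media, or None if any media type is absent.
    """
    try:
        return tuple(media[media_type] for media_type in operator_type)
    except KeyError:
        return None


OP_1 = ("mobile_number",)              # N = 1
OP_2 = ("id_number",)                  # N = 1
OP_3 = ("mobile_number", "id_number")  # N = 2

media = {"mobile_number": "13912345678", "id_number": "110101200001011234"}

print(make_operator(media, OP_1))
print(make_operator(media, OP_3))
print(make_operator({"mobile_number": "13912345678"}, OP_3))
```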
For a certain operator type, a set of the operator type can be generated by taking an operator of the type in the account behavior record (i.e. an operator belonging to the operator type) meeting a predetermined statistical condition and an account using the operator of the type in the record as elements. For a certain operator type set, if a certain operator of the certain type is used by a certain account in the account behavior record in the set, the account and the operator of the certain type are drawn into a connected subset. Thus, the set of operator types may be divided into one to a plurality of connected subsets, each connected subset including at least one account (referred to as a member account of the connected subset) and at least one operator of the type (referred to as a member operator of the connected subset), the member accounts of a connected subset including all accounts that have used at least one member operator of the connected subset in the account behavior record, and the member operators of a connected subset including all operators of the type that have been used by at least one member account of the connected subset in the account behavior record. In other words, each member account of any one of the connected subsets has used at least one member operator in the account behavior record, and each member operator has been used by at least one member account in the account behavior record.
In this way, the member accounts and member operators in each connected subset can be directly or indirectly associated through the account behavior records. Two accounts using the same operator of the type can become member accounts of the same connected subset, and two operators of the type used by the same account can also become member operators of the same connected subset.
The embodiments of the present description are not limited to the specific manner in which the connected subset of the certain operator type is generated based on the account behavior record, and the following examples are given.
In one implementation, for a given operator type, all accounts in the account behavior records that have used operators of that type are taken as the nodes on one side, and all operators of that type in the account behavior records are taken as the nodes on the other side. An edge connects an account and an operator of that type whenever the two appear together in at least one account behavior record, which yields a bipartite graph of accounts and operators. A connectivity algorithm is applied to this bipartite graph to obtain a number of maximum connected subgraphs, and the nodes in each maximum connected subgraph are taken as the members of one connected subset of the operator type. All nodes in the bipartite graph of accounts and operators form the set of the operator type, and obtaining the maximum connected subgraphs with the connectivity algorithm is the process of placing different accounts that have used the same operator, and different operators that have been used by the same account, into one connected subset.
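The bipartite graph construction and the search for connected subgraphs can be sketched with a breadth-first search. The specification does not prescribe a particular connectivity algorithm, so this is one possible choice; the function name `connected_subsets` and the sample identifiers are hypothetical.

```python
from collections import defaultdict, deque


def connected_subsets(pairs):
    """Split (account, operator) pairs of one operator type into connected subsets.

    Returns a list of (member_accounts, member_operators) tuples of sets.
    """
    # Tag each node with its side so an account and an operator never collide
    adjacency = defaultdict(set)
    for account, operator in pairs:
        a_node, o_node = ("account", account), ("operator", operator)
        adjacency[a_node].add(o_node)
        adjacency[o_node].add(a_node)

    seen, subsets = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        accounts, operators = set(), set()
        while queue:
            kind, value = queue.popleft()
            (accounts if kind == "account" else operators).add(value)
            for neighbor in adjacency[(kind, value)]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        subsets.append((accounts, operators))
    return subsets


# u1 and u2 share d1; u2 also used d2; u3 is unrelated
pairs = [("u1", "d1"), ("u2", "d1"), ("u2", "d2"), ("u3", "d3")]
subsets = connected_subsets(pairs)
print(subsets)
```

In this example, u1 and u2 end up in the same connected subset together with d1 and d2, while u3 and d3 form a second connected subset.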
In the above implementation, when the bipartite graph of the account and the operator is generated, the weight of each node, each edge, or each node and each edge may be generated according to the account behavior record and inherited by the node and/or the edge of the maximum connected subgraph, or the weight of the node and/or the edge in the maximum connected subgraph may be generated according to the weight of the bipartite graph of the account and the operator. The specific generation manner of the weight may be set according to the requirements of the actual application scenario, and is not limited. For example, the weight of an edge between an account and an operator may be set according to the number of times that the account and the operator appear in the same record in an account behavior record.
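The edge weight example above (the number of records in which an account and an operator appear together) can be sketched with a counter. The function name `edge_weights` and the sample data are hypothetical, and other weighting schemes are equally possible.

```python
from collections import Counter


def edge_weights(pairs):
    """Weight of each (account, operator) edge = number of records containing both."""
    return Counter(pairs)


# One (account, operator) pair per account behavior record
pairs = [("u1", "d1"), ("u1", "d1"), ("u2", "d1"), ("u1", "d1"), ("u2", "d2")]
weights = edge_weights(pairs)
print(weights[("u1", "d1")])
```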
Which media types or combinations of media types are selected as operator types, and how many operator types are used, can be determined according to the business characteristics of the network service provider in the actual application scenario, the media types that the server can collect during account behaviors, and other factors; the embodiments of the present specification are not limited in this respect. For example, for a mobile phone App (application), the IMEI and the mobile terminal number can each be used as an operator type, and the combination of the device's MAC address and WiFi identifier can be used as a further operator type. As another example, for a network financial service system, the identity card number and the bank card number can each be used as an operator type. In application scenarios where risk identification from several informational perspectives is desired, two or more operator types may be employed, and a plurality of connected subsets is generated for each operator type.
Step 130, each connected subset is divided, by clustering, into a number of groups corresponding to the operator type.
A clustering algorithm is applied to each connected subset of an operator type, dividing each connected subset into one or more groups corresponding to the operator type. The clustering algorithm can be selected according to the requirements of the actual application scenario. When the clustering algorithm is applied, either only the member accounts in the connected subset are clustered, or both the member accounts and the member operators are clustered. The embodiments of the present specification do not limit either point. After clustering, each group may consist of a plurality of member accounts, of a plurality of member operators, or of a plurality of member accounts together with a plurality of member operators.
In the above implementation of generating connected subsets by using a bipartite graph of accounts and operators, the maximum connected subgraph (i.e., the maximum connected subgraph used to derive a connected subset) of each connected subset may be clustered by using a community discovery algorithm to derive a number of groups corresponding to a certain operator type. If the nodes, edges, or nodes and edges in the bipartite graph and the maximum connected subgraph have weights generated according to the account behavior records, the community discovery algorithm divides the maximum connected subgraph into one to multiple groups according to the weights of the nodes, edges, or nodes and edges. Various community discovery algorithms can be used for group division of the maximum connected subgraph, for example, the group division using the Louvain algorithm can be adopted. In addition, the processing of the weights by various community discovery algorithms can be realized by referring to the prior art, and is not described in detail.
Because all accounts that have used the same operator in the account behavior records become member accounts of one connected subset, the number of member accounts in a connected subset is often very large in most application scenarios; such member accounts usually have no prominent common characteristics and are difficult to use as a reliable basis for risk account determination. Suppose account A and account B are placed in one connected subset because they are associated through an identity card number (the same identity card number appears in their account behavior records), account A is closely related to account C and account D, and account B is closely related to account E and account F, but no association can be established between account C or account D and account E or account F through any operator type. It can then be inferred that a medium of one of account A and account B (here, the identity card number) has been fraudulently used by the other. If the connected subset to which account A and account B belong were taken as the basis for judging risk accounts, a false association would result and a normal account would be mistakenly judged to be a risk account. Therefore, on the basis of the connected subsets, clustering is used for further refinement: groups with different characteristics are mined, the accuracy of the group to which an account belongs is improved, and the error propagation problem of connected subsets is alleviated.
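The benefit of splitting a connected subset can be illustrated with modularity, the quantity that the Louvain algorithm greedily optimizes. The sketch below does not implement Louvain itself; it only computes the modularity of two candidate groupings of a small, invented weighted bipartite graph. In this graph, accounts A, C and D each used operator X1 three times, accounts B, E and F each used operator X2 three times, and account A appears with X2 only once. The function name `modularity`, the node names, and the weights are all assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, group_of):
    """Modularity Q of a partition of a weighted undirected graph.

    edges: list of (u, v, weight); group_of: dict mapping node -> group label.
    Q = sum over groups of  L_g / m - (d_g / (2 * m)) ** 2, where m is the
    total edge weight, L_g the edge weight inside group g, and d_g the sum
    of the weighted degrees of the nodes in g.
    """
    m = sum(weight for _, _, weight in edges)
    internal = defaultdict(float)
    degree_sum = defaultdict(float)
    for u, v, weight in edges:
        degree_sum[group_of[u]] += weight
        degree_sum[group_of[v]] += weight
        if group_of[u] == group_of[v]:
            internal[group_of[u]] += weight
    return sum(internal[g] / m - (degree_sum[g] / (2 * m)) ** 2
               for g in degree_sum)


edges = [("A", "X1", 3), ("C", "X1", 3), ("D", "X1", 3),
         ("B", "X2", 3), ("E", "X2", 3), ("F", "X2", 3),
         ("A", "X2", 1)]  # the single weak link that joins both sides

nodes = ["A", "B", "C", "D", "E", "F", "X1", "X2"]
one_group = {node: 0 for node in nodes}
two_groups = {"A": 0, "C": 0, "D": 0, "X1": 0,
              "B": 1, "E": 1, "F": 1, "X2": 1}

q_one = modularity(edges, one_group)
q_two = modularity(edges, two_groups)
print(q_one, q_two)
```

Keeping the whole connected subset as a single group gives a modularity of 0, while separating the two sides gives a clearly positive value, so a modularity-based community discovery algorithm would prefer the split and account A would no longer be grouped with B, E and F.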
Step 140, the risk degree of an account is judged based on the group to which the account belongs for at least one operator type.
For a given operator type, the group to which an account belongs reflects the other accounts closely related to that account, or those accounts together with operators of that type. Whether an account is frequently used for illegal activities can be reflected in characteristics of its group, such as the number and distribution of the accounts, or of the accounts and operators of that type, in the group. Using the account groups corresponding to two or more operator types describes the associated accounts of an account, or its associated accounts and associated media, from different perspectives, which provides a broader basis for judging risk accounts and improves the accuracy of the judgment.
The conditions for determining a risk account may be set according to factors such as the specific operator types adopted in the actual application scenario, the number of operator types, and the behavior characteristics of risk accounts; the embodiments of the present specification are not limited in this respect. For example, in a scenario that adopts K operator types (K is a natural number), a weight may be set for each operator type, and the numbers of accounts in the groups to which an account belongs for the K operator types are weighted and summed; if the sum reaches or exceeds a certain threshold, the account is considered to be a risk account.
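The weighted-sum rule for K operator types can be sketched as follows. The weights, group sizes, threshold, and the function name `risk_score` are invented for illustration; treating a missing group as size 0 is an assumption.

```python
def risk_score(group_sizes, weights):
    """Weighted sum of the account counts of the groups an account belongs to.

    group_sizes: operator type -> number of accounts in the account's group.
    weights: operator type -> weight of that operator type.
    An operator type with no group for this account contributes 0 (assumption).
    """
    return sum(weight * group_sizes.get(op_type, 0)
               for op_type, weight in weights.items())


weights = {"OP_a": 2, "OP_b": 1, "OP_c": 3}       # K = 3, hypothetical weights
group_sizes = {"OP_a": 4, "OP_b": 10, "OP_c": 2}  # sizes of the account's groups
threshold = 20                                    # hypothetical threshold

score = risk_score(group_sizes, weights)
is_risk_account = score >= threshold
print(score, is_risk_account)
```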
In one example, a risk account model can be constructed to determine risk accounts. The risk account model is a supervised machine learning model whose inputs include group characteristics of the groups corresponding to several operator types and whose output is the risk degree of an account. The group characteristics may be the size of the group (e.g., the number of accounts in the group, the number of operators, or the numbers of accounts and operators in the group), the degree of the account in the group (i.e., the number of associations the account has with other nodes in the group, such as the number of edges connecting it to other nodes in the maximum connected subgraph), and the scale of black samples in the group (e.g., the number or the proportion of black samples in the group). Different operator types in one risk account model may use the same group characteristics, or each may use its own group characteristics as input.
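The group characteristics listed above can be computed from a group and the edges of its connected subgraph, as in the following sketch. The function name `group_features`, the feature names, and the sample data are hypothetical; here the degree counts only edges from the account to operators inside its own group, which is one possible reading of the definition.

```python
def group_features(account, group_accounts, group_operators, edges, black_accounts):
    """Group characteristics of one account's group for one operator type.

    edges: set of (account, operator) pairs of the connected subgraph.
    black_accounts: accounts labeled as black samples.
    """
    black_in_group = group_accounts & black_accounts
    # Count only edges from this account to operators inside its own group
    degree = sum(1 for a, o in edges if a == account and o in group_operators)
    return {
        "n_accounts": len(group_accounts),
        "n_operators": len(group_operators),
        "degree": degree,
        "n_black": len(black_in_group),
        "black_ratio": (len(black_in_group) / len(group_accounts)
                        if group_accounts else 0.0),
    }


group_accounts = {"u1", "u2", "u3", "u4"}
group_operators = {"d1", "d2"}
edges = {("u1", "d1"), ("u1", "d2"), ("u2", "d1"),
         ("u3", "d2"), ("u4", "d2"), ("u1", "d9")}  # d9 lies outside the group
black_accounts = {"u2", "u7"}

features = group_features("u1", group_accounts, group_operators,
                          edges, black_accounts)
print(features)
```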
In addition to group characteristics corresponding to a group of one to multiple operator types, the input to the risk account model may include business characteristics of the account. The service features may be any features other than describing the association of the account with the medium, such as attribute information provided at registration, the number of friends in the address book, whether to open a certain service, the number of times or amount a certain service is performed within a predetermined time period, and the like. The service characteristics can be extracted or counted from various attribute information and historical service behaviors of the account.
The risk account model may use various supervised machine learning algorithms, such as random forest, GBDT (Gradient Boosting decision Tree), logistic regression, deep learning, and the like, without limitation.
After training of the risk account model is completed by adopting training data marked with black samples and/or white samples, for a certain account, the group characteristics of the account or the group characteristics and the business characteristics of the account are input into the risk account model, and the risk degree of the account is determined according to the output of the risk account model. For example, when the output is greater than or less than a certain threshold, the account is considered to be a risky account.
It should be noted that an account may not belong to any group for some operator type (for example, because the account has not used the media types that make up that operator type in the account behavior records). A default value may be set for this case and used as the input of the risk account model for that account.
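Assembling the model input with a default value for missing groups can be sketched as follows. The function name `model_input`, the feature names, and the default of 0.0 are assumptions; the specification leaves the default value open.

```python
def model_input(features_by_type, operator_types, feature_names, default=0.0):
    """Flatten per-operator-type group features into one input vector.

    features_by_type: operator type -> feature dict, with no entry for an
    operator type in which the account belongs to no group.
    """
    vector = []
    for op_type in operator_types:
        features = features_by_type.get(op_type)  # None if no group exists
        for name in feature_names:
            vector.append(default if features is None else features[name])
    return vector


# The account has a group for OP_a but none for OP_b
features_by_type = {"OP_a": {"n_accounts": 4, "degree": 2, "black_ratio": 0.25}}
vector = model_input(features_by_type, ["OP_a", "OP_b"],
                     ["n_accounts", "degree", "black_ratio"])
print(vector)
```

Keeping the order of operator types and feature names fixed ensures that every account yields a vector of the same length, which the supervised model requires.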
Because the group to which an account belongs reflects its associations more precisely than the connected subset does, judging risk accounts with a machine learning model on the basis of groups filters the information layer by layer and finally retains the most accurate part, which greatly improves the precision of the judgment. When group characteristics corresponding to multiple operator types are used as inputs of the risk account model, the description of account behavior is more comprehensive and the accuracy of the judgment is further improved.
It can be seen that in the embodiments of the present specification, a media type or a combination of media types used by a user when performing a business behavior is used as an operator type, a plurality of connected subsets corresponding to the operator type are generated according to media used by an account behavior record, each connected subset is clustered to obtain a group to which an account belongs, and a risk account is determined based on the group to which the account belongs.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In one application example of the present specification, black-market gangs commit various illegal acts, such as organizing many users to place fake orders or purchasing large numbers of secondary accounts registered under different identities for fraud or cheating. To deal with them, a third-party payment platform wishes to identify suspicious accounts and carry out risk prevention and control in advance.
The third party payment platform extracts historical behavior records (account behavior records) of all accounts in last 3 months such as login, password modification, purchase, payment and the like on the mobile device, and takes the identification of the mobile device used by the account for performing the behaviors as an operator type OPaAnd using the WiFi identification of the equipment access network as the operator type OPbOP of the computing type, a combination of a mobile device identity and a WiFi identitycTo perform identification of the risk account.
The server traverses each extracted historical behavior record, takes the account in the record as a node on one side, and takes the mobile device identifier and the WiFi identifier in the record as nodes on the other side, generating an overall media bipartite graph. Assume that the generated overall media bipartite graph is as shown in fig. 2, where the gear shapes represent mobile device identifiers, and Ma1, Ma2, Ma3, and Ma4 are 4 different mobile device identifiers (i.e., 4 media whose media type is mobile device identifier); the wrench shapes represent WiFi identifiers, and Mb1 and Mb2 are two different WiFi identifiers (i.e., 2 media whose media type is WiFi identifier). For simplicity, the weights of all nodes and edges in the overall media bipartite graph are set to 1.
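The construction of the overall media bipartite graph described above can be sketched as follows. This is only an illustrative sketch: the record layout (account, device identifier, WiFi identifier), the sample values, and the function name build_overall_bipartite are assumptions made for the example and are not specified in this description.

```python
from collections import defaultdict

# Hypothetical behavior records: (account, mobile device identifier, WiFi identifier).
# The values are invented for illustration only.
records = [
    ("acct1", "Ma1", "Mb1"),
    ("acct2", "Ma1", "Mb1"),
    ("acct2", "Ma2", "Mb1"),
    ("acct3", "Ma3", "Mb2"),
    ("acct4", "Ma4", "Mb2"),
]

def build_overall_bipartite(records):
    """Map each account node to the set of media nodes it is linked to.

    A media node is a (media_type, value) pair, so a device identifier and a
    WiFi identifier with the same string value remain distinct nodes.
    """
    edges = defaultdict(set)
    for account, device_id, wifi_id in records:
        edges[account].add(("device", device_id))
        edges[account].add(("wifi", wifi_id))
    return dict(edges)

graph = build_overall_bipartite(records)
for account in sorted(graph):
    print(account, sorted(graph[account]))
```

All node and edge weights are implicitly 1 here, matching the simplification stated above.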
For operator type OPa, each specific value of the mobile device identifier is taken as an operator of OPa, and the part of the overall media bipartite graph that associates OPa operators with accounts is extracted to generate the bipartite graph of accounts and operators of OPa. The bipartite graph of accounts and operators of OPa extracted from fig. 2 is shown in fig. 3. Similarly, the bipartite graph of accounts and operators of OPb extracted from fig. 2 is shown in fig. 4.
For operator type OPc, each distinct combination of a specific mobile device identifier value and a specific WiFi identifier value is taken as a different operator of OPc, and the part of the overall media bipartite graph that associates OPc operators with accounts is extracted. The bipartite graph of accounts and operators of OPc extracted from fig. 2 is shown in fig. 5.
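The extraction of a per-operator-type bipartite graph might be sketched as below. The sketch assumes that an operator of a combined type such as OPc is formed from the media values appearing together in a single record; the dictionary keys "account", "device", and "wifi" and the function name operator_bipartite are hypothetical.

```python
# Hypothetical behavior records; keys and values are invented for illustration.
records = [
    {"account": "acct1", "device": "Ma1", "wifi": "Mb1"},
    {"account": "acct2", "device": "Ma1", "wifi": "Mb1"},
    {"account": "acct2", "device": "Ma2", "wifi": "Mb1"},
    {"account": "acct3", "device": "Ma3", "wifi": "Mb2"},
    {"account": "acct4", "device": "Ma4", "wifi": "Mb2"},
]

def operator_bipartite(records, media_types):
    """Return the set of (account, operator) edges for one operator type.

    An operator is the tuple of media values, one per media type in
    `media_types`, taken from the same record (an assumption of this sketch).
    """
    edges = set()
    for rec in records:
        operator = tuple(rec[t] for t in media_types)
        edges.add((rec["account"], operator))
    return edges

op_a = operator_bipartite(records, ("device",))         # N = 1: device identifier
op_b = operator_bipartite(records, ("wifi",))           # N = 1: WiFi identifier
op_c = operator_bipartite(records, ("device", "wifi"))  # N = 2: combination
print(sorted(op_c))
```

Representing every operator as a tuple keeps the single-media and combined cases uniform, so the same downstream steps can be applied to each operator type.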
A connected graph algorithm is applied to the bipartite graphs of accounts and operators belonging to the different operator types (fig. 3, fig. 4, and fig. 5) to obtain the maximum connected graphs of operator types OPa, OPb, and OPc; the nodes in each maximum connected graph form one connected subset of the corresponding operator type.
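One possible connected graph algorithm is a breadth-first search over the bipartite graph, sketched below. The description does not specify which algorithm is used, so breadth-first search is an assumption here, as are the edge list and the function name connected_subsets.

```python
from collections import defaultdict, deque

def connected_subsets(edges):
    """Split an account-operator bipartite graph into its connected components.

    `edges` is an iterable of (account, operator) pairs. Nodes are tagged with
    their side so that an account and an operator never collide by name.
    """
    adj = defaultdict(set)
    for account, operator in edges:
        a, o = ("account", account), ("operator", operator)
        adj[a].add(o)
        adj[o].add(a)

    seen, subsets = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        subsets.append(component)
    return subsets

# Hypothetical edges of one operator type (values invented for illustration).
edges = [("acct1", "Ma1"), ("acct2", "Ma1"), ("acct2", "Ma2"), ("acct3", "Ma3")]
subsets = connected_subsets(edges)
print(len(subsets))
```

Each returned component contains both member accounts and member operators, which corresponds to a connected subset as defined in this description.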
A Louvain community discovery algorithm is then applied to each maximum connected graph, dividing the account nodes in each maximum connected graph into groups.
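To illustrate the idea behind Louvain community discovery, the following sketch implements only its first phase (greedy local moving of nodes to maximize modularity) on an unweighted graph. It is a simplified stand-in, not the full Louvain algorithm, which additionally merges each community into a super-node and repeats the process. Modularity is recomputed from scratch on every trial move, which is simple but inefficient. The function names and the sample edges are assumptions made for the example.

```python
def to_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def modularity(adj, comm):
    """Modularity of partition `comm` ({node: label}) on an unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # every edge counted twice
    if two_m == 0:
        return 0.0
    intra, degree = {}, {}
    for u, nbrs in adj.items():
        c = comm[u]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Intra-community edges are counted once from each endpoint.
        intra[c] = intra.get(c, 0) + sum(1 for v in nbrs if comm[v] == c)
    return sum(intra[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)

def local_moving(adj):
    """Greedy local-moving phase: start with one community per node and move
    each node to a neighboring community whenever that raises modularity."""
    comm = {u: u for u in adj}
    best_q = modularity(adj, comm)
    improved = True
    while improved:
        improved = False
        for u in sorted(adj):
            home, best_c = comm[u], comm[u]
            for c in sorted({comm[v] for v in adj[u]} - {home}):
                comm[u] = c
                q = modularity(adj, comm)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_q, best_c = q, c
            comm[u] = best_c
            if best_c != home:
                improved = True
    return comm

# Hypothetical account-operator edges inside one connected subset.
demo = to_adjacency([("acct1", "Ma1"), ("acct2", "Ma1"), ("acct2", "Ma2"),
                     ("acct3", "Ma2"), ("acct3", "Ma3"), ("acct4", "Ma3")])
print(local_moving(demo))
```

Nodes that end up with the same label belong to the same group; the account nodes of each label form one group of the operator type.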
The third-party payment platform takes the group characteristics of the groups corresponding to operator types OPa, OPb, and OPc (e.g., the number of accounts in the group, the proportion of black samples among the group's accounts, etc.) and the business characteristics of the account (e.g., account registration time, the number and total amount of payments made by the account within the last week, etc.) as the input of a risk account model, and trains the risk account model using training data labeled with black and white samples.
When the third-party payment platform needs to detect the risk degree of a certain account, it inputs the group characteristics of the groups to which the account belongs for operator types OPa, OPb, and OPc, together with the business characteristics of the account, into the trained risk account model; the output of the risk account model is the risk degree of the account.
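The assembly of the model input for one account might look like the sketch below. It computes two of the group characteristics mentioned above (group size and proportion of black samples) per operator type and appends business characteristics. The data layout, the function name group_features, and all sample values are hypothetical, and the supervised model itself is not shown.

```python
def group_features(account, groups_by_type, black_accounts):
    """Return [size, black_ratio] for the account's group under each operator type.

    `groups_by_type` maps an operator type name to {account: set of the member
    accounts of that account's group}. An account without a group is treated
    as a group of one (an assumption of this sketch).
    """
    features = []
    for op_type in sorted(groups_by_type):
        members = groups_by_type[op_type].get(account, {account})
        size = len(members)
        black_ratio = len(members & black_accounts) / size
        features += [size, black_ratio]
    return features

# Hypothetical groups and labeled black samples, invented for illustration.
groups_by_type = {
    "OPa": {"acct1": {"acct1", "acct2"}},
    "OPb": {"acct1": {"acct1", "acct2", "acct3", "acct4"}},
}
black_accounts = {"acct2"}

# Hypothetical business characteristics: days since registration,
# number of payments in the last week, total amount in the last week.
business_features = [30, 12, 560.0]

model_input = group_features("acct1", groups_by_type, black_accounts) + business_features
print(model_input)
```

The resulting vector would be passed to the trained risk account model, whose output is taken as the risk degree of the account.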
Corresponding to the above flow implementation, the embodiment of the present specification further provides an identification device for a risk account. The apparatus may be implemented by software, or by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the logical device is formed by reading a corresponding computer program instruction into a memory for running through a Central Processing Unit (CPU) of the device. In terms of hardware, in addition to the CPU, the memory, and the storage shown in fig. 6, the device in which the risk account identification apparatus is located generally includes other hardware such as a chip for performing wireless signal transmission and reception and/or other hardware such as a board for implementing a network communication function.
Fig. 7 is a schematic diagram illustrating an apparatus for identifying a risk account according to an embodiment of the present disclosure, including a record obtaining unit, a connected subset unit, a group clustering unit, and a risk determining unit, where: the record acquisition unit is used for acquiring account behavior records meeting preset statistical conditions, and each record comprises an account where a behavior occurs and a plurality of media used when the behavior occurs; the connected subset unit is used for generating a plurality of connected subsets of at least one operator type by taking the N media types as the operator type; the connected subset comprises at least one member account and at least one member operator, and one connected subset comprises all accounts using the at least one member operator in the account behavior record and all operators belonging to the operator type and used by the at least one member account; the operator belonging to a certain operator type includes N media corresponding to N media types constituting the operator type; n is a natural number; the group clustering unit is used for dividing each connected subset into a plurality of groups corresponding to the operator type through clustering; the risk judging unit is used for judging the risk degree of an account based on a group which belongs to the account and corresponds to at least one operator type.
In one example, the risk determining unit is specifically configured to: generating group characteristics according to a group corresponding to at least one operator type to which a certain account belongs, inputting the group characteristics of the account into a risk account model, and determining the risk degree of the account according to the output of the risk account model; the risk account model is a supervised machine learning model.
In the above example, the group characteristics include one or more of the following: the size of the group, the degree of the account in the group, the size of the black sample in the group.
In the above example, the input of the risk account model may further include: the business characteristics of the account.
In one implementation, the connected subset unit includes a bipartite graph subunit and a connected algorithm subunit, where: the bipartite graph subunit is used for constructing a bipartite graph of accounts and operators by taking all accounts using operators belonging to the operator type in the account behavior records as nodes on one side, taking all operators belonging to the operator type in the account behavior records as nodes on the other side, and taking at least one account behavior record comprising a certain account and a certain operator as an edge; and the connected algorithm subunit is used for obtaining a plurality of maximum connected subgraphs by applying a connected algorithm to the bipartite graph of accounts and operators, and taking the nodes in each maximum connected subgraph as the members of one connected subset of the operator type.
In the foregoing implementation, the group clustering unit may be specifically configured to: apply a community discovery algorithm to the maximum connected subgraph of each connected subset to obtain a plurality of groups corresponding to the operator type.
In the above implementation, at least one of the nodes and the edges of the bipartite graph has weights generated according to the account behavior records; the group clustering unit is specifically configured to: apply a community discovery algorithm to the maximum connected subgraph of each connected subset according to the weights of the nodes, the weights of the edges, or the weights of both the nodes and the edges in that subgraph, to obtain a plurality of groups corresponding to the operator type.
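As one illustration of weights generated from the account behavior records, the sketch below sets the weight of an edge to the number of records in which the account used the operator. This choice of weight is an assumption made for the example; the sample records and the function name edge_weights are likewise hypothetical.

```python
from collections import Counter

def edge_weights(records):
    """Count how many behavior records link each (account, operator) pair."""
    return Counter((account, operator) for account, operator in records)

# Hypothetical (account, operator) pairs, one per behavior record.
records = [
    ("acct1", "Ma1"), ("acct1", "Ma1"), ("acct1", "Ma1"),
    ("acct2", "Ma1"),
    ("acct2", "Ma2"), ("acct2", "Ma2"),
]

weights = edge_weights(records)
for edge, weight in sorted(weights.items()):
    print(edge, weight)
```

A weighted community discovery algorithm can then treat a frequently used operator as a stronger tie between accounts than one that appears in a single record.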
Optionally, the media types include one or more of: device identifier, mobile terminal number, IP address, media access control (MAC) address, wireless fidelity (WiFi) identifier, identity card number, and bank card number.
Embodiments of the present description provide a computer device that includes a memory and a processor. The memory stores a computer program executable by the processor; the processor, when executing the stored computer program, performs the steps of the method for identifying a risk account in the embodiments of the present description. For a detailed description of the steps of the method for identifying a risk account, refer to the foregoing content, which is not repeated here.
Embodiments of the present description provide a computer-readable storage medium having stored thereon computer programs which, when executed by a processor, perform the steps of the method for identifying a risk account in the embodiments of the present description. For a detailed description of the steps of the method for identifying a risk account, refer to the foregoing content, which is not repeated here.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

Claims (16)

1. A method of identifying a risk account, comprising:
acquiring account behavior records meeting preset statistical conditions, wherein each record comprises an account where a behavior occurs and a plurality of media used when the behavior occurs;
taking N media types as an operator type, and generating a plurality of connected subsets of at least one operator type; the connected subset comprises at least one member account and at least one member operator, and one connected subset comprises all accounts using the at least one member operator in the account behavior record and all operators belonging to the operator type and used by the at least one member account; the operator belonging to a certain operator type includes N media corresponding to N media types constituting the operator type; n is a natural number;
clustering each connected subset based on the degree of closeness of association between member accounts, and dividing each connected subset into a plurality of groups corresponding to the operator type, wherein each group comprises a plurality of member accounts and/or a plurality of member operators in the connected subset;
generating group characteristics according to a group corresponding to at least one operator type to which a certain account belongs, inputting the group characteristics of the account into a risk account model, and determining the risk degree of the account according to the output of the risk account model; the risk account model is a supervised machine learning model.
2. The method of claim 1, the group characteristics comprising one or more of: the size of the group, the degree of the account in the group, the size of the black sample in the group.
3. The method of claim 1, the inputting of the risk account model further comprising: the business characteristics of the account.
4. The method of claim 1, wherein generating a number of connected subsets of at least one operator type with N media types as an operator type comprises:
taking all accounts using operators belonging to the operator type in the account behavior records as nodes on one side, taking all operators belonging to the operator type in the account behavior records as nodes on the other side, and constructing a bipartite graph of the accounts and the operators by taking at least one account behavior record comprising a certain account and a certain operator as edges;
and obtaining a plurality of maximum connected subgraphs by adopting a connected algorithm for the bipartite graph of the account and the operator, and taking nodes in each maximum connected subgraph as members in a connected subset of the operator type.
5. The method of claim 4, the dividing each connected subset into a number of groups corresponding to the operator type by clustering, comprising: and adopting a community discovery algorithm for the maximum connected subgraph of each connected subset to obtain a plurality of groups corresponding to the operator types.
6. The method of claim 5, at least one of the nodes and edges of the bipartite graph having weights generated from the account behavior records;
the dividing, by clustering, each connected subset into a number of groups corresponding to the operator type, comprising: and according to the weight of the node and the weight of the edge in the maximum connected subgraph of each connected subset or the weight of the node and the edge, obtaining a plurality of groups corresponding to the operator type by adopting a community discovery algorithm for the maximum connected subgraph.
7. The method of claim 1, the media types comprising one or more of: device identifier, mobile terminal number, IP address, media access control (MAC) address, wireless fidelity (WiFi) identifier, identity card number, and bank card number.
8. An apparatus for identifying a risk account, comprising:
the record acquisition unit is used for acquiring account behavior records meeting preset statistical conditions, wherein each record comprises an account where a behavior occurs and a plurality of media used when the behavior occurs;
the connected subset unit is used for generating a plurality of connected subsets of at least one operator type by taking the N media types as one operator type; the connected subset comprises at least one member account and at least one member operator, and one connected subset comprises all accounts using the at least one member operator in the account behavior record and all operators belonging to the operator type and used by the at least one member account; the operator belonging to a certain operator type includes N media corresponding to N media types constituting the operator type; n is a natural number;
the group clustering unit is used for clustering each connected subset based on the degree of closeness of association among the member accounts and dividing each connected subset into a plurality of groups corresponding to the operator types, wherein each group comprises a plurality of member accounts and/or a plurality of member operators in the connected subsets;
the risk judging unit is used for generating group characteristics according to a group which belongs to a certain account and corresponds to at least one operator type, inputting the group characteristics of the account into a risk account model, and determining the risk degree of the account according to the output of the risk account model; the risk account model is a supervised machine learning model.
9. The apparatus of claim 8, the group characteristics comprising one or more of: the size of the group, the degree of the account in the group, the size of the black sample in the group.
10. The apparatus of claim 8, the input of the risk account model further comprising: the business characteristics of the account.
11. The apparatus of claim 8, the connected subset unit comprising:
the bipartite graph subunit is used for constructing a bipartite graph of accounts and operators by taking all accounts using operators belonging to the operator type in the account behavior records as nodes on one side, taking all operators belonging to the operator type in the account behavior records as nodes on the other side, and taking at least one account behavior record comprising a certain account and a certain operator as edges;
and the connected algorithm subunit is used for obtaining a plurality of maximum connected subgraphs by adopting a connected algorithm for the bipartite graph of the account and the operator, and taking the node in each maximum connected subgraph as a member in a connected subset of the operator type.
12. The apparatus of claim 11, the group clustering unit to: and adopting a community discovery algorithm for the maximum connected subgraph of each connected subset to obtain a plurality of groups corresponding to the operator types.
13. The apparatus of claim 12, at least one of nodes and edges of the bipartite graph having weights generated from the account behavior records;
the group clustering unit is specifically configured to: apply a community discovery algorithm to the maximum connected subgraph of each connected subset according to the weights of the nodes, the weights of the edges, or the weights of both the nodes and the edges in that subgraph, to obtain a plurality of groups corresponding to the operator type.
14. The apparatus of claim 8, the media types comprising one or more of: device identifier, mobile terminal number, IP address, media access control (MAC) address, wireless fidelity (WiFi) identifier, identity card number, and bank card number.
15. A computer device, comprising: a memory and a processor; the memory having stored thereon a computer program executable by the processor; the processor, when executing the computer program, performs the steps of any of claims 1 to 7.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of any one of claims 1 to 7.
CN201810717644.0A 2018-07-03 2018-07-03 Risk account identification method and device Active CN109063966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810717644.0A CN109063966B (en) 2018-07-03 2018-07-03 Risk account identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810717644.0A CN109063966B (en) 2018-07-03 2018-07-03 Risk account identification method and device

Publications (2)

Publication Number Publication Date
CN109063966A CN109063966A (en) 2018-12-21
CN109063966B true CN109063966B (en) 2022-02-01

Family

ID=64819140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810717644.0A Active CN109063966B (en) 2018-07-03 2018-07-03 Risk account identification method and device

Country Status (1)

Country Link
CN (1) CN109063966B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020866B (en) * 2019-01-22 2023-06-13 创新先进技术有限公司 Training method and device for recognition model and electronic equipment
CN110019818B (en) * 2019-02-14 2024-01-16 创新先进技术有限公司 Method and device for detecting batch registration mailbox
CN110084468B (en) * 2019-03-14 2020-09-01 阿里巴巴集团控股有限公司 Risk identification method and device
CN110288358A (en) * 2019-06-20 2019-09-27 武汉斗鱼网络科技有限公司 A kind of equipment group determines method, apparatus, equipment and medium
CN110851541B (en) * 2019-10-30 2022-09-27 支付宝(杭州)信息技术有限公司 Method and device for generating risk characteristics based on relational graph
CN111160916A (en) * 2019-12-04 2020-05-15 支付宝(杭州)信息技术有限公司 Risk transaction identification method and device
CN112990919A (en) * 2019-12-17 2021-06-18 中国银联股份有限公司 Information processing method and device
CN111340612B (en) * 2020-02-25 2022-12-06 支付宝(杭州)信息技术有限公司 Account risk identification method and device and electronic equipment
CN113554438B (en) * 2020-04-23 2023-12-05 北京京东振世信息技术有限公司 Account identification method and device, electronic equipment and computer readable medium
CN112084422B (en) * 2020-08-31 2024-05-10 腾讯科技(深圳)有限公司 Account data intelligent processing method and device
CN112417176B (en) * 2020-12-09 2024-04-02 交通银行股份有限公司 Method, equipment and medium for mining implicit association relation between enterprises based on graph characteristics
CN112883363A (en) * 2021-02-05 2021-06-01 上海识装信息科技有限公司 Method for identifying fingerprint collision of equipment
CN113722546B (en) * 2021-08-19 2024-03-12 北京达佳互联信息技术有限公司 Abnormal user account acquisition method and device, electronic equipment and storage medium
CN113988718A (en) * 2021-12-23 2022-01-28 支付宝(杭州)信息技术有限公司 Risk identification method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268271A (en) * 2014-10-13 2015-01-07 北京建筑大学 Interest and network structure double-cohesion social network community discovering method
CN107294974A (en) * 2017-06-26 2017-10-24 阿里巴巴集团控股有限公司 The method and apparatus for recognizing target clique
CN107592296A (en) * 2017-08-02 2018-01-16 阿里巴巴集团控股有限公司 The recognition methods of rubbish account and device
CN108009915A (en) * 2017-12-21 2018-05-08 连连银通电子支付有限公司 A kind of labeling method and relevant apparatus of fraudulent user community

Also Published As

Publication number Publication date
CN109063966A (en) 2018-12-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200922

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

TA01 Transfer of patent application right
GR01 Patent grant