CN115456788B

CN115456788B - Method, device and equipment for detecting risk group

Info

Publication number: CN115456788B
Application number: CN202211387148.6A
Authority: CN
Inventors: 赵闻飙; 赵文龙; 张天翼; 马博群; 董迹海; 徐恪; 李琦
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-11-07
Filing date: 2022-11-07
Publication date: 2023-03-21
Anticipated expiration: 2042-11-07
Also published as: CN115456788A

Abstract

The embodiments of the present specification disclose a method, a device and equipment for detecting a risk group. The method includes: receiving a detection request for a risk group, acquiring target data of a plurality of different users to be processed, and clustering the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; determining the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determining, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; and determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature, and thereby determining the user groups with a preset risk.

Description

Method, device and equipment for detecting risk group

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for detecting a risk group.

Background

In risk prevention and control scenarios in the financial field, a group clustering algorithm is usually used to obtain suspicious or risky groups. To determine whether the accounts in such a group are involved in illegal transactions, a general-purpose detection method is needed to determine whether each clustered group is abnormal. A common approach makes this group anomaly determination by propagating risk labels to obtain the concentration of black seeds (accounts already labeled as risky) in the group. However, this approach depends on black seed labels and is therefore not applicable to unsupervised clustering scenarios. In addition, the result of a method based on risk label propagation depends to a great extent on how the graph construction method or the label propagation method is designed, and the results obtained with methods designed by different designers may be inconsistent. This increases labor cost and the consumption of human resources during design, and also makes the final determination less stable. Therefore, a simpler and more efficient technical solution is needed that can determine whether a group is abnormal without black seed labels.

Disclosure of Invention

The embodiments of the present specification aim to provide a simpler and more efficient technical solution that can determine whether a group is abnormal without black seed labels.

In order to implement the above technical solution, the embodiments of the present specification are implemented as follows:

An embodiment of the present specification provides a method for detecting a risk group, the method including: receiving a detection request for a risk group, acquiring target data of a plurality of different users to be processed based on the detection request, and clustering the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; determining the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determining, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and determining, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups, and outputting relevant information of the determined user groups with the preset risk.

An embodiment of the present specification provides a method for detecting a risk group, the method including: acquiring target data of a plurality of different users to be processed, and clustering the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; determining the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determining, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and determining, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups.

An embodiment of the present specification provides a device for detecting a risk group, the device including: a data acquisition module, configured to receive a detection request for a risk group, acquire target data of a plurality of different users to be processed based on the detection request, and cluster the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; a data processing module, configured to determine the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and to determine, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; a probability determining module, configured to determine the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and a risk group determining module, configured to determine, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups, and to output relevant information of the determined user groups with the preset risk.

An embodiment of the present specification provides a device for detecting a risk group, the device including: a data acquisition module, configured to acquire target data of a plurality of different users to be processed and cluster the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; a data statistics module, configured to determine the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and to determine, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; a probability determining module, configured to determine the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and a group determining module, configured to determine, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups.

An embodiment of the present specification provides equipment for detecting a risk group, the equipment including: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: receive a detection request for a risk group, acquire target data of a plurality of different users to be processed based on the detection request, and cluster the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; determine the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determine, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; determine the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and determine, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups, and output relevant information of the determined user groups with the preset risk.

An embodiment of the present specification provides equipment for detecting a risk group, the equipment including: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to: acquire target data of a plurality of different users to be processed, and cluster the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; determine the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determine, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; determine the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and determine, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups.

An embodiment of the present specification further provides a storage medium for storing computer-executable instructions which, when executed by a processor, implement the following process: receiving a detection request for a risk group, acquiring target data of a plurality of different users to be processed based on the detection request, and clustering the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; determining the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determining, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and determining, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups, and outputting relevant information of the determined user groups with the preset risk.

An embodiment of the present specification further provides a storage medium for storing computer-executable instructions which, when executed by a processor, implement the following process: acquiring target data of a plurality of different users to be processed, and clustering the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data includes one or more different data features; determining the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determining, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature; determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature; and determining, based on the occurrence probability of each data feature distribution contained in each user group, one or more user groups with a preset risk among the user groups.

Drawings

To illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments described in the present specification, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 illustrates an embodiment of a method for risk group detection according to the present disclosure;

FIG. 2 is another embodiment of a method for risk group detection according to the present disclosure;

FIG. 3 is a schematic diagram of a risk group detection process according to the present disclosure;

FIG. 4 is a schematic diagram of another embodiment of a method for risk group detection;

FIG. 5A is a flowchart of another embodiment of a method for risk group detection;

FIG. 5B is a schematic diagram of another risk group detection process described herein;

FIG. 6 is a flowchart of another embodiment of a method for risk group detection;

FIG. 7 is a schematic diagram of an embodiment of a risk group detection device according to the present disclosure;

FIG. 8 is another embodiment of a risk group detection device according to the present disclosure;

FIG. 9 is an embodiment of risk group detection equipment according to the present disclosure.

Detailed Description

The embodiment of the specification provides a method, a device and equipment for detecting a risk group.

In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.

Embodiment One

As shown in FIG. 1, an embodiment of the present specification provides a method for detecting a risk group. The execution subject of the method may be a terminal device or a server. The terminal device may be a mobile terminal such as a mobile phone or a tablet computer, a computer device such as a notebook computer or a desktop computer, or an IoT device (for example, a smart watch or a vehicle-mounted device). The server may be an independent server or a server cluster formed by a plurality of servers, and may be a background server of a financial service or an online shopping service, or a background server of an application program. This embodiment is described in detail taking a server as an example; for the execution process on a terminal device, reference may be made to the related content below, which is not repeated here. The method may specifically include the following steps:

In step S102, target data of a plurality of different users to be processed is acquired, and the target data of the plurality of different users is clustered to obtain one or more user groups formed by the target data, where the target data includes one or more different data features.

In this embodiment, the user may be a user who performs a certain service, for example online payment or online shopping. In practical applications, the user may use a pre-registered account as the identity of the operator when performing the service. The service may be a single service or may include multiple services, for example a payment service, a transfer service, a facial recognition service, or an online transaction service, which may be set according to the actual situation and is not limited in the embodiments of the present specification.

The target data may include data related to the user, and may also include related data generated while the user performs a service. The target data may include one data feature (a data feature of one dimension) or a plurality of different data features (data features of a plurality of different dimensions). For example, the target data may include whether the user's location is within a specified area, whether the user's age exceeds a preset value (for example, 18 years), whether the user's payment amount exceeds 100 yuan, and whether the goods purchased by the user belong to the clothing category, which may be set according to the actual situation. Accordingly, the target data of each user may include one or more different data features, for example 20 dimensions of data, so that the target data of the plurality of different users constitutes a large amount of data, for example the target data of 10,000 different users, each with 20 dimensions of data.

The user group may be a group consisting of the users' accounts, that is, the user group may also be referred to as an account group.

In implementation, in risk prevention and control scenarios in the financial field and similar fields, a group clustering algorithm is usually used to obtain suspicious or risky groups. To determine whether the accounts in such a group are involved in illegal transactions, a general-purpose detection method is needed to determine whether each clustered group is abnormal. A common approach makes this group anomaly determination by propagating risk labels to obtain the concentration of black seeds in the group. However, this approach depends on black seed labels and is therefore not suitable for unsupervised clustering scenarios. In addition, the result of a method based on risk label propagation depends to a great extent on how the graph construction method or the label propagation method is designed, and the results obtained with methods designed by different designers may be inconsistent. This increases labor cost and the consumption of human resources during design, and also makes the final determination less stable. Therefore, a simpler and more efficient technical solution is needed that can determine whether a group is abnormal without black seed labels. The embodiments of the present specification provide an achievable processing method, which may specifically include the following:

For one or more services, each time a user performs a service, the related information of the service performed and the related information of the user may be recorded. The recorded information may include the time at which the service was performed, information generated by the user's operations while performing the service, location information, whether the user's age exceeds a preset value, whether the user's payment amount exceeds a preset amount, the category of the goods traded, related information of the counterparty to the transaction, and the like. The server may record this information each time a user performs the service, and may do the same when other users perform the service, so that the server records the related information of a plurality of different users. The details may be set according to the actual situation and are not limited in the embodiments of the present specification.

It should be noted that the related information of the plurality of different users may also be obtained in other ways. For example, a database containing the related information of a plurality of different users may be set up, and the information may be obtained from that database; alternatively, the information may be obtained from a network or from a specified service system. The details may be set according to the actual situation and are not limited in the embodiments of the present specification.

In the above manner, the related information of a plurality of different users and of the services they perform can be obtained. The recorded information can be used as source data, and one or more data features (that is, data features of one or more dimensions) can be constructed from the source data. Based on these data features, the target data of each user can be generated, thereby obtaining the target data of the plurality of different users. When data from a recent period (for example, the last 7 days or the last 10 days) or from a certain historical period (for example, one month within the last 10 months, or the month preceding the current time) needs to be processed, the target data of the plurality of different users in that period can be acquired.
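The construction of target data from recorded source data can be sketched as follows. This is a minimal illustration, not the method of the specification: the record fields, the function name `build_target_data`, and the feature names are hypothetical, and the thresholds (18 years, 100 yuan, clothing category) are simply the example values mentioned earlier in this embodiment.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    """Hypothetical source record logged when a user performs a service."""
    user_id: str
    in_specified_area: bool
    age: int
    payment_amount: float  # in yuan
    goods_category: str


def build_target_data(record, age_limit=18, amount_limit=100.0, category="clothing"):
    """Turn one source record into yes/no data features (one per dimension)."""
    return {
        "in_specified_area": record.in_specified_area,
        "age_over_limit": record.age > age_limit,
        "payment_over_limit": record.payment_amount > amount_limit,
        "category_matches": record.goods_category == category,
    }


records = [
    SourceRecord("u1", True, 35, 250.0, "clothing"),
    SourceRecord("u2", False, 17, 40.0, "books"),
]

# Target data of a plurality of different users, keyed by user (account) id.
target_data = {r.user_id: build_target_data(r) for r in records}
```

In practice the records would first be filtered to the time window of interest before the features are built.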

After the target data of the plurality of different users is obtained in the above manner, it may be clustered. Specifically, a clustering algorithm may be preset, such as the K-means algorithm or the DBSCAN algorithm (a density-based spatial clustering algorithm). For example, the target data of the plurality of different users may be clustered with the K-means algorithm to obtain one or more user groups formed by the target data. Each user group may include one or more different users or accounts, each user has corresponding target data, and the target data may include data features of one or more different dimensions. The clustering itself is performed according to the chosen clustering algorithm and is not described in detail here.
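As an illustration of the clustering step, the following is a small, dependency-free K-means sketch. It is only an assumption about how the step might look: the specification leaves the algorithm and its settings open, the data is made up, and initializing the centroids with the first `k` points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-means; returns one cluster label per input point."""
    # Simplified deterministic initialization: use the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical encoded target data: one row of 0/1 features per user.
user_ids = ["u1", "u2", "u3", "u4", "u5", "u6"]
points = [(1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0), (0, 0, 1)]

labels = kmeans(points, k=2)

# Collect the users (accounts) of each cluster into a user group.
groups = {}
for uid, lab in zip(user_ids, labels):
    groups.setdefault(lab, []).append(uid)
```

A production system would more likely rely on an established library implementation and a more robust initialization.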

It should be noted that the user groups obtained by clustering may contain the target data of all users, that is, the target data of every one of the plurality of different users is assigned to a user group; for example, the target data of all 10,000 users is assigned to corresponding user groups after clustering. Alternatively, the user groups may contain the target data of only some users, while the target data of the remaining users is not assigned to any user group; for example, of the target data of 10,000 users, the target data of 1,000 users is assigned to corresponding user groups after clustering, and the target data of the remaining 9,000 users exists independently without belonging to any user group. Because finding abnormal or risky groups is what matters most in risk prevention and control, the target data of the remaining 9,000 users may be left undetected; that is, risk detection is generally performed on user groups rather than on users that belong to no group.

In step S104, based on the target data of a plurality of different users, the occurrence probability corresponding to each data feature is determined, and based on each user group and the target data included in each user group, the number of users included in each user group and the number of each data feature are respectively determined.

In implementation, the occurrence probability corresponding to each data feature may be the global probability of that data feature. Accordingly, the number of occurrences of each data feature may be obtained from the target data of the plurality of different users, and the number of those users may be counted. Dividing the number of occurrences of each data feature by the number of users gives the occurrence probability corresponding to that data feature.
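The global occurrence probability described above is a simple ratio, which can be sketched as follows. The function name and the sample data are hypothetical; a feature is assumed to be stored as 1 when it occurs for a user and 0 otherwise.

```python
def global_occurrence_probability(target_data):
    """Occurrences of each data feature divided by the total number of users."""
    n_users = len(target_data)
    counts = {}
    for features in target_data.values():
        for name, present in features.items():
            counts[name] = counts.get(name, 0) + int(present)
    return {name: count / n_users for name, count in counts.items()}


# Hypothetical target data of four users with three data features each.
target_data = {
    "u1": {"A": 1, "B": 0, "C": 0},
    "u2": {"A": 1, "B": 1, "C": 0},
    "u3": {"A": 0, "B": 0, "C": 1},
    "u4": {"A": 0, "B": 0, "C": 0},
}

probs = global_occurrence_probability(target_data)
# Feature A occurs for 2 of 4 users, B and C for 1 of 4 users each.
```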

The above processes the global data; the user groups may be processed next. Specifically, the number of users contained in each user group may be counted and the group identifier of each user group may be obtained. Then, for any user group, the number of each data feature may be obtained from the target data contained in that user group. For example, a certain user group with group identifier 001 contains the target data of 5 users; the target data includes data feature A, data feature B, and data feature C, and within the group the number of data feature A is 2, the number of data feature B is 1, and the number of data feature C is 1. In the same manner, the number of each data feature in every user group can be obtained from the target data contained in that user group.
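The per-group statistics can be sketched in the same style. The example below reproduces the figures given above for group 001 (5 users; feature A occurs twice, features B and C once each); the per-user values and the function name `group_statistics` are made up for illustration.

```python
from collections import Counter


def group_statistics(groups, target_data):
    """For each group: the number of users and the count of each data feature."""
    stats = {}
    for group_id, user_ids in groups.items():
        feature_counts = Counter()
        for uid in user_ids:
            for name, present in target_data[uid].items():
                feature_counts[name] += int(present)
        stats[group_id] = {
            "n_users": len(user_ids),
            "feature_counts": dict(feature_counts),
        }
    return stats


# Hypothetical members of group 001 and their 0/1 data features.
target_data = {
    "u1": {"A": 1, "B": 0, "C": 0},
    "u2": {"A": 1, "B": 0, "C": 0},
    "u3": {"A": 0, "B": 1, "C": 0},
    "u4": {"A": 0, "B": 0, "C": 1},
    "u5": {"A": 0, "B": 0, "C": 0},
}
groups = {"001": ["u1", "u2", "u3", "u4", "u5"]}

stats = group_statistics(groups, target_data)
```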

In step S106, the probability of occurrence of each data feature distribution included in each user group is determined based on the probability of occurrence corresponding to each data feature, and the number of users included in each user group and the number of each data feature.

In implementation, a corresponding algorithm may be preset according to the actual situation, such as expert experience or business requirements. The algorithm takes as inputs the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature, and is used to calculate the occurrence probability of each data feature distribution contained in each user group.
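This passage leaves the algorithm open. One possible choice, shown here purely as an assumption, is to model the count of a yes/no data feature within a group as binomially distributed, so that the probability of observing k occurrences among n users is C(n, k) p^k (1 - p)^(n - k), where p is the global occurrence probability of the feature. The function names and the numbers below are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k occurrences in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def distribution_probability(n_users, feature_counts, global_probs):
    """Per-feature probability of the observed count in one user group.

    Assumes a binomial model; the specification does not fix the algorithm here.
    """
    return {
        name: binomial_pmf(count, n_users, global_probs[name])
        for name, count in feature_counts.items()
    }


# Hypothetical group of 5 users: feature A occurs twice, feature B once.
result = distribution_probability(
    n_users=5,
    feature_counts={"A": 2, "B": 1},
    global_probs={"A": 0.5, "B": 0.25},
)
# result["A"] is 10 * 0.5**5 = 0.3125
```

Under this assumed model, a very small value means the group's feature counts would be unlikely if its members were drawn at random from the whole user population.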

In step S108, a user group with a preset risk among one or more user groups is determined based on the occurrence probability of each data feature distribution included in each user group.

In implementation, a corresponding threshold may be preset according to the actual situation. The occurrence probability of each data feature distribution contained in each user group may be compared with the threshold, and the user groups with the preset risk among the one or more user groups may be determined from the comparison result. For example, if the occurrence probability of the data feature distribution contained in a certain user group is greater than the threshold, it may be determined that the user group has the preset risk; otherwise, it may be determined that the user group does not have the preset risk.
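The comparison against a preset threshold can be sketched as follows. The sketch assumes, for simplicity, that each user group has already been reduced to a single value, and it follows the direction stated above (a value greater than the threshold marks the group as having the preset risk). The function name, the values, and the threshold are hypothetical.

```python
def groups_with_preset_risk(group_values, threshold):
    """Return the identifiers of groups whose value exceeds the threshold."""
    return [gid for gid, value in group_values.items() if value > threshold]


# Hypothetical per-group values and threshold.
group_values = {"001": 0.92, "002": 0.40, "003": 0.75}
risky_groups = groups_with_preset_risk(group_values, threshold=0.7)
# risky_groups is ["001", "003"]
```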

The embodiments of the present specification provide a method for detecting a risk group. Target data of a plurality of different users to be processed is acquired and clustered to obtain one or more user groups formed by the target data, where the target data includes one or more different data features. The occurrence probability corresponding to each data feature is then determined based on the target data of the plurality of different users, and the number of users contained in each user group and the number of each data feature are determined based on each user group and the target data contained in it. Finally, the occurrence probability of each data feature distribution contained in each user group is determined based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature, and the user groups with a preset risk are determined accordingly.

Embodiment Two

As shown in FIG. 2, an embodiment of the present specification provides a method for detecting a risk group. The execution subject of the method may be a terminal device or a server. The terminal device may be a mobile terminal such as a mobile phone or a tablet computer, a computer device such as a notebook computer or a desktop computer, or an IoT device (for example, a smart watch or a vehicle-mounted device). The server may be an independent server or a server cluster formed by a plurality of servers, and may be a background server of a financial service or an online shopping service, or a background server of an application program. This embodiment is described in detail taking a server as an example; for the execution process on a terminal device, reference may be made to the related content below, which is not repeated here. The method may specifically include the following steps:

In step S202, target data of a plurality of different users to be processed is acquired, and the target data of the plurality of different users is clustered to obtain one or more user groups formed by the target data, where the target data includes one or more different data features.

The specific processing procedure of step S202 may refer to relevant contents in the first embodiment, and is not described herein again.

In step S204, target data of a plurality of different users are encoded to obtain encoded information corresponding to each data feature of each user.

In implementation, the target data may be encoded in a plurality of different manners. For example, an encoding manner may be set using mutually exclusive information; specifically, each data feature in the target data may be encoded as "yes" or "no". Suppose the target data includes two data features: whether the user's location is within a preset area range, and whether the user's age is greater than 60 years. For the first data feature, if the user's location is within the preset area range, the encoded information obtained after encoding is "yes". For the second data feature, if the user's age is not greater than 60 years, the encoded information obtained after encoding is "no", and so on.

The above encodes the target data using only two mutually exclusive values. In practical applications, an encoding scheme with more than two values may also be used, for example "1", "2", "3", and so on, where each value carries a different meaning: "1" may represent one data interval (such as 0-0.3), "2" another data interval (such as 0.3-0.5), and "3" yet another (such as 0.5-0.8). The data intervals themselves may also be chosen differently to obtain a corresponding encoding scheme. These choices may be set according to the actual situation and are not limited by the embodiments of this specification.
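The interval-based encoding described above can be illustrated with a minimal Python sketch. The function name `encode_interval` and the treatment of the intervals as half-open (lower bound included, upper bound excluded) are assumptions made for illustration; the specification only gives the example intervals 0-0.3, 0.3-0.5 and 0.5-0.8.

```python
from bisect import bisect_right

# Upper bounds of the example intervals from the text (assumed half-open):
# "1" -> [0, 0.3), "2" -> [0.3, 0.5), "3" -> [0.5, 0.8).
UPPER_BOUNDS = [0.3, 0.5, 0.8]


def encode_interval(value: float) -> str:
    """Return the code of the interval that contains value."""
    if not 0 <= value < UPPER_BOUNDS[-1]:
        raise ValueError(f"value {value} is outside the configured intervals")
    # bisect_right counts how many upper bounds are <= value,
    # which is the zero-based index of the containing interval.
    return str(bisect_right(UPPER_BOUNDS, value) + 1)


codes = [encode_interval(v) for v in (0.1, 0.3, 0.79)]
```

How boundary values are assigned is a design choice; any consistent rule works as long as every value maps to exactly one code.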

Step S204 may be implemented in various ways. One optional way is as follows: each data feature of each user in the target data of the plurality of different users is encoded using binarized binomial feature encoding, to obtain binarized encoded information corresponding to each data feature of each user.

Binarization refers to encoding with two mutually exclusive values or pieces of information. For example, the two values may be "0" and "1", where "0" represents "no" and "1" represents "yes". The values may be set according to the actual situation, which is not limited in the embodiments of this specification.

In implementation, each data feature of each user in the target data of the plurality of different users may be encoded using binarized binomial feature encoding, to obtain binarized encoded information corresponding to each data feature of each user. For example, each data feature in the target data may be encoded as "1" or "0". Suppose the target data include two data features: "whether the user's position is within a preset area range" and "whether the user's age is greater than 60". For the first data feature, if the user's position is within the preset area range, the encoded information is "1"; for the second, if the user's age is not greater than 60, the encoded information is "0"; and so on. The encoding may be set according to the actual situation, which is not limited in the embodiments of this specification.
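The two-feature example above can be sketched in Python as follows. The `User` record, its field names and the feature names `in_preset_area` and `age_over_60` are hypothetical and chosen only to mirror the example in the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class User:
    user_id: str
    in_preset_area: bool  # hypothetical field: position lies within the preset area range
    age: int


def encode_user(user: User) -> Dict[str, int]:
    """Encode each data feature of one user as 1 ("yes") or 0 ("no")."""
    return {
        "in_preset_area": 1 if user.in_preset_area else 0,
        "age_over_60": 1 if user.age > 60 else 0,
    }


encoded = encode_user(User("u1", True, 45))
```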

In step S206, based on the coding information corresponding to each data feature of each user, the occurrence probability corresponding to each data feature in the target data of a plurality of different users is determined.

In implementation, as shown in fig. 3, the target data of the plurality of different users may be treated as global data. Over the global data, the number of occurrences of the encoded information corresponding to each data feature may be counted from the encoded information of each user. For example, for the data feature "whether the user's position is within a preset area range", the number of users in the global data whose encoded information is "1" may be counted. Dividing this count by the number of users contained in the global data gives the occurrence probability corresponding to that data feature. In this way, the occurrence probability corresponding to each data feature in the target data of the plurality of different users is obtained, as shown in table 1.

TABLE 1

Wherein dim1 and dim2 … dimN represent N different data characteristics, and 0.2, 0.052 and 0.3 … 0.6.6 respectively represent the occurrence probabilities corresponding to the corresponding data characteristics.
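The global occurrence probability described for step S206 can be sketched as a count followed by a division. The function name `feature_probabilities` and the five-user sample are illustrative assumptions and are not the data of table 1.

```python
from typing import Dict, List


def feature_probabilities(encoded_users: List[Dict[str, int]]) -> Dict[str, float]:
    """Share of users in the global data whose code for each feature is 1."""
    if not encoded_users:
        raise ValueError("at least one user is required")
    counts: Dict[str, int] = {}
    for row in encoded_users:
        for feature, bit in row.items():
            counts[feature] = counts.get(feature, 0) + bit
    total = len(encoded_users)
    return {feature: count / total for feature, count in counts.items()}


# Illustrative global data: five users, two binarized features.
global_data = [
    {"dim1": 1, "dim2": 0},
    {"dim1": 0, "dim2": 0},
    {"dim1": 0, "dim2": 1},
    {"dim1": 0, "dim2": 1},
    {"dim1": 0, "dim2": 0},
]
probs = feature_probabilities(global_data)
```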

In step S208, the number of users included in each user group and the number of each data feature are respectively determined based on each user group and the target data included in each user group.

In implementation, as shown in fig. 3, the number of users included in each user group may be counted based on each user group and the coding information corresponding to each data feature in the target data included in each user group, and the number of coding information corresponding to each data feature may be counted to obtain the number of each data feature, which may be specifically shown in table 2.

TABLE 2
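The per-group statistics of step S208 can be sketched in the same style. The sketch assumes the clustering has already produced a mapping from a group identifier to the encoded rows of its members; the function name `group_statistics` and the sample groups are hypothetical.

```python
from typing import Dict, List, Tuple

Encoded = Dict[str, int]  # feature name -> 0/1 code


def group_statistics(groups: Dict[str, List[Encoded]]) -> Dict[str, Tuple[int, Dict[str, int]]]:
    """For each group, return (number of users n, count k of code 1 per feature)."""
    stats: Dict[str, Tuple[int, Dict[str, int]]] = {}
    for group_id, members in groups.items():
        counts: Dict[str, int] = {}
        for row in members:
            for feature, bit in row.items():
                counts[feature] = counts.get(feature, 0) + bit
        stats[group_id] = (len(members), counts)
    return stats


groups = {
    "group_a": [{"dim1": 1, "dim2": 0}, {"dim1": 1, "dim2": 1}],
    "group_b": [{"dim1": 0, "dim2": 0}],
}
stats = group_statistics(groups)
```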

In step S210, the occurrence probability corresponding to each data feature, the number of users included in each user group, and the number of each data feature are respectively input into the probability mass function PMF, so as to obtain the occurrence probability of each data feature distribution included in each user group.

The probability mass function (PMF) gives the probability that a discrete random variable takes each specific value. A PMF may be defined for any discrete random variable, including random variables that follow a constant distribution, a binomial distribution (including the Bernoulli distribution), a negative binomial distribution, a Poisson distribution, a geometric distribution, or a hypergeometric distribution.

In implementation, the PMF of the single-variable binomial distribution is calculated as follows:

$$P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}$$

where p represents the occurrence probability corresponding to a data feature (which may be the global probability that the encoded information corresponding to the data feature is 1), k represents the number of that data feature contained in a user group (which may be the number of users in the user group whose encoded information for the data feature is 1), and n represents the number of users contained in the user group. The occurrence probability p corresponding to each data feature, the number n of users contained in each user group, and the number k of each data feature may be input into the PMF formula. The result is the occurrence probability of each data feature distribution contained in each user group, as shown in table 3.

TABLE 3
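The binomial PMF used in step S210 can be evaluated with the Python standard library alone. This is a minimal sketch with p, k and n as defined in the text; the function name `binomial_pmf` and the sample values are illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k users with code 1 among n users,
    when each user has code 1 independently with probability p."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# A group of 10 users in which 8 have code 1 for a feature whose
# global occurrence probability is 0.2.
value = binomial_pmf(k=8, n=10, p=0.2)
```

A small result means that the feature distribution observed in the group would be unlikely if the group were drawn from the global population, which is what the later steps treat as a sign of abnormality.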

In step S212, based on the occurrence probability of each data feature distribution included in each user group and a preset probability threshold, data features with occurrence probabilities smaller than the preset probability threshold in each user group are determined.

The preset probability threshold may be set in advance, for example to 50% or 30%, according to the actual situation, which is not limited in the embodiments of this specification.
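Step S212 amounts to a filter over the per-feature probabilities of a group. In the sketch below, the function name `rare_features` and the sample probabilities are illustrative assumptions; the threshold of 0.3 is one of the example values given in the text.

```python
from typing import Dict, List


def rare_features(pmf_by_feature: Dict[str, float], threshold: float) -> List[str]:
    """Return the features whose distribution probability is below the threshold."""
    return [name for name, prob in pmf_by_feature.items() if prob < threshold]


# Illustrative per-feature probabilities for one user group.
group_pmf = {"dim1": 0.00007, "dim2": 0.25, "dim3": 0.45}
selected = rare_features(group_pmf, threshold=0.3)
```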

In step S214, the data features having the occurrence probability smaller than the preset probability threshold in each user group are aggregated to obtain the probability that each user group has the preset risk.

The preset risk can be set according to the actual situation, such as fraud risk, illegal financial activity and the like.

In implementation, an aggregation algorithm, such as an Aggregator function, may be set according to the actual situation. For example, the Aggregator function may aggregate the data features in each user group whose occurrence probability is smaller than the preset probability threshold to obtain a final aggregation result. The aggregation result represents a score for the preset risk of each user group, and this score may be used as the probability that the user group has the preset risk.
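The specification does not define the Aggregator function. Purely as an assumption for illustration, the sketch below averages (1 - probability) over the selected data features, so that rarer feature distributions push the score toward 1; other aggregations (for example a minimum or a product) would fit the description equally well.

```python
from typing import Iterable


def aggregate_risk(rare_probs: Iterable[float]) -> float:
    """Hypothetical aggregator: mean of (1 - probability) over the selected features.

    Returns 0.0 when no feature fell below the probability threshold.
    """
    values = list(rare_probs)
    if not values:
        return 0.0
    return sum(1.0 - p for p in values) / len(values)


score = aggregate_risk([0.1, 0.3])
```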

In step S216, a user group with a preset risk in one or more user groups is determined based on the probability of the preset risk in each user group and a preset risk probability threshold.

The risk probability threshold may be a preset threshold, specifically, 60% or 80%.

In implementation, if the probability that a certain user group has a preset risk is greater than a preset risk probability threshold, it may be determined that the user group has the preset risk, and then, a user group having the preset risk in one or more user groups may be obtained. If the probability that a certain user group has a preset risk is smaller than a preset risk probability threshold, it can be determined that the user group does not have the preset risk.
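The comparison in step S216 can be sketched as follows. The function name `risky_groups` and the sample scores are hypothetical. The text does not say how a score exactly equal to the threshold is handled, so treating it as not risky is an assumption.

```python
from typing import Dict, List


def risky_groups(risk_by_group: Dict[str, float], risk_threshold: float) -> List[str]:
    """Return the groups whose risk probability is greater than the threshold.

    Assumption: a probability exactly equal to the threshold is not flagged.
    """
    return [gid for gid, prob in risk_by_group.items() if prob > risk_threshold]


flagged = risky_groups({"group_a": 0.92, "group_b": 0.41, "group_c": 0.6}, risk_threshold=0.6)
```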

The embodiments of this specification provide a method for detecting a risk group. Target data of a plurality of different users to be processed are obtained and clustered to obtain one or more user groups formed by the target data. The occurrence probability corresponding to each data feature is determined based on the target data of the plurality of different users, and the number of users and the number of each data feature contained in each user group are determined based on each user group and the target data it contains. The occurrence probability of each data feature distribution contained in each user group is then determined based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature, and the user group with the preset risk is determined accordingly.

Example three

As shown in fig. 4, the method may be executed by a terminal device or a server. The terminal device may be a mobile terminal such as a mobile phone or a tablet computer, a computer device such as a notebook computer or a desktop computer, or an IoT device (for example, a smart watch or a vehicle-mounted device). The server may be an independent server or a server cluster formed by a plurality of servers, and may be a background server of a financial service or an online shopping service, or a background server of an application program. This embodiment is described in detail with a server as the execution subject; for execution by a terminal device, reference may be made to the relevant content below, which is not repeated here. The method may specifically include the following steps:

In step S402, target data of a plurality of different users to be processed are obtained, and the target data of the plurality of different users are clustered to obtain one or more user groups formed by the target data, where the target data include one or more different data features.

In step S404, each data feature of each user in the target data of multiple different users is encoded by a binary feature encoding method, so as to obtain binary encoded information corresponding to each data feature of each user, where the two values in the binarization are mutually exclusive and are 0 and 1, respectively.

In step S406, based on the coding information corresponding to each data feature of each user, the occurrence probability corresponding to each data feature in the target data of a plurality of different users is determined.

In step S408, the number of users included in each user group and the number of each data feature are respectively determined based on each user group and the target data included in each user group.

In step S410, the occurrence probability corresponding to each data feature, the number of users included in each user group, and the number of each data feature are respectively input into the probability mass function PMF, so as to obtain the occurrence probability of each data feature distribution included in each user group.

The specific processing of the steps S402 to S410 can refer to the related contents in the first embodiment and the second embodiment, and will not be described herein again.

In step S412, according to the occurrence probability of each data feature distribution included in each user group, the occurrence probabilities of the data features included in each user group are sorted from small to large to obtain the sorted data features in each user group.

In implementation, as shown in fig. 3, if the probability of occurrence of a certain data feature distribution included in a certain user group is lower, it indicates that the user group is more abnormal on the data feature, and therefore, the probability of occurrence of the data feature included in each user group may be sorted from small to large according to the probability of occurrence of each data feature distribution included in each user group, so as to obtain the sorted data feature in each user group.

In step S414, the top N data features are selected from the sorted data features in each user group, where N is an integer greater than or equal to 1.
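Steps S412 and S414 can be sketched as an ascending sort followed by a slice. The function name `top_n_rarest` and the sample probabilities are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def top_n_rarest(pmf_by_feature: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Sort features by distribution probability, smallest first, and keep the first n."""
    if n < 1:
        raise ValueError("n must be an integer greater than or equal to 1")
    ranked = sorted(pmf_by_feature.items(), key=lambda item: item[1])
    return ranked[:n]


top = top_n_rarest({"dim1": 0.2, "dim2": 0.001, "dim3": 0.05}, n=2)
```

Compared with the fixed probability threshold of the second embodiment, this selection always yields a fixed number of features per group, which keeps the aggregated scores of different groups comparable.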

In step S416, the N data features arranged at the top, which are selected from the sorted data features in each user group, are aggregated to obtain the probability that each user group has the preset risk.

The preset risk can be set according to actual conditions, such as fraud risk, illegal financial activities and the like.

In implementation, as shown in fig. 3, an aggregation algorithm, such as an Aggregator function, may be set according to the actual situation. For example, the Aggregator function may aggregate the top N data features selected from the sorted data features in each user group to obtain a final aggregation result. The aggregation result represents a score for the preset risk of each user group, and this score may be used as the probability that the user group has the preset risk.

In step S418, a user group with a preset risk in one or more user groups is determined based on the probability of the preset risk in each user group and a preset risk probability threshold.

Example four

As shown in fig. 5A and 5B, the method may be executed by a terminal device or a server. The terminal device may be a mobile terminal such as a mobile phone or a tablet computer, a computer device such as a notebook computer or a desktop computer, or an IoT device (for example, a smart watch or a vehicle-mounted device). The server may be an independent server or a server cluster formed by a plurality of servers, and may be a background server of a financial service or an online shopping service, or a background server of an application program. This embodiment is described in detail with a server as the execution subject; for execution by a terminal device, reference may be made to the relevant content below, which is not repeated here. The method may specifically include the following steps:

In step S502, a detection request of a risk group is received, target data of a plurality of different users to be processed are obtained based on the detection request, and the target data of the plurality of different users are clustered to obtain one or more user groups formed by the target data, where the target data include one or more different data features.

In implementation, when data from a recent period (for example, the last 7 days or the last 10 days) or from a given historical period (for example, one month within the last 10 months, or the month preceding the current time) need to be processed, the risk detection party may send a detection request of a risk group to the server through a management device. The server receives the detection request, obtains target data of a plurality of different users to be processed based on the request, and clusters the target data to obtain one or more user groups formed by the target data. For the specific processing procedure, reference may be made to the foregoing related content, which is not repeated here.

It should be noted that the risk detection party may be an operator of a specific service, specifically, an operator of a certain financial service (such as a payment service or a transfer service) or an instant application service, or a user of the specific service, and may be specifically set according to an actual situation, which is not limited in this embodiment of the present specification.

In step S504, based on the target data of a plurality of different users, the occurrence probability corresponding to each data feature is determined, and based on each user group and the target data included in each user group, the number of users included in each user group and the number of each data feature are respectively determined.

In step S506, the probability of occurrence of each data feature distribution included in each user group is determined based on the probability of occurrence corresponding to each data feature, and the number of users included in each user group and the number of each data feature.

In step S508, a user group with a preset risk in one or more user groups is determined based on the occurrence probability of each data feature distribution included in each user group, and the determined related information of the user group with the preset risk is output.

In implementation, after obtaining a user group with a preset risk in one or more user groups in the above manner, the relevant information (such as group identifier, code, association diagram, etc.) of the determined user group with the preset risk may be obtained, and the obtained relevant information of the user group with the preset risk may be output to the management device of the risk detection party, so that the risk detection party may check the relevant information of the determined user group with the preset risk through the management device, and subsequently, the risk detection party may mark the user group with the preset risk, and may use the determined user group with the preset risk as a risk group or an abnormal group, etc.
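The flow of steps S504 to S508 can be put together in one minimal sketch. It assumes that clustering has already produced the groups, and it reuses a hypothetical aggregator (the mean of 1 - probability over the N rarest features). The feature name `new_device`, the output fields `group_id` and `score`, and the threshold of 0.99 are illustrative assumptions; in practice the threshold would be tuned to the aggregator that is actually chosen.

```python
from math import comb
from typing import Dict, List

Encoded = Dict[str, int]  # feature name -> 0/1 code


def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def detect_risk_groups(groups: Dict[str, List[Encoded]], top_n: int,
                       risk_threshold: float) -> List[dict]:
    """Return related information (id and score) of the groups judged risky."""
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    users = [row for members in groups.values() for row in members]
    if not users:
        raise ValueError("no target data")
    features = sorted(users[0])
    # Global occurrence probability p of each feature.
    p = {f: sum(row[f] for row in users) / len(users) for f in features}
    report = []
    for group_id, members in groups.items():
        n = len(members)  # number of users in the group
        # Probability of the observed count k of each feature within the group.
        pmf = {f: binomial_pmf(sum(row[f] for row in members), n, p[f]) for f in features}
        rarest = sorted(pmf.values())[:top_n]
        # Hypothetical aggregator: mean of (1 - probability) over the rarest features.
        score = sum(1.0 - v for v in rarest) / len(rarest)
        if score > risk_threshold:
            report.append({"group_id": group_id, "score": score})
    return report


groups = {
    "g1": [{"new_device": 1}] * 5,                             # all five users share the feature
    "g2": [{"new_device": 1}] + [{"new_device": 0}] * 14,      # one user out of fifteen
}
report = detect_risk_groups(groups, top_n=1, risk_threshold=0.99)
```

In this sample the global occurrence probability is 6/20, so a group of five users who all share the feature is far less likely than a group of fifteen with a single occurrence, and only the first group is reported.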

The embodiments of this specification provide a method for detecting a risk group. A detection request of a risk group is received, target data of a plurality of different users to be processed are obtained based on the detection request, and the target data are clustered to obtain one or more user groups formed by the target data, where the target data include one or more different data features. The occurrence probability corresponding to each data feature is then determined based on the target data of the plurality of different users, and the number of users and the number of each data feature contained in each user group are determined based on each user group and the target data it contains. Based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature, the occurrence probability of each data feature distribution contained in each user group is determined, the user group with the preset risk is determined, and finally the related information of the determined user group is output. Group qualification is thus realized by means of data statistics. The method is a general unsupervised group qualification method that does not depend on any label data. In addition, a group qualification flow is designed that takes the feature distribution as its basis: the abnormality probability of the group is calculated and then summarized to obtain the degree of abnormality of the group. The calculated abnormality indicators support parallel computation, so that abnormality qualification can be performed in parallel on the groups in large-scale data, with higher efficiency. Moreover, because the method judges the degree of group abnormality based on data statistics and does not depend on manual experience, its results are stable.

Example five

As shown in fig. 6, the method may be executed by a terminal device or a server. The terminal device may be a mobile terminal such as a mobile phone or a tablet computer, a computer device such as a notebook computer or a desktop computer, or an IoT device (for example, a smart watch or a vehicle-mounted device). The server may be an independent server or a server cluster formed by a plurality of servers, and may be a background server of a financial service or an online shopping service, or a background server of an application program. This embodiment is described in detail with a server as the execution subject; for execution by a terminal device, reference may be made to the relevant content below, which is not repeated here. The method may specifically include the following steps:

In step S602, a detection request of a risk group is received, target data of a plurality of different users to be processed are obtained based on the detection request, and the target data of the plurality of different users are clustered to obtain one or more user groups formed by the target data, where the target data include one or more different data features.

In step S604, target data of a plurality of different users is encoded to obtain encoded information corresponding to each data feature of each user.

Step S604 may be implemented in various ways. One optional way is as follows: each data feature of each user in the target data of the plurality of different users is encoded using binarized binomial feature encoding, to obtain binarized encoded information corresponding to each data feature of each user, where the two values in the binarization are mutually exclusive.

In step S606, based on the coding information corresponding to each data feature of each user, the occurrence probability corresponding to each data feature in the target data of a plurality of different users is determined.

In step S608, the number of users included in each user group and the number of each data feature are respectively determined based on each user group and the target data included in each user group.

In step S610, the occurrence probability corresponding to each data feature, the number of users included in each user group, and the number of each data feature are respectively input into the probability mass function PMF, so as to obtain the occurrence probability of each data feature distribution included in each user group.

Step S508 in the fourth embodiment may be implemented in various ways. One optional way is provided below, which may specifically include the processing of steps S612 to S618.

In step S612, according to the occurrence probability of each data feature distribution included in each user group, the occurrence probabilities of the data features included in each user group are sorted from small to large to obtain the sorted data features in each user group.

In step S614, N top-ranked data features are selected from the sorted data features in each user group, where N is an integer greater than or equal to 1.

In step S616, the N data features arranged at the top selected from the sorted data features in each user group are aggregated to obtain the probability that each user group has the preset risk.

In step S618, a user group with a preset risk in one or more user groups is determined based on the probability that each user group has the preset risk and a preset risk probability threshold, and the determined related information of the user group with the preset risk is output.

Step S508 in the fourth embodiment may be implemented in various ways. Another optional way is provided below, which may specifically include the processing of steps A2 to A6.

In step A2, based on the occurrence probability of each data feature distribution included in each user group and a preset probability threshold, determining the data feature of which the occurrence probability is smaller than the preset probability threshold in each user group.

In step A4, the data features of which the occurrence probability is smaller than the preset probability threshold in each user group are aggregated to obtain the probability of each user group having the preset risk.

In step A6, based on the probability that each user group has the preset risk and the preset risk probability threshold, determining the user group having the preset risk in one or more user groups, and outputting the determined related information of the user group having the preset risk.

The specific processing in the steps S602 to S618 and the steps A2 to A6 can refer to the related contents in the first to fourth embodiments, and will not be described herein again.

Example six

Based on the same idea as the method for detecting a risk group provided above, the embodiments of this specification further provide a device for detecting a risk group, as shown in fig. 7.

The detection device of the risk group comprises: a data acquisition module 701, a data processing module 702, a probability determination module 703, and a risk group determination module 704, wherein:

a data obtaining module 701, configured to receive a detection request of a risk group, obtain target data of multiple different users to be processed based on the detection request, and perform clustering processing on the target data of the multiple different users to obtain one or more user groups formed by the target data, where the target data includes one or more different data characteristics;

a data processing module 702, configured to determine, based on target data of the multiple different users, a probability of occurrence corresponding to each data feature, and determine, based on each user group and target data included in each user group, a number of users included in each user group and a number of each data feature respectively;

a probability determining module 703, configured to determine occurrence probability of distribution of each data feature included in each user group based on the occurrence probability corresponding to each data feature, and the number of users and the number of each data feature included in each user group;

a risk group determining module 704, configured to determine a user group with a preset risk among the one or more user groups based on the occurrence probability of each data feature distribution included in each user group, and to output related information of the determined user group with the preset risk.

In an embodiment of this specification, the apparatus further includes:

the coding module is used for coding the target data of the different users to obtain coding information corresponding to each data characteristic of each user;

the data processing module 702 determines the occurrence probability corresponding to each data feature in the target data of the multiple different users based on the coding information corresponding to each data feature of each user.

In this embodiment of the present specification, the encoding module encodes each data feature of each user in the target data of the multiple different users using binarized binomial feature encoding, to obtain binarized encoded information corresponding to each data feature of each user, where the two values in the binarization are mutually exclusive.

In this embodiment of the present specification, the two values in the binarization are 0 and 1, and the probability determining module 703 inputs the occurrence probability corresponding to each data feature, the number of users included in each user group, and the number of each data feature into a probability mass function (PMF), to obtain the occurrence probability of each data feature distribution included in each user group.

In this embodiment, the risk group determining module 704 includes:

the first processing unit is used for determining the data features of which the occurrence probability is smaller than a preset probability threshold value in each user group based on the occurrence probability of each data feature distribution contained in each user group and the preset probability threshold value;

the first aggregation unit is used for aggregating the data characteristics of which the occurrence probability in each user group is smaller than the preset probability threshold value to obtain the probability of the preset risk of each user group;

and the first risk group determining unit is used for determining a user group with a preset risk in one or more user groups based on the probability of the preset risk in each user group and a preset risk probability threshold.

In this embodiment, the risk group determining module 704 includes:

the second processing unit is used for sequencing the occurrence probability of the data features contained in each user group from small to large according to the occurrence probability of each data feature distribution contained in each user group to obtain the sequenced data features in each user group;

the third processing unit selects N data characteristics arranged at the top from the sorted data characteristics in each user group, wherein N is an integer greater than or equal to 1;

the second aggregation unit is used for aggregating N data characteristics which are selected from the sorted data characteristics in each user group and arranged in the front to obtain the probability of each user group having a preset risk;

and the second risk group determining unit is used for determining a user group with a preset risk in one or more user groups based on the probability of the preset risk in each user group and a preset risk probability threshold.

In an embodiment of the present specification, the preset risk is a fraud risk or an illegal financial activity.

The embodiments of this specification provide a device for detecting a risk group. Target data of a plurality of different users to be processed are obtained based on a detection request of a risk group and clustered to obtain one or more user groups formed by the target data, where the target data include one or more different data features. The occurrence probability corresponding to each data feature is then determined based on the target data of the plurality of different users, and the number of users and the number of each data feature contained in each user group are determined based on each user group and the target data it contains. Based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature, the occurrence probability of each data feature distribution contained in each user group is determined, the user group with the preset risk is determined, and finally the related information of the determined user group is output. Group qualification is thus realized by means of data statistics. The method is a general unsupervised group qualification method that does not depend on any label data. In addition, a group qualification flow is designed that takes the feature distribution as its basis: the abnormality probability of the group is calculated and then summarized to obtain the degree of abnormality of the group. The calculated abnormality indicators support parallel computation, so that abnormality qualification can be performed in parallel on the groups in large-scale data, with higher efficiency. Moreover, because the method judges the degree of group abnormality based on data statistics and does not depend on manual experience, its results are stable.

Example seven

Based on the same idea, the embodiments of the present specification further provide a device for detecting a risk group, as shown in fig. 8.

The detection device of the risk group comprises: a data acquisition module 801, a data statistics module 802, a probability determination module 803, and a group determination module 804, wherein:

a data acquisition module 801, configured to obtain target data of multiple different users to be processed, and perform clustering processing on the target data of the multiple different users to obtain one or more user groups formed by the target data, where the target data includes one or more different data features;

a data statistics module 802, configured to determine occurrence probabilities corresponding to each data feature based on the target data of the multiple different users, and determine the number of users and the number of each data feature included in each user group based on the target data included in each user group and each user group;

a probability determining module 803, configured to determine an occurrence probability of each data feature distribution included in each user group based on the occurrence probability corresponding to each data feature, and the number of users and the number of each data feature included in each user group;

a group determining module 804, configured to determine a user group with a preset risk among one or more user groups based on the probability of occurrence of each data feature distribution included in each user group.
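The statistics gathered by the data statistics module 802 can be illustrated with a minimal sketch. It assumes each user's features are already encoded as a 0/1 vector and that the clustering step has assigned every user a group label. The function name `feature_statistics` and the data layout are hypothetical and are not taken from the specification.

```python
from collections import defaultdict


def feature_statistics(encoded, group_of):
    """Return (global_prob, group_size, group_counts).

    encoded:  {user_id: [0 or 1 per feature]}   (assumed layout)
    group_of: {user_id: group_id}               (assumed clustering result)
    """
    n_users = len(encoded)
    n_features = len(next(iter(encoded.values())))

    # Occurrence probability of each feature over all users.
    totals = [0] * n_features
    for bits in encoded.values():
        for j, bit in enumerate(bits):
            totals[j] += bit
    global_prob = [t / n_users for t in totals]

    # Number of users and number of each feature inside every group.
    group_size = defaultdict(int)
    group_counts = defaultdict(lambda: [0] * n_features)
    for user, bits in encoded.items():
        g = group_of[user]
        group_size[g] += 1
        for j, bit in enumerate(bits):
            group_counts[g][j] += bit
    return global_prob, dict(group_size), dict(group_counts)


encoded = {"u1": [1, 0], "u2": [1, 0], "u3": [0, 1], "u4": [0, 0]}
group_of = {"u1": "g1", "u2": "g1", "u3": "g2", "u4": "g2"}
global_prob, group_size, group_counts = feature_statistics(encoded, group_of)
```

In this toy input the first feature occurs for half of all users but for every member of group `g1`, which is the kind of concentration the later probability step is meant to expose.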

In an embodiment of this specification, the apparatus further includes:

the data statistics module 802 determines, based on the coding information corresponding to each data feature of each user, an occurrence probability corresponding to each data feature in the target data of the multiple different users.

In this embodiment, the encoding module performs encoding processing on each data feature of each user in the target data of multiple different users respectively through a binary feature encoding manner, to obtain binary encoded information corresponding to each data feature of each user, where the binary values in the binarization are mutually exclusive.
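A minimal sketch of one possible binary feature encoding follows. It assumes that each binary feature is a (field, value) pair and that a user's bit is 1 when the user's attribute matches that pair and 0 otherwise, so the two values are mutually exclusive. The names `binary_encode` and `vocabulary` and the example fields are hypothetical; the specification does not fix a particular encoding scheme in this passage.

```python
def binary_encode(users, vocabulary):
    """Encode each user's attributes as a 0/1 vector.

    users:      {user_id: {field: value}}          (assumed raw target data)
    vocabulary: ordered list of (field, value) pairs, one per binary feature
    """
    return {
        uid: [1 if attrs.get(field) == value else 0 for field, value in vocabulary]
        for uid, attrs in users.items()
    }


users = {
    "u1": {"device": "android", "night_login": True},
    "u2": {"device": "ios", "night_login": False},
}
vocabulary = [("device", "android"), ("device", "ios"), ("night_login", True)]
encoded = binary_encode(users, vocabulary)
```

Keeping the vocabulary as an ordered list means every user vector has the same length and the same feature order, which the counting step relies on.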

In this embodiment of the present specification, the binary values in the binarization are 0 and 1, and the probability determining module 803 inputs the occurrence probability corresponding to each data feature, together with the number of users and the number of each data feature contained in each user group, into a probability mass function (PMF) to obtain the occurrence probability of each data feature distribution contained in each user group.
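As an illustration of the PMF step, the sketch below assumes the PMF is that of a binomial distribution, which is a natural choice for 0/1 features but is not stated in this passage. Here `p` is the occurrence probability of a feature over all users, `n` is the number of users in the group, and `k` is the number of users in the group that carry the feature. The function name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of seeing exactly k occurrences among n users
    when each user shows the feature independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# A feature shown by 10% of all users appears in 9 of the 10 group members.
p_dist = binomial_pmf(9, 10, 0.1)

# Sanity check: the PMF sums to 1 over all possible counts.
total = sum(binomial_pmf(k, 10, 0.1) for k in range(11))
```

A very small value such as `p_dist` indicates that the feature distribution observed in the group would be unlikely if the group were drawn at random from the whole user population.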

In this embodiment of the present specification, the group determining module 804 includes:

the characteristic selecting unit is used for determining the data characteristics of which the occurrence probability is smaller than the preset probability threshold value in each user group based on the occurrence probability of each data characteristic distribution contained in each user group and the preset probability threshold value;

the aggregation unit is used for aggregating the data characteristics of which the occurrence probability in each user group is smaller than the preset probability threshold value to obtain the probability of the preset risk of each user group;

and the group determining unit is used for determining the user group with the preset risk in one or more user groups based on the probability of the preset risk in each user group and a preset risk probability threshold.

the sequencing unit is used for sequencing the occurrence probability of the data features contained in each user group from small to large according to the occurrence probability of each data feature distribution contained in each user group to obtain the sequenced data features in each user group;

the selecting unit is used for selecting N data characteristics arranged at the front from the sorted data characteristics in each user group, wherein N is an integer greater than or equal to 1;

the aggregation unit is used for aggregating N data characteristics which are selected from the sorted data characteristics in each user group and arranged in the front to obtain the probability of each user group having a preset risk;
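The two selection strategies described by the units above (a preset probability threshold, or the N smallest probabilities after sorting) can be sketched as follows. The aggregation rule is not specified in this passage, so the mean of `1 - p` used here is purely an assumption for illustration, as are all function names and the threshold values.

```python
def select_below_threshold(feature_probs, threshold):
    """Keep the probabilities that are smaller than the preset threshold."""
    return [p for p in feature_probs if p < threshold]


def select_top_n(feature_probs, n):
    """Sort from small to large and keep the first n probabilities."""
    return sorted(feature_probs)[:n]


def aggregate(selected):
    """Hypothetical aggregation: mean rarity (1 - p) of the selected features."""
    if not selected:
        return 0.0
    return sum(1.0 - p for p in selected) / len(selected)


def risky_groups(group_feature_probs, risk_threshold, selector):
    """Return {group: score} for groups whose score reaches the risk threshold."""
    scores = {g: aggregate(selector(ps)) for g, ps in group_feature_probs.items()}
    return {g: s for g, s in scores.items() if s >= risk_threshold}


group_feature_probs = {
    "g1": [0.001, 0.002, 0.4],
    "g2": [0.3, 0.25, 0.35],
}
by_threshold = risky_groups(
    group_feature_probs, 0.9, lambda ps: select_below_threshold(ps, 0.01)
)
by_top_n = risky_groups(group_feature_probs, 0.9, lambda ps: select_top_n(ps, 2))
```

Passing the selector as a parameter keeps the aggregation and the final comparison identical for both variants, mirroring the way the two sets of units share the group determining unit.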

The embodiment of the present specification provides a device for detecting a risk group. The device obtains target data of a plurality of different users to be processed and clusters the target data to obtain one or more user groups composed of the target data, where the target data includes one or more different data features. It then determines the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determines the number of users and the number of each data feature contained in each user group based on each user group and the target data it contains. Next, based on the occurrence probability corresponding to each data feature and the number of users and the number of each data feature contained in each user group, it determines the occurrence probability of each data feature distribution contained in each user group, and from this determines the user groups with the preset risk. Group qualification is thus achieved through data statistics. The approach is a general unsupervised group qualification method that does not depend on any label data. In addition, a group qualification flow is designed that takes the feature distributions as its basis, calculates the anomaly probabilities of a group, and aggregates them to obtain the group's degree of anomaly. The calculated anomaly index supports parallel computation, so that groups in large-scale data can be qualified in parallel with high efficiency. Finally, because the degree of group anomaly is judged from data statistics rather than from manual experience, the results are stable.

Example eight

Based on the same idea as the detection apparatus for a risk group provided above, the embodiments of the present specification further provide a detection device for a risk group, as shown in fig. 9.

The detection device of the risk group may be the terminal device or the server provided in the above-described embodiments.

The detection device of the risk group may vary significantly depending on its configuration or capabilities, and may include one or more processors 901 and a memory 902, where the memory 902 may store one or more applications or data. The memory 902 may be transient storage or persistent storage. An application stored in the memory 902 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the detection device of the risk group. Still further, the processor 901 may be configured to communicate with the memory 902 and to execute, on the detection device of the risk group, the series of computer-executable instructions in the memory 902. The detection device of the risk group may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input-output interfaces 905, and one or more keyboards 906.

In particular, in this embodiment, the detection apparatus for risk groups comprises a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may comprise one or more modules, and each module may comprise a series of computer-executable instructions for the detection apparatus for risk groups, and the one or more programs configured to be executed by one or more processors comprise computer-executable instructions for:

receiving a detection request of a risk group, acquiring target data of a plurality of different users to be processed based on the detection request, and clustering the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data comprises one or more different data characteristics;

determining the occurrence probability corresponding to each data feature based on the target data of the different users, and respectively determining the number of the users contained in each user group and the number of each data feature based on the target data contained in each user group and each user group;

determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group and the number of each data feature;

and determining one or more user groups with preset risks in each user group based on the occurrence probability of each data feature distribution in each user group, and outputting the determined relevant information of the user groups with the preset risks.
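The four instruction steps above can be strung together in a compact end-to-end sketch. It makes several assumptions that are not in the specification: the detection request is a plain dictionary with hypothetical keys (`encoded`, `group_of`, `prob_threshold`), clustering has already produced the group labels, the PMF is binomial, and a group is reported as soon as at least one feature distribution is rarer than the threshold.

```python
from collections import Counter
from math import comb


def detect_risk_groups(request):
    """Return a list of report entries for groups flagged as risky."""
    encoded = request["encoded"]      # {user_id: [0/1, ...]}, assumed pre-encoded
    group_of = request["group_of"]    # {user_id: group_id}, assumed clustering output
    n_features = len(next(iter(encoded.values())))

    # Occurrence probability of each feature over all users.
    global_p = [
        sum(bits[j] for bits in encoded.values()) / len(encoded)
        for j in range(n_features)
    ]

    report = []
    for g, n in Counter(group_of.values()).items():
        members = [u for u, label in group_of.items() if label == g]
        counts = [sum(encoded[u][j] for u in members) for j in range(n_features)]
        # Probability of the observed count of each feature inside the group.
        probs = [
            comb(n, k) * p**k * (1 - p) ** (n - k)
            for k, p in zip(counts, global_p)
        ]
        rare = [j for j, q in enumerate(probs) if q < request["prob_threshold"]]
        if rare:
            # "Relevant information" of a flagged group (fields are illustrative).
            report.append({"group": g, "size": n, "rare_features": rare})
    return report


request = {
    "encoded": {f"u{i}": [1 if i < 5 else 0] for i in range(20)},
    "group_of": {f"u{i}": "g1" if i < 5 else "g2" for i in range(20)},
    "prob_threshold": 0.01,
}
report = detect_risk_groups(request)
```

In this toy request the single feature is carried by a quarter of all users but by every member of `g1`, so only `g1` appears in the report.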

In the embodiment of this specification, the method further includes:

encoding the target data of the different users to obtain encoding information corresponding to each data feature of each user;

the determining the occurrence probability corresponding to each data feature based on the target data of the plurality of different users comprises:

and determining the occurrence probability corresponding to each data feature in the target data of the plurality of different users based on the coding information corresponding to each data feature of each user.

In this embodiment of this specification, the encoding processing on the target data of the multiple different users to obtain the encoding information corresponding to each data feature of each user includes:

and respectively carrying out coding processing on each data feature of each user in the target data of the plurality of different users through a binary feature coding mode to obtain binary coded information corresponding to each data feature of each user, wherein the binary codes in the binary process are mutually exclusive.

In this embodiment of the present specification, the binary values in the binarization are 0 and 1, and the determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, and the number of users and the number of each data feature contained in each user group includes:

and respectively inputting the occurrence probability corresponding to each data feature, the number of users contained in each user group and the number of each data feature into a probability mass function (PMF) to obtain the occurrence probability of each data feature distribution contained in each user group.

In this embodiment of the present specification, the determining, based on the probability of occurrence of each data feature distribution included in each user group, a user group with a preset risk in one or more user groups includes:

determining data features of which the occurrence probability is smaller than a preset probability threshold in each user group based on the occurrence probability of each data feature distribution contained in each user group and the preset probability threshold;

aggregating the data features of which the occurrence probability is smaller than the preset probability threshold value in each user group to obtain the probability of the preset risk of each user group;

and determining the user groups with preset risks in one or more user groups based on the probability of the preset risks in each user group and a preset risk probability threshold.

according to the occurrence probability of each data feature distribution contained in each user group, sequencing the occurrence probability of the data features contained in each user group from small to large to obtain the sequenced data features in each user group;

selecting N data characteristics arranged at the front from the sorted data characteristics in each user group, wherein N is an integer greater than or equal to 1;

aggregating N data characteristics which are selected from the sorted data characteristics in each user group and arranged in front to obtain the probability of the preset risk of each user group;

Further, in particular in this embodiment, the detection apparatus for risk groups comprises a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may comprise one or more modules, and each module may comprise a series of computer-executable instructions for the detection apparatus for risk groups, and the one or more programs configured to be executed by the one or more processors comprise computer-executable instructions for:

acquiring target data of a plurality of different users to be processed, and clustering the target data of the plurality of different users to obtain one or more user groups formed by the target data, wherein the target data comprises one or more different data characteristics;

and determining one or more user groups with preset risks in the user groups based on the occurrence probability of each data characteristic distribution contained in each user group.

The embodiment of the present specification provides a detection device for risk groups. Based on a detection request for a risk group, the device obtains target data of a plurality of different users to be processed and clusters the target data to obtain one or more user groups composed of the target data, where the target data includes one or more different data features. It then determines the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and determines the number of users and the number of each data feature contained in each user group based on each user group and the target data it contains. Next, based on the occurrence probability corresponding to each data feature and the number of users and the number of each data feature contained in each user group, it determines the occurrence probability of each data feature distribution contained in each user group, determines from this the user groups with the preset risk, and finally outputs the relevant information of those user groups. Group qualification is thus achieved through data statistics. The approach is a general unsupervised group qualification method that does not depend on any label data. In addition, a group qualification flow is designed that takes the feature distributions as its basis, calculates the anomaly probabilities of a group, and aggregates them to obtain the group's degree of anomaly. The calculated anomaly index supports parallel computation, so that groups in large-scale data can be qualified in parallel with high efficiency. Finally, because the degree of group anomaly is judged from data statistics rather than from manual experience, the results are stable.

Example nine

Further, based on the methods shown in fig. 1 to fig. 6, one or more embodiments of the present specification further provide a storage medium for storing computer-executable instruction information. In a specific embodiment, the storage medium may be a USB flash drive, an optical disk, a hard disk, or the like, and the computer-executable instruction information stored in the storage medium, when executed by a processor, can implement the following processes:

The embodiment of the specification further comprises:

In this embodiment of the specification, the binary values in the binarization are 0 and 1, and the determining, based on the occurrence probability corresponding to each data feature, the number of users included in each user group, and the number of each data feature, the occurrence probability of each data feature distribution included in each user group includes:

aggregating the N data characteristics which are selected from the sorted data characteristics in each user group and arranged in the front to obtain the probability of each user group having a preset risk;

In addition, in another specific embodiment, the storage medium may be a USB flash drive, an optical disk, a hard disk, or the like, and the computer-executable instruction information stored in the storage medium, when executed by a processor, can implement the following process:

Embodiments of the present specification provide a storage medium. By receiving a detection request for a risk group, target data of a plurality of different users to be processed is acquired based on the detection request and clustered to obtain one or more user groups composed of the target data, where the target data includes one or more different data features. The occurrence probability corresponding to each data feature is then determined based on the target data of the plurality of different users, and the number of users and the number of each data feature contained in each user group are determined based on each user group and the target data it contains. Next, based on the occurrence probability corresponding to each data feature and the number of users and the number of each data feature contained in each user group, the occurrence probability of each data feature distribution contained in each user group is determined, the user groups with the preset risk are determined from this, and finally the relevant information of those user groups is output. Group qualification is thus achieved through data statistics. The approach is a general unsupervised group qualification method that does not depend on any label data. In addition, a group qualification flow is designed that takes the feature distributions as its basis, calculates the anomaly probabilities of a group, and aggregates them to obtain the group's degree of anomaly. The calculated anomaly index supports parallel computation, so that groups in large-scale data can be qualified in parallel with high efficiency. Finally, because the degree of group anomaly is judged from data statistics rather than from manual experience, the results are stable.
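The statement that the anomaly index supports parallel computation follows from the fact that each group's probabilities depend only on that group's own counts and on the shared global probabilities. The sketch below illustrates this with a thread pool from the Python standard library. The binomial PMF, the function name `group_feature_probs`, and the data layout are assumptions. For CPU-bound workloads a process pool or a distributed engine would be the more likely choice; threads are used here only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from math import comb


def group_feature_probs(task):
    """Compute the probability of each feature's observed count in one group."""
    n, counts, global_p = task
    return [
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k, p in zip(counts, global_p)
    ]


global_p = [0.5, 0.25]                                  # per-feature global probability
groups = {"g1": (2, [2, 0]), "g2": (2, [0, 1])}         # group -> (size, feature counts)

# Groups share no state, so they can be mapped over a worker pool independently.
with ThreadPoolExecutor(max_workers=4) as pool:
    tasks = [(n, counts, global_p) for n, counts in groups.values()]
    results = dict(zip(groups, pool.map(group_feature_probs, tasks)))
```

Because `pool.map` preserves input order, zipping its output with the group keys pairs every result with the right group.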

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by a physical hardware module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, nowadays, instead of manually making integrated circuit chips, this programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained merely by lightly programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be considered a hardware component, and the means included in it for performing the various functions may also be considered structures within the hardware component. Alternatively, the means for performing the functions may be regarded both as software modules for performing the method and as structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner; the same or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiment.

The above description is only an example of the present specification, and is not intended to limit the present application. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims

1. A method of detecting a risk group, the method comprising:

determining the occurrence probability of the distribution of each data feature contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group and the number of each data feature;

2. The method of claim 1, further comprising:

3. The method according to claim 2, wherein the encoding the target data of the plurality of different users to obtain the encoded information corresponding to each data feature of each user comprises:

4. The method according to claim 3, wherein the binary values in the binarization are 0 and 1, respectively, and the determining the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, and the number of users contained in each user group and the number of each data feature comprises:

5. The method of claim 1, wherein the determining a user group with a preset risk among one or more user groups based on the probability of occurrence of each data feature distribution included in each user group comprises:

6. The method of claim 1, wherein the determining a user group with a preset risk among one or more user groups based on the probability of occurrence of each data feature distribution included in each user group comprises:

7. The method of claim 1, wherein the predetermined risk is a risk of fraud or illegal financial activity.

8. A method of detecting a risk group, the method comprising:

9. A device for detection of a risk group, the device comprising:

a data acquisition module, configured to receive a detection request for a risk group, acquire target data of a plurality of different users to be processed based on the detection request, and cluster the target data of the plurality of different users to obtain one or more user groups formed from the target data, wherein the target data comprises one or more different data features;

a data processing module, configured to determine the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and to determine, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature;

a probability determining module, configured to determine the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature;

and a risk group determining module, configured to determine, among the one or more user groups, a user group with a preset risk based on the occurrence probability of each data feature distribution contained in each user group, and to output relevant information about the determined user group with the preset risk.

10. A device for detection of a risk group, the device comprising:

a data acquisition module, configured to acquire target data of a plurality of different users to be processed, and cluster the target data of the plurality of different users to obtain one or more user groups formed from the target data, wherein the target data comprises one or more different data features;

a data statistics module, configured to determine the occurrence probability corresponding to each data feature based on the target data of the plurality of different users, and to determine, based on each user group and the target data contained in each user group, the number of users contained in each user group and the number of each data feature;

a probability determining module, configured to determine the occurrence probability of each data feature distribution contained in each user group based on the occurrence probability corresponding to each data feature, the number of users contained in each user group, and the number of each data feature;

and a group determining module, configured to determine, among the one or more user groups, a user group with a preset risk based on the occurrence probability of each data feature distribution contained in each user group.
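
For illustration only, the statistics, probability and group determination steps recited above can be sketched in Python. The sketch is not the claimed implementation and rests on several assumptions that the claims do not state: each data feature is already encoded as 0 or 1, clustering has already produced non-empty user groups, the occurrence probability of a feature distribution is modelled as a binomial probability, and a group is treated as having the preset risk when the smallest of its per-feature probabilities falls below a threshold. All function names, the sample data and the threshold value are hypothetical.

```python
from math import comb


def feature_probabilities(users):
    """Occurrence probability of each binary feature over all users."""
    vectors = list(users.values())
    n_features = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(n_features)]


def group_counts(users, members):
    """Number of users in one (non-empty) group and, per feature, how many have it."""
    n_features = len(users[members[0]])
    counts = [sum(users[u][j] for u in members) for j in range(n_features)]
    return len(members), counts


def binomial_pmf(k, n, p):
    """Assumed model: probability of seeing exactly k of n users with a feature."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def detect_risk_groups(users, groups, threshold):
    """Flag groups whose least likely feature distribution is below the threshold."""
    probs = feature_probabilities(users)
    report = {}
    for group_id, members in groups.items():
        n, counts = group_counts(users, members)
        dist = [binomial_pmf(k, n, p) for k, p in zip(counts, probs)]
        report[group_id] = {
            "distribution_probabilities": dist,
            "risky": min(dist) < threshold,  # assumed decision rule
        }
    return report


# Hypothetical sample data: user id -> [feature 0, feature 1], already binarized.
USERS = {
    "u1": [1, 1], "u2": [1, 0], "u3": [1, 1], "u4": [1, 0],
    "u5": [1, 1], "u6": [1, 0], "u7": [0, 1], "u8": [0, 0],
    "u9": [0, 1], "u10": [0, 0], "u11": [0, 1], "u12": [0, 0],
}
# Hypothetical clustering result: group id -> member user ids.
GROUPS = {
    "g1": ["u1", "u2", "u3", "u4"],
    "g2": ["u5", "u6", "u7", "u8", "u9", "u10", "u11", "u12"],
}
REPORT = detect_risk_groups(USERS, GROUPS, threshold=0.1)
```

In this sample, every member of group `g1` carries feature 0 although only half of all users do, so the distribution of that feature in `g1` is the least probable one and `g1` is the only group flagged at the chosen threshold. No risk labels are used at any point, which matches the unsupervised setting described in the background.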

11. A risk group detection apparatus, the risk group detection apparatus comprising:

a processor; and

a memory arranged to store computer executable instructions that, when executed, are capable of causing the processor to:

12. A risk group detection apparatus, the risk group detection apparatus comprising:

a processor; and

13. A storage medium for storing computer executable instructions which, when executed by a processor, implement the following flow:

and determining, among the one or more user groups, a user group with a preset risk based on the occurrence probability of each data feature distribution contained in each user group, and outputting relevant information about the determined user group with the preset risk.

14. A storage medium for storing computer-executable instructions, which when executed by a processor implement the following: