CN109948641B - Abnormal group identification method and device

Info

Publication number
CN109948641B
Authority
CN
China
Prior art keywords
analyzed
users
characteristic value
frequency
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910045152.6A
Other languages
Chinese (zh)
Other versions
CN109948641A (en
Inventor
苗加成
章鹏
杨程远
向彪
严欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910045152.6A (CN109948641B)
Publication of CN109948641A
Priority to TW108130766A (TWI718643B)
Priority to PCT/CN2019/126030 (WO2020147488A1)
Application granted
Publication of CN109948641B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides an abnormal group identification method and device. The method comprises the following steps: acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed; determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed; mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set; constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of the users to be analyzed, and defining the weight of edges in the target bipartite graph; and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph. The method and the device improve the accuracy of abnormal group identification, and are simple in steps and easy to execute.

Description

Abnormal group identification method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for identifying abnormal groups.
Background
At present, in various risk-control scenarios (such as spam registration, marketing fraud, card and account theft, and insurance fraud), crime committed by organized groups is becoming increasingly common, which seriously disrupts normal commercial order and causes huge losses to merchants. Therefore, how to identify such groups (i.e., abnormal groups) has become an important issue for businesses in their operations.
In common approaches to abnormal group identification, the identification accuracy is low because labeled samples are lacking and the modus operandi of abnormal groups keeps changing.
Disclosure of Invention
One or more embodiments of the present disclosure provide a method and an apparatus for identifying an abnormal group, so as to solve the problem of low accuracy of identifying an abnormal group in the prior art.
To solve the above technical problem, one or more embodiments of the present specification are implemented as follows:
in one aspect, one or more embodiments of the present specification provide an abnormal group identification method, including:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
Optionally, the obtaining the feature value of each user to be analyzed in the multiple users to be analyzed includes:
acquiring original personal data of the users to be analyzed;
discretizing the original personal data of the users to be analyzed to obtain the characteristic value of each user to be analyzed.
Optionally, the determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed includes:
constructing a first bipartite graph according to the characteristic values of the users to be analyzed, wherein the first bipartite graph comprises nodes corresponding to the users to be analyzed, nodes corresponding to the characteristic values, and edges between the nodes corresponding to the users to be analyzed and the nodes corresponding to the characteristic values;
acquiring the degrees of the nodes corresponding to the characteristic values in the first bipartite graph, and determining high-frequency characteristic values and low-frequency characteristic values among the characteristic values according to the degrees of the nodes corresponding to the characteristic values;
and determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed according to the high-frequency characteristic value and the low-frequency characteristic value.
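The degree-based split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_by_degree` and the two half-open degree intervals are assumptions (the text only says the split is made "according to the degrees"), and the sample data reuses the five-user example given later in the detailed description.

```python
from collections import defaultdict

def split_by_degree(user_features, low_range, high_range):
    """Classify characteristic values by node degree in the user/value bipartite graph.

    low_range and high_range are hypothetical (lower, upper] degree intervals;
    the source does not state concrete thresholds.
    """
    # The degree of a characteristic-value node is the number of users linked to it.
    degree = defaultdict(int)
    for values in user_features.values():
        for value in set(values):  # one edge per (user, value) pair
            degree[value] += 1

    low = {v for v, d in degree.items() if low_range[0] < d <= low_range[1]}
    high = {v for v, d in degree.items() if high_range[0] < d <= high_range[1]}
    return dict(degree), low, high

# Five users to be analyzed and their characteristic values.
users = {
    1: ["A", "B", "D"],
    2: ["B", "C", "F"],
    3: ["A", "C", "D", "F"],
    4: ["B", "D", "F"],
    5: ["C", "D", "E", "F"],
}
degree, low, high = split_by_degree(users, low_range=(1, 3), high_range=(3, 5))

# Per-user split, obtained by matching each user's values against the two sets.
per_user = {u: {"high": set(v) & high, "low": set(v) & low} for u, v in users.items()}
```

With these assumed intervals, a value that appears for only one user (E) falls into neither class, which mirrors the lower bound used in the counting rule of the detailed description.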
Optionally, the mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set includes:
mining frequent multi-item sets whose support meets a preset support according to the high-frequency characteristic value of each user to be analyzed and in combination with the FP-Growth method, and determining a maximum frequent item set among the frequent multi-item sets;
matching the characteristic value of each user to be analyzed with the maximum frequent characteristic value in the maximum frequent item set to obtain the maximum frequent characteristic value of each user to be analyzed;
and determining a low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed.
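A small sketch may help make the mining and matching steps concrete. Note the substitutions: it uses brute-force enumeration instead of FP-Growth (both yield the same frequent item sets, but enumeration is only practical for tiny inputs), it assumes that a "multi-item" set has at least two items, and it assumes that each maximum frequent item set is treated as one composite characteristic value during matching. All function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, min_size=2):
    """Brute-force stand-in for FP-Growth; returns {itemset: support count}."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(min_size, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            break  # no frequent set of this size, so no larger one can be frequent
    return result

def maximal_itemsets(frequent):
    """Keep only the item sets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}

def match_users(user_high, maximal):
    """Attach to each user the maximal item sets fully contained in the user's values."""
    return {u: [m for m in maximal if m <= values] for u, values in user_high.items()}

# High-frequency characteristic values per user (illustrative data).
user_high = {
    1: {"A", "B", "D"},
    2: {"A", "B", "D"},
    3: {"A", "B"},
    4: {"D", "F"},
    5: {"D", "F"},
}
frequent = frequent_itemsets(list(user_high.values()), min_support=2)
maximal = maximal_itemsets(frequent)
matched = match_users(user_high, maximal)
```

Here {A, B}, {A, D}, and {B, D} are frequent but not maximal because {A, B, D} is also frequent, so user 3, who has only A and B, matches no maximal item set.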
Optionally, the determining the low-frequency maximum frequent feature value in the maximum frequent feature values of the user to be analyzed includes:
constructing a second bipartite graph according to the maximum frequent characteristic value of each user to be analyzed, wherein the second bipartite graph comprises nodes corresponding to each user to be analyzed, nodes corresponding to each maximum frequent characteristic value, and edges between the nodes corresponding to each user to be analyzed and the nodes corresponding to the maximum frequent characteristic value of that user;
and acquiring the degree of the node corresponding to each maximum frequent characteristic value in the second bipartite graph, and determining the low-frequency maximum frequent characteristic value among the maximum frequent characteristic values according to the degree of the node corresponding to each maximum frequent characteristic value.
Optionally, the determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph, includes:
deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, obtaining at least one maximum connected subgraph by applying a connected-component algorithm to the bipartite graph to be clustered, and determining the users to be analyzed corresponding to the nodes in each maximum connected subgraph as one abnormal group; or
deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, dividing the nodes in the bipartite graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining the users to be analyzed corresponding to the nodes in each node set as one abnormal group.
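The first alternative (edge pruning followed by connected components) can be sketched as below. This is an illustration under stated assumptions: the edge list, the threshold value, and the `min_group_size` parameter are hypothetical. In particular, `min_group_size` is an added convenience that drops components containing a single user; the text itself treats every maximum connected subgraph as a group.

```python
from collections import defaultdict

def abnormal_groups(edges, min_weight, min_group_size=2):
    """edges: iterable of (user, characteristic_value, weight) triples."""
    # Step 1: keep only edges whose weight is at least the preset weight.
    adjacency = defaultdict(set)
    for user, value, weight in edges:
        if weight >= min_weight:
            u, v = ("user", user), ("value", value)
            adjacency[u].add(v)
            adjacency[v].add(u)

    # Step 2: find connected components with an iterative depth-first search.
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # Step 3: the users in one component form one candidate abnormal group.
        members = sorted(name for kind, name in component if kind == "user")
        if len(members) >= min_group_size:
            groups.append(members)
    return sorted(groups)

edges = [
    ("u1", "dev_x", 0.9), ("u2", "dev_x", 0.8),
    ("u2", "ip_y", 0.7), ("u3", "ip_y", 0.6),
    ("u4", "ip_z", 0.2), ("u5", "ip_z", 0.9),
    ("u6", "dev_w", 0.9),
]
groups = abnormal_groups(edges, min_weight=0.5)
```

In this toy data the weak edge of u4 is pruned, so u5 and u6 each end up alone and only u1, u2, and u3 are reported as a group.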
Optionally, the determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph, includes:
calculating the weight between any two users to be analyzed according to the weight of the edge in the target bipartite graph;
converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge between any two nodes as the weight between the two corresponding users to be analyzed, so as to construct a target clustering graph;
and determining abnormal groups in the users to be analyzed according to clustering results of the users to be analyzed, which are obtained by carrying out graph clustering on the target clustering graph.
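One way to picture the user-to-user weighting is the sketch below. The weighting formula is an assumption, since the text here does not say how the weight between two users is derived from the bipartite edge weights: for each characteristic value shared by two users, the sketch multiplies their two edge weights and sums the products over all shared values. Pairs with no shared value are simply absent from the result, which is equivalent to an edge of weight 0 in the fully connected target clustering graph.

```python
from collections import defaultdict
from itertools import combinations

def user_pair_weights(edges):
    """edges: iterable of (user, characteristic_value, weight) triples.

    Returns {(user_a, user_b): weight} with user_a < user_b.
    The sum-of-products rule is a hypothetical choice, not taken from the source.
    """
    by_value = defaultdict(dict)
    for user, value, weight in edges:
        by_value[value][user] = weight

    pair_weight = defaultdict(float)
    for members in by_value.values():
        for a, b in combinations(sorted(members), 2):
            pair_weight[(a, b)] += members[a] * members[b]
    return dict(pair_weight)

edges = [
    ("u1", "x", 1.0), ("u2", "x", 0.5),
    ("u1", "y", 0.5), ("u2", "y", 1.0), ("u3", "y", 0.5),
]
pairs = user_pair_weights(edges)

# Pruning with a second preset weight, here assumed to be 0.5.
kept = {pair: w for pair, w in pairs.items() if w >= 0.5}
```

The pruned user graph can then be clustered in the same way as the pruned bipartite graph, for example by connected components.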
Optionally, the determining an abnormal group in the users to be analyzed according to the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target clustering graph, includes:
deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the graph to be clustered, and respectively determining users to be analyzed corresponding to nodes in each maximum connected subgraph as one abnormal group; or
Deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, dividing the graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to each node set as one abnormal group respectively.
In another aspect, one or more embodiments of the present specification provide an abnormal group identification apparatus, including:
the acquisition module is used for acquiring the characteristic value of each user to be analyzed in a plurality of users to be analyzed;
the determining module is used for determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
the mining module is used for mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
the construction module is used for constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of the users to be analyzed and defining the weight of the edge in the target bipartite graph;
and the clustering module is used for determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
Optionally, the acquisition module includes:
an acquisition unit configured to acquire original personal data of the plurality of users to be analyzed;
and the discretization unit is used for discretizing the original personal data of the users to be analyzed to obtain the characteristic value of each user to be analyzed.
Optionally, the determining module includes:
the first construction unit is used for constructing a first bipartite graph according to the characteristic values of the users to be analyzed, wherein the first bipartite graph comprises nodes corresponding to the users to be analyzed, nodes corresponding to the characteristic values, and edges between the nodes corresponding to the users to be analyzed and the nodes corresponding to the characteristic values;
a first determining unit, configured to obtain the degrees of the nodes corresponding to the characteristic values in the first bipartite graph, and determine high-frequency characteristic values and low-frequency characteristic values among the characteristic values according to the degrees of the nodes corresponding to the characteristic values;
and the second determining unit is used for determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed according to the high-frequency characteristic value and the low-frequency characteristic value.
Optionally, the mining module includes:
the mining unit is used for mining frequent multi-item sets whose support meets a preset support according to the high-frequency characteristic value of each user to be analyzed and in combination with the FP-Growth method, and determining a maximum frequent item set among the frequent multi-item sets;
the matching unit is used for matching the characteristic value of each user to be analyzed with the maximum frequent characteristic value in the maximum frequent item set to obtain the maximum frequent characteristic value of each user to be analyzed;
and the third determining unit is used for determining the low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed.
Optionally, the third determining unit includes:
a constructing subunit, configured to construct a second bipartite graph according to the maximum frequent characteristic value of each user to be analyzed, where the second bipartite graph includes nodes corresponding to each user to be analyzed, nodes corresponding to each maximum frequent characteristic value, and edges between the nodes corresponding to each user to be analyzed and the nodes corresponding to the maximum frequent characteristic value of that user;
and the determining subunit is configured to obtain the degrees of the nodes corresponding to the maximum frequent characteristic values in the second bipartite graph, and determine the low-frequency maximum frequent characteristic value among the maximum frequent characteristic values according to the degrees of the nodes corresponding to the maximum frequent characteristic values.
Optionally, the clustering module includes:
the first clustering unit is used for deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the bipartite graph to be clustered, and determining users to be analyzed corresponding to nodes in each maximum connected subgraph as an abnormal group; or
And the second clustering unit is used for deleting edges with weights smaller than the first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, dividing nodes in the bipartite graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to the nodes in each node set as one abnormal group.
Optionally, the clustering module includes:
the calculating unit is used for calculating the weight between any two users to be analyzed according to the weight of the edge in the target bipartite graph;
the second construction unit is used for converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge between any two nodes as the weight between the two corresponding users to be analyzed, so as to construct a target clustering graph;
and the third clustering unit is used for determining an abnormal group in the users to be analyzed according to the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target clustering graph.
Optionally, the third clustering unit includes:
the first clustering subunit is used for deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the graph to be clustered, and respectively determining users to be analyzed corresponding to nodes in each maximum connected subgraph as one abnormal group; or
And the second clustering subunit is used for deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, dividing the graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and respectively determining users to be analyzed corresponding to each node set as the abnormal group.
In yet another aspect, one or more embodiments of the present specification provide an abnormal group identification apparatus, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
In yet another aspect, one or more embodiments of the present specification provide a storage medium storing computer-executable instructions that, when executed, implement the following:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
By adopting the technical solution of one or more embodiments of this specification, the high-frequency characteristic values and the low-frequency characteristic values among the characteristic values of each user to be analyzed are determined; a maximum frequent item set is mined from the high-frequency characteristic values of each user to be analyzed using a preset frequent item set mining strategy, and the low-frequency maximum frequent characteristic values in the maximum frequent item set are obtained; a target bipartite graph is constructed according to the low-frequency characteristic values and the low-frequency maximum frequent characteristic values of each user to be analyzed, and the weights of the edges in the target bipartite graph are defined; and graph clustering is performed on the target bipartite graph according to the weights of its edges to determine the abnormal groups among the users to be analyzed. On the one hand, mining the maximum frequent item set and obtaining its low-frequency maximum frequent characteristic values captures the behavior sequences of the users to be analyzed, so abnormal groups are identified more accurately. On the other hand, abnormal groups are obtained using only the following steps: acquiring the low-frequency characteristic values and the low-frequency maximum frequent characteristic values of each user to be analyzed, constructing the target bipartite graph from them, defining the weights of its edges, and performing graph clustering on it according to those weights. The steps are simple and easy to implement.
Drawings
In order to more clearly illustrate one or more embodiments of this specification or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments of this specification, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of an abnormal group identification method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of determining a high-frequency feature value and a low-frequency feature value in the feature values of users to be analyzed according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a first bipartite graph provided by an embodiment of the present application;
Fig. 4 is a first schematic flowchart of obtaining a low-frequency maximum frequent characteristic value according to an embodiment of the present application;
Fig. 5 is a second schematic flowchart of obtaining a low-frequency maximum frequent characteristic value according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of determining abnormal groups according to an embodiment of the present application;
Fig. 7 is a schematic composition diagram of an abnormal group identification apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an abnormal group identification device according to an embodiment of the present application.
Detailed Description
One or more embodiments of the present disclosure provide a method and an apparatus for identifying an abnormal group, so as to solve the problem of low accuracy of identifying an abnormal group in the prior art.
In order to make those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in one or more embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all embodiments. All other embodiments that can be derived by a person skilled in the art from one or more of the embodiments of the present disclosure without making any creative effort shall fall within the protection scope of one or more of the embodiments of the present disclosure.
Fig. 1 is a schematic flowchart of an abnormal group identification method provided in an embodiment of the present application, where an execution subject of the method may be, for example, a terminal device or a server, where the terminal device may be, for example, a personal computer, and the server may be, for example, an independent server or a server cluster composed of multiple servers, and this is not limited in this exemplary embodiment. As shown in fig. 1, the method may include the steps of:
Step S102, obtaining the characteristic value of each user to be analyzed in a plurality of users to be analyzed.
In the embodiment of the present application, the original personal data of a plurality of users to be analyzed may first be obtained and then discretized to obtain the feature value of each user to be analyzed. Obtaining the original personal data of the plurality of users to be analyzed may include: obtaining the original personal data of each user to be analyzed through an obtaining module, and aggregating the original personal data of all the users to be analyzed. The original personal data of each user to be analyzed may include personal basic data, behavior data, device data, and the like, which is not particularly limited in the present exemplary embodiment. The personal basic data may include data on features such as age, gender, occupation, income, education level, native place, contact address, and account number, which is not particularly limited in the present exemplary embodiment. For example, personal basic data may include: female (gender), 18 years old (age), bachelor's degree (education level), lawyer (occupation), Shaanxi (native place). The behavior data may include data on a plurality of behavior features, and the specific behavior features included may be set according to the application scenario. For example, in an insurance scenario, behavior data may include: insured on 2018.10.03 (insurance time), accident insurance (insurance type), 2019.2.1 insurance (insurance feature), etc. The device data may include, for example, data on features such as the device model, the device's home location, the device's frequently used address, and the frequency of device replacement, which is not particularly limited in this exemplary embodiment.
Discretizing the original personal data of the users to be analyzed to obtain the feature value of each user to be analyzed may include: analyzing the distribution of the data of each feature in the original personal data of the plurality of users to be analyzed; binning the data of each feature according to that distribution and a chosen binning method; taking the interval into which the data of each feature falls after binning as the feature value of that data; and determining the feature values of each user to be analyzed from these feature values in combination with the original personal data of each user to be analyzed.
The binning method may be chosen according to the nature of the feature. For continuous features (such as age, income, and transaction amount), equal-frequency binning, equal-width binning, or a similar method may be chosen according to business experience and the data distribution. For categorical features (e.g., gender, education level, occupation), the data may be binned by category. For text features (e.g., addresses), text that follows the same pattern may be grouped into one bin.
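As an illustration of binning a continuous feature, the following sketch implements equal-width and equal-frequency binning in plain Python and turns the bin index into a discrete feature value. The function names, the sample ages, the bin counts, and the `age_bin=` label format are all assumptions made for the example.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign values to bins holding roughly the same number of samples.

    Ranks are used directly, so tied values may fall into different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins

ages = [18, 22, 25, 31, 40, 58]
width_bins = equal_width_bins(ages, n_bins=4)      # each bin spans 10 years
freq_bins = equal_frequency_bins(ages, n_bins=3)   # two users per bin

# The bin index becomes the discrete feature value of the user.
age_features = [f"age_bin={b}" for b in width_bins]
```

Equal-width binning leaves the sparse upper range in its own bins, while equal-frequency binning balances the bin sizes; which one suits a feature depends on its distribution, as the text notes.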
It should be noted that, the user to be analyzed may be marked according to the unique identifier of the user to be analyzed, so as to distinguish the user to be analyzed. The unique identifier may be, for example: an identity card, a military officer card, an account id, etc., which are not particularly limited in this exemplary embodiment.
Step S104, determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed.
In the present exemplary embodiment, the high-frequency feature value and the low-frequency feature value among the feature values of the users to be analyzed may be determined in either of two ways.
Method one: count the number of times each feature value appears among the feature values of the plurality of users to be analyzed, and determine the high-frequency feature values and the low-frequency feature values according to the following rule. If the number of occurrences of a feature value satisfies $T2_i \ge X_i > T1_i$, the feature value is a low-frequency feature value; if it satisfies $T3_i \ge X_i > T2_i$, the feature value is a high-frequency feature value. Here $X_i$ is the number of times the $i$-th feature value appears among the feature values of the plurality of users to be analyzed, and $T1_i$, $T2_i$, and $T3_i$ are the first, second, and third preset numbers of occurrences corresponding to the $i$-th feature value, with $T3_i > T2_i > T1_i$. $T1_i$, $T2_i$, and $T3_i$ may be determined according to the feature to which the $i$-th feature value belongs; that is, different features have different specific values of $T1_i$, $T2_i$, and $T3_i$.
After the high-frequency characteristic values and the low-frequency characteristic values are determined, they are matched against the characteristic values of each user to be analyzed to obtain the high-frequency characteristic values and the low-frequency characteristic values of that user. For example, suppose the high-frequency characteristic values are A, B, and D, and the low-frequency characteristic values are C and E. If the characteristic values of a user to be analyzed are A, B, C, and E, the high-frequency characteristic values of that user are A and B, and the low-frequency characteristic values of that user are C and E; if the characteristic values of a user to be analyzed are A, E, and F, the high-frequency characteristic value of that user is A, and the low-frequency characteristic value of that user is E.
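As a purely illustrative sketch of the first mode, the counting rule and the per-user matching might be written in Python as follows. The function names, the sample users, and the use of a single set of thresholds t1, t2, t3 for all characteristic values are assumptions made for brevity; the text allows the thresholds to differ per characteristic. The sketch also assumes that a characteristic value is counted once per user.

```python
from collections import Counter

def classify_by_count(user_values, t1, t2, t3):
    """Return (low, high) sets of characteristic values by occurrence count."""
    # X: number of users whose characteristic values contain each value.
    counts = Counter(v for values in user_values.values() for v in set(values))
    low = {v for v, x in counts.items() if t1 < x <= t2}    # T2 >= X > T1
    high = {v for v, x in counts.items() if t2 < x <= t3}   # T3 >= X > T2
    return low, high

def match_user(values, low, high):
    """Return the (high-frequency, low-frequency) values of one user."""
    return set(values) & high, set(values) & low

# Hypothetical users and thresholds, chosen only for illustration.
user_values = {"u1": ["A", "B"], "u2": ["A", "B"], "u3": ["A", "C"], "u4": ["D"]}
low, high = classify_by_count(user_values, t1=1, t2=2, t3=3)

# The matching example from the text: high = {A, B, D}, low = {C, E}.
example = match_user(["A", "E", "F"], low={"C", "E"}, high={"A", "B", "D"})
```

Values whose counts fall outside both ranges (here C and D, each seen once) are placed in neither set, which mirrors the rule as stated.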
In a second mode, as shown in fig. 2, the method may include the following steps:
step S202, constructing a first bipartite graph according to the characteristic values of the users to be analyzed, wherein the first bipartite graph comprises nodes corresponding to the users to be analyzed, nodes corresponding to the characteristic values, and edges between the nodes corresponding to the users to be analyzed and the nodes corresponding to the characteristic values.
In the embodiment of the application, each user to be analyzed is converted into a node, and each user to be analyzed corresponds to only one node. Each characteristic value of each user to be analyzed is likewise converted into a node, and each characteristic value corresponds to only one node; that is, if a node corresponding to a characteristic value already exists during the conversion, that node is reused and no new node is created for the characteristic value. The nodes corresponding to the users to be analyzed are located on one side of the first bipartite graph, the nodes corresponding to the characteristic values are located on the other side of the first bipartite graph, and an edge is added between the node corresponding to each user to be analyzed and the node corresponding to each of its characteristic values. For example, suppose there are 5 users to be analyzed, namely the first through the fifth user to be analyzed, where the characteristic values of the first user to be analyzed are A, B, and D; those of the second are B, C, and F; those of the third are A, C, D, and F; those of the fourth are B, D, and F; and those of the fifth are C, D, E, and F. The first bipartite graph constructed on this basis is shown in fig. 3, where node 1 through node 5, corresponding to the first through the fifth user to be analyzed, are located on the left side of fig. 3; the nodes corresponding to characteristic values A, B, C, D, E, and F are located on the right side of fig. 3; and an edge is provided between the node corresponding to each user to be analyzed and the node corresponding to each of its characteristic values.
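The construction described above can be sketched in Python using plain sets, with the five users of fig. 3 as input. The function name and the tuple-based node labels are illustrative assumptions, not part of the embodiment.

```python
def build_bipartite(user_values):
    """Build user nodes, value nodes, and user-value edges of a bipartite graph."""
    user_nodes, value_nodes, edges = set(), set(), set()
    for user, values in user_values.items():
        u = ("user", user)
        user_nodes.add(u)
        for value in values:
            v = ("value", value)
            # Adding to a set reuses the node if this value was already seen.
            value_nodes.add(v)
            edges.add((u, v))
    return user_nodes, value_nodes, edges

# The five users to be analyzed from the fig. 3 example.
users = {
    1: ["A", "B", "D"],
    2: ["B", "C", "F"],
    3: ["A", "C", "D", "F"],
    4: ["B", "D", "F"],
    5: ["C", "D", "E", "F"],
}
user_nodes, value_nodes, edges = build_bipartite(users)
```

Tagging each node with its side ("user" or "value") keeps the two sides of the graph distinct even if a user identifier and a characteristic value happen to share the same label.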
And step S204, acquiring the degrees of the nodes corresponding to the characteristic values in the first bipartite graph, and determining high-frequency characteristic values and low-frequency characteristic values in the characteristic values according to the degrees of the nodes corresponding to the characteristic values.
In the embodiment of the present application, the degree of the node corresponding to a characteristic value refers to the number of edges connected to that node. For example, in fig. 3, the degree of the node corresponding to characteristic value A is 2, the degree of the node corresponding to characteristic value B is 3, the degree of the node corresponding to characteristic value C is 3, the degree of the node corresponding to characteristic value D is 4, the degree of the node corresponding to characteristic value E is 1, and the degree of the node corresponding to characteristic value F is 4.
The process of determining the high-frequency characteristic values and the low-frequency characteristic values according to the degrees of the nodes corresponding to the characteristic values may use the following determination rule. If the degree of the node corresponding to a characteristic value satisfies K2_i ≥ degree(V_i) > 1, the characteristic value is a low-frequency characteristic value, where degree(V_i) is the degree of the node corresponding to the i-th characteristic value V_i, K2_i is the first preset degree corresponding to V_i, and K2_i > 1. If the degree of the node corresponding to a characteristic value satisfies K1_i ≥ degree(V_i) > K2_i, the characteristic value is a high-frequency characteristic value, where K1_i is the second preset degree corresponding to V_i and K1_i > K2_i. K2_i and K1_i may be determined according to the characteristic to which V_i belongs; that is, for different characteristics, the specific values of K2_i and K1_i also differ.
For example, in fig. 3, if K2_i is 2 and K1_i is 3, characteristic value A is a low-frequency characteristic value, and characteristic values B and C are high-frequency characteristic values. Characteristic values D and F (degree 4) and characteristic value E (degree 1) fall outside both ranges and are therefore neither.
And S206, determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed according to the high-frequency characteristic value and the low-frequency characteristic value.
In the embodiment of the application, the high-frequency characteristic values are matched against the characteristic values of each user to be analyzed, and the characteristic values of each user that match a high-frequency characteristic value are determined as the high-frequency characteristic values of that user; the low-frequency characteristic values are likewise matched against the characteristic values of each user to be analyzed, and the characteristic values of each user that match a low-frequency characteristic value are determined as the low-frequency characteristic values of that user. For example, in fig. 3, if K2_i is 2 and K1_i is 3, characteristic value A is a low-frequency characteristic value, and characteristic values B and C are high-frequency characteristic values. On this basis, the low-frequency characteristic value of the first user to be analyzed is A and its high-frequency characteristic value is B; the second user to be analyzed has no low-frequency characteristic value and its high-frequency characteristic values are B and C; the low-frequency characteristic value of the third user to be analyzed is A and its high-frequency characteristic value is C; the fourth user to be analyzed has no low-frequency characteristic value and its high-frequency characteristic value is B; and the fifth user to be analyzed has no low-frequency characteristic value and its high-frequency characteristic value is C.
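Steps S204 and S206 can be illustrated with a short Python sketch that reproduces the fig. 3 example. The function names are hypothetical, and a single pair of thresholds k2 and k1 is assumed for all characteristic values, whereas the text allows them to vary with the characteristic.

```python
from collections import Counter

# Characteristic values of the five users to be analyzed in fig. 3.
USERS = {
    1: {"A", "B", "D"},
    2: {"B", "C", "F"},
    3: {"A", "C", "D", "F"},
    4: {"B", "D", "F"},
    5: {"C", "D", "E", "F"},
}

def node_degrees(user_values):
    """Degree of each value node: one edge per user holding the value."""
    return Counter(v for values in user_values.values() for v in values)

def classify_by_degree(degrees, k2, k1):
    """Apply K2 >= degree > 1 (low) and K1 >= degree > K2 (high)."""
    low = {v for v, d in degrees.items() if 1 < d <= k2}
    high = {v for v, d in degrees.items() if k2 < d <= k1}
    return low, high

degrees = node_degrees(USERS)
low, high = classify_by_degree(degrees, k2=2, k1=3)

# Step S206: intersect each user's values with the two global sets.
per_user = {
    user: {"high": values & high, "low": values & low}
    for user, values in USERS.items()
}
```

With k2 = 2 and k1 = 3 this yields A as the only low-frequency value and B and C as the high-frequency values, matching the example in the text.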
And S106, mining the maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring the low-frequency maximum frequent characteristic value in the maximum frequent item set.
In this embodiment of the present application, the preset frequent item set mining strategy may be, for example, Apriori (an association-rule algorithm for mining frequent item sets) or FP-Growth, and this exemplary embodiment places no particular limit on it. The process is described below taking FP-Growth as the preset frequent item set mining strategy; as shown in fig. 4, the process may include the following steps:
step S402, according to the high-frequency characteristic value of each user to be analyzed and in combination with an FP-Growth method, mining a frequent multinomial set with the support degree meeting a preset support degree, and determining the maximum frequent multinomial set in the frequent multinomial set.
In this embodiment of the application, the support degree is the occurrence frequency of the high-frequency characteristic value in a plurality of users to be analyzed, and a specific numerical value of the preset support degree may be set by itself, for example, may be 1, may also be 2, and the like, which is not particularly limited in this exemplary embodiment. The frequent multinomial set refers to a set including at least two high frequency characteristic values. The frequent multinomial set with the support degree meeting the preset support degree means that the support degree of each high-frequency characteristic value in the frequent multinomial set is greater than the preset support degree.
The specific process of mining the frequent multinomial sets comprises the following steps: defining a preset support degree; scanning the high-frequency characteristic values of each user to be analyzed to obtain the number of occurrences (namely the support degree) of each high-frequency characteristic value among the plurality of users to be analyzed; removing from the high-frequency characteristic values of each user to be analyzed those whose support degree is smaller than the preset support degree; constructing an FP tree from the remaining high-frequency characteristic values of each user to be analyzed; and mining the frequent multinomial sets from the FP tree. Among the frequent multinomial sets, those that have no superset among the frequent multinomial sets are then obtained and determined as the maximum frequent multinomial sets. It should be noted that each maximum frequent item set includes a plurality of high-frequency characteristic values; here, the high-frequency characteristic values included in a maximum frequent item set are named maximum frequent characteristic values, that is, each maximum frequent item set includes a plurality of maximum frequent characteristic values.
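To make the notion of a maximum frequent multinomial set concrete, the following Python sketch enumerates frequent sets by brute force and then keeps those with no frequent superset. It is not FP-Growth: exhaustive enumeration is swapped in only because it is short, and it would not scale to realistic data. The function names and sample data are hypothetical, the support of a set is taken as the number of users whose high-frequency characteristic values contain the whole set, and a set is treated as frequent when its support is at least the preset support degree; both are assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for sets of >= 2 items meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(2, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return result

def maximal(itemsets):
    """Keep only the sets that have no proper superset among the input."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]

# Hypothetical high-frequency characteristic values of five users.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"C", "D"},
    {"C", "D"},
]
frequent = frequent_itemsets(transactions, min_support=2)
max_sets = maximal(frequent)
```

In this sample the frequent sets are {A, B}, {A, C}, {B, C}, {C, D}, and {A, B, C}; only {A, B, C} and {C, D} have no frequent superset, so A, B, C, and D would be the maximum frequent characteristic values.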
And S404, matching the characteristic value of each user to be analyzed with the maximum frequent characteristic value in the maximum frequent item set to obtain the maximum frequent characteristic value of each user to be analyzed.
In the embodiment of the application, the characteristic value of each user to be analyzed is matched with the maximum frequent characteristic value in the maximum frequent item set, and the characteristic value successfully matched with the maximum frequent characteristic value in the maximum frequent item set in each user to be analyzed is determined as the maximum frequent characteristic value of each corresponding user to be analyzed.
And step S406, determining the low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed.
In the embodiment of the present application, the low-frequency maximum frequent eigenvalue can be determined in the following two ways, where:
In the first mode, the number of occurrences of each maximum frequent characteristic value among the plurality of users to be analyzed is counted according to the maximum frequent characteristic values of each user to be analyzed, and the low-frequency maximum frequent characteristic values are determined from these counts according to the following determination rule: if the number of occurrences of a maximum frequent characteristic value among the plurality of users to be analyzed satisfies P2_i ≥ S_i, the maximum frequent characteristic value is a low-frequency maximum frequent characteristic value, where S_i is the number of occurrences of the i-th maximum frequent characteristic value among the plurality of users to be analyzed, and P2_i is the preset number of occurrences corresponding to the i-th maximum frequent characteristic value. The specific value of P2_i may be determined according to the characteristic to which the i-th maximum frequent characteristic value belongs; that is, for different characteristics, the specific value of P2_i also differs.
In a second mode, as shown in fig. 5, the method may include the following steps:
step S502, constructing a second bipartite graph according to the maximum frequent eigenvalue of each user to be analyzed, wherein the second bipartite graph comprises nodes corresponding to each user to be analyzed, nodes corresponding to each maximum frequent eigenvalue, and edges between the nodes corresponding to each user to be analyzed and the nodes corresponding to the maximum frequent eigenvalue of each user to be analyzed.
In the embodiment of the application, each user to be analyzed is converted into a node, each user to be analyzed corresponds to only one node, the maximum frequent characteristic value of each user to be analyzed is converted into a node, each maximum frequent characteristic value corresponds to only one node, the node corresponding to each user to be analyzed is located on one side of the second bipartite graph, the node corresponding to each maximum frequent characteristic value is located on the other side of the second bipartite graph, and an edge is added between the node corresponding to each user to be analyzed and the node corresponding to the maximum frequent characteristic value thereof, so that the second bipartite graph is constructed.
Step S504, the degree of the node corresponding to each maximum frequent characteristic value is obtained in the second bipartite graph, and the low-frequency maximum frequent characteristic value is determined in the maximum frequent characteristic values according to the degree of the node corresponding to each maximum frequent characteristic value.
The process of determining the low-frequency maximum frequent characteristic values may include determining them according to the degrees of the nodes corresponding to the respective maximum frequent characteristic values in combination with a determination rule, where the determination rule may be: if the degree of the node corresponding to a maximum frequent characteristic value satisfies L2_i ≥ degree(V_i), the maximum frequent characteristic value is a low-frequency maximum frequent characteristic value, where degree(V_i) is the degree of the node corresponding to the i-th maximum frequent characteristic value V_i, and L2_i is the preset degree corresponding to V_i. L2_i may be determined according to the characteristic to which V_i belongs; that is, for different characteristics, the specific value of L2_i also differs.
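A minimal Python sketch of steps S502 and S504 follows. The second bipartite graph is represented as a mapping from each maximum frequent characteristic value to the set of users connected to it, so the degree of a value node is the size of that set. The function name, the sample data, and the single threshold l2 shared by all values are assumptions for illustration.

```python
from collections import defaultdict

def low_freq_max_frequent(user_max_values, l2):
    """Return the maximum frequent values whose node degree is at most l2."""
    neighbours = defaultdict(set)  # value node -> adjacent user nodes
    for user, values in user_max_values.items():
        for value in values:
            neighbours[value].add(user)
    # Rule from the text: L2 >= degree(V).
    return {value for value, users in neighbours.items() if len(users) <= l2}

# Hypothetical maximum frequent characteristic values per user.
user_max_values = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "B"},
    "u4": {"C", "D"},
    "u5": {"C", "D"},
}
low_max = low_freq_max_frequent(user_max_values, l2=3)
```

Here C is connected to four users and is excluded, while A, B (degree 3) and D (degree 2) are kept.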
And S108, constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of the users to be analyzed, and defining the weight of the edge in the target bipartite graph.
In the embodiment of the application, the maximum frequent characteristic value of the low frequency is matched with the characteristic value of each user to be analyzed, and the characteristic value of each user to be analyzed, which is successfully matched with the maximum frequent characteristic value of the low frequency, is determined as the maximum frequent characteristic value of the low frequency of each corresponding user to be analyzed. The process of constructing the target bipartite graph according to the low-frequency maximum frequent feature value of each user to be analyzed and the low-frequency feature value of each user to be analyzed obtained in step S104 may include: and respectively converting each user to be analyzed into nodes, converting each low-frequency characteristic value into nodes, converting each low-frequency maximum frequent characteristic value into nodes, adding edges between the node corresponding to each user to be analyzed and the node corresponding to the low-frequency characteristic value, and adding edges between the node corresponding to each user to be analyzed and the node corresponding to the low-frequency maximum frequent characteristic value, so as to complete the construction of the target bipartite graph.
Defining the weights of the edges in the target bipartite graph may include: defining the weight of the edge between the node corresponding to each user to be analyzed and the node corresponding to each of its low-frequency characteristic values, and defining the weight of the edge between the node corresponding to each user to be analyzed and the node corresponding to each of its low-frequency maximum frequent characteristic values. For the former, the weight of each low-frequency characteristic value is determined according to the characteristic to which it belongs. Specifically, the higher the weight of a low-frequency characteristic value, the higher the probability that a user to be analyzed having that low-frequency characteristic value belongs to an abnormal group; the lower the weight, the lower that probability. After the weight of each low-frequency characteristic value is determined, the weight of every edge connected to the node corresponding to that low-frequency characteristic value is set to the weight of that low-frequency characteristic value.
For example, if the low-frequency feature values include frequent occurrence (feature values corresponding to occurrence features) and no business (feature values corresponding to occupation features), and the weight of the frequent occurrence is 0.5 and the weight of the no business is 0.1, the weights of the edges connected to the nodes corresponding to the frequent occurrence are all set to 0.5, and the weights of the edges connected to the nodes corresponding to the no business are all set to 0.1. Similarly, defining the weight of the edge between the node corresponding to each user to be analyzed in the target bipartite graph and the node corresponding to the low-frequency maximum frequent eigenvalue of the node may include: the weight of each low-frequency maximum frequent feature value is determined according to the feature to which each low-frequency maximum frequent feature value belongs, and specifically, the higher the weight of the low-frequency maximum frequent feature value is, the higher the probability that the user to be analyzed including the low-frequency maximum frequent feature value is an abnormal group is, the lower the weight of the low-frequency maximum frequent feature value is, and the lower the probability that the user to be analyzed including the low-frequency maximum frequent feature value is an abnormal group is. And setting the weight of the edge connected with the node corresponding to each low-frequency maximum frequent characteristic value as the weight of each corresponding low-frequency maximum frequent characteristic value.
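The construction of the weighted target bipartite graph can be sketched in Python as a mapping from (user, value) edges to weights. The function name, the users, and the low-frequency maximum frequent value "M1" with weight 0.8 are hypothetical; the weights 0.5 and 0.1 are the ones used in the example above.

```python
def build_target_graph(user_low, user_low_max, value_weight):
    """Return {(user, value): weight} for the target bipartite graph."""
    edges = {}
    for per_user in (user_low, user_low_max):
        for user, values in per_user.items():
            for value in values:
                # Every edge touching a value node carries that value's weight.
                edges[(user, value)] = value_weight[value]
    return edges

# Weights per value; "M1" and its weight are assumed for illustration.
value_weight = {"frequent occurrence": 0.5, "no business": 0.1, "M1": 0.8}

# Hypothetical low-frequency values and low-frequency maximum frequent values.
user_low = {"u1": {"frequent occurrence"},
            "u2": {"frequent occurrence", "no business"}}
user_low_max = {"u1": {"M1"}, "u2": {"M1"}}

target_edges = build_target_graph(user_low, user_low_max, value_weight)
```

Because the weight is a property of the value node, all edges incident to the same value node end up with the same weight, as the text requires.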
Step S110, determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
In the embodiment of the present application, the abnormal group in the user to be analyzed may be determined in the following two ways, where:
in the first mode, edges with weights smaller than a first preset weight are deleted in the target bipartite graph to obtain a bipartite graph to be clustered, a connected algorithm is adopted for the bipartite graph to be clustered to obtain at least one maximum connected subgraph, and a user to be analyzed corresponding to a node in each maximum connected subgraph is determined as an abnormal group.
In the embodiment of the present application, the specific value of the first preset weight may be set as required, and this exemplary embodiment places no particular limit on it. The weight of each edge in the target bipartite graph is compared in turn with the first preset weight; an edge is deleted from the target bipartite graph if its weight is less than the first preset weight and is retained if its weight is not less than the first preset weight. The target bipartite graph that remains after the edges with weights less than the first preset weight have been removed is determined as the bipartite graph to be clustered. A connectivity algorithm is then applied to the bipartite graph to be clustered to obtain at least one maximum connected subgraph. In each maximum connected subgraph, the nodes corresponding to low-frequency characteristic values and the nodes corresponding to low-frequency maximum frequent characteristic values are removed, the users to be analyzed corresponding to the remaining nodes are collected to obtain the set of users to be analyzed corresponding to that maximum connected subgraph, and the set of users to be analyzed corresponding to each maximum connected subgraph is determined as an abnormal group.
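The first mode can be sketched in Python with a breadth-first search over the pruned graph. The text does not name a specific connectivity algorithm, so breadth-first search is an assumed choice; the function name and the sample edges are also hypothetical. In this sketch a user whose edges are all deleted does not appear in any group.

```python
from collections import defaultdict, deque

def abnormal_groups(edges, min_weight):
    """Prune light edges, then return the user set of each connected component."""
    adj = defaultdict(set)
    for (user, value), weight in edges.items():
        if weight >= min_weight:  # keep edges not less than the preset weight
            u, v = ("user", user), ("value", value)
            adj[u].add(v)
            adj[v].add(u)
    seen, groups = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        # Drop the value nodes and keep only the users of this component.
        groups.append({name for kind, name in component if kind == "user"})
    return groups

# Hypothetical weighted edges of a target bipartite graph.
edges = {
    ("u1", "x"): 0.5, ("u2", "x"): 0.5,
    ("u3", "y"): 0.8, ("u4", "y"): 0.8,
    ("u2", "z"): 0.1, ("u3", "z"): 0.1,
}
groups = abnormal_groups(edges, min_weight=0.3)
```

With the threshold at 0.3 the two edges through z are deleted, so the users split into the groups {u1, u2} and {u3, u4}; with a threshold of 0.05 all four users would fall into a single group.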
And secondly, deleting edges with weights smaller than the first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, dividing nodes in the bipartite graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to the nodes in each node set as an abnormal group.
In the embodiment of the present application, the principle of deleting the edges whose weights are smaller than the first preset weight from the target bipartite graph to obtain the bipartite graph to be clustered is the same as in the first mode, so it is not described again here. The community discovery algorithm may be, for example, the Louvain algorithm, and this exemplary embodiment places no particular limit on it. After the nodes in the bipartite graph to be clustered are divided by the community discovery algorithm into a plurality of node sets, the nodes corresponding to low-frequency characteristic values and the nodes corresponding to low-frequency maximum frequent characteristic values are first removed from each node set; the users to be analyzed corresponding to the remaining nodes in each node set are then collected to obtain the set of users to be analyzed corresponding to that node set, and the set of users to be analyzed corresponding to each node set is determined as an abnormal group.
Further, after the abnormal groups are obtained, in order to further verify the abnormal groups and further improve the accuracy of the abnormal group identification, the total number of the users to be analyzed in each abnormal group can be obtained, the abnormal groups with the total number of the users to be analyzed being less than the preset number are screened out from the abnormal groups, and the remaining abnormal groups are determined as the finally identified abnormal groups; the modularity of the maximum connected subgraph corresponding to each abnormal group can be calculated, the modularity of the maximum connected subgraph corresponding to each abnormal group is determined as the modularity of the corresponding abnormal group, the abnormal groups with the modularity smaller than the preset modularity are screened out from the abnormal groups, and the rest abnormal groups are determined as the finally identified abnormal groups. It should be noted that the above two verification methods are only exemplary and are not intended to limit the present invention, and the abnormal group may also be verified by analyzing the service characteristics of each user to be analyzed in the abnormal group.
In order to more accurately cluster the users to be analyzed to obtain a more accurate abnormal group, as shown in fig. 6, determining the abnormal group in the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the multiple users to be analyzed obtained by graph clustering on the target bipartite graph may include the following steps:
step S602, calculating the weight between any two users to be analyzed according to the weight of the edge in the target bipartite graph.
In the embodiment of the application, the nodes corresponding to low-frequency characteristic values and the nodes corresponding to low-frequency maximum frequent characteristic values that are connected in the target bipartite graph to the nodes of both of any two users to be analyzed are obtained and determined as target nodes. The weight between the two users to be analyzed is then calculated from the weights of the edges between the node corresponding to either of the two users and each target node, in combination with the following formula:
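[Formula (image BDA0001948905550000171, not reproduced in this text): weight(e) expressed in terms of w(item_i) for i = 1, ..., j]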
where weight(e) is the weight between any two users to be analyzed, j is the total number of target nodes, and w(item_i) is the weight of the edge between the i-th target node item_i and the node corresponding to either of the two users to be analyzed.
Step S604, converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge of any two nodes as the weight between any two corresponding users to be analyzed so as to construct a target cluster map.
In the embodiment of the application, each user to be analyzed is converted into a node, that is, each user to be analyzed corresponds to only one node; an edge is set between any two nodes, and the weight between any two users to be analyzed is set as the weight of the edge between the two nodes corresponding to those two users, so that the construction of the target cluster map is completed. As can be seen from the above, through steps S602 and S604 the target bipartite graph, which includes the nodes corresponding to the users to be analyzed, the nodes corresponding to the low-frequency characteristic values, and the nodes corresponding to the low-frequency maximum frequent characteristic values, is converted into a target cluster map that includes only the nodes corresponding to the users to be analyzed.
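Steps S602 and S604 can be sketched in Python as follows. The formula for weight(e) is not recoverable from the text here, so this sketch assumes that the weight between two users is the sum of w(item_i) over their j shared target nodes; that choice is an assumption and other combinations are possible. The function name and the sample data are hypothetical.

```python
from itertools import combinations

def build_cluster_graph(user_values, value_weight):
    """Return {(user_a, user_b): weight} for every pair of users."""
    graph = {}
    for a, b in combinations(sorted(user_values), 2):
        # Target nodes: value nodes connected to both users.
        shared = user_values[a] & user_values[b]
        # ASSUMPTION: weight(e) is the sum of w(item_i) over shared nodes.
        graph[(a, b)] = sum(value_weight[v] for v in shared)
    return graph

# Hypothetical low-frequency and low-frequency maximum frequent values per user.
user_values = {"u1": {"x", "m"}, "u2": {"x", "m", "z"}, "u3": {"z"}}
value_weight = {"x": 0.5, "m": 0.8, "z": 0.1}

cluster_graph = build_cluster_graph(user_values, value_weight)
```

An edge is created for every pair of users, as the text describes, so pairs with no shared target node receive weight 0 under the assumed sum and would be removed by any positive threshold in the later pruning step.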
And step S606, determining abnormal groups in the users to be analyzed according to clustering results of the users to be analyzed, which are obtained by carrying out graph clustering on the target clustering graph.
In the embodiment of the present application, the abnormal population may be determined in two ways, wherein:
in the first mode, edges with weights smaller than a second preset weight are deleted in a target clustering graph to obtain a graph to be clustered, a connected algorithm is adopted for the graph to be clustered to obtain at least one maximum connected subgraph, and users to be analyzed corresponding to nodes in each maximum connected subgraph are respectively determined as an abnormal group.
In the embodiment of the present application, the specific value of the second preset weight may be set as required, and this exemplary embodiment places no particular limit on it. The weight of each edge in the target cluster map is compared with the second preset weight, and the edges whose weights are smaller than the second preset weight are deleted from the target cluster map, so that the target cluster map is converted into the to-be-clustered map. A connectivity algorithm is then applied to the to-be-clustered map to obtain at least one maximum connected subgraph. The users to be analyzed corresponding to the nodes in each maximum connected subgraph are collected to obtain the set of users to be analyzed corresponding to that maximum connected subgraph, and the set of users to be analyzed corresponding to each maximum connected subgraph is determined as an abnormal group.
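For the user-only target cluster map, the pruning and connectivity steps can be sketched in Python with a union-find structure. Union-find is an assumed choice, since the text does not name a specific connectivity algorithm, and the function name and sample weights are hypothetical. Unlike a traversal that starts from surviving edges, this sketch also returns users left isolated after pruning as single-member groups.

```python
def groups_from_cluster_graph(graph, min_weight):
    """Delete edges lighter than min_weight and return connected user sets."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for (a, b), weight in graph.items():
        find(a)
        find(b)  # register both users even if the edge is deleted
        if weight >= min_weight:
            parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())

# Hypothetical pairwise weights between four users to be analyzed.
graph = {
    ("u1", "u2"): 1.3, ("u1", "u3"): 0.0, ("u1", "u4"): 0.0,
    ("u2", "u3"): 0.1, ("u2", "u4"): 0.0, ("u3", "u4"): 0.9,
}
groups = groups_from_cluster_graph(graph, min_weight=0.5)
```

Single-member groups produced this way can be discarded afterwards by a minimum group size check.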
And secondly, deleting edges with the weight smaller than a second preset weight in the target clustering graph to obtain a to-be-clustered graph, dividing the to-be-clustered graph through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to each node set as an abnormal group respectively.
In the embodiment of the application, the second preset weight has already been described above and is therefore not described again here. The weight of each edge in the target cluster map is compared with the second preset weight, and the edges whose weights are smaller than the second preset weight are deleted from the target cluster map, so that the target cluster map is converted into the to-be-clustered map. The community discovery algorithm may be, for example, the Louvain algorithm, and this exemplary embodiment places no particular limit on it. After the nodes in the to-be-clustered map are divided by the community discovery algorithm into a plurality of node sets, the users to be analyzed corresponding to the nodes in each node set are collected to obtain the set of users to be analyzed corresponding to that node set, and the set of users to be analyzed corresponding to each node set is determined as an abnormal group.
According to the method, the weight between any two users to be analyzed is calculated according to the weight of the edge in the target bipartite graph, and the target cluster graph is constructed according to the weight between any two users to be analyzed, so that the target bipartite graph is converted into the target cluster graph, the target cluster graph can reflect the relation between the users to be analyzed more accurately and more intuitively, and the abnormal group obtained according to the target cluster graph is more accurate.
The above two ways of determining the abnormal group are exemplary and are not intended to limit the present invention.
Further, after the abnormal groups are obtained, they can be verified to further improve identification accuracy. In one verification method, the total number of users to be analyzed in each abnormal group is obtained, the abnormal groups whose total number of users to be analyzed is less than a preset number are removed, and the remaining abnormal groups are determined as the finally identified abnormal groups. In another verification method, the modularity of the maximum connected subgraph corresponding to each abnormal group is calculated and taken as the modularity of that abnormal group, the abnormal groups whose modularity is smaller than a preset modularity are removed, and the remaining abnormal groups are determined as the finally identified abnormal groups. It should be noted that these two verification methods are only exemplary and are not intended to limit the present invention; the abnormal groups may also be verified by analyzing the service characteristics of each user to be analyzed in the abnormal group.
In conclusion, the maximum frequent item set is mined by applying a preset frequent item set mining strategy to the high-frequency characteristic values of each user to be analyzed, and the low-frequency maximum frequent characteristic values in the maximum frequent item set are obtained. In this way the behavior sequence of each user to be analyzed is mined, so that abnormal groups are identified more accurately. In addition, an abnormal group is obtained using only the following steps: acquiring the low-frequency characteristic values and the low-frequency maximum frequent characteristic values of each user to be analyzed, constructing the target bipartite graph from them, defining the weights of the edges in the target bipartite graph, and performing graph clustering on the target bipartite graph according to those weights. The steps are therefore simple and easy to implement.
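The two verification methods can be sketched as below. Several points are assumptions of this sketch rather than statements of the text: the per-group modularity is taken to be the standard per-community term L_c / m - (d_c / (2m))^2 computed against the graph that is passed in, the graph is assumed to contain at least one edge, and the two checks are applied together although the text presents them as alternatives. All names are hypothetical.

```python
def group_modularity(group, edge_weights):
    """Modularity term of one group: L_c / m - (d_c / (2 m)) ** 2."""
    m = sum(edge_weights.values())  # total edge weight of the graph
    inside = sum(w for (a, b), w in edge_weights.items()
                 if a in group and b in group)
    # Weighted degree sum of the group: an edge counts once per endpoint inside.
    degree = sum(w * ((a in group) + (b in group))
                 for (a, b), w in edge_weights.items())
    return inside / m - (degree / (2 * m)) ** 2

def verify_groups(groups, edge_weights, preset_number, preset_modularity):
    """Keep groups that pass both the size check and the modularity check."""
    return [g for g in groups
            if len(g) >= preset_number
            and group_modularity(g, edge_weights) >= preset_modularity]

edge_weights = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0,
                ("d", "e"): 1.0, ("d", "f"): 1.0, ("e", "f"): 1.0,
                ("c", "d"): 1.0, ("g", "h"): 1.0}
groups = [{"a", "b", "c"}, {"d", "e", "f"}, {"g", "h"}]
kept = verify_groups(groups, edge_weights, preset_number=3, preset_modularity=0.15)
print(kept)
```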
Corresponding to the above abnormal group identification method, based on the same technical concept, an abnormal group identification apparatus is further provided in the embodiment of the present application, and fig. 7 is a schematic composition diagram of the abnormal group identification apparatus provided in the embodiment of the present application, where the apparatus is configured to execute the above abnormal group identification method, and as shown in fig. 7, the apparatus 700 may include: an obtaining module 701, a determining module 702, a mining module 703, a constructing module 704, and a clustering module 705, wherein:
an obtaining module 701, configured to obtain a feature value of each user to be analyzed in a plurality of users to be analyzed;
a determining module 702, configured to determine a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
the mining module 703 is configured to mine a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquire a low-frequency maximum frequent characteristic value in the maximum frequent item set;
a constructing module 704, configured to construct a target bipartite graph according to the low-frequency most frequent feature value and the low-frequency feature value in the feature values of the users to be analyzed, and define a weight of an edge in the target bipartite graph;
the clustering module 705 is configured to determine an abnormal group of the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the multiple users to be analyzed, which are obtained by performing graph clustering on the target bipartite graph.
Optionally, the obtaining module 701 may include:
an acquisition unit configured to acquire original personal data of the plurality of users to be analyzed;
and the discretization unit is used for discretizing the original personal data of the users to be analyzed to obtain the characteristic value of each user to be analyzed.
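A minimal sketch of what the discretization unit might do is given below. The field names, the "field:value" encoding, and the fixed bucket width of 10 for numeric fields are illustrative assumptions; the text does not specify a discretization scheme at this point.

```python
def discretize_user(raw):
    """Turn one raw record into a set of "field:value" feature values."""
    features = set()
    for field, value in raw.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            features.add(f"{field}:{value}")       # categorical: keep as is
        else:
            low = int(value // 10) * 10            # numeric: 10-wide bucket
            features.add(f"{field}:{low}-{low + 9}")
    return features

raw_user = {"age": 27, "city": "Hangzhou", "device": "D-001"}
feature_values = discretize_user(raw_user)
print(sorted(feature_values))
```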
Optionally, the determining module 702 may include:
the first construction unit is used for constructing a first bipartite graph according to the characteristic values of the users to be analyzed, wherein the first bipartite graph comprises nodes corresponding to the users to be analyzed, nodes corresponding to the characteristic values, and edges between the nodes corresponding to the users to be analyzed and the nodes corresponding to the characteristic values;
a first determining unit, configured to obtain degrees of nodes corresponding to the feature values in the first bipartite graph, and determine a high-frequency feature value and a low-frequency feature value in the feature values according to the degrees of the nodes corresponding to the feature values;
and the second determining unit is used for determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed according to the high-frequency characteristic value and the low-frequency characteristic value.
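The three units of the determining module can be illustrated with the sketch below. In the first bipartite graph the degree of a feature-value node equals the number of users holding that value, so a counter is enough. The rule that a degree at or above a threshold means "high-frequency" is an assumption of this sketch, as are the function name and the data layout.

```python
from collections import Counter

def split_by_degree(user_features, degree_threshold):
    """user_features: dict mapping user -> set of feature values."""
    # Degree of a feature-value node = number of users linked to it.
    degree = Counter(f for feats in user_features.values() for f in feats)
    high = {f for f, d in degree.items() if d >= degree_threshold}
    low = set(degree) - high
    # Per user: (high-frequency feature values, low-frequency feature values).
    per_user = {u: (feats & high, feats & low)
                for u, feats in user_features.items()}
    return high, low, per_user

user_features = {
    "u1": {"city:Hangzhou", "dev:A"},
    "u2": {"city:Hangzhou", "dev:A"},
    "u3": {"city:Hangzhou", "dev:B"},
}
high, low, per_user = split_by_degree(user_features, degree_threshold=3)
print(high, low, per_user["u3"])
```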
Optionally, the mining module 703 may include:
the mining unit is used for mining frequent multi-item sets whose support degree meets a preset support degree according to the high-frequency characteristic value of each user to be analyzed and by using an FP-Growth method, and determining a maximum frequent item set among the frequent multi-item sets;
the matching unit is used for matching the characteristic value of each user to be analyzed with the maximum frequent characteristic value in the maximum frequent item set to obtain the maximum frequent characteristic value of each user to be analyzed;
and the third determining unit is used for determining the low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed.
Optionally, the third determining unit may include:
a constructing subunit, configured to construct a second bipartite graph according to the maximum frequent eigenvalue of each user to be analyzed, where the second bipartite graph includes nodes corresponding to each user to be analyzed, nodes corresponding to each maximum frequent eigenvalue, and edges between the nodes corresponding to each user to be analyzed and the nodes corresponding to the maximum frequent eigenvalue thereof;
and the determining subunit is configured to obtain degrees of nodes corresponding to the maximum frequent feature values in the second bipartite graph, and determine the low-frequency maximum frequent feature value in the maximum frequent feature values according to the degrees of the nodes corresponding to the maximum frequent feature values.
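The mining, matching, and low-frequency selection steps can be sketched together. FP-Growth itself is not reproduced here: a brute-force enumeration stands in for it, which yields the same maximal frequent itemsets on small inputs but does not scale. Treating each maximal itemset as one composite feature value, and treating a degree at or below `max_degree` as "low-frequency", are assumptions of this sketch.

```python
from collections import Counter
from itertools import combinations

def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force search for maximal frequent itemsets with at least two items."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(2, len(items) + 1):
        found = [frozenset(c) for c in combinations(items, size)
                 if sum(1 for t in transactions if t.issuperset(c)) >= min_support]
        if not found:
            break  # no frequent set of this size, so none of a larger size
        frequent.extend(found)
    # An itemset is maximal when no frequent proper superset exists.
    return [s for s in frequent if not any(s < other for other in frequent)]

def match_users(user_features, maximal_sets):
    """Give each user the maximal itemsets fully contained in its features."""
    return {u: [s for s in maximal_sets if s <= feats]
            for u, feats in user_features.items()}

def low_frequency_sets(matched, max_degree):
    """Degree of an itemset node = number of users matched to it."""
    degree = Counter(s for sets in matched.values() for s in sets)
    return {s for s, d in degree.items() if d <= max_degree}

user_features = {
    "u1": {"city:HZ", "os:android", "hour:02"},
    "u2": {"city:HZ", "os:android", "hour:02"},
    "u3": {"city:HZ", "os:android", "hour:03"},
    "u4": {"city:HZ", "os:android", "hour:03"},
    "u5": {"city:HZ", "os:android", "hour:02"},
}
maximal = maximal_frequent_itemsets(list(user_features.values()), min_support=2)
matched = match_users(user_features, maximal)
low = low_frequency_sets(matched, max_degree=2)
print(maximal, low)
```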
Optionally, the clustering module 705 may include:
the first clustering unit is used for deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the bipartite graph to be clustered, and determining users to be analyzed corresponding to nodes in each maximum connected subgraph as an abnormal group; or
And the second clustering unit is used for deleting edges with weights smaller than the first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, dividing nodes in the bipartite graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to the nodes in each node set as one abnormal group.
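The first clustering unit, which works directly on the target bipartite graph, can be sketched as follows. The difference from clustering a user-only graph is that the components contain both user nodes and feature-value nodes, and only the user nodes are reported. A breadth-first search is used as the connectivity algorithm; the names and the data layout are assumptions of this sketch.

```python
from collections import defaultdict, deque

def user_groups_from_bipartite(bipartite_weights, first_preset_weight):
    """bipartite_weights: dict mapping (user, feature_value) -> edge weight."""
    adjacency = defaultdict(set)
    for (user, feature), w in bipartite_weights.items():
        if w >= first_preset_weight:
            # Tag nodes by side so a user id never collides with a feature value.
            adjacency[("user", user)].add(("feature", feature))
            adjacency[("feature", feature)].add(("user", user))

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        # Report only the users of this connected subgraph.
        groups.append({name for side, name in component if side == "user"})
    return groups

bipartite_weights = {
    ("u1", "dev:A"): 1.0, ("u2", "dev:A"): 1.0, ("u2", "ip:9"): 1.0,
    ("u3", "ip:9"): 0.2, ("u3", "dev:B"): 1.0, ("u4", "dev:B"): 1.0,
}
groups = user_groups_from_bipartite(bipartite_weights, 0.5)
print(groups)
```

The light edge between `u3` and `ip:9` is deleted, so `u3` ends up grouped with `u4` through `dev:B` rather than with `u2`.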
Optionally, the clustering module 705 may include:
the calculating unit is used for calculating the weight between any two users to be analyzed according to the weight of the edge in the target bipartite graph;
the second construction unit is used for converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge of any two nodes as the weight between any two corresponding users to be analyzed so as to construct a target cluster map;
and the third clustering unit is used for determining an abnormal group in the users to be analyzed according to the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target clustering graph.
Optionally, the third clustering unit may include:
the first clustering subunit is used for deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the graph to be clustered, and respectively determining users to be analyzed corresponding to nodes in each maximum connected subgraph as one abnormal group; or
And the second clustering subunit is used for deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, dividing the graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and respectively determining users to be analyzed corresponding to each node set as one abnormal group.
According to the abnormal group identification device in the embodiment of the application, the maximum frequent item set is mined by a frequent item set mining strategy preset for the high-frequency characteristic value of each user to be analyzed, and the low-frequency maximum frequent characteristic value in the maximum frequent item set is obtained, so that the behavior sequence of the user to be analyzed is mined, and the identification of the abnormal group is more accurate; in addition, the abnormal group is obtained only by acquiring the low-frequency characteristic value and the low-frequency maximum frequent characteristic value of each user to be analyzed, constructing the target bipartite graph according to the low-frequency characteristic value and the low-frequency maximum frequent characteristic value of each user to be analyzed, defining the weight of the edge in the target bipartite graph, and carrying out graph clustering on the target bipartite graph according to the weight of the edge in the target bipartite graph, so that the steps are simple and easy to implement.
Based on the same technical concept, the embodiment of the present application further provides an abnormal group identification device, and fig. 8 is a schematic structural diagram of the abnormal group identification device provided in the embodiment of the present application, and the device is used for executing the abnormal group identification method.
As shown in fig. 8, the abnormal group identification device may vary considerably depending on its configuration or performance, and may include one or more processors 801 and a memory 802, where one or more applications or data may be stored in the memory 802. The memory 802 may be transient storage or persistent storage. The application program stored in the memory 802 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the abnormal group identification device. Still further, the processor 801 may be configured to communicate with the memory 802 and to execute, on the abnormal group identification device, the series of computer-executable instructions in the memory 802. The abnormal group identification device may also include one or more power supplies 803, one or more wired or wireless network interfaces 804, one or more input-output interfaces 805, one or more keyboards 806, and the like.
In one particular embodiment, the abnormal group identification device comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and may comprise one or more modules, each module may comprise a series of computer-executable instructions for the abnormal group identification device, and the one or more programs are configured to be executed by one or more processors and include computer-executable instructions for:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent eigenvalue and the low-frequency eigenvalue in the eigenvalue of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
Optionally, when the computer-executable instructions are executed, obtaining the feature value of each user to be analyzed in the plurality of users to be analyzed includes:
acquiring original personal data of the users to be analyzed;
discretizing the original personal data of the users to be analyzed to obtain the characteristic value of each user to be analyzed.
Optionally, when the computer-executable instructions are executed, determining a high-frequency feature value and a low-frequency feature value in the feature values of the users to be analyzed includes:
constructing a first bipartite graph according to the characteristic values of the users to be analyzed, wherein the first bipartite graph comprises nodes corresponding to the users to be analyzed, nodes corresponding to the characteristic values, and edges between the nodes corresponding to the users to be analyzed and the nodes corresponding to the characteristic values;
acquiring degrees of nodes corresponding to the characteristic values in the first bipartite graph, and determining high-frequency characteristic values and low-frequency characteristic values in the characteristic values according to the degrees of the nodes corresponding to the characteristic values;
and determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed according to the high-frequency characteristic value and the low-frequency characteristic value.
Optionally, when the computer-executable instructions are executed, mining a maximum frequent item set according to the high-frequency feature value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent feature value in the maximum frequent item set, includes:
mining frequent multi-item sets whose support degree meets a preset support degree according to the high-frequency characteristic value of each user to be analyzed and by using an FP-Growth method, and determining a maximum frequent item set among the frequent multi-item sets;
matching the characteristic value of each user to be analyzed with the maximum frequent characteristic value in the maximum frequent item set to obtain the maximum frequent characteristic value of each user to be analyzed;
and determining a low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed.
Optionally, when the computer-executable instructions are executed, determining a low-frequency maximum frequent feature value in the maximum frequent feature values of the users to be analyzed includes:
constructing a second bipartite graph according to the maximum frequent eigenvalue of each user to be analyzed, wherein the second bipartite graph comprises nodes corresponding to each user to be analyzed, nodes corresponding to each maximum frequent eigenvalue, and edges between the nodes corresponding to each user to be analyzed and the nodes corresponding to the maximum frequent eigenvalue of each user to be analyzed;
and acquiring the degree of the node corresponding to each maximum frequent eigenvalue in the second bipartite graph, and determining the low-frequency maximum frequent eigenvalue in the maximum frequent eigenvalues according to the degree of the node corresponding to each maximum frequent eigenvalue.
Optionally, when the computer-executable instructions are executed, determining an abnormal group in the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the multiple users to be analyzed, which are obtained by graph clustering on the target bipartite graph, includes:
deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm on the bipartite graph to be clustered, and determining a user to be analyzed corresponding to a node in each maximum connected subgraph as an abnormal group; or
Deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, dividing nodes in the bipartite graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to the nodes in each node set as one abnormal group.
Optionally, when the computer-executable instructions are executed, determining an abnormal group in the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the multiple users to be analyzed, which are obtained by graph clustering on the target bipartite graph, includes:
calculating the weight between any two users to be analyzed according to the weight of the edge in the target bipartite graph;
converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge of any two nodes as the weight between any two corresponding users to be analyzed so as to construct a target cluster map;
and determining abnormal groups in the users to be analyzed according to clustering results of the users to be analyzed, which are obtained by carrying out graph clustering on the target clustering graph.
Optionally, when the computer-executable instructions are executed, determining an abnormal group in the users to be analyzed according to the clustering results of the multiple users to be analyzed, which are obtained by graph clustering on the target cluster graph, includes:
deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the graph to be clustered, and respectively determining users to be analyzed corresponding to nodes in each maximum connected subgraph as one abnormal group; or
Deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, dividing the graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to each node set as one abnormal group respectively.
According to the abnormal group identification device in the embodiment of the application, the maximum frequent item set is mined by a frequent item set mining strategy preset for the high-frequency characteristic value of each user to be analyzed, and the low-frequency maximum frequent characteristic value in the maximum frequent item set is obtained, so that the behavior sequence of the user to be analyzed is mined, and the identification of the abnormal group is more accurate; in addition, the abnormal group is obtained only by acquiring the low-frequency characteristic value and the low-frequency maximum frequent characteristic value of each user to be analyzed, constructing the target bipartite graph according to the low-frequency characteristic value and the low-frequency maximum frequent characteristic value of each user to be analyzed, defining the weight of the edge in the target bipartite graph, and carrying out graph clustering on the target bipartite graph according to the weight of the edge in the target bipartite graph, so that the steps are simple and easy to implement.
Corresponding to the above abnormal group identification method, and based on the same technical concept, an embodiment of the present application further provides a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB disk, an optical disk, a hard disk, or the like, and when the computer-executable instructions stored in the storage medium are executed by a processor, the following processes may be implemented:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent eigenvalue and the low-frequency eigenvalue in the eigenvalue of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, obtaining a feature value of each user to be analyzed in a plurality of users to be analyzed includes:
acquiring original personal data of the users to be analyzed;
discretizing the original personal data of the users to be analyzed to obtain the characteristic value of each user to be analyzed.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, determining a high-frequency feature value and a low-frequency feature value in the feature values of the users to be analyzed includes:
constructing a first bipartite graph according to the characteristic values of the users to be analyzed, wherein the first bipartite graph comprises nodes corresponding to the users to be analyzed, nodes corresponding to the characteristic values, and edges between the nodes corresponding to the users to be analyzed and the nodes corresponding to the characteristic values;
acquiring degrees of nodes corresponding to the characteristic values in the first bipartite graph, and determining high-frequency characteristic values and low-frequency characteristic values in the characteristic values according to the degrees of the nodes corresponding to the characteristic values;
and determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed according to the high-frequency characteristic value and the low-frequency characteristic value.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, mining a maximum frequent item set according to the high-frequency feature value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent feature value in the maximum frequent item set, includes:
mining frequent multi-item sets whose support degree meets a preset support degree according to the high-frequency characteristic value of each user to be analyzed and by using an FP-Growth method, and determining a maximum frequent item set among the frequent multi-item sets;
matching the characteristic value of each user to be analyzed with the maximum frequent characteristic value in the maximum frequent item set to obtain the maximum frequent characteristic value of each user to be analyzed;
and determining a low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, determining a low-frequency maximum frequent feature value in the maximum frequent feature values of the users to be analyzed includes:
constructing a second bipartite graph according to the maximum frequent eigenvalue of each user to be analyzed, wherein the second bipartite graph comprises nodes corresponding to each user to be analyzed, nodes corresponding to each maximum frequent eigenvalue, and edges between the nodes corresponding to each user to be analyzed and the nodes corresponding to the maximum frequent eigenvalue of each user to be analyzed;
and acquiring the degree of the node corresponding to each maximum frequent eigenvalue in the second bipartite graph, and determining the low-frequency maximum frequent eigenvalue in the maximum frequent eigenvalues according to the degree of the node corresponding to each maximum frequent eigenvalue.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, determining an abnormal group in the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the multiple users to be analyzed, which are obtained by graph clustering on the target bipartite graph, includes:
deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm on the bipartite graph to be clustered, and determining a user to be analyzed corresponding to a node in each maximum connected subgraph as an abnormal group; or
Deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, dividing nodes in the bipartite graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to the nodes in each node set as one abnormal group.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, determining an abnormal group in the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the multiple users to be analyzed, which are obtained by graph clustering on the target bipartite graph, includes:
calculating the weight between any two users to be analyzed according to the weight of the edge in the target bipartite graph;
converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge of any two nodes as the weight between any two corresponding users to be analyzed so as to construct a target cluster map;
and determining abnormal groups in the users to be analyzed according to clustering results of the users to be analyzed, which are obtained by carrying out graph clustering on the target clustering graph.
Optionally, when the computer-executable instructions stored in the storage medium are executed by the processor, determining an abnormal group in the users to be analyzed according to the clustering results of the multiple users to be analyzed, which are obtained by graph clustering on the target cluster graph, includes:
deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the graph to be clustered, and respectively determining users to be analyzed corresponding to nodes in each maximum connected subgraph as one abnormal group; or
Deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, dividing the graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to each node set as one abnormal group respectively.
When the computer executable instructions stored in the storage medium in the embodiment of the application are executed by the processor, the maximum frequent item set is mined by performing a preset frequent item set mining strategy on the high-frequency characteristic value of each user to be analyzed, and the low-frequency maximum frequent characteristic value in the maximum frequent item set is obtained, so that the behavior sequence of the user to be analyzed is mined, and the identification of abnormal groups is more accurate; in addition, the abnormal group is obtained only by acquiring the low-frequency characteristic value and the low-frequency maximum frequent characteristic value of each user to be analyzed, constructing the target bipartite graph according to the low-frequency characteristic value and the low-frequency maximum frequent characteristic value of each user to be analyzed, defining the weight of the edge in the target bipartite graph, and carrying out graph clustering on the target bipartite graph according to the weight of the edge in the target bipartite graph, so that the steps are simple and easy to implement.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). As technology has developed, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be realized with hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a PLD through his or her own programming, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. The designer only needs to write the program with hardware logic editing software.
A controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller (PLC), or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A controller may also be implemented as part of the control logic of a memory.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, with each unit described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner. For parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant points, reference may be made to the corresponding description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (11)

1. An abnormal group identification method, comprising:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
2. The abnormal group identification method according to claim 1, wherein the acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed comprises:
acquiring original personal data of the users to be analyzed;
discretizing the original personal data of the users to be analyzed to obtain the characteristic value of each user to be analyzed.
3. The abnormal group identification method according to claim 1, wherein the determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed comprises:
constructing a first bipartite graph according to the characteristic values of the users to be analyzed, wherein the first bipartite graph comprises nodes corresponding to the users to be analyzed, nodes corresponding to the characteristic values, and edges between the nodes corresponding to the users to be analyzed and the nodes corresponding to the characteristic values;
acquiring degrees of nodes corresponding to the characteristic values in the first bipartite graph, and determining high-frequency characteristic values and low-frequency characteristic values in the characteristic values according to the degrees of the nodes corresponding to the characteristic values;
and determining, according to the high-frequency characteristic values and the low-frequency characteristic values so determined, a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of each user to be analyzed.
4. The abnormal group identification method according to claim 1, wherein the mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and the obtaining the low-frequency maximum frequent characteristic value in the maximum frequent item set comprises:
mining, by means of an FP-Growth method and according to the high-frequency characteristic value of each user to be analyzed, frequent multi-item sets whose support degree meets a preset support degree, and determining a maximum frequent item set among the frequent multi-item sets;
matching the characteristic value of each user to be analyzed with the maximum frequent characteristic value in the maximum frequent item set to obtain the maximum frequent characteristic value of each user to be analyzed;
and determining a low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed.
5. The abnormal group identification method according to claim 4, wherein the determining a low-frequency maximum frequent characteristic value in the maximum frequent characteristic values of the users to be analyzed comprises:
constructing a second bipartite graph according to the maximum frequent characteristic value of each user to be analyzed, wherein the second bipartite graph comprises nodes corresponding to each user to be analyzed, nodes corresponding to each maximum frequent characteristic value, and edges between the nodes corresponding to each user to be analyzed and the nodes corresponding to the maximum frequent characteristic value of each user to be analyzed;
and acquiring the degree of the node corresponding to each maximum frequent characteristic value in the second bipartite graph, and determining the low-frequency maximum frequent characteristic value in the maximum frequent characteristic values according to the degree of the node corresponding to each maximum frequent characteristic value.
6. The abnormal group identification method according to claim 1, wherein the determining the abnormal group of the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the users to be analyzed, which are obtained by graph clustering on the target bipartite graph, comprises:
deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm on the bipartite graph to be clustered, and determining a user to be analyzed corresponding to a node in each maximum connected subgraph as an abnormal group; or
deleting edges with weights smaller than a first preset weight in the target bipartite graph to obtain a bipartite graph to be clustered, dividing nodes in the bipartite graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to the nodes in each node set as one abnormal group.
7. The abnormal group identification method according to claim 1, wherein the determining the abnormal group of the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering results of the users to be analyzed, which are obtained by graph clustering on the target bipartite graph, comprises:
calculating the weight between any two users to be analyzed according to the weight of the edge in the target bipartite graph;
converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge between any two nodes as the weight between the two corresponding users to be analyzed, so as to construct a target clustering graph;
and determining abnormal groups in the users to be analyzed according to clustering results of the users to be analyzed, which are obtained by carrying out graph clustering on the target clustering graph.
8. The abnormal group identification method according to claim 7, wherein the determining the abnormal group of the users to be analyzed according to the clustering result of the users to be analyzed, which is obtained by performing graph clustering on the target clustering graph, comprises:
deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, obtaining at least one maximum connected subgraph by adopting a connected algorithm for the graph to be clustered, and respectively determining users to be analyzed corresponding to nodes in each maximum connected subgraph as one abnormal group; or
deleting edges with weights smaller than a second preset weight in the target clustering graph to obtain a graph to be clustered, dividing the graph to be clustered through a community discovery algorithm to obtain a plurality of node sets, and determining users to be analyzed corresponding to each node set as one abnormal group respectively.
9. An abnormal group identification apparatus, comprising:
the acquisition module is used for acquiring the characteristic value of each user to be analyzed in a plurality of users to be analyzed;
the determining module is used for determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
the mining module is used for mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
the construction module is used for constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of the users to be analyzed and defining the weight of the edge in the target bipartite graph;
and the clustering module is used for determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
10. An abnormal group identification apparatus, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
11. A storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed, perform the steps of:
acquiring a characteristic value of each user to be analyzed in a plurality of users to be analyzed;
determining a high-frequency characteristic value and a low-frequency characteristic value in the characteristic values of the users to be analyzed;
mining a maximum frequent item set according to the high-frequency characteristic value of each user to be analyzed and a preset frequent item set mining strategy, and acquiring a low-frequency maximum frequent characteristic value in the maximum frequent item set;
constructing a target bipartite graph according to the low-frequency maximum frequent characteristic value and the low-frequency characteristic value in the characteristic values of each user to be analyzed, and defining the weight of an edge in the target bipartite graph;
and determining an abnormal group in the users to be analyzed according to the weight of the edges in the target bipartite graph and the clustering result of the users to be analyzed, which is obtained by carrying out graph clustering on the target bipartite graph.
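As an illustration only, two of the claimed steps can be sketched in Python: the degree-based split of characteristic values into high-frequency and low-frequency sets (claim 3), and the deletion of light edges followed by connected-component extraction (the first alternative of claim 6). This is a minimal sketch under stated assumptions, not the claimed implementation. The function names split_by_degree and abnormal_groups, the parameters min_degree, min_weight, and min_size, and all sample data are hypothetical; the claims do not specify how edge weights are computed, so the weights below are made-up inputs.

```python
from collections import defaultdict


def split_by_degree(user_features, min_degree):
    """Split characteristic values into high- and low-frequency sets.

    The degree of a value node in the user/value bipartite graph is the
    number of distinct users connected to it. min_degree is an assumed
    threshold; the claims only say the split depends on the degree.
    """
    degree = defaultdict(int)
    for values in user_features.values():
        for value in set(values):  # count each user at most once per value
            degree[value] += 1
    high = {v for v, d in degree.items() if d >= min_degree}
    low = set(degree) - high
    return high, low


def abnormal_groups(weighted_edges, min_weight, min_size=2):
    """Drop light edges, then return the users of each connected component.

    weighted_edges: iterable of (user, characteristic_value, weight).
    min_size is an assumption of this sketch, not taken from the claims.
    """
    adjacency = defaultdict(set)
    for user, value, weight in weighted_edges:
        if weight >= min_weight:  # delete edges lighter than the threshold
            u, v = ("user", user), ("value", value)
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, groups = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, users = [start], set()
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            if node[0] == "user":
                users.add(node[1])
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(users) >= min_size:
            groups.append(users)
    return groups


# Hypothetical sample data: four users and their discretized values.
features = {
    "u1": ["ip:1.1.1.1", "dev:A", "city:X"],
    "u2": ["ip:1.1.1.1", "dev:A", "city:X"],
    "u3": ["ip:2.2.2.2", "dev:B", "city:X"],
    "u4": ["ip:3.3.3.3", "dev:C", "city:X"],
}
high, low = split_by_degree(features, min_degree=3)

# Hypothetical weighted edges of a target bipartite graph.
edges = [
    ("u1", "ip:1.1.1.1", 1.0),
    ("u2", "ip:1.1.1.1", 1.0),
    ("u3", "ip:2.2.2.2", 1.0),
    ("u1", "dev:A", 0.2),
    ("u4", "dev:A", 0.2),
]
groups = abnormal_groups(edges, min_weight=0.5)
```

With this sample data, only "city:X" is shared by at least three users and is therefore high-frequency, and the single group returned contains u1 and u2, because the light "dev:A" edges are deleted before the components are computed. A community discovery algorithm, the second alternative in claim 6, could replace the connected-component search without changing the edge-deletion step.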
CN201910045152.6A 2019-01-17 2019-01-17 Abnormal group identification method and device Active CN109948641B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910045152.6A CN109948641B (en) 2019-01-17 2019-01-17 Abnormal group identification method and device
TW108130766A TWI718643B (en) 2019-01-17 2019-08-28 Method and device for identifying abnormal groups
PCT/CN2019/126030 WO2020147488A1 (en) 2019-01-17 2019-12-17 Method and device for identifying irregular group

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910045152.6A CN109948641B (en) 2019-01-17 2019-01-17 Abnormal group identification method and device

Publications (2)

Publication Number Publication Date
CN109948641A (en) 2019-06-28
CN109948641B (en) 2020-08-04

Family

ID=67006647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910045152.6A Active CN109948641B (en) 2019-01-17 2019-01-17 Abnormal group identification method and device

Country Status (3)

Country Link
CN (1) CN109948641B (en)
TW (1) TWI718643B (en)
WO (1) WO2020147488A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948641B (en) * 2019-01-17 2020-08-04 阿里巴巴集团控股有限公司 Abnormal group identification method and device
CN110602101B (en) * 2019-09-16 2021-01-01 北京三快在线科技有限公司 Method, device, equipment and storage medium for determining network abnormal group
CN110609783B (en) * 2019-09-24 2023-08-04 京东科技控股股份有限公司 Method and device for identifying abnormal behavior user
CN110880040A (en) * 2019-11-08 2020-03-13 支付宝(杭州)信息技术有限公司 Method and system for automatically generating cumulative features
CN111160917A (en) * 2019-12-18 2020-05-15 北京三快在线科技有限公司 Object state detection method and device, electronic equipment and readable storage medium
CN111371767B (en) * 2020-02-20 2022-05-13 深圳市腾讯计算机系统有限公司 Malicious account identification method, malicious account identification device, medium and electronic device
CN111770047B (en) * 2020-05-07 2022-09-23 拉扎斯网络科技(上海)有限公司 Abnormal group detection method, device and equipment
CN111931048B (en) * 2020-07-31 2022-07-08 平安科技(深圳)有限公司 Artificial intelligence-based black product account detection method and related device
CN112529639A (en) * 2020-12-23 2021-03-19 中国银联股份有限公司 Abnormal account identification method, device, equipment and medium
CN112581062A (en) * 2020-12-25 2021-03-30 同方威视科技江苏有限公司 Express mail receiving and dispatching organization discovery method based on relationship mining and related equipment
CN112968870A (en) * 2021-01-29 2021-06-15 国家计算机网络与信息安全管理中心 Network group discovery method based on frequent itemset
CN113761080A (en) * 2021-04-01 2021-12-07 京东城市(北京)数字科技有限公司 Community division method, device, equipment and storage medium
CN114117418B (en) * 2021-11-03 2023-03-14 中国电信股份有限公司 Method, system, device and storage medium for detecting abnormal account based on community
CN114662110B (en) * 2022-05-18 2022-09-02 杭州海康威视数字技术股份有限公司 Website detection method and device and electronic equipment
CN116244650B (en) * 2023-05-12 2023-10-03 北京富算科技有限公司 Feature binning method, device, electronic equipment and computer readable storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8719190B2 (en) * 2007-07-13 2014-05-06 International Business Machines Corporation Detecting anomalous process behavior
US8625904B2 (en) * 2011-08-30 2014-01-07 Intellectual Ventures Fund 83 Llc Detecting recurring themes in consumer image collections
CN103812872B (en) * 2014-02-28 2016-11-23 中国科学院信息工程研究所 A kind of network navy behavioral value method and system based on a Dirichlet process mixture
CN103927398B (en) * 2014-05-07 2016-12-28 中国人民解放军信息工程大学 The microblogging excavated based on maximum frequent itemsets propagandizes colony's discovery method
TW201612790A (en) * 2014-09-29 2016-04-01 Chunghwa Telecom Co Ltd Method of increasing effectiveness of information security risk assessment and risk recognition
CN104573116B (en) * 2015-02-05 2017-11-03 哈尔滨工业大学 The traffic abnormity recognition methods excavated based on GPS data from taxi
CN105681312B (en) * 2016-01-28 2019-03-05 李青山 A kind of mobile Internet abnormal user detection method based on frequent item set mining
CN105959372B (en) * 2016-05-06 2019-05-14 华南理工大学 A kind of Internet user's data analysis method based on mobile application
CN107870934B (en) * 2016-09-27 2021-07-20 武汉安天信息技术有限责任公司 App user clustering method and device
CN107391548B (en) * 2017-04-06 2020-08-04 华东师范大学 Mobile application market examination user group detection method and system
CN107332931A (en) * 2017-08-07 2017-11-07 合肥工业大学 The recognition methods of waterborne troops of machine type forum and device
CN109948641B (en) * 2019-01-17 2020-08-04 阿里巴巴集团控股有限公司 Abnormal group identification method and device

Also Published As

Publication number Publication date
CN109948641A (en) 2019-06-28
TWI718643B (en) 2021-02-11
TW202029079A (en) 2020-08-01
WO2020147488A1 (en) 2020-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201014

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201014

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman, British Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.
