CN109948641A

CN109948641A - Abnormal group identification method and device

Info

Publication number: CN109948641A
Application number: CN201910045152.6A
Authority: CN
Inventors: 苗加成; 章鹏; 杨程远; 向彪; 严欢
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2019-01-17
Filing date: 2019-01-17
Publication date: 2019-06-28
Anticipated expiration: 2039-01-17
Also published as: TWI718643B; WO2020147488A1; TW202029079A; CN109948641B

Abstract

Embodiments of the present application provide an abnormal group identification method and device. The method includes: obtaining the feature values of each user to be analyzed among multiple users to be analyzed; determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed; mining maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, and obtaining the low-frequency maximal frequent feature values in the maximal frequent itemsets; constructing a target bipartite graph according to the low-frequency maximal frequent feature values and the low-frequency feature values among the feature values of each user to be analyzed, and defining the weights of the edges in the target bipartite graph; and determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph. The embodiments of the present application improve the accuracy of abnormal group identification, and the steps are simple and easy to implement.

Description

Abnormal group identification method and device

Technical field

This specification relates to the field of computer technology, and in particular to an abnormal group identification method and device.

Background

Currently, in various risk-control scenarios (such as spam registration, marketing fraud, card and account theft, and insurance fraud), gang crime is an increasingly evident trend. It seriously disrupts normal business order and causes huge losses to merchants. How to identify such gangs (i.e., abnormal groups) has therefore become one of the major problems merchants face in their operations.

In common abnormal group identification approaches, identification accuracy is low because labeled samples are lacking and the methods used by abnormal groups vary widely.

Summary of the invention

The purpose of one or more embodiments of this specification is to provide an abnormal group identification method and device, so as to solve the problem of low abnormal group identification accuracy in the prior art.

To solve the above technical problem, one or more embodiments of this specification are implemented as follows:

In one aspect, one or more embodiments of this specification provide an abnormal group identification method, comprising:

Obtaining the feature values of each user to be analyzed among multiple users to be analyzed;

Determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed;

Mining maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, and obtaining the low-frequency maximal frequent feature values in the maximal frequent itemsets;

Constructing a target bipartite graph according to the low-frequency maximal frequent feature values and the low-frequency feature values among the feature values of each user to be analyzed, and defining the weights of the edges in the target bipartite graph;

Determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph.

Optionally, obtaining the feature values of each user to be analyzed among multiple users to be analyzed includes:

Obtaining the original personal data of the multiple users to be analyzed;

Discretizing the original personal data of the multiple users to be analyzed to obtain the feature values of each user to be analyzed.

Optionally, determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed includes:

Constructing a first bipartite graph according to the feature values of each user to be analyzed, where the first bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each feature value, and edges between the node of each user to be analyzed and the nodes of that user's feature values;

Obtaining the degree of the node corresponding to each feature value in the first bipartite graph, and determining the high-frequency feature values and low-frequency feature values among the feature values according to the degree of the node corresponding to each feature value;

Determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed according to the high-frequency feature values and the low-frequency feature values.

Optionally, mining maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, and obtaining the low-frequency maximal frequent feature values in the maximal frequent itemsets, includes:

Mining, according to the high-frequency feature values of each user to be analyzed and using the FP-Growth method, the frequent multi-item sets whose support meets a preset support, and determining the maximal frequent itemsets among the frequent multi-item sets;

Matching the feature values of each user to be analyzed against the maximal frequent feature values in the maximal frequent itemsets to obtain the maximal frequent feature values of each user to be analyzed;

Determining the low-frequency maximal frequent feature values among the maximal frequent feature values of the users to be analyzed.
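By way of non-limiting illustration, the notion of a maximal frequent itemset can be sketched as follows. The sketch deliberately substitutes brute-force enumeration for FP-Growth so that it stays self-contained; the function names `frequent_itemsets` and `maximal_itemsets`, the absolute-count form of `min_support`, and the sample data are hypothetical and do not come from this specification.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for FP-Growth: return every itemset that is
    contained in at least min_support transactions (absolute count)."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        found = [frozenset(c) for c in combinations(items, size)
                 if sum(1 for t in transactions if set(c) <= t) >= min_support]
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
        frequent.extend(found)
    return frequent

def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical high-frequency feature values of four users to be analyzed.
transactions = [{"A", "B", "D"}, {"A", "B", "D"}, {"A", "B"}, {"C", "D"}]
frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)
```

In this hypothetical data set, {A, B, D} is contained in two transactions and every other frequent itemset is a subset of it, so it is the only maximal frequent itemset.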

Optionally, determining the low-frequency maximal frequent feature values among the maximal frequent feature values of the users to be analyzed includes:

Constructing a second bipartite graph according to the maximal frequent feature values of each user to be analyzed, where the second bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each maximal frequent feature value, and edges between the node of each user to be analyzed and the nodes of that user's maximal frequent feature values;

Obtaining the degree of the node corresponding to each maximal frequent feature value in the second bipartite graph, and determining the low-frequency maximal frequent feature values among the maximal frequent feature values according to the degree of the node corresponding to each maximal frequent feature value.

Optionally, determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph includes:

Deleting from the target bipartite graph the edges whose weight is less than a first preset weight to obtain a bipartite graph to be clustered, applying a connected-component algorithm to the bipartite graph to be clustered to obtain at least one maximal connected subgraph, and determining the users to be analyzed that correspond to the nodes in each maximal connected subgraph as an abnormal group; or

Deleting from the target bipartite graph the edges whose weight is less than the first preset weight to obtain a bipartite graph to be clustered, partitioning the nodes in the bipartite graph to be clustered with a community detection algorithm to obtain multiple node sets, and determining the users to be analyzed that correspond to the nodes in each node set as an abnormal group.
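By way of non-limiting illustration, the first alternative (edge deletion followed by a connected-component search) might be sketched as follows. The function name `abnormal_groups`, the `(user, feature value, weight)` edge format, the sample weights, and the threshold 0.5 are hypothetical; how the edge weights are defined is not assumed here.

```python
from collections import defaultdict

def abnormal_groups(weighted_edges, min_weight):
    """weighted_edges: iterable of (user, feature_value, weight) tuples.
    Drops edges lighter than min_weight, then returns the users found in
    each connected component of the remaining bipartite graph."""
    adj = defaultdict(set)
    for user, value, weight in weighted_edges:
        if weight >= min_weight:  # edges below the threshold are deleted
            u, f = ("user", user), ("feature", value)
            adj[u].add(f)
            adj[f].add(u)

    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        # Depth-first search over one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(name for kind, name in component if kind == "user"))
    return groups

# Hypothetical weighted edges of a target bipartite graph.
edges = [("u1", "ip_x", 0.9), ("u2", "ip_x", 0.8),
         ("u3", "dev_y", 0.7), ("u4", "dev_y", 0.2), ("u3", "ip_x", 0.1)]
groups = abnormal_groups(edges, min_weight=0.5)
```

In this sketch a user whose edges are all deleted (here `u4`) no longer appears in any component. Whether single-user components should be reported as groups is a design choice left open here.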

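Optionally, determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph includes:

Calculating the weight between any two of the users to be analyzed according to the weights of the edges in the target bipartite graph;

Converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge between any two nodes to the weight between the corresponding two users to be analyzed, so as to construct a target clustering graph;

Determining the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target clustering graph.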

Optionally, determining the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target clustering graph includes:

Deleting from the target clustering graph the edges whose weight is less than a second preset weight to obtain a graph to be clustered, applying a connected-component algorithm to the graph to be clustered to obtain at least one maximal connected subgraph, and determining the users to be analyzed that correspond to the nodes in each maximal connected subgraph as an abnormal group; or

Deleting from the target clustering graph the edges whose weight is less than the second preset weight to obtain a graph to be clustered, partitioning the graph to be clustered with a community detection algorithm to obtain multiple node sets, and determining the users to be analyzed that correspond to each node set as an abnormal group.

In another aspect, one or more embodiments of this specification provide an abnormal group identification device, comprising:

An acquisition module, configured to obtain the feature values of each user to be analyzed among multiple users to be analyzed;

A determination module, configured to determine the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed;

A mining module, configured to mine maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, and to obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets;

A construction module, configured to construct a target bipartite graph according to the low-frequency maximal frequent feature values and the low-frequency feature values among the feature values of each user to be analyzed, and to define the weights of the edges in the target bipartite graph;

A clustering module, configured to determine the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph.

Optionally, the acquisition module includes:

An acquisition unit, configured to obtain the original personal data of the multiple users to be analyzed;

A discretization unit, configured to discretize the original personal data of the multiple users to be analyzed to obtain the feature values of each user to be analyzed.

Optionally, the determination module includes:

A first construction unit, configured to construct a first bipartite graph according to the feature values of each user to be analyzed, where the first bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each feature value, and edges between the node of each user to be analyzed and the nodes of that user's feature values;

A first determination unit, configured to obtain the degree of the node corresponding to each feature value in the first bipartite graph, and to determine the high-frequency feature values and low-frequency feature values among the feature values according to the degree of the node corresponding to each feature value;

A second determination unit, configured to determine the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed according to the high-frequency feature values and the low-frequency feature values.

Optionally, the mining module includes:

A mining unit, configured to mine, according to the high-frequency feature values of each user to be analyzed and using the FP-Growth method, the frequent multi-item sets whose support meets a preset support, and to determine the maximal frequent itemsets among the frequent multi-item sets;

A matching unit, configured to match the feature values of each user to be analyzed against the maximal frequent feature values in the maximal frequent itemsets to obtain the maximal frequent feature values of each user to be analyzed;

A third determination unit, configured to determine the low-frequency maximal frequent feature values among the maximal frequent feature values of the users to be analyzed.

Optionally, the third determination unit includes:

A construction subunit, configured to construct a second bipartite graph according to the maximal frequent feature values of each user to be analyzed, where the second bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each maximal frequent feature value, and edges between the node of each user to be analyzed and the nodes of that user's maximal frequent feature values;

A determination subunit, configured to obtain the degree of the node corresponding to each maximal frequent feature value in the second bipartite graph, and to determine the low-frequency maximal frequent feature values among the maximal frequent feature values according to the degree of the node corresponding to each maximal frequent feature value.

Optionally, the clustering module includes:

A first clustering unit, configured to delete from the target bipartite graph the edges whose weight is less than a first preset weight to obtain a bipartite graph to be clustered, to apply a connected-component algorithm to the bipartite graph to be clustered to obtain at least one maximal connected subgraph, and to determine the users to be analyzed that correspond to the nodes in each maximal connected subgraph as an abnormal group; or

A second clustering unit, configured to delete from the target bipartite graph the edges whose weight is less than the first preset weight to obtain a bipartite graph to be clustered, to partition the nodes in the bipartite graph to be clustered with a community detection algorithm to obtain multiple node sets, and to determine the users to be analyzed that correspond to the nodes in each node set as an abnormal group.

Optionally, the clustering module includes:

A calculation unit, configured to calculate the weight between any two of the users to be analyzed according to the weights of the edges in the target bipartite graph;

A second construction unit, configured to convert each user to be analyzed into a node, to set an edge between any two nodes, and to set the weight of the edge between any two nodes to the weight between the corresponding two users to be analyzed, so as to construct a target clustering graph;

A third clustering unit, configured to determine the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target clustering graph.

Optionally, the third clustering unit includes:

A first clustering subunit, configured to delete from the target clustering graph the edges whose weight is less than a second preset weight to obtain a graph to be clustered, to apply a connected-component algorithm to the graph to be clustered to obtain at least one maximal connected subgraph, and to determine the users to be analyzed that correspond to the nodes in each maximal connected subgraph as an abnormal group; or

A second clustering subunit, configured to delete from the target clustering graph the edges whose weight is less than the second preset weight to obtain a graph to be clustered, to partition the graph to be clustered with a community detection algorithm to obtain multiple node sets, and to determine the users to be analyzed that correspond to each node set as an abnormal group.

In yet another aspect, one or more embodiments of this specification provide an abnormal group identification apparatus, comprising:

A processor; and

A memory arranged to store computer-executable instructions which, when executed, cause the processor to:

In yet another aspect, one or more embodiments of this specification provide a storage medium for storing computer-executable instructions which, when executed, implement the following process:

With the technical solution of one or more embodiments of this specification, the high-frequency and low-frequency feature values among the feature values of each user to be analyzed are determined; maximal frequent itemsets are mined from the high-frequency feature values of each user to be analyzed according to a preset frequent itemset mining strategy, and the low-frequency maximal frequent feature values in the maximal frequent itemsets are obtained; a target bipartite graph is constructed from the low-frequency feature values and the low-frequency maximal frequent feature values of each user to be analyzed, and weights are set for its edges; and the abnormal groups among the users to be analyzed are determined according to the edge weights and by clustering the target bipartite graph. On the one hand, mining maximal frequent itemsets from the high-frequency feature values and obtaining the low-frequency maximal frequent feature values captures the behavior sequences of the users to be analyzed, which makes abnormal group identification more accurate. On the other hand, the abnormal groups are obtained merely by obtaining the low-frequency feature values and low-frequency maximal frequent feature values of each user to be analyzed, constructing the target bipartite graph from them, defining the edge weights, and performing graph clustering on the target bipartite graph according to those weights, so the steps are simple and easy to implement.

Brief description of the drawings

To describe the technical solutions in one or more embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the embodiments recorded in one or more embodiments of this specification, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the abnormal group identification method provided by an embodiment of the present application;

Fig. 2 is a schematic flowchart of determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed, provided by an embodiment of the present application;

Fig. 3 is a schematic diagram of the first bipartite graph provided by an embodiment of the present application;

Fig. 4 is a first schematic flowchart of obtaining the low-frequency maximal frequent feature values, provided by an embodiment of the present application;

Fig. 5 is a second schematic flowchart of obtaining the low-frequency maximal frequent feature values, provided by an embodiment of the present application;

Fig. 6 is a schematic flowchart of determining the abnormal groups, provided by an embodiment of the present application;

Fig. 7 is a schematic diagram of the composition of the abnormal group identification device provided by an embodiment of the present application;

Fig. 8 is a schematic structural diagram of the abnormal group identification apparatus provided by an embodiment of the present application.

Detailed description of the embodiments

One or more embodiments of this specification provide an abnormal group identification method and device, so as to solve the problem of low abnormal group identification accuracy in the prior art.

To enable those skilled in the art to better understand the technical solutions in one or more embodiments of this specification, the technical solutions are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art based on one or more embodiments of this specification without creative effort shall fall within the scope of protection of one or more embodiments of this specification.

Fig. 1 is a schematic flowchart of the abnormal group identification method provided by an embodiment of the present application. The method may be executed by, for example, a terminal device or a server; the terminal device may be, for example, a personal computer, and the server may be a single independent server or a server cluster composed of multiple servers, which is not specifically limited in this exemplary embodiment. As shown in Fig. 1, the method may include the following steps:

Step S102: obtain the feature values of each user to be analyzed among multiple users to be analyzed.

In the embodiments of the present application, the original personal data of the multiple users to be analyzed may first be obtained, and the original personal data may then be discretized to obtain the feature values of each user to be analyzed. Obtaining the original personal data of the multiple users to be analyzed may include: obtaining the original personal data of each user to be analyzed through an acquisition module, and aggregating them to obtain the original personal data of the multiple users to be analyzed. The original personal data of each user to be analyzed may include basic personal data, behavioral data, device data, and the like, which is not specifically limited in this exemplary embodiment. The basic personal data may include data for features such as age, gender, occupation, income, education, place of origin, contact information, and account, which is not specifically limited in this exemplary embodiment. For example, basic personal data may include: female (gender), 18 years old (age), bachelor's degree (education), lawyer (occupation), Shaanxi (place of origin). The behavioral data may include data for multiple behavioral features; specifically, the behavioral features covered may be set according to the application scenario. For example, in an insurance scenario, behavioral data may include: insured on 2018.10.03 (time of insurance purchase), accident insurance (insurance type), claim event on 2019.2.1 (claim feature), and so on. The device data may include, for example, data for features such as device model, device home location, common addresses where the device is used, and frequency of changing devices, which is not specifically limited in this exemplary embodiment.

Discretizing the original personal data of the multiple users to be analyzed to obtain the feature values of each user to be analyzed may include: analyzing the distribution of the data of each feature according to the data of each feature in the original personal data of the multiple users to be analyzed; binning the data of each feature according to that distribution and a binning method; determining the interval corresponding to each bin as the feature value of the data of the corresponding feature; and determining the feature values of each user to be analyzed according to the feature values of the data of each feature after binning together with the original personal data of each user to be analyzed.

The binning method may be determined by the nature of the feature. For continuous features (such as age, income, and transaction amount), binning methods such as equal-frequency or equal-width binning may be chosen according to business experience and the data distribution. For categorical features (such as gender, education, and occupation), the data may be binned by the specific categories of the feature. For text features (such as address), binning may be performed by grouping texts with a consistent pattern into one class.
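By way of non-limiting illustration, equal-width and equal-frequency binning of a continuous feature might be sketched as follows. The function names, the `age_bin0`-style labels, the bin counts, and the sample ages are hypothetical and are not prescribed by this specification.

```python
from bisect import bisect_right

def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * k for k in range(1, n_bins)]

def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so each bin holds roughly the same count."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]

def bin_label(feature, value, edges):
    """Map a raw value to a discrete feature value such as 'age_bin1'."""
    return f"{feature}_bin{bisect_right(edges, value)}"

# Hypothetical ages of eight users to be analyzed.
ages = [18, 22, 25, 31, 40, 47, 52, 60]
width_edges = equal_width_edges(ages, 3)
freq_edges = equal_frequency_edges(ages, 4)
labels = [bin_label("age", a, width_edges) for a in ages]
```

Each resulting label then plays the role of one discrete feature value of the user, alongside the categories of categorical features.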

It should be noted that each user to be analyzed may be labeled with a unique identifier so that the users to be analyzed can be distinguished. The unique identifier may be, for example, an ID card number, a military officer ID, or an account id, which is not specifically limited in this exemplary embodiment.

Step S104: determine the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed.

In this exemplary embodiment, the high-frequency feature values and low-frequency feature values among the feature values of the users to be analyzed may be determined in either of the following two modes:

Mode one: count the number of times each feature value occurs among the feature values of the multiple users to be analyzed, and determine the high-frequency feature values and low-frequency feature values according to the following rule. If the number of occurrences of a feature value satisfies $T2_i \ge X_i > T1_i$, the feature value is a low-frequency feature value, where $X_i$ is the number of times the i-th feature value occurs among the feature values of the multiple users to be analyzed, $T2_i$ is the second preset occurrence count for the i-th feature value, $T1_i$ is the first preset occurrence count for the i-th feature value, and $T2_i > T1_i$. The specific values of $T2_i$ and $T1_i$ may be determined by the feature to which the i-th feature value belongs; that is, different features have different values of $T2_i$ and $T1_i$. If the number of occurrences of a feature value satisfies $T3_i \ge X_i > T2_i$, the feature value is a high-frequency feature value, where $T3_i$ is the third preset occurrence count for the i-th feature value and $T3_i > T2_i$. The specific values of $T2_i$ and $T3_i$ may likewise be determined by the feature to which the i-th feature value belongs; that is, different features have different values of $T2_i$ and $T3_i$.

After the high-frequency feature values and low-frequency feature values are determined, they may each be matched against the feature values of each user to be analyzed to obtain the high-frequency feature values and low-frequency feature values of that user. For example, suppose the high-frequency feature values are A, B, and D, and the low-frequency feature values are C and E. If the feature values of a user to be analyzed are A, B, C, and E, then the high-frequency feature values of that user are A and B, and the low-frequency feature values of that user are C and E. If the feature values of a user to be analyzed are A, E, and F, then the high-frequency feature value of that user is A, and the low-frequency feature value of that user is E.
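By way of non-limiting illustration, mode one might be sketched as follows. As a simplifying assumption, a single global triple of thresholds `t1 < t2 < t3` stands in for the per-feature values T1_i, T2_i, and T3_i; the function names and the sample data are hypothetical.

```python
from collections import Counter

def classify_by_count(user_features, t1, t2, t3):
    """Count how many users carry each feature value, then split the values
    into low-frequency (t1 < X <= t2) and high-frequency (t2 < X <= t3)."""
    counts = Counter(v for feats in user_features.values() for v in set(feats))
    low = {v for v, x in counts.items() if t1 < x <= t2}
    high = {v for v, x in counts.items() if t2 < x <= t3}
    return low, high

def split_user_features(features, low, high):
    """Match one user's feature values against the two global sets."""
    return ([v for v in features if v in high],
            [v for v in features if v in low])

# Hypothetical feature values of five users to be analyzed.
user_features = {
    1: ["A", "B", "D"],
    2: ["B", "C", "F"],
    3: ["A", "C", "D", "F"],
    4: ["B", "D", "F"],
    5: ["C", "D", "E", "F"],
}
low, high = classify_by_count(user_features, t1=1, t2=3, t3=5)
user5_high, user5_low = split_user_features(user_features[5], low, high)
```

With these hypothetical thresholds, a value that occurs only once (E) falls into neither set, which reflects the strict lower bound in the rule.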

Mode two, as shown in Fig. 2, may comprise steps of:

Step S202, construct a first bipartite graph from the feature values of each user to be analyzed, where the first bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each feature value, and edges between the node of each user to be analyzed and the nodes of that user's feature values.

In this embodiment of the present application, each user to be analyzed is converted into a node, and each user corresponds to exactly one node. The feature values of each user are likewise converted into nodes, and each feature value corresponds to exactly one node: if a node for a feature value already exists during the conversion, that node is reused and no further node is created for the value. The nodes corresponding to users lie on one side of the first bipartite graph, the nodes corresponding to feature values lie on the other side, and an edge is added between the node of each user and the node of each of that user's feature values. For example, suppose there are five users to be analyzed, the first through the fifth. The feature values of the first user are A, B and D; those of the second user are B, C and F; those of the third user are A, C, D and F; those of the fourth user are B, D and F; and those of the fifth user are C, D, E and F. The first bipartite graph built from these values is shown in Fig. 3. Nodes 1 to 5, corresponding to the first through fifth users, are on the left side of Fig. 3; the nodes corresponding to feature values A, B, C, D, E and F are on the right side of Fig. 3; and an edge connects the node of each user to the node of each of its feature values.
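The construction in step S202 can be sketched as follows. This is a minimal illustration only: the function name build_first_bipartite_graph and the dictionary representation are assumptions made for the sketch, and the data are the five users of the Fig. 3 example.

```python
from collections import defaultdict

# Feature values of the five users from the Fig. 3 example.
user_features = {
    1: {"A", "B", "D"},
    2: {"B", "C", "F"},
    3: {"A", "C", "D", "F"},
    4: {"B", "D", "F"},
    5: {"C", "D", "E", "F"},
}


def build_first_bipartite_graph(user_features):
    """Return the edge list and a feature value -> users index (hypothetical helper)."""
    edges = []
    feature_users = defaultdict(set)
    for user, values in user_features.items():
        for value in sorted(values):
            # The feature value node is created on first use and reused afterwards.
            feature_users[value].add(user)
            edges.append((user, value))
    return edges, dict(feature_users)


edges, feature_users = build_first_bipartite_graph(user_features)
print(len(edges))             # number of user-to-feature-value edges
print(sorted(feature_users))  # one node per distinct feature value
```

Keying the index by feature value is what guarantees that a value shared by several users maps to a single node.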

Step S204, obtain the degree of the node corresponding to each feature value in the first bipartite graph, and determine the high-frequency feature values and low-frequency feature values among the feature values according to those degrees.

In this embodiment of the present application, the degree of the node corresponding to a feature value is the number of edges connected to that node. For example, in Fig. 3 the node corresponding to feature value A has degree 2, the node corresponding to feature value B has degree 3, the node corresponding to feature value C has degree 3, the node corresponding to feature value D has degree 4, the node corresponding to feature value E has degree 1, and the node corresponding to feature value F has degree 4.

Determining the high-frequency feature values and low-frequency feature values according to the degree of the node corresponding to each feature value may include applying the following rules to each feature value. If the degree of the node corresponding to a feature value satisfies K2_i ≥ degree(V_i) > 1, the feature value is a low-frequency feature value. If the degree satisfies K1_i ≥ degree(V_i) > K2_i, the feature value is a high-frequency feature value. Here degree(V_i) is the degree of the node corresponding to the i-th feature value V_i, K2_i is the first preset degree corresponding to V_i, K1_i is the second preset degree corresponding to V_i, K2_i > 1, and K1_i > K2_i. The values of K2_i and K1_i may be determined according to the feature to which V_i belongs; that is, different features have different values of K2_i and K1_i.

For example, in Fig. 3, if K2_i is 2 and K1_i is 3, then feature value A is a low-frequency feature value and feature values B and C are high-frequency feature values. Feature values D and F (degree 4) and feature value E (degree 1) satisfy neither rule.
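The degree rule of step S204 can be checked against the Fig. 3 example with a short sketch. It assumes, for simplicity, a single pair of thresholds K2 = 2 and K1 = 3 for all feature values, whereas the text allows the thresholds to differ per feature; the function names are illustrative.

```python
# Feature values of the five users from the Fig. 3 example.
user_features = {
    1: {"A", "B", "D"},
    2: {"B", "C", "F"},
    3: {"A", "C", "D", "F"},
    4: {"B", "D", "F"},
    5: {"C", "D", "E", "F"},
}
K2, K1 = 2, 3  # first and second preset degrees (same for every value here)


def feature_degrees(user_features):
    """Degree of a feature value node = number of users holding that value."""
    degrees = {}
    for values in user_features.values():
        for value in values:
            degrees[value] = degrees.get(value, 0) + 1
    return degrees


def split_by_degree(degrees, k2, k1):
    """Apply K2 >= degree > 1 (low) and K1 >= degree > K2 (high)."""
    low = {v for v, d in degrees.items() if 1 < d <= k2}
    high = {v for v, d in degrees.items() if k2 < d <= k1}
    return low, high


degrees = feature_degrees(user_features)
low_values, high_values = split_by_degree(degrees, K2, K1)
print(sorted(low_values), sorted(high_values))
```

With per-feature thresholds, k2 and k1 would become lookups keyed by the feature to which each value belongs.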

Step S206, determine the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed according to the high-frequency feature values and low-frequency feature values.

In this embodiment of the present application, the high-frequency feature values are matched against the feature values of each user to be analyzed, and the feature values of a user that match a high-frequency feature value are determined as that user's high-frequency feature values. Likewise, the low-frequency feature values are matched against the feature values of each user, and the feature values of a user that match a low-frequency feature value are determined as that user's low-frequency feature values. For example, in Fig. 3, if K2_i is 2 and K1_i is 3, then feature value A is a low-frequency feature value and feature values B and C are high-frequency feature values. On this basis, the low-frequency feature values of the first user include A and its high-frequency feature values include B; the second user has no low-frequency feature value and its high-frequency feature values include B and C; the low-frequency feature values of the third user include A and its high-frequency feature values include C; the fourth user has no low-frequency feature value and its high-frequency feature values include B; and the fifth user has no low-frequency feature value and its high-frequency feature values include C.
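The matching in step S206 amounts to intersecting each user's feature values with the two global sets. The sketch below reuses the Fig. 3 example, with A as the low-frequency value and B, C as the high-frequency values; the function name and the "low"/"high" keys are illustrative choices.

```python
user_features = {
    1: {"A", "B", "D"},
    2: {"B", "C", "F"},
    3: {"A", "C", "D", "F"},
    4: {"B", "D", "F"},
    5: {"C", "D", "E", "F"},
}
low_values = {"A"}        # low-frequency feature values from step S204
high_values = {"B", "C"}  # high-frequency feature values from step S204


def split_user_features(user_features, low_values, high_values):
    """Per user, keep the feature values that match each global set."""
    return {
        user: {"low": values & low_values, "high": values & high_values}
        for user, values in user_features.items()
    }


per_user = split_user_features(user_features, low_values, high_values)
for user in sorted(per_user):
    print(user, sorted(per_user[user]["low"]), sorted(per_user[user]["high"]))
```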

Step S106, mine maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, and obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets.

In this embodiment of the present application, the preset frequent itemset mining strategy may be, for example, the Apriori strategy (which mines frequent itemsets for association rules) or FP-Growth, and this exemplary embodiment places no particular limitation on it. The above process is described below taking FP-Growth as the preset frequent itemset mining strategy. As shown in Fig. 4, it may include the following steps:

Step S402, using the FP-Growth method on the high-frequency feature values of each user to be analyzed, mine the frequent multi-itemsets whose support satisfies a preset support, and determine the maximal frequent itemsets among the frequent multi-itemsets.

In this embodiment of the present application, the support is the number of occurrences of a high-frequency feature value among the multiple users to be analyzed. The preset support may be set as needed, for example to 1 or 2, and this exemplary embodiment places no particular limitation on it. A frequent multi-itemset is a set that includes at least two high-frequency feature values. A frequent multi-itemset whose support satisfies the preset support is one in which the support of every high-frequency feature value is greater than the preset support.

The frequent multi-itemsets are mined as follows: define the preset support; scan the high-frequency feature values of each user to be analyzed to obtain the number of occurrences (the support) of each high-frequency feature value among the multiple users; remove from each user's high-frequency feature values those whose support is less than the preset support; build an FP tree from the remaining high-frequency feature values of each user; and mine the frequent multi-itemsets from the FP tree. The frequent multi-itemsets that have no superset among the frequent multi-itemsets are then determined as the maximal frequent itemsets. Note that each maximal frequent itemset includes multiple high-frequency feature values; here, the high-frequency feature values included in a maximal frequent itemset are called maximal frequent feature values, so each maximal frequent itemset includes multiple maximal frequent feature values.
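The notion of a maximal frequent itemset can be illustrated with a small sketch. Note that it does not implement FP-Growth: it enumerates candidate itemsets by brute force, which is only practical for tiny inputs. It also assumes the conventional itemset support (the number of users whose high-frequency feature values contain the whole itemset) with a "not less than" comparison. The transactions and the feature values G and H are hypothetical.

```python
from itertools import combinations

# Hypothetical high-frequency feature values of five users.
transactions = [
    {"B", "C", "G"},
    {"B", "C", "G"},
    {"B", "C"},
    {"C", "H"},
    {"G", "H"},
]


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for FP-Growth (illustration only)."""
    # Count single values and drop those below the preset support.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    items = sorted(i for i, c in counts.items() if c >= min_support)

    # Collect frequent itemsets with at least two values, level by level.
    frequent = []
    for size in range(2, len(items) + 1):
        level = [
            frozenset(c)
            for c in combinations(items, size)
            if sum(1 for t in transactions if set(c) <= t) >= min_support
        ]
        if not level:
            break  # no frequent set of this size, so none of a larger size
        frequent.extend(level)

    # Keep only itemsets that have no frequent proper superset.
    return [s for s in frequent if not any(s < other for other in frequent)]


maximal = maximal_frequent_itemsets(transactions, min_support=2)
print([sorted(s) for s in maximal])
```

For realistic data sizes, an FP-tree based miner would replace the enumeration; the final superset filter stays the same.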

Step S404, match the feature values of each user to be analyzed against the maximal frequent feature values in the maximal frequent itemsets, to obtain the maximal frequent feature values of each user to be analyzed.

In this embodiment of the present application, the feature values of each user to be analyzed are matched against the maximal frequent feature values in the maximal frequent itemsets, and the feature values of a user that match a maximal frequent feature value are determined as that user's maximal frequent feature values.

Step S406, determine the low-frequency maximal frequent feature values among the maximal frequent feature values of the users to be analyzed.

In this embodiment of the present application, the low-frequency maximal frequent feature values may be determined in either of the following two modes:

Mode one: count, from the maximal frequent feature values of each user to be analyzed, the number of occurrences of each maximal frequent feature value among the multiple users to be analyzed, and determine the low-frequency maximal frequent feature values from those counts and the following rule: if the number of occurrences of a maximal frequent feature value among the multiple users satisfies P2_i ≥ S_i, the maximal frequent feature value is a low-frequency maximal frequent feature value. Here P2_i is the preset number of occurrences corresponding to the i-th maximal frequent feature value, and S_i is the number of occurrences of the i-th maximal frequent feature value among the multiple users. The value of P2_i may be determined according to the feature to which the i-th maximal frequent feature value belongs; that is, different features have different values of P2_i.

Mode two, as shown in Fig. 5, may include the following steps:

Step S502, construct a second bipartite graph from the maximal frequent feature values of each user to be analyzed, where the second bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each maximal frequent feature value, and edges between the node of each user to be analyzed and the nodes of that user's maximal frequent feature values.

In this embodiment of the present application, each user to be analyzed is converted into a node, and each user corresponds to exactly one node. The maximal frequent feature values of each user are likewise converted into nodes, and each maximal frequent feature value corresponds to exactly one node. The nodes corresponding to users lie on one side of the second bipartite graph, the nodes corresponding to maximal frequent feature values lie on the other side, and an edge is added between the node of each user and the node of each of that user's maximal frequent feature values, which completes the construction of the second bipartite graph.

Step S504, obtain the degree of the node corresponding to each maximal frequent feature value in the second bipartite graph, and determine the low-frequency maximal frequent feature values among the maximal frequent feature values according to those degrees.

In this embodiment of the present application, the degree of the node corresponding to a maximal frequent feature value is the number of edges connected to that node in the second bipartite graph. Determining the low-frequency maximal frequent feature values may include applying the following rule to the degree of the node corresponding to each maximal frequent feature value: if the degree satisfies L2_i ≥ degree(V_i), the maximal frequent feature value is a low-frequency maximal frequent feature value. Here degree(V_i) is the degree of the node corresponding to the i-th maximal frequent feature value V_i, and L2_i is the preset degree corresponding to V_i. The value of L2_i may be determined according to the feature to which V_i belongs; that is, different features have different values of L2_i.
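The degree rule of step S504 can be sketched as below. Because each user contributes one edge per maximal frequent feature value, the node degree equals the occurrence count used in mode one, so the same sketch covers both modes. The per-user data and the single threshold L2 = 3 are hypothetical.

```python
# Hypothetical maximal frequent feature values of five users.
user_max_values = {
    1: {"B", "C", "G"},
    2: {"B", "C", "G"},
    3: {"B", "C"},
    4: {"C"},
    5: {"G"},
}
L2 = 3  # preset degree, assumed identical for every value in this sketch


def low_frequency_maximal_values(user_max_values, l2):
    """Keep maximal frequent feature values whose node degree is at most l2."""
    degrees = {}
    for values in user_max_values.values():
        for value in values:
            degrees[value] = degrees.get(value, 0) + 1
    return {value for value, degree in degrees.items() if degree <= l2}


low_max_values = low_frequency_maximal_values(user_max_values, L2)
print(sorted(low_max_values))
```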

Step S108, construct a target bipartite graph from the low-frequency maximal frequent feature values and the low-frequency feature values among the feature values of each user to be analyzed, and define the weights of the edges in the target bipartite graph.

In this embodiment of the present application, the low-frequency maximal frequent feature values are matched against the feature values of each user to be analyzed, and the feature values of a user that match a low-frequency maximal frequent feature value are determined as that user's low-frequency maximal frequent feature values. Constructing the target bipartite graph from the low-frequency maximal frequent feature values of each user and the low-frequency feature values of each user obtained in step S104 may include: converting each user to be analyzed into a node, converting each low-frequency feature value into a node, and converting each low-frequency maximal frequent feature value into a node; adding an edge between the node of each user and the node of each of that user's low-frequency feature values; and adding an edge between the node of each user and the node of each of that user's low-frequency maximal frequent feature values, which completes the construction of the target bipartite graph.

Defining the weights of the edges in the target bipartite graph may include: defining the weight of the edge between the node of each user to be analyzed and the node of each of that user's low-frequency feature values, and defining the weight of the edge between the node of each user and the node of each of that user's low-frequency maximal frequent feature values. Defining the former may include determining the weight of each low-frequency feature value according to the feature to which it belongs. Specifically, the higher the weight of a low-frequency feature value, the higher the probability that the users who share that value form one abnormal group; the lower the weight, the lower that probability. After the weight of each low-frequency feature value is determined, the weight of every edge connected to the node of a low-frequency feature value is set to the weight of that value.
For example, suppose the low-frequency feature values include "frequent insurance claims" (a value of the claims feature) and "unemployed" (a value of the occupation feature), with a weight of 0.5 for frequent insurance claims and 0.1 for unemployed. Then the weight of every edge connected to the node for frequent insurance claims is set to 0.5, and the weight of every edge connected to the node for unemployed is set to 0.1. Similarly, defining the weight of the edge between the node of each user and the node of each of that user's low-frequency maximal frequent feature values may include determining the weight of each low-frequency maximal frequent feature value according to the feature to which it belongs. Specifically, the higher the weight of a low-frequency maximal frequent feature value, the higher the probability that the users who share that value form one abnormal group; the lower the weight, the lower that probability. The weight of every edge connected to the node of a low-frequency maximal frequent feature value is set to the weight of that value.
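The weighted target bipartite graph of step S108 can be sketched as a mapping from (user, value) edges to weights. The weights 0.5 and 0.1 come from the example above; the user assignments, the value name "M1" and its weight 0.8 are hypothetical.

```python
# Weight per value, determined by the feature the value belongs to.
value_weight = {
    "frequent insurance claims": 0.5,  # from the example above
    "unemployed": 0.1,                 # from the example above
    "M1": 0.8,                         # hypothetical low-frequency maximal frequent value
}

# Hypothetical per-user low-frequency and low-frequency maximal frequent values.
user_low = {
    1: {"frequent insurance claims"},
    2: {"frequent insurance claims", "unemployed"},
    3: {"unemployed"},
}
user_low_max = {1: {"M1"}, 2: {"M1"}}


def build_target_graph(user_low, user_low_max, value_weight):
    """Return {(user, value): weight}; every edge takes the weight of its value."""
    edges = {}
    for per_user in (user_low, user_low_max):
        for user, values in per_user.items():
            for value in values:
                edges[(user, value)] = value_weight[value]
    return edges


target_edges = build_target_graph(user_low, user_low_max, value_weight)
for edge in sorted(target_edges):
    print(edge, target_edges[edge])
```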

Step S110, determine the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by graph clustering on the target bipartite graph.

In this embodiment of the present application, the abnormal groups among the users to be analyzed may be determined in either of the following two modes:

Mode one: delete the edges whose weight is less than a first preset weight from the target bipartite graph to obtain a bipartite graph to be clustered, apply a connectivity algorithm to the bipartite graph to be clustered to obtain at least one maximal connected subgraph, and determine the users to be analyzed corresponding to the nodes in each maximal connected subgraph as one abnormal group.

In this embodiment of the present application, the first preset weight may be set as needed, and this exemplary embodiment places no particular limitation on it. The weight of each edge in the target bipartite graph is compared in turn with the first preset weight: if the weight of an edge is less than the first preset weight, the edge is deleted from the target bipartite graph; otherwise the edge is retained. The target bipartite graph with the edges below the first preset weight removed is the bipartite graph to be clustered. A connectivity algorithm is applied to the bipartite graph to be clustered to obtain at least one maximal connected subgraph. In each maximal connected subgraph, the nodes corresponding to low-frequency feature values and the nodes corresponding to low-frequency maximal frequent feature values are removed, and the users to be analyzed corresponding to the remaining nodes are collected into a set. The set of users to be analyzed obtained for each maximal connected subgraph is determined as one abnormal group.
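Mode one can be sketched with a plain depth-first search over the thresholded graph. The edge data, the threshold 0.3 and the function name are hypothetical; as a simplification, users left without any edge after thresholding are omitted rather than reported as single-user groups.

```python
from collections import defaultdict

# Hypothetical target bipartite graph: (user, value) -> edge weight.
target_edges = {
    (1, "x"): 0.5, (2, "x"): 0.5,
    (3, "y"): 0.1, (4, "y"): 0.1,
    (4, "z"): 0.6, (5, "z"): 0.6,
}


def abnormal_groups(edges, min_weight):
    """Drop light edges, find connected components, keep only user nodes."""
    adjacency = defaultdict(set)
    for (user, value), weight in edges.items():
        if weight >= min_weight:  # edges below the preset weight are deleted
            u, v = ("user", user), ("value", value)
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, groups = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Remove feature value nodes; the remaining users form one group.
        groups.append({name for kind, name in component if kind == "user"})
    return groups


groups = abnormal_groups(target_edges, min_weight=0.3)
print(sorted(sorted(g) for g in groups))
```

Tagging nodes as ("user", ...) or ("value", ...) keeps the two sides of the bipartite graph distinct even if a user id and a value name happen to coincide.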

Mode two: delete the edges whose weight is less than the first preset weight from the target bipartite graph to obtain a bipartite graph to be clustered, partition the nodes in the bipartite graph to be clustered with a community discovery algorithm to obtain multiple node sets, and determine the users to be analyzed corresponding to the nodes in each node set as one abnormal group.

In this embodiment of the present application, deleting the edges whose weight is less than the first preset weight to obtain the bipartite graph to be clustered works in the same way as in mode one above, and is not repeated here. The community discovery algorithm may be, for example, the Louvain algorithm, and this exemplary embodiment places no particular limitation on it. After the nodes in the bipartite graph to be clustered are partitioned into multiple node sets by the community discovery algorithm, the nodes corresponding to low-frequency feature values and the nodes corresponding to low-frequency maximal frequent feature values are first removed from each node set, and the users to be analyzed corresponding to the remaining nodes in each node set are collected into a set. The set of users to be analyzed obtained for each node set is determined as one abnormal group.

Further, after the abnormal groups are obtained, they may be verified in order to further improve the accuracy of abnormal group identification. The total number of users to be analyzed in each abnormal group may be obtained, the abnormal groups whose total number of users is less than a preset number may be removed, and the remaining abnormal groups may be determined as the finally identified abnormal groups. Alternatively, the modularity of the maximal connected subgraph corresponding to each abnormal group may be calculated and taken as the modularity of that abnormal group, the abnormal groups whose modularity is less than a preset modularity may be removed, and the remaining abnormal groups may be determined as the finally identified abnormal groups. Note that these two verification modes are merely exemplary and are not intended to limit the present invention; the abnormal groups may also be verified by analyzing the service features of each user to be analyzed in the abnormal groups.
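The first verification mode, removing groups that are too small, reduces to a size filter. The groups and the preset number 3 below are hypothetical, and the function name is illustrative; the modularity check is not sketched here.

```python
# Hypothetical abnormal groups produced by the clustering step.
candidate_groups = [{1, 2}, {3, 4, 5}, {6}, {7, 8, 9, 10}]


def filter_small_groups(groups, min_size):
    """Keep groups whose number of users is not less than the preset number."""
    return [group for group in groups if len(group) >= min_size]


final_groups = filter_small_groups(candidate_groups, min_size=3)
print([sorted(g) for g in final_groups])
```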

In order to cluster the users to be analyzed more accurately and thus obtain more accurate abnormal groups, as shown in Fig. 6, determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by graph clustering on the target bipartite graph may include the following steps:

Step S602, calculate the weight between any two users to be analyzed from the weights of the edges in the target bipartite graph.

In this embodiment of the present application, the nodes corresponding to low-frequency feature values and the nodes corresponding to low-frequency maximal frequent feature values that are connected to the nodes of both of any two users to be analyzed are obtained in the target bipartite graph and determined as target nodes. The weight between the two users to be analyzed is then calculated from the weights of the edges between the node of either one of the two users and each target node, using the following formula. The formula is:

where weight(e) is the weight between the two users to be analyzed, j is the total number of target nodes, and w(item_i) is the weight of the edge between the i-th target node item_i and the node corresponding to either one of the two users to be analyzed.
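The calculation in step S602 can be sketched as follows. The formula itself is not reproduced in this text, so the sketch assumes, purely for illustration, that weight(e) is the sum of w(item_i) over the j target nodes shared by the two users; any other aggregation would change only the line marked below. The edge data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical target bipartite graph: (user, value) -> edge weight.
target_edges = {
    (1, "x"): 0.5, (2, "x"): 0.5,
    (1, "y"): 0.25, (2, "y"): 0.25, (3, "y"): 0.25,
}


def pair_weights(edges):
    """Weight between every pair of users that share at least one target node."""
    user_values = defaultdict(dict)
    for (user, value), weight in edges.items():
        user_values[user][value] = weight

    weights = {}
    for a, b in combinations(sorted(user_values), 2):
        shared = user_values[a].keys() & user_values[b].keys()  # target nodes
        if shared:
            # ASSUMPTION: aggregate by summing the shared edge weights.
            weights[(a, b)] = sum(user_values[a][value] for value in shared)
    return weights


weights = pair_weights(target_edges)
print(weights)
```

Since an edge weight depends only on the value node it touches, reading the weights from either of the two users gives the same result, which matches the "either one of the two users" wording above.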

Step S604, convert each user to be analyzed into a node, set an edge between any two nodes, and set the weight of the edge between any two nodes to the weight between the corresponding two users to be analyzed, so as to construct a target clustering graph.

In this embodiment of the present application, each user to be analyzed is converted into a node, so that one user corresponds to exactly one node. An edge is set between any two nodes, and the weight between any two users to be analyzed is set as the weight of the edge between the two nodes corresponding to those users, which completes the construction of the target clustering graph. It can be seen that steps S602 and S604 convert the target bipartite graph, which includes nodes corresponding to users to be analyzed, nodes corresponding to low-frequency feature values and nodes corresponding to low-frequency maximal frequent feature values, into a target clustering graph that includes only nodes corresponding to users to be analyzed.

Step S606, determine the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by graph clustering on the target clustering graph.

In this embodiment of the present application, the abnormal groups may be determined in either of the following two modes:

Mode one: delete the edges whose weight is less than a second preset weight from the target clustering graph to obtain a graph to be clustered, apply a connectivity algorithm to the graph to be clustered to obtain at least one maximal connected subgraph, and determine the users to be analyzed corresponding to the nodes in each maximal connected subgraph as one abnormal group.

In this embodiment of the present application, the second preset weight may be set as needed, and this exemplary embodiment places no particular limitation on it. The weight of each edge in the target clustering graph is compared with the second preset weight, and the edges whose weight is less than the second preset weight are deleted from the target clustering graph, which converts the target clustering graph into the graph to be clustered. The users to be analyzed corresponding to the nodes in each maximal connected subgraph are collected into a set, and the set of users to be analyzed obtained for each maximal connected subgraph is determined as one abnormal group.
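Mode one on the target clustering graph can be sketched with a union-find structure over user nodes only. The pair weights, the threshold 0.5 and the function name are hypothetical.

```python
# Hypothetical weights between pairs of users (edges of the target clustering graph).
users = [1, 2, 3]
pair_weight = {(1, 2): 0.75, (1, 3): 0.25, (2, 3): 0.25}


def cluster_users(users, pair_weight, min_weight):
    """Connected components of the user graph after dropping light edges."""
    parent = {user: user for user in users}

    def find(user):
        # Follow parent links to the component root, compressing the path.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for (a, b), weight in pair_weight.items():
        if weight >= min_weight:  # edges below the second preset weight are deleted
            parent[find(a)] = find(b)

    components = {}
    for user in users:
        components.setdefault(find(user), set()).add(user)
    return list(components.values())


user_groups = cluster_users(users, pair_weight, min_weight=0.5)
print(sorted(sorted(g) for g in user_groups))
```

A user left without any retained edge appears here as a single-user component; whether such a component counts as an abnormal group is left to the later verification.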

Mode two: delete the edges whose weight is less than the second preset weight from the target clustering graph to obtain a graph to be clustered, partition the graph to be clustered with a community discovery algorithm to obtain multiple node sets, and determine the users to be analyzed corresponding to each node set as one abnormal group.

In this embodiment of the present application, the second preset weight has been explained above and is not repeated here. The weight of each edge in the target clustering graph is compared with the second preset weight, and the edges whose weight is less than the second preset weight are deleted from the target clustering graph, which converts the target clustering graph into the graph to be clustered. The community discovery algorithm may be, for example, the Louvain algorithm, and this exemplary embodiment places no particular limitation on it. After the nodes in the graph to be clustered are partitioned into multiple node sets by the community discovery algorithm, the users to be analyzed corresponding to the nodes in each node set are collected into a set, and the set of users to be analyzed obtained for each node set is determined as one abnormal group.

It can be seen that, by calculating the weight between any two users to be analyzed from the weights of the edges in the target bipartite graph and constructing the target clustering graph from the weight between any two users to be analyzed, the target bipartite graph is converted into the target clustering graph. The target clustering graph reflects the relationships between users to be analyzed more accurately and more intuitively, so the abnormal groups obtained from the target clustering graph are more accurate.

Note that the above two modes of determining abnormal groups are merely exemplary and are not intended to limit the present invention.

In summary, maximal frequent itemsets are mined from the high-frequency feature values of each user to be analyzed using a preset frequent itemset mining strategy, and the low-frequency maximal frequent feature values in the maximal frequent itemsets are obtained, so as to mine the behavior sequences of the users to be analyzed and make the identification of abnormal groups more accurate. In addition, the method only needs to obtain the low-frequency feature values and low-frequency maximal frequent feature values of each user to be analyzed, construct the target bipartite graph from them, define the weights of the edges in the target bipartite graph, and perform graph clustering on the target bipartite graph according to those weights to obtain the abnormal groups. The steps are simple and easy to implement.

Corresponding to the above abnormal group identification method and based on the same technical idea, an embodiment of the present application further provides an abnormal group identification device. Fig. 7 is a schematic diagram of the composition of the abnormal group identification device provided by this embodiment of the present application, and the device is used to execute the above abnormal group identification method. As shown in Fig. 7, the device 700 may include an acquisition module 701, a determining module 702, a mining module 703, a construction module 704 and a clustering module 705, where:

the acquisition module 701 is configured to obtain the feature values of each user to be analyzed among multiple users to be analyzed;

the determining module 702 is configured to determine the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed;

the mining module 703 is configured to mine maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, and to obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets;

the construction module 704 is configured to construct a target bipartite graph from the low-frequency maximal frequent feature values and the low-frequency feature values among the feature values of each user to be analyzed, and to define the weights of the edges in the target bipartite graph;

the clustering module 705 is configured to determine the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by graph clustering on the target bipartite graph.

Optionally, the obtaining module 701 may include:

Optionally, the determining module 702 may include:

Optionally, the mining module 703 may include:

Optionally, the third determining unit may include:

Optionally, the clustering module 705 may include:

Optionally, the third clustering unit may include:

The abnormal group identification device in this embodiment of the present application mines maximal frequent itemsets by applying a preset frequent itemset mining strategy to the high-frequency feature values of each user to be analyzed, and obtains the low-frequency maximal frequent feature values in the maximal frequent itemsets. This captures the behavior sequences of the users to be analyzed, so that abnormal groups are identified more accurately. In addition, the device only needs the low-frequency feature values and the low-frequency maximal frequent feature values of each user to be analyzed: it constructs a target bipartite graph from them, defines weights for the edges in the target bipartite graph, and performs graph clustering on the target bipartite graph according to those edge weights to obtain the abnormal groups. The steps are simple and easy to carry out.
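As a non-limiting illustration of the optional user-to-user variant recited in the claims, in which each user to be analyzed becomes a node of a target clustering graph, the sketch below links two users when the weight between them reaches a preset weight and then reads off the connected components. The weight between two users is assumed here to be the number of values they share; that definition, the function name, and the sample data are hypothetical.

```python
from itertools import combinations


def user_graph_groups(user_values, min_weight):
    """Cluster users on a user-to-user graph.

    user_values: mapping from user to the set of values that user holds.
    The weight between two users is assumed to be the number of shared
    values; pairs below min_weight get no edge.
    """
    users = sorted(user_values)
    adjacency = {u: set() for u in users}
    for a, b in combinations(users, 2):
        weight = len(user_values[a] & user_values[b])  # assumed weight
        if weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, groups = set(), []
    for start in users:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first walk of one connected component
            node = stack.pop()
            component.append(node)
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        if len(component) > 1:
            groups.append(sorted(component))
    return groups


# Hypothetical sample data.
shared = {
    "u1": {"ip=1", "dev=7"},
    "u2": {"ip=1", "dev=7"},
    "u3": {"ip=1"},
    "u4": {"ip=3"},
}
strict_groups = user_graph_groups(shared, min_weight=2)
loose_groups = user_graph_groups(shared, min_weight=1)
```

Raising the preset weight makes the grouping stricter: with a weight of 2 only the users sharing both values are grouped, while a weight of 1 also pulls in the user who shares a single value.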

Corresponding to the above abnormal group identification method and based on the same technical idea, an embodiment of the present application further provides abnormal group identification equipment. Fig. 8 is a schematic structural diagram of the abnormal group identification equipment provided by an embodiment of the present application; the equipment is used to execute the above abnormal group identification method.

As shown in Fig. 8, the abnormal group identification equipment may vary considerably depending on its configuration or performance. It may include one or more processors 801 and a memory 802, and the memory 802 may store one or more application programs or data. The memory 802 may be transient storage or persistent storage. An application program stored in the memory 802 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the abnormal group identification equipment. Further, the processor 801 may be configured to communicate with the memory 802 and to execute, on the abnormal group identification equipment, the series of computer-executable instructions in the memory 802. The abnormal group identification equipment may also include one or more power supplies 803, one or more wired or wireless network interfaces 804, one or more input/output interfaces 805, one or more keyboards 806, and so on.

In a specific embodiment, the abnormal group identification equipment includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, and each module may include a series of computer-executable instructions for the abnormal group identification equipment. The one or more programs are configured to be executed by one or more processors and include computer-executable instructions for performing the following:

Optionally, when the computer-executable instructions are executed, obtaining the feature values of each user to be analyzed among the multiple users to be analyzed includes:

obtaining the raw personal data of the multiple users to be analyzed；

Optionally, when the computer-executable instructions are executed, determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed includes:

Optionally, when the computer-executable instructions are executed, mining maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and the preset frequent itemset mining strategy, to obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets, includes:

Optionally, when the computer-executable instructions are executed, determining the low-frequency maximal frequent feature values among the maximal frequent feature values of the users to be analyzed includes:

Optionally, when the computer-executable instructions are executed, determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph includes:

Optionally, when the computer-executable instructions are executed, determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph includes:

Optionally, when the computer-executable instructions are executed, determining the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target clustering graph includes:

The abnormal group identification equipment in this embodiment of the present application mines maximal frequent itemsets by applying a preset frequent itemset mining strategy to the high-frequency feature values of each user to be analyzed, and obtains the low-frequency maximal frequent feature values in the maximal frequent itemsets. This captures the behavior sequences of the users to be analyzed, so that abnormal groups are identified more accurately. In addition, the equipment only needs the low-frequency feature values and the low-frequency maximal frequent feature values of each user to be analyzed: it constructs a target bipartite graph from them, defines weights for the edges in the target bipartite graph, and performs graph clustering on the target bipartite graph according to those edge weights to obtain the abnormal groups. The steps are simple and easy to carry out.

Corresponding to the above abnormal group identification method and based on the same technical idea, an embodiment of the present application further provides a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like. When executed by a processor, the computer-executable instructions stored in the storage medium can implement the following flow:

Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, obtaining the feature values of each user to be analyzed among the multiple users to be analyzed includes:

obtaining the raw personal data of the multiple users to be analyzed；

Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed includes:

Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, mining maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and the preset frequent itemset mining strategy, to obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets, includes:

Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, determining the low-frequency maximal frequent feature values among the maximal frequent feature values of the users to be analyzed includes:

Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph includes:

Optionally, when the computer-executable instructions stored in the storage medium are executed by a processor, determining the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target clustering graph includes:

When executed by a processor, the computer-executable instructions stored in the storage medium in this embodiment of the present application mine maximal frequent itemsets by applying a preset frequent itemset mining strategy to the high-frequency feature values of each user to be analyzed, and obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets. This captures the behavior sequences of the users to be analyzed, so that abnormal groups are identified more accurately. In addition, only the low-frequency feature values and the low-frequency maximal frequent feature values of each user to be analyzed are needed: a target bipartite graph is constructed from them, weights are defined for the edges in the target bipartite graph, and graph clustering is performed on the target bipartite graph according to those edge weights to obtain the abnormal groups. The steps are simple and easy to carry out.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). However, with the development of technology, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can be readily obtained simply by logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logic-programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Alternatively, the means for implementing various functions may be regarded both as software modules implementing a method and as structures within the hardware component.

The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above device is described with its functions divided into various units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.

The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims

1. An abnormal group identification method, characterized by comprising:

mining maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, to obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets；

constructing a target bipartite graph according to the low-frequency maximal frequent feature values and the low-frequency feature values among the feature values of each user to be analyzed, and defining the weights of the edges in the target bipartite graph；

determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph.

2. The abnormal group identification method according to claim 1, characterized in that obtaining the feature values of each user to be analyzed among the multiple users to be analyzed comprises:

obtaining the raw personal data of the multiple users to be analyzed；

discretizing the raw personal data of the multiple users to be analyzed to obtain the feature values of each user to be analyzed.

3. The abnormal group identification method according to claim 1, characterized in that determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed comprises:

constructing a first bipartite graph according to the feature values of each user to be analyzed, wherein the first bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each feature value, and edges between the node corresponding to each user to be analyzed and the nodes corresponding to that user's feature values；

obtaining, in the first bipartite graph, the degree of the node corresponding to each feature value, and determining the high-frequency feature values and low-frequency feature values among the feature values according to the degree of the node corresponding to each feature value；

determining the high-frequency feature values and low-frequency feature values among the feature values of each user to be analyzed according to the high-frequency feature values and the low-frequency feature values.

4. The abnormal group identification method according to claim 1, characterized in that mining maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and the preset frequent itemset mining strategy, to obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets, comprises:

mining, according to the high-frequency feature values of each user to be analyzed and using the FP-Growth method, frequent multi-item sets whose support meets a preset support, and determining the maximal frequent itemsets among the frequent multi-item sets；

matching the feature values of each user to be analyzed against the maximal frequent feature values in the maximal frequent itemsets, to obtain the maximal frequent feature values of each user to be analyzed；

5. The abnormal group identification method according to claim 4, characterized in that determining the low-frequency maximal frequent feature values among the maximal frequent feature values of the users to be analyzed comprises:

constructing a second bipartite graph according to the maximal frequent feature values of each user to be analyzed, wherein the second bipartite graph includes a node corresponding to each user to be analyzed, a node corresponding to each maximal frequent feature value, and edges between the node corresponding to each user to be analyzed and the nodes corresponding to that user's maximal frequent feature values；

obtaining, in the second bipartite graph, the degree of the node corresponding to each maximal frequent feature value, and determining the low-frequency maximal frequent feature values among the maximal frequent feature values according to the degree of the node corresponding to each maximal frequent feature value.

6. The abnormal group identification method according to claim 1, characterized in that determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph comprises:

deleting, from the target bipartite graph, edges whose weight is less than a first preset weight to obtain a bipartite graph to be clustered, applying a connectivity algorithm to the bipartite graph to be clustered to obtain at least one maximal connected subgraph, and determining the users to be analyzed corresponding to the nodes in each maximal connected subgraph as an abnormal group；or

deleting, from the target bipartite graph, edges whose weight is less than the first preset weight to obtain a bipartite graph to be clustered, dividing the nodes in the bipartite graph to be clustered by a community discovery algorithm to obtain multiple node sets, and determining the users to be analyzed corresponding to the nodes in each node set as an abnormal group.

7. The abnormal group identification method according to claim 1, characterized in that determining the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph comprises:

converting each user to be analyzed into a node, setting an edge between any two nodes, and setting the weight of the edge between any two nodes to the weight between the corresponding two users to be analyzed, so as to construct a target clustering graph；

determining the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target clustering graph.

8. The abnormal group identification method according to claim 7, characterized in that determining the abnormal groups among the users to be analyzed from the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target clustering graph comprises:

deleting, from the target clustering graph, edges whose weight is less than a second preset weight to obtain a graph to be clustered, applying a connectivity algorithm to the graph to be clustered to obtain at least one maximal connected subgraph, and determining the users to be analyzed corresponding to the nodes in each maximal connected subgraph as an abnormal group；or

deleting, from the target clustering graph, edges whose weight is less than the second preset weight to obtain a graph to be clustered, dividing the graph to be clustered by a community discovery algorithm to obtain multiple node sets, and determining the users to be analyzed corresponding to each node set as an abnormal group.

9. An abnormal group identification device, characterized by comprising:

a mining module, configured to mine maximal frequent itemsets according to the high-frequency feature values of each user to be analyzed and a preset frequent itemset mining strategy, to obtain the low-frequency maximal frequent feature values in the maximal frequent itemsets；

a construction module, configured to construct a target bipartite graph according to the low-frequency maximal frequent feature values and the low-frequency feature values among the feature values of each user to be analyzed, and to define the weights of the edges in the target bipartite graph；

a clustering module, configured to determine the abnormal groups among the users to be analyzed according to the weights of the edges in the target bipartite graph and the clustering result of the multiple users to be analyzed obtained by performing graph clustering on the target bipartite graph.

10. Abnormal group identification equipment, characterized by comprising:

a processor；and

a memory arranged to store computer-executable instructions which, when executed, cause the processor to:

11. A storage medium for storing computer-executable instructions, characterized in that the computer-executable instructions, when executed, implement the following flow: