WO2021239004A1 - Abnormal community detection method and apparatus, computer device, and storage medium - Google Patents

Abnormal community detection method and apparatus, computer device, and storage medium

Info

Publication number
WO2021239004A1
Authority
WO
WIPO (PCT)
Prior art keywords
relationship
community
cluster
abnormal
guarantee
Application number
PCT/CN2021/096155
Other languages
English (en)
Chinese (zh)
Inventor
曹合心
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021239004A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Definitions

  • This application relates to the field of data processing technology, and in particular to an abnormal community detection method, device, computer equipment, and storage medium.
  • The purpose of the embodiments of the present application is to provide an abnormal community detection method, device, computer equipment, and storage medium that solve the technical problem that abnormal communities cannot be extracted efficiently when multiple guarantee relationships are involved.
  • To this end, an embodiment of the present application provides an abnormal community detection method, which adopts the following technical solutions:
  • An abnormal community detection method includes the following steps:
  • Constructing a guarantee relationship network and segmenting it to obtain communities with abnormal guarantee relationships;
  • Determining feature information of the communities, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
  • Determining, according to the feature information, communities with similar features as a relationship cluster;
  • Calculating the Euclidean distance of the relationship cluster;
  • Classifying the relationship clusters according to the Euclidean distance, and determining, based on the classification result, whether each relationship cluster is an abnormal cluster; and
  • When a relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.
  • An embodiment of the present application also provides an abnormal community detection device, which adopts the following technical solutions:
  • a segmentation module, configured to construct a guarantee relationship network, segment it, and obtain communities with abnormal guarantee relationships;
  • a first confirmation module, configured to determine feature information of the community, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
  • a second confirmation module, configured to determine, according to the feature information, communities with similar features as a relationship cluster;
  • a calculation module, configured to calculate the Euclidean distance of the relationship cluster;
  • a classification module, configured to classify the relationship clusters according to the Euclidean distance and determine, based on the classification result, whether each relationship cluster is an abnormal cluster; and
  • an extraction module, configured to, when a relationship cluster is determined to be an abnormal cluster, determine that the communities in the abnormal cluster are abnormal communities and extract them.
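  • An embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the following steps are implemented:
  • Constructing a guarantee relationship network and segmenting it to obtain communities with abnormal guarantee relationships;
  • Determining feature information of the communities, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
  • Determining, according to the feature information, communities with similar features as a relationship cluster;
  • Calculating the Euclidean distance of the relationship cluster;
  • Classifying the relationship clusters according to the Euclidean distance, and determining, based on the classification result, whether each relationship cluster is an abnormal cluster; and
  • When a relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.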
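  • The embodiments of the present application also provide a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a processor, the processor performs the following steps:
  • Constructing a guarantee relationship network and segmenting it to obtain communities with abnormal guarantee relationships;
  • Determining feature information of the communities, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree;
  • Determining, according to the feature information, communities with similar features as a relationship cluster;
  • Calculating the Euclidean distance of the relationship cluster;
  • Classifying the relationship clusters according to the Euclidean distance, and determining, based on the classification result, whether each relationship cluster is an abnormal cluster; and
  • When a relationship cluster is determined to be an abnormal cluster, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities.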
  • By constructing a guarantee relationship network and segmenting it, the above abnormal community detection method, device, computer equipment, and storage medium obtain communities with abnormal guarantee relationships. Such a community is a collection of accounts linked by abnormal guarantee relationships, and a large-scale guarantee relationship network may be divided into millions or even tens of millions of communities. Therefore, once the communities with abnormal guarantee relationships are obtained, their characteristic information is determined.
  • The feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree. According to the feature information, communities with similar features are determined as a relationship cluster, because communities with similar features may be the same kind of abnormal community.
  • After the communities with similar characteristics are grouped into a relationship cluster, the Euclidean distance of the relationship cluster is calculated, and the relationship clusters are classified according to that distance.
  • Whether each relationship cluster is an abnormal cluster is then determined from the classification result; when a relationship cluster is determined to be an abnormal cluster, the communities in it are determined to be abnormal communities and are extracted. This achieves efficient extraction of abnormal communities in the presence of multiple guarantee relationships.
  • FIG. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
  • FIG. 2 is a schematic flowchart of an abnormal community detection method according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a guarantee relationship network in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of guarantee modes in an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of the abnormal community detection device of the present application;
  • FIG. 6 is a schematic structural diagram of an embodiment of the computer device of the present application.
  • Reference signs: segmentation module 910, first confirmation module 920, second confirmation module 930, calculation module 940, classification module 950, extraction module 960.
  • The system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • The network 104 provides the medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • A user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104, for example to receive or send messages.
  • Various communication client applications, such as web browsers, shopping applications, search applications, instant messaging tools, email clients, and social platform software, can be installed on the terminal devices 101, 102, and 103.
  • The terminal devices 101, 102, and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
  • The server 105 may be a server that provides various services, for example, a background server that supports the pages displayed on the terminal devices 101, 102, and 103.
  • The abnormal community detection method provided in the embodiments of the present application is generally executed by the server or terminal; accordingly, the abnormal community detection device is generally arranged in the server or terminal device.
  • The numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there can be any number of terminal devices, networks, and servers according to implementation needs.
  • The abnormal community detection method includes the following steps:
  • Step S200: Construct a guarantee relationship network, segment it, and obtain communities with abnormal guarantee relationships.
  • The guarantee relationship network is composed of nodes and guarantee relationships (edges).
  • The nodes include source nodes and target nodes.
  • A source node represents a guarantor.
  • A target node represents a guaranteed party.
  • The guarantee relationship network is constructed as shown in FIG. 3, where Set(A,B) means that user A belongs to communities A and B, Set(A) means that user C belongs to community A, and Set(B) means that user B belongs to community B. Edge(C,A,1) means that user C guarantees user A and that there is exactly one guarantee relationship between user C and user A; Edge(B,A,1) means that user B guarantees user A.
  • Edge(A,B,1) means that user A guarantees user B, and that there is exactly one guarantee relationship between user A and user B.
  • The guarantee network can be segmented with the LPANNI algorithm (a label-propagation community detection algorithm for large-scale networks). Specifically, the influence of each node (NI), the similarity between nodes (Sim), and the influence of neighbor nodes (NNI) are calculated; the community label set of each node is then updated iteratively based on the NNI and the membership coefficients, and the communities with abnormal guarantee relationships are obtained from the resulting label sets.
  • NI: node influence
  • Sim: similarity between nodes
  • NNI: neighbor node influence
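  • The idea of each node carrying an iteratively updated label set with membership coefficients can be illustrated with a small sketch. Note that this is NOT LPANNI: it is a simplified COPRA-style overlapping label propagation that omits the NI, Sim, and NNI weighting. The function name, the `max_labels` threshold, and the fixed iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def overlapping_label_propagation(edges, max_labels=2, iterations=10):
    """Simplified COPRA-style propagation (not LPANNI).

    edges: iterable of (guarantor, guaranteed) pairs.
    Returns {node: set of community labels}.
    """
    # Treat guarantee edges as undirected when deciding community membership.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    # Every node starts in its own community with membership coefficient 1.0.
    labels = {node: {node: 1.0} for node in neighbors}
    threshold = 1.0 / max_labels
    # Synchronous updates may oscillate, so a fixed iteration count is used.
    for _ in range(iterations):
        new_labels = {}
        for node in sorted(neighbors):
            # Average the neighbors' membership coefficients label by label.
            scores = defaultdict(float)
            for nb in neighbors[node]:
                for lab, coef in labels[nb].items():
                    scores[lab] += coef / len(neighbors[node])
            # Keep every label whose coefficient reaches the threshold.
            kept = {lab: c for lab, c in scores.items() if c >= threshold}
            if not kept:
                # Otherwise keep only the strongest label (ties: smallest name).
                best = max(sorted(scores), key=lambda lab: scores[lab])
                kept = {best: scores[best]}
            total = sum(kept.values())
            new_labels[node] = {lab: c / total for lab, c in kept.items()}
        labels = new_labels
    return {node: set(coefs) for node, coefs in labels.items()}
```

  • A node may end up with more than one label, which is how overlapping membership such as Set(A,B) arises.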
  • Step S300: Determine feature information of the community, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree.
  • Each community is treated as a subgraph.
  • Feature generation is performed on each subgraph to obtain a 26-dimensional feature.
  • This 26-dimensional feature is the characteristic information of the community.
  • GraphX is the component for graphs and graph-parallel computation in the Apache Spark framework.
  • Specifically, the feature information includes: number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, maximum in-degree, minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard deviation, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle standard deviation coefficient, and clustering coefficient.
  • The number of nodes is the number of nodes in the current community. Each edge connects a source node (guarantor) to a target node (guaranteed party), and the number of edges is the number of edges in the current community. The average degree is the total degree of the current community divided by the total number of nodes; the maximum and minimum degrees are the largest and smallest node degrees in the current community; the degree standard deviation is the standard deviation of the node degrees. Each guarantor of a node contributes one in-degree to that guaranteed node, and the total in-degree is the sum of in-degrees in the community; the average in-degree is the total in-degree divided by the total number of nodes. Each party guaranteed by a node contributes one out-degree to that guarantor. The in-degree ratio standard deviation is the standard deviation of the ratio of the in-degree of each node in the current community to
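  • The 26 features listed above can be computed from a community's directed edge list roughly as follows. This is a minimal pure-Python sketch, not the GraphX implementation. Because the definition of the in-degree ratio is truncated in the text, it assumes ratio = in-degree / total degree; it also assumes triangles are counted on the undirected simple graph, "triangle standard deviation coefficient" is the population standard deviation of per-node triangle counts, and the clustering coefficient is the average local clustering coefficient.

```python
import statistics
from collections import defaultdict


def _stats(values):
    # total, mean, max, min, population standard deviation
    return [sum(values), statistics.fmean(values), max(values), min(values),
            statistics.pstdev(values)]


def community_features(edges):
    """Return a 26-dimensional feature vector for one community.

    edges: list of (guarantor, guaranteed) pairs, i.e. (source, target).
    """
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    undirected = defaultdict(set)
    for src, dst in edges:
        out_deg[src] += 1          # the guarantor gains one out-degree
        in_deg[dst] += 1           # the guaranteed party gains one in-degree
        if src != dst:
            undirected[src].add(dst)
            undirected[dst].add(src)
    nodes = sorted(set(in_deg) | set(out_deg))
    ins = [in_deg[n] for n in nodes]
    outs = [out_deg[n] for n in nodes]
    degs = [i + o for i, o in zip(ins, outs)]
    # Assumption: in-degree ratio = in-degree / total degree of the node.
    ratios = [i / d if d else 0.0 for i, d in zip(ins, degs)]
    # Triangles through each node and local clustering, on the undirected graph.
    tris, local_cc = [], []
    for n in nodes:
        nbrs = sorted(undirected[n])
        k = len(nbrs)
        t = sum(1 for a in range(k) for b in range(a + 1, k)
                if nbrs[b] in undirected[nbrs[a]])
        tris.append(t)
        local_cc.append(2 * t / (k * (k - 1)) if k > 1 else 0.0)
    tri_stats = _stats(tris)
    tri_stats[0] //= 3             # each triangle is counted at all three corners
    return ([len(nodes), len(edges)]
            + _stats(degs)[1:]     # average, max, min, std of degree
            + _stats(ins)          # total, average, max, min, std of in-degree
            + _stats(outs)         # the same five statistics for out-degree
            + _stats(ratios)[1:]   # average, max, min, std of in-degree ratio
            + tri_stats            # total triangles, then per-node avg/max/min/std
            + [statistics.fmean(local_cc)])  # average clustering coefficient
```

  • For a joint guarantee circle A→B, B→C, C→A, the vector starts with 3 nodes and 3 edges and reports 1 triangle in total, matching the description of that structure.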
  • The above feature information can also be stored in a blockchain, so that it can be shared between different platforms through the blockchain.
  • Blockchain is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
  • Step S400: Determine, according to the feature information, communities with similar features as a relationship cluster.
  • FIG. 4 is a schematic diagram of the guarantee modes in this embodiment, in which:
  • FIG. 4(a) shows the mutual guarantee mode formed by A and B;
  • FIG. 4(b) shows A, B
  • FIG. 4(c) shows the joint guarantee circle mode formed by A, B, and C;
  • FIG. 4(d) shows the multi-party guarantee mode formed by A, B, and C.
  • These communities have clearly distinct characteristics, i.e., they correspond to four different relationship clusters.
  • The characteristic information of a community can accurately describe its typical structure. Taking a community with the joint guarantee circle structure as an example,
  • its characteristic information includes 3 nodes, 1 triangle in total, and so on.
  • Communities with similar features can be clustered into a relationship cluster. Specifically, whether two communities are similar can be decided by calculating the average error between them and comparing it with a preset threshold: if the average error is not greater than the preset threshold, the two communities are determined to be similar; if it is greater than the preset threshold, they are determined not to be similar.
  • The average error can be calculated from the feature vectors of the communities, where each feature vector is obtained by normalizing the feature information.
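  • The similarity test above can be sketched as follows. The text does not say which normalization or which error measure is used, so this sketch assumes per-dimension min-max normalization across all communities and a mean absolute error; the default threshold of 0.1 is a hypothetical value.

```python
def min_max_normalize(vectors):
    """Scale every feature dimension to [0, 1] across all communities."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    # A constant dimension (span 0) carries no information and maps to 0.0.
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(vec, lows, spans)]
            for vec in vectors]


def mean_error(u, v):
    """Average absolute difference between two normalized feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def are_similar(u, v, threshold=0.1):
    # Similar when the average error does not exceed the preset threshold.
    return mean_error(u, v) <= threshold
```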
  • Step S500: Calculate the Euclidean distance of the relationship cluster.
  • The Euclidean distance, denoted $dis_i$, is the distance from the feature of the i-th relationship cluster to the origin $(0, 0, \ldots, 0)$.
  • With the feature vector of the i-th relationship cluster written as $(x_{i1}, x_{i2}, \ldots, x_{i26})$, the Euclidean distance is calculated as:
$$dis_i = \sqrt{\sum_{j=1}^{26} x_{ij}^{2}}$$
  • The Euclidean distance of each relationship cluster is calculated according to this formula.
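  • The formula for $dis_i$ translates directly into code. How a relationship cluster's own feature vector is formed is not specified, so the `cluster_feature` helper below is a hypothetical choice that represents a cluster by the mean (centroid) of its member communities' normalized feature vectors.

```python
import math


def euclidean_to_origin(features):
    """dis_i = sqrt(x_i1^2 + ... + x_iN^2): distance from the origin."""
    return math.sqrt(sum(x * x for x in features))


def cluster_feature(member_vectors):
    # Hypothetical: a relationship cluster is represented by the
    # component-wise mean of its member communities' feature vectors.
    n = len(member_vectors)
    return [sum(col) / n for col in zip(*member_vectors)]
```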
  • Step S600: Classify the relationship clusters according to the Euclidean distance, and determine, based on the classification result, whether each relationship cluster is an abnormal cluster.
  • The relationship clusters can be sorted by Euclidean distance using a preset sorting method.
  • The preset sorting method includes sorting in ascending or descending order of Euclidean distance, as well as partitioning and sorting by a given threshold.
  • The relationship clusters are classified according to the Euclidean distance: the magnitude of each relationship cluster's Euclidean distance determines whether it is an abnormal cluster.
  • If the Euclidean distance of a relationship cluster falls within the Euclidean distance interval corresponding to abnormal clusters, the relationship cluster is an abnormal cluster; if it does not fall within that interval, the relationship cluster is a normal cluster.
  • Step S700: When a relationship cluster is determined to be an abnormal cluster, determine that the communities in the abnormal cluster are abnormal communities, and extract the abnormal communities.
  • When a relationship cluster is an abnormal cluster, the communities in it have abnormal guarantee relationships;
  • that is, all communities in the relationship cluster are abnormal communities, and all of them are extracted from the relationship cluster.
  • In this way, abnormal guarantee structures are screened intelligently and automatically, and the efficiency of processing multi-order guarantee relationships in coordinated multi-account crime is improved. The method can run within a big-data analysis framework and can be executed in parallel in a single pass.
  • It therefore scales well to large guarantee networks with millions of users, further improving the efficiency and accuracy of data processing on large-scale guarantee networks.
  • In some embodiments, before the guarantee relationship network is segmented, the abnormal community detection method further includes:
  • obtaining a guarantee relationship in the guarantee relationship network and determining the guarantor and the guaranteed party in it; determining whether the intersection length of the label sets of the guarantor and the guaranteed party is less than a preset length; and, if the intersection length is less than the preset length, determining that the guarantor and the guaranteed party do not belong to the same community and deleting this non-essential relationship between them.
  • Each node is assigned the label set of the communities to which it belongs.
  • The same node may belong to multiple communities and therefore carry several labels. For example, node A belongs to the two communities labeled A and B, and node B belongs to the two communities labeled B and C.
  • Deleting the non-essential relationships of nodes A and B, i.e., the community memberships they do not share, means deleting the relationship between node B and community C and the relationship between node A and community A; only the relationship between node A and community B and the relationship between node B and community B are retained.
  • The triplets data in the GraphX module contains both relationship (edge) information and node attribute information. For each guarantee relationship, .srcAttr is called to obtain the label set of the guarantor (the source node), and .dstAttr is called to obtain the label set of the guaranteed party (the target node).
  • If the intersection length of the label sets of the source node and the target node is not less than the preset length (with a preset length of 1, this means the intersection is non-empty), the two nodes share at least one community label and therefore belong to the same community. If the intersection length is less than the preset length, the source node and the target node are determined not to belong to the same community, and the non-essential relationship between them is deleted.
  • The preset length can be set in advance as required.
  • Deleting the relationships between guarantors and guaranteed parties that do not belong to the same community avoids redundant data processing and improves both the accuracy and the efficiency of data processing.
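  • The label-set intersection check can be mirrored outside Spark with plain Python. This is an illustrative analogue of iterating GraphX triplets, not GraphX code: the function name `prune_edges` is hypothetical, and `preset_length=1` is an assumed default under which an edge survives whenever the two label sets share at least one label.

```python
def prune_edges(edges, label_sets, preset_length=1):
    """Split guarantee edges into kept and removed by label-set overlap.

    edges: iterable of (guarantor, guaranteed) pairs, i.e. (source, target).
    label_sets: {node: set of community labels} from community detection.
    """
    kept, removed = [], []
    for src, dst in edges:
        # Analogue of reading srcAttr / dstAttr from a GraphX triplet.
        shared = label_sets.get(src, set()) & label_sets.get(dst, set())
        if len(shared) < preset_length:
            removed.append((src, dst))   # endpoints are not in the same community
        else:
            kept.append((src, dst))
    return kept, removed
```

  • With the FIG. 3 labels (A in communities A and B, B in B, C in A), a hypothetical extra edge B→C would be removed because B and C share no label.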
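  • In some embodiments, step S400 of determining, according to the characteristic information, communities with similar characteristics as a relationship cluster includes:
  • acquiring structured data corresponding to the community according to the characteristic information; and clustering, based on the structured data, communities with similar characteristics into a relationship cluster.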
  • The structured data includes the community number and the characteristic information of the community.
  • Structured data is usually stored in a relational database.
  • The community number and the characteristic information of the community are packaged together and organized into structured data.
  • The structured data is retrieved from the relational database, and a clustering analysis algorithm is invoked to analyze it, yielding relationship clusters composed of communities with similar characteristics.
  • For example, the k-means clustering algorithm can be invoked to cluster communities with similar characteristics into a relationship cluster.
  • Organizing each community's data in structured form allows communities with similar characteristics to be processed more quickly and efficiently, further improving the efficiency of processing such communities.
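  • As an illustration of the k-means step, here is a plain k-means sketch over community feature vectors. It is not the patent's implementation: the number of clusters `k` must be chosen in advance, and the deterministic initialization (the first `k` points as centroids) is a simplifying assumption; library implementations usually use k-means++ seeding and several restarts.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; returns a cluster index for each point."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

  • Each resulting index identifies one relationship cluster; communities sharing an index are treated as having similar characteristics.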
  • In some embodiments, acquiring the structured data corresponding to the community according to the characteristic information includes: obtaining the community number of the community; and organizing the community number and the characteristic information into structured data.
  • The community number identifies a community: when the communities are divided, each community is assigned its own community number, and different communities have different numbers.
  • The feature information is the information contained in each community, namely the number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, maximum in-degree, minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard deviation, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle standard deviation coefficient, and clustering coefficient. The community number of the community is retrieved and packaged together with the feature information to obtain the structured data.
  • Obtaining the structured data of each community from its community number and characteristic information allows communities with similar characteristics to be processed more quickly and efficiently,
  • which increases the speed of data processing.
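  • Packaging the community number and the 26 features as rows of a relational table might look like the sketch below, which uses Python's built-in `sqlite3` module. The table name `community_features` and the column names `f1`..`f26` are hypothetical; the text does not specify a schema or a particular database.

```python
import sqlite3

FEATURE_COUNT = 26


def store_structured_data(conn, records):
    """Store (community_number, [26 features]) rows in a relational table."""
    for num, feats in records:
        if len(feats) != FEATURE_COUNT:
            raise ValueError(f"community {num}: expected {FEATURE_COUNT} features")
    cols = ", ".join(f"f{i} REAL" for i in range(1, FEATURE_COUNT + 1))
    conn.execute("CREATE TABLE IF NOT EXISTS community_features "
                 f"(community_number INTEGER PRIMARY KEY, {cols})")
    placeholders = ", ".join("?" * (FEATURE_COUNT + 1))
    conn.executemany(f"INSERT INTO community_features VALUES ({placeholders})",
                     [(num, *feats) for num, feats in records])
    conn.commit()


def load_feature_matrix(conn):
    # Read the structured data back as {community_number: feature list}.
    rows = conn.execute("SELECT * FROM community_features "
                        "ORDER BY community_number").fetchall()
    return {row[0]: list(row[1:]) for row in rows}
```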
  • In some embodiments, step S500 of calculating the Euclidean distance of the relationship cluster includes:
  • calculating the Euclidean distance of the relationship cluster from the origin;
  • that is, the Euclidean distance from the feature of the i-th relationship cluster to the origin (0, 0, ..., 0) is calculated using the Euclidean distance formula.
  • Calculating the Euclidean distance of each relationship cluster allows the relationship clusters to be partitioned by that distance, so that abnormal relationship clusters can be identified accurately.
  • In some embodiments, classifying the relationship clusters according to the Euclidean distance includes:
  • classifying the relationship clusters according to the lower quartile and the upper quartile,
  • where the lower and upper quartiles are obtained by sorting the Euclidean distances in ascending order.
  • The lower quartile is smaller than the upper quartile, and the relationship clusters can be classified according to the interval between them:
  • relationship clusters whose Euclidean distance lies outside the open interval between the lower quartile and the upper quartile are classified as abnormal clusters;
  • relationship clusters whose Euclidean distance lies strictly between the lower quartile and the upper quartile are classified as non-abnormal clusters.
  • Partitioning the relationship clusters by the lower and upper quartiles of the Euclidean distance enables abnormal clusters among the relationship clusters to be identified accurately.
  • In some embodiments, classifying the relationship clusters according to the lower quartile and the upper quartile includes determining whether a relationship cluster is an extreme relationship cluster, a suspected relationship cluster,
  • or a normal cluster.
  • Abnormal clusters include extreme relationship clusters and suspected relationship clusters: an extreme relationship cluster is definitely abnormal, and a suspected relationship cluster is possibly abnormal.
  • Denote the lower and upper quartiles of the Euclidean distances by Q1 and Q3, respectively.
  • IQR = Q3 - Q1 denotes the interquartile range.
  • If the Euclidean distance of a relationship cluster is below the minimum threshold H1 or above the maximum threshold H2, the relationship cluster is determined to be an extreme relationship cluster. If its Euclidean distance lies between the minimum threshold and Q1 (it may equal Q1), or between Q3 and the maximum threshold (it may equal Q3), that is, $H_1 \le dis_i \le Q_1$ or $Q_3 \le dis_i \le H_2$, the relationship cluster is determined to be a suspected relationship cluster. If its Euclidean distance lies strictly between Q1 and Q3, that is, $Q_1 < dis_i < Q_3$, the relationship cluster is determined to be a normal cluster.
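  • The three-way classification and the extraction of step S700 can be sketched as follows. The text does not define the thresholds H1 and H2, so this sketch assumes Tukey-style fences H1 = Q1 - 1.5·IQR and H2 = Q3 + 1.5·IQR (the multiplier `k` is an assumption), and it computes quartiles with Python's `statistics.quantiles` using `method="inclusive"`, which is also an assumption.

```python
import statistics


def classify_clusters(distances, k=1.5):
    """Label each relationship cluster 'extreme', 'suspected' or 'normal'."""
    q1, _, q3 = statistics.quantiles(distances, n=4, method="inclusive")
    iqr = q3 - q1
    h1, h2 = q1 - k * iqr, q3 + k * iqr   # assumed minimum / maximum thresholds
    labels = []
    for d in distances:
        if d < h1 or d > h2:
            labels.append("extreme")
        elif q1 < d < q3:
            labels.append("normal")
        else:                              # h1 <= d <= q1 or q3 <= d <= h2
            labels.append("suspected")
    return labels


def extract_abnormal(communities_by_cluster, labels):
    # Every community in an extreme or suspected cluster is extracted.
    return [c for members, lab in zip(communities_by_cluster, labels)
            if lab != "normal" for c in members]
```

  • For example, with distances 1 through 9 plus 100, the quartiles are 3.25 and 7.75, so 100 falls above H2 (14.5) and is labeled extreme, while 4 to 7 are normal.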
  • The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or it may be a random access memory (RAM), etc.
  • This application further provides an embodiment of an abnormal community detection device.
  • This device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.
  • The abnormal community detection device 900 in this embodiment includes a segmentation module 910, a first confirmation module 920, a second confirmation module 930, a calculation module 940, a classification module 950, and an extraction module 960, where:
  • The segmentation module 910 is configured to construct a guarantee relationship network, segment it, and obtain communities with abnormal guarantee relationships.
  • The segmentation module 910 includes:
  • a first obtaining unit, configured to obtain a guarantee relationship in the guarantee relationship network and determine the guarantor and the guaranteed party in it;
  • a first confirmation unit, configured to determine whether the intersection length of the label sets of the guarantor and the guaranteed party is less than a preset length; and
  • a deletion unit, configured to, if the intersection length is less than the preset length, determine that the guarantor and the guaranteed party do not belong to the same community and delete the non-essential relationship between them.
  • The first confirmation module 920 is configured to determine feature information of the community, where the feature information includes at least one of node size, edge size, clustering coefficient, number of connected triangles, and average degree.
  • each community is regarded as a subgraph.
  • the feature generation is performed on each subgraph, and the 26-dimensional feature is obtained.
  • the 26-dimensional feature is Characteristic information of the community.
  • graphx is a component of graphs and graph calculations in the spark framework.
  • the feature information specifically includes: number of nodes, number of edges, average degree, maximum degree, minimum degree, degree standard deviation, total in-degree, average in-degree, and maximum in-degree , Minimum in-degree, in-degree standard deviation, total out-degree, average out-degree, maximum out-degree, minimum out-degree, out-degree standard deviation, average in-degree ratio, maximum in-degree ratio, minimum in-degree ratio, in-degree ratio standard Difference, total number of triangles, average number of triangles, maximum number of triangles, minimum number of triangles, triangle standard deviation coefficient, clustering coefficient.
  • the number of nodes is the number of nodes in the current community. An edge connects a source node (the guarantor) to a target node (the guaranteed party), and the number of edges is the number of edges in the current community. The average degree is the total degree of the current community divided by the total number of nodes. The maximum degree and the minimum degree are the largest and smallest node degrees in the current community, and the degree standard deviation is the standard deviation of the node degrees. Each guarantee that a guarantor provides counts as one in-degree of the guaranteed party, and the total in-degree is the sum of the in-degrees in the community. The average in-degree is the ratio of the total in-degree to the total number of nodes. Likewise, each guaranteed party counts as one out-degree of its guarantor. The standard deviation of the in-degree ratio is the standard deviation of the ratio of the number of in-degrees of the node in the current community to
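Most of the listed features can be computed directly from a community's directed edge list. The sketch below is plain Python on a single community, not the GraphX/Spark implementation the text refers to. It rests on several assumptions:

- a node's degree is its in-degree plus its out-degree;
- triangles are counted on the underlying undirected simple graph;
- standard deviations are population standard deviations;
- the clustering coefficient is the global transitivity.

The four in-degree ratio features are omitted because their definition is truncated above. `community_features` and all feature key names are hypothetical.

```python
import statistics
from itertools import combinations


def community_features(edges):
    """Compute a subset of the community features from directed guarantee edges.

    edges: list of (guarantor, guaranteed) pairs belonging to one community.
    """
    if not edges:
        raise ValueError("a community needs at least one edge")
    nodes = sorted({node for edge in edges for node in edge})
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    undirected = {n: set() for n in nodes}
    for src, dst in edges:
        out_deg[src] += 1  # each guarantee given adds one out-degree to the guarantor
        in_deg[dst] += 1   # ... and one in-degree to the guaranteed party
        if src != dst:
            undirected[src].add(dst)
            undirected[dst].add(src)
    degree = {n: in_deg[n] + out_deg[n] for n in nodes}

    # Triangles through each node in the underlying undirected simple graph.
    tri = {n: sum(1 for a, b in combinations(sorted(undirected[n]), 2) if b in undirected[a])
           for n in nodes}
    total_triangles = sum(tri.values()) // 3  # each triangle is counted once per corner
    # Global clustering coefficient (transitivity): 3 * triangles / connected triples.
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in undirected.values())
    clustering = 3 * total_triangles / triples if triples else 0.0

    def summary(prefix, values):
        vals = list(values)
        return {f"avg_{prefix}": sum(vals) / len(vals), f"max_{prefix}": max(vals),
                f"min_{prefix}": min(vals), f"std_{prefix}": statistics.pstdev(vals)}

    features = {"num_nodes": len(nodes), "num_edges": len(edges),
                "total_in_degree": sum(in_deg.values()),
                "total_out_degree": sum(out_deg.values()),
                "total_triangles": total_triangles,
                "clustering_coefficient": clustering}
    features.update(summary("degree", degree.values()))
    features.update(summary("in_degree", in_deg.values()))
    features.update(summary("out_degree", out_deg.values()))
    features.update(summary("triangles", tri.values()))
    return features


circle = community_features([("A", "B"), ("B", "C"), ("C", "A")])
print(circle["num_nodes"], circle["total_triangles"])  # 3 1
```

Taking the joint guarantee circle to be the directed cycle A→B→C→A (an assumption), this yields 3 nodes and 1 triangle, matching the example given later in the text.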
  • the second confirmation module 930 is configured to group communities with similar characteristics into a relationship cluster according to the characteristic information;
  • the second confirmation module 930 includes:
  • the second acquiring unit is configured to acquire structured data corresponding to the community according to the characteristic information
  • the clustering unit is used to group communities with similar characteristics into a relationship cluster based on the structured data.
  • the second acquiring unit includes:
  • the third obtaining unit is used to obtain the community number of the community
  • the sorting unit is used to organize the community number and the feature information into structured data.
  • Fig. 4 is a schematic diagram of the guarantee modes in this embodiment, in which:
  • Fig. 4(a) shows the mutual guarantee mode formed by A and B;
  • Fig. 4(b) shows A, B
  • Fig. 4(c) shows the joint guarantee circle mode formed by A, B, and C;
  • Fig. 4(d) shows the multi-party guarantee mode formed by A, B, and C.
  • these four kinds of communities have distinct characteristics, that is, they form four different relationship clusters.
  • the characteristic information of a community can accurately describe its typical structure. Taking the joint guarantee circle as an example, its characteristic information includes 3 nodes and a total of 1 triangle, among other features.
  • communities with similar features can be clustered into one relationship cluster. Specifically, whether two communities are similar can be determined by calculating the average error between them and comparing it with a preset threshold. If the average error is not greater than the preset threshold, the two communities are determined to be similar; if it is greater than the preset threshold, they are determined not to be similar.
  • the average error can be calculated from the feature vectors of the communities, where each feature vector is obtained by normalizing the community's feature information.
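The grouping rule can be sketched as follows. The text does not specify the normalization, the exact error measure, or how communities are assigned once pairwise similarity is known, so the sketch makes three assumptions:

- features are min-max normalized per column across all communities;
- the average error is the mean absolute difference between two normalized vectors;
- each community joins the first existing cluster whose first member is within the threshold, and otherwise starts a new cluster.

Communities are keyed by their community number, mirroring the structured data described above. All function names and the threshold value are hypothetical, and the toy vectors have 3 features instead of 26.

```python
def min_max_normalize(rows):
    """Scale each feature column of `rows` (equal-length lists) into [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(x - lo) / span if span else 0.0 for x, lo, span in zip(row, lows, spans)]
            for row in rows]


def average_error(u, v):
    """Mean absolute difference between two normalized feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def group_similar(features_by_community, threshold):
    """Greedily group communities whose average error to a cluster's first member <= threshold."""
    ids = sorted(features_by_community)  # community numbers
    vectors = dict(zip(ids, min_max_normalize([features_by_community[i] for i in ids])))
    clusters = []  # each relationship cluster is a list of community numbers
    for cid in ids:
        for cluster in clusters:
            if average_error(vectors[cid], vectors[cluster[0]]) <= threshold:
                cluster.append(cid)  # similar enough: join this relationship cluster
                break
        else:
            clusters.append([cid])  # no similar cluster found: start a new one
    return clusters


toy = {1: [3, 3, 1], 2: [3, 3, 1], 3: [2, 2, 0], 4: [10, 12, 0]}
print(group_similar(toy, threshold=0.1))  # [[1, 2], [3], [4]]
```

The greedy pass depends on the order in which communities are visited. A production system might use a proper clustering algorithm instead, which would be a different technique from this sketch.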
  • the calculation module 940 is configured to calculate the Euclidean distance of each relationship cluster;
  • the calculation module 940 includes:
  • the first calculation unit is configured to calculate the average value of each feature in the relationship cluster, and calculate the feature vector of the relationship cluster according to the average value;
  • the second calculation unit is configured to calculate, from the feature vector, the Euclidean distance between the relationship cluster and the origin.
  • the Euclidean distance, denoted dis_i, is the distance from the feature vector of the i-th relationship cluster to the origin {0, 0, ..., 0}.
  • the feature vector of the i-th relationship cluster is {x_i1, x_i2, ..., x_i26}, and the Euclidean distance is calculated as follows:
  $$\mathrm{dis}_i = \sqrt{\sum_{j=1}^{26} x_{ij}^{2}}$$
  • the Euclidean distance of each relationship cluster is calculated according to this formula.
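The two calculation units can be sketched as two small functions. The first averages each feature over the communities in a cluster to form the cluster's feature vector. The second takes that vector's Euclidean distance to the origin, dis_i = sqrt(sum_j x_ij^2). The function names are hypothetical, and the example uses 2 features instead of 26.

```python
import math


def cluster_feature_vector(member_vectors):
    """Average each feature over the feature vectors of the communities in one cluster."""
    if not member_vectors:
        raise ValueError("a relationship cluster needs at least one community")
    n = len(member_vectors)
    return [sum(column) / n for column in zip(*member_vectors)]


def distance_to_origin(vector):
    """Euclidean distance dis_i between a cluster's feature vector and the origin."""
    return math.sqrt(sum(x * x for x in vector))


# Toy 2-dimensional example (the text uses 26 features per vector).
centre = cluster_feature_vector([[2.0, 4.0], [4.0, 4.0]])
print(centre, distance_to_origin(centre))  # [3.0, 4.0] 5.0
```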
  • the classification module 950 is configured to classify the relationship clusters according to the Euclidean distance, and determine whether the relationship clusters are abnormal clusters based on the classification results;
  • the classification module 950 includes:
  • the fourth acquiring unit is configured to sort the Euclidean distances by size and acquire their lower quartile and upper quartile;
  • the classification unit is used to classify the relationship clusters according to the lower quartile and the upper quartile.
  • the classification unit includes:
  • the second confirmation unit is configured to determine that the relationship cluster is an abnormal cluster if the Euclidean distance is less than or equal to the lower quartile or greater than or equal to the upper quartile;
  • the third confirmation unit is configured to determine that the relationship cluster is a normal cluster if the Euclidean distance is greater than the lower quartile and smaller than the upper quartile.
  • the relationship clusters can be sorted by Euclidean distance according to a preset sorting method.
  • the preset sorting method includes sorting in ascending or descending order of Euclidean distance, and dividing and sorting according to a certain threshold.
  • the relationship clusters are classified according to the Euclidean distance: the size of each relationship cluster's Euclidean distance determines whether it is an abnormal cluster.
  • if the Euclidean distance of a relationship cluster falls within the Euclidean distance interval corresponding to abnormal clusters, the relationship cluster is an abnormal cluster; if it does not, the relationship cluster is a normal cluster.
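The quartile rule above can be sketched with the standard library. A cluster is abnormal when its distance is at or below the lower quartile or at or above the upper quartile, and normal strictly between them. The text does not say how the quartiles are interpolated, so the use of `statistics.quantiles(..., method="inclusive")` is an assumption. `classify_clusters` is a hypothetical name.

```python
import statistics


def classify_clusters(distances):
    """Label each relationship cluster 'abnormal' or 'normal' from its Euclidean distance.

    distances: {cluster_id: dis_i}. A cluster is abnormal when dis_i <= Q1 or dis_i >= Q3,
    and normal when Q1 < dis_i < Q3.
    """
    if len(distances) < 2:
        raise ValueError("need at least two relationship clusters to compute quartiles")
    # The 'inclusive' quartile method is an assumption; the text does not specify one.
    q1, _, q3 = statistics.quantiles(distances.values(), n=4, method="inclusive")
    return {cid: ("abnormal" if d <= q1 or d >= q3 else "normal")
            for cid, d in distances.items()}


labels = classify_clusters({i: float(i) for i in range(1, 10)})
abnormal = [cid for cid, label in labels.items() if label == "abnormal"]
print(abnormal)  # [1, 2, 3, 7, 8, 9]
```

The extraction step then amounts to collecting the communities of every cluster listed in `abnormal`.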
  • the extraction module 960 is configured to, when a relationship cluster is determined to be an abnormal cluster, determine that the communities in that cluster are abnormal communities and extract them.
  • if the relationship cluster is an abnormal cluster, the communities in it are abnormal guarantee communities: all of them are treated as abnormal communities, and all abnormal communities are extracted from the relationship cluster.
  • this realizes automatic screening of abnormal guarantee structures and improves the efficiency of processing multi-order guarantee relationships in coordinated multi-account crimes. The method can be executed within a big data analysis framework and can process large-scale guarantee networks of millions of users in parallel at one time. It therefore scales well, further improving the efficiency and accuracy of data processing on large-scale guarantee networks.
  • FIG. 6 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 6 includes a memory 61, a processor 62, and a network interface 63 that communicate with each other through a system bus. It should be noted that the figure only shows the computer device 6 with components 61-63. Not all of the components shown need to be implemented, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • the memory 61 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, and optical disks.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6.
  • the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 6.
  • the memory 61 may also include both the internal storage unit of the computer device 6 and its external storage device.
  • the memory 61 is generally used to store an operating system and various application software installed in the computer device 6, such as computer-readable instructions for an abnormal community detection method.
  • the memory 61 can also be used to temporarily store various types of data that have been output or will be output.
  • in some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • the processor 62 is generally used to control the overall operation of the computer device 6.
  • the processor 62 is configured to run the computer-readable instructions stored in the memory 61 or to process data, for example, to run the computer-readable instructions of the abnormal community detection method.
  • the network interface 63 may include a wireless network interface or a wired network interface, and the network interface 63 is generally used to establish a communication connection between the computer device 6 and other electronic devices.
  • the computer device realizes automatic screening of abnormal guarantee structures and improves the efficiency of processing multi-order guarantee relationships in multi-account collaborative crimes. It can be executed within a big data analysis framework and can process large-scale guarantee networks of millions of users in parallel at one time. It therefore has good scalability, further improving the efficiency and accuracy of data processing on large-scale guarantee networks.
  • This application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions for detecting abnormal communities. The instructions can be executed by at least one processor, so that the at least one processor executes the steps of the abnormal community detection method described above.
  • the computer-readable storage medium realizes automatic screening of abnormal guarantee structures and improves the efficiency of processing multi-order guarantee relationships in multi-account collaborative crimes. It can be used within a big data analysis framework to process large-scale guarantee networks of millions of users in parallel at one time, and has good scalability, further improving the efficiency and accuracy of data processing on large-scale guarantee networks.
  • the method of the above embodiments can be implemented by means of software plus the necessary general hardware platform. Of course, it can also be implemented by hardware, but in many cases the former is better.
  • the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An abnormal community detection method comprises the following steps:
- constructing a guarantee relationship network and segmenting it so as to obtain communities having abnormal guarantee relationships (S200);
- determining feature information of each of the communities, the feature information comprising at least one of the following: a node scale, an edge scale, a clustering coefficient, the number of connected triangles, and an average degree (S300);
- determining, according to the feature information, communities having similar features as a relationship cluster (S400);
- calculating the respective Euclidean distances of the relationship clusters (S500);
- classifying the relationship clusters according to the Euclidean distances, and determining whether a relationship cluster is an abnormal cluster on the basis of the classification result (S600);
- if so, determining that the communities in the abnormal cluster are abnormal communities, and extracting the abnormal communities (S700).

The method enables efficient extraction of abnormal communities. In addition, the method relates to blockchain technology, since the feature information can be stored in a blockchain.
PCT/CN2021/096155 2020-05-27 2021-05-26 Procédé et appareil de détection de communautés anormales, dispositif informatique et support de stockage WO2021239004A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010462900.3 2020-05-27
CN202010462900.3A CN111784528B (zh) 2020-05-27 2020-05-27 异常社群检测方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021239004A1 true WO2021239004A1 (fr) 2021-12-02

Family

ID=72753396

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096155 WO2021239004A1 (fr) 2020-05-27 2021-05-26 Procédé et appareil de détection de communautés anormales, dispositif informatique et support de stockage

Country Status (2)

Country Link
CN (1) CN111784528B (fr)
WO (1) WO2021239004A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798312A (zh) * 2019-08-02 2020-10-20 深圳索信达数据技术有限公司 一种基于孤立森林算法的金融交易系统异常识别方法
CN114337469A (zh) * 2021-12-31 2022-04-12 中冶赛迪重庆信息技术有限公司 一种层流辊道电机故障检测方法、系统、介质及电子终端
CN114650167A (zh) * 2022-02-08 2022-06-21 联想(北京)有限公司 一种异常检测方法、装置、设备及计算机可读存储介质
CN114897068A (zh) * 2022-05-07 2022-08-12 国家计算机网络与信息安全管理中心 一种数据中心用铅酸电池组内异常自动识别方法
CN115550194A (zh) * 2022-12-01 2022-12-30 中国科学院合肥物质科学研究院 基于类最远采样的区块链网络传输方法及存储介质
CN117978543A (zh) * 2024-03-28 2024-05-03 贵州华谊联盛科技有限公司 基于态势感知的网络安全预警方法及系统
CN118378193A (zh) * 2024-06-20 2024-07-23 山东征途信息科技股份有限公司 一种基于大数据的智慧社区数据分析方法及系统

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784528B (zh) * 2020-05-27 2024-07-02 平安科技(深圳)有限公司 异常社群检测方法、装置、计算机设备及存储介质
CN112308694A (zh) * 2020-11-24 2021-02-02 拉卡拉支付股份有限公司 一种欺诈团伙的发现方法及装置
CN114117418B (zh) * 2021-11-03 2023-03-14 中国电信股份有限公司 基于社群检测异常账户的方法、系统、设备及存储介质
CN114065192A (zh) * 2021-11-16 2022-02-18 安天科技集团股份有限公司 一种构建威胁情报共享行为群的方法、装置、设备及介质
CN114662629B (zh) * 2022-03-23 2022-09-16 中国邮电器材集团有限公司 一种用于在多级节点结构中识别工业码的方法和装置
CN114745161B (zh) * 2022-03-23 2023-08-22 烽台科技(北京)有限公司 一种异常流量的检测方法、装置、终端设备和存储介质

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035003A (zh) * 2018-07-04 2018-12-18 北京玖富普惠信息技术有限公司 基于机器学习的反欺诈模型建模方法和反欺诈监控方法
US20200019985A1 (en) * 2018-07-13 2020-01-16 Cognant Llc Fraud discovery in a digital advertising ecosystem
CN111784528A (zh) * 2020-05-27 2020-10-16 平安科技(深圳)有限公司 异常社群检测方法、装置、计算机设备及存储介质

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541886B (zh) * 2010-12-20 2015-04-01 郝敬涛 一种识别用户群和用户之间关系的系统和方法
CN104933621A (zh) * 2015-06-19 2015-09-23 天睿信科技术(北京)有限公司 一种担保圈的大数据分析系统和方法
CN107480685B (zh) * 2016-06-08 2021-02-23 国家计算机网络与信息安全管理中心 一种基于GraphX的分布式幂迭代聚类方法和装置
CN106097090A (zh) * 2016-06-22 2016-11-09 西安交通大学 一种基于图理论的纳税人利益关联团体识别方法
CN106778476A (zh) * 2016-11-18 2017-05-31 中国科学院深圳先进技术研究院 人体姿态识别方法及人体姿态识别装置
CN106709800B (zh) * 2016-12-06 2020-08-11 中国银联股份有限公司 一种基于特征匹配网络的社团划分方法和装置
CN107767258B (zh) * 2017-09-29 2021-07-02 新华三大数据技术有限公司 风险传播确定方法及装置
CN107749033A (zh) * 2017-11-09 2018-03-02 厦门市美亚柏科信息股份有限公司 一种网络社区活跃用户簇的发现方法、终端设备及存储介质
CN108734479A (zh) * 2018-04-12 2018-11-02 阿里巴巴集团控股有限公司 保险欺诈识别的数据处理方法、装置、设备及服务器
CN110334264B (zh) * 2019-06-27 2021-04-09 北京邮电大学 一种针对异构动态信息网络的社区检测方法及装置
CN110376290B (zh) * 2019-07-19 2020-08-04 中南大学 基于多维核密度估计的声发射源定位方法
CN110516713A (zh) * 2019-08-02 2019-11-29 阿里巴巴集团控股有限公司 一种目标群体识别方法、装置及设备
CN110610205A (zh) * 2019-09-04 2019-12-24 成都威嘉软件有限公司 社交网络中的社区识别方法
CN110647590A (zh) * 2019-09-23 2020-01-03 税友软件集团股份有限公司 一种目标社群数据的识别方法及相关装置

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN YINGXIAN: "Research on Social Network Community Detection Mechanism", BASIC SCIENCES, CHINA MASTER’S THESES FULL-TEXT DATABASE, 15 March 2016 (2016-03-15), XP055871625 *
DONG XIAOJIANG: "Parallelization of AP Clustering Community Detection Algorithm Based on Hadoop Platform", INFORMATION SCIENCE AND TECHNOLOGY, CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 February 2018 (2018-02-15), XP055871614 *
PENG ZHONGYUAN: "Research on Sybil Attack Detection Algorithm Based on Random Walks Betweenness in Social Networks", INFORMATION SCIENCE AND TECHNOLOGY, CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 January 2015 (2015-01-15), XP055871608 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798312A (zh) * 2019-08-02 2020-10-20 深圳索信达数据技术有限公司 一种基于孤立森林算法的金融交易系统异常识别方法
CN111798312B (zh) * 2019-08-02 2024-03-01 深圳索信达数据技术有限公司 一种基于孤立森林算法的金融交易系统异常识别方法
CN114337469A (zh) * 2021-12-31 2022-04-12 中冶赛迪重庆信息技术有限公司 一种层流辊道电机故障检测方法、系统、介质及电子终端
CN114337469B (zh) * 2021-12-31 2023-11-28 中冶赛迪信息技术(重庆)有限公司 一种层流辊道电机故障检测方法、系统、介质及电子终端
CN114650167A (zh) * 2022-02-08 2022-06-21 联想(北京)有限公司 一种异常检测方法、装置、设备及计算机可读存储介质
CN114897068A (zh) * 2022-05-07 2022-08-12 国家计算机网络与信息安全管理中心 一种数据中心用铅酸电池组内异常自动识别方法
CN115550194A (zh) * 2022-12-01 2022-12-30 中国科学院合肥物质科学研究院 基于类最远采样的区块链网络传输方法及存储介质
CN115550194B (zh) * 2022-12-01 2023-04-28 中国科学院合肥物质科学研究院 基于类最远采样的区块链网络传输方法及存储介质
CN117978543A (zh) * 2024-03-28 2024-05-03 贵州华谊联盛科技有限公司 基于态势感知的网络安全预警方法及系统
CN117978543B (zh) * 2024-03-28 2024-06-04 贵州华谊联盛科技有限公司 基于态势感知的网络安全预警方法及系统
CN118378193A (zh) * 2024-06-20 2024-07-23 山东征途信息科技股份有限公司 一种基于大数据的智慧社区数据分析方法及系统

Also Published As

Publication number Publication date
CN111784528A (zh) 2020-10-16
CN111784528B (zh) 2024-07-02

Similar Documents

Publication Publication Date Title
WO2021239004A1 (fr) Procédé et appareil de détection de communautés anormales, dispositif informatique et support de stockage
WO2021174944A1 (fr) Procédé de distribution sélective de message basé sur l'activité de cible et dispositif associé
WO2022126963A1 (fr) Procédé de profilage de client basé sur un corpus de réponse client, et dispositif associé
WO2022126970A1 (fr) Procédé et dispositif d'identification de risques de fraude financière, dispositif informatique et support de stockage
WO2022095352A1 (fr) Procédé et appareil d'identification d'utilisateur anormal basés sur une décision intelligente, et dispositif informatique
US11727053B2 (en) Entity recognition from an image
WO2021143267A1 (fr) Procédé de traitement de modèle de classification à grain fin basé sur la détection d'image, et dispositifs associés
CN111612041B (zh) 异常用户识别方法及装置、存储介质、电子设备
WO2020207167A1 (fr) Procédé, appareil et dispositif de classification de texte et support de stockage lisible par ordinateur
US20200117686A1 (en) Determining identity in an image that has multiple people
CN106844407B (zh) 基于数据集相关性的标签网络产生方法和系统
CN113127633B (zh) 智能会议管理方法、装置、计算机设备及存储介质
WO2022142001A1 (fr) Procédé d'évaluation d'objet cible sur la base d'une fusion de multiples cartes de score, et dispositif associé
US8121967B2 (en) Structural data classification
WO2022156084A1 (fr) Procédé pour prédire le comportement d'un objet cible sur la base d'un visage et d'un texte interactif, et dispositif associé
WO2022105119A1 (fr) Procédé de génération de corpus d'apprentissage pour un modèle de reconnaissance d'intention, et dispositif associé
WO2021175021A1 (fr) Procédé et appareil de poussée de produit, dispositif informatique et support d'enregistrement
WO2021217933A1 (fr) Procédé et appareil de division en communautés pour réseau homogène, et dispositif informatique et support de stockage
WO2021003803A1 (fr) Procédé et appareil de traitement de données, support de stockage et dispositif électronique
CN113762703A (zh) 确定企业画像的方法和装置、计算设备和存储介质
CN112668482A (zh) 人脸识别训练方法、装置、计算机设备及存储介质
CN114926282A (zh) 一种异常交易的识别方法、装置、计算机设备及存储介质
CN115619245A (zh) 一种基于数据降维方法的画像构建和分类方法及系统
Wang et al. An unsupervised strategy for defending against multifarious reputation attacks
CN114124460A (zh) 工控系统入侵检测方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21812814

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 24/01/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21812814

Country of ref document: EP

Kind code of ref document: A1