WO2021254027A1 - Method and apparatus for identifying suspicious community, and storage medium and computer device - Google Patents

Method and apparatus for identifying suspicious community, and storage medium and computer device

Info

Publication number
WO2021254027A1
WO2021254027A1 PCT/CN2021/092940 CN2021092940W WO2021254027A1 WO 2021254027 A1 WO2021254027 A1 WO 2021254027A1 CN 2021092940 W CN2021092940 W CN 2021092940W WO 2021254027 A1 WO2021254027 A1 WO 2021254027A1
Authority
WO
WIPO (PCT)
Prior art keywords
merchant
community
nodes
node
merchant node
Prior art date
Application number
PCT/CN2021/092940
Other languages
French (fr)
Chinese (zh)
Inventor
陈泽瀛
吴亚乾
吴锐
李欣刚
陶森林
Original Assignee
银联商务股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 银联商务股份有限公司
Publication of WO2021254027A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • This application relates to the technical field of knowledge graphs, in particular to a method, device, storage medium and computer equipment for identifying suspicious communities.
  • The process by which new merchants enter the network consists mainly of manually registering the merchant's network access information and then manually verifying whether that information is true; if it is true, the entry is approved. Because the network access information submitted by these merchants is itself legal, traditional risk control rules and manual verification can hardly provide timely and effective early warning of network-entry fraud by the above-mentioned suspicious organizations, so suspicious organizations are identified inefficiently.
  • This application provides a method, device, storage medium and computer equipment for identifying suspicious communities, which can improve the efficiency of identifying suspicious communities.
  • an embodiment of the present application provides a method for identifying suspicious communities, including:
  • constructing a knowledge graph based on the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes;
  • screening out, in the knowledge graph, the newly added merchant node and the neighboring merchant nodes of the newly added merchant node;
  • performing community detection on the newly added merchant node and its neighboring merchant nodes according to the association weight data between them, and determining multiple community groups;
  • determining a suspicious community from the multiple community groups according to multiple preset community indicators and multiple preset business indicators.
  • before the knowledge graph is constructed based on the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes, the method further includes:
  • setting different weights for different associated elements, and determining the association weight data between different merchant nodes according to the associated elements between multiple merchant nodes and the weight corresponding to each associated element, where the association weight data includes the sum of the weights corresponding to the associated elements between different merchant nodes.
  • the network access information includes network access time
  • the filtering out the newly added merchant node and the neighboring merchant nodes of the newly added merchant node includes:
  • the number of associated steps between each newly added merchant node and the historical merchant node is calculated, and the historical merchant node whose associated step number is less than the preset number of steps is determined as the neighboring merchant node.
  • performing community detection on the newly added merchant node and its neighboring merchant nodes according to the association weight data between them, and determining multiple community groups, includes:
  • multiple community groups are determined.
  • the multiple preset community indicators include the number of nodes in the community, the number of black merchant nodes in the community, the number of edges in the community, the number of edges outside the community, the degree of aggregation of the community, the weight distribution of the edges in the community, and the importance of the merchant nodes or the maximum weight of the connected edges of the merchant nodes.
  • the multiple preset business indicators include the proportion of merchant nodes in an abnormal state, the number of revoked merchant nodes, or the number of verified merchant nodes.
  • the method further includes:
  • an embodiment of the present application provides a suspicious community identification device, the device includes:
  • the building module is used to construct a knowledge graph based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes;
  • the screening module is used for screening the newly-added merchant node and the neighboring merchant nodes of the newly-added merchant node in the knowledge graph;
  • the generating module is used to perform community detection on the newly-added merchant node and the neighboring merchant nodes of the newly-added merchant node according to the association weight data between the newly-added merchant node and the neighboring merchant nodes of the newly-added merchant node, Identify multiple community groups;
  • the determining module is used to determine the suspicious community from the multiple community groups based on multiple preset community indicators and multiple preset business indicators.
  • an embodiment of the present application provides a storage medium that includes a stored program, wherein when the program is running, the device where the storage medium is located is controlled to execute the above-mentioned method for identifying a suspicious community.
  • an embodiment of the present application provides a computer device including a memory and a processor, where the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the program instructions, when loaded and executed by the processor, implement the steps of the above method for identifying suspicious communities.
  • a knowledge graph is constructed based on the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes; in the knowledge graph, the newly added merchant nodes and their neighboring merchant nodes are screened out; community detection is performed on the newly added merchant nodes and their neighboring merchant nodes according to the association weight data between them, and multiple community groups are determined; and suspicious communities are determined from the multiple community groups according to multiple preset community indicators and multiple preset business indicators, so that the efficiency of identifying suspicious communities can be improved.
  • FIG. 1 is a flowchart of a method for identifying a suspicious community provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a method for identifying suspicious communities provided by another embodiment of the present application.
  • FIG. 3 is a schematic diagram of the structure of merchant nodes and associated elements provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a knowledge graph provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a newly added merchant node and neighboring merchant nodes provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of a community group provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a suspicious community identification device provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • Existing methods for identifying suspicious communities mainly include manual verification at network entry and risk control detection after network entry.
  • Manual verification at network entry works as follows: for a merchant newly entering the network, business personnel collect the merchant's network access information, which includes business license information, legal person identity information, settlement account number, region, bank account name, network access time and other information. The business personnel then verify whether the information provided by the merchant is true and whether the merchant has bad or illegal records in the industry and commerce or public security systems, assess the risk of the merchant's application on that basis, and decide whether to allow the merchant to enter the network.
  • This method relies heavily on the experience-based judgment and verification of business personnel, so it is inefficient and cannot detect risk across multi-dimensional information.
  • Risk control detection after network entry works as follows: after the merchant enters the network, whether the merchant has engaged in suspicious improper transactions is determined from the transaction behavior occurring at the merchant, and the merchant is thereby judged to be suspicious or not.
  • The main shortcomings of risk control detection after network entry are high latency, because it depends heavily on later transaction behavior, and, in large-volume and multi-dimensional scenarios, complex calculation, limited scenario coverage and high time cost.
  • To solve these problems in the related art, the embodiments of the present application provide a method for identifying suspicious organizations.
  • Fig. 1 is a flowchart of a method for identifying a suspicious community provided by an embodiment of the application. As shown in Fig. 1, the method includes:
  • Step 101 Construct a knowledge graph based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes.
  • the network access information may include business license information, legal person identity information, settlement account number, region, bank account name, network access time and other information.
  • the merchant node is used to indicate the merchant entity. For example, if the network access information of the merchant A is obtained, the merchant A is used as the merchant node.
  • an associated element is a piece of network access information that is the same across multiple merchant nodes. For example, if merchant node A and merchant node B have the same legal person identity information, the legal person identity information is an associated element between merchant node A and merchant node B.
  • the associated weight data includes the sum of weights corresponding to the associated elements between different merchant nodes.
  • Step 102 In the knowledge graph, filter out the newly added merchant node and the neighboring merchant nodes of the newly added merchant node.
  • the merchant node whose network access time is within the preset time period is determined as the newly added merchant node.
  • the number of associated steps between each newly added merchant node and the historical merchant node is calculated, and the historical merchant node whose associated step number is less than the preset number of steps is determined as the neighboring merchant node.
  • Step 103 Perform community detection on the newly-added merchant node and neighboring merchant nodes of the newly-added merchant node according to the correlation weight data between the newly-added merchant node and the neighboring merchant nodes of the newly-added merchant node, and determine multiple community groups.
  • the associated elements between each newly added merchant node and each neighboring merchant node are obtained, and multiple community groups are determined according to the number of associated elements of each newly added merchant node and the weights corresponding to those associated elements, and the number of associated elements of each neighboring merchant node and the weights corresponding to those associated elements.
  • Step 104 Determine a suspicious community from multiple community groups according to multiple preset community indicators and multiple preset business indicators.
  • the preset multiple community indicators include the number of nodes in the community, the number of black merchant nodes in the community, the number of edges in the community, the number of edges outside the community, the degree of community aggregation, the weight distribution of the edges in the community, and the importance of merchant nodes or the maximum weight of the connected edges of the merchant nodes.
  • the preset multiple business indicators include the proportion of merchant nodes in an abnormal state, the number of cancelled merchant nodes, or the number of verified merchant nodes.
  • a knowledge graph is constructed based on the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes; in the knowledge graph, the newly added merchant nodes and their neighboring merchant nodes are screened out; community detection is performed on the newly added merchant nodes and their neighboring merchant nodes according to the association weight data between them, and multiple community groups are determined; and suspicious communities are determined from the multiple community groups according to multiple preset community indicators and multiple preset business indicators, thereby improving the efficiency of identifying suspicious communities.
  • Fig. 2 is a flowchart of a method for identifying a suspicious community provided by another embodiment of this application. As shown in Fig. 2, the method includes:
  • Step 201 Determine multiple merchant nodes and related elements between multiple merchant nodes according to the obtained network access information of multiple merchants.
  • the network access information may include business license information, legal person identity information, settlement account number, region, bank account name, network access time and other information.
  • the merchant node is used to indicate the merchant entity. For example, if the network access information of the merchant A is obtained, the merchant A is used as the merchant node.
  • an associated element is a piece of network access information that is the same across multiple merchant nodes. For example, if merchant node A and merchant node B have the same legal person identity information, the legal person identity information is an associated element between merchant node A and merchant node B.
  • the network access information of multiple merchants is processed in batches by a big data component, and the merchant nodes and the associated elements between multiple merchant nodes are extracted from the network access information, so that the relationships between merchant nodes are established quickly and a knowledge graph can be built in the subsequent steps.
  • the big data component can include Hive+Hadoop components. For example, as shown in Figure 3, merchant node 0, merchant node 1, merchant node 2, merchant node 3, merchant node 4, merchant node 5 and merchant node 6 are extracted from the network access information. The associated element between merchant node 0 and merchant node 1 is the business address; the associated element between merchant node 4 and merchant nodes 1 and 2 is legal person identity information; the associated elements between merchant node 4 and merchant node 3 are legal person identity information and business license information; and the associated element among merchant node 4, merchant node 5 and merchant node 6 is the settlement account number.
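  • The extraction of associated elements described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the field names, the sample records and the helper `shared_elements` are all hypothetical, and a plain in-memory index stands in for the Hive+Hadoop batch processing mentioned in the text.

```python
from collections import defaultdict
from itertools import combinations


def shared_elements(records):
    """Map each merchant pair to the set of field names on which both agree."""
    # Inverted index: (field, value) -> merchants that submitted that value.
    index = defaultdict(set)
    for merchant, fields in records.items():
        for field, value in fields.items():
            if value:  # skip empty values so blanks never link merchants
                index[(field, value)].add(merchant)

    pairs = defaultdict(set)
    for (field, _value), members in index.items():
        for a, b in combinations(sorted(members), 2):
            pairs[(a, b)].add(field)
    return dict(pairs)


# Hypothetical network access records (illustrative values only).
records = {
    "A": {"legal_person_id": "P1", "settlement_account": "ACC-1", "business_license": "L1"},
    "B": {"legal_person_id": "P1", "settlement_account": "ACC-2", "business_license": "L1"},
    "C": {"legal_person_id": "P2", "settlement_account": "ACC-2", "business_license": "L3"},
    "D": {"legal_person_id": "P3", "settlement_account": "", "business_license": "L4"},
}
links = shared_elements(records)
for pair, fields in sorted(links.items()):
    print(pair, sorted(fields))
```

  • Indexing by (field, value) avoids comparing every merchant pair directly; only merchants that actually share a value are paired.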
  • step 201 also includes performing data standardization processing and abnormal data filtering processing on the obtained network access information of multiple merchants. Because the network access information comes from many sources, it includes both structured and unstructured information; all of it therefore needs to be standardized, that is, merged and converted into a standard format. In addition, because the original network access information contains input errors, data type conversion anomalies and the like, abnormal data in the merchants' network access information also needs to be filtered out.
  • the filtering processing can include null value processing, special symbol processing, etc.
  • filtering abnormal data out of the network access information prevents abnormal association relationships from arising between merchants.
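  • A minimal sketch of null value processing and special symbol processing is shown below. The concrete rules (trimming whitespace, dropping punctuation other than hyphens, upper-casing) and the helper names `clean_value` and `clean_record` are assumptions made for illustration; the source does not specify them.

```python
import re


def clean_value(value):
    """Return a normalized string, or None when the value is unusable."""
    if value is None:
        return None
    text = str(value).strip()
    # Assumed rule: drop anything that is not a word character, whitespace or hyphen.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace and use one letter case for comparison.
    text = re.sub(r"\s+", " ", text).strip().upper()
    return text or None


def clean_record(record):
    """Normalize every field and drop fields that end up empty."""
    cleaned = {field: clean_value(value) for field, value in record.items()}
    return {field: value for field, value in cleaned.items() if value is not None}


# Hypothetical raw record with a stray symbol, a null and a symbol-only value.
raw = {"settlement_account": " acc#-001 ", "region": None, "bank_account_name": "##"}
print(clean_record(raw))
```

  • Dropping empty fields instead of keeping placeholders means two merchants with missing data can never be linked through a shared blank.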
  • Step 202 Set different weights for different associated elements, and determine the associated weight data between different merchant nodes according to the associated elements between multiple merchant nodes and the corresponding weight of each associated element, and the associated weight data includes The sum of the weights corresponding to the related elements between different merchant nodes.
  • Before step 202 is performed, it should be noted that the method for identifying suspicious communities provided by this application can be applied to systems that cannot handle heterogeneous graphs (graphs with different types of nodes). Therefore, before the knowledge graph (a homogeneous graph, in which all nodes are of the same type) is constructed, the different associated elements need to be aggregated to convert the heterogeneous graph into a homogeneous graph; for the specific conversion method, refer to the description of step 203 below. In the embodiments of this application, because different associated elements differ in importance, different weights are assigned to them when they are aggregated, so that the conversion from a heterogeneous graph to a homogeneous graph can be completed. In addition, assigning different weights to different associated elements further improves the accuracy of identifying suspicious communities.
  • Subsequent step 203 converts the heterogeneous graph of FIG. 3 into the homogeneous graph of FIG. 4. For example, as shown in FIG. 4, the weight of the business address between merchant node 0 and merchant node 1 is w4, so the association weight data between merchant node 0 and merchant node 1 is w4. The weight of the legal person identity information between merchant node 4 and merchant node 3 is w1 and the weight of the business license information is w2, so the association weight data between merchant node 4 and merchant node 3 is w1+w2. For the process of converting the heterogeneous graph into the homogeneous graph, refer to the description of step 203 below.
  • Step 203 Construct a knowledge graph based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes.
  • the heterogeneous graph of FIG. 3 includes multiple types of nodes, including merchant nodes, business address nodes, business license nodes, legal person identity information nodes, and settlement account nodes.
  • a knowledge graph (homogeneous graph) needs to be constructed. A homogeneous graph shows the relationships between merchant nodes more intuitively. At the same time, because different associated elements differ in importance, they are given different weights when they are aggregated. Therefore, in converting the heterogeneous graph (different types of nodes) into a homogeneous graph (same type of node), the different associated elements between two merchant nodes are aggregated into a single homogeneous edge, and their weights are summed to determine the association weight data between the two merchant nodes. Specifically, as shown in FIG. 3, the weight of the business address between merchant node 0 and merchant node 1 is w4, so, as shown in FIG. 4, the association weight data between merchant node 0 and merchant node 1 is w4. The weight of the legal person identity information between merchant node 4 and merchant node 3 is w1 and the weight of the business license information is w2, so, as shown in FIG. 4, the association weight data between merchant node 4 and merchant node 3 is w1+w2. The other types of nodes are converted in the same way, so that FIG. 4 contains only merchant nodes, which completes the conversion from a heterogeneous graph to a homogeneous graph.
  • FIG. 4 includes 7 merchant nodes. Each merchant node has association weight data with the historical merchant nodes it is associated with, and the association weight data includes the sum of the weights corresponding to the associated elements between the merchant node and the historical merchant node.
  • the knowledge graph of this application is the converted homogeneous graph.
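  • The conversion from a heterogeneous graph to a homogeneous graph can be sketched as a projection: merchants attached to the same element node are joined by one merchant-to-merchant edge whose weight is the sum of the weights of all shared elements. The numeric weights, identifiers and the function `project_to_merchant_graph` below are hypothetical; the source names the weights only symbolically (w1, w2, w4).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights per associated element type.
ELEMENT_WEIGHTS = {
    "legal_person_id": 3.0,     # stands in for w1
    "business_license": 2.0,    # stands in for w2
    "business_address": 1.0,    # stands in for w4
}


def project_to_merchant_graph(element_edges, weights):
    """Collapse (merchant, element_type, element_value) triples into weighted merchant edges."""
    members = defaultdict(set)
    for merchant, element_type, element_value in element_edges:
        members[(element_type, element_value)].add(merchant)

    graph = defaultdict(float)
    for (element_type, _value), merchants in members.items():
        for a, b in combinations(sorted(merchants), 2):
            graph[(a, b)] += weights[element_type]  # sum over all shared elements
    return dict(graph)


# Heterogeneous edges mirroring two of the relationships described for FIG. 3.
element_edges = [
    (0, "business_address", "addr-A"),
    (1, "business_address", "addr-A"),
    (3, "legal_person_id", "P-9"),
    (4, "legal_person_id", "P-9"),
    (3, "business_license", "BL-7"),
    (4, "business_license", "BL-7"),
]
graph = project_to_merchant_graph(element_edges, ELEMENT_WEIGHTS)
print(graph)
```

  • With these assumed weights, the edge between merchant nodes 0 and 1 carries the business address weight only, while the edge between merchant nodes 3 and 4 carries the sum of the legal person and business license weights, matching the w4 and w1+w2 pattern in the text.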
  • the graph database that can be used in the process of constructing the knowledge graph in the embodiment of the present application includes Neo4j, and the graph database can facilitate data query and data modification.
  • Step 204 In the knowledge graph, screen out the newly added merchant node and the neighboring merchant nodes of the newly added merchant node.
  • before step 204 is performed, the method also includes: when new merchant nodes enter the network, adding the network access information of the multiple newly added merchant nodes and the association weight data between the multiple newly added merchant nodes and other merchant nodes to the existing knowledge graph.
  • the association weight data between multiple newly added merchant nodes and other merchant nodes may include the association weight data among the newly added merchant nodes and the association weight data between the newly added merchant nodes and historical merchant nodes. Because data accumulates continuously, identifying suspicious communities on the complete knowledge graph would take an extremely long time and place relatively high resource demands on a single-node server.
  • the embodiment of the present application therefore performs step 204 before identifying suspicious communities, selecting from the knowledge graph only the newly added merchant nodes and their neighboring merchant nodes, which greatly reduces the amount of data and shortens the response time.
  • step 204 may specifically include:
  • Step 2041 a merchant node whose network access time is within a preset time period is determined as a new merchant node.
  • the network access information of the merchant node includes the network access time.
  • the preset time period can be set as needed. For example, taking 1 month as the preset time period, if the merchant nodes that entered the network in or before May are already in the knowledge graph, the merchant nodes that enter the network in June are determined to be newly added merchant nodes.
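  • Step 2041 amounts to a date-range filter on the network access time. The sketch below assumes one calendar month as the preset time period; the merchant identifiers, the dates and the helper `new_merchants` are hypothetical.

```python
from datetime import date


def new_merchants(access_dates, period_start, period_end):
    """Return merchants whose network access date lies in [period_start, period_end]."""
    return sorted(m for m, d in access_dates.items() if period_start <= d <= period_end)


# Hypothetical network access dates.
access_dates = {
    "m5": date(2021, 4, 18),
    "m6": date(2021, 5, 30),
    "m7": date(2021, 6, 2),
    "m8": date(2021, 6, 21),
}
newly_added = new_merchants(access_dates, date(2021, 6, 1), date(2021, 6, 30))
print(newly_added)
```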
  • Step 2042 calculate the number of associated steps between each newly added merchant node and the historical merchant node, and determine the historical merchant node whose associated step number is less than the preset number of steps as neighboring merchant nodes.
  • the number of associated steps indicates the number of edges between two merchant nodes in the knowledge graph, and the preset number of steps can be set according to requirements; for example, the preset number of steps is 3.
  • For example, merchant node 7 and merchant node 8 are newly added merchant nodes. The number of associated steps between merchant node 7 and merchant node 5 is 1; between merchant node 7 and merchant node 6, 2; between merchant node 7 and merchant node 4, 2; and between merchant node 7 and each of merchant nodes 3, 2 and 1, 3. Merchant nodes 1-6 are therefore the neighboring merchant nodes of merchant node 7.
  • when there are multiple newly added merchant nodes, a historical merchant node is determined to be a neighboring merchant node as long as the number of associated steps between it and any one of the newly added merchant nodes is less than the preset number of steps.
  • for example, merchant node 7 and merchant node 8 in Figure 5 are newly added merchant nodes, and merchant nodes 1-6 are all neighboring merchant nodes of merchant node 7, so merchant nodes 1-6 are also treated as neighboring merchant nodes of merchant node 8.
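  • Step 2042 can be sketched as a multi-source breadth-first search that starts from all newly added merchant nodes at once, so each historical node receives its smallest step count to any new node. The adjacency list below is hypothetical and is not a reproduction of Figure 5. The sketch follows the literal wording "less than the preset number of steps"; because the worked example in the text counts nodes 3 steps away as neighbors under a 3-step preset, an inclusive comparison may be what is intended, in which case the cut-off would be relaxed by one.

```python
from collections import deque


def neighbors_within(adjacency, new_nodes, max_steps):
    """Historical nodes whose hop distance to any new node is strictly less than max_steps."""
    new_set = set(new_nodes)
    dist = {node: 0 for node in new_set}
    queue = deque(new_set)
    while queue:
        node = queue.popleft()
        if dist[node] + 1 >= max_steps:
            continue  # the next hop would no longer be strictly below the limit
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {node: d for node, d in dist.items() if node not in new_set}


# Hypothetical knowledge graph; nodes 7 and 8 are the newly added merchants.
adjacency = {7: [5], 5: [7, 6], 6: [5, 4], 4: [6, 3, 8], 3: [4], 8: [4]}
neighbors = neighbors_within(adjacency, [7, 8], max_steps=3)
print(sorted(neighbors.items()))
```

  • Running one search from all new nodes together avoids repeating the traversal per new merchant, which matters when many merchants enter the network in the same period.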
  • in step 204, screening out the newly added merchant nodes and their neighboring merchant nodes in the knowledge graph greatly reduces the time cost of computing suspicious communities. Only the network access information of merchant nodes newly added within a short period needs to be processed periodically, so latency is relatively low and risk warnings are relatively timely.
  • Step 205 Obtain the associated elements between each newly added merchant node and each neighboring merchant node.
  • step 205 differs from step 201 as follows: step 201 determines multiple merchant nodes and the associated elements between them from the obtained network access information of multiple merchants, whereas step 205 determines multiple newly added merchant nodes, and the associated elements between each newly added merchant node and each neighboring merchant node, from the obtained network access information of the newly added merchants.
  • Step 206 Determine a plurality of community groups according to the number of associated elements of each newly added merchant node and the corresponding weight of the associated elements, and the number of associated elements of each neighboring merchant node and the corresponding weight of the associated elements.
  • the process of determining multiple community groups can be achieved by using a label propagation algorithm (Label Propagation Algorithm, LPA for short). LPA detects community groups from the network structure alone, so no predefined objective function or prior information is needed. Merchant nodes that are more densely connected and that have larger association weights between them are assigned to the same community group. Merchant nodes known to belong to the same community group can also be pre-marked with the same community label, so that LPA runs in a semi-supervised manner and its accuracy improves.
  • the merchant node 1, the merchant node 2, the merchant node 3, and the merchant node 4 are the same community group, that is, the community 1.
  • Merchant node 5, merchant node 6, merchant node 7, and merchant node 8 are the same community group, namely community 2.
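  • A weighted label propagation pass can be sketched as below. This is a simplified, deterministic variant written for illustration: common LPA implementations visit nodes in random order and break ties randomly, whereas this sketch uses a fixed order and the smallest label on ties so that the result is reproducible. Pre-marked (semi-supervised) labels are not modelled, and the edge weights are hypothetical.

```python
from collections import defaultdict


def label_propagation(edges, max_iter=20):
    """Weighted label propagation; edges maps (u, v) -> association weight."""
    adjacency = defaultdict(dict)
    for (u, v), w in edges.items():
        adjacency[u][v] = w
        adjacency[v][u] = w

    labels = {node: node for node in adjacency}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):  # fixed order keeps the sketch deterministic
            votes = defaultdict(float)
            for nbr, w in adjacency[node].items():
                votes[labels[nbr]] += w
            # Heaviest label wins; ties go to the smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Two tightly linked groups joined by one weak edge (hypothetical weights).
edges = {
    (1, 2): 3.0, (1, 3): 3.0, (2, 3): 3.0, (2, 4): 3.0, (3, 4): 3.0,
    (5, 6): 3.0, (5, 7): 3.0, (6, 7): 3.0, (6, 8): 3.0, (7, 8): 3.0,
    (4, 5): 1.0,
}
labels = label_propagation(edges)
print(labels)
```

  • On this input, merchant nodes 1-4 end up sharing one label and merchant nodes 5-8 another, because the weak bridging edge is outvoted by the heavier edges inside each group.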
  • Step 207 Determine the suspicious community from the multiple community groups according to multiple preset community indicators and multiple preset business indicators.
  • the preset multiple community indicators include the number of nodes in the community, the number of black merchant nodes in the community, the number of edges in the community, the number of edges outside the community, the degree of aggregation of the community, the weight distribution of the edges in the community, and the importance of the merchant nodes or the maximum weight of the connected edges of the merchant nodes.
  • the community indicators are used to measure the degree of community aggregation of a community group. The number of nodes in the community indicates the number of merchant nodes in each community group. The number of black merchant nodes in the community indicates the number of merchant nodes in each community group that have triggered risk cases or are black merchants. The number of edges in the community indicates the number of association edges between merchant nodes within the community group. The number of edges outside the community indicates the number of edges connecting the community group to external community groups. The degree of community aggregation indicates how tightly the community is aggregated, that is, the ratio of the number of edges in the community to the theoretical maximum number of edges.
  • the maximum theoretical number of edges in the community group is calculated from the number of nodes in the community; that is, for a community with n nodes, the maximum theoretical number of two-way edges is n(n-1), and the maximum theoretical number of one-way edges is n(n-1)/2.
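The edge counts and the degree of community aggregation described above could be computed along the following lines. This is a sketch under stated assumptions: the function name `community_edge_indicators` is hypothetical, the graph is treated as undirected with each node pair listed once, and the theoretical maximum is therefore taken to be n(n-1)/2.

```python
def community_edge_indicators(edges, members):
    """Count edges inside and leaving a community, and compute its aggregation degree.

    edges: iterable of (u, v) undirected node pairs, each pair listed once.
    members: set of nodes belonging to the community.
    """
    inside = sum(1 for u, v in edges if u in members and v in members)
    outside = sum(1 for u, v in edges if (u in members) != (v in members))
    n = len(members)
    max_edges = n * (n - 1) // 2    # assumption: each node pair is counted once
    aggregation = inside / max_edges if max_edges else 0.0
    return {"edges_inside": inside, "edges_outside": outside, "aggregation": aggregation}

# Hypothetical edge list for eight merchant nodes.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7), (7, 8)]
indicators = community_edge_indicators(edges, {1, 2, 3, 4})
```

For the assumed community {1, 2, 3, 4}, four of the six possible internal edges exist and one edge leaves the community, so the aggregation degree is 4/6.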
  • the weight distribution of the edges in the community is used to indicate how the weights of the edges within the community group are distributed across weight intervals.
  • the maximum weight of the connected edges of a merchant node is used to indicate the largest weight among all edges connected to each merchant node, that is, the largest weight among all the associated elements of each merchant node.
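The two weight-based indicators above might be computed as in the following sketch. The function names, the half-open weight intervals, and the example weights are assumptions for illustration; the source does not specify how the intervals are chosen.

```python
from collections import Counter

def weight_distribution(weighted_edges, members, bin_edges):
    """Histogram of within-community edge weights over half-open intervals [lo, hi)."""
    counts = Counter()
    for (u, v), w in weighted_edges.items():
        if u in members and v in members:
            for lo, hi in zip(bin_edges, bin_edges[1:]):
                if lo <= w < hi:
                    counts[(lo, hi)] += 1
                    break
    return dict(counts)

def max_edge_weight_per_node(weighted_edges):
    """Largest weight among all edges touching each node."""
    best = {}
    for (u, v), w in weighted_edges.items():
        for node in (u, v):
            best[node] = max(best.get(node, w), w)
    return best

# Hypothetical weighted edges; node 5 lies outside the community {1, 2, 3, 4}.
weighted_edges = {(1, 2): 3, (1, 3): 2, (2, 3): 3, (3, 4): 2, (4, 5): 1}
distribution = weight_distribution(weighted_edges, {1, 2, 3, 4}, [0, 2, 4, 6])
max_weights = max_edge_weight_per_node(weighted_edges)
```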
  • the business indicators are multiple preset indicators calculated from the perspective of business rules, including the proportion of merchant nodes in an abnormal state, the number of revoked merchant nodes, or the number of verified merchant nodes. Other parameters may also be included, which is not limited in this application.
  • step 207 filters the results of all communities according to the multiple preset community indicators and the multiple preset business indicators to screen out suspicious communities, which can improve the accuracy of identifying suspicious communities.
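The screening in step 207 can be pictured as a rule over the computed indicators. The source does not state the actual rule or thresholds, so everything below (the `CommunityStats` fields, the thresholds, and the way the conditions are combined) is a hypothetical example of how community indicators and business indicators could be used together.

```python
from dataclasses import dataclass

@dataclass
class CommunityStats:
    name: str
    node_count: int          # community indicator
    black_node_count: int    # community indicator
    aggregation: float       # community indicator
    abnormal_ratio: float    # business indicator: share of merchants in an abnormal state

def select_suspicious(communities, min_nodes=3, min_aggregation=0.5, min_abnormal_ratio=0.3):
    """Return the names of communities that pass every (assumed) screening rule."""
    suspicious = []
    for c in communities:
        if c.node_count < min_nodes:
            continue                                  # too small to be a group
        tightly_knit = c.aggregation >= min_aggregation
        risky = c.black_node_count > 0 or c.abnormal_ratio >= min_abnormal_ratio
        if tightly_knit and risky:
            suspicious.append(c.name)
    return suspicious

communities = [
    CommunityStats("community 1", 4, 1, 0.67, 0.25),
    CommunityStats("community 2", 4, 0, 0.67, 0.0),
    CommunityStats("community 3", 2, 1, 1.0, 0.5),
]
suspicious = select_suspicious(communities)
```

Under these assumed thresholds only community 1 is retained: community 2 shows no risk signal, and community 3 is too small.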
  • Step 208 Calculate the importance of each merchant node in the suspicious community through the centrality algorithm.
  • the PageRank algorithm can also be used to calculate the importance of each merchant node in the suspicious community.
  • the PageRank algorithm can measure the transfer effect of merchant nodes. A merchant node with a high PageRank value is often more important and more suspicious, so it should be checked first.
  • Step 209 Sort the importance of each merchant node from high to low, and determine the top N merchant nodes as highly suspicious merchants.
  • the value of N can be set according to requirements, which is not limited in this application.
  • the top N merchant nodes are determined as highly suspicious merchants, so that the auditor can check whether the highly suspicious merchants are fraudulent merchants.
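Steps 208 and 209 can be sketched with a small weighted PageRank followed by a top-N selection. This is an illustrative power-iteration version, assuming an undirected weighted graph in which every node has at least one edge; the damping factor, iteration count, and example weights are assumptions rather than values from the source.

```python
from collections import defaultdict

def weighted_pagerank(weighted_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected weighted graph."""
    neighbors = defaultdict(dict)
    for (u, v), w in weighted_edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w
    nodes = sorted(neighbors)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    strength = {n: sum(neighbors[n].values()) for n in nodes}   # total edge weight per node
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbor passes on its rank in proportion to the connecting edge weight.
            incoming = sum(rank[m] * w / strength[m] for m, w in neighbors[n].items())
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank

def top_n(rank, n):
    """Sort nodes by importance from high to low and keep the first n."""
    return sorted(rank, key=lambda node: (-rank[node], node))[:n]

# Hypothetical suspicious community with four merchant nodes.
weighted_edges = {(1, 2): 3, (1, 3): 2, (1, 4): 2, (2, 3): 1}
rank = weighted_pagerank(weighted_edges)
highly_suspicious = top_n(rank, 2)
```

In this assumed community, merchant node 1 carries the most edge weight and ranks first, followed by merchant node 2.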
  • This application provides a knowledge-graph-based method for detecting network access fraud by merchant groups.
  • the file information provided by the merchants when they enter the network is extracted through data standardization processing, data filtering and other processes, and different weights are set for different file information. In this way, it is compatible with file information of different dimensions and constructs a knowledge graph of business association relationships.
  • this method reduces the data volume and calculation cost by sampling the data of the neighbors of the new merchants.
  • LPA and PageRank are run on the sampled merchant network for unsupervised weighted community detection and importance calculation.
  • this method calculates community indicators and business indicators, and screens out suspicious communities (suspicious fraud groups) based on the calculated indicators, and outputs the community results and related calculation indicators.
  • a knowledge graph is constructed based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes; newly added merchant nodes and their neighboring merchant nodes are screened out in the knowledge graph; community detection is performed on the newly added merchant nodes and their neighboring merchant nodes based on the associated weight data between them to determine multiple community groups; and suspicious communities are determined from the multiple community groups according to multiple preset community indicators and multiple preset business indicators, thereby improving the efficiency of identifying suspicious communities.
  • FIG. 7 is a schematic structural diagram of a suspicious community identification device provided by an embodiment of the present application. As shown in FIG. 7, the device includes: a construction module 11, a screening module 12, a generation module 13, and a determination module 14.
  • the construction module 11 is used to construct a knowledge graph based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes.
  • the screening module 12 is used for screening the newly-added merchant node and the neighboring merchant nodes of the newly-added merchant node in the knowledge graph.
  • the generating module 13 is configured to perform community detection on the newly-added merchant node and the neighboring merchant nodes of the newly-added merchant node according to the correlation weight data between the newly-added merchant node and the neighboring merchant nodes of the newly-added merchant node, Identify multiple community groups.
  • the determining module 14 is configured to determine a suspicious community from the multiple community groups based on multiple preset community indicators and multiple preset business indicators.
  • the device further includes:
  • the determining module 14 is also used to determine multiple merchant nodes and the associated elements between the multiple merchant nodes according to the obtained network access information of multiple merchants; to set different weights for different associated elements; and to determine the associated weight data between different merchant nodes according to the associated elements between the multiple merchant nodes and the weight corresponding to each associated element, where the associated weight data includes the sum of the weights corresponding to the associated elements between different merchant nodes.
  • the network access information includes the network access time;
  • the screening module 12 of the device specifically includes: a determination sub-module 121 and a calculation sub-module 122.
  • the determining submodule 121 is configured to determine a merchant node whose network access time is within a preset time period as the newly added merchant node.
  • the calculation sub-module 122 calculates the number of associated steps between each newly added merchant node and the historical merchant node.
  • the determining submodule 121 is further configured to determine a historical merchant node whose associated number of steps is less than a preset number of steps as the neighboring merchant node.
  • the generating module 13 of the device specifically includes: an obtaining sub-module 131 and a determining sub-module 132.
  • the acquiring sub-module 131 is used to acquire the associated elements between each newly added merchant node and each neighboring merchant node.
  • the determining sub-module 132 is used to determine a plurality of community groups according to the number of associated elements of each newly added merchant node and the corresponding weight of the associated elements, and the number of associated elements of each neighboring merchant node and the corresponding weight of the associated elements.
  • the multiple preset community indicators include the number of nodes in the community, the number of black merchant nodes in the community, the number of edges in the community, the number of edges outside the community, the degree of community aggregation, the weight distribution of the edges in the community, the importance of the merchant nodes, or the maximum weight of the connected edges of a merchant node.
  • the multiple preset business indicators include the proportion of merchant nodes in an abnormal state, the number of revoked merchant nodes, or the number of verified merchant nodes.
  • the device further includes: a calculation module 15.
  • the calculation module 15 is used to calculate the importance of each merchant node in the suspicious community through a centrality algorithm.
  • the determining module 14 is also used to rank the importance of each merchant node from high to low, and determine the top N merchant nodes as highly suspicious merchants.
  • a knowledge graph is constructed based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes; newly added merchant nodes and their neighboring merchant nodes are screened out in the knowledge graph; community detection is performed on the newly added merchant nodes and their neighboring merchant nodes based on the associated weight data between them to determine multiple community groups; and suspicious communities are determined from the multiple community groups according to multiple preset community indicators and multiple preset business indicators, so that the efficiency of identifying suspicious communities can be improved.
  • the embodiment of the present application provides a storage medium, the storage medium includes a stored program, where the device where the storage medium is located is controlled to execute each step of the above-mentioned suspicious community identification method when the program is running.
  • the embodiment of the present application provides a computer device including a memory and a processor; the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the program instructions are loaded and executed by the processor to implement the steps of the above method for identifying suspicious communities.
  • for a specific description, please refer to the above-mentioned embodiment of the method for identifying suspicious communities.
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application.
  • the computer device 4 of this embodiment includes: a processor 41, a memory 42, and a computer program 43 that is stored in the memory 42 and can run on the processor 41.
  • when the computer program 43 is executed by the processor 41, the method for identifying suspicious communities in the embodiments is implemented; to avoid repetition, details are not repeated here. Alternatively, when the computer program 43 is executed by the processor 41, the functions of each module/unit in the device for identifying suspicious communities in the embodiments are realized; to avoid repetition, details are not repeated here.
  • the computer device 4 includes, but is not limited to, a processor 41 and a memory 42.
  • FIG. 8 is only an example of the computer device 4 and does not constitute a limitation on the computer device 4, which may include more or fewer components than those shown in the figure, combine certain components, or use different components.
  • the computer device 4 may also include input and output devices, network access devices, buses, and so on.
  • the so-called processor 41 may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 42 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4.
  • the memory 42 may also be an external storage device of the computer device 4, such as a plug-in hard disk equipped on the computer device 4, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (Flash Card), and so on.
  • the memory 42 may also include both an internal storage unit of the computer device 4 and an external storage device.
  • the memory 42 is used to store computer programs and other programs and data required by the computer device 4.
  • the memory 42 can also be used to temporarily store data that has been output or will be output.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) or a processor (Processor) execute part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Technology Law (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and apparatus for identifying a suspicious community, and a storage medium and a computer device. The method comprises: constructing a knowledge graph according to acquired network access information of a plurality of merchant nodes and correlation weight data between different merchant nodes (101); selecting newly added merchant nodes and adjacent merchant nodes of the newly added merchant nodes from the knowledge graph (102); according to association weight data between the newly added merchant nodes and the adjacent merchant nodes of the newly added merchant nodes, performing community detection on the newly added merchant nodes and the adjacent merchant nodes of the newly added merchant nodes, so as to determine a plurality of community groups (103); and according to a plurality of preset community indexes and a plurality of preset service indexes, determining a suspicious community from the plurality of community groups (104). Therefore, the identification efficiency of a suspicious community can be improved.

Description

Method, apparatus, storage medium and computer device for identifying suspicious communities
Technical field
This application relates to the technical field of knowledge graphs, and in particular to a method, apparatus, storage medium and computer device for identifying suspicious communities.
Background art
With the rapid development of the mobile Internet and mobile payment, financial fraud patterns are also continually diversifying. For example, in network access channels such as self-service sign-up, groups of merchants have emerged that engage in false applications, arbitrage, fraudulent card use, and gambling. Such fraud is no longer limited to individuals but is carried out by organized communities. These merchants use the same file information to submit concentrated false network access applications and to carry out concentrated short-term arbitrage. After the arbitrage and cash-out, they are very likely to become silent merchants or cancelled merchants. Because of its large scale and short duration, this kind of fraud often causes large financial losses and is difficult to detect.
In the related art, the network access process for a new merchant mainly consists of manually registering the merchant's network access materials and then manually reviewing whether the materials are true; if they are true, network access is approved. Because the network access materials submitted by these merchants are legitimate, it is difficult to give timely and effective early warning of the network access fraud of the above suspicious communities through traditional risk control rules and manual verification, which results in low efficiency in identifying suspicious communities.
Summary of the application
In view of this, this application provides a method, apparatus, storage medium and computer device for identifying suspicious communities, which can improve the efficiency of identifying suspicious communities.
In one aspect, an embodiment of the present application provides a method for identifying suspicious communities, including:
constructing a knowledge graph based on obtained network access information of multiple merchant nodes and associated weight data between different merchant nodes;
screening out, in the knowledge graph, newly added merchant nodes and neighboring merchant nodes of the newly added merchant nodes;
performing community detection on the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes according to the associated weight data between them, to determine multiple community groups;
determining a suspicious community from the multiple community groups according to multiple preset community indicators and multiple preset business indicators.
Optionally, before constructing the knowledge graph based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes, the method includes:
determining multiple merchant nodes and the associated elements between the multiple merchant nodes according to obtained network access information of multiple merchants;
setting different weights for different associated elements, and determining the associated weight data between different merchant nodes according to the associated elements between the multiple merchant nodes and the weight corresponding to each associated element, where the associated weight data includes the sum of the weights corresponding to the associated elements between different merchant nodes.
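The weight summation described above can be illustrated with a short sketch. The field names and weight values below are hypothetical (the source lists legal person identity, settlement account and region among the network access information but gives no weights), and an associated element is assumed to exist when two merchants share the same value for a field.

```python
from itertools import combinations

# Hypothetical weight per associated element type; the source does not give values.
ELEMENT_WEIGHTS = {"legal_person_id": 3.0, "settlement_account": 2.0, "region": 0.5}

def association_weights(merchants, element_weights=ELEMENT_WEIGHTS):
    """Return {(a, b): summed weight} for every merchant pair sharing at least one element."""
    result = {}
    for a, b in combinations(sorted(merchants), 2):
        total = sum(
            weight
            for field, weight in element_weights.items()
            if merchants[a].get(field) is not None
            and merchants[a].get(field) == merchants[b].get(field)
        )
        if total > 0:
            result[(a, b)] = total
    return result

merchants = {
    "m1": {"legal_person_id": "P1", "settlement_account": "A1", "region": "R1"},
    "m2": {"legal_person_id": "P1", "settlement_account": "A2", "region": "R1"},
    "m3": {"legal_person_id": "P2", "settlement_account": "A2", "region": "R2"},
}
weights = association_weights(merchants)
```

Here m1 and m2 share a legal person and a region, giving an associated weight of 3.5, while m2 and m3 share only a settlement account, giving 2.0.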
Optionally, the network access information includes a network access time;
the screening out of the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes includes:
determining a merchant node whose network access time falls within a preset time period as a newly added merchant node;
calculating the number of associated steps between each newly added merchant node and each historical merchant node, and determining a historical merchant node whose number of associated steps is less than a preset number of steps as a neighboring merchant node.
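The selection of newly added merchant nodes and their neighboring merchant nodes can be sketched as a breadth-first search. This is an assumption-laden illustration: the number of associated steps is taken to be the shortest path length in the knowledge graph, and the function name, dates, and adjacency data are invented for the example.

```python
from collections import deque
from datetime import date

def select_new_and_neighbors(access_dates, adjacency, start, end, max_steps):
    """Return (newly added nodes, historical nodes fewer than max_steps away from any of them)."""
    new_nodes = {m for m, d in access_dates.items() if start <= d <= end}
    neighbors = set()
    seen = set(new_nodes)
    queue = deque((m, 0) for m in new_nodes)    # multi-source breadth-first search
    while queue:
        node, steps = queue.popleft()
        for nb in adjacency.get(node, ()):
            if nb in seen:
                continue
            if steps + 1 < max_steps:           # "less than the preset number of steps"
                seen.add(nb)
                neighbors.add(nb)
                queue.append((nb, steps + 1))
    return new_nodes, neighbors

access_dates = {
    "a": date(2021, 5, 10),   # joined within the preset period
    "b": date(2020, 1, 1),
    "c": date(2019, 6, 1),
    "d": date(2018, 3, 1),
}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
new_nodes, neighbor_nodes = select_new_and_neighbors(
    access_dates, adjacency, date(2021, 5, 1), date(2021, 5, 31), max_steps=3
)
```

With a preset number of steps of 3, merchants b and c (one and two steps from a) are kept as neighboring merchant nodes, and d (three steps away) is left out, which is how the sampling limits the amount of data passed to community detection.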
Optionally, the performing of community detection on the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes according to the associated weight data between them, to determine multiple community groups, includes:
obtaining the associated elements between each newly added merchant node and each neighboring merchant node;
determining multiple community groups according to the number of associated elements of each newly added merchant node and the weights corresponding to those associated elements, and the number of associated elements of each neighboring merchant node and the weights corresponding to those associated elements.
Optionally, the multiple preset community indicators include the number of nodes in the community, the number of black merchant nodes in the community, the number of edges in the community, the number of edges outside the community, the degree of community aggregation, the weight distribution of the edges in the community, the importance of the merchant nodes, or the maximum weight of the connected edges of a merchant node.
Optionally, the multiple preset business indicators include the proportion of merchant nodes in an abnormal state, the number of revoked merchant nodes, or the number of verified merchant nodes.
Optionally, after determining the suspicious community from the multiple community groups according to the multiple preset community indicators and the multiple preset business indicators, the method further includes:
calculating the importance of each merchant node in the suspicious community through a centrality algorithm;
sorting the merchant nodes by importance from high to low, and determining the top N merchant nodes as highly suspicious merchants.
In another aspect, an embodiment of the present application provides an apparatus for identifying suspicious communities, the apparatus including:
a construction module, used to construct a knowledge graph based on obtained network access information of multiple merchant nodes and associated weight data between different merchant nodes;
a screening module, used to screen out, in the knowledge graph, newly added merchant nodes and neighboring merchant nodes of the newly added merchant nodes;
a generating module, used to perform community detection on the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes according to the associated weight data between them, to determine multiple community groups;
a determining module, used to determine a suspicious community from the multiple community groups according to multiple preset community indicators and multiple preset business indicators.
In another aspect, an embodiment of the present application provides a storage medium that includes a stored program, where, when the program runs, the device where the storage medium is located is controlled to execute the above method for identifying suspicious communities.
In another aspect, an embodiment of the present application provides a computer device including a memory and a processor, where the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the program instructions, when loaded and executed by the processor, implement the steps of the above method for identifying suspicious communities.
In the technical solution provided by the embodiments of the present application, a knowledge graph is constructed based on the obtained network access information of multiple merchant nodes and the associated weight data between different merchant nodes; newly added merchant nodes and their neighboring merchant nodes are screened out in the knowledge graph; community detection is performed on the newly added merchant nodes and their neighboring merchant nodes based on the associated weight data between them to determine multiple community groups; and a suspicious community is determined from the multiple community groups according to multiple preset community indicators and multiple preset business indicators, so that the efficiency of identifying suspicious communities can be improved.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for identifying suspicious communities provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for identifying suspicious communities provided by another embodiment of the present application;
FIG. 3 is a schematic structural diagram of merchant nodes and associated elements provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a knowledge graph provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of newly added merchant nodes and neighboring merchant nodes provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of community groups provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an apparatus for identifying suspicious communities provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer device provided by an embodiment of the present application.
具体实施方式detailed description
为了更好的理解本申请的技术方案,下面结合附图对本申请实施例进行详细描述。In order to better understand the technical solutions of the present application, the following describes the embodiments of the present application in detail with reference to the accompanying drawings.
应当明确,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其它实施例,都属于本申请保护的范围。It should be clear that the described embodiments are only a part of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
在本申请实施例中使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本申请。在本申请实施例和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。The terms used in the embodiments of the present application are only for the purpose of describing specific embodiments, and are not intended to limit the present application. The singular forms of "a", "the" and "the" used in the embodiments of the present application and the appended claims are also intended to include plural forms, unless the context clearly indicates other meanings.
应当理解,本文中使用的术语“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,甲和/或乙,可以表示:单独存在甲,同时存在甲和乙,单独存在乙这三种情况。另外,本文中字符“/”, 一般表示前后关联对象是一种“或”的关系。It should be understood that the term "and/or" used in this article is only an association relationship describing related objects, which means that there can be three kinds of relationships. For example, A and/or B can mean that A and B, there are three cases of B alone. In addition, the character "/" in this text generally indicates that the associated objects before and after are in an "or" relationship.
在介绍本申请实施例所提供的一种可疑社团的识别方法之前,对相关技术中的欺诈团伙识别方法进行简单介绍:Before introducing a method for identifying suspicious communities provided by the embodiments of this application, a brief introduction to the method for identifying fraudulent groups in related technologies is given:
在相关技术中,可疑社团的识别方法主要包括入网时人工校验的方式以及入网后风控侦测的方式。In related technologies, methods for identifying suspicious communities mainly include manual verification when entering the network and risk control detection after entering the network.
In one implementation, manual verification at network access includes the following: for a merchant newly accessing the network, business personnel collect the merchant's network access information, which includes business license information, legal person identity information, settlement account number, region, bank account name, network access time, and the like. After collecting the network access information, the business personnel verify in the industry-and-commerce or public security systems whether the information provided by the merchant is true and whether the merchant has any adverse or illegal records, so as to assess the risk of the applying merchant and decide accordingly whether to allow the merchant to access the network. However, this approach relies heavily on the experience-based judgment and checking of business personnel; it is inefficient and cannot detect patterns across multi-dimensional information.
In another implementation, risk-control detection after network access includes the following: after a merchant accesses the network, whether the merchant is involved in suspicious improper transactions is determined from the transaction behavior occurring at the merchant, and whether the merchant is a suspicious merchant is determined accordingly. However, the main drawbacks of this approach are high latency and heavy dependence on later transaction behavior; in addition, for large data volumes and multi-dimensional scenarios, the detection is computationally complex, covers few scenarios, and has a high time cost.
In the fraud group identification methods described above, manual verification at network access requires reviewers not only to verify the accuracy of the data but also to check the information of other merchants that share the same file information with the merchant. For community information involving large data volumes, multi-dimensional file information, and multi-degree associations, reviewers cannot practically carry out manual checks, which leads to a high time cost, and the related-art schemes cannot accurately partition community groups. Risk-control detection after network access, in turn, detects fraud after the fact and depends strongly on the transaction information occurring at the merchant, so it cannot detect group fraud at network access effectively and in time.
In view of the fraud group identification methods of the related art, the embodiments of the present application provide a method for identifying a suspicious community to solve the problems existing in the related art.
FIG. 1 is a flowchart of a method for identifying a suspicious community provided by an embodiment of the present application. As shown in FIG. 1, the method includes:
Step 101: Construct a knowledge graph according to the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes.
In the embodiments of the present application, the network access information may include business license information, legal person identity information, settlement account number, region, bank account name, network access time, and the like. A merchant node represents a merchant entity. For example, when the network access information of merchant A is obtained, merchant A is taken as a merchant node. An associated element is an item of network access information that is identical across multiple merchant nodes. For example, if merchant node A and merchant node B have the same legal person identity information, that legal person identity information is an associated element between merchant node A and merchant node B.
In the embodiments of the present application, the association weight data includes the sum of the weights corresponding to the associated elements between different merchant nodes. For details, see the description of the following embodiments.
Step 102: In the knowledge graph, select the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes.
In the embodiments of the present application, a merchant node whose network access time falls within a preset time period is determined to be a newly added merchant node. The number of association steps between each newly added merchant node and each historical merchant node is calculated, and a historical merchant node whose number of association steps is less than a preset number of steps is determined to be a neighboring merchant node.
Step 103: According to the association weight data between the newly added merchant nodes and their neighboring merchant nodes, perform community detection on the newly added merchant nodes and their neighboring merchant nodes to determine multiple community groups.
In the embodiments of the present application, the associated elements between each newly added merchant node and each neighboring merchant node are obtained, and multiple community groups are determined according to the number of associated elements of each newly added merchant node and their corresponding weights, together with the number of associated elements of each neighboring merchant node and their corresponding weights.
Step 104: According to multiple preset community indicators and multiple preset business indicators, determine the suspicious communities from among the multiple community groups.
In the embodiments of the present application, the preset community indicators include the number of nodes in the community, the number of black merchant nodes in the community, the number of intra-community edges, the number of edges leaving the community, the community aggregation degree, the intra-community edge weight distribution, the importance of a merchant node, or the maximum weight of the edges connected to a merchant node. The preset business indicators include the proportion of merchant nodes in an abnormal state, the number of cancelled merchant nodes, or the number of verified merchant nodes.
In the technical solution provided by the embodiments of the present application, a knowledge graph is constructed according to the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes. In the knowledge graph, the newly added merchant nodes and their neighboring merchant nodes are selected. Community detection is then performed on these nodes according to the association weight data between them, which yields multiple community groups. Finally, the suspicious communities are determined from among the community groups according to multiple preset community indicators and multiple preset business indicators, which improves the efficiency of identifying suspicious communities.
FIG. 2 is a flowchart of a method for identifying a suspicious community provided by another embodiment of the present application. As shown in FIG. 2, the method includes:
Step 201: According to the obtained network access information of multiple merchants, determine multiple merchant nodes and the associated elements between the merchant nodes.
In the embodiments of the present application, the network access information may include business license information, legal person identity information, settlement account number, region, bank account name, network access time, and the like. A merchant node represents a merchant entity. For example, when the network access information of merchant A is obtained, merchant A is taken as a merchant node. An associated element is an item of network access information that is identical across multiple merchant nodes. For example, if merchant node A and merchant node B have the same legal person identity information, that legal person identity information is an associated element between merchant node A and merchant node B.
In the embodiments of the present application, step 201 processes the network access information of multiple merchants in batches through big data components and extracts from it the merchant nodes and the associated elements between them, so that the relationships between merchant nodes can be established quickly for the knowledge graph construction in subsequent steps. The big data components may include Hive and Hadoop. For example, as shown in FIG. 3, merchant nodes 0, 1, 2, 3, 4, 5, and 6 are extracted from the network access information. In addition, the following associated elements are extracted: the business address between merchant node 0 and merchant node 1; the legal person identity information between merchant node 4 and each of merchant nodes 1 and 2; the legal person identity information and the business license information between merchant node 4 and merchant node 3; and the settlement account number between merchant node 4 and each of merchant nodes 5 and 6.
It should be noted that, before step 201 is performed, the method further includes data standardization and abnormal data filtering on the obtained network access information of the merchants. Because the network access information comes from many sources, it contains both structured and unstructured information; all of it therefore needs to be standardized, reconciled, and converted into a standard format. In addition, because the original network access information may contain entry errors, data type conversion anomalies, and similar problems, abnormal data in it also needs to be filtered. The filtering may include null value handling, special symbol handling, and the like. Filtering abnormal data from the network access information avoids spurious associations between merchants.
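The standardization and filtering described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the placeholder list, the set of symbols removed, the upper-casing, and the function names `normalize_value` and `clean_record` are all assumptions introduced here.

```python
import re

# Hypothetical placeholder strings treated as null values (assumption).
PLACEHOLDERS = {"", "null", "none", "n/a", "-"}


def normalize_value(value):
    """Return a cleaned string, or None when the value carries no information."""
    if value is None:
        return None
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", str(value)).strip()
    if text.lower() in PLACEHOLDERS:
        return None
    # Drop special symbols; keep word characters, spaces, dots, and hyphens.
    text = re.sub(r"[^\w\s.\-]", "", text).strip()
    # Upper-case so that "l-001" and "L-001" compare equal later on.
    return text.upper() if text else None


def clean_record(record):
    """Keep only the fields of one merchant record that survive normalization."""
    cleaned = {}
    for field, value in record.items():
        normalized = normalize_value(value)
        if normalized is not None:
            cleaned[field] = normalized
    return cleaned
```

Dropping null and placeholder values before matching matters because two merchants that both left a field empty would otherwise appear to share an associated element.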
Step 202: Set different weights for different associated elements, and determine the association weight data between different merchant nodes according to the associated elements between the merchant nodes and the weight corresponding to each associated element, where the association weight data includes the sum of the weights corresponding to the associated elements between different merchant nodes.
Before step 202 is performed, it should be noted that the method for identifying a suspicious community provided by the present application is applicable to systems that cannot process heterogeneous graphs (graphs containing different types of nodes). Therefore, before the knowledge graph (a homogeneous graph) is constructed, the different associated elements need to be aggregated so as to convert the heterogeneous graph into a homogeneous graph (a graph containing a single type of node); for the specific conversion, see the description of step 203 below. In the embodiments of the present application, because different associated elements differ in importance, different weights are assigned to them when they are aggregated, so that the conversion from the heterogeneous graph to the homogeneous graph can be completed. Assigning different weights to different associated elements also further improves the accuracy of suspicious community identification.
In the embodiments of the present application, different associated elements differ in importance. For example, business license information is more important than legal person identity information, so when the weights of the associated elements are set, the weight of business license information is greater than that of legal person identity information. Specifically, step 202 is performed so that the subsequent step 203 can convert the heterogeneous graph of FIG. 3 into the homogeneous graph of FIG. 4. As shown in FIG. 4, for example, the weight of the business address between merchant node 0 and merchant node 1 is w4, so the association weight data between merchant node 0 and merchant node 1 is w4. Between merchant node 4 and merchant node 3, the weight of the legal person identity information is w1 and the weight of the business license information is w2, so the association weight data between merchant node 4 and merchant node 3 is w1+w2. For the conversion from the heterogeneous graph to the homogeneous graph, see the description of step 203 below.
Step 203: Construct a knowledge graph according to the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes.
In the embodiments of the present application, before the construction of the knowledge graph (a homogeneous graph) is described, the conversion from a heterogeneous graph (different types of nodes) to a homogeneous graph (a single type of node) is briefly introduced:
For example, as shown in FIG. 3, merchant nodes 0, 1, 2, 3, 4, 5, and 6, business address 1, business license 1, legal person identity information 1, and settlement account number 1 are extracted from the network access information. That is, the heterogeneous graph of FIG. 3 contains multiple types of nodes: merchant nodes, business address nodes, business license nodes, legal person identity information nodes, and settlement account nodes.
A knowledge graph (a homogeneous graph) needs to be constructed for the subsequent identification of suspicious communities, because a homogeneous graph shows the relationships between merchant nodes more intuitively. Meanwhile, because different associated elements differ in importance, they are given different weights when they are aggregated. Therefore, in converting the heterogeneous graph (different types of nodes) into the homogeneous graph (a single type of node), the different associated elements between two merchant nodes are aggregated into one homogeneous edge, and the weights of those associated elements are summed to give the association weight data between the two merchant nodes. Specifically, as shown in FIG. 3, the weight of the business address between merchant node 0 and merchant node 1 is w4, so, as shown in FIG. 4, the association weight data between merchant node 0 and merchant node 1 is w4. As shown in FIG. 3, between merchant node 4 and merchant node 3 the weight of the legal person identity information is w1 and the weight of the business license information is w2, so, as shown in FIG. 4, the association weight data between merchant node 4 and merchant node 3 is w1+w2. The other node types are converted in the same way, so that FIG. 4 contains only merchant nodes, which completes the conversion from the heterogeneous graph to the homogeneous graph. As shown in FIG. 4, the graph includes 7 merchant nodes, and each merchant node has association weight data with the historical merchant nodes it is associated with; this association weight data includes the sum of the weights corresponding to the associated elements between the merchant node and the historical merchant node. It should be noted that the knowledge graph of the present application is the homogeneous graph obtained by this conversion.
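The aggregation of associated elements into weighted homogeneous edges can be sketched as below, assuming each merchant is given as a dictionary of already normalized network access fields. The field names, the numeric weights in `ELEMENT_WEIGHTS`, and the function name are hypothetical; the text only requires that a more important element, such as the business license, carries a larger weight.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights per associated-element type (assumption).
ELEMENT_WEIGHTS = {
    "legal_person": 1.0,
    "business_license": 2.0,
    "settlement_account": 1.5,
    "business_address": 0.5,
}


def build_homogeneous_edges(merchants, weights=ELEMENT_WEIGHTS):
    """Map each merchant pair to the summed weight of the elements they share."""
    # Heterogeneous view: each (element type, value) node lists its merchants.
    holders = defaultdict(set)
    for merchant_id, info in merchants.items():
        for element_type, value in info.items():
            if element_type in weights and value is not None:
                holders[(element_type, value)].add(merchant_id)

    # Homogeneous view: collapse every shared element into a merchant-merchant
    # edge and add its weight to that edge's association weight.
    edges = defaultdict(float)
    for (element_type, _value), ids in holders.items():
        for a, b in combinations(sorted(ids), 2):
            edges[(a, b)] += weights[element_type]
    return dict(edges)
```

With this sketch, two merchants that share both a legal person and a business license end up with a single edge whose weight is the sum of the two element weights, mirroring the w1+w2 case between merchant node 4 and merchant node 3.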
It should be noted that, in an optional solution, if the association weight data between certain merchant nodes is lower than a preset value before the knowledge graph is constructed, the knowledge graph need not be constructed from those merchant nodes. This shortens the time needed to construct the knowledge graph and reduces the amount of computation in the subsequent identification of suspicious communities.
It should be noted that, in constructing the knowledge graph, the merchant nodes and the association weight data between different merchant nodes may be imported into a graph database, which converts the merchant nodes into vertices and the association weight data between different merchant nodes into edges, thereby completing the construction of the knowledge graph. The graph databases that may be used for this purpose in the embodiments of the present application include Neo4j, which facilitates data query and data modification.
Step 204: In the knowledge graph, select the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes.
In the embodiments of the present application, before step 204 is performed, the method further includes: when newly added merchant nodes access the network, adding to the existing knowledge graph the network access information of the newly added merchant nodes and the association weight data between the newly added merchant nodes and other merchant nodes. This association weight data may include the association weight data among the newly added merchant nodes and the association weight data between the newly added merchant nodes and the historical merchant nodes. As data accumulates, identifying suspicious communities over the complete knowledge graph not only takes a very long time but also places high demands on the resources of a single-node server. Moreover, historical merchant nodes that are not adjacent to the newly added merchant nodes are only distantly and weakly related to them. Therefore, before identifying suspicious communities, the embodiments of the present application perform step 204 to select, in the knowledge graph, the newly added merchant nodes and their neighboring merchant nodes, which greatly reduces the data volume and the response time.
In the embodiments of the present application, step 204 may specifically include:
Step 2041: Determine a merchant node whose network access time falls within a preset time period to be a newly added merchant node.
In the embodiments of the present application, the network access information of a merchant node includes the network access time. The preset time period can be set as required. For example, with a preset time period of one month, the merchant nodes that accessed the network in or before May are already in the knowledge graph, so the merchant nodes that accessed the network in June are determined to be newly added merchant nodes.
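A minimal sketch of step 2041, assuming the network access time of each merchant node is available as a `datetime.date`. The function name and the inclusive window bounds are assumptions made for illustration.

```python
from datetime import date


def split_new_and_historical(access_dates, window_start, window_end):
    """Split merchant ids into newly added nodes and historical nodes.

    A node is "new" when its access date lies inside the preset window
    (bounds inclusive, an assumption) and "historical" when it joined earlier.
    """
    new_nodes, historical_nodes = set(), set()
    for merchant_id, joined in access_dates.items():
        if window_start <= joined <= window_end:
            new_nodes.add(merchant_id)
        elif joined < window_start:
            historical_nodes.add(merchant_id)
    return new_nodes, historical_nodes
```

For the example in the text, the window would run from 1 June to 30 June, so merchants that joined in or before May fall into the historical set.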
Step 2042: Calculate the number of association steps between each newly added merchant node and each historical merchant node, and determine a historical merchant node whose number of association steps is less than a preset number of steps to be a neighboring merchant node.
In the embodiments of the present application, the number of association steps is the number of edges between two nodes in the knowledge graph. The preset number of steps can be set as required; for example, the preset number of steps may be 3.
In an optional solution, for example, as shown in FIG. 5, merchant node 7 and merchant node 8 are newly added merchant nodes. Taking merchant node 7 as an example, the number of association steps is 1 between merchant node 7 and merchant node 5, 2 between merchant node 7 and merchant node 6, 2 between merchant node 7 and merchant node 4, 3 between merchant node 7 and merchant node 3, 3 between merchant node 7 and merchant node 2, and 3 between merchant node 7 and merchant node 1. Therefore, merchant nodes 1 to 6 are all neighboring merchant nodes of merchant node 7.
It should be noted that, if there are multiple newly added merchant nodes, a historical merchant node qualifies as a neighboring merchant node as long as its number of association steps to any one of the newly added merchant nodes is less than the preset number of steps. For example, as shown in FIG. 5, merchant node 7 and merchant node 8 are newly added merchant nodes and merchant nodes 1 to 6 are all neighboring merchant nodes of merchant node 7, so merchant nodes 1 to 6 are also treated as neighboring merchant nodes of merchant node 8.
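The neighbor selection of step 2042 can be sketched as a breadth-first search started from all newly added merchant nodes at once, so that a historical node is kept when it is close enough to any one of them. The adjacency-dictionary representation and the function name are assumptions. The sketch follows the strict "less than the preset number of steps" wording of step 2042; if the bound is meant to be inclusive, passing the preset number plus one gives that behavior.

```python
from collections import deque


def neighbors_within(adjacency, new_nodes, preset_steps):
    """Return historical nodes fewer than preset_steps edges from any new node."""
    # Multi-source BFS: every newly added node starts at distance 0.
    distance = {node: 0 for node in new_nodes}
    queue = deque(new_nodes)
    while queue:
        node = queue.popleft()
        # A neighbor would sit at distance + 1; skip if that breaks the bound.
        if distance[node] + 1 >= preset_steps:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node in distance if node not in new_nodes}
```

Restricting later community detection to the new nodes plus this returned set is what keeps the working subgraph small compared with the full knowledge graph.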
In the embodiments of the present application, selecting the newly added merchant nodes and their neighboring merchant nodes in the knowledge graph through step 204 greatly reduces the time cost of computing suspicious communities. Moreover, only the network access information of the newly added merchant nodes needs to be processed, periodically and at short intervals, so the latency is low and risk warnings are timely.
Step 205: Obtain the associated elements between each newly added merchant node and each neighboring merchant node.
In the embodiments of the present application, step 205 is performed in the same way as step 201 above. The difference is that step 201 determines multiple merchant nodes and the associated elements between them according to the obtained network access information of multiple merchants, whereas step 205 determines multiple newly added merchant nodes and the associated elements between each newly added merchant node and each neighboring merchant node according to the obtained network access information of the newly added merchants.
Step 206: Determine multiple community groups according to the number of associated elements of each newly added merchant node and their corresponding weights, together with the number of associated elements of each neighboring merchant node and their corresponding weights.
In the embodiments of the present application, the multiple community groups may be determined using the Label Propagation Algorithm (LPA). LPA detects community groups from the network structure and therefore requires neither a predefined objective function nor prior information. Merchant nodes that are more densely clustered and have larger association weights are assigned to the same community group, and the merchant nodes in the same community group are pre-marked with the same community label, so that LPA can run in a semi-supervised manner and improve accuracy.
In a possible implementation, for example, as shown in FIG. 6, performing step 206 determines that merchant nodes 1, 2, 3, and 4 belong to one community group, namely community 1, and that merchant nodes 5, 6, 7, and 8 belong to another community group, namely community 2.
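A weighted label propagation pass over the homogeneous edges can be sketched as follows. This is a simplified, deterministic variant written for illustration: nodes are visited in sorted order, ties are broken by the smallest label, and the pre-marked labels used for the semi-supervised mode are not modelled. The function name and the `max_rounds` parameter are assumptions.

```python
from collections import defaultdict


def label_propagation(edges, max_rounds=20):
    """Assign a community label to every node of an undirected weighted graph.

    edges maps (node_a, node_b) to the association weight of that edge.
    """
    adjacency = defaultdict(dict)
    for (a, b), weight in edges.items():
        adjacency[a][b] = weight
        adjacency[b][a] = weight

    # Every node starts in its own community.
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            # Sum the edge weights towards each label seen among the neighbors.
            score = defaultdict(float)
            for neighbor, weight in adjacency[node].items():
                score[labels[neighbor]] += weight
            best = max(score.values())
            candidates = [label for label, s in score.items() if s == best]
            if labels[node] in candidates:
                continue  # the current label is already among the strongest
            labels[node] = min(candidates)  # deterministic tie-break
            changed = True
        if not changed:
            break
    return labels
```

Because each node adopts the label with the largest total edge weight among its neighbors, tightly connected merchants with heavy shared elements converge on one label, while a single weak link between two groups is not enough to merge them.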
Step 207: According to multiple preset community indicators and multiple preset business indicators, determine the suspicious communities from among the multiple community groups.
In this embodiment of the application, the preset community indicators include the number of nodes in the community, the number of blacklisted merchant nodes in the community, the number of edges inside the community, the number of edges leaving the community, the degree of community aggregation, the distribution of edge weights inside the community, the importance of a merchant node, or the maximum weight of a merchant node's connecting edges. The community indicators are used to measure how tightly a community group is aggregated. The number of nodes in the community is the number of merchant nodes in each community group. The number of blacklisted merchant nodes is the number of merchant nodes in each community group that have triggered risk cases or are blacklisted. The number of internal edges is the number of association edges between merchant nodes inside the community group. The number of external edges is the number of edges connecting the community group to other community groups. The degree of community aggregation is the ratio of the number of internal edges to the theoretical maximum number of edges. The theoretical maximum is calculated from the number of nodes in the community; that is, for a community containing n nodes, the maximum theoretical number of two-way edges is:
Figure PCTCN2021092940-appb-000001
The maximum theoretical number of one-way edges is:
Figure PCTCN2021092940-appb-000002
The internal edge weight distribution is the distribution of the edge weights inside the community group, counted per weight interval. The maximum connecting-edge weight of a merchant node is the largest weight among all edges connected to that merchant node, that is, the largest weight among all of its association elements.
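The indicators described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation. The two formula images are not reproduced in this text, so the sketch assumes an undirected graph and uses the standard count n(n-1)/2 as the theoretical maximum number of edges (for directed edges the standard count would be n(n-1)). The function name `community_indicators`, the edge representation, and the `bin_width` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def community_indicators(edges, community, blacklist=frozenset(), bin_width=1.0):
    """Compute structural indicators for one community.

    edges:     dict mapping frozenset({u, v}) -> edge weight (undirected; assumption)
    community: set of merchant ids belonging to the community
    blacklist: set of merchant ids that are blacklisted or triggered risk cases
    """
    n = len(community)
    # Internal edges have both endpoints inside the community.
    inner = {e: w for e, w in edges.items() if e <= community}
    # External edges have exactly one endpoint inside the community.
    outer_count = sum(1 for e in edges if len(e & community) == 1)
    # Assumed theoretical maximum for an undirected graph: n(n-1)/2.
    max_edges = n * (n - 1) // 2
    density = len(inner) / max_edges if max_edges else 0.0
    # Internal edge weights counted per interval of width bin_width.
    histogram = Counter(int(w // bin_width) for w in inner.values())
    # Largest weight among all edges touching each community member.
    max_weight = defaultdict(float)
    for e, w in edges.items():
        for node in e & community:
            max_weight[node] = max(max_weight[node], w)
    return {
        "node_count": n,
        "blacklisted_count": len(community & blacklist),
        "inner_edges": len(inner),
        "outer_edges": outer_count,
        "density": density,
        "weight_histogram": dict(histogram),
        "max_edge_weight": dict(max_weight),
    }
```

A community of three merchants with two internal edges would get a density of 2/3 under this assumption.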
In this embodiment of the application, the business indicators are calculated from the perspective of business rules. The preset business indicators include the proportion of merchant nodes in an abnormal state, the number of revoked merchant nodes, or the number of verified merchant nodes. Other parameters may also be included, which is not limited in this application.
In this embodiment of the application, not every community group determined in step 206 is a suspicious community. Step 207 is therefore performed: all community results are filtered according to the preset community indicators and the preset business indicators to screen out suspicious communities, which improves the accuracy of suspicious-community identification.
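The filtering in step 207 might look like the sketch below. The text does not state which thresholds are used or how the indicators are combined, so the rule here (a minimum size, a minimum aggregation degree, and at least one risk signal) and every threshold value are assumptions chosen only for illustration. The dictionary keys are hypothetical names for precomputed indicators.

```python
def is_suspicious_community(ind, min_nodes=3, min_density=0.5,
                            min_blacklisted=1, min_abnormal_ratio=0.3):
    """Decide whether one community should be kept as suspicious.

    ind is a dict of precomputed indicators with the hypothetical keys
    node_count, density, blacklisted_count and abnormal_ratio.
    All thresholds are illustrative assumptions.
    """
    # Very small groups are ignored in this sketch.
    if ind["node_count"] < min_nodes:
        return False
    # Community indicator: the group must be tightly connected.
    structural = ind["density"] >= min_density
    # Risk signal: known blacklisted members or many abnormal-state merchants.
    risky = (ind["blacklisted_count"] >= min_blacklisted
             or ind["abnormal_ratio"] >= min_abnormal_ratio)
    return structural and risky


def filter_communities(indicator_list, **thresholds):
    """Keep only the indicator records that pass the suspicious-community rule."""
    return [ind for ind in indicator_list
            if is_suspicious_community(ind, **thresholds)]
```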
Step 208: Calculate the importance of each merchant node in the suspicious community with a centrality algorithm.
In this embodiment of the application, the calculated importance of each merchant node in the suspicious community can serve as one of the preset community indicators in step 207. It also makes it easier for subsequent steps to screen out highly suspicious merchants and output them for reviewers to inspect. The importance of each merchant node can also be calculated with the PageRank algorithm, which measures the transfer effect between merchant nodes: the more merchant nodes a merchant node is connected to, and the larger its association weight data, the more important and the more suspicious that merchant node tends to be, so it should be checked first.
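As an illustration of the importance calculation, the following is a minimal weighted PageRank by power iteration. It is a sketch under stated assumptions: the graph is undirected with positive weights and is given as a symmetric adjacency dictionary, and the damping factor 0.85 and the fixed iteration count are conventional defaults that the text does not specify. The function name is hypothetical.

```python
def weighted_pagerank(adj, damping=0.85, iterations=50):
    """Weighted PageRank on an undirected graph.

    adj: dict node -> dict neighbor -> positive weight, assumed symmetric
         (adj[u][v] == adj[v][u]).
    Returns a dict node -> importance score; the scores sum to 1.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    # Total weight leaving each node, used to normalise its contributions.
    out_weight = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {}
        for v in nodes:
            # Each neighbor passes on rank in proportion to the edge weight.
            incoming = sum(rank[u] * adj[u][v] / out_weight[u] for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank
```

In a star-shaped community, the hub merchant receives the highest score, which matches the intuition that a merchant linked to many others deserves priority review.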
Step 209: Sort the merchant nodes by importance from high to low, and determine the top N merchant nodes as highly suspicious merchants.
In this embodiment of the application, the value of N can be set as required, which is not limited in this application. Performing step 209 determines the top N merchant nodes as highly suspicious merchants, so that reviewers can check whether those merchants are fraudulent.
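The ranking in step 209 can be sketched in a few lines. The function name and the input dictionary of importance scores are hypothetical; the standard-library `heapq.nlargest` is used so that the whole list does not need to be sorted when N is small.

```python
import heapq


def top_n_merchants(importance, n):
    """Return the n merchant ids with the highest importance, highest first.

    importance: dict merchant id -> importance score.
    """
    return heapq.nlargest(n, importance, key=importance.get)
```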
This application provides a knowledge-graph-based method for detecting network-access fraud by merchant gangs. The profile information that merchants provide at network access is put through data standardization, data filtering, and similar processes to extract the association relationships between merchants. Different weights are set for different kinds of profile information, which makes profile information of different dimensions compatible, and a knowledge graph of merchant association relationships is constructed. On the basis of the constructed knowledge graph, and given that distantly related merchant nodes have only a weak relationship with newly added merchants, the method samples only the merchants that are near neighbors of the newly added merchants, which reduces the data volume and the computation cost. LPA and PageRank are then run on the sampled merchant relationship network for unsupervised weighted community detection and importance calculation. Finally, for the community detection results, the method calculates community indicators and business indicators, screens out suspicious communities (suspected fraud gangs) according to the calculated indicators, and outputs the community results together with the related indicators.
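For the community detection step, a weighted label propagation pass can be sketched as below, assuming that LPA here refers to the label propagation algorithm. The text does not give the update order or the tie-breaking rule, so this sketch makes its own choices: nodes are visited in sorted order, labels are updated in place, and ties go to the smallest label so that the result is deterministic. The function name and the `max_rounds` parameter are hypothetical.

```python
from collections import defaultdict


def weighted_label_propagation(adj, max_rounds=20):
    """Group nodes into communities by weighted label propagation.

    adj: dict node -> dict neighbor -> weight (undirected, symmetric; assumption).
    Returns a list of sets, one per community.
    """
    # Every node starts in its own community.
    labels = {v: v for v in adj}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated nodes keep their own label
            # Sum the edge weights towards each neighboring label.
            score = defaultdict(float)
            for u, w in adj[v].items():
                score[labels[u]] += w
            best = max(score.values())
            # Deterministic tie-break: smallest label among the best.
            winner = min(lab for lab, s in score.items() if s == best)
            if labels[v] != winner:
                labels[v] = winner
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for v, lab in labels.items():
        groups[lab].add(v)
    return list(groups.values())
```

With two tightly linked triangles joined by one weak edge, the heavy internal weights outvote the bridge and the two triangles end up as separate communities.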
In the technical solution provided by the embodiments of this application, a knowledge graph is constructed from the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes. Newly added merchant nodes and their neighboring merchant nodes are screened out of the knowledge graph. Community detection is then performed on the newly added merchant nodes and their neighboring merchant nodes according to the association weight data between them, which determines multiple community groups. Suspicious communities are determined from the community groups according to the preset community indicators and the preset business indicators, which improves the efficiency of suspicious-community identification.
FIG. 7 is a schematic structural diagram of a device for identifying a suspicious community provided by an embodiment of this application. As shown in FIG. 7, the device includes a construction module 11, a screening module 12, a generation module 13, and a determination module 14.
The construction module 11 is configured to construct a knowledge graph from the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes.
The screening module 12 is configured to screen out, in the knowledge graph, newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes.
The generation module 13 is configured to perform community detection on the newly added merchant nodes and their neighboring merchant nodes according to the association weight data between them, and to determine multiple community groups.
The determination module 14 is configured to determine a suspicious community from the multiple community groups according to the preset community indicators and the preset business indicators.
In this embodiment of the application, the device is further configured as follows:
The determination module 14 is further configured to determine, from the obtained network access information of multiple merchants, multiple merchant nodes and the association elements between them; to set different weights for different association elements; and to determine the association weight data between different merchant nodes from the association elements between the merchant nodes and the weight of each association element. The association weight data includes the sum of the weights of the association elements shared by two merchant nodes.
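The association weight data described above (the sum of the weights of the association elements shared by two merchants) can be sketched as follows. The element names `phone`, `id_number`, and `address`, their weights, and the function name are hypothetical examples; the text does not list the actual elements or weights. The pairwise comparison is quadratic in the number of merchants and is only meant to show the calculation.

```python
from itertools import combinations

# Hypothetical association elements and weights, for illustration only.
ELEMENT_WEIGHTS = {"phone": 3.0, "id_number": 5.0, "address": 1.0}


def association_weights(merchants, element_weights=ELEMENT_WEIGHTS):
    """Build weighted edges between merchants that share profile elements.

    merchants: dict merchant id -> dict element name -> value.
    Returns a dict frozenset({m1, m2}) -> sum of weights of shared elements.
    """
    edges = {}
    for m1, m2 in combinations(sorted(merchants), 2):
        p1, p2 = merchants[m1], merchants[m2]
        # An element is shared when both merchants have the same non-empty value.
        total = sum(
            weight
            for element, weight in element_weights.items()
            if p1.get(element) is not None and p1.get(element) == p2.get(element)
        )
        if total > 0:
            edges[frozenset({m1, m2})] = total
    return edges
```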
In this embodiment of the application, the network access information includes the network access time, and the screening module 12 of the device specifically includes a determination submodule 121 and a calculation submodule 122.
The determination submodule 121 is configured to determine a merchant node whose network access time falls within a preset time period as a newly added merchant node.
The calculation submodule 122 is configured to calculate the number of association steps between each newly added merchant node and the historical merchant nodes.
The determination submodule 121 is further configured to determine a historical merchant node whose number of association steps is less than a preset number of steps as a neighboring merchant node.
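The screening performed by submodules 121 and 122 can be sketched as a date filter followed by a breadth-first search. The sketch assumes that the number of association steps is the shortest hop distance in the knowledge graph from a historical merchant to the nearest newly added merchant; the function names and the adjacency format are hypothetical.

```python
from collections import deque


def new_merchant_nodes(access_dates, start, end):
    """Merchants whose network access date lies in [start, end]."""
    return {m for m, d in access_dates.items() if start <= d <= end}


def neighboring_merchants(adj, new_nodes, max_steps):
    """Historical merchants fewer than max_steps hops from any new merchant.

    adj: dict node -> iterable of neighboring nodes.
    """
    new_nodes = set(new_nodes)
    # Multi-source BFS: every new merchant starts at distance 0.
    dist = {v: 0 for v in new_nodes}
    queue = deque(new_nodes)
    while queue:
        v = queue.popleft()
        # Nodes one hop further would reach max_steps, so stop expanding.
        if dist[v] + 1 >= max_steps:
            continue
        for u in adj.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return {v for v in dist if v not in new_nodes}
```

Restricting detection to this neighborhood is what keeps the sampled graph small compared with the full knowledge graph.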
In this embodiment of the application, the generation module 13 of the device specifically includes an acquisition submodule 131 and a determination submodule 132.
The acquisition submodule 131 is configured to acquire the association elements between each newly added merchant node and each neighboring merchant node.
The determination submodule 132 is configured to determine multiple community groups according to the number of association elements of each newly added merchant node and their weights, and the number of association elements of each neighboring merchant node and their weights.
In this embodiment of the application, the preset community indicators include the number of nodes in the community, the number of blacklisted merchant nodes in the community, the number of edges inside the community, the number of edges leaving the community, the degree of community aggregation, the distribution of edge weights inside the community, the importance of a merchant node, or the maximum weight of a merchant node's connecting edges.
In this embodiment of the application, the preset business indicators include the proportion of merchant nodes in an abnormal state, the number of revoked merchant nodes, or the number of verified merchant nodes.
In this embodiment of the application, the device further includes a calculation module 15.
The calculation module 15 is configured to calculate the importance of each merchant node in the suspicious community with a centrality algorithm.
The determination module 14 is further configured to sort the merchant nodes by importance from high to low and to determine the top N merchant nodes as highly suspicious merchants.
In the technical solution provided by the embodiments of this application, a knowledge graph is constructed from the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes. Newly added merchant nodes and their neighboring merchant nodes are screened out of the knowledge graph. Community detection is then performed on the newly added merchant nodes and their neighboring merchant nodes according to the association weight data between them, which determines multiple community groups. Suspicious communities are determined from the community groups according to the preset community indicators and the preset business indicators, which improves the efficiency of suspicious-community identification.
An embodiment of this application provides a storage medium. The storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute the steps of the embodiments of the above method for identifying a suspicious community. For details, see the embodiments of that method.
An embodiment of this application provides a computer device including a memory and a processor. The memory is used to store information including program instructions, and the processor is used to control the execution of the program instructions. When the program instructions are loaded and executed by the processor, the steps of the above method for identifying a suspicious community are implemented. For details, see the embodiments of that method.
FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application. As shown in FIG. 8, the computer device 4 of this embodiment includes a processor 41, a memory 42, and a computer program 43 that is stored in the memory 42 and can run on the processor 41. When executed by the processor 41, the computer program 43 implements the method for identifying a suspicious community in the embodiments; to avoid repetition, the details are not repeated here. Alternatively, when executed by the processor 41, the computer program implements the functions of each module/unit of the device for identifying a suspicious community in the embodiments; to avoid repetition, the details are not repeated here.
The computer device 4 includes, but is not limited to, the processor 41 and the memory 42. Those skilled in the art will understand that FIG. 8 is only an example of the computer device 4 and does not limit it. The computer device 4 may include more or fewer components than shown, may combine certain components, or may use different components; for example, it may also include input and output devices, network access devices, buses, and so on.
The processor 41 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 42 may be an internal storage unit of the computer device 4, such as a hard disk or main memory of the computer device 4. The memory 42 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 4. Further, the memory 42 may include both an internal storage unit and an external storage device of the computer device 4. The memory 42 is used to store the computer program and the other programs and data required by the computer device 4. The memory 42 can also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the scope of protection of this application.

Claims (10)

  1. A method for identifying a suspicious community, comprising:
    constructing a knowledge graph according to obtained network access information of multiple merchant nodes and association weight data between different merchant nodes;
    screening out, in the knowledge graph, newly added merchant nodes and neighboring merchant nodes of the newly added merchant nodes;
    performing community detection on the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes according to the association weight data between the newly added merchant nodes and the neighboring merchant nodes, and determining multiple community groups;
    determining a suspicious community from the multiple community groups according to multiple preset community indicators and multiple preset business indicators.
  2. The method according to claim 1, further comprising, before constructing the knowledge graph according to the obtained network access information of multiple merchant nodes and the association weight data between different merchant nodes:
    determining multiple merchant nodes and association elements between the multiple merchant nodes according to obtained network access information of multiple merchants;
    setting different weights for different association elements, and determining the association weight data between different merchant nodes according to the association elements between the multiple merchant nodes and the weight corresponding to each association element, wherein the association weight data includes the sum of the weights corresponding to the association elements between different merchant nodes.
  3. The method according to claim 1, wherein the network access information includes a network access time;
    the screening out of the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes comprises:
    determining a merchant node whose network access time is within a preset time period as a newly added merchant node;
    calculating the number of association steps between each newly added merchant node and historical merchant nodes, and determining a historical merchant node whose number of association steps is less than a preset number of steps as a neighboring merchant node.
  4. The method according to claim 2, wherein performing community detection on the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes according to the association weight data between them, and determining multiple community groups, comprises:
    acquiring the association elements between each newly added merchant node and each neighboring merchant node;
    determining multiple community groups according to the number of association elements of each newly added merchant node and the weights corresponding to the association elements, and the number of association elements of each neighboring merchant node and the weights corresponding to the association elements.
  5. The method according to claim 1, wherein the multiple preset community indicators include the number of nodes in the community, the number of blacklisted merchant nodes in the community, the number of edges inside the community, the number of edges leaving the community, the degree of community aggregation, the distribution of edge weights inside the community, the importance of a merchant node, or the maximum weight of a merchant node's connecting edges.
  6. The method according to claim 1, wherein the multiple preset business indicators include the proportion of merchant nodes in an abnormal state, the number of revoked merchant nodes, or the number of verified merchant nodes.
  7. The method according to claim 1, further comprising, after determining the suspicious community from the multiple community groups according to the multiple preset community indicators and the multiple preset business indicators:
    calculating the importance of each merchant node in the suspicious community with a centrality algorithm;
    sorting the merchant nodes by importance from high to low, and determining the top N merchant nodes as highly suspicious merchants.
  8. A device for identifying a suspicious community, comprising:
    a construction module, configured to construct a knowledge graph according to obtained network access information of multiple merchant nodes and association weight data between different merchant nodes;
    a screening module, configured to screen out, in the knowledge graph, newly added merchant nodes and neighboring merchant nodes of the newly added merchant nodes;
    a generation module, configured to perform community detection on the newly added merchant nodes and the neighboring merchant nodes of the newly added merchant nodes according to the association weight data between them, and to determine multiple community groups;
    a determination module, configured to determine a suspicious community from the multiple community groups according to multiple preset community indicators and multiple preset business indicators.
  9. A storage medium, the storage medium including a stored program, wherein when the program runs, the device on which the storage medium is located is controlled to execute the method for identifying a suspicious community according to any one of claims 1 to 7.
  10. A computer device, comprising a memory and a processor, the memory being used to store information including program instructions and the processor being used to control the execution of the program instructions, wherein when the program instructions are loaded and executed by the processor, the steps of the method for identifying a suspicious community according to any one of claims 1 to 7 are implemented.
PCT/CN2021/092940 2020-06-16 2021-05-11 Method and apparatus for identifying suspicious community, and storage medium and computer device WO2021254027A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010546897.3 2020-06-16
CN202010546897.3A CN111709756A (en) 2020-06-16 2020-06-16 Method and device for identifying suspicious communities, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
WO2021254027A1 true WO2021254027A1 (en) 2021-12-23

Family

ID=72540651

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092940 WO2021254027A1 (en) 2020-06-16 2021-05-11 Method and apparatus for identifying suspicious community, and storage medium and computer device

Country Status (2)

Country Link
CN (1) CN111709756A (en)
WO (1) WO2021254027A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362157A (en) * 2021-05-27 2021-09-07 中国银联股份有限公司 Abnormal node identification method, model training method, device and storage medium
CN114820001A (en) * 2022-05-27 2022-07-29 中国建设银行股份有限公司 Target customer screening method, device, equipment and medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709756A (en) * 2020-06-16 2020-09-25 银联商务股份有限公司 Method and device for identifying suspicious communities, storage medium and computer equipment
CN112288330A (en) * 2020-11-24 2021-01-29 拉卡拉支付股份有限公司 Method and device for identifying cheating community
CN112215616B (en) * 2020-11-30 2021-04-30 四川新网银行股份有限公司 Method and system for automatically identifying abnormal fund transaction based on network
CN113205129B (en) * 2021-04-28 2023-04-07 五八有限公司 Cheating group identification method and device, electronic equipment and storage medium
CN113641827A (en) * 2021-06-29 2021-11-12 武汉众智数字技术有限公司 Phishing network identification method and system based on knowledge graph

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949046A (en) * 2018-11-02 2019-06-28 阿里巴巴集团控股有限公司 The recognition methods of risk clique and device
CN110110093A (en) * 2019-04-08 2019-08-09 深圳众赢维融科技有限公司 A kind of recognition methods, device, electronic equipment and the storage medium of knowledge based map
CN110135853A (en) * 2019-04-25 2019-08-16 阿里巴巴集团控股有限公司 Clique's user identification method, device and equipment
CN110825883A (en) * 2019-10-30 2020-02-21 杭州叙简科技股份有限公司 Knowledge graph-based hybrid group discovery method
CN111709756A (en) * 2020-06-16 2020-09-25 银联商务股份有限公司 Method and device for identifying suspicious communities, storage medium and computer equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070364A (en) * 2019-03-27 2019-07-30 北京三快在线科技有限公司 Method and apparatus, storage medium based on the fraud of graph model detection clique
CN111274495B (en) * 2020-01-20 2023-08-25 平安科技(深圳)有限公司 Data processing method, device, computer equipment and storage medium for user relationship strength
CN111275564A (en) * 2020-02-11 2020-06-12 山西大学 Method and system for detecting community number of microblog network

Also Published As

Publication number Publication date
CN111709756A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
WO2021254027A1 (en) Method and apparatus for identifying suspicious community, and storage medium and computer device
WO2021174944A1 (en) Message push method based on target activity, and related device
TWI673666B (en) Method and device for data risk control
TWI804575B (en) Method and apparatus, computer readable storage medium, and computing device for identifying high-risk users
US20210112101A1 (en) Data set and algorithm validation, bias characterization, and valuation
WO2020100108A1 (en) Systems and method for scoring entities and networks in a knowledge graph
EP2329447A1 (en) Evaluating loan access using online business transaction data
US11570214B2 (en) Crowdsourced innovation laboratory and process implementation system
CN111681091A (en) Financial risk prediction method and device based on time domain information and storage medium
CN111367965B (en) Target object determining method, device, electronic equipment and storage medium
CN101685519A (en) Credit evaluation method and credit evaluation system
CN112801498A (en) Risk identification model training method, risk identification device and risk identification equipment
Wang et al. An unsupervised strategy for defending against multifarious reputation attacks
CN115062163A (en) Abnormal tissue identification method, abnormal tissue identification device, electronic device and medium
Dong Application of Big Data Mining Technology in Blockchain Computing
CN112950359A (en) User identification method and device
WO2019196502A1 (en) Marketing activity quality assessment method, server, and computer readable storage medium
CN116361571A (en) Artificial intelligence-based merchant portrait generation method, device, equipment and medium
CN115358894A (en) Intellectual property life cycle trusteeship management method, device, equipment and medium
CN114493853A (en) Credit rating evaluation method, credit rating evaluation device, electronic device and storage medium
CN111126788A (en) Risk identification method and device and electronic equipment
CN112926991A (en) Cascade group severity grade dividing method and system
CN112200644A (en) Method and device for identifying fraudulent user, computer equipment and storage medium
CN112529303A (en) Risk prediction method, device, equipment and storage medium based on fuzzy decision
CN110675268A (en) Risk client identification method and device and server

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21825392; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21825392; Country of ref document: EP; Kind code of ref document: A1)