CN114547143A - Core business object mining method and device - Google Patents


Info

Publication number
CN114547143A
Authority
CN
China
Prior art keywords
node
target node
nodes
seed
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210139199.0A
Other languages
Chinese (zh)
Other versions
CN114547143B (en)
Inventor
刘东亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210139199.0A
Publication of CN114547143A
Application granted
Publication of CN114547143B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F9/4451 User profiles; Roaming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present specification provides a method and an apparatus for mining a core business object, including: acquiring a business relation graph which comprises a plurality of nodes respectively corresponding to a plurality of business objects and connecting edges established according to the business relations among the business objects; dividing the business relation graph into a plurality of connected sub-community graphs based on a community discovery algorithm, and determining the seed nodes contained in each sub-community graph; for any sub-community graph, executing a plurality of rounds of iteration, wherein any round of iteration comprises, for any target node: determining each current-round attention value of the target node to each of its adjacent nodes according to the previous-round attention values of the target node to each adjacent node; updating the centrality of the target node according to each current-round attention value and whether each adjacent node is a seed node; and determining the business object corresponding to the target node as a core business object when the centrality reaches a preset threshold and the target node is a seed node.

Description

Core business object mining method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of data mining and graph computing, and in particular, to a method and an apparatus for mining a core business object.
Background
The conventional technical solution for identifying core business objects is to construct a graph network reflecting business relationships between business objects, and then identify the core business objects from a large number of business objects by using a graph algorithm including, for example, a traversal and routing algorithm, a centrality algorithm, and a community discovery algorithm. However, the technical scheme also has the problems of low calculation speed and insufficient rationality and interpretability of the process of acquiring the core business object.
Therefore, a new method for mining core business objects is needed.
Disclosure of Invention
Embodiments in this specification aim to provide a new method for mining a core business object, by which the processing speed for mining the core business object from a large number of business objects can be increased, and meanwhile, the process of acquiring the core business object is more reasonable and interpretable, thereby solving the deficiencies in the prior art.
According to a first aspect, a method for mining a core business object is provided, which includes:
acquiring a business relation graph which comprises a plurality of nodes respectively corresponding to a plurality of business objects and a connecting edge established according to the business relation among the business objects;
based on a community discovery algorithm, dividing the business relationship graph into a plurality of connected sub-community graphs, and determining seed nodes contained in each sub-community graph;
for any sub-community graph, executing a plurality of rounds of iteration, wherein any round of iteration comprises, for any target node: determining each current-round attention value of the target node to each of its adjacent nodes according to the previous-round attention values of the target node to each adjacent node; updating the centrality of the target node according to each current-round attention value and whether each adjacent node is a seed node; and determining the business object corresponding to the target node as a core business object when the centrality reaches a preset threshold and the target node is a seed node.
In a possible implementation manner, the business object is one of an account and a bank card, and the business relationship is a transaction relationship between the account and the bank card.
In a possible implementation manner, the method further includes determining a set of core business objects according to the union of the core business objects determined from the respective sub-community graphs.
In one possible embodiment, the community discovery algorithm is one of a Louvain algorithm and an Infomap algorithm.
In a possible implementation manner, the seed nodes include an initial seed node and a predicted seed node, and each of the sub-community graphs includes a plurality of initial seed nodes;
the determining the seed nodes contained in each sub-community graph includes: and determining whether the rest nodes are prediction seed nodes or not according to the initial seed nodes included in each sub-community graph.
In one possible embodiment, determining whether the remaining nodes are predicted seed nodes according to the initial seed nodes included in each sub-community graph includes:
and determining whether the rest nodes are predicted seed nodes or not according to the initial seed nodes included by each sub-community graph based on a Label Propagation Algorithm (LPA).
In one possible implementation, updating the centrality of the target node according to each current round of attention and whether each neighboring node is a seed node, includes:
and updating the centrality of the target node according to the attention of each current round, the weight value of the connecting edge of the target node and each adjacent node and whether each adjacent node is a seed node.
In one possible embodiment, determining each current round attention value according to the previous round attention value of the target node and each adjacent node comprises:
for the target node and any first neighboring node thereof,
determining a first iteration increment according to the proportion of the previous round of attention values of the target node and the first adjacent node to the sum of the previous round of attention values of the target node and all adjacent nodes;
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node and the first iteration increment.
In one possible embodiment, the sum of the previous round of attention values of the target node and all the adjacent nodes thereof comprises:
the sum of the previous round of attention values of the target node and all of its valid neighboring nodes, wherein the previous round of attention values of the target node and the valid neighboring nodes is greater than a predetermined attention threshold.
In one possible embodiment, determining the current round attention value of the target node and the first neighboring node according to the sum of the previous round attention value of the target node and the first neighboring node and the first iteration increment comprises:
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node, the first iteration increment and a preset attenuation value.
In one possible embodiment, the method further comprises:
before the iteration updating of the plurality of rounds, setting an initial value of the attention value of the target node and each adjacent node aiming at the target node, wherein the initial value of the target node and any first adjacent node is the logarithm of the sum of the seed label values of all adjacent nodes of the target node; the seed label value is used to indicate whether the neighboring node is a seed node.
According to a second aspect, there is provided a core business object mining apparatus, including:
a business relation graph obtaining unit configured to obtain a business relation graph including a plurality of nodes corresponding to a plurality of business objects, respectively, and connecting edges established according to the business relations between the business objects;
the sub-community graph acquisition unit is configured to divide the business relation graph into a plurality of connected sub-community graphs based on a community discovery algorithm and determine seed nodes contained in each sub-community graph;
the core business object determining unit is configured to execute a plurality of rounds of iteration on any sub-community graph, wherein any round of iteration comprises, for any target node: determining each current-round attention value of the target node to each of its adjacent nodes according to the previous-round attention values of the target node to each adjacent node; updating the centrality of the target node according to each current-round attention value and whether each adjacent node is a seed node; and determining the business object corresponding to the target node as a core business object when the centrality reaches a preset threshold and the target node is a seed node.
In a possible implementation manner, the business object is one of an account and a bank card, and the business relationship is a transaction relationship between the account and the bank card.
In one possible embodiment, the apparatus further comprises,
and the core business object set determining unit is configured to determine a core business object set according to the sum of the core business objects determined by the sub-community graphs.
In one possible embodiment, the community discovery algorithm is one of a Louvain algorithm and an Infomap algorithm.
In a possible implementation manner, the seed nodes include an initial seed node and a predicted seed node, and each of the sub-community graphs includes a plurality of initial seed nodes;
the sub-community graph acquisition unit is further configured to: and determining whether the rest nodes are prediction seed nodes or not according to the initial seed nodes included in each sub-community graph.
In a possible implementation manner, the sub-community graph acquisition unit is further configured to:
and determining whether the rest nodes are predicted seed nodes or not according to the initial seed nodes included by each sub-community graph based on a Label Propagation Algorithm (LPA).
In a possible implementation, the core business object determining unit is further configured to:
and updating the centrality of the target node according to the attention of each current round, the weight value of the connecting edge of the target node and each adjacent node and whether each adjacent node is a seed node.
In a possible implementation, the core business object determining unit is further configured to:
for the target node and any first neighboring node thereof,
determining a first iteration increment according to the proportion of the previous round of attention values of the target node and the first adjacent node to the sum of the previous round of attention values of the target node and all adjacent nodes;
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node and the first iteration increment.
In one possible embodiment, the sum of the previous round of attention values of the target node and all the adjacent nodes thereof comprises:
the sum of the previous round of attention values of the target node and all of its valid neighboring nodes, wherein the previous round of attention values of the target node and the valid neighboring nodes is greater than a predetermined attention threshold.
In a possible implementation, the core business object determining unit is further configured to:
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node, the first iteration increment and a preset attenuation value.
In a possible embodiment, the core business object determination unit is further configured to,
before the iteration updating of the plurality of rounds, setting an initial value of the attention value of the target node and each adjacent node aiming at the target node, wherein the initial value of the target node and any first adjacent node is the logarithm of the sum of the seed label values of all adjacent nodes of the target node; the seed label value is used to indicate whether the neighboring node is a seed node.
According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
By using one or more of the method, the apparatus, the computing device and the storage medium in the above aspects, the processing speed of mining core business objects from a large number of business objects can be effectively increased, and the process of obtaining the core business objects is more reasonable and interpretable.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating a mining method for a core business object according to an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a method for mining a core business object according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of seed node prediction based on a graph of sub-communities, in accordance with an embodiment of the present description;
FIG. 4 is a block diagram illustrating seed node prediction based on a business relationship graph according to an embodiment of the present description;
FIG. 5 is a block diagram illustrating a mining apparatus for a core business object according to an embodiment of the present disclosure.
Detailed Description
The solution provided by the present specification will be described below with reference to the accompanying drawings.
In many industries, such as finance and payment, business activities typically involve a large number of business objects, such as accounts for conducting business, users, and bank cards used in business. Meanwhile, some illegal industries may use the accounts or bank cards they control to engage in illegal business activities, which may cause losses to the enterprises providing business services and to other users. Therefore, there is a need for an efficient technical means to identify business objects used for illegal activities, especially the core business objects that play a more critical role. Existing solutions for identifying such core objects are generally implemented by constructing a business relationship network reflecting the relationships between business objects, and predetermining illegal seed objects among the business objects. Then, the core business objects among the business objects engaged in illegal activities are identified from the business relationship network by using a graph algorithm such as a traversal and routing algorithm, a centrality algorithm, or a community discovery algorithm.
However, the above-described scheme has the following problems. The core object is usually determined based on the centrality of the business objects engaged in illegal activities, and this centrality is mainly determined by the strength of the business relationships between the business objects, which is usually predetermined and does not change during the calculation process. On the one hand, the centrality therefore cannot be updated based on changes in the strength of the business relationship and can only be updated by means of manually set attenuation values, so the iterative computation that yields the final core business objects converges very slowly, takes a long time, and consumes a large amount of computing resources. On the other hand, the finally obtained core object is determined mainly by the strength of the business relationship alone, which is not sufficiently reasonable in many business scenarios. For example, in some scenarios normal objects may also have strong business relationships with each other; if business objects are managed and controlled based only on strong relationships, the decision lacks business interpretability and is difficult to justify to users.
In order to solve the above technical problem, an embodiment of the present specification provides a method for mining a core business object. Fig. 1 is a schematic diagram illustrating a mining method of a core business object according to an embodiment of the present disclosure. As shown in fig. 1, first, a business relationship graph is established according to business objects and the business relationships between them, where nodes in the business relationship graph correspond to the business objects, and edges between the nodes correspond to the business relationships between the business objects. The nodes may include some initial seed nodes (e.g., the light gray nodes shown in fig. 1), which may correspond to business objects that are predicted to be used for illegal business activities. Then, the business relationship graph may be divided into a plurality of subgraphs, i.e., sub-community graphs (e.g., sub-community graph 1, sub-community graph 2, and sub-community graph 3 in fig. 1) according to a community discovery algorithm, and in each sub-community graph, other seed nodes (e.g., the dark gray nodes shown in fig. 1) are inferred from the known initial seed nodes by using a label propagation algorithm.
Next, in each sub-community graph, the attention value of each node to each of its adjacent nodes may be initialized. For example, in sub-community graph 1, the attention value a(2, 3) of node 2 to its adjacent node 3, the attention values a(3, 1) and a(3, 2) of node 3 to its adjacent nodes 1 and 2, and the attention value a(1, 3) of node 1 to its adjacent node 3 are initialized, respectively. Similarly, in sub-community graph 2 and sub-community graph 3, the attention values of the nodes to their adjacent nodes may be initialized. The centrality of each node is then determined based on these attention values and on label values indicating whether each of its adjacent nodes is a seed node (including initial and predicted seed nodes). Thereafter, the attention of each node to its adjacent nodes may be updated through multiple rounds of iteration, and the centrality of each node is recalculated after each attention update (for example, the centralities C1, C2, and C3 of nodes 1, 2, and 3 in sub-community graph 1). When the centrality of a specific node reaches a predetermined threshold and that node is a seed node, the business object corresponding to that node is determined to be a core business object (for example, in sub-community graph 1, the business object corresponding to node 3 is determined to be a core object because the centrality of node 3 exceeds the predetermined threshold). In addition, during the iteration, whether an adjacent node of a node takes effect in the calculation of that node's centrality may be determined according to a preset attention threshold; that is, when the attention of a node to one of its adjacent nodes falls to a predetermined degree, that adjacent node no longer contributes to the calculation of the node's centrality.
This is equivalent to "clipping" the nodes according to dynamic attention when calculating centrality. In different embodiments, the centrality of a node may also be determined according to, for example, the weight of the connecting edge between the node and its adjacent node, where the edge weight may correspond to the strength of the business relationship between business objects. In one example, the business relationship may be a transaction relationship between different business objects, and the edge weight may correspond to, for example, the amount or frequency of transactions.
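The overview above names the inputs to centrality (attention, seed labels, optional edge weights, and an attention threshold for clipping) without fixing how they are combined. The following is a minimal sketch under the assumption that they are multiplied and summed over effective neighbors; the function name `centrality`, the product form, and the default edge weight of 1.0 are illustrative assumptions, not taken from the specification.

```python
def centrality(target, attention, edge_weight, seeds, attention_threshold=0.0):
    """Hypothetical centrality: attention- and weight-scaled count of seed neighbors.

    attention:   dict mapping (target, neighbor) -> attention value a(target, neighbor)
    edge_weight: dict mapping frozenset({u, v}) -> weight of the undirected edge
    seeds:       set of seed node ids (initial and predicted)
    """
    total = 0.0
    for (src, nbr), a in attention.items():
        if src != target:
            continue
        # "Clipping": only neighbors whose attention exceeds the threshold contribute.
        if a <= attention_threshold:
            continue
        seed_label = 1.0 if nbr in seeds else 0.0
        # Assumption: edges without an explicit weight count as weight 1.0.
        w = edge_weight.get(frozenset((src, nbr)), 1.0)
        total += a * w * seed_label
    return total


attention = {(3, 1): 0.8, (3, 2): 0.05, (1, 3): 0.5}
edge_weight = {frozenset((1, 3)): 2.0}
c3 = centrality(3, attention, edge_weight, seeds={1, 2}, attention_threshold=0.1)
```

Here node 3 attends to two seed neighbors, but the attention a(3, 2) is below the threshold, so only neighbor 1 contributes to `c3`.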
The method has the following advantages. On the one hand, the calculation of node centrality introduces an attention parameter of the target node to its adjacent nodes, which reflects the degree of influence of each adjacent node on the target node. Compared with the prior art, in which centrality is determined mainly by the weight of the relationship edge, determining centrality from the degree of influence of adjacent nodes, especially seed adjacent nodes, is more reasonable and interpretable from the business perspective; for example, a node that pays high attention to many illegal seeds is more likely to be a core illegal seed itself. On the other hand, in the multiple rounds of iterative computation of node centrality, whether an adjacent node participates in the computation is determined dynamically according to the attention paid to it. This in effect dynamically trims the effective network structure, so that convergence is accelerated, the amount of computation is reduced, and the computation speed is improved.
The details of the process are further set forth below. Fig. 2 is a flowchart illustrating a mining method for a core business object according to an embodiment of the present disclosure. As shown in fig. 2, the method at least comprises the following steps:
step 21, obtaining a business relation graph, wherein the business relation graph comprises a plurality of nodes corresponding to a plurality of business objects respectively, and a connecting edge established according to business relations among the business objects;
step 22, dividing the business relation graph into a plurality of connected sub-community graphs based on a community discovery algorithm, and determining seed nodes contained in each sub-community graph;
step 23, for any sub-community graph, executing a plurality of rounds of iteration, wherein any round of iteration comprises, for any target node: determining each current-round attention value of the target node to each of its adjacent nodes according to the previous-round attention values of the target node to each adjacent node; updating the centrality of the target node according to each current-round attention value and whether each adjacent node is a seed node; and determining the business object corresponding to the target node as a core business object when the centrality reaches a preset threshold and the target node is a seed node.
First, in step 21, a business relationship graph is obtained, which includes a plurality of nodes corresponding to a plurality of business objects, respectively, and a connection edge established according to the business relationship between the business objects.
In this step, the obtained business relationship graph may be an undirected graph, which includes nodes corresponding to the business objects and edges between the nodes, where the edges correspond to the business relationships between the business objects.
In different embodiments, the business relationship graph may be established according to different specific business objects and different specific business relationships between them, which is not limited in this specification. In one embodiment, the business object may be one of an account and a bank card. In another embodiment, the business relationship may be a transaction relationship between accounts or bank cards.
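As an illustration of step 21, the sketch below builds an undirected, weighted graph from transaction records. The record layout `(party_a, party_b, amount)`, the function name `build_relation_graph`, and the choice of accumulated amount as the edge weight are assumptions made for this example only.

```python
from collections import defaultdict


def build_relation_graph(transactions):
    """Build an undirected weighted graph from (party_a, party_b, amount) records."""
    weight = defaultdict(float)   # frozenset({u, v}) -> accumulated transaction amount
    adjacency = defaultdict(set)  # node -> set of adjacent nodes
    for a, b, amount in transactions:
        if a == b:
            continue  # assumption: self-transfers carry no relationship information
        weight[frozenset((a, b))] += amount
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency), dict(weight)


transactions = [
    ("acct_1", "card_9", 120.0),
    ("card_9", "acct_1", 30.0),
    ("acct_2", "card_9", 5.0),
]
adjacency, weight = build_relation_graph(transactions)
```

Because the graph is undirected, the two transfers between `acct_1` and `card_9` are merged into a single edge whose weight is their total amount.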
Then, in step 22, based on a community discovery algorithm, the business relationship graph is divided into a plurality of connected sub-community graphs, and the seed nodes included in each sub-community graph are determined.
A community discovery (Community Detection) algorithm is a type of algorithm for discovering the community structure in a network. In this step, the complete relationship graph may be divided into a plurality of subgraphs based on a community discovery algorithm, such that the relevance within each subgraph is as high as possible and the relevance between subgraphs is as low as possible. Such a subgraph can be referred to as a community, and is also referred to as a sub-community graph in this specification. In different embodiments, different community discovery algorithms may be used, which is not limited in this specification. In one embodiment, the community discovery algorithm may be one of a Louvain algorithm and an Infomap algorithm.
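The criterion of high relevance within subgraphs and low relevance between them is commonly quantified by modularity, the quantity that the Louvain algorithm optimizes. The sketch below only scores a given partition of an unweighted graph; it is not a community discovery algorithm itself, and the function name `modularity` and the input layout are assumptions for illustration.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges:        list of (u, v) pairs, each undirected edge listed once
    community_of: dict mapping node -> community label
    """
    m = len(edges)
    internal = {}  # community -> number of edges with both endpoints inside it
    degree = {}    # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edge share) - (degree share)^2
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
merged = {n: "A" for n in range(1, 7)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Splitting the graph at the bridge scores higher than keeping everything in one community, which matches the intuition that each triangle is a sub-community.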
In this step, the seed nodes are also determined among the nodes included in each sub-community graph. The specific manner in which the seed nodes are determined may vary in different embodiments. In some embodiments, other seed nodes may be determined from the initial seed nodes already present in the sub-community graph. It is understood that the initial seed nodes are pre-labeled based on known information. FIG. 3 illustrates a diagram of seed node prediction based on a sub-community graph, according to an embodiment of the present disclosure. In the embodiment shown in FIG. 3, several initial seed nodes (e.g., light gray nodes in FIG. 3) are included in the divided sub-community graph; then, whether the remaining nodes are predicted seed nodes (e.g., dark gray nodes in FIG. 3) may be determined according to the initial seed nodes included in the sub-community graph. The finally determined seed nodes include the initial seed nodes and the predicted seed nodes. In a specific embodiment, whether the remaining nodes are predicted seed nodes may be determined from the initial seed nodes based on the label propagation algorithm (LPA).
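The specification names label propagation but does not give its details, so the following is only a simplified, threshold-based propagation sketch rather than a full LPA implementation. The function name `predict_seeds`, the neighbor-fraction rule, and the parameters `threshold` and `max_rounds` are all assumptions introduced for illustration.

```python
def predict_seeds(adjacency, initial_seeds, threshold=0.5, max_rounds=10):
    """Return predicted seed nodes inferred from the initial seeds.

    A non-seed node is labeled as a predicted seed when the fraction of its
    neighbors that are already seeds reaches `threshold` (hypothetical rule).
    Updates are synchronous and repeat until nothing changes.
    """
    seeds = set(initial_seeds)
    for _ in range(max_rounds):
        newly_labeled = {
            node
            for node, nbrs in adjacency.items()
            if node not in seeds
            and nbrs
            and sum(n in seeds for n in nbrs) / len(nbrs) >= threshold
        }
        if not newly_labeled:
            break
        seeds |= newly_labeled
    return seeds - set(initial_seeds)


adjacency = {1: [3], 2: [3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4], 6: [4]}
predicted = predict_seeds(adjacency, initial_seeds={1, 2})
```

In this small graph, node 3 has two seed neighbors out of three and becomes a predicted seed, while node 4 has only one seed neighbor out of three and stays unlabeled.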
In other embodiments, the predicting the seed node may also be performed before the sub-community graph is divided, and the seed node in the sub-community graph may directly correspond to the seed node in the business relationship graph. Fig. 4 is a block diagram illustrating seed node prediction based on a business relationship graph according to an embodiment of the present disclosure. As shown in fig. 4, before dividing the sub-community graph, it may be determined whether the remaining nodes are prediction seed nodes (e.g., dark gray nodes in fig. 4) according to the initial seed nodes (e.g., light gray nodes in fig. 4) in the business relationship graph. After determining other potential seed nodes (prediction seed nodes) of the business relationship graph according to the initial seed nodes, dividing the business relationship graph into a plurality of sub-community graphs. In this way, the seed node in the sub-community graph may directly correspond to the seed node in the business relationship graph.
Thereafter, in step 23, for each sub-community graph, the attention of each node to its neighboring nodes is updated through multiple rounds of iteration, the centrality of each node is determined according to the attention, and whether the business object corresponding to each node is a core business object is determined at least according to the centrality. For any sub-community graph, a plurality of rounds of iteration can be executed. In any round of iteration, for any target node, each current-round attention value of the target node to each of its adjacent nodes can be determined according to the previous-round attention values of the target node to each adjacent node; the centrality of the target node is updated according to each current-round attention value and whether each adjacent node is a seed node; and the business object corresponding to the target node is determined to be a core business object when the centrality reaches a preset threshold and the target node is a seed node.
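The final decision rule combines two conditions: the centrality reaches the preset threshold, and the node is itself a seed node. A minimal sketch follows, with the function name `select_core_objects` and the dictionary layout assumed for illustration.

```python
def select_core_objects(centrality_by_node, seeds, threshold):
    """Return nodes whose centrality reaches the threshold and that are seed nodes."""
    return {
        node
        for node, c in centrality_by_node.items()
        if c >= threshold and node in seeds
    }


centrality_by_node = {1: 0.2, 2: 0.9, 3: 1.6}
core = select_core_objects(centrality_by_node, seeds={1, 3}, threshold=0.8)
```

Node 2 has high centrality but is not a seed, and node 1 is a seed with low centrality, so only node 3 is selected.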
In each iteration, the attention value of the current round can be determined according to the attention value of the previous round and the iteration increment of the round. Thus, in one embodiment, for the target node and any first adjacent node thereof, the first iteration increment may be determined according to a ratio of the last round of attention value of the target node and the first adjacent node to a sum of the last round of attention value of the target node and all adjacent nodes thereof; and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node and the first iteration increment.
In a specific embodiment, before the multiple rounds of iterative updating, an initial attention value of the target node to each of its neighboring nodes may be set, where the initial value for the target node and any first neighboring node is the logarithm of the sum of the seed label values of all neighboring nodes of the target node; the seed label value indicates whether a neighboring node is a seed node. In this embodiment, the initial value of attention may be expressed as:
$$A^{(0)}_{i,j,k} = \log\left(\sum_{k'=1}^{K} s_{i,j,k'}\right)$$
where $A^{(0)}_{i,j,k}$ is the initial attention value of node j in the i-th sub-community to its neighboring node k (that is, the attention value of a hypothetical 0th iteration is used as the initial value of attention; note that the 0th iteration does not actually occur and is introduced only for convenience of calculation), K is the number of neighboring nodes of node j, and $s_{i,j,k}$ is the seed label value of neighboring node k; in one example, $s_{i,j,k}$ is 1 when neighboring node k is a seed node, and 0 otherwise.
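The initialization described above (the logarithm of the sum of the seed label values of all neighbors, shared by every neighbor of the target node) can be sketched as follows. The function name `initial_attention` and the dictionary layout are assumptions for illustration. The text does not say what happens when a target node has no seed neighbors, where the logarithm is undefined; the fallback to 0.0 below is likewise an assumption.

```python
import math
from typing import Dict, Hashable


def initial_attention(seed_labels: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Initial attention of one target node to each of its neighbors.

    seed_labels maps each neighbor k of the target node to 1 (seed) or 0.
    """
    total = sum(seed_labels.values())
    # Assumption: with no seed neighbors log(0) is undefined, so use 0.0.
    value = math.log(total) if total > 0 else 0.0
    # Every neighbor of the target node receives the same initial value.
    return {k: value for k in seed_labels}
```

Note that with exactly one seed neighbor the value is log(1) = 0, so an implementation would need to decide how to treat an all-zero starting point in the later ratio-based update.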
After the initial value is set, according to the above embodiment, in the subsequent nth iteration, the update of the attention value of the target node j to the adjacent node k thereof can be expressed as:
$$A^{(n)}_{i,j,k} = A^{(n-1)}_{i,j,k} + \frac{A^{(n-1)}_{i,j,k}}{\sum_{k'=1}^{K} A^{(n-1)}_{i,j,k'}}$$
where $A^{(n-1)}_{i,j,k}$ and $A^{(n)}_{i,j,k}$ are the attention values of target node j in the i-th sub-community to its neighboring node k in the (n-1)-th and n-th iterations respectively, $\sum_{k'=1}^{K} A^{(n-1)}_{i,j,k'}$ is the sum of the previous-round attention values of the target node to all of its neighboring nodes, and the fraction $A^{(n-1)}_{i,j,k} / \sum_{k'=1}^{K} A^{(n-1)}_{i,j,k'}$ corresponds to the first iteration increment.
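One round of the update described above (previous-round value plus its share of the target node's total previous-round attention) can be sketched as follows. The function name `update_attention` is hypothetical, and the handling of a zero total, which the text does not address, is an assumption.

```python
from typing import Dict, Hashable


def update_attention(prev: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """One iteration of the attention update for a single target node.

    prev maps each neighbor k to the previous-round attention value.
    """
    total = sum(prev.values())
    if total == 0:
        # Assumption: the ratio is undefined, so values are left unchanged.
        return dict(prev)
    # Increment = this neighbor's share of the total previous-round attention.
    return {k: a + a / total for k, a in prev.items()}
```

For example, previous-round values of 1.0 and 3.0 give increments of 0.25 and 0.75, so the neighbor that already holds more attention gains more in each round.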
In another embodiment, in order to make the process of obtaining the core seed node converge more quickly, the current round attention value of the target node and the first neighboring node may also be determined according to the sum of the previous round attention value of the target node and the first neighboring node, the first iteration increment, and a preset attenuation value. In a specific embodiment, the updating of attention may be expressed mathematically as follows:
Figure BDA0003505973420000099
where $A^{(n-1)}_{i,j,k}$ and $A^{(n)}_{i,j,k}$ are the attention values of node j in the i-th sub-community to its neighboring node k in the (n-1)-th and n-th iterations respectively, a is a preset attention threshold, and β is a preset attenuation value. δ(·) is an indicator function that takes the value 0 or 1: it takes the value 1 when the condition $A^{(n-1)}_{i,j,k} > a$ is true, and 0 when the condition is false. Introducing the indicator function means that the influence of neighboring nodes whose attention is below the threshold a is ignored, which is equivalent to pruning those nodes; only neighboring nodes whose attention exceeds the threshold are kept as valid nodes, which further accelerates the convergence of the iteration.
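The thresholded, attenuated update can be sketched as below under one possible reading of this embodiment. How the indicator function and the attenuation value enter the formula is not recoverable from the text alone, so the following choices are assumptions: the increment is computed only for neighbors whose previous-round attention exceeds the threshold, the normalizing sum runs only over those valid neighbors, and the attenuation value `beta` is subtracted from every neighbor in each round. The function name is hypothetical.

```python
from typing import Dict, Hashable


def update_attention_with_decay(prev: Dict[Hashable, float],
                                threshold: float,
                                beta: float) -> Dict[Hashable, float]:
    """One thresholded, attenuated attention update for a single target node."""
    # Sum over valid neighbors only (previous attention above the threshold).
    valid_total = sum(a for a in prev.values() if a > threshold)
    updated = {}
    for k, a in prev.items():
        indicator = 1 if a > threshold else 0  # indicator function value
        increment = indicator * a / valid_total if valid_total != 0 else 0.0
        # Assumption: the attenuation value is subtracted in every round.
        updated[k] = a + increment - beta
    return updated
```

Under these assumptions a pruned neighbor receives no increment and only decays, so its attention drifts further below the threshold while valid neighbors keep growing.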
In each iteration, the centrality of the target node can be determined according to the attention of the current iteration and the weight values of the connecting edges of the target node and the adjacent nodes, which are calculated as above. Therefore, in one embodiment, the centrality of the target node may be updated according to each current round of attention, the weight value of the connecting edge of the target node and each adjacent node, and whether each adjacent node is a seed node.
In a specific embodiment, the update of the centrality may be expressed mathematically as follows:
Figure BDA0003505973420000101
where $c_{i,j}$ is the centrality of node j in the i-th sub-community; $A_{i,j,k}$ is the attention value of node j in the i-th sub-community to its neighboring node k; $w_{i,j,k}$ is the edge weight between node j in the i-th sub-community and its neighboring node k; and $s_{i,j,k}$ indicates whether neighboring node k of node j in the i-th sub-community is a seed node: in one example, $s_{i,j,k}$ is 1 when neighboring node k is a seed node, and 0 otherwise.
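A sketch of the centrality update and the core decision follows. The text states which quantities the centrality depends on (current-round attention, edge weight, seed label of each neighbor) but the exact combination is not recoverable here, so the weighted sum over neighbors used below is an assumption. The decision rule (centrality reaches a preset threshold and the target node is itself a seed node) follows the description of step 33. Function names are hypothetical.

```python
from typing import Dict, Hashable


def centrality(attention: Dict[Hashable, float],
               edge_weight: Dict[Hashable, float],
               seed_label: Dict[Hashable, int]) -> float:
    """Centrality of one target node from its per-neighbor quantities.

    Assumed form: sum over neighbors of attention * edge weight * seed label,
    so only seed neighbors contribute.
    """
    return sum(attention[k] * edge_weight[k] * seed_label[k] for k in attention)


def is_core(node_is_seed: bool, centrality_value: float, threshold: float) -> bool:
    """Core decision: the node is a seed and its centrality reaches the threshold."""
    return node_is_seed and centrality_value >= threshold
```

With this form, a node surrounded by heavily weighted seed neighbors scores high, while attention paid to non-seed neighbors does not raise its centrality.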
In order to speed up the process of acquiring the core seed node (the node corresponding to the core business object), the centrality of the target node may be determined only according to the attention value of the target node to the effective adjacent node. In one embodiment, the valid neighboring nodes may be determined according to a predetermined attention threshold. In a specific embodiment, the sum of the previous round of attention values of the target node and all of its neighboring nodes may be the sum of the previous round of attention values of the target node and all of its valid neighboring nodes, wherein the previous round of attention values of the target node and the valid neighboring nodes are greater than a predetermined attention threshold.
After the core seed nodes contained in each sub-community graph have been determined, all core seed nodes contained in the business relationship graph can be obtained by combining the core seed nodes of all sub-community graphs, and the corresponding set of core business objects can then be determined. Therefore, in one embodiment, the core business object set may be determined as the union of the core business objects determined from the respective sub-community graphs.
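Combining the per-community results amounts to a set union, sketched below. The function name `merge_core_objects` and the example identifiers are hypothetical.

```python
from typing import Hashable, Iterable, Set


def merge_core_objects(per_community: Iterable[Iterable[Hashable]]) -> Set[Hashable]:
    """Union of the core business objects found in each sub-community graph."""
    merged: Set[Hashable] = set()
    for core_nodes in per_community:
        merged |= set(core_nodes)
    return merged
```

Because the sub-community graphs partition the business relationship graph, a plain union is sufficient and no deduplication across communities beyond the set semantics is needed.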
According to another aspect of the embodiment, a core business object mining device is also provided. Fig. 5 is a block diagram illustrating a mining apparatus for a core business object according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 includes:
a business relationship graph obtaining unit 51, configured to obtain a business relationship graph that includes a plurality of nodes respectively corresponding to a plurality of business objects, and connecting edges established according to the business relationships between the business objects;
a sub-community graph obtaining unit 52, configured to divide the business relationship graph into a plurality of connected sub-community graphs based on a community discovery algorithm, and to determine the seed nodes contained in each sub-community graph;
a core business object determining unit 53, configured to execute multiple rounds of iteration for any sub-community graph, where any round of iteration includes, for any target node: determining the current-round attention values according to the previous-round attention values of the target node to each of its neighboring nodes; updating the centrality of the target node according to the current-round attention values and whether each neighboring node is a seed node; and determining the business object corresponding to the target node to be a core business object when the centrality reaches a preset threshold and the target node is a seed node.
In one embodiment, the business object may be one of an account and a bank card, and the business relationship may be a transaction relationship between the account and the bank card.
In one embodiment, the apparatus may further include a core business object set determining unit, configured to determine a core business object set according to the union of the core business objects determined from the respective sub-community graphs.
In one embodiment, the community discovery algorithm may be one of the Louvain algorithm and the Infomap algorithm.
In one embodiment, the seed nodes may include initial seed nodes and predicted seed nodes, and each sub-community graph includes a number of initial seed nodes;
the sub-community graph obtaining unit may be further configured to determine, according to the initial seed nodes included in each sub-community graph, whether the remaining nodes are predicted seed nodes.
In one embodiment, the sub-community graph obtaining unit may be further configured to:
determine, based on a label propagation algorithm (LPA) and according to the initial seed nodes included in each sub-community graph, whether the remaining nodes are predicted seed nodes.
In one embodiment, the core business object determining unit may be further configured to:
update the centrality of the target node according to the current-round attention values, the weight values of the connecting edges between the target node and each neighboring node, and whether each neighboring node is a seed node.
In one embodiment, the core business object determining unit may be further configured to:
for the target node and any first neighboring node thereof,
determine a first iteration increment according to the ratio of the previous-round attention value of the target node to the first neighboring node to the sum of the previous-round attention values of the target node to all of its neighboring nodes; and
determine the current-round attention value of the target node to the first neighboring node according to the sum of its previous-round attention value and the first iteration increment.
In one embodiment, the sum of the previous-round attention values of the target node to all of its neighboring nodes may include: the sum of the previous-round attention values of the target node to all of its valid neighboring nodes, where the previous-round attention value of the target node to each valid neighboring node is greater than a predetermined attention threshold.
In one embodiment, the core business object determining unit may be further configured to:
determine the current-round attention value of the target node to the first neighboring node according to the sum of its previous-round attention value, the first iteration increment, and a preset attenuation value.
In one embodiment, the core business object determining unit may be further configured to:
before the multiple rounds of iterative updating, set, for the target node, an initial attention value of the target node to each of its neighboring nodes, where the initial value for the target node and any first neighboring node is the logarithm of the sum of the seed label values of all neighboring nodes of the target node; the seed label value indicates whether a neighboring node is a seed node.
Yet another aspect of the present specification provides a computer readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform any of the methods described above.
Yet another aspect of the present specification provides a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements any of the methods described above.
It is to be understood that the terms "first," "second," and the like, herein are used for descriptive purposes only and not for purposes of limitation, to distinguish between similar concepts.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The specific embodiments above further describe the objects, technical solutions and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (24)

1. A method for mining a core business object comprises the following steps:
acquiring a business relation graph which comprises a plurality of nodes respectively corresponding to a plurality of business objects and a connecting edge established according to the business relation among the business objects;
dividing the business relation graph into a plurality of connected sub-community graphs based on a community discovery algorithm, and determining seed nodes contained in each sub-community graph;
for any sub-community graph, executing a plurality of rounds of iteration, wherein any round of iteration comprises, for any target node: determining each current round attention value according to the previous round attention values of the target node and each adjacent node; updating the centrality of the target node according to the attention values of the current round and whether each adjacent node is a seed node; and determining the business object corresponding to the target node as a core business object under the condition that the centrality reaches a preset threshold value and the target node belongs to the seed node.
2. The method of claim 1, wherein the business object is one of an account and a bank card, and the business relationship is a transaction relationship between the account and the bank card.
3. The method of claim 1, further comprising determining a set of core business objects from a sum of core business objects determined by respective sub-community maps.
4. The method of claim 1, wherein the community discovery algorithm is one of a Louvain algorithm and an Infomap algorithm.
5. The method of claim 1, wherein the seed nodes comprise an initial seed node and a predicted seed node, and each of the sub-community graphs comprises a number of initial seed nodes;
the determining the seed nodes contained in each sub-community graph includes: and determining whether the rest nodes are prediction seed nodes or not according to the initial seed nodes included in each sub-community graph.
6. The method of claim 5, wherein determining whether the remaining nodes are predicted seed nodes according to the initial seed nodes included in each sub-community graph comprises:
and determining whether the rest nodes are predicted seed nodes or not according to the initial seed nodes included by each sub-community graph based on a Label Propagation Algorithm (LPA).
7. The method of claim 1, wherein updating the centrality of the target node based on the respective round of attention values and whether each neighboring node is a seed node comprises:
and updating the centrality of the target node according to the attention values of the current round, the weight values of the connecting edges of the target node and the adjacent nodes and whether the adjacent nodes are seed nodes or not.
8. The method of claim 1, wherein determining each current round of attention value based on the previous round of attention values of the target node and each neighboring node comprises:
for the target node and any first neighboring nodes thereof,
determining a first iteration increment according to the proportion of the previous round of attention values of the target node and the first adjacent node to the sum of the previous round of attention values of the target node and all adjacent nodes;
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node and the first iteration increment.
9. The method of claim 8, wherein the sum of the previous round of attention values of the target node and all of its neighboring nodes comprises:
the sum of the previous round of attention values of the target node and all of its valid neighboring nodes, wherein the previous round of attention values of the target node and the valid neighboring nodes is greater than a predetermined attention threshold.
10. The method of claim 8, wherein determining the current round of attention values for the target node and the first neighboring node based on a sum of the last round of attention values for the target node and the first neighboring node and the first iterative increment comprises:
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node, the first iteration increment and a preset attenuation value.
11. The method of claim 1, further comprising:
before the iteration updating of the plurality of rounds, setting an initial value of the attention value of the target node and each adjacent node aiming at the target node, wherein the initial value of the target node and any first adjacent node is the logarithm of the sum of the seed label values of all adjacent nodes of the target node; the seed label value is used to indicate whether the neighboring node is a seed node.
12. An apparatus for mining a core business object, comprising:
a business relation graph obtaining unit configured to obtain a business relation graph including a plurality of nodes respectively corresponding to a plurality of business objects, and a connecting edge established according to the business relation among the business objects;
a sub-community graph obtaining unit configured to divide the business relation graph into a plurality of connected sub-community graphs based on a community discovery algorithm and determine seed nodes contained in each sub-community graph;
a core business object determining unit configured to execute a plurality of rounds of iteration on any sub-community graph, wherein any round of iteration comprises, for any target node: determining each current round attention value according to the previous round attention values of the target node and each adjacent node; updating the centrality of the target node according to the attention values of the current round and whether each adjacent node is a seed node; and determining the business object corresponding to the target node as a core business object under the condition that the centrality reaches a preset threshold value and the target node belongs to the seed node.
13. The apparatus of claim 12, wherein the business object is one of an account and a bank card, and the business relationship is a transaction relationship between the account and the bank card.
14. The apparatus of claim 12, further comprising,
and the core business object set determining unit is configured to determine a core business object set according to the sum of the core business objects determined by the sub-community graphs.
15. The apparatus of claim 12, wherein the community discovery algorithm is one of a Louvain algorithm and an Infomap algorithm.
16. The apparatus of claim 12, wherein the seed nodes comprise an initial seed node and a predicted seed node, and each of the sub-community graphs comprises a number of initial seed nodes;
the sub-community graph acquisition unit is further configured to: and determining whether the rest nodes are prediction seed nodes or not according to the initial seed nodes included in each sub-community graph.
17. The apparatus of claim 16, wherein the sub-community graph obtaining unit is further configured to:
and determining whether the rest nodes are predicted seed nodes or not according to the initial seed nodes included by each sub-community graph based on a Label Propagation Algorithm (LPA).
18. The apparatus of claim 12, wherein the core business object determination unit is further configured to:
and updating the centrality of the target node according to the attention values of the current round, the weight values of the connecting edges of the target node and the adjacent nodes and whether the adjacent nodes are seed nodes or not.
19. The apparatus of claim 12, wherein the core business object determination unit is further configured to:
for the target node and any first neighboring nodes thereof,
determining a first iteration increment according to the proportion of the previous round attention value of the target node and the first adjacent node to the sum of the previous round attention values of the target node and all adjacent nodes;
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node and the first iteration increment.
20. The apparatus of claim 19, wherein the sum of the previous attention values of the target node and all its neighboring nodes comprises:
a sum of the previous round attention values of the target node and all of its valid neighboring nodes, wherein the previous round attention value of the target node and each valid neighboring node is greater than a predetermined attention threshold.
21. The apparatus of claim 19, wherein the core business object determination unit is further configured to:
and determining the attention value of the current round of the target node and the first adjacent node according to the sum of the attention value of the previous round of the target node and the first adjacent node, the first iteration increment and a preset attenuation value.
22. The apparatus of claim 12, wherein the core business object determination unit is further configured to,
before the iteration updating of the plurality of rounds, setting an initial value of the attention value of the target node and each adjacent node aiming at the target node, wherein the initial value of the target node and any first adjacent node is the logarithm of the sum of the seed label values of all adjacent nodes of the target node; the seed label value is used to indicate whether the neighboring node is a seed node.
23. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-11.
24. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-11.
CN202210139199.0A 2022-02-15 2022-02-15 Core business object mining method and device Active CN114547143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210139199.0A CN114547143B (en) 2022-02-15 2022-02-15 Core business object mining method and device

Publications (2)

Publication Number Publication Date
CN114547143A true CN114547143A (en) 2022-05-27
CN114547143B CN114547143B (en) 2024-10-01

Family

ID=81676373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210139199.0A Active CN114547143B (en) 2022-02-15 2022-02-15 Core business object mining method and device

Country Status (1)

Country Link
CN (1) CN114547143B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant