CN112163848B - Role division system oriented to stream network, working method and medium thereof - Google Patents


Info

Publication number
CN112163848B
Authority
CN
China
Prior art keywords
embedding
node
network
transfer
role
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010995079.1A
Other languages
Chinese (zh)
Other versions
CN112163848A (en
Inventor
杜研
王巍
王佰玲
辛国栋
刘扬
黄俊恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weihai Harvey Asset Management Co ltd
Original Assignee
Harbin Institute of Technology Weihai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology Weihai
Priority to CN202010995079.1A
Publication of CN112163848A
Application granted
Publication of CN112163848B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/08 Payment architectures
    • G06Q20/10 Payment architectures specially adapted for electronic funds transfer [EFT] systems; specially adapted for home banking systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/30 Payment architectures, schemes or protocols characterised by the use of specific devices or networks
    • G06Q20/34 Payment architectures, schemes or protocols characterised by the use of specific devices or networks using cards, e.g. integrated circuit [IC] cards or magnetic cards
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Development Economics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Marketing (AREA)
  • Technology Law (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a role division system for flow networks, its working method and a medium. The system comprises a data acquisition module, a directed weighted network acquisition module, an embedding module and a clustering module. The data acquisition module acquires transfer data. The directed weighted network acquisition module represents the transfer data as a directed weighted network. The embedding module first extracts two undirected subgraphs for each node, then applies the GraphWave algorithm to obtain a structural embedding, and finally combines the structural embedding with the node's inflow-outflow difference to obtain the node embedding. The clustering module clusters the node embeddings with an improved self-organizing map neural network to obtain the role division of the nodes. The invention can quickly find the role composition of an economic organization and, combined with experience, identify the roles that are likely to be senior members.

Description

Role division system oriented to stream network, working method and medium thereof
Technical Field
The invention relates to a role division system, its working method and a medium, and in particular to a role division system oriented to flow networks and its working method.
Background
In a network, each node has its own role. Nodes with the same role have similar behavior, function and effect. In an enterprise network, for example, the roles may be manager, group leader and general staff. Roles help to identify the key nodes of an organization or system, to analyze hierarchical structure, and to provide reference information for comparing multiple networks.
Dividing the bank card numbers of an economic organization into roles helps to understand the organization's structure and to find senior and core members quickly, and it does not draw the attention of the investigated organization because only bank data is needed. This is useful when investigating suspected illegal economic organizations such as marketing organizations.
No method of dividing bank card numbers into roles has been disclosed so far. However, since transfer data can be expressed as a flow network, the role division of card numbers can be expressed as the node role division of a flow network. A flow network is a special directed weighted network whose edges represent the flow of energy, material, money, information and the like, and whose weights represent the amount of flow. The flow network considered here is an unbalanced flow network, meaning that the inflow of a node is not necessarily equal to its outflow. This is common in the real world, for example in the fund flow network within an enterprise. Several methods have been proposed at home and abroad for similar problems, but they are not effective on directed weighted transfer networks, because they can only distinguish three or four predefined classes of typical roles, or they ignore the direction and weight of the edges.
The methods that address similar problems are as follows:
Tang Shiqi et al. propose a role discovery algorithm that is essentially a node embedding algorithm: it constructs a feature matrix from the centrality, eigenvectors and eigenvalues of the nodes and then applies non-negative matrix factorization. The algorithm targets generic networks.
Li Wan proposes a role discovery algorithm inspired by the physical concepts of field and potential. It introduces the concept of topology potential in a network and defines an egress topology potential and an ingress topology potential for directed weighted networks. However, the egress (ingress) topology potential is only a summation of influence and is insensitive to network structure, and the algorithm can only separate four typical roles, or fuzzy roles lying between those four. Such roles share a name with the roles studied here but are substantially different.
Zheng Kunlun proposes a role analysis algorithm that finds important nodes based on node properties (centrality, betweenness, PageRank), deletes the important nodes together with the nodes affected only by them, and then repeats the process on the rest of the network. In general it finds important nodes by repeatedly dismantling the network. It does not consider edge weights and is therefore not applicable to weighted networks.
Zhong Xiaoyu proposes a role division method which, like Li Wan's method above, can only separate a fixed set of typical roles, in this case three.
Node embedding (representation learning) is an important step of role division, and various node embedding methods have been proposed abroad. RolX, struc2vec and GraphWave are representation learning methods based on structural roles, in which points with similar structural roles receive nearby vectors. RolX extracts a feature matrix with the ReFeX algorithm and then obtains the embedding vectors by non-negative matrix factorization. struc2vec reconstructs the network from the structural similarity of nodes, connecting similar points with strong edges, and obtains the vector representation of the nodes by running DeepWalk on the new network; it considers only topology and ignores the attributes of edges and points. GraphWave applies diffusion wavelets based on the graph Laplacian and treats the wavelet coefficients as a probability distribution; it performs well on undirected networks and can take edge weights into account. RolX, struc2vec and GraphWave can only handle undirected graphs. DEG is a representation learning method for directed graphs, and LINE and node2vec can be used on both undirected and directed graphs, but they are based on affinity rather than structural similarity. Graph2Gauss embeds nodes using the network structure and node attributes, but the network structure it uses is only distance, so nodes that are close to each other tend to receive similar embedding vectors; Graph2Gauss is therefore unrelated to roles. At present there is no method that computes node embeddings (vector representations of nodes) for a flow network and then clusters the embeddings in the vector space, thereby achieving node role division of the flow network.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a role division system oriented to flow networks.
The invention also provides a working method of the role division system.
The purpose of the invention is to divide a group of bank card numbers into roles, indicating how many roles there are in total and which bank card numbers belong to each role. The role division of the invention does not identify the meaning of each role, such as boss or manager; it only establishes that bank card numbers with the same role have similar transfer behavior, while card numbers with different roles behave differently. Dividing bank card numbers into roles means dividing all input card numbers into several groups such that the card numbers in each group have the same role and the card numbers in different groups have different roles. For example, if a data set containing five bank card numbers a, b, c, d and e is input and the role division system outputs two groups, {a, b, c} and {d, e}, then the card numbers have two roles: a, b and c share one role, and d and e share the other.
Term interpretation:
1. Convergence subgraph: for a point a, the convergence subgraph is an undirected graph whose points are all points of the original graph and whose edges are the edges that may carry material flowing to point a. In mathematical terms, the edge set $E_{gat}(a)$ of the convergence subgraph of point a is generated as follows: every edge $\langle n, a \rangle$ of the original graph that points to a is added to $E_{gat}(a)$, and n is called a first-order upstream neighbor of a; the same is done for each first-order upstream neighbor n of a, i.e. every edge $\langle m, n \rangle$ pointing to n is added to $E_{gat}(a)$, and m is called a second-order upstream neighbor of a; and so on, all edges pointing from a (k+1)-order upstream neighbor of a to a k-order upstream neighbor are added to $E_{gat}(a)$ until no new edge is added. The edges in the convergence subgraph keep the weights of the original graph, and their direction is discarded.
2. Diffusion subgraph: for a point a, the diffusion subgraph is an undirected graph whose points are all points of the original graph and whose edges are the edges that may carry material flowing from point a. In mathematical terms, the edge set $E_{dif}(a)$ of the diffusion subgraph of point a is generated symmetrically to $E_{gat}(a)$: all edges pointing from a k-order downstream neighbor of a to a (k+1)-order downstream neighbor are added to $E_{dif}(a)$ until no new edge is added. The edges in the diffusion subgraph keep the weights of the original graph, and their direction is discarded.
3. PCA: Principal Component Analysis, a well-known dimensionality reduction method.
The technical scheme of the invention is as follows:
A role division system for flow networks comprises a data acquisition module, a directed weighted network acquisition module, an embedding module and a clustering module, connected in sequence;
the data acquisition module is used for acquiring transfer data;
the directed weighted network acquisition module is used for representing the transfer data as a directed weighted network;
the embedding module is used for first extracting two undirected subgraphs for each node, then obtaining a structural embedding with the GraphWave algorithm, and finally combining the structural embedding with the node's inflow-outflow difference to obtain the node embedding;
the clustering module is used for clustering the node embeddings obtained in the previous step with the improved self-organizing map neural network to obtain the role division of the nodes.
The working method of the role division system for flow networks comprises the following steps:
(1) The data acquisition module acquires transfer data; the transfer data are transfers between bank cards, and each transfer record comprises the card number of the transfer-out party, the card number of the transfer-in party, the amount and the time;
(2) The directed weighted network acquisition module represents the transfer data as a directed weighted network;
(3) Node embeddings are obtained through the embedding module: two undirected subgraphs are first extracted for each node, a structural embedding is then obtained with the GraphWave algorithm, and finally the structural embedding is combined with the node's inflow-outflow difference to obtain the node embedding;
(4) Role division is achieved through the clustering module: the node embeddings obtained in the previous step are clustered with the improved self-organizing map neural network to obtain the role division of the nodes.
Preferably, in step (2), representing the transfer data as a directed weighted network means: the card numbers of all transfer-out parties and all transfer-in parties are represented as points in the network; the accumulated transfer amount from a transfer-out card number to a transfer-in card number is represented as a directed edge between the two points, pointing from the transfer-out party to the transfer-in party, with the amount as its weight; this yields the directed weighted network.
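The aggregation in step (2) can be illustrated with a minimal Python sketch. The function name `build_directed_weighted_network`, the tuple layout of a transfer record and the sample records are illustrative assumptions, not part of the patent text.

```python
from collections import defaultdict

def build_directed_weighted_network(transfers):
    """Aggregate transfer records into a directed weighted edge map.

    transfers: iterable of (payer_card, payee_card, amount, time) tuples
    (an assumed record layout). Returns {(payer, payee): accumulated amount}.
    """
    weights = defaultdict(float)
    for payer, payee, amount, _time in transfers:
        # Edge points from the transfer-out party to the transfer-in party;
        # repeated transfers between the same pair are accumulated.
        weights[(payer, payee)] += amount
    return dict(weights)

# Hypothetical records, for illustration only.
transfers = [
    ("a", "b", 100.0, "2020-01-01"),
    ("a", "b", 50.0, "2020-01-02"),
    ("b", "c", 30.0, "2020-01-03"),
]
network = build_directed_weighted_network(transfers)
```

The time field is carried in the record but not used here, since the text only specifies the accumulated amount as the edge weight.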
According to the invention, preferably, in step (3), two undirected subgraphs, namely a convergence subgraph and a diffusion subgraph, are extracted for each node as follows:
$G = (V, E)$ denotes the original graph, i.e. the directed weighted network obtained in step (2); V is the point set, containing every point of the directed weighted network; E is the edge set, containing every edge of the directed weighted network;
the convergence subgraph is obtained as follows:
$G_{gat}(a) = (V, E_{gat}(a))$ is the convergence subgraph of point a; point a is any point in the point set V; $E_{gat}(a)$ is the edge set of $G_{gat}(a)$;
$E_{gat}(a)$ is obtained as follows: every edge $\langle n, a \rangle$ pointing to a is added to $E_{gat}(a)$, and n is called a first-order upstream neighbor of a; the same is done for each first-order upstream neighbor n of a, i.e. every edge $\langle m, n \rangle$ pointing to n is added to $E_{gat}(a)$, and m is called a second-order upstream neighbor of a; and so on, all edges pointing from a (k+1)-order upstream neighbor of a to a k-order upstream neighbor are added to $E_{gat}(a)$ until no new edge is added; the edges in the convergence subgraph keep the weights of the original graph, and their direction is discarded;
the diffusion subgraph is obtained as follows:
$G_{dif}(a) = (V, E_{dif}(a))$ is the diffusion subgraph of point a; $E_{dif}(a)$ is the edge set of $G_{dif}(a)$;
$E_{dif}(a)$ is obtained as follows: every edge $\langle a, n \rangle$ whose source point is a is added to $E_{dif}(a)$, and n is called a first-order downstream neighbor of a; the same is done for each first-order downstream neighbor n of a, i.e. every edge $\langle n, m \rangle$ whose source point is n is added to $E_{dif}(a)$, and m is called a second-order downstream neighbor of a; and so on, all edges pointing from a k-order downstream neighbor of a to a (k+1)-order downstream neighbor are added to $E_{dif}(a)$ until no new edge is added; the edges in the diffusion subgraph keep the weights of the original graph, and their direction is discarded.
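The two extraction procedures amount to a breadth-first walk against or along the edge direction. The sketch below is one possible reading; the function names and the sample graph are hypothetical, and summing the weights of two antiparallel edges that both fall into the same subgraph is an assumption, since the text does not say how such a pair is treated once direction is discarded.

```python
from collections import defaultdict, deque

def _collect_edges(weights, start, upstream):
    """Breadth-first walk from `start`, upstream or downstream, collecting edges."""
    by_endpoint = defaultdict(list)
    for (src, dst), w in weights.items():
        # Index edges by target when walking upstream, by source otherwise.
        by_endpoint[dst if upstream else src].append((src, dst, w))
    edges, seen, queue = {}, {start}, deque([start])
    while queue:
        node = queue.popleft()
        for src, dst, w in by_endpoint[node]:
            key = frozenset((src, dst))            # direction is discarded
            edges[key] = edges.get(key, 0.0) + w   # assumption: antiparallel edges are summed
            nxt = src if upstream else dst
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return edges

def convergence_edges(weights, a):
    """Edge set of the convergence subgraph of point a (undirected, weighted)."""
    return _collect_edges(weights, a, upstream=True)

def diffusion_edges(weights, a):
    """Edge set of the diffusion subgraph of point a (undirected, weighted)."""
    return _collect_edges(weights, a, upstream=False)

# Hypothetical graph: c -> b -> a -> d -> e, plus an unrelated edge x -> y.
weights = {("b", "a"): 1.0, ("c", "b"): 2.0, ("a", "d"): 3.0,
           ("d", "e"): 4.0, ("x", "y"): 5.0}
gat_a = convergence_edges(weights, "a")
dif_a = diffusion_edges(weights, "a")
```

Only the edge sets are returned; the point set of both subgraphs is the full point set V, as stated in the definitions.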
According to the invention, preferably, in step (3), the GraphWave algorithm is used to obtain the structural embedding, which is a vector representing the position of a point in a continuous space of structural roles, as follows:
$G_{gat}(a)$ is processed with the GraphWave algorithm to obtain the structural embedding $X_a$ of point a in the convergence subgraph; $G_{dif}(a)$ is processed with the GraphWave algorithm to obtain the structural embedding $Y_a$ of point a in the diffusion subgraph; the complete structural embedding of point a is then $(X_a, Y_a)$.
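For orientation, the following is a compact NumPy sketch of the GraphWave idea as it is commonly described: heat-kernel wavelets on the graph Laplacian, with each node embedded by sampling the empirical characteristic function of its wavelet coefficients. It is not the reference implementation; the function name, the scale value, the sample points and the ordering of real and imaginary parts are all assumptions made for illustration.

```python
import numpy as np

def graphwave_embedding(adj, scale=1.0, sample_points=None):
    """Sketch of a GraphWave-style embedding for an undirected weighted graph.

    adj: symmetric (n, n) weighted adjacency matrix.
    Returns an (n, 2 * d) array, d being the number of sample points.
    """
    if sample_points is None:
        sample_points = np.linspace(0.0, 100.0, 8)   # assumed sampling grid
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Heat-kernel wavelets: column a holds the wavelet centred at node a.
    wavelets = eigvecs @ np.diag(np.exp(-scale * eigvals)) @ eigvecs.T
    # Empirical characteristic function of each column, sampled at each t.
    phase = sample_points[:, None, None] * wavelets[None, :, :]   # (d, n, n)
    char_fn = np.exp(1j * phase).mean(axis=1)                     # (d, n)
    return np.concatenate([char_fn.real.T, char_fn.imag.T], axis=1)

# Path graph 0 - 1 - 2: the two end nodes play the same structural role.
path = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])
emb = graphwave_embedding(path)
```

Under this sketch, $X_a$ would be row a of the embedding computed on the adjacency matrix of $G_{gat}(a)$, and $Y_a$ row a of the embedding computed on $G_{dif}(a)$.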
According to the invention, preferably, in step (3), the node embedding is obtained by combining the structural embedding with the node's inflow-outflow difference, as follows:
the inflow-outflow difference of a node is important information about the node in a flow network. The weight of the directed edge $\langle i, j \rangle$ is denoted $w_{ij}$, where i and j are the card numbers of the transfer-out party and the transfer-in party respectively, and $w_{ij}$ is the accumulated transfer amount from i to j. The inflow of point a is then $f_{in} = \sum_i w_{ia}$, the outflow is $f_{out} = \sum_i w_{ai}$, and the inflow-outflow difference is $d = f_{in} - f_{out}$.
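A short Python sketch of the inflow-outflow difference, assuming the network is held as a dictionary from directed edge (i, j) to accumulated weight $w_{ij}$; the function name and the sample weights are hypothetical.

```python
def flow_difference(weights, node):
    """Return d = f_in - f_out for `node`, given {(src, dst): weight}."""
    f_in = sum(w for (src, dst), w in weights.items() if dst == node)
    f_out = sum(w for (src, dst), w in weights.items() if src == node)
    return f_in - f_out

# Hypothetical accumulated transfers: b -> a, c -> a, a -> d.
weights = {("b", "a"): 100.0, ("c", "a"): 50.0, ("a", "d"): 120.0}
d_a = flow_difference(weights, "a")
```

A positive value means the card number receives more than it sends; a pure sender such as b in this example gets a negative value.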
To obtain a complete vector representation of a node, the inflow-outflow difference and the structural embedding need to be combined. However, the structural embedding is typically a vector of more than 16 dimensions (depending on the parameters), and if the inflow-outflow difference were concatenated to it directly, the influence of the difference would be very weak. The structural embedding is therefore first reduced to 2 dimensions and then concatenated with the inflow-outflow difference.
The structural embedding is reduced to 2 dimensions;
further preferably, the structural embedding is reduced to 2 dimensions by PCA;
it should be noted that, because the flow difference in effect serves as the weight of the node, the inflow-outflow difference may be replaced by any other value that can serve as the weight of the node;
the 2-dimensional structural embedding and the inflow-outflow difference are concatenated, so that each point obtains a 3-dimensional vector representation, namely its node embedding.
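The reduction and concatenation can be sketched with plain NumPy, using a singular value decomposition to perform PCA. The function names are hypothetical, and the structural embeddings below are random placeholders standing in for real GraphWave output, not data from the patent.

```python
import numpy as np

def pca_reduce(matrix, n_components=2):
    """Project the rows of `matrix` onto the top principal components."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def node_embeddings(structural, flow_diff):
    """Concatenate the 2-D reduced structural embedding with the flow difference."""
    reduced = pca_reduce(np.asarray(structural, dtype=float), 2)
    return np.column_stack([reduced, np.asarray(flow_diff, dtype=float)])

# Placeholder inputs: 5 nodes, 32-dimensional structural embeddings (X_a, Y_a).
rng = np.random.default_rng(0)
structural = rng.normal(size=(5, 32))
flow_diff = np.array([30.0, -100.0, -50.0, 120.0, 0.0])
embeddings = node_embeddings(structural, flow_diff)
```

The text does not say whether the flow difference is rescaled before concatenation, so the sketch leaves it in its original units.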
According to the invention, preferably, in step (4), the node embeddings obtained in the previous step are clustered with the improved self-organizing map neural network (SOM) to obtain the role division of the nodes, as follows:
the improved SOM comprises an input layer and a competition layer; the competition layer is fully connected to the input layer, and the number of input-layer neurons equals the dimension of a single input; the input-layer neurons accept inputs, and the competition-layer neurons compete with each other.
The weights $[w_1, w_2, w_3, \ldots]$ of the improved SOM are the positions of the competition-layer neurons; if the input layer has two neurons, the competition-layer neurons are arranged in a two-dimensional plane.
A. Dense competition-layer neurons are initialized;
B. Each time a stimulus p is input (the neural network calls an input a stimulus; here the stimulus p is a node embedding obtained in step (3)), the neuron i nearest to p, i.e. the neuron with the smallest $\|w_i - p\|_2$, is selected from the competition layer as the winning neuron;
C. The winning neuron and its neighborhood $N(i)$ are adjusted so that they move closer to p, where the neighborhood $N(i)$ consists of the neurons located in a region near i; with the learning rate denoted by $\alpha$, the positions are adjusted according to formula (I):
$w_j' = w_j + \alpha (p - w_j), \quad j \in N(i)$ (I)
in formula (I), j is a neuron in the neighborhood, $w_j$ is the position of that neuron, and $w_j'$ is its new position after adjustment;
after repeated training, the neurons move into the clusters;
D. Two stimuli that belong to the same cluster correspond to the same winning neuron, and that winning neuron is the cluster center;
a significant advantage of the SOM is that, when used as a clustering algorithm, it does not require the number of clusters to be specified, which is exactly what is needed here. However, several cluster centers often appear within one natural cluster. To solve this problem, the concept of resolution is introduced: resolution means that the weights of the competition layer can only take discrete values. In a discrete vector space, neurons that move toward several inputs lying very close together may end up at the same position, which reduces the number of cluster centers and improves the integrity of the clusters. How should an appropriate resolution be chosen? The optimal resolution clearly varies from data set to data set, which introduces a new problem. The invention handles this by training at a relatively high resolution and then merging neighboring cluster centers in step E.
E. All cluster centers are checked, and two cluster centers are merged if they are neighbors or if the neighbor positions of one overlap those of the other; the resulting cluster centers are the role division result.
Further preferably, initializing dense competition-layer neurons means placing 4 neurons per unit area. That is, the neurons are distributed uniformly on the two-dimensional plane like the crossing points of a square grid whose side length is 0.5, which gives 4 neurons per unit area.
According to the invention, preferably, in step C a relatively high resolution of 0.2 is used for training. This corresponds to drawing a square grid with side 0.2 on the two-dimensional plane and moving each neuron to the nearest grid intersection after every position adjustment; for example, a neuron at (0.12, 1.27) moves to (0.2, 1.2).
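Steps A to E can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the neighborhood $N(i)$ is taken to be all neurons within a fixed `radius` of the winner, the merge test of step E is taken to be a Chebyshev distance of at most `merge_distance` grid cells, and the learning rate, epoch count, function names and sample stimuli are all hypothetical.

```python
import numpy as np

def init_neurons(low, high, spacing=0.5):
    """Step A: neurons on the crossing points of a square grid (side 0.5)."""
    axes = [np.arange(lo, hi + spacing / 2, spacing) for lo, hi in zip(low, high)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)

def snap(positions, resolution):
    """Move positions to the nearest intersection of the resolution grid."""
    return np.round(positions / resolution) * resolution

def train_som(stimuli, neurons, alpha=0.5, radius=0.5, resolution=0.2, epochs=10):
    """Steps B and C: pick the winner, pull its neighborhood toward p, snap."""
    neurons = neurons.astype(float).copy()
    for _ in range(epochs):
        for p in stimuli:
            winner = np.argmin(np.linalg.norm(neurons - p, axis=1))
            # Assumed neighborhood N(i): neurons within `radius` of the winner.
            near = np.linalg.norm(neurons - neurons[winner], axis=1) <= radius
            moved = neurons[near] + alpha * (p - neurons[near])   # formula (I)
            neurons[near] = snap(moved, resolution)
    return neurons

def divide_roles(stimuli, neurons, resolution=0.2, merge_distance=2):
    """Steps D and E: winners are cluster centers; nearby centers are merged."""
    cells = []
    for p in stimuli:
        winner = neurons[np.argmin(np.linalg.norm(neurons - p, axis=1))]
        cells.append(tuple(int(c) for c in np.round(winner / resolution)))
    centers = sorted(set(cells))
    parent = {c: c for c in centers}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for i, a in enumerate(centers):
        for b in centers[i + 1:]:
            # Assumed merge test: Chebyshev distance measured in grid cells.
            if max(abs(x - y) for x, y in zip(a, b)) <= merge_distance:
                parent[find(b)] = find(a)
    roots = sorted({find(c) for c in centers})
    return [roots.index(find(c)) for c in cells]

# Hypothetical 2-D stimuli forming two tight groups.
stimuli = np.array([[0.0, 0.0], [0.05, 0.02], [-0.03, 0.04],
                    [3.0, 3.0], [3.04, 2.97], [2.96, 3.02]])
neurons = train_som(stimuli, init_neurons((-1, -1), (4, 4)))
roles = divide_roles(stimuli, neurons)
```

The number of roles is not passed in anywhere; it falls out of how many merged cluster centers remain, which is the property the text emphasizes.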
A computer-readable storage medium stores a program of the working method of the role division system for flow networks; when the program is executed by a processor, the steps of the working method of the role division system for flow networks are implemented.
The beneficial effects of the invention are as follows:
1. The role division method provided by the invention can quickly find the role composition of an economic organization and, combined with experience, identify the roles that are likely to be senior members. If the investigated organization is illegal, the controllers of its key bank accounts can be investigated earlier, which improves case-handling efficiency, reduces the loss of public property and maintains economic order.
2. By dividing the roles of several economic organizations, the invention makes it possible to compare their role compositions, which helps to identify special organizations whose role composition differs from that of ordinary organizations.
3. The scope of application covers flow networks in general; application prospects include role division of the members of illegal economic organizations, role division of transportation hubs, identification of abnormal organizations based on role composition, and the like. A flow network is a special directed weighted network whose edges represent the flow of energy, material, money, information and the like, and whose weights represent the amount of flow.
Drawings
FIG. 1 is a block diagram of the role division system for flow networks;
FIG. 2 is a schematic diagram of the working method of the role division system for flow networks;
FIG. 3 (a) is a schematic diagram of the original graph;
FIG. 3 (b) is a schematic diagram of the convergence subgraph of point A extracted from FIG. 3 (a);
FIG. 3 (c) is a schematic diagram of the diffusion subgraph of point A extracted from FIG. 3 (a).
Detailed Description
The invention is further described below with reference to the drawings and examples, but is not limited to them.
Example 1
A role division system for flow networks, as shown in FIG. 1, comprises a data acquisition module, a directed weighted network acquisition module, an embedding module and a clustering module, connected in sequence. The data acquisition module is used for acquiring transfer data. The directed weighted network acquisition module is used for representing the transfer data as a directed weighted network. The embedding module is used for first extracting two undirected subgraphs for each node, then obtaining a structural embedding with the GraphWave algorithm, and finally combining the structural embedding with the node's inflow-outflow difference to obtain the node embedding. The clustering module is used for clustering the node embeddings obtained in the previous step with the improved self-organizing map neural network to obtain the role division of the nodes.
Example 2
The working method of the role division system for flow networks of Example 1, as shown in FIG. 2, comprises the following steps:
(1) The data acquisition module acquires transfer data; the transfer data are transfers between bank cards, and each transfer record comprises the card number of the transfer-out party, the card number of the transfer-in party, the amount and the time;
(2) The directed weighted network acquisition module represents the transfer data as a directed weighted network, as follows: the card numbers of all transfer-out parties and all transfer-in parties are represented as points in the network; the accumulated transfer amount from a transfer-out card number to a transfer-in card number is represented as a directed edge between the two points, pointing from the transfer-out party to the transfer-in party, with the amount as its weight; this yields the directed weighted network.
(3) Node embeddings are obtained through the embedding module: two undirected subgraphs are first extracted for each node, a structural embedding is then obtained with the GraphWave algorithm, and finally the structural embedding is combined with the node's inflow-outflow difference to obtain the node embedding;
(4) Role division is achieved through the clustering module: the node embeddings obtained in the previous step are clustered with the improved self-organizing map neural network to obtain the role division of the nodes.
In step (3), two undirected subgraphs, namely a convergence subgraph and a diffusion subgraph, are extracted for each node as follows:
$G = (V, E)$ denotes the original graph, as shown in FIG. 3 (a), i.e. the directed weighted network obtained in step (2); V is the point set, containing every point of the directed weighted network, here A, B, C, D, E, F, G, H, I, J; E is the edge set, containing every edge of the directed weighted network;
as shown in FIG. 3 (b), the convergence subgraph of point A is extracted as follows:
$G_{gat}(a) = (V, E_{gat}(a))$ is the convergence subgraph of point a; point a is any point in the point set V; $E_{gat}(a)$ is the edge set of $G_{gat}(a)$;
$E_{gat}(a)$ is obtained as follows: every edge $\langle n, a \rangle$ pointing to a is added to $E_{gat}(a)$, and n is called a first-order upstream neighbor of a; the same is done for each first-order upstream neighbor n of a, i.e. every edge $\langle m, n \rangle$ pointing to n is added to $E_{gat}(a)$, and m is called a second-order upstream neighbor of a; and so on, all edges pointing from a (k+1)-order upstream neighbor of a to a k-order upstream neighbor are added to $E_{gat}(a)$ until no new edge is added; the edges in the convergence subgraph keep the weights of the original graph, and their direction is discarded;
as shown in FIG. 3 (c), the diffusion subgraph of point A is extracted as follows:
$G_{dif}(a) = (V, E_{dif}(a))$ is the diffusion subgraph of point a; $E_{dif}(a)$ is the edge set of $G_{dif}(a)$;
$E_{dif}(a)$ is obtained as follows: every edge $\langle a, n \rangle$ whose source point is a is added to $E_{dif}(a)$, and n is called a first-order downstream neighbor of a; the same is done for each first-order downstream neighbor n of a, i.e. every edge $\langle n, m \rangle$ whose source point is n is added to $E_{dif}(a)$, and m is called a second-order downstream neighbor of a; and so on, all edges pointing from a k-order downstream neighbor of a to a (k+1)-order downstream neighbor are added to $E_{dif}(a)$ until no new edge is added; the edges in the diffusion subgraph keep the weights of the original graph, and their direction is discarded.
In step (3), the GraphWave algorithm is used to obtain the structural embedding, which is a vector representing the position of a point in the continuous space of structural roles, comprising the following steps:
G_gat(a) is processed with the GraphWave algorithm to obtain the structural embedding X_a of point a in the convergence subgraph; G_dif(a) is processed with the GraphWave algorithm to obtain the structural embedding Y_a of point a in the diffusion subgraph; the complete structural embedding of point a is then (X_a, Y_a).
In step (3), the node embedding is obtained by integrating the structural embedding and the in-out flow difference of the node, comprising the following steps:
the in-out flow difference of a node is important information for nodes in a stream network. Denote the weight of the directed edge <i, j> as w_ij, where i and j are the card number of the transfer-out party and the card number of the transfer-in party respectively, and w_ij is the cumulative amount transferred from i to j; then the inflow of point a is f_in = Σ_i w_ia, the outflow is f_out = Σ_i w_ai, and the in-out flow difference is d = f_in - f_out.
To obtain a complete vector representation of a node, the in-out flow difference and the structural embedding need to be integrated. However, the structural embedding is typically a vector of more than 16 dimensions (depending on the parameters), and if the in-out flow difference were concatenated to it directly, the effect of the flow difference would be very weak. The structural embedding is therefore first reduced to 2 dimensions and then concatenated with the in-out flow difference.
The structural embedding is reduced to 2 dimensions by PCA;
it should be noted that, because the flow difference actually serves as the weight of the node, the in-out flow difference may be replaced by any other value that can serve as the weight of the node;
the reduced structural embedding and the in-out flow difference are concatenated to give a 3-dimensional vector, so that each point obtains a 3-dimensional vector representation, namely its node embedding.
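The flow difference and the final concatenation can be illustrated with a short sketch. It assumes the same hypothetical edge layout {(source, target): weight} and takes the 2-dimensional reduced structural embedding as an input that has already been computed (for instance by PCA); the function names flow_difference and node_embedding and all sample values are made up for illustration.

```python
def flow_difference(edges, a):
    """Return d = f_in - f_out for point a.

    edges: dict mapping (source, target) -> cumulative transfer amount
    (assumed layout).
    """
    f_in = sum(w for (src, dst), w in edges.items() if dst == a)   # sum of w_ia
    f_out = sum(w for (src, dst), w in edges.items() if src == a)  # sum of w_ai
    return f_in - f_out


def node_embedding(reduced_2d, d):
    """Concatenate the 2-D reduced structural embedding with d into a 3-D vector."""
    x1, x2 = reduced_2d
    return (x1, x2, d)


# Hypothetical example: A receives 5 and 2, and sends 3.
transfers = {("B", "A"): 5, ("C", "A"): 2, ("A", "D"): 3}
d_a = flow_difference(transfers, "A")
emb_a = node_embedding((0.25, -0.5), d_a)  # (0.25, -0.5) is a made-up reduced embedding
```

Because d is one of only three coordinates, its scale relative to the two reduced coordinates affects the later clustering; the text does not state whether any rescaling is applied, so none is shown here.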
In step (4), the node embeddings obtained in the previous step are clustered using an improved self-organizing map (SOM) neural network to obtain the role division of the nodes, comprising the following steps:
the improved SOM comprises an input layer and a competitive layer, the competitive layer being fully connected to the input layer, and the number of input-layer neurons being equal to the dimension of a single input; the input-layer neurons accept the inputs, and the competitive-layer neurons compete with each other.
The weights [w_1, w_2, w_3, ...] of the improved SOM are the positions of the competitive-layer neurons; if the input layer has two neurons, the competitive-layer neurons are arranged in a two-dimensional plane.
A. Dense competitive-layer neurons are initialized;
B. each time a stimulus p is input (the neural network calls an input a stimulus; here the stimulus p is a node embedding obtained in step (3)), the neuron i nearest to it in the competitive layer, i.e. the neuron with the smallest ||w_i - p||_2, is selected as the winning neuron;
C. the positions of the winning neuron and of its neighborhood N(i), i.e. the neurons in a region near i, are adjusted so that these neurons move closer to p; with the learning rate denoted by α, the adjustment is given by formula (I):
w_j' = w_j + α(p - w_j), j ∈ N(i) (I)
in formula (I), j is a neuron in the neighborhood, w_j is the position of that neuron, and w_j' is its new position after adjustment;
after repeated training, the neurons move into clusters;
D. two stimuli belonging to the same cluster correspond to the same winning neuron, and the cluster center is that winning neuron;
a significant advantage of the SOM is that, when it is used as a clustering algorithm, the number of clusters does not need to be specified, which is what is needed here. However, multiple cluster centers often appear within one natural cluster. To solve this problem, the concept of resolution is introduced: resolution means that the weights of the competitive layer can only take discrete values. In a discrete vector space, neurons moving toward several inputs that are very close together may end up at the same position, which reduces the number of cluster centers and improves integrity. How should an appropriate resolution be selected? Clearly, the optimal resolution varies from dataset to dataset, which introduces a new problem.
E. All cluster centers are checked, and if two cluster centers are neighbors, or the neighbor positions of one cluster center and the other cluster center overlap, the two cluster centers are merged; the resulting cluster centers are the role division result.
Initializing dense competitive-layer neurons means placing 4 neurons per unit area: the neurons are distributed uniformly on the two-dimensional plane like the intersection points of a square grid whose squares have side length 0.5, so that there are 4 neurons per unit area.
In step C, a higher resolution of 0.2 is used for training. This corresponds to drawing a square grid with side length 0.2 on the two-dimensional plane and moving each neuron to the nearest grid intersection after every position adjustment; for example, a neuron at (0.12, 1.27) moves to (0.2, 1.2).
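A grid initialization of this kind might look as follows. This is a sketch only: the function name init_grid and the bounding ranges are assumptions, since the text does not state how far the initial grid extends. The function is written for any number of dimensions; with spacing 0.5 on a two-dimensional plane it gives the density of 4 neurons per unit area described above.

```python
from itertools import product


def init_grid(bounds, spacing=0.5):
    """Place competitive-layer neurons on the intersections of a regular grid.

    bounds: list of (low, high) ranges, one per input dimension (assumed input).
    spacing: side length of a grid cell; 0.5 is the value given in the text.
    """
    axes = []
    for low, high in bounds:
        steps = int(round((high - low) / spacing))
        axes.append([low + k * spacing for k in range(steps + 1)])
    # Every combination of per-axis coordinates is one neuron position.
    return [tuple(point) for point in product(*axes)]


# Hypothetical 2 x 2 region of the plane: 5 grid lines per axis, 25 neurons.
neurons = init_grid([(0.0, 2.0), (0.0, 2.0)])
```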
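One training step with discrete positions (steps B and C, formula (I), followed by snapping to the resolution grid) can be sketched as below. Several details are assumptions made only for illustration: the neighborhood N(i) is taken to be all neurons within a fixed distance radius of the winner, and the values of alpha and radius are arbitrary; the text fixes only the resolution of 0.2.

```python
import math


def snap(position, resolution=0.2):
    """Move a position to the nearest intersection of a grid with the given side."""
    # The outer round() only removes floating-point noise such as 1.2000000000000002.
    return tuple(round(round(x / resolution) * resolution, 10) for x in position)


def train_step(neurons, p, alpha=0.5, radius=0.6, resolution=0.2):
    """Apply one stimulus p; return (index of winning neuron, new neuron positions)."""
    # Step B: the winner is the neuron with the smallest squared distance to p.
    winner = min(
        range(len(neurons)),
        key=lambda i: sum((w - x) ** 2 for w, x in zip(neurons[i], p)),
    )
    centre = neurons[winner]
    updated = []
    for w in neurons:
        if math.dist(w, centre) <= radius:  # assumed definition of N(i)
            # Formula (I): w_j' = w_j + alpha * (p - w_j)
            moved = tuple(wj + alpha * (x - wj) for wj, x in zip(w, p))
            updated.append(snap(moved, resolution))
        else:
            updated.append(w)
    return winner, updated


snapped = snap((0.12, 1.27))  # the example position given in the text
win, new_positions = train_step([(0.0, 0.0), (0.5, 0.0), (2.0, 2.0)], (0.12, 0.0))
```

Since snapped neurons can land on the same intersection, several neurons drawn toward nearby stimuli may coincide, which is the effect the resolution concept relies on to reduce the number of cluster centers.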
Example 3
A computer-readable storage medium storing a program of the working method of the role division system oriented to a stream network according to embodiment 2, wherein the program, when executed by a processor, implements the steps of the working method of the role division system oriented to a stream network according to embodiment 2.

Claims (8)

1. A working method of a role division system oriented to a stream network, characterized in that the role division system comprises a data acquisition module, a directed weighted network acquisition module, an embedding module and a clustering module which are connected in sequence;
the data acquisition module is used for: acquiring transfer data; the directed weighted network acquisition module is used for: representing the transfer data as a directed weighted network; the embedding module is used for: first extracting two undirected subgraphs for each node, then obtaining the structural embedding with the GraphWave algorithm, and finally integrating the structural embedding and the in-out flow difference of the node to obtain the node embedding; the clustering module is used for: clustering the node embeddings obtained in the previous step with an improved self-organizing map neural network to obtain the role division of the nodes; the method comprises the following steps:
(1) The data acquisition module acquires transfer data; the transfer data are transfer records between bank cards, and each transfer record comprises the card number of the transfer-out party, the card number of the transfer-in party, the amount and the time;
(2) The directed weighted network acquisition module represents the transfer data as a directed weighted network;
(3) Node embedding is obtained through the embedding module: first two undirected subgraphs are extracted for each node, then the structural embedding is obtained with the GraphWave algorithm, and finally the structural embedding and the in-out flow difference of the node are integrated to obtain the node embedding;
(4) Role division is achieved through the clustering module: the node embeddings obtained in the previous step are clustered with an improved self-organizing map neural network to obtain the role division of the nodes;
in step (3), two undirected subgraphs, namely a convergence subgraph and a diffusion subgraph, are extracted for each node, comprising the following steps:
G = (V, E) denotes the original graph, i.e. the directed weighted network obtained in step (2); V is the set of points, containing every point in the directed weighted network; E is the set of edges, containing every edge in the directed weighted network;
the convergence subgraph is obtained as follows:
G_gat(a) = (V, E_gat(a)) is the convergence subgraph of point a; point a is any point in the point set V; E_gat(a) is the edge set of G_gat(a);
E_gat(a) is obtained as follows: each edge <n, a> pointing to a is added to E_gat(a), and n is called a first-order upstream neighbor of a; the same processing is applied to each first-order upstream neighbor n of a, i.e. each edge <m, n> pointing to n is added to E_gat(a), and m is called a second-order upstream neighbor of a; and so on, all edges pointing from a (k+1)-order upstream neighbor of a to a k-order upstream neighbor are added to E_gat(a) until no new edge is added; the edges in the convergence subgraph keep the weights of the original graph, and their direction is discarded;
the diffusion subgraph is obtained as follows:
G_dif(a) = (V, E_dif(a)) is the diffusion subgraph of point a; E_dif(a) is the edge set of G_dif(a);
E_dif(a) is obtained as follows: each edge <a, n> whose source point is a is added to E_dif(a), and n is called a first-order downstream neighbor of a; the same processing is applied to each first-order downstream neighbor n of a, i.e. each edge <n, m> whose source point is n is added to E_dif(a), and m is called a second-order downstream neighbor of a; and so on, all edges pointing from a k-order downstream neighbor of a to a (k+1)-order downstream neighbor are added to E_dif(a) until no new edge is added; the edges in the diffusion subgraph keep the weights of the original graph, and their direction is discarded.
2. The method of claim 1, wherein in step (2), representing the transfer data as a directed weighted network means: the card numbers of all transfer-out parties and the card numbers of all transfer-in parties are represented as points in the directed weighted network, and the cumulative transfer amount between a transfer-out card number and a transfer-in card number is represented as a directed edge between the two points, pointing from the transfer-out party to the transfer-in party, with the amount as its weight, thereby obtaining the directed weighted network.
3. The method for operating a role-partitioning system for a streaming network according to claim 1, wherein in step (3), the GraphWave algorithm is used to obtain the structural embedding, which is a vector representing the position of a point in the continuous space of structural roles, comprising the following steps:
G_gat(a) is processed with the GraphWave algorithm to obtain the structural embedding X_a of point a in the convergence subgraph; G_dif(a) is processed with the GraphWave algorithm to obtain the structural embedding Y_a of point a in the diffusion subgraph; the complete structural embedding of point a is then (X_a, Y_a).
4. The method for operating a role-partitioning system for a streaming network according to claim 1, wherein in step (3), the node embedding is obtained by integrating the structural embedding and the in-out flow difference of the node, comprising the following steps:
the weight of the directed edge <i, j> is denoted w_ij, where i and j are the card number of the transfer-out party and the card number of the transfer-in party respectively, and w_ij is the cumulative amount transferred from i to j; then the inflow of point a is f_in = Σ_i w_ia, the outflow is f_out = Σ_i w_ai, and the in-out flow difference is d = f_in - f_out;
the structural embedding is reduced to 2 dimensions;
the reduced structural embedding and the in-out flow difference are concatenated to give a 3-dimensional vector, so that each point obtains a 3-dimensional vector representation, namely its node embedding.
5. The method of claim 4, wherein the structural embedding is reduced to 2 dimensions by PCA.
6. The method for operating a role-partitioning system for a streaming network according to any one of claims 1 to 5, wherein in step (4), the node embeddings obtained in the previous step are clustered with an improved self-organizing map neural network (SOM) to obtain the role division of the nodes, comprising the following steps:
the improved SOM comprises an input layer and a competitive layer, the competitive layer being fully connected to the input layer, and the number of input-layer neurons being equal to the dimension of a single input;
the weights [w_1, w_2, w_3, ...] of the improved SOM are the positions of the competitive-layer neurons;
A. dense competitive-layer neurons are initialized;
B. each time a stimulus p is input, the stimulus p being a node embedding obtained in step (3), the neuron i nearest to it in the competitive layer, i.e. the neuron with the smallest ||w_i - p||_2, is selected as the winning neuron;
C. the positions of the winning neuron and of its neighborhood N(i), i.e. the neurons in a region near i, are adjusted so that these neurons move closer to p; with the learning rate denoted by α, the adjustment is given by formula (I):
w_j' = w_j + α(p - w_j), j ∈ N(i) (I)
in formula (I), j is a neuron in the neighborhood, w_j is the position of that neuron, and w_j' is its new position after adjustment;
after repeated training, the neurons move into clusters;
D. two stimuli belonging to the same cluster correspond to the same winning neuron, and the cluster center is that winning neuron;
E. all cluster centers are checked, and if two cluster centers are neighbors, or the neighbor positions of one cluster center and the other cluster center overlap, the two cluster centers are merged; the resulting cluster centers are the role division result.
7. The method of claim 6, wherein initializing dense competitive layer neurons means: 4 neurons per unit area.
8. A computer-readable storage medium storing a program of the working method of the role division system oriented to a stream network according to any one of claims 1 to 7, wherein the program, when executed by a processor, implements the steps of the working method of the role division system oriented to a stream network according to any one of claims 1 to 7.
CN202010995079.1A 2020-09-21 2020-09-21 Role division system oriented to stream network, working method and medium thereof Active CN112163848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010995079.1A CN112163848B (en) 2020-09-21 2020-09-21 Role division system oriented to stream network, working method and medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010995079.1A CN112163848B (en) 2020-09-21 2020-09-21 Role division system oriented to stream network, working method and medium thereof

Publications (2)

Publication Number Publication Date
CN112163848A CN112163848A (en) 2021-01-01
CN112163848B true CN112163848B (en) 2023-05-12

Family

ID=73863075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010995079.1A Active CN112163848B (en) 2020-09-21 2020-09-21 Role division system oriented to stream network, working method and medium thereof

Country Status (1)

Country Link
CN (1) CN112163848B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116799A (en) * 2021-11-23 2022-03-01 河北航天信息技术有限公司 Abnormal transaction loop identification method, device, terminal and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111211994A (en) * 2019-11-28 2020-05-29 南京邮电大学 Network traffic classification method based on SOM and K-means fusion algorithm
CN111241289A (en) * 2020-01-17 2020-06-05 北京工业大学 SOM algorithm based on graph theory

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633263A (en) * 2017-08-30 2018-01-26 清华大学 Network embedding grammar based on side
CN109710754A (en) * 2018-11-12 2019-05-03 中国科学院信息工程研究所 A kind of group abnormality behavioral value method based on depth structure study
CN110032606B (en) * 2019-03-29 2021-05-14 创新先进技术有限公司 Sample clustering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240923

Address after: 264200 No. 2, Wenhua West Road, Shandong, Weihai

Patentee after: Weihai Harvey Asset Management Co.,Ltd.

Country or region after: China

Address before: No.2, Wenhua West Road, Huancui District, Weihai City, Shandong Province

Patentee before: HARBIN INSTITUTE OF TECHNOLOGY (WEIHAI)

Country or region before: China