CN112163848B

CN112163848B - Flow network-oriented role division system, working method and medium thereof

Info

Publication number: CN112163848B
Application number: CN202010995079.1A
Authority: CN
Inventors: 杜研; 王巍; 王佰玲; 辛国栋; 刘扬; 黄俊恒
Original assignee: Harbin Institute of Technology Weihai
Current assignee: Qingdao Hawei Industrial Technology Co ltd
Priority date: 2020-09-21
Filing date: 2020-09-21
Publication date: 2023-05-12
Anticipated expiration: 2040-09-21
Also published as: CN112163848A

Abstract

The invention relates to a role division system, its working method, and a medium. The system comprises a data acquisition module, a directed weighted network acquisition module, an embedding module, and a clustering module. The data acquisition module acquires transfer data. The directed weighted network acquisition module represents the transfer data as a directed weighted network. The embedding module first extracts two undirected subgraphs for each node, then obtains structural embeddings with the GraphWave algorithm, and finally combines the structural embedding with the node's in-out flow difference to obtain the node embedding. The clustering module clusters the node embeddings with an improved self-organizing map neural network to obtain the role division of the nodes. The invention can quickly discover the role composition of an economic organization and, combined with experience, identify the roles that are likely to be senior members.

Description

A flow network-oriented role division system, its working method, and medium

Technical field

The invention relates to a role division system, its working method, and a medium, and in particular to a flow network-oriented role division system and its working method.

Background

In a network, every node has its own role. Nodes with the same role have similar behaviors and functions. In an enterprise network, for example, the roles may be manager, team leader, and ordinary employee. Roles help to identify the key nodes of an organization or system, to analyze its hierarchical structure, and to provide reference information for comparing multiple networks.

Dividing the bank card numbers of an economic organization into roles helps to understand its organizational structure and to find senior and core members quickly. Because only bank data is required, the investigated organization is not alerted. This is very useful when investigating suspected illegal economic organizations such as pyramid schemes.

At present there is no published method for dividing bank card numbers into roles. However, because transfer data can be represented as a flow network, the task can be restated as node role division on a flow network. A flow network is a special directed weighted network whose edges represent flows of energy, matter, money, information, and so on, and whose edge weights represent the flow volume. The flow networks considered here are unbalanced, meaning that a node's inflow need not equal its outflow. Such networks are common in the real world, for example the money flow network inside an enterprise. Several methods have been proposed in China and abroad for related problems, but because they either can only distinguish three or four predefined typical roles or ignore edge direction and weight, they perform poorly on directed weighted transfer networks.

The methods that address related problems are as follows:

Tang Shiqi et al. proposed a role discovery algorithm that is essentially a node embedding algorithm. It builds a feature matrix from each node's degree centrality, eigenvectors, and eigenvalues, and then applies non-negative matrix factorization. It targets general networks.

Li Wanyu proposed a role discovery algorithm that, inspired by the physical concepts of field and potential, introduces topological potential into networks and, for directed weighted networks, defines an out-topological potential and an in-topological potential. However, the out- (in-) topological potential is only a sum of influences and is insensitive to network structure. The algorithm can only distinguish four typical roles, or fuzzy roles lying between those four. Such roles share the name of the roles studied here but differ greatly in substance.

Zheng Kunlun proposed a role analysis algorithm that finds important nodes from node attributes (degree centrality, betweenness, PageRank rank), deletes them, then deletes the nodes that are only affected, and repeats the process on the remaining network. Overall it finds important nodes by repeatedly dismantling the network. The algorithm does not consider edge weights and is unsuitable for weighted networks.

Zhong Xiaoyu proposed a role division method similar to Li Wanyu's method above. It can only distinguish three typical roles.

Node embedding (representation learning) is an important step in role division, and various node embedding methods have been proposed abroad. RolX, struc2vec, and GraphWave are representation learning methods based on structural roles: nodes with similar structural roles receive nearby vectors. RolX extracts a feature matrix with the ReFeX algorithm and then applies non-negative matrix factorization to obtain the embedding vectors. struc2vec rebuilds the network according to the structural similarity of nodes, connecting similar nodes with strong edges, and runs DeepWalk on the new network to obtain node vectors. struc2vec considers only topology and ignores edge and node attributes. GraphWave applies diffusion wavelets to the Laplacian matrix and treats the wavelet coefficients as probability distributions. It performs well on undirected networks and can take edge weights into account. RolX, struc2vec, and GraphWave can only handle undirected graphs. DEG is a representation learning method for directed graphs. LINE and node2vec can be used on both undirected and directed graphs, but they are based on proximity rather than similarity. Graph2Gauss embeds nodes using network structure and node parameters, but the network structure it uses is only distance, so nodes that are close to each other are more likely to receive similar embedding vectors; Graph2Gauss is therefore unrelated to roles. At present there is no method that computes node embeddings (representing nodes as vectors) specifically for flow networks and then clusters the embeddings in vector space to achieve node role division on a flow network.

Summary of the invention

In view of the deficiencies of the prior art, the invention provides a flow network-oriented role division system;

The invention also provides a working method of the role division system;

The purpose of the invention is to divide a set of bank card numbers into roles, reporting how many roles there are and which card numbers belong to each role. The role division does not identify the meaning of each role, such as boss or manager. It only establishes that card numbers in the same role have similar transfer behavior and that card numbers in different roles behave differently. Role division of bank card numbers means partitioning all input card numbers into groups such that the card numbers within a group share a role and card numbers in different groups have different roles. For example, if the input data set contains five card numbers a, b, c, d, and e, and the role division system returns two groups, {a, b, c} and {d, e}, then the card numbers have two roles: a, b, and c form one role, and d and e form the other.

Explanation of terms:

1. Convergence subgraph. For a node a, the convergence subgraph of a is an undirected graph whose nodes are all nodes of the original graph and whose edges are the edges that may carry material flowing toward a. The edge set E_gat(a) is generated as follows: every edge <n, a> of the original graph that points to a is added to E_gat(a), and n is called a first-order upstream neighbor of a. Each first-order upstream neighbor n is processed in the same way: every edge <m, n> that points to n is added to E_gat(a), and m is called a second-order upstream neighbor of a. Continuing in this way, all edges from (k+1)-th order upstream neighbors of a to k-th order upstream neighbors are added to E_gat(a) until no new edge is added. Edges in the convergence subgraph keep the weights of the original graph and discard direction.

2. Diffusion subgraph. For a node a, the diffusion subgraph of a is an undirected graph whose nodes are all nodes of the original graph and whose edges are the edges that may carry material flowing out of a. The edge set E_dif(a) is generated symmetrically to E_gat(a): all edges from k-th order downstream neighbors of a to (k+1)-th order downstream neighbors are added to E_dif(a) until no new edge is added. Edges in the diffusion subgraph keep the weights of the original graph and discard direction.

3. PCA, Principal Component Analysis, a well-known dimensionality reduction method.

The technical scheme of the invention is:

A flow network-oriented role division system, comprising a data acquisition module, a directed weighted network acquisition module, an embedding module, and a clustering module connected in sequence;

The data acquisition module is used to acquire transfer data;

The directed weighted network acquisition module is used to represent the transfer data as a directed weighted network;

The embedding module is used to first extract two undirected subgraphs for each node, then obtain structural embeddings with the GraphWave algorithm, and finally combine the structural embedding with the node's in-out flow difference to obtain the node embedding;

The clustering module is used to cluster the node embeddings obtained in the previous step with an improved self-organizing map neural network to obtain the role division of the nodes.

The working method of the above flow network-oriented role division system comprises the following steps:

(1) The data acquisition module acquires transfer data. Transfer data means data on transfers between bank cards; each transfer record contains the payer's card number, the payee's card number, the amount, and the time;

(2) The directed weighted network acquisition module represents the transfer data as a directed weighted network;

(3) Node embeddings are obtained through the embedding module: first two undirected subgraphs are extracted for each node, then structural embeddings are obtained with the GraphWave algorithm, and finally the structural embedding is combined with the node's in-out flow difference to obtain the node embedding;

(4) Role division is performed through the clustering module: the node embeddings obtained in the previous step are clustered with an improved self-organizing map neural network to obtain the role division of the nodes.

Preferably according to the invention, in step (2), representing the transfer data as a directed weighted network means: every payer card number and every payee card number is represented as a node of the directed weighted network, and the cumulative transfer amount from a payer card number to a payee card number is represented as a directed edge between the two nodes, pointing from payer to payee, with the amount as its weight. This yields the directed weighted network.
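The construction in step (2) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `build_flow_network` and the tuple layout of a transfer record are assumptions made for the example.

```python
from collections import defaultdict

def build_flow_network(transfers):
    """Aggregate transfer records into a directed weighted network.

    `transfers` is an iterable of (payer_card, payee_card, amount, time)
    tuples; this layout is an assumption made for the sketch.
    Returns {(payer, payee): cumulative_amount}.
    """
    weights = defaultdict(float)
    for payer, payee, amount, _time in transfers:
        # The edge points from payer to payee; repeated transfers accumulate.
        weights[(payer, payee)] += amount
    return dict(weights)

transfers = [
    ("a", "b", 100.0, "2020-01-01"),
    ("a", "b", 50.0, "2020-01-02"),
    ("b", "c", 30.0, "2020-01-03"),
]
network = build_flow_network(transfers)
```

The time field is carried in the record but not used here, since the edge weight is the cumulative amount.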

Preferably according to the invention, in step (3), two undirected subgraphs are extracted for each node, namely a convergence subgraph and a diffusion subgraph, as follows:

G = (V, E) denotes the original graph, i.e. the directed weighted network obtained in step (2); V is the node set, containing every node of the directed weighted network; E is the edge set, containing every edge of the directed weighted network;

The convergence subgraph is obtained as follows:

G_gat(a) = (V, E_gat(a)) denotes the convergence subgraph of node a; a is any node in V; E_gat(a) is the edge set of G_gat(a);

E_gat(a) is computed as follows: every edge <n, a> pointing to a is added to E_gat(a), and n is called a first-order upstream neighbor of a. Each first-order upstream neighbor n is processed in the same way: every edge <m, n> pointing to n is added to E_gat(a), and m is called a second-order upstream neighbor of a. Continuing in this way, all edges from (k+1)-th order upstream neighbors of a to k-th order upstream neighbors are added to E_gat(a) until no new edge is added. Edges in the convergence subgraph keep the weights of the original graph and discard direction;

The diffusion subgraph is obtained as follows:

G_dif(a) = (V, E_dif(a)) denotes the diffusion subgraph of node a; E_dif(a) is the edge set of G_dif(a);

E_dif(a) is computed as follows: every edge <a, n> whose source is a is added to E_dif(a), and n is called a first-order downstream neighbor of a. Each first-order downstream neighbor n is processed in the same way: every edge <n, m> whose source is n is added to E_dif(a), and m is called a second-order downstream neighbor of a. Continuing in this way, all edges from k-th order downstream neighbors of a to (k+1)-th order downstream neighbors are added to E_dif(a) until no new edge is added. Edges in the diffusion subgraph keep the weights of the original graph and discard direction.
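The two edge sets can be collected with a breadth-first expansion, as in the sketch below. The function name `flow_subgraph`, the edge dictionary layout, and the choice to sum weights when both <x, y> and <y, x> end up in the same subgraph are assumptions of this example; the text does not say how such a pair should be merged once direction is discarded. Only the edge set is returned, since the node set is always V.

```python
from collections import defaultdict, deque

def flow_subgraph(edges, start, upstream):
    """Return the undirected edge set of the convergence subgraph
    (upstream=True) or diffusion subgraph (upstream=False) of `start`.

    `edges` maps (source, target) -> weight.
    The result maps frozenset({u, v}) -> weight.
    """
    incident = defaultdict(list)
    for (src, dst), weight in edges.items():
        # Upstream search follows edges backwards from their target.
        incident[dst if upstream else src].append((src, dst, weight))

    subgraph = defaultdict(float)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, dst, weight in incident[node]:
            # Keep the weight, drop the direction.
            subgraph[frozenset((src, dst))] += weight
            neighbor = src if upstream else dst
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return dict(subgraph)

edges = {("B", "A"): 1.0, ("C", "B"): 2.0, ("A", "D"): 3.0,
         ("D", "E"): 4.0, ("F", "G"): 5.0}
gat = flow_subgraph(edges, "A", upstream=True)
dif = flow_subgraph(edges, "A", upstream=False)
```

In this toy graph the edge F to G carries no flow to or from A, so it appears in neither subgraph.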

Preferably according to the invention, in step (3), structural embeddings are obtained with the GraphWave algorithm. A structural embedding is a vector that represents the position of a node in the continuous space of structural roles. The steps are as follows:

G_gat(a) is processed with the GraphWave algorithm to obtain X_a, the structural embedding of a in the convergence subgraph; G_dif(a) is processed with the GraphWave algorithm to obtain Y_a, the structural embedding of a in the diffusion subgraph; the complete structural embedding of a is then (X_a, Y_a).
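To make the GraphWave step more concrete, the sketch below computes a heat-kernel wavelet for one node of a small undirected weighted graph and samples its empirical characteristic function. It is a simplified illustration under stated assumptions, not the reference GraphWave implementation: it uses an exact eigendecomposition and a single scale, it assumes NumPy is available, and the function name `graphwave_embedding` and the `scale` and `sample_points` values are chosen only for the example.

```python
import numpy as np

def graphwave_embedding(adjacency, node, scale=1.0,
                        sample_points=(1.0, 2.0, 3.0, 4.0)):
    """Structural embedding of `node` from heat-kernel wavelet coefficients.

    `adjacency` is a symmetric weighted adjacency matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    # Heat kernel exp(-scale * L) expressed in the Laplacian eigenbasis.
    heat = eigvecs @ np.diag(np.exp(-scale * eigvals)) @ eigvecs.T
    coeffs = heat[:, node]  # wavelet centered at `node`
    features = []
    for t in sample_points:
        # Treat the coefficients as a distribution and sample its
        # empirical characteristic function at t.
        phi = np.exp(1j * t * coeffs).mean()
        features.extend([phi.real, phi.imag])
    return np.array(features)

# Path graph 0 - 1 - 2: the two end nodes are structurally equivalent.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
emb0 = graphwave_embedding(path, 0)
emb1 = graphwave_embedding(path, 1)
emb2 = graphwave_embedding(path, 2)
```

In the method described above, this computation would be run once on G_gat(a) and once on G_dif(a) for each node a, keeping only the embedding of a itself.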

Preferably according to the invention, in step (3), the structural embedding and the node's in-out flow difference are combined to obtain the node embedding, as follows:

The in-out flow difference of a node is important information about that node in a flow network. Let w_ij denote the weight of the directed edge <i, j>, where i and j are the payer's and the payee's card numbers and w_ij is the cumulative amount transferred from i to j. Then the inflow of node a is f_in = ∑_i w_ia, its outflow is f_out = ∑_i w_ai, and its in-out flow difference is d = f_in - f_out;
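As a small worked example of d = f_in - f_out, the sketch below evaluates the flow difference on a three-node network. The function name `flow_difference` and the edge dictionary layout are assumptions of the example.

```python
def flow_difference(edges, node):
    """Return f_in - f_out for `node`; `edges` maps (i, j) -> w_ij."""
    f_in = sum(w for (_src, dst), w in edges.items() if dst == node)
    f_out = sum(w for (src, _dst), w in edges.items() if src == node)
    return f_in - f_out

edges = {("a", "b"): 150.0, ("b", "c"): 30.0}
differences = {n: flow_difference(edges, n) for n in ("a", "b", "c")}
```

Here b receives 150 and sends 30, so it retains 120, while a is a pure source and c a pure sink.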

To obtain a complete vector representation of a node, the in-out flow difference and the structural embedding must be combined. However, the structural embedding is generally a vector of more than 16 dimensions (depending on the parameters); if it were concatenated directly with the flow difference, the influence of the flow difference would be very weak. The structural embedding is therefore first reduced to 2 dimensions and then concatenated with the flow difference.

The structural embedding is reduced to 2 dimensions;

Further preferably, the structural embedding is reduced to 2 dimensions by PCA;

It is worth noting that, because the flow difference effectively serves as a node weight, the in-out flow difference may be replaced by any other value that can serve as a node weight;

The reduced structural embedding and the in-out flow difference are concatenated into a 3-dimensional vector. At this point every node has a 3-dimensional vector representation, which is its node embedding.
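The reduction and concatenation can be sketched as below, assuming NumPy is available. PCA is done here through a singular value decomposition of the centered embedding matrix; the function name `node_embeddings` and the random stand-in data are assumptions of the example, and no rescaling of the flow difference is applied because the text does not describe one.

```python
import numpy as np

def node_embeddings(structural, flow_diff):
    """Reduce structural embeddings to 2-D with PCA and append the
    in-out flow difference, giving one 3-D vector per node."""
    X = np.asarray(structural, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:2].T
    return np.column_stack([reduced, np.asarray(flow_diff, dtype=float)])

rng = np.random.default_rng(0)
structural = rng.normal(size=(5, 16))   # stand-in for (X_a, Y_a) vectors
flow_diff = [120.0, -150.0, 30.0, 0.0, 0.0]
embeddings = node_embeddings(structural, flow_diff)
```

The third column is the flow difference unchanged, so its relative influence depends on its magnitude compared with the two principal components.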

Preferably according to the invention, in step (4), the node embeddings obtained in the previous step are clustered with an improved self-organizing map (SOM) neural network to obtain the role division of the nodes, as follows:

The improved SOM comprises an input layer and a competition layer. The competition layer is fully connected to the input layer, and the number of input-layer neurons equals the dimension of a single input. The input-layer neurons receive the input, and the competition-layer neurons compete with each other.

The weights [w_1, w_2, w_3, ...] of the improved SOM are the positions of the competition-layer neurons; if the input layer has two neurons, the competition-layer neurons are arranged on a two-dimensional plane.

A. Initialize a dense set of competition-layer neurons;

B. Each time a stimulus p is input (a neural network calls its input a stimulus; here p is a node embedding obtained in step (3)), the competition-layer neuron i closest to p, i.e. the one with the smallest ||w_i - p||_2, is selected as the winning neuron;

C. Adjust the positions of the winning neuron and of its neighborhood N(i), where N(i) denotes every neuron in the region around i, so that they move toward p. With α denoting the learning rate, the positions are adjusted according to formula (I):

w_j′ = w_j + α(p - w_j), j ∈ N(i)    (I)

In formula (I), j is a neuron in the neighborhood, w_j is its position, and w_j′ is its new position after adjustment;

After many rounds of training, the neurons move into the clusters;

D. Two stimuli from the same cluster correspond to the same winning neuron, and the winning neurons are the cluster centers;

A notable advantage of the SOM, when used as a clustering algorithm, is that the number of clusters need not be specified, which is exactly the property needed here. However, a single natural cluster often ends up with several cluster centers. To address this, the concept of resolution is introduced: resolution means that the weights of the competition layer may only take discrete values. In a discrete vector space, neurons that move toward several nearby inputs may land on the same position, which reduces the number of cluster centers and improves completeness. How should a suitable resolution be chosen? The optimal resolution clearly differs between data sets, which raises a new problem.

E. Examine all cluster centers. If two cluster centers are neighbors, or if one cluster center coincides in position with a neighbor of another cluster center, merge the two. The resulting cluster centers are the result of the role division.
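Steps B and C can be sketched as a single training step. The text does not specify how the neighborhood N(i) is determined, so in this example it is passed in explicitly as a mapping from a neuron index to the indices of its neighbors, and the winner is always updated together with them. The function name `som_step` and that mapping are assumptions of the sketch.

```python
import math

def som_step(weights, neighbors, p, alpha):
    """Select the winner for stimulus p and apply formula (I) in place.

    weights   : list of neuron positions (lists of floats)
    neighbors : neighbors[i] lists the neuron indices in N(i)
    Returns the index of the winning neuron.
    """
    # Step B: the winner minimizes the Euclidean distance ||w_i - p||_2.
    winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], p))
    # Step C: w_j' = w_j + alpha * (p - w_j) for the winner and N(winner).
    for j in set(neighbors[winner]) | {winner}:
        weights[j] = [w + alpha * (x - w) for w, x in zip(weights[j], p)]
    return winner

weights = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
neighbors = {0: [1], 1: [0], 2: []}
winner = som_step(weights, neighbors, p=[1.0, 1.0], alpha=0.5)
```

Neuron 1 is closest to the stimulus, so it and its neighbor 0 move halfway toward p while neuron 2 stays in place.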

Further preferably, initializing a dense set of competition-layer neurons means using 4 neurons per unit area. That is, the neurons are distributed evenly on the two-dimensional plane like the intersections of a square grid with side length 0.5, which gives 4 neurons per unit area.
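A possible initialization of such a grid is sketched below. The bounds of the region to cover are not given in the text, so `lower` and `upper` are hypothetical parameters of this example, as is the function name `init_competition_layer`.

```python
def init_competition_layer(lower, upper, spacing=0.5):
    """Place neurons on the intersections of a square grid with the given
    spacing, covering the square [lower, upper] x [lower, upper]."""
    steps = int(round((upper - lower) / spacing))
    coords = [lower + k * spacing for k in range(steps + 1)]
    return [(x, y) for x in coords for y in coords]

neurons = init_competition_layer(0.0, 2.0)
```

With a spacing of 0.5 each unit square contains 2 x 2 grid cells, which corresponds to the stated density of 4 neurons per unit area away from the boundary.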

Preferably according to the invention, in step C, a relatively high resolution of 0.2 is used for training. This is equivalent to drawing a square grid with side length 0.2 on the two-dimensional plane: each time a neuron's position is adjusted, it is moved to the nearest grid intersection. For example, a neuron at (0.12, 1.27) is moved to (0.2, 1.2).
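The snapping rule can be written in a few lines. The function name `snap_to_grid` is an assumption of this sketch, and the final rounding to 10 decimals is only there to remove floating-point noise such as 1.2000000000000002.

```python
def snap_to_grid(position, resolution=0.2):
    """Move a neuron position to the nearest intersection of a square
    grid whose side length equals `resolution`."""
    return tuple(round(round(x / resolution) * resolution, 10)
                 for x in position)

snapped = snap_to_grid((0.12, 1.27))
```

This reproduces the example in the text: 0.12 is closer to 0.2 than to 0.0, and 1.27 is closer to 1.2 than to 1.4.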

A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program of the working method of the flow network-oriented role division system, and the program, when executed by a processor, implements the steps of any of the described working methods of the flow network-oriented role division system.

The beneficial effects of the invention are:

1. The role division method proposed by the invention can quickly discover the role composition of an economic organization and, combined with experience, identify the roles that are likely to be senior members. If the investigated organization is illegal, the controllers of its key bank accounts can be investigated earlier, which improves case-handling efficiency, reduces the loss of public property, and maintains economic order.

2. By dividing roles in several economic organizations, the invention allows their role compositions to be compared, which helps to identify special organizations whose role composition differs from that of ordinary organizations.

3. The scope of application covers all flow networks. Prospective applications include role division of members of illegal economic organizations, role division of transportation hubs, identification of abnormal organizations based on role composition, and so on. A flow network is a special directed weighted network whose edges represent flows of energy, matter, money, information, and so on, and whose edge weights represent the flow volume.

Description of drawings

Figure 1 is a structural block diagram of the flow network-oriented role division system;

Figure 2 is a schematic diagram of the working method of the flow network-oriented role division system;

Figure 3(a) is a schematic diagram of the original graph;

Figure 3(b) is a schematic diagram of the convergence subgraph of node A extracted from Figure 3(a);

Figure 3(c) is a schematic diagram of the diffusion subgraph of node A extracted from Figure 3(a).

Detailed description

The invention is further described below with reference to the accompanying drawings and embodiments, but is not limited thereto.

Embodiment 1

A flow network-oriented role division system, as shown in Figure 1, comprises a data acquisition module, a directed weighted network acquisition module, an embedding module, and a clustering module connected in sequence. The data acquisition module is used to acquire transfer data. The directed weighted network acquisition module is used to represent the transfer data as a directed weighted network. The embedding module is used to first extract two undirected subgraphs for each node, then obtain structural embeddings with the GraphWave algorithm, and finally combine the structural embedding with the node's in-out flow difference to obtain the node embedding. The clustering module is used to cluster the node embeddings obtained in the previous step with an improved self-organizing map neural network to obtain the role division of the nodes.

Embodiment 2

The working method of the flow network-oriented role division system of Embodiment 1, as shown in Figure 2, comprises the following steps:

(1) The data acquisition module acquires transfer data. Transfer data means data on transfers between bank cards; each transfer record contains the payer's card number, the payee's card number, the amount, and the time;

(2) The directed weighted network acquisition module represents the transfer data as a directed weighted network. That is, every payer card number and every payee card number is represented as a node of the directed weighted network, and the cumulative transfer amount from a payer card number to a payee card number is represented as a directed edge between the two nodes, pointing from payer to payee, with the amount as its weight. This yields the directed weighted network.

(3) Node embeddings are obtained through the embedding module: first two undirected subgraphs are extracted for each node, then structural embeddings are obtained with the GraphWave algorithm, and finally the structural embedding is combined with the node's in-out flow difference to obtain the node embedding;

(4) Role division is performed through the clustering module: the node embeddings obtained in the previous step are clustered with an improved self-organizing map neural network to obtain the role division of the nodes.

步骤(3)中，为每一个节点抽取两张无向子图，无向子图包括聚敛子图和扩散子图，包括步骤如下：In step (3), two undirected subgraphs are extracted for each node, and the undirected subgraph includes a convergent subgraph and a diffusion subgraph, and the steps are as follows:

G＝(V，E)，表示原图，如图3(a)所示，即步骤(2)得到的有向加权网络；V是点集，包括有向加权网络中的每个点，包括A、B、C、D、E、F、G、H、I、J；E是边集，包括有向加权网络中的每条边；G=(V, E), represents the original image, as shown in Figure 3(a), that is, the directed weighted network obtained in step (2); V is a point set, including each point in the directed weighted network, including A, B, C, D, E, F, G, H, I, J; E is the edge set, including each edge in the directed weighted network;

如图3(b)所示，抽取A点的聚敛子图的获取过程如下：As shown in Figure 3(b), the process of extracting the converged subgraph of point A is as follows:

如图3(c)所示，抽取A点的扩散子图的获取过程如下：As shown in Figure 3(c), the acquisition process of extracting the diffusion subgraph of point A is as follows:

In step (3), the GraphWave algorithm is used to obtain the structural embedding, a vector that represents the position of a point in the continuous space of structural roles, with the following steps:

The GraphWave algorithm is applied to $G_{gat}(a)$ to obtain the structural embedding $X_a$ of point $a$ in the convergent subgraph; the GraphWave algorithm is applied to $G_{dif}(a)$ to obtain the structural embedding $Y_a$ of point $a$ in the diffusion subgraph; the complete structural embedding of point $a$ is then $(X_a, Y_a)$.

In step (3), the structural embedding and the node's in-out flow difference are combined to obtain the node embedding, with the following steps:

Reduce the structural embedding to 2 dimensions by PCA;
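The combination step can be illustrated as below, taking the inflow of a point as the total weight of the edges pointing to it, the outflow as the total weight of the edges leaving it, and the node embedding as the 2-dimensional structural embedding followed by the flow difference, as set out in claim 4. The function names and the placeholder value `structural_2d` (standing in for the PCA output, which is not computed here) are assumptions made for the sketch.

```python
def flow_difference(edges, a):
    """Return d = f_in - f_out for point `a`.

    `edges` maps (source, target) to the cumulative transfer amount (hypothetical layout).
    """
    f_in = sum(w for (src, dst), w in edges.items() if dst == a)
    f_out = sum(w for (src, dst), w in edges.items() if src == a)
    return f_in - f_out

def node_embedding(structural_2d, edges, a):
    """Concatenate a 2-D structural embedding with the flow difference of `a`."""
    x, y = structural_2d
    return (x, y, flow_difference(edges, a))

# Illustrative values only; (0.4, -1.1) is a made-up PCA result.
edges = {("A", "B"): 150.0, ("B", "C"): 30.0}
embedding_b = node_embedding((0.4, -1.1), edges, "B")
```

In this example point B receives 150 and sends 30, so the third component of its embedding is the net inflow of 120.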

In step (4), the improved self-organizing map neural network SOM is used to cluster the node embeddings obtained in the previous step, yielding the role division of the nodes, with the following steps:

The improved self-organizing map neural network SOM comprises an input layer and a competition layer. The competition layer is fully connected to the input layer, and the number of neurons in the input layer equals the dimension of a single input. The input-layer neurons receive the input, and the competition-layer neurons compete with one another.

$w_j' = w_j + \alpha (p - w_j), \quad j \in N(i)$ (I)

Initializing dense competition-layer neurons means 4 neurons per unit area: the neurons are distributed evenly over the two-dimensional plane like the intersections of a square grid whose squares have side length 0.5, which gives 4 neurons per unit area.

In step C, a relatively high resolution of 0.2 is used for training. This is equivalent to drawing a square grid with side length 0.2 on the two-dimensional plane: each time a neuron's position is adjusted, the neuron is moved to the nearest grid intersection. For example, a neuron at (0.12, 1.27) is moved to (0.2, 1.2).
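The grid initialization, the update of formula (I) and the snapping to the 0.2 grid can be sketched together as follows. Several points are assumptions rather than statements of the text: the function names, the bounds passed to `init_neurons`, the learning rate, and in particular the neighborhood, which is taken here to be every neuron within a fixed `radius` of the winner because the text only describes it as the area near the winning neuron. The example stimulus is 2-dimensional for illustration.

```python
import math

def init_neurons(x_range, y_range, spacing=0.5):
    """Place neurons on the intersections of a square grid (spacing 0.5 in the text)."""
    nx = int(round((x_range[1] - x_range[0]) / spacing)) + 1
    ny = int(round((y_range[1] - y_range[0]) / spacing)) + 1
    return [(x_range[0] + i * spacing, y_range[0] + j * spacing)
            for i in range(nx) for j in range(ny)]

def snap(position, resolution=0.2):
    """Move a position to the nearest intersection of a grid with the given side length."""
    # The outer round() only tidies floating-point noise such as 1.2000000000000002.
    return tuple(round(round(c / resolution) * resolution, 10) for c in position)

def train_step(neurons, p, alpha, radius, resolution=0.2):
    """Apply one stimulus: pick the winner, pull its neighborhood toward p, then snap."""
    winner = min(neurons, key=lambda w: math.dist(w, p))
    updated = []
    for w in neurons:
        if math.dist(w, winner) <= radius:  # assumed neighborhood definition
            moved = tuple(wc + alpha * (pc - wc) for wc, pc in zip(w, p))
            updated.append(snap(moved, resolution))
        else:
            updated.append(w)
    return updated

neurons = init_neurons((0.0, 1.0), (0.0, 1.0))
print(snap((0.12, 1.27)))  # (0.2, 1.2), the example given in the text
neurons_after = train_step(neurons, p=(0.9, 0.9), alpha=0.5, radius=0.5)
```

Snapping after every update means neurons that drift toward the same region end up on identical grid intersections, which is what lets them collapse into clusters over repeated training.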

Embodiment 3

A computer-readable storage medium storing a program of the working method of the flow network-oriented role division system described in Embodiment 2; when the program is executed by a processor, the steps of the working method of the flow network-oriented role division system described in Embodiment 2 are realized.

Claims

1. A working method of a flow network-oriented role division system, characterized in that the system comprises a data acquisition module, a directed weighted network acquisition module, an embedding module and a clustering module connected in sequence;

The data acquisition module is used to obtain transfer data; the directed weighted network acquisition module is used to represent the transfer data as a directed weighted network; the embedding module is used to first extract two undirected subgraphs for each node, then perform structural embedding with the GraphWave algorithm, and finally combine the structural embedding with the node's in-out flow difference to obtain the node embedding; the clustering module is used to cluster the node embeddings obtained in the previous step with the improved self-organizing map neural network to obtain the role division of the nodes; the steps are as follows:

(1) The data acquisition module obtains the transfer data; the transfer data refers to transfers between bank cards, and each transfer record includes the card number of the transfer-out party, the card number of the transfer-in party, the amount and the time;

(2) The directed weighted network acquisition module represents the transfer data as a directed weighted network;

(3) Obtain the node embeddings through the embedding module: first extract two undirected subgraphs for each node, then obtain the structural embedding with the GraphWave algorithm, and finally combine the structural embedding with the node's in-out flow difference to obtain the node embedding;

(4) Realize the role division by the clustering module: use the improved self-organizing map neural network to cluster the node embedding obtained in the previous step, and obtain the role division of the node;

In step (3), two undirected subgraphs are extracted for each node. The undirected subgraph includes a convergent subgraph and a diffusion subgraph, and the steps are as follows:

$G = (V, E)$ denotes the original graph, that is, the directed weighted network obtained in step (2); $V$ is the point set, containing every point of the directed weighted network; $E$ is the edge set, containing every edge of the directed weighted network;

The convergent subgraph is obtained as follows:

$G_{gat}(a) = (V, E_{gat}(a))$ denotes the convergent subgraph of point $a$; point $a$ is any point in the point set $V$; $E_{gat}(a)$ is the edge set of $G_{gat}(a)$;

$E_{gat}(a)$ is computed as follows: every edge $\langle n, a \rangle$ pointing to $a$ is added to $E_{gat}(a)$, and $n$ is called a first-order upstream neighbor of $a$; each first-order upstream neighbor $n$ of $a$ is treated in the same way, that is, every edge $\langle m, n \rangle$ pointing to $n$ is added to $E_{gat}(a)$, and $m$ is called a second-order upstream neighbor of $a$; and so on, every edge pointing from a $(k+1)$-order upstream neighbor of $a$ to a $k$-order upstream neighbor of $a$ is added to $E_{gat}(a)$ until no new edge is added; the edges of the convergent subgraph keep the weights of the original graph and discard the direction;

The acquisition process of the diffusion subgraph is as follows:

$G_{dif}(a) = (V, E_{dif}(a))$ denotes the diffusion subgraph of point $a$; $E_{dif}(a)$ is the edge set of $G_{dif}(a)$;

$E_{dif}(a)$ is computed as follows: every edge $\langle a, n \rangle$ whose source point is $a$ is added to $E_{dif}(a)$, and $n$ is called a first-order downstream neighbor of $a$; each first-order downstream neighbor $n$ of $a$ is treated in the same way, that is, every edge $\langle n, m \rangle$ whose source point is $n$ is added to $E_{dif}(a)$, and $m$ is called a second-order downstream neighbor of $a$; and so on, every edge pointing from a $k$-order downstream neighbor of $a$ to a $(k+1)$-order downstream neighbor of $a$ is added to $E_{dif}(a)$ until no new edge is added; the edges of the diffusion subgraph keep the weights of the original graph and discard the direction.

2. The working method of the flow network-oriented role division system according to claim 1, characterized in that, in step (2), representing the transfer data as a directed weighted network means: all card numbers of transfer-out parties and all card numbers of transfer-in parties are represented as points in the directed weighted network; the cumulative transfer amount between a transfer-out party's card number and a transfer-in party's card number is represented as a directed edge between the two points, pointing from the transfer-out party to the transfer-in party, with the amount as the weight; the result is the directed weighted network.

3. The working method of the flow network-oriented role division system according to claim 1, characterized in that, in step (3), the GraphWave algorithm is used to obtain the structural embedding, the structural embedding being a vector that represents the position of a point in the continuous space of structural roles, including the following steps:

The GraphWave algorithm is applied to $G_{gat}(a)$ to obtain the structural embedding $X_a$ of point $a$ in the convergent subgraph; the GraphWave algorithm is applied to $G_{dif}(a)$ to obtain the structural embedding $Y_a$ of point $a$ in the diffusion subgraph; the complete structural embedding of point $a$ is then $(X_a, Y_a)$.

4. The working method of the flow network-oriented role division system according to claim 1, characterized in that, in step (3), the structural embedding and the node's in-out flow difference are combined to obtain the node embedding, including the following steps:

Let the weight of edge $\langle i, j \rangle$ be $w_{ij}$, where $i$ and $j$ denote the card number of the transfer-out party and the card number of the transfer-in party, respectively, and $w_{ij}$ is the cumulative transfer amount from $i$ to $j$; then the inflow of point $a$ is $f_{in} = \sum_i w_{ia}$, the outflow is $f_{out} = \sum_i w_{ai}$, and the in-out flow difference is $d = f_{in} - f_{out}$;

Reduce structural embeddings to 2 dimensions;

The structural embedding and the in-out flow difference are concatenated to obtain a 3-dimensional vector representation. Each point thus has a 3-dimensional vector representation, which is its node embedding.

5. The working method of the flow network-oriented role division system according to claim 4, characterized in that, the structural embedding is reduced to 2 dimensions by PCA.

6. The working method of the flow network-oriented role division system according to any one of claims 1-5, characterized in that, in step (4), the improved self-organizing map neural network SOM is used to cluster the node embeddings obtained in the previous step to obtain the role division of the nodes, including the following steps:

The improved self-organizing map neural network SOM includes an input layer and a competition layer, the competition layer and the input layer are fully connected, and the number of neurons in the input layer is the same as the dimension of a single input;

The weights $[w_1, w_2, w_3, \ldots]$ of the improved self-organizing map neural network SOM are the positions of the neurons in the competition layer;

A. Initialize dense competitive layer neurons;

B. Each time a stimulus $p$ is input, where the stimulus $p$ is a node embedding obtained in step (3), the neuron $i$ with the shortest distance to $p$, that is, the smallest $\lVert w_i - p \rVert_2$, is selected from the competition layer as the winning neuron;

C. Adjust the positions of the winning neuron and of the neurons in its neighborhood $N(i)$, where the neighborhood $N(i)$ consists of the neurons in the area near $i$, so that they move closer to $p$. With $\alpha$ denoting the learning rate, the positions are adjusted as shown in formula (I):

$w_j' = w_j + \alpha (p - w_j), \quad j \in N(i)$ (I)

In formula (I), $j$ is a neuron in the neighborhood, $w_j$ is the position of that neuron, and $w_j'$ is its adjusted position;

After multiple trainings, neurons move into clusters;

D. Two input stimuli that correspond to the same winning neuron belong to the same cluster, and the cluster center is that winning neuron;

E. Check all cluster centers: if two cluster centers are neighbors, or if the neighbors of one cluster center overlap with those of another cluster center, merge the two cluster centers; the resulting cluster centers are the result of the role division.

7. The working method of the flow network-oriented role division system according to claim 6, characterized in that, the initialization of dense competitive layer neurons refers to: 4 neurons per unit area.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program of the working method of the flow network-oriented role division system according to any one of claims 1-7, and when the program is executed by a processor, the steps of the working method of the flow network-oriented role division system according to any one of claims 1-7 are realized.