CN110825935A - Community core character mining method, system, electronic equipment and readable storage medium - Google Patents
- Publication number: CN110825935A
- Application number: CN201910914473.5A
- Authority: CN (China)
- Prior art keywords: community, node, target user, calculating, nodes
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a community core character mining method, comprising the following steps: acquiring a target user group under a target group number, and acquiring communication data within the target user group; cleaning and converting the communication data to construct a target user communication sequence; dividing the community structure of the target user group from the communication data by using the Louvain algorithm; computing the shortest paths between all nodes of a community with the Dijkstra algorithm, calculating the centrality of each node, and finding the central nodes of the network graph; and calculating the volatility of each central node, wherein the central node with small volatility is the community core of the target user group. The whole process is unsupervised: the final structure depends entirely on algorithmic clustering, and no classes need to be preset manually. The method imposes no strict requirement on the size of the graph, converges quickly after several iterations, and is computationally efficient.
Description
Technical Field
The invention relates to the technical field of data mining, in particular to a community core character mining method and system, electronic equipment and a readable storage medium.
Background
Phone calls are one of the most common means of contact in modern communication, and relationships between people can be mined from call records. For existing illegal organizations, such as illegal distribution and marketing schemes, call records can be used to identify the people involved in illegal activities and the core people among them. Existing community discovery algorithms include the following:
The first realizes community division by analyzing the similarity of user messages. A specific message content library is established; users are mapped to specific classes by analyzing how similar their messages are to the content library, and each class is given a corresponding weight, so as to identify core users. Taking each core user as a node, the core communication circle of the core user is then divided out of the pairwise-connected communication network. However, this method requires building a specific message library and analyzing message similarity. For groups that exchange few messages, community discovery and analysis of the group's organizational structure cannot be carried out. When the exchanged messages have no obvious characteristics, a message library with distinctive features cannot be built, and the effectiveness of the algorithm drops sharply. Moreover, the method only analyzes the similarity of each user's messages and cannot analyze communication frequency, so the core people in the group cannot be mined.
The second realizes community division by expanding a topological graph built from communication data in the telephone network. The edge with the highest weight in the current topological graph is selected, and that edge and its two nodes are regarded as an interaction circle. For each user node in the topological graph, the neighbor node with the maximum attribution degree is found, and it is judged whether the attribution degree is greater than a preset value; if so, the interaction circle is enlarged, and if not, enlargement stops. With this approach, if the designated user is not the core of the group, the expanded interaction circle may not meet the requirements, and the procedure is too coarse.
Disclosure of Invention
The invention aims to provide a method and a device for accurately mining a group organization structure and a core character of a mobile user.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a community core character mining method comprises the following steps:
acquiring a target user group under a target group number, and acquiring communication data in the target user group;
cleaning and converting the communication data to construct a target user communication sequence;
dividing the community structure of the target user group by using a Louvain algorithm through the communication data;
making shortest paths among all nodes of the community through a Dijkstra algorithm, calculating the centrality of each node, and exploring a central node of a network graph;
and calculating the volatility of each central node, wherein the central node with small volatility is the community core of the target user group.
Preferably, the process of dividing the community structure of the target user group by using the Louvain algorithm through the communication data includes:
S31: each target user is taken as a node, and each node initially belongs to its own community;
S32: for any node i, calculating the change ΔQ of the modularity of the whole network after i is merged into each adjacent community, and merging i into the community with the maximum ΔQ; if the maximum ΔQ is negative, the community of i is not changed;
S33: repeating step S32 until moving any node to another adjacent community in the network can no longer improve ΔQ;
S34: merging communities: compressing each obtained community into a node, and giving each edge between community nodes, as its new weight, the sum of the edge weights of all corresponding node pairs in the original communities.
Preferably, the process of making the shortest path between the nodes of the community by dijkstra algorithm comprises:
S41: for each node $v_0$ in the community, let $S = \{v_0\}$ and $T = \{\text{other nodes}\}$;
S42: calculating the distance from $v_0$ to every vertex $v_i$ in T: if there is an arc between $v_0$ and $v_i$, the distance value is the weight on the arc; if there is no arc between them, the distance value is infinite;
S43: selecting the vertex w with the minimum distance value from T, and adding w to the set S;
S44: modifying the distance values of the vertices remaining in T: if adding w as an intermediate vertex shortens the distance from $v_0$ to $v_i$, modifying that distance value;
S45: repeating steps S43 and S44 until all vertices are contained in S.
Preferably, the process of calculating the centrality of each node comprises:
for each pair of nodes (s, t) within the community, calculating all shortest paths between them;
for each pair of nodes (s, t) in the community, judging whether the node v is on the solved shortest path;
accumulating over the shortest paths, and calculating the node betweenness centrality of the node v:
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through node v;
and calculating the node betweenness centrality of all the nodes.
Preferably, the process of calculating the centrality of each node further comprises: calculating the betweenness centrality of each edge:
calculating all shortest paths between node pairs in the community;
judging whether the edge e is on the shortest path;
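accumulating over the shortest paths to obtain the betweenness centrality of the edge e:
$$C_B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t in graph G, and $\sigma_{st}(e)$ is the number of those shortest paths that contain edge e;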
and calculating the betweenness centrality of all edges.
Preferably, the process of calculating the volatility of each said central node comprises:
calculating the standard deviation of node v with respect to all other nodes in the network, wherein the smaller the standard deviation, the smaller the volatility and the closer the node is to the center, and such a node is the community core.
Preferably, the step of obtaining the target user group under the target group number includes: filtering the number with invalid number state in the target user, associating a bill list table of the target user, acquiring a call list of the target user with valid state in one month, and generating a call list sequence of the target user, wherein the call list comprises: user identification, called number and call duration.
In a second aspect, the present invention further provides a system for mining community core people, including:
an acquisition module: acquiring a target user group under a target group number, and acquiring communication data in the target user group;
a cleaning module: cleaning and converting the communication data to construct a target user communication sequence;
a dividing module: dividing the community structure of the target user group by using a Louvain algorithm through the communication data;
a central node module: making shortest paths among all nodes of the community through a Dijkstra algorithm, calculating the centrality of each node, and exploring a central node of a network graph;
a core mining module: and calculating the volatility of each central node, wherein the central node with small volatility is the community core of the target user group.
In a third aspect, the present invention further provides an electronic device for community core people mining, including a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the community core people mining method when executing the program.
In a fourth aspect, the present invention further provides a readable storage medium for community core persona mining, having a computer program stored thereon, the computer program being executed by a processor to implement the steps of the community core persona mining method described above.
Based on operator call big data, the invention applies the Louvain algorithm and the Dijkstra algorithm to judge the centrality of nodes and calculate their volatility, thereby realizing automatic community division and automatic identification of community core personnel. The community structure obtained by the algorithm is hierarchical: the new graph obtained after each round of calculation is the result of discovering several finer communities within a larger community, and this hierarchy is a natural attribute of the network, so that researchers can gain a deep understanding of the internal structure and formation mechanism of a given community. The invention uses the Louvain algorithm, and the whole process is unsupervised: the final structure depends entirely on algorithmic clustering, and no classes need to be preset manually. The algorithm performs well: compared with some classical community classification algorithms, it places almost no upper limit on the size of the graph and converges quickly after several iterations, so it is highly efficient.
Drawings
FIG. 1 is a flowchart of an embodiment of a community core people mining method of the present invention;
FIG. 2 is a flowchart of step S30 in FIG. 1;
FIG. 3 is a flowchart of step S40 in FIG. 1;
FIG. 4 is a schematic diagram illustrating the principle of dividing the community structure of the target user group by the Louvain algorithm in the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1, 2 and 3, an embodiment of the present invention provides a community core character mining method, including:
S10: acquiring a target user group under a target group number, and acquiring communication data within the target user group. Numbers whose status is invalid are filtered out of the target users, the call detail record table of the target users is associated, the call records within one month of the target users with valid status are obtained, and a call record sequence of the target users is generated, wherein each call record comprises: user identification, called number and call duration.
S20: cleaning and converting the communication data to construct a target user communication sequence;
S30: dividing the community structure of the target user group from the communication data by using the Louvain algorithm.
Each target user is taken as a node, and each node initially belongs to its own community. For any node i, the change ΔQ of the modularity of the whole network after i is merged into each adjacent community is calculated, and i is merged into the community with the maximum ΔQ; if the maximum ΔQ is negative, the community of i is not changed. The previous step is repeated until moving any node to another adjacent community in the network can no longer improve ΔQ. Communities are then merged: each obtained community is compressed into a node, and each edge between community nodes is given, as its new weight, the sum of the edge weights of all corresponding node pairs in the original communities.
S40: computing the shortest paths between all nodes of the community with the Dijkstra algorithm, calculating the centrality of each node, and finding the central nodes of the network graph.
For each node $v_0$ in the community, let $S = \{v_0\}$ and $T = \{\text{other nodes}\}$. The distance from $v_0$ to every vertex $v_i$ in T is calculated: if there is an arc between $v_0$ and $v_i$, the distance value is the weight on the arc; if there is no arc between them, the distance value is infinite. The vertex w with the minimum distance value is selected from T and added to the set S. The distance values of the vertices remaining in T are then modified: if adding w as an intermediate vertex shortens the distance from $v_0$ to $v_i$, that distance value is modified. The previous two steps are repeated until all vertices are contained in S.
Calculating the centrality of the nodes includes calculating the node betweenness centrality of all nodes and calculating the betweenness centrality of each edge.
The process of calculating the centrality of each node comprises:
for each pair of nodes (s, t) within the community, calculating all shortest paths between them;
for each pair of nodes (s, t) in the community, judging whether the node v is on the solved shortest path;
accumulating over the shortest paths, and calculating the node betweenness centrality of the node v:
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through node v;
the process of calculating the centrality of each node further comprises: calculating the betweenness centrality of each edge:
calculating all shortest paths between node pairs in the community;
judging whether the edge e is on the shortest path;
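accumulating over the shortest paths to obtain the betweenness centrality of the edge e:
$$C_B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t in graph G, and $\sigma_{st}(e)$ is the number of those shortest paths that contain edge e;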
and calculating the betweenness centrality of all edges.
S50: calculating the volatility of each central node, wherein the central node with small volatility is the community core of the target user group. The volatility is obtained by calculating the standard deviation of node v with respect to all other nodes in the network: the smaller the standard deviation, the smaller the volatility and the closer the node is to the center, and such a node is the community core.
Based on operator call big data, the invention applies the Louvain algorithm and the Dijkstra algorithm to judge the centrality of nodes and calculate their volatility, thereby realizing automatic community division and automatic identification of community core personnel. The community structure obtained by the algorithm is hierarchical: the new graph obtained after each round of calculation is the result of discovering several finer communities within a larger community, and this hierarchy is a natural attribute of the network, so that researchers can gain a deep understanding of the internal structure and formation mechanism of a given community. The invention uses the Louvain algorithm, and the whole process is unsupervised: the final structure depends entirely on algorithmic clustering, and no classes need to be preset manually. The algorithm performs well: compared with some classical community classification algorithms, it places almost no upper limit on the size of the graph and converges quickly after several iterations, so it is highly efficient.
In another embodiment of the invention, the target user group under a group number is extracted according to the group number, only users whose mobile phone status is valid are retained, and the user call detail record table is then associated; finally, the communication data of the target users is extracted. The test data covers a period of one month, and the statistical period can subsequently be chosen according to specific conditions.
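The extraction step above can be sketched as follows. This is only an illustration under stated assumptions: the names `CallRecord` and `build_call_sequence`, the choice to keep only calls whose two endpoints are both valid group members, and the aggregation into per-pair call counts and total durations are hypothetical and are not specified by the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One call record: user identification, called number, call duration."""
    user_id: str
    called_number: str
    duration_s: int


def build_call_sequence(records, valid_numbers):
    """Aggregate calls between valid group members per unordered pair.

    Returns {(number_a, number_b): (call_count, total_duration_s)}.
    Assumption: calls touching an invalid or out-of-group number are dropped.
    """
    stats = defaultdict(lambda: [0, 0])
    for rec in records:
        if rec.user_id not in valid_numbers or rec.called_number not in valid_numbers:
            continue  # filter numbers whose status is not valid
        if rec.user_id == rec.called_number:
            continue  # ignore self-calls
        pair = tuple(sorted((rec.user_id, rec.called_number)))
        stats[pair][0] += 1
        stats[pair][1] += rec.duration_s
    return {pair: (count, total) for pair, (count, total) in stats.items()}


# Hypothetical sample data for one statistical period.
sample_records = [
    CallRecord("u1", "u2", 60),
    CallRecord("u2", "u1", 30),
    CallRecord("u1", "u3", 10),
    CallRecord("u1", "u9", 99),  # u9 is not a valid group member
]
call_sequence = build_call_sequence(sample_records, {"u1", "u2", "u3"})
```

The per-pair count and total duration are the two quantities that the edge weight defined later in this embodiment depends on.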
Community discovery is performed as shown in FIG. 4:
1) Initialization: each node in the network is assumed to belong to its own community.
2) For any node i, calculate the change ΔQ of the Q value of the whole network after i is merged into each adjacent community, and merge i into the community with the largest ΔQ; if the largest ΔQ is negative, the community of i is not changed. ΔQ can be computed in a simplified form from three quantities: $k_{i,in}$, the sum of the weights of the edges from node i into community C; $\Sigma_{tot}$, the total weight of the edges incident to community C; and $k_i$, the total weight of the edges incident to node i.
3) Repeat step 2 until the Q value no longer changes, that is, until moving any node to another adjacent community in the network cannot improve ΔQ and no node in the current network moves any more.
4) Merge communities: this step compresses the original graph. Each community obtained in the previous steps becomes a node of the new graph, and each edge of the new graph is given, as its new weight, the sum of the edge weights of all corresponding node pairs in the original communities.
The above steps comprise two stages: finding the optimal solution of the Q value, and merging the communities obtained in this round of division to obtain a new graph. The two stages together are called one round. After one round is finished, the algorithm automatically enters the first stage of the next round. After several iterations, the Q value of the resulting network no longer increases; at that point the network has been aggregated into several small communities with dense internal connections and sparse external connections, and the algorithm ends.
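The two stages can be sketched in a few lines. This is a deliberately naive illustration, not the implementation used by the invention: it recomputes the modularity Q from scratch for every trial move instead of using a simplified ΔQ, and it assumes the standard modularity formula with a symmetric weighted adjacency dictionary. All function names are hypothetical.

```python
from collections import defaultdict


def modularity(adj, com):
    """Q for a symmetric weighted graph adj[node][neighbor] = weight."""
    k = {i: sum(adj[i].values()) for i in adj}
    two_m = sum(k.values())
    q = 0.0
    for i in adj:
        for j in adj:
            if com[i] == com[j]:
                q += adj[i].get(j, 0.0) - k[i] * k[j] / two_m
    return q / two_m


def local_moving(adj):
    """Stage 1: move nodes between adjacent communities while Q increases."""
    com = {n: n for n in adj}  # every node starts in its own community
    improved = True
    while improved:
        improved = False
        for i in adj:
            best_c, best_q = com[i], modularity(adj, com)
            for c in {com[j] for j in adj[i]} - {com[i]}:
                trial = dict(com)
                trial[i] = c
                q = modularity(adj, trial)
                if q > best_q + 1e-12:  # only accept a strict improvement
                    best_c, best_q = c, q
            if best_c != com[i]:
                com[i] = best_c
                improved = True
    return com


def aggregate(adj, com):
    """Stage 2: compress each community into one node and sum edge weights."""
    new_adj = defaultdict(lambda: defaultdict(float))
    for i in adj:
        for j, w in adj[i].items():
            new_adj[com[i]][com[j]] += w  # internal edges become self-loops
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}


def louvain(adj):
    """Repeat rounds of (stage 1, stage 2) until no community merges."""
    membership = {n: n for n in adj}
    while True:
        com = local_moving(adj)
        if len(set(com.values())) == len(adj):
            return membership
        membership = {n: com[c] for n, c in membership.items()}
        adj = aggregate(adj, com)


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy = defaultdict(dict)
for a, b in edges:
    toy[a][b] = toy[b][a] = 1.0
membership = louvain(dict(toy))
```

On this toy graph the sketch groups each triangle into its own community, which matches the intuition of dense internal and sparse external connections.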
It should be noted that the Louvain algorithm is a community discovery algorithm based on modularity, and the algorithm performs well in efficiency and effect, and can discover a hierarchical community structure, and the optimization goal is to maximize the modularity of the whole community network.
Modularity is defined as:
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{i,j} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
$$A_{i,j} = freq_{i,j} \cdot \log\left(\sum time\right)$$
wherein $A_{i,j}$ represents the weight of the edge connecting nodes i and j; $freq_{i,j}$ represents the frequency of calls between nodes i and j; $time$ represents the duration of each call; m represents the number of edges in the network;
$k_i$ represents the sum of the weights of all edges connected to node i; $c_i$ is the community to which node i belongs; and $\delta(c_i, c_j)$ takes the value 1 when its two arguments are the same, and 0 otherwise.
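A small numerical sketch may make these definitions concrete. It assumes the natural logarithm for the edge weight (the source does not state the base) and, for a weighted graph, takes 2m as the sum of all node strengths $k_i$. The function names are hypothetical.

```python
import math


def edge_weight(freq, durations):
    """A_ij = freq_ij * log(sum of call durations); natural log is assumed."""
    return freq * math.log(sum(durations))


def modularity(adj, community):
    """Q = 1/(2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j).

    adj is a symmetric dict: adj[i][j] = A_ij.
    community maps each node to its community label.
    """
    k = {i: sum(adj[i].values()) for i in adj}   # k_i: strength of node i
    two_m = sum(k.values())                      # 2m: total strength
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:     # delta(c_i, c_j) = 1
                q += adj[i].get(j, 0.0) - k[i] * k[j] / two_m
    return q / two_m


# Hypothetical example: 3 calls lasting 60 s, 120 s and 20 s.
w = edge_weight(3, [60, 120, 20])

# Two unit-weight triangles joined by the edge 2-3.
adj = {
    0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
    3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1},
}
q_split = modularity(adj, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
q_single = modularity(adj, {n: "A" for n in adj})
```

Splitting the toy graph into its two triangles gives a higher Q than placing every node in one community, which is the behaviour the optimization goal relies on. Note that the logarithm makes the weight non-positive when the total duration is at most one time unit, so a real pipeline would need to decide how to handle such pairs.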
The community structure obtained by the Louvain algorithm is hierarchical: the new graph obtained after each round of calculation is the result of discovering several finer communities within a larger community, and this hierarchy is a natural attribute of the network, so that researchers can gain a deep understanding of the internal structure and formation mechanism of a given community. The whole calculation process of the algorithm is unsupervised: the final structure depends entirely on algorithmic clustering, and no classes need to be preset manually. The algorithm performs well: compared with some classical community classification algorithms, the Louvain algorithm places almost no upper limit on the size of the graph and converges quickly after several iterations.
Core members are mined from the social network graph of the community; the problem is abstracted as mining the central nodes of a complex network graph.
In a complex network, the connection mechanism can be measured through the centrality of nodes, which gives a reasonable explanation of real-world phenomena. Studies of complex networks use different definitions of centrality: degree centrality, node betweenness centrality, closeness centrality, edge betweenness centrality, eigenvector centrality, and the like.
The embodiment of the invention uses node betweenness centrality and edge betweenness centrality; in addition, a custom measure of the volatility of a node's positional centrality is defined as needed.
First, the shortest paths of the graph are computed. A shortest path is, among all paths that start from one vertex of the graph and travel along edges of the graph to another vertex, the one whose edge weights have the smallest sum.
In the embodiment of the invention, the Dijkstra algorithm is used to calculate the shortest paths from one node to all other nodes. Its main characteristic is that it expands outward layer by layer from the starting point until the end point is reached.
The algorithm comprises the following steps:
1. Initially, let $S = \{v_0\}$ and $T = \{\text{other nodes}\}$.
2. Calculate the distance from $v_0$ to every vertex $v_i$ in T: if there is an arc between $v_0$ and $v_i$, the distance value is the weight on the arc; if there is no arc between them, the distance value is infinite.
3. Select the vertex w with the minimum distance value from T, and add w to the set S.
4. Modify the distance values of the vertices remaining in T: if adding w as an intermediate vertex shortens the distance from $v_0$ to $v_i$, modify that distance value.
5. Repeat steps 3 and 4 until all vertices are contained in S.
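The steps above can be sketched as follows. This version uses a binary heap to pick the closest vertex of T, which is a common implementation choice rather than something the source prescribes; the graph representation and the toy data are assumptions made for illustration.

```python
import heapq
import math


def dijkstra(graph, source):
    """Return the shortest distance from `source` to every node.

    `graph` maps node -> {neighbor: non-negative edge weight}.
    """
    # Step 1-2: the source has distance 0; every other vertex of T starts at
    # infinity and is lowered when an arc to it is examined.
    dist = {node: math.inf for node in graph}
    dist[source] = 0
    settled = set()          # the set S of finished vertices
    heap = [(0, source)]     # candidates ordered by tentative distance

    while heap:
        d, w = heapq.heappop(heap)   # step 3: closest vertex w not yet in S
        if w in settled:
            continue                 # stale heap entry, already finished
        settled.add(w)               # move w into S
        for v, weight in graph[w].items():
            # Step 4: does using w as an intermediate vertex shorten the path?
            if v not in settled and d + weight < dist[v]:
                dist[v] = d + weight
                heapq.heappush(heap, (dist[v], v))
    return dist                      # step 5: loop ends when S holds everything


# Hypothetical toy community; weights are illustrative distances.
toy_graph = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 2, "d": 5},
    "c": {"a": 4, "b": 2, "d": 1},
    "d": {"b": 5, "c": 1},
}
distances = dijkstra(toy_graph, "a")
```

In the toy graph the direct arc from `a` to `c` has weight 4, but the path through `b` costs 3, so the distance to `c` is lowered when `b` is added to S.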
The process of calculating the centrality of each node comprises:
for each pair of nodes (s, t) within the community, calculating all shortest paths between them;
for each pair of nodes (s, t) in the community, judging whether the node v is on the solved shortest path;
accumulating over the shortest paths, and calculating the node betweenness centrality of the node v:
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through node v;
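As a rough illustration of the node betweenness computation, the sketch below counts shortest paths with a Dijkstra variant and then sums $\sigma_{st}(v) / \sigma_{st}$ over unordered pairs (s, t). It assumes an undirected graph with positive integer weights (so that distance equality tests are exact); the function names and the star-shaped toy graph are hypothetical.

```python
import heapq
import math
from itertools import combinations


def shortest_path_counts(graph, source):
    """Dijkstra that also counts shortest paths; returns (dist, sigma)."""
    dist = {n: math.inf for n in graph}
    sigma = {n: 0 for n in graph}
    dist[source], sigma[source] = 0, 1
    heap, done = [(0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:            # strictly shorter path found
                dist[v], sigma[v] = nd, sigma[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:         # another path of equal length
                sigma[v] += sigma[u]
    return dist, sigma


def node_betweenness(graph):
    """Sum of sigma_st(v) / sigma_st over unordered pairs s, t != v."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if sigma_s[t] == 0:
            continue                    # t is unreachable from s
        for v in graph:
            if v == s or v == t:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff d(s,v) + d(v,t) == d(s,t);
            # the number of such paths is sigma_sv * sigma_vt.
            if dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score


# Hypothetical star-shaped community: every leaf only talks to "hub".
star = {
    "hub": {"x": 1, "y": 1, "z": 1},
    "x": {"hub": 1}, "y": {"hub": 1}, "z": {"hub": 1},
}
betweenness = node_betweenness(star)
```

Every shortest path between two leaves passes through the hub, so the hub collects one unit per leaf pair while the leaves score zero.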
the process of calculating the centrality of each node further comprises: calculating the betweenness centrality of each edge:
calculating all shortest paths between node pairs in the community;
judging whether the edge e is on the shortest path;
accumulating over the shortest paths to obtain the betweenness centrality of the edge e:
$$C_B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t in graph G, and $\sigma_{st}(e)$ is the number of those shortest paths that contain edge e;
and calculating the betweenness centrality of all edges.
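The edge betweenness computation can be sketched in the same way. The sketch assumes an undirected graph with positive integer weights and sums over unordered pairs (s, t); the names and the four-node ring used as test data are hypothetical.

```python
import heapq
import math
from itertools import combinations


def shortest_path_counts(graph, source):
    """Dijkstra that also counts shortest paths; returns (dist, sigma)."""
    dist = {n: math.inf for n in graph}
    sigma = {n: 0 for n in graph}
    dist[source], sigma[source] = 0, 1
    heap, done = [(0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v], sigma[v] = nd, sigma[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(graph):
    """Sum of sigma_st(e) / sigma_st over unordered pairs (s, t)."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    score = {frozenset((u, w)): 0.0 for u in graph for w in graph[u]}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if sigma_s[t] == 0:
            continue  # t is unreachable from s
        for u in graph:
            for w, weight in graph[u].items():
                dist_w, sigma_w = info[w]
                # Edge traversed as u -> w lies on a shortest s-t path iff
                # d(s,u) + weight + d(w,t) == d(s,t).
                if dist_s[u] + weight + dist_w[t] == dist_s[t]:
                    score[frozenset((u, w))] += (
                        sigma_s[u] * sigma_w[t] / sigma_s[t]
                    )
    return score


# Hypothetical ring a-b-c-d-a with unit weights.
ring = {
    "a": {"b": 1, "d": 1},
    "b": {"a": 1, "c": 1},
    "c": {"b": 1, "d": 1},
    "d": {"c": 1, "a": 1},
}
edge_scores = edge_betweenness(ring)
```

In the ring, each edge carries the whole path between its own endpoints and half of the paths between each pair of opposite nodes, so all four edges receive the same score.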
S50: calculating the volatility of each central node, wherein the central node with small volatility is the community core of the target user group. The volatility is obtained by calculating the standard deviation of node v with respect to all other nodes in the network: the smaller the standard deviation, the smaller the volatility and the closer the node is to the center, and such a node is the community core.
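The source does not spell out which quantity the standard deviation is taken over. The sketch below assumes one plausible reading: the volatility of node v is the population standard deviation of the shortest-path distances from v to all other nodes of a connected community. Both this interpretation and the function names are assumptions, not the definition used by the invention.

```python
import heapq
import math
import statistics


def dijkstra(graph, source):
    """Shortest distance from `source` to every node (non-negative weights)."""
    dist = {n: math.inf for n in graph}
    dist[source] = 0
    heap, done = [(0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def volatility(graph, v):
    """Assumed definition: std. dev. of distances from v to all other nodes."""
    dist = dijkstra(graph, v)
    others = [d for node, d in dist.items() if node != v]
    return statistics.pstdev(others)  # requires a connected community


def community_core(graph, central_nodes):
    """Pick the central node with the smallest volatility."""
    return min(central_nodes, key=lambda v: volatility(graph, v))


# Hypothetical star-shaped community with unit weights.
star = {
    "hub": {"x": 1, "y": 1, "z": 1},
    "x": {"hub": 1}, "y": {"hub": 1}, "z": {"hub": 1},
}
vol = {v: volatility(star, v) for v in star}
core = community_core(star, list(star))
```

Under this reading the hub is equally close to every other node, so its volatility is zero, while each leaf is one step from the hub and two steps from the other leaves.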
The invention also provides a system for mining the community core character, which comprises the following steps:
an acquisition module: acquiring a target user group under a target group number, and acquiring communication data in the target user group;
a cleaning module: cleaning and converting the communication data to construct a target user communication sequence;
a dividing module: dividing the community structure of the target user group by using a Louvain algorithm through the communication data;
a central node module: making shortest paths among all nodes of the community through a Dijkstra algorithm, calculating the centrality of each node, and exploring a central node of a network graph;
a core mining module: and calculating the volatility of each central node, wherein the central node with small volatility is the community core of the target user group.
The system for mining the community core people can also realize the community core people mining method.
The invention also provides electronic equipment for mining the community core characters, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the steps of the community core character mining method are realized when the processor executes the program.
The invention also proposes a readable storage medium for community core persona mining, on which a computer program is stored, the computer program being executed by a processor for implementing the steps of the community core persona mining method described above.
Based on operator call big data, the invention applies the Louvain algorithm and the Dijkstra algorithm to judge node centrality, realizing automatic community division and automatic identification of community core personnel. At the same time, the results obtained by the algorithm are hierarchical: the new graph obtained after each round of calculation is the result of discovering several finer communities within a larger community, and this hierarchy is a natural attribute of the network, so that researchers can gain a deep understanding of the internal structure and formation mechanism of a given community.
The invention brings several beneficial effects. The community structure obtained by the algorithm is hierarchical, as described above, which helps researchers understand the internal structure and formation mechanism of a given community. The invention uses the Louvain algorithm, and the whole process is unsupervised: the final structure depends entirely on algorithmic clustering, and no classes need to be preset manually. The algorithm performs well: compared with some classical community classification algorithms, the Louvain algorithm places almost no upper limit on the size of the graph and converges quickly after several iterations, so it is highly efficient.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and such variants still fall within the scope of protection of the invention.
Claims (10)
1. A community core character mining method, characterized by comprising the following steps:
acquiring a target user group under a target group number, and acquiring the communication data within the target user group;
cleaning and converting the communication data to construct a target user communication sequence;
dividing the target user group into a community structure by applying the Louvain algorithm to the communication data;
computing the shortest paths between all nodes of each community using Dijkstra's algorithm, calculating the centrality of each node, and identifying the central nodes of the network graph;
and calculating the volatility of each central node, wherein the central node with low volatility is the community core of the target user group.
2. The community core character mining method according to claim 1, wherein the process of dividing the target user group into a community structure by applying the Louvain algorithm to the communication data comprises:
S31: taking each target user as a node, each node initially belonging to its own community;
S32: for any node i, calculating the change $\Delta Q$ in the modularity of the whole network when i is merged into each adjacent community, and merging i into the community with the largest $\Delta Q$; if the calculated $\Delta Q$ is negative, leaving the community of i unchanged;
S33: repeating step S32 until moving any node to another adjacent community in the network can no longer improve the modularity;
S34: merging communities: compressing each obtained community into a single node, and assigning to each edge between community nodes, as its new weight, the sum of the edge weights of all node pairs in the corresponding original communities.
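By way of illustration only, steps S31 to S34 can be sketched in Python as follows. The sketch is not taken from the specification: the function names `louvain_local_moves` and `aggregate`, the `(u, v, weight)` edge-list input, and the use of a gain that is proportional to $\Delta Q$ (computed after temporarily taking the node out of its current community) are assumptions made for the example.

```python
from collections import defaultdict


def louvain_local_moves(edges):
    """Steps S31-S33 on an undirected weighted graph.

    edges: list of (u, v, weight). Returns {node: community label}.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        # A self-loop (u == v) is added twice, so it counts twice in the degree.
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    degree = {n: sum(nb.values()) for n, nb in adj.items()}
    two_m = sum(degree.values())              # twice the total edge weight
    community = {n: n for n in adj}           # S31: one community per node
    sigma_tot = dict(degree)                  # total degree of each community
    improved = True
    while improved:                           # S33: repeat until no node moves
        improved = False
        for node in sorted(adj):
            k_i = degree[node]
            old = community[node]
            links = defaultdict(float)        # weight from node to each community
            for nb, w in adj[node].items():
                if nb != node:
                    links[community[nb]] += w
            sigma_tot[old] -= k_i             # temporarily remove the node
            # gain is proportional to delta Q for joining community c (S32)
            best = old
            best_gain = links.get(old, 0.0) - sigma_tot[old] * k_i / two_m
            for c, k_in in links.items():
                gain = k_in - sigma_tot[c] * k_i / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            sigma_tot[best] += k_i
            community[node] = best
            if best != old:
                improved = True
    return community


def aggregate(edges, community):
    """Step S34: compress each community into one node and sum edge weights."""
    merged = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        key = (cu, cv) if cu <= cv else (cv, cu)
        merged[key] += w                      # intra-community edges become self-loops
    return [(a, b, w) for (a, b), w in merged.items()]
```

Calling `louvain_local_moves` again on the output of `aggregate` would give the next, coarser level of the hierarchy; stopping once no node changes community is one possible termination rule.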
3. The community core character mining method according to claim 1 or 2, wherein the process of computing the shortest paths between the community nodes using Dijkstra's algorithm comprises:
S41: for each node in the community taken as the source vertex $v_0$, letting $S = \{v_0\}$ and $T = \{\text{all other vertices}\}$;
S42: calculating the distance from $v_0$ to every vertex $v_i$ in T: if there is an arc between $v_0$ and $v_i$, the distance value of $v_i$ is the weight of that arc; if there is no such arc, the distance value of $v_i$ is infinite;
S43: selecting from T the vertex w with the smallest distance value, and adding w to the set S;
S44: updating the distance values of the vertices remaining in T: if adding w as an intermediate vertex shortens the distance from $v_0$ to $v_i$, updating that distance value;
S45: repeating steps S43 and S44 until S contains all vertices.
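Steps S41 to S45 can be sketched in Python as below. This is only an illustration under stated assumptions: the adjacency-dictionary input `adj`, the function name `dijkstra`, and the use of a binary heap to find the vertex with the smallest distance value in S43 are choices made for the example. Edge weights are assumed to be non-negative distances, and how call data is turned into such distances is not addressed here.

```python
import heapq


def dijkstra(adj, source):
    """Single-source shortest distances.

    adj: {node: {neighbor: non-negative weight}}, every node present as a key.
    Returns {node: distance}, with float("inf") for unreachable nodes.
    """
    # S41/S42: the source has distance 0, every other vertex starts at infinity
    dist = {node: float("inf") for node in adj}
    dist[source] = 0.0
    settled = set()                       # the set S of finished vertices
    heap = [(0.0, source)]
    while heap:                           # S45: loop until all reachable vertices are in S
        d, w = heapq.heappop(heap)        # S43: vertex with the smallest distance value
        if w in settled:
            continue                      # stale heap entry
        settled.add(w)
        for v, weight in adj[w].items():  # S44: relax distances through w
            if d + weight < dist[v]:
                dist[v] = d + weight
                heapq.heappush(heap, (dist[v], v))
    return dist
```

Running `dijkstra` once per node of a community yields the all-pairs distances that the later centrality and volatility steps rely on.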
4. The community core character mining method according to claim 3, wherein the process of calculating the centrality of each node comprises:
for each pair of nodes (s, t) within the community, calculating all shortest paths between them;
for each pair of nodes (s, t) in the community, determining whether node v lies on any of the computed shortest paths;
accumulating over the shortest paths, and calculating the node betweenness centrality of node v:
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t, and $\sigma_{st}(v)$ is the number of those shortest paths that pass through node v;
and calculating the node betweenness centrality of all the nodes.
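The accumulation of the ratio $\sigma_{st}(v)/\sigma_{st}$ over node pairs can be sketched in Python as follows. Rather than enumerating every shortest path explicitly, the sketch uses Brandes' accumulation scheme on top of a Dijkstra search; that choice, the function name `betweenness`, the adjacency-dictionary input, and the absence of any normalization are assumptions of the example, not statements of the specification. Weights are assumed to be strictly positive, and ties between path lengths are detected by exact equality.

```python
import heapq
from collections import defaultdict


def betweenness(adj):
    """Unnormalized node betweenness of an undirected weighted graph.

    adj: {node: {neighbor: positive weight}}, symmetric.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0.0}
        sigma = defaultdict(float)        # number of shortest s-v paths
        sigma[s] = 1.0
        preds = defaultdict(list)         # predecessors on shortest paths
        order = []                        # vertices in order of increasing distance
        done = set()
        heap = [(0.0, s)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue
            done.add(v)
            order.append(v)
            for w, weight in adj[v].items():
                nd = d + weight
                if w not in dist or nd < dist[w]:
                    dist[w] = nd          # strictly shorter path found
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
                elif nd == dist[w] and w not in done:
                    sigma[w] += sigma[v]  # another shortest path of equal length
                    preds[w].append(v)
        delta = defaultdict(float)        # dependency of s on each vertex
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # each unordered pair (s, t) was counted once from s and once from t
    return {v: c / 2.0 for v, c in cb.items()}
```

On a three-node path a-b-c, for instance, only the pair (a, c) has an intermediate node, so b receives a value of 1 and the two end nodes receive 0.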
5. The community core character mining method according to claim 3, wherein the process of calculating the centrality of each node further comprises: calculating the betweenness centrality of each edge:
calculating all shortest paths between node pairs in the community;
judging whether the edge e is on the shortest path;
accumulating over the shortest paths to obtain the betweenness centrality of edge e:
$$C_B(e) = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}$$
wherein $\sigma_{st}$ is the number of shortest paths from s to t in graph G, and $\sigma_{st}(e)$ is the number of those shortest paths that contain edge e;
and calculating the betweenness centrality of all edges.
6. The community core character mining method according to claim 5, wherein the process of calculating the volatility of each of the central nodes comprises:
calculating the standard deviation of node v with respect to all other nodes in the network:
wherein:
the smaller the standard deviation, the smaller the volatility and the closer the node is to the center; such a node is the community core.
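One possible reading of this step can be sketched in Python as follows. Because the exact expression is not reproduced in the text above, the sketch assumes that the volatility of node v is the population standard deviation of the shortest-path distances from v to all other reachable nodes; that interpretation, the function names `volatility` and `community_core`, and the input layout are all assumptions of the example.

```python
import math
import statistics


def volatility(distances, v):
    """Standard deviation of the distances from v to the other reachable nodes.

    distances: {node: shortest-path distance from v}. Assumes at least one
    other node is reachable; otherwise statistics.pstdev raises an error.
    """
    reachable = [d for node, d in distances.items()
                 if node != v and not math.isinf(d)]
    return statistics.pstdev(reachable)


def community_core(all_distances, candidates):
    """Return the candidate central node with the smallest volatility.

    all_distances: {v: {node: distance from v}} for every candidate v.
    """
    return min(candidates, key=lambda v: volatility(all_distances[v], v))
```

Whether the population or the sample standard deviation is intended is not stated here; switching to `statistics.stdev` would give the sample variant without changing the ranking logic.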
7. The community core character mining method according to claim 1, wherein the process of acquiring the target user group under the target group number and acquiring the communication data within the target user group comprises: filtering out the numbers of target users whose number state is invalid, joining the target users with the call detail record table, acquiring the call records within one month of the target users whose state is valid, and generating a call record sequence of the target users, wherein each call record comprises: user identification, called number and call duration.
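The filtering and conversion described here can be sketched in Python as follows. The record layout `(user_id, called_number, duration)`, the status string `"valid"`, the function name `build_call_edges`, and the choice of total call duration as the edge weight are hypothetical and serve only to illustrate the idea; the records are assumed to be restricted to the one-month window already.

```python
from collections import defaultdict


def build_call_edges(users, call_records):
    """Turn call records into weighted, undirected edges inside the user group.

    users: {user_id: status string}, where "valid" marks a valid number
           (hypothetical encoding).
    call_records: iterable of (user_id, called_number, duration_in_seconds).
    Returns a sorted list of (u, v, total_duration).
    """
    valid = {u for u, status in users.items() if status == "valid"}
    weights = defaultdict(float)
    for caller, callee, duration in call_records:
        # keep only calls between two valid members of the target user group
        if caller not in valid or callee not in valid or caller == callee:
            continue
        if duration <= 0:
            continue                       # drop empty or malformed records
        key = tuple(sorted((caller, callee)))
        weights[key] += duration           # accumulate both call directions
    return [(u, v, w) for (u, v), w in sorted(weights.items())]
```

The resulting edge list has the same `(u, v, weight)` shape that a community-division step would typically consume.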
8. A system for community core character mining, comprising:
an acquisition module, configured to acquire a target user group under a target group number and to acquire the communication data within the target user group;
a cleaning module, configured to clean and convert the communication data to construct a target user communication sequence;
a dividing module, configured to divide the target user group into a community structure by applying the Louvain algorithm to the communication data;
a central node module, configured to compute the shortest paths between all nodes of each community using Dijkstra's algorithm, calculate the centrality of each node, and identify the central nodes of the network graph;
a core mining module, configured to calculate the volatility of each central node, wherein the central node with low volatility is the community core of the target user group.
9. An electronic device for community core character mining, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, performs the steps of the community core character mining method of any one of claims 1-7.
10. A readable storage medium for community core character mining, having a computer program stored thereon, characterized in that the computer program, when executed by a processor, performs the steps of the community core character mining method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910914473.5A CN110825935A (en) | 2019-09-26 | 2019-09-26 | Community core character mining method, system, electronic equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910914473.5A CN110825935A (en) | 2019-09-26 | 2019-09-26 | Community core character mining method, system, electronic equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110825935A (en) | 2020-02-21 |
Family ID: 69548395
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910914473.5A (Pending), published as CN110825935A (en) | 2019-09-26 | 2019-09-26 | Community core character mining method, system, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110825935A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111949835A (en) * | 2020-07-13 | 2020-11-17 | 北京明略软件系统有限公司 | Data processing method and device |
CN112100427A (en) * | 2020-09-03 | 2020-12-18 | Oppo广东移动通信有限公司 | Video processing method and device, electronic equipment and storage medium |
CN113761080A (en) * | 2021-04-01 | 2021-12-07 | 京东城市(北京)数字科技有限公司 | Community division method, device, equipment and storage medium |
CN114547143A (en) * | 2022-02-15 | 2022-05-27 | 支付宝(杭州)信息技术有限公司 | Core business object mining method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102202012A (en) * | 2011-05-30 | 2011-09-28 | 中国人民解放军总参谋部第五十四研究所 | Group dividing method and system of communication network |
CN103020302A (en) * | 2012-12-31 | 2013-04-03 | 中国科学院自动化研究所 | Academic core author excavation and related information extraction method and system based on complex network |
CN108509551A (en) * | 2018-03-19 | 2018-09-07 | 西北大学 | A kind of micro blog network key user digging system under the environment based on Spark and method |
CN108509607A (en) * | 2018-04-03 | 2018-09-07 | 三盟科技股份有限公司 | A kind of community discovery method and system based on Louvain algorithms |
US20190044821A1 (en) * | 2017-08-01 | 2019-02-07 | Elsevier, Inc. | Systems and methods for extracting structure from large, dense, and noisy networks |
Non-Patent Citations (2)
Title |
---|
张玉琢: "《数据结构实验教程》", 31 August 2018 * |
杨济海: "基于复杂网络的电力通信网拓扑分析与优化", 《计算机与数字工程》 * |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110825935A (en) | Community core character mining method, system, electronic equipment and readable storage medium | |
CN109978142A (en) | The compression method and device of neural network model | |
CN105592405B (en) | The mobile communication subscriber group configuration method propagated based on factions' filtering and label | |
CN107784327A (en) | A kind of personalized community discovery method based on GN | |
CN113190939B (en) | Large sparse complex network topology analysis and simplification method based on polygon coefficient | |
CN104391879B (en) | The method and device of hierarchical clustering | |
CN110913405B (en) | Intelligent communication system testing method and system based on scene grading and evaluation feedback | |
Barazandeh et al. | A decentralized adaptive momentum method for solving a class of min-max optimization problems | |
Hennessey et al. | A simplification algorithm for visualizing the structure of complex graphs | |
CN116012161A (en) | Risk analysis method, device and equipment for user group | |
CN115309985A (en) | Fairness evaluation method and AI model selection method of recommendation algorithm | |
Han et al. | Opportunistic coded distributed computing: An evolutionary game approach | |
CN108738028A (en) | A kind of cluster-dividing method that super-intensive group is off the net | |
CN111339376B (en) | Method and device for clustering network nodes | |
CN103051476A (en) | Topology analysis-based network community discovery method | |
CN113626657A (en) | Method for discovering densely connected sub-networks by multi-value attribute graph structure | |
CN108737158B (en) | Social network hierarchical community discovery method and system based on minimum spanning tree | |
CN113657136A (en) | Identification method and device | |
CN109886313A (en) | A kind of Dynamic Graph clustering method based on density peak | |
US20140126820A1 (en) | Local Image Translating Method and Terminal with Touch Screen | |
Beddar-Wiesing | Student Research Abstract: Using Local Activity Encoding for Dynamic Graph Pooling in Stuctural-Dynamic Graphs | |
Beddar-Wiesing | Using local activity encoding for dynamic graph pooling in stuctural-dynamic graphs: student research abstract | |
CN115086179B (en) | Detection method for community structure in social network | |
CN112256924A (en) | Social network structure identification method based on form concept interestingness | |
Baskakov et al. | Modeling of the Multiple Paths Finding Algorithm for Software-Defined Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200221 |