CN110213164B

CN110213164B - Method and device for identifying network key propagator based on topology information fusion

Info

Publication number: CN110213164B
Application number: CN201910423580.8A
Authority: CN
Inventors: 钱琳; 梅竹; 俞俊; 朱广新; 庞恒茂; 许明杰; 王琳; 梅峰; 王剑; 陈海洋
Original assignee: State Grid Zhejiang Electric Power Co Ltd; NARI Group Corp; Nari Technology Co Ltd; Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Current assignee: State Grid Zhejiang Electric Power Co Ltd; NARI Group Corp; Nari Technology Co Ltd; Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2021-06-08
Anticipated expiration: 2039-05-21
Also published as: CN110213164A

Abstract

The invention discloses a method and a device for identifying network key propagators based on topology information fusion. The method comprises: calculating the degree of each node in the network from the number of its directly connected edges, that is, the number of its neighbor nodes; calculating the H index of each node from the node degrees; and calculating the shortest distance between each pair of nodes in the network, from which the ranking score of each node is obtained. By combining commonly used node indexes, the method obtains a ranking score that comprehensively reflects the role of each node in the social network, so that the role of each node can be determined more accurately, key propagators in the social network can be identified accurately, and the misjudgment rate is reduced. In addition, the Dijkstra algorithm is used, which has a lower time complexity than other distance algorithms, so the time cost is lower when the method is applied to a complex social network and identification efficiency is improved.

Description

Method and device for identifying network key propagator based on topology information fusion

Technical Field

The invention relates to the field of network information mining, in particular to a method and a device for identifying a network key propagator based on topology information fusion.

Background

Due to the heterogeneity of social networks, the roles played by individuals in the structure and function of the network vary greatly. Key propagators are special individuals who can affect the structure and function of the social network to a greater extent. For example, a verified Weibo influencer (a "big V") may accelerate the spread of rumors in a social network. Key propagators therefore need to be accurately identified among a large number of users so that the propagation of information in the social network can be better controlled, and the key to solving this problem lies in the design of a node ranking method for social networks.

Currently, most node ranking methods use only the structural information of the network and fall mainly into two classes: neighbor-based centrality and path-based centrality. Representative neighbor-based measures include degree centrality, the H index and k-shell decomposition. Representative path-based measures include closeness centrality and betweenness centrality. However, these classical node ranking methods cannot comprehensively evaluate the roles of nodes in the network.

Disclosure of Invention

Purpose of the invention: to overcome the deficiencies of the prior art, the invention provides a method for identifying network key propagators based on topology information fusion, which addresses the incomplete and inaccurate evaluation of the roles of nodes in a network. The invention also provides a device for identifying network key propagators based on topology information fusion.

Technical solution: the method for identifying network key propagators based on topology information fusion according to the invention comprises the following steps:

acquiring friend list data in social software, with individuals as nodes, wherein if two individuals are friends, a directly connected edge exists between the two corresponding nodes, so that the number of directly connected edges of each node is obtained;

determining the degree of each node in the network according to the number of the directly connected edges of each node;

calculating an H index of each node according to the degree of the node, wherein the H index is used for representing the direct influence of individuals in the social network;

and calculating the shortest distance between each pair of nodes in the network, and calculating the ranking score of each node according to the H index and the shortest distance so as to obtain the key propagators in the network, wherein the shortest distance between nodes is used for representing the propagation position of an individual in the social network.

Further, comprising:

calculating the H index of each node according to the degree of the node specifically includes:

counting the degrees of the neighbor nodes of the node and, starting with h equal to 1, judging whether at least one neighbor node has a degree not less than 1;

if not, the loop stops and the H index is equal to 1;

if so, h is increased by 1 and the loop iterates until h is the largest value for which h neighbor nodes have a degree not less than h, wherein h is not less than 1 and not more than the number of neighbor nodes.

Further, comprising:

and calculating the shortest distance between each pair of nodes in the network by adopting a Dijkstra algorithm.

Further, comprising:

calculating the shortest distance between nodes using the Dijkstra algorithm specifically comprises:

Input: the adjacency matrix corresponding to the network.

Steps: (1) specifying a starting point s;

(2) introducing two sets S and U, wherein S records the nodes whose shortest paths have been found together with the corresponding shortest path lengths, and U records the nodes whose shortest paths have not yet been found together with their distances from the starting point s;

(3) initially, S contains only the starting point s, U contains the nodes other than s, and the path of each node in U is the path from the starting point s to that node; the node with the shortest path is then found in U and added to S, and the nodes in U and their corresponding paths are updated; this operation is repeated until all nodes have been traversed.

Output: the distance matrix between the nodes in the network.

Further, comprising:

the ranking score formula of each node is as follows:

wherein H(j) is the H index of network node j, and d_{i,j} represents the shortest distance between nodes i and j.

An apparatus for identifying network key propagators based on topology information fusion, comprising:

an acquisition module, used for acquiring friend list data in social software, with individuals as nodes, wherein if two individuals are friends, a directly connected edge exists between the two corresponding nodes, so that the number of directly connected edges of each node is obtained;

the node neighbor calculation module is used for determining the degree of each node in the network according to the number of the directly connected edges of each node;

the H index calculation module is used for calculating the H index of each node according to the degree of the node, and the H index is used for representing the direct influence of an individual in a social network;

and the score calculation module is used for calculating the shortest distance between each pair of nodes in the network, calculating the ranking corresponding score of each node according to the H index and the shortest distance, and further obtaining key propagators in the network, wherein the shortest distance between the nodes is used for representing the propagation positions of individuals in the social network.

Further, comprising:

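in the H index calculation module, calculating the H index of each node according to the degree of the node specifically includes:

counting the degrees of the neighbor nodes of the node and, starting with h equal to 1, judging whether at least one neighbor node has a degree not less than 1;

if not, the loop stops and the H index is equal to 1;

if so, h is increased by 1 and the loop iterates until h is the largest value for which h neighbor nodes have a degree not less than h, wherein h is not less than 1 and not more than the number of neighbor nodes.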

Further, comprising:

in the score calculation module, the shortest distance between each pair of nodes in the network is calculated by adopting a Dijkstra algorithm.

Further, comprising:

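calculating the shortest distance between nodes using the Dijkstra algorithm specifically comprises:

Input: the adjacency matrix corresponding to the network.

Steps: (1) specifying a starting point s;

(2) introducing two sets S and U, wherein S records the nodes whose shortest paths have been found together with the corresponding shortest path lengths, and U records the nodes whose shortest paths have not yet been found together with their distances from the starting point s;

(3) initially, S contains only the starting point s, U contains the nodes other than s, and the path of each node in U is the path from the starting point s to that node; the node with the shortest path is then found in U and added to S, and the nodes in U and their corresponding paths are updated; this operation is repeated until all nodes have been traversed.

Output: the distance matrix between the nodes in the network.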

Further, comprising:

in the score calculation module, the ranking score formula of each node is as follows:

Advantageous effects: by combining commonly used node indexes, the method obtains a ranking score that comprehensively reflects the role of each node in the social network, so that the role of each node can be determined more accurately, key propagators in the social network can be identified accurately, and the misjudgment rate is reduced. In addition, the Dijkstra algorithm is used, which has a lower time complexity than other distance algorithms, so the time cost is lower when the method is applied to a complex social network and identification efficiency is improved.

Drawings

FIG. 1 is a flow chart of an identification method in an embodiment of the invention;

FIG. 2 is a network node connection diagram according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a structure of an identification device according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The influence of a node on propagation is determined not only by the number of individuals it can directly influence but also by whether its position in the network is sufficiently central. A node ranking method therefore needs to consider both the influence of the node itself (the number of individuals with whom it is closely related in the social network) and the position of the node in the network. Classical node ranking methods do not consider these two aspects at the same time, so the roles of nodes in the network cannot be comprehensively evaluated with conventional methods.

The invention provides a method for identifying network key propagators based on topology information fusion, which uses the H index to represent the direct influence of an individual in a social network and the shortest distance between nodes to represent the propagation position of the individual in the social network, and combines the two so as to comprehensively reflect the roles of the nodes of the social network.

The present invention does not limit the types of networks involved, and as shown in fig. 1, the method for identifying a network key propagator based on topology information fusion in the embodiment of the present invention includes:

S100, collecting data in social software, with individuals as nodes, wherein if two individuals are friends, a connecting edge exists between the two corresponding nodes.

Specifically, the method is not limited to any specific social software. If direct friend relationships are recorded in the friend lists of individuals, the friend list relationships are acquired from the corresponding database of the social software. If two individuals are friends, a directly connected edge exists between them; otherwise no directly connected edge exists. Individuals who are not friends with each other influence each other through individuals who are friends with both.

S110, determining the degree of each node in the network according to the number of directly connected edges of each network node.

The nodes in the network are numbered and the number of directly connected edges of each node is determined. This number is the degree of the node, denoted K(i), where i is the number of the network node, 1 ≤ i ≤ N, and N is the total number of nodes in the network. The nodes that share a directly connected edge with a node are the neighbor nodes of that node.
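As a minimal sketch of step S110, assuming the network is held in Python as a symmetric 0/1 adjacency matrix, the degree K(i) is simply the count of nonzero entries in row i. The function name `degrees` and the four-node toy network are illustrative assumptions, not part of the patent.

```python
def degrees(adj):
    """Return K(i) for every node: the number of nonzero entries in row i."""
    return [sum(1 for w in row if w != 0) for row in adj]


# Hypothetical 4-node network: a triangle 1-2-3 with node 4 attached to node 3.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

print(degrees(toy))  # [2, 2, 3, 1]
```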

S120, calculating the H index H(i) of each node according to the degrees of the nodes, wherein the H index is used for representing the direct influence of an individual in the social network.

Specifically, the operator H(·) acts on a set of real numbers and returns a non-negative integer h, the H index of that set. Applying the operator H(·) to the degrees of all neighbors of a node gives the H index H(i) of that node. In this embodiment, the H index of a node in the social network characterizes the direct influence of the corresponding individual.

The degrees of the neighbor nodes of the node are counted and, starting with h equal to 1, it is judged whether at least one neighbor node has a degree not less than 1. If not, the loop stops and the H index is equal to 1. If so, h is increased by 1 and the loop iterates until h is the largest value for which h neighbor nodes have a degree not less than h, where h is not less than 1 and not more than the number of neighbor nodes. The H index of each node is thereby obtained.
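The iteration over h can be sketched as follows. This is an illustrative Python version that uses the standard H index definition (the largest h such that at least h values are not less than h) and sorts the neighbor degrees rather than incrementing h one test at a time. The names `h_index` and `node_h_indices` are assumptions, and returning 0 for a node with no neighbors is a choice made for this sketch.

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h (0 if none)."""
    h = 0
    for rank, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= rank:
            h = rank
        else:
            break
    return h


def node_h_indices(adj):
    """Apply h_index to the degrees of each node's neighbors."""
    deg = [sum(1 for w in row if w) for row in adj]
    return [h_index([deg[j] for j, w in enumerate(row) if w]) for row in adj]


print(h_index([4, 4, 4]))     # 3
print(h_index([3, 4, 4, 2]))  # 3

# Hypothetical 4-node network: a triangle 1-2-3 with node 4 attached to node 3.
toy = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(node_h_indices(toy))    # [2, 2, 2, 1]
```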

S130, calculating the shortest distance d_{i,j} between each pair of nodes in the network, and then calculating the ranking score of each node so as to obtain the key propagators in the network, wherein the shortest distance between nodes is used for representing the propagation position of an individual in the social network.

Further, in the embodiment of the present invention, the Dijkstra algorithm is used to calculate the shortest distance d_{i,j} between each pair of nodes, where i and j are network node numbers. The Dijkstra algorithm is a shortest path algorithm that generates paths in ascending order of path length. Specifically:

The weighted directed graph is stored as an adjacency matrix. In the calculation, an n × n two-dimensional array indexed by node number is used, and each entry of the array is the weight between the two corresponding nodes. If there is no edge between two nodes, the entry is the largest positive integer representable on the computer.

The steps are as follows: (1) The starting point s is specified, that is, the calculation starts from node s.

(2) Two sets S and U are introduced. S records the nodes whose shortest paths have been found together with the corresponding shortest path lengths, while U records the nodes whose shortest paths have not yet been found together with their distances from the starting point s.

(3) Initially, S contains only the starting point s, U contains the nodes other than s, and the path of each node in U is the path from the starting point s to that node. The node with the shortest path is then found in U and added to S, and the nodes in U and their corresponding paths are updated. This operation is repeated until all nodes have been traversed.
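The S/U procedure in steps (1) to (3) can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation: a matrix entry of 0 is taken to mean "no edge", `math.inf` stands in for the largest positive integer mentioned in the text, and the names `dijkstra` and `distance_matrix` are hypothetical.

```python
import math


def dijkstra(adj, s):
    """Shortest distances from s. adj[u][v] is the edge weight, 0 = no edge."""
    n = len(adj)
    # S: nodes whose shortest distance is final, mapped to that distance.
    S = {s: 0}
    # U: remaining nodes, mapped to their current tentative distance from s.
    U = {v: (adj[s][v] if adj[s][v] else math.inf) for v in range(n) if v != s}
    while U:
        u = min(U, key=U.get)   # node in U with the shortest path so far
        d = U.pop(u)
        S[u] = d                # move it from U to S
        if d == math.inf:
            continue            # u is unreachable, nothing to relax
        for v in U:             # update the nodes still in U
            if adj[u][v] and d + adj[u][v] < U[v]:
                U[v] = d + adj[u][v]
    return S


def distance_matrix(adj):
    """Run dijkstra from every node and collect the rows in node order."""
    n = len(adj)
    rows = []
    for s in range(n):
        found = dijkstra(adj, s)
        rows.append([found[v] for v in range(n)])
    return rows


# Hypothetical weighted network; node 3 (the last row) is isolated.
example = [
    [0, 1, 4, 0],
    [1, 0, 2, 0],
    [4, 2, 0, 0],
    [0, 0, 0, 0],
]
print(distance_matrix(example)[0])  # [0, 1, 3, inf]
```

Selecting the minimum of U by a linear scan keeps the sketch close to the description; a priority queue would be the usual choice for large networks.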

Further, in the embodiment of the present invention, the ranking score of each node is then obtained, and the formula is as follows:

According to the formula, the higher the ranking score of a node, the more important the node is, that is, the more likely it is to be a key propagator in the network.

It should be noted that the method flowchart in the embodiment of the present invention is intended to illustrate the technical solution more clearly and does not limit the technical solution provided in the embodiment of the present invention. The embodiment of the present invention is also not limited to social network applications; the technical solution provided is equally applicable to similar problems in other system structures and business applications.

The following detailed description of the embodiments of the present invention will be made with reference to the example network of fig. 2.

Step 1: the degree of each node is calculated.

Taking node 1 as an example, node 1 has 3 neighbors, so K(1) = 3. The degrees of all nodes are shown in Table 1:

TABLE 1 degrees of each node

Step 2: calculate the H index of each node. Taking node 1 as an example, the neighbors of node 1 are nodes 2, 3 and 4, whose degrees are 4, 4 and 4 respectively. All 3 neighbors of node 1 have a degree not less than 3, and node 1 has no more than 3 neighbors, so H(1) = 3.

Taking node 4 as an example, the neighbors of node 4 are nodes 1, 2, 5 and 6, whose degrees are 3, 4, 4 and 2 respectively. In the sequence {3, 4, 4, 2} there are 3 numbers (3, 4, 4) not less than 3, but there are not 4 numbers not less than 4: only 2 numbers, the degrees of node 2 and node 5, are not less than 4. At most 3 numbers are therefore not less than 3, so H(4) = 3. The H indexes of all nodes are shown in Table 2:

TABLE 2H-index of each node

Step 3: calculate the shortest distance d_{i,j} between each pair of nodes in the network. Taking node pair (1, 6) as an example, d_{1,6} = 2.

The input is the adjacency matrix of the network and the output is the distance matrix. The adjacency matrix in this embodiment is:

0 1 1 1 0 0 0 0 0 0
1 0 1 1 0 0 0 0 1 0
1 1 0 0 0 0 0 0 1 1
1 1 0 0 1 1 0 0 0 0
0 0 0 1 0 1 1 1 0 0
0 0 0 1 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0

Taking node 1 as an example, it has directly connected edges to nodes 2, 3 and 4, so the corresponding entries are 1; it has no directly connected edges to the remaining six nodes, so the corresponding entries are 0. The shortest distance matrix between all pairs of nodes is output, see Table 3:
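Steps 1 to 3 of this example can be checked with a short Python sketch. Two assumptions are made and should be read as such: the row for node 8 is taken to be identical to the row for node 7 (connected only to node 5), as symmetry with the row for node 5 requires, and breadth-first search is used in place of the Dijkstra algorithm, since both give the same distances when every edge has weight 1. All names are illustrative.

```python
from collections import deque

# Adjacency matrix of the 10-node example; row/column k is node k + 1.
ADJ = [
    [0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # assumed: node 8 mirrors node 7
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
]


def neighbors(adj, i):
    """Indices of the nodes directly connected to node index i."""
    return [j for j, w in enumerate(adj[i]) if w]


def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, v in enumerate(sorted(values, reverse=True), start=1):
        if v >= rank:
            h = rank
        else:
            break
    return h


def bfs_distances(adj, s):
    """Hop counts from s to every node (None if unreachable)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in neighbors(adj, u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [dist.get(v) for v in range(len(adj))]


N = len(ADJ)
K = [len(neighbors(ADJ, i)) for i in range(N)]                 # Step 1
H = [h_index([K[j] for j in neighbors(ADJ, i)]) for i in range(N)]  # Step 2
D = [bfs_distances(ADJ, s) for s in range(N)]                  # Step 3

print(K)        # [3, 4, 4, 4, 4, 2, 1, 1, 2, 1]
print(H)        # [3, 3, 2, 3, 2, 2, 1, 1, 2, 1]
print(D[0][5])  # 2, the distance between nodes 1 and 6
```

The printed values agree with the figures quoted in the text: K(1) = 3, H(1) = 3, H(4) = 3 and d_{1,6} = 2.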

TABLE 3 shortest distance between pairs of nodes

Step 4: calculate the ranking score of each node, wherein the higher the ranking score is, the more important the node is.

Taking the node 1 as an example,

the scores for all nodes are shown in table 4:

TABLE 4 score of each node

From the scores of table 4, a final node importance ranking can be obtained, see table 5:

TABLE 5 importance ranking of nodes

As can be seen from Table 5, node 4 is the most important. FIG. 2 shows that node 4 connects the two small communities on the left and right; because it lies on the path that must be taken between them, node 4 is the most important node. In a real social network, node 4 corresponds to an individual who belongs to both communities at the same time and cannot be replaced. If node 4 acts as a propagator, a message will spread quickly between the left and right communities.

Based on the foregoing embodiment, referring to fig. 3, in the embodiment of the present invention, an apparatus for identifying a network key propagator based on topology information fusion specifically includes:

the acquisition module 20 is configured to acquire data in social software using individuals as nodes, and if the individuals are in a friend relationship, a connecting edge exists between two corresponding nodes;

the node neighbor calculation module 21 is configured to determine the degree of each node in the network according to the number of the directly connected edges of each node;

the H index calculation module 22 is used for calculating the H index of each node according to the degree of the node, and the H index is used for representing the direct influence of individuals in the social network;

and the score calculation module 23 is configured to calculate a ranking corresponding score of each node according to the H index and the shortest distance after calculating the shortest distance between each pair of nodes in the network, so as to obtain a key propagator in the network, where the shortest distance between the nodes is used to represent a propagation position of an individual in a social network.

Further, comprising:

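in the H index calculation module 22, calculating the H index of each node according to the degree of the node specifically includes:

counting the degrees of the neighbor nodes of the node and, starting with h equal to 1, judging whether at least one neighbor node has a degree not less than 1;

if not, the loop stops and the H index is equal to 1;

if so, h is increased by 1 and the loop iterates until h is the largest value for which h neighbor nodes have a degree not less than h, wherein h is not less than 1 and not more than the number of neighbor nodes.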

Further, comprising:

in the score calculation module 23, the shortest distance between each pair of nodes in the network is calculated by using Dijkstra algorithm.

Further, comprising:

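calculating the shortest distance between nodes using the Dijkstra algorithm specifically comprises:

Input: the adjacency matrix corresponding to the network.

Steps: (1) specifying a starting point s;

(2) introducing two sets S and U, wherein S records the nodes whose shortest paths have been found together with the corresponding shortest path lengths, and U records the nodes whose shortest paths have not yet been found together with their distances from the starting point s;

(3) initially, S contains only the starting point s, U contains the nodes other than s, and the path of each node in U is the path from the starting point s to that node; the node with the shortest path is then found in U and added to S, and the nodes in U and their corresponding paths are updated; this operation is repeated until all nodes have been traversed.

Output: the distance matrix between the nodes in the network.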

Further, comprising:

in the score calculating module 23, the ranking score formula of each node is:

Referring to fig. 4, in an embodiment of the invention, a structural diagram of an electronic device is shown.

An embodiment of the present invention provides an electronic device, which may include a processor 310 (CPU), a memory 320, an input device 330, an output device 340, and the like, wherein the input device 330 may include a keyboard, a mouse, a touch screen, and the like, and the output device 340 may include a Display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), and the like.

Memory 320 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides processor 310 with program instructions and data stored in memory 320. In an embodiment of the present invention, the memory 320 may be used to store the program of the method for identifying the network key propagator based on topology information fusion.

The processor 310 is configured to execute the steps of any one of the above methods for identifying network key propagators based on topology information fusion according to the obtained program instructions by calling the program instructions stored in the memory 320.

Based on the foregoing embodiments, in the embodiments of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for identifying a network key propagator based on topology information fusion in any of the above method embodiments.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims

1. A method for identifying network key propagators based on topology information fusion is characterized by comprising the following steps:

collecting friend list data with individuals as nodes in a network;

obtaining the number of connecting edges corresponding to each node according to the friend list data; if the individuals are in a friend relationship, a direct connection edge exists between the two corresponding nodes;

determining the degree of each node in the network according to the number of the directly connected edges of each node; the degree corresponding to the node is: numbering nodes in a network, and determining the number of directly connected edges corresponding to each node;

counting the degree of a neighbor node corresponding to the node;

when h is 1, judging whether the degree of one neighbor node is not less than 1;

if not, the cycle is stopped, the H index is equal to 1,

if so, h +1, and iterating and circulating until the degree of at most h neighbor nodes is not less than h, wherein h is an integer and is not less than 1 and not more than the number of the neighbor nodes;

calculating ranking corresponding scores of all nodes according to the H index and the shortest distance between the nodes, and further obtaining key propagators in the network, wherein the shortest distance between the nodes is used for representing propagation positions of individuals in the social network;

the ranking score formula of each node is as follows:

2. The method for identifying key propagators in a network based on topology information fusion as claimed in claim 1, wherein the shortest distance between each pair of nodes in the network is calculated by Dijkstra algorithm.

3. The method for identifying key propagators in a network based on topology information fusion according to claim 1, wherein the calculation process of the shortest distance specifically comprises:

initially, the set S comprises a starting point s; nodes except s are in the set U, and the path of a node in U is the path from the starting point s to the node; the set S is used for recording the nodes with the shortest paths already solved and the corresponding shortest path lengths, and the set U is used for recording the nodes with the shortest paths not yet solved and the distances from the nodes to the starting point s;

finding out the node with the shortest path from the U according to a pre-obtained network adjacency matrix, and adding the node into the S; updating the nodes in the U and the paths corresponding to the nodes, finding out the node with the shortest path from the U, and adding the node into the S; updating nodes in the U and paths corresponding to the nodes; and repeating the operation until all the nodes are traversed to obtain the distance matrix between the nodes.

4. An apparatus for identifying network key propagators based on topology information fusion, comprising:

and the node neighbor calculation module is used for determining the degree of each node in the network according to the number of the directly connected edges of each node, and the degree corresponding to each node is as follows: numbering nodes in a network, and determining the number of directly connected edges corresponding to each node;

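counting the degree of a neighbor node corresponding to the node;

when h is 1, judging whether the degree of one neighbor node is not less than 1;

if not, the cycle is stopped, the H index is equal to 1,

if so, h +1, and iterating and circulating until the degree of at most h neighbor nodes is not less than h, wherein h is an integer and is not less than 1 and not more than the number of the neighbor nodes;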

the score calculation module is used for calculating the shortest distance between each pair of nodes in the network, calculating the ranking corresponding score of each node according to the H index and the shortest distance, and further obtaining key propagators in the network, wherein the shortest distance between the nodes is used for representing the propagation positions of individuals in the social network;

the ranking score formula of each node is as follows:

5. The device for identifying key propagators in a network based on topology information fusion according to claim 4, wherein in the score calculating module, the shortest distance between each pair of nodes in the network is calculated by Dijkstra algorithm.

6. The device for identifying key propagators in a network based on topology information fusion according to claim 5, wherein the computing of the shortest distance between nodes by using Dijkstra algorithm specifically comprises:

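inputting: a network adjacency matrix to which the network corresponds,

the method comprises the following steps: (1) specifying a starting point s;

(2) introducing two sets S and U, wherein S is used for recording nodes with solved shortest paths and corresponding shortest path lengths, and U is used for recording nodes with not solved shortest paths and the distance from the nodes to the starting point s;

(3) initially, only the starting point s exists in S; U contains the nodes except s, and the path of the node in U is the path from the starting point s to the node; then finding out the node with the shortest path from U, and adding the node into S; then, updating the nodes in U and the paths corresponding to the nodes; repeating the operation until all the nodes are traversed;

outputting: a distance matrix between nodes in the network.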