CN112765329A - Method and system for discovering key nodes of social network

Info

Publication number: CN112765329A
Application number: CN202011628252.0A
Authority: CN
Inventors: 王建民; 沈恩亚; 太志伟; 宋怡然
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2021-05-07
Anticipated expiration: 2040-12-31
Also published as: CN112765329B

Abstract

The invention provides a method and a system for discovering key nodes of a social network. The method comprises the following steps: clustering the nodes of target graph data in a social network, and acquiring, through biased random walk, the biased random walk paths corresponding to the clustered nodes; combining the biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents, and constructing a biased random walk path corpus from these documents; and extracting keywords from the biased random walk path corpus to obtain the keywords of each biased random walk path document, so as to obtain the key nodes of the target graph data according to the keywords. The invention captures the importance degree of each node to its cluster more comprehensively; in addition, in the obtained results different nodes do not share the same importance degree value, so the results discriminate well between nodes, and the method is applicable to non-connected graphs.

Description

Method and system for discovering key nodes of social network

Technical Field

The invention relates to the technical field of computer network analysis, and in particular to a method and a system for discovering key nodes of a social network.

Background

The concept of node centrality originated in social network research, where it refers to the importance degree of a node in a social network. It has gradually been extended to other graph data, such as protein networks in biology, and has become a general way of describing the importance degree of nodes in many kinds of graph data.

The most direct way to judge the importance degree of a node is to examine its links to other nodes. Degree centrality judges the importance degree of a node by counting its in-degree and out-degree (using the in-degree, the out-degree, or their sum, depending on the actual meaning of the graph data and the research requirements). The method is intuitive, computationally efficient and widely applicable, but it has a significant limitation: degree centrality can only reflect the static, local link relationships around each node. Other existing methods also have limitations for the task of identifying representative key nodes of each part of a graph, specifically: 1. most of them consider the importance degree of a node to the whole graph, which to some extent conflicts with the task of extracting key nodes for each part of the graph; 2. node centrality measures based on shortest paths (such as betweenness centrality, closeness centrality and harmonic centrality) are not suitable for non-connected graphs; 3. methods that simply count topological information of the graph yield centrality values that lack discrimination, so that several nodes easily end up with an identical importance degree.
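The lack of discrimination described in the third limitation can be seen in a minimal sketch of degree centrality. The toy edge list and the function name `degree_centrality` are hypothetical and chosen only for illustration; the graph is treated as undirected.

```python
from collections import Counter


def degree_centrality(edges):
    """Count the degree of each node in an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


# Hypothetical toy graph: a path a-b-c-d plus a separate edge e-f.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
scores = degree_centrality(edges)

# Group nodes by score to make the ties visible.
ties = {}
for node, score in scores.items():
    ties.setdefault(score, []).append(node)
```

Here `ties` groups four of the six nodes under degree 1 and the remaining two under degree 2, so a purely degree-based ranking cannot separate most nodes.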

Therefore, a method and a system for discovering a key node of a social network are needed to solve the above problems.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a method and a system for discovering key nodes of a social network.

The invention provides a method for discovering key nodes of a social network, which comprises the following steps:

clustering nodes of target graph data in a social network, and acquiring corresponding biased random walk paths according to the clustered nodes through biased random walk;

combining the biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents, and constructing a biased random walk path corpus according to the plurality of biased random walk path documents;

and extracting keywords from the biased random walk path corpus to obtain the keywords in each biased random walk path document so as to obtain the key nodes of the target graph data according to the keywords.

According to the method for discovering key nodes of a social network provided by the invention, acquiring the corresponding biased random walk paths according to the clustered nodes through biased random walk comprises:

performing, through biased random walk, a random walk that takes each node in each cluster as a starting point, to obtain the corresponding biased random walk paths.

According to the method for discovering key nodes of a social network provided by the invention, extracting keywords from the biased random walk path corpus to obtain the keywords of each biased random walk path document, so as to obtain the key nodes of the target graph data according to the keywords, comprises the following steps:

taking the node in the target graph data as a word in the biased random walk path corpus, and acquiring the inverse document frequency of the node in the target graph data in the biased random walk path corpus;

acquiring the word frequency of the node in the corresponding biased random walk path document according to the cluster to which the node in the target graph data belongs;

and acquiring the node importance degree of each node in each cluster of the target graph data according to the inverse document frequency and the word frequency, and sequencing the nodes in each cluster according to the node importance degree to obtain the key nodes of the target graph data according to the sequencing result.

According to the method for discovering key nodes of a social network provided by the invention, acquiring the inverse document frequency of a node of the target graph data in the biased random walk path corpus comprises the following steps:

acquiring the total number of documents in the biased random walk path corpus;

acquiring the number of biased random walk path documents that contain a target node, wherein the target node is the node whose node importance degree is to be calculated;

and acquiring a first ratio from the total number of documents and the number of documents containing the target node, and obtaining the inverse document frequency as the natural logarithm of the first ratio.

According to the method for discovering key nodes of a social network provided by the invention, acquiring the word frequency of a node in the corresponding biased random walk path document according to the cluster to which the node belongs comprises the following steps:

acquiring the number of times the node appears in the biased random walk path document corresponding to the cluster to which the node belongs;

acquiring the total number of nodes in the biased random walk path document corresponding to the cluster to which the node belongs;

and acquiring a second ratio from the number of occurrences of the node and the total number of nodes, and obtaining the word frequency of the node in the corresponding biased random walk path document from the second ratio.

According to the method for discovering key nodes of a social network provided by the invention, acquiring the node importance degree of each node in each cluster of the target graph data according to the inverse document frequency and the word frequency comprises:

multiplying the value of the inverse document frequency by the value of the word frequency, and obtaining the node importance degree of each node in each cluster from the product.
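The three quantities described above can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text: $N$ is the total number of documents in the corpus, $n_v$ the number of documents containing node $v$, $d_c$ the document of the cluster $c$ to which $v$ belongs, $f_{v,d_c}$ the number of occurrences of $v$ in $d_c$, and $T_{d_c}$ the total number of nodes in $d_c$. The first ratio is read here as $N / n_v$; the detailed description divides by $n_v + 1$ instead.

```latex
\[
\mathrm{IDF}(v) = \ln\frac{N}{n_v}, \qquad
\mathrm{TF}(v, d_c) = \frac{f_{v,d_c}}{T_{d_c}}, \qquad
\mathrm{score}(v, c) = \mathrm{TF}(v, d_c)\cdot\mathrm{IDF}(v)
\]
```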

The invention also provides a system for discovering key nodes of a social network, which comprises:

the clustering module is used for clustering nodes of target graph data in the social network, and acquiring corresponding biased random walk paths according to the clustered nodes through biased random walk;

the biased random walk path corpus establishing module is used for combining biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents and establishing the biased random walk path corpus according to the plurality of biased random walk path documents;

and the key node discovery module is used for extracting key words from the biased random walk path corpus, acquiring the key words in each biased random walk path document and obtaining the key nodes of the target graph data according to the key words.

According to the system for discovering the key nodes of the social network provided by the invention, the clustering module comprises:

and the biased random walk path construction unit is used for carrying out random walk by taking each node in each cluster as a starting point through biased random walk to obtain a corresponding biased random walk path.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the social network key node discovery method.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the social network key node discovery method as described in any of the above.

According to the method and the system for discovering key nodes of a social network provided by the invention, the node centrality problem is converted, by means of biased random walk paths, into a keyword extraction problem in the field of natural language processing, and representative key nodes are extracted from each cluster of nodes in the social network graph data on the basis of keyword extraction. Compared with traditional methods, the importance degree of each node to its cluster is captured more comprehensively; in addition, in the obtained results different nodes do not share the same importance degree value, so the results discriminate well between nodes, and the method is applicable to non-connected graphs.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a method for discovering a key node of a social network according to the present invention;

FIG. 2 is a schematic structural diagram of a social network key node discovery system provided in the present invention;

FIG. 3 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flow chart of a method for discovering a key node of a social network according to the present invention, and as shown in fig. 1, the present invention provides a method for discovering a key node of a social network, including:

step 101, clustering nodes of target graph data in a social network, and acquiring corresponding biased random walk paths according to the clustered nodes through biased random walk.

In the invention, for given graph data (namely the target graph data in a social network), the nodes are clustered with a clustering algorithm, so that all nodes in the graph data are divided into a series of clusters; then, taking each node in each cluster as a starting point, a plurality of fixed-length biased random walk paths are constructed using the biased random walk technique of node2vec.

Step 102, combining the biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents, and constructing a biased random walk path corpus according to the plurality of biased random walk path documents.
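Step 101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the clustering result is assumed to be given as a mapping from cluster id to member nodes (the text does not name a clustering algorithm), the graph is an undirected adjacency dictionary, and the names `biased_walk`, `walks_per_cluster`, `p` and `q` are hypothetical. The transition weights follow the usual node2vec scheme (1/p to return to the previous node, 1 for a common neighbor, 1/q otherwise).

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Return one node2vec-style walk of at most `length` nodes from `start`."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj.get(cur, ()))  # sorted only for reproducibility
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj.get(prev, ()):
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def walks_per_cluster(adj, clusters, walks_per_node, length, p=1.0, q=1.0, seed=0):
    """Start `walks_per_node` walks from every node of every cluster."""
    rng = random.Random(seed)
    return {
        cid: [biased_walk(adj, node, length, p, q, rng)
              for node in nodes for _ in range(walks_per_node)]
        for cid, nodes in clusters.items()
    }


# Hypothetical non-connected graph: a triangle a-b-c and a separate edge d-e.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"e"}, "e": {"d"}}
clusters = {0: ["a", "b", "c"], 1: ["d", "e"]}
walks = walks_per_cluster(adj, clusters, walks_per_node=2, length=5)
```

Because every walk only follows existing edges from its own starting node, the sketch needs no shortest paths and runs unchanged on a graph with several components.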

In the invention, the random walk paths that start from each node in each cluster are treated as analogous to sentences in the field of Natural Language Processing (NLP), and the biased random walk paths of the same cluster are combined to form one biased random walk path document representing that cluster of the graph data. Furthermore, all biased random walk path documents are combined into a biased random walk path corpus. In this way, the problem of extracting the key nodes of each cluster in the graph data is converted into the problem of extracting the keywords of each document in the corpus, and the graph data information is converted into a biased random walk path corpus whose smallest unit is a node.
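The walk-to-document analogy can be sketched in a few lines. The representation is an assumption made for illustration: each walk is a list of node ids (a "sentence"), each cluster's document is the concatenation of its walks, and the corpus is a dictionary keyed by a hypothetical cluster id. The function name `build_corpus` is likewise hypothetical.

```python
def build_corpus(cluster_walks):
    """Merge the walks of each cluster into one token list (one document per cluster)."""
    return {
        cid: [node for walk in walks for node in walk]
        for cid, walks in cluster_walks.items()
    }


# Hypothetical walks: two for cluster "c0", one for cluster "c1".
cluster_walks = {
    "c0": [["a", "b", "a"], ["b", "c", "b"]],
    "c1": [["d", "e", "d"]],
}
corpus = build_corpus(cluster_walks)
```

Each node id now plays the role of a word, so any keyword extraction method that works on tokenized documents can be applied to `corpus`.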

Step 103, extracting keywords from the biased random walk path corpus to obtain keywords in each biased random walk path document, so as to obtain key nodes of the target graph data according to the keywords.

In the invention, keyword extraction techniques from the NLP field are applied to the biased random walk path corpus. The centrality (importance degree) of each node is calculated with the classic TF-IDF (term frequency-inverse document frequency) algorithm, and the keywords of each biased random walk path document in the corpus, namely the key nodes of each cluster in the graph data, are extracted. This yields the importance degree and rank of each node within the cluster corresponding to its biased random walk path document, and the top-ranked nodes (the invention sorts in descending order) are taken as the key nodes of each cluster.

According to the method for discovering key nodes of a social network provided by the invention, the node centrality problem is converted, by means of biased random walk paths, into a keyword extraction problem in the field of natural language processing, and representative key nodes are extracted from each cluster of nodes in the social network graph data on the basis of keyword extraction. Compared with traditional methods, the importance degree of each node to its cluster is captured more comprehensively; in addition, in the obtained results different nodes do not share the same importance degree value, so the results discriminate well between nodes, and the method is applicable to non-connected graphs.

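On the basis of the above embodiment, obtaining a corresponding biased random walk path according to the clustered nodes through biased random walk includes:

performing, through biased random walk, a random walk that takes each node in each cluster as a starting point, to obtain the corresponding biased random walk paths.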

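On the basis of the above embodiment, the extracting keywords from the biased random walk path corpus to obtain the keywords in each biased random walk path document so as to obtain the key nodes of the target graph data according to the keywords includes:

taking the nodes in the target graph data as words in the biased random walk path corpus, and acquiring the inverse document frequency of the nodes in the target graph data in the biased random walk path corpus;

acquiring the word frequency of the nodes in the corresponding biased random walk path document according to the cluster to which the nodes in the target graph data belong;

and acquiring the node importance degree of each node in each cluster of the target graph data according to the inverse document frequency and the word frequency, and sorting the nodes in each cluster by node importance degree to obtain the key nodes of the target graph data from the sorting result.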

On the basis of the foregoing embodiment, the obtaining an inverse document frequency of the node in the target graph data in the biased random walk path corpus includes:

In the present invention, for each node in the original graph data, the Inverse Document Frequency (IDF) of the node in the biased random walk path corpus is calculated. Specifically, the number of documents containing the target node is obtained and increased by one; the total number of documents in the biased random walk path corpus is divided by this sum; and the natural logarithm of the resulting quotient is taken as the inverse document frequency.
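The IDF calculation described here can be sketched as follows, assuming the corpus is a dictionary from a hypothetical cluster id to that cluster's list of node ids. The function name `inverse_document_frequency` and the toy corpus are illustrative only.

```python
import math


def inverse_document_frequency(corpus, node):
    """ln(total documents / (documents containing `node` + 1))."""
    total_docs = len(corpus)
    docs_with_node = sum(1 for tokens in corpus.values() if node in tokens)
    return math.log(total_docs / (docs_with_node + 1))


# Hypothetical corpus of three cluster documents.
corpus = {
    "c0": ["a", "b", "a", "c"],
    "c1": ["d", "b", "d"],
    "c2": ["f", "g", "f"],
}
idf_a = inverse_document_frequency(corpus, "a")  # "a" occurs in 1 of 3 documents
idf_b = inverse_document_frequency(corpus, "b")  # "b" occurs in 2 of 3 documents
```

With the plus-one term in the denominator, a node that occurs in all but one document gets an IDF of zero, and a node that occurs in every document gets a negative IDF, so nodes confined to a single cluster document score highest.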

On the basis of the above embodiment, the obtaining the word frequency of the node in the corresponding biased random walk path document according to the cluster to which the node in the target graph data belongs includes:

In the invention, for each node in the original graph data, the word frequency (Term Frequency, TF) of the node in the biased random walk path document corresponding to the cluster to which the node belongs is calculated. Specifically, the number of times the node appears in that biased random walk path document is divided by the total number of nodes in the document, and the resulting quotient is used as the word frequency.
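A minimal sketch of the TF calculation follows. It assumes that "the total number of nodes of the document" means the total number of node occurrences (tokens) in the document, repeats included, which is the usual term-frequency convention; the function name `term_frequency` and the sample document are hypothetical.

```python
from collections import Counter


def term_frequency(document, node):
    """Occurrences of `node` divided by the number of tokens in `document`."""
    if not document:
        return 0.0  # an empty document contributes no frequency
    return Counter(document)[node] / len(document)


# Hypothetical cluster document built from two walks.
document = ["a", "b", "a", "b", "c", "b"]
tf_b = term_frequency(document, "b")  # 3 occurrences out of 6 tokens
tf_z = term_frequency(document, "z")  # node never visited
```

A node that the walks of its own cluster visit often therefore receives a high TF, while a node that is never visited receives zero.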

On the basis of the above embodiment, the obtaining the node importance degree of each node in each cluster of the target graph data according to the inverse document frequency and the word frequency includes:

In the invention, the TF value and the IDF value of each node are multiplied to obtain the relative importance degree of all the nodes in each cluster of the graph data; the nodes are then sorted cluster by cluster in descending order of this result to obtain the representative nodes of each cluster, namely the key nodes that can represent each part of the graph.
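The scoring and ranking step can be sketched end to end as follows. This is an illustration under stated assumptions rather than the patented implementation: the corpus maps a hypothetical cluster id to its list of node ids, `clusters` lists the members of each cluster, IDF uses the plus-one denominator described above, and the names `rank_key_nodes` and `top_k` are invented for the example.

```python
import math
from collections import Counter


def rank_key_nodes(corpus, clusters, top_k=1):
    """Rank each cluster's members by TF * IDF, highest score first."""
    total_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus.values():
        doc_freq.update(set(tokens))  # count each node once per document
    ranking = {}
    for cid, members in clusters.items():
        tokens = corpus[cid]
        counts = Counter(tokens)
        scores = {}
        for node in members:
            tf = counts[node] / len(tokens) if tokens else 0.0
            idf = math.log(total_docs / (doc_freq[node] + 1))
            scores[node] = tf * idf
        ranking[cid] = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return ranking


# Hypothetical corpus; node "c" also shows up in the document of cluster "c1".
corpus = {
    "c0": ["a", "b", "a", "c", "a", "b"],
    "c1": ["d", "e", "d", "d", "e", "c"],
    "c2": ["f", "g", "f", "f", "g", "f"],
}
clusters = {"c0": ["a", "b", "c"], "c1": ["d", "e"], "c2": ["f", "g"]}
key_nodes = rank_key_nodes(corpus, clusters, top_k=1)
```

In this toy data, node `a` is visited most often within its own cluster and appears in no other document, so it ranks first in `c0`, while `c` is pushed to the bottom because it also appears in another cluster's document.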

Fig. 2 is a schematic structural diagram of a social network key node discovery system provided by the present invention, and as shown in fig. 2, the present invention provides a social network key node discovery system, which includes a clustering module 201, a biased random walk path corpus building module 202, and a key node discovery module 203, where the clustering module 201 is configured to cluster nodes of target graph data in a social network, and obtain corresponding biased random walk paths according to the clustered nodes through biased random walks; the biased random walk path corpus construction module 202 is configured to combine biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents, and construct a biased random walk path corpus according to the plurality of biased random walk path documents; the key node discovery module 203 is configured to perform keyword extraction on the biased random walk path corpus, obtain a keyword in each biased random walk path document, and obtain a key node of the target graph data according to the keyword.

According to the social network key node discovery system provided by the invention, the node centrality problem is converted, by means of biased random walk paths, into a keyword extraction problem in the field of natural language processing, and representative key nodes are extracted from each cluster of nodes in the social network graph data on the basis of keyword extraction. Compared with traditional methods, the importance degree of each node to its cluster is captured more comprehensively; in addition, in the obtained results different nodes do not share the same importance degree value, so the results discriminate well between nodes, and the system is applicable to non-connected graphs.

On the basis of the above embodiment, the clustering module includes:

a biased random walk path construction unit, configured to perform, through biased random walk, a random walk that takes each node in each cluster as a starting point, to obtain the corresponding biased random walk paths.

The system provided by the embodiment of the present invention is used for executing the above method embodiments; for the specific process and details, reference is made to the above embodiments, which are not repeated here.

Fig. 3 is a schematic structural diagram of an electronic device provided in the present invention, and as shown in fig. 3, the electronic device may include: a processor 301, a communication interface 302, a memory 303 and a communication bus 304, wherein the processor 301, the communication interface 302 and the memory 303 communicate with each other through the communication bus 304. The processor 301 may invoke logic instructions in the memory 303 to perform a social network key node discovery method comprising: clustering nodes of target graph data in a social network, and acquiring corresponding biased random walk paths according to the clustered nodes through biased random walk; combining the biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents, and constructing a biased random walk path corpus according to the plurality of biased random walk path documents; and extracting keywords from the biased random walk path corpus to obtain the keywords in each biased random walk path document so as to obtain the key nodes of the target graph data according to the keywords.

In addition, the logic instructions in the memory 303 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

In another aspect, the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer being capable of executing the social network key node discovery method provided by the above methods, the method including: clustering nodes of target graph data in a social network, and acquiring corresponding biased random walk paths according to the clustered nodes through biased random walk; combining the biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents, and constructing a biased random walk path corpus according to the plurality of biased random walk path documents; and extracting keywords from the biased random walk path corpus to obtain the keywords in each biased random walk path document so as to obtain the key nodes of the target graph data according to the keywords.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, performing the social network key node discovery method provided in the foregoing embodiments, the method including: clustering nodes of target graph data in a social network, and acquiring corresponding biased random walk paths according to the clustered nodes through biased random walk; combining the biased random walk paths of the same cluster to obtain a plurality of biased random walk path documents, and constructing a biased random walk path corpus according to the plurality of biased random walk path documents; and extracting keywords from the biased random walk path corpus to obtain the keywords in each biased random walk path document so as to obtain the key nodes of the target graph data according to the keywords.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for discovering key nodes of a social network is characterized by comprising the following steps:

2. The method for discovering the key node of the social network according to claim 1, wherein the obtaining the corresponding biased random walk path according to the clustered nodes through the biased random walk comprises:

3. The method for discovering key nodes in a social network according to claim 1, wherein the extracting keywords from the biased random walk path corpus to obtain the keywords in each biased random walk path document so as to obtain the key nodes of the target graph data according to the keywords comprises:

4. The method for discovering key nodes in a social network according to claim 3, wherein the obtaining of the inverse document frequency of the nodes in the target graph data in the biased random walk path corpus comprises:

5. The method for discovering key nodes in a social network according to claim 3, wherein the obtaining of the word frequency of the node in the corresponding biased random walk path document according to the cluster to which the node in the target graph data belongs comprises:

6. The method of claim 3, wherein the obtaining the node importance degree of each node in each cluster of the target graph data according to the inverse document frequency and the word frequency comprises:

7. A social network key node discovery system, comprising:

8. The social network key node discovery system of claim 7, wherein said clustering module comprises:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the social network key node discovery method according to any one of claims 1 to 6.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, performs the steps of the social network key node discovery method of any of claims 1 to 6.