CN111274498B - Network characteristic community searching method - Google Patents

Publication number
CN111274498B
Authority
CN
China
Prior art keywords
community
node
function
nodes
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010075210.2A
Other languages
Chinese (zh)
Other versions
CN111274498A (en)
Inventor
王宏志
王春楠
陈含笑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202010075210.2A
Publication of CN111274498A
Application granted
Publication of CN111274498B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A network characteristic community searching method, belonging to the technical field of network community construction. The method addresses the low efficiency and poor adaptability of existing community search and characteristic community search methods. An internal connection density evaluation function for community nodes is built from the internal structural features of real communities, and an external connection density evaluation function is built from their external structural features. The correlation between the attributes of the network community to be searched and the given attributes is quantified to establish an attribute correlation evaluation function. The three functions are fused to obtain the function RACSF. RACSF is then optimized using the NSS node selection strategy and the elastic ISC loop termination condition, and the optimal solution for the target community structure is returned as the characteristic community finally found. The method is suitable for characteristic community search in networks.

Description

Network characteristic community searching method
Technical Field
The invention belongs to the technical field of network community construction.
Background
In network-related research, the concept of a community continues to attract interest. Network structures composed of a large number of nodes and the connections between them are widely used in computer science, biology, sociology and other fields. Examples include the World Wide Web, with web pages as nodes and hyperlinks as edges, and social networks, with people as nodes and interpersonal relationships as edges. In general, a community is a sub-network whose nodes are more closely connected to each other than to nodes outside it. Finding community structures in a network (for example through community discovery and community search) helps with friend recommendation, criminal gang identification and protein function prediction, and can also effectively support the selection of propagation hotspots and the updating of betweenness centrality in the network.
Unlike community discovery (community detection), community search aims to find communities that contain a given set of nodes, so that personalized community information can be obtained quickly: given one or more nodes, find the communities containing them. However, nodes in real-world networks are not featureless; they carry characteristic attributes. In the Facebook social network, for example, each user is a node and interpersonal relationships are edges, and each user has different interests, so every node in the network graph has features. This raises the characteristic community search problem: how to find communities that contain a given node set and also have high feature relevance. Research on this problem has considerable practical value, for example in organizing social activities such as scientific seminars, in market decisions such as adjusting product advertising, and in friend recommendation on social platforms.
Existing community search methods fall into two groups: community search based only on network topology, and community search that also considers node attributes. The former aims to find communities that contain a given node set and satisfy a specific topology such as k-clique, k-core or k-truss. The latter considers both topology and node attributes when searching for a community containing the given node set; the returned community must satisfy the specific topology, and its internal nodes should have attributes that are as similar as possible. For community search related to node attributes, that is, the characteristic community search problem, the methods proposed so far are ACC (Attributed Community CL-tree index method) and LocATC (attributed truss index-based query processing algorithm by means of local exploration), both of which obtain good search results on real graph networks.
Previous studies treat community search as an extension of the community discovery problem from a different angle. Community discovery generally aims to find all community structures in a graph, with no other constraints. The community discovery models proposed so far include spectral clustering models, label propagation models and local expansion models. Because these models do not search for communities in response to a query, their most significant drawback is that they are unsuitable for fast online community search, and finding communities in a large-scale graph network takes a large amount of time. In addition, existing community search algorithms can only find communities that contain a given node set and satisfy a specific topology such as k-clique, k-core or k-truss. These algorithms pay little attention to attributed graphs, so a large amount of node attribute information that would help community search is ignored.
However, the two leading algorithms for the characteristic community search problem (ACC and LocATC) also have disadvantages. (1) They lack generality and cannot handle all of the query nodes provided by a user. For example, when ACC searches for a target community it considers all query attributes but only a single query node and ignores the other query nodes; the algorithm therefore cannot fully take the user's requirements into account, and the community found can deviate greatly from the target community. (2) The algorithms are theoretically complex and consume a lot of time on large graph networks. For example, before searching for communities LocATC must build an AT index (Attributed Truss Index) for the whole graph, which is time-consuming and inefficient on large-scale graphs and makes LocATC difficult to apply to the dynamically changing graph structures found in real life.
Disclosure of Invention
The invention provides a network characteristic community searching method to solve the low efficiency and poor adaptability of existing community search and characteristic community search methods.
The invention relates to an efficient network characteristic community searching method, which uses the RACSF-ACS algorithm to achieve fast community search and characteristic community search and comprises the following specific steps:
step one, establishing an internal connection density evaluation function for community nodes according to the internal structural characteristics of real communities;
step two, establishing an external connection density evaluation function for community nodes according to the external structural characteristics of real communities;
step three, quantifying the correlation between the attributes of the network community to be searched and the given attributes, and establishing an attribute correlation evaluation function for the network community;
step four, fusing the internal connection density evaluation function, the external connection density evaluation function and the attribute correlation evaluation function to obtain the function RACSF(S, W_q);
step five, optimizing the RACSF(S, W_q) function using the NSS node selection strategy and the elastic ISC loop termination condition, and taking the optimal solution for the target community structure as the characteristic community finally found.
The function RACSF(S, W_q) described here comprehensively evaluates the community quality of the node set S and the degree to which S matches the target attribute set W_q.
Further, the NSS node selection strategy is greedy: at each step, a node that can improve the RACSF score of the current node set is selected from the adjacent nodes and added as a member node;
the elastic ISC loop termination condition is as follows: when adding a new member node greatly reduces the quality score that the function RACSF(S, W_q) assigns to the node set, or when adding new member nodes has failed to improve that score many times in a row, the node selection step stops and the node set with the highest RACSF score is output.
Further, the specific method for establishing the internal link density evaluation function of the community nodes in step one is as follows:
using the DensityB(S) function:
Figure SMS_1
an internal link density evaluation function Inner(S) is constructed:
Figure SMS_2
where G(V, E) is an undirected graph, S is a set of nodes in G, and v is a node in G(V, E); DensityB(S) takes values in [0, n_S - 1]; AvgDeg(S) is the average degree of the node set S; m_S is the number of edges within the node set S, m_S = |{(u, v) ∈ E : u ∈ S, v ∈ S}|; n_S is the number of nodes in the node set S, n_S = |S|; d(v, S) is the number of internal edges of node v within S, d(v, S) = |{(u, v) ∈ E : u ∈ S}|; and d^2(v, S) is the square of d(v, S).
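The quantities defined above can be computed directly from an adjacency structure. The following is a minimal sketch, assuming the graph is stored as a dictionary that maps each node to its set of neighbours and that AvgDeg(S) is the average internal degree 2·m_S / n_S. The function name `internal_stats` and the toy graph are hypothetical, and the formulas for DensityB(S) and Inner(S) are not reproduced here.

```python
def internal_stats(adj, S):
    """Return (n_S, m_S, d, avg_deg) for a node set S of an undirected graph.

    adj maps every node to the set of its neighbours (an assumed layout).
    """
    S = set(S)
    # d(v, S): number of neighbours of v that lie inside S.
    d = {v: sum(1 for u in adj[v] if u in S) for v in S}
    n_S = len(S)
    # Every internal edge is counted once from each endpoint.
    m_S = sum(d.values()) // 2
    # Assumed reading of AvgDeg(S): average internal degree.
    avg_deg = 2 * m_S / n_S if n_S else 0.0
    return n_S, m_S, d, avg_deg


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
n_S, m_S, d, avg_deg = internal_stats(adj, {1, 2, 3})
```

For the triangle in the toy graph this gives three nodes, three internal edges and an internal degree of 2 for every member.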
Further, in step two, the external connection density evaluation function of the community nodes is established as follows:
the external link density evaluation function Average-ODF(S) is:
Figure SMS_3
where D(u) is the degree of node u, D(u) = |{(u, v) ∈ E}|.
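As an illustration of an external link density measure, the sketch below computes Average-ODF in the form commonly used in the community-scoring literature: the average, over nodes u in S, of the fraction of u's edges that leave S. This is an assumption about the formula shown in the figure above, which is not available in this text. The function name `average_odf` and the toy graph are hypothetical.

```python
def average_odf(adj, S):
    """Average out-degree fraction of node set S (commonly used definition).

    adj maps every node to the set of its neighbours (an assumed layout).
    """
    S = set(S)
    if not S:
        return 0.0
    total = 0.0
    for u in S:
        degree = len(adj[u])  # D(u)
        if degree:
            outgoing = sum(1 for v in adj[u] if v not in S)
            total += outgoing / degree
    return total / len(S)


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
odf_triangle = average_odf(adj, {1, 2, 3})
odf_whole = average_odf(adj, {1, 2, 3, 4})
```

A lower value means fewer edges leave the set, so the whole graph scores 0 and the triangle scores slightly above 0 because of the single edge to node 4.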
Further, the specific method for quantifying the correlation between the attributes of the community to be searched and the node set in step three is as follows:
the formula:
Figure SMS_4
is used to quantify the relevance between a given characteristic attribute set W_q and a node group S, where Attriccore(S, W_q) is the correlation function between the attribute set W_q of the community to be searched and the node set S; Nodes(S, w) = {v : v ∈ S, w ∈ attr(v)} is the set of all nodes in S that cover attribute w; and the attribute w belongs to the attribute set W_q of the community to be searched.
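The set Nodes(S, w) defined above is straightforward to compute. The sketch below assumes node attributes are stored as a dictionary from node to a set of attribute strings; the helper name `nodes_covering` and the sample data are hypothetical, and the correlation function itself is not reproduced.

```python
def nodes_covering(S, attr, w):
    """Nodes(S, w): the members of S whose attribute set contains w."""
    return {v for v in S if w in attr.get(v, set())}


# Hypothetical attributed nodes.
attr = {"a": {"ml", "db"}, "b": {"ml"}, "c": {"db"}, "d": set()}
S = {"a", "b", "d"}
Wq = {"ml", "db"}

# Coverage of every query attribute inside S.
coverage = {w: nodes_covering(S, attr, w) for w in Wq}
```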
Further, in step four, the RACSF(S, W_q) function obtained is:
Figure SMS_5
where RACSF(S, W_q) denotes the RACSF value of the node set S when the target attribute set is W_q. The RACSF value is used to evaluate the quality of the node set S found: the higher the score, the higher the quality of the structure found and the more likely it is to be the target community structure that satisfies the search conditions.
Further, the specific method for obtaining the optimal solution for the target community structure in step five is as follows: member nodes of the target community structure are selected continuously using the NSS node selection strategy until the ISC loop termination condition is met, that is, when:
Figure SMS_6
Figure SMS_7
subsequent iterations are stopped, and OptimalGroup(t) at the time the loop stops is taken as the optimal solution for the target community structure. Here DeRatioL and ConDeNumL are constants; DecreaseRatio(t) is the ratio between Score(t) and OptimalScore(t); OptimalScore(t) is the maximum RACSF score obtained by the node set before the t-th node recommended by the NSS node selection strategy is added; Score(t_i) is the RACSF score of the node set Group(t_i); Score(t) is the RACSF score of the node set Group(t); Group(t_i) is the community structure obtained after t_i nodes have been added using the NSS node selection strategy; Group(t) is the community structure obtained after t nodes have been added using the NSS node selection strategy; and min{argmax_{t_i<t} Score(t_i)} is the smallest number of nodes added by the NSS node selection strategy at which the node set attains its current maximum RACSF score.
Compared with the prior art, the method overcomes the shortcomings of existing characteristic community search algorithms, which are time-consuming and cannot perform fast online search, and it achieves good results when searching real large-scale graph structures. The RACSF-ACS algorithm is flexible and efficient and adapts well to dynamic changes in large-scale graph networks. The running time of the algorithm depends only on the size of the community found and is not affected by the size of the whole graph or by large changes in its nodes, so fast online community search on large graph networks can be achieved more efficiently.
RACSF-ACS is also highly tunable: researchers can control the number of loop iterations according to the time and precision requirements of the actual community search problem by adjusting the parameter values DeRatioL and ConDeNumL. The method supports large-scale online solution of the ACS characteristic community search problem with high result precision and short search time, which demonstrates its efficiency and effectiveness.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIGS. 2 and 3 compare the effectiveness of the existing algorithms LCTC, ACC and LocATC with that of the RACSF-ACS algorithm on the feature search problem across different data sets;
FIGS. 4 and 5 compare the efficiency of the algorithms LCTC, ACC and LocATC with that of the RACSF-ACS algorithm on the feature search problem;
FIG. 6 compares the effectiveness of the algorithms LCTC, ACC and LocATC with that of the RACSF-ACS algorithm on the feature search problem for the heterogeneous network data set with different query nodes;
FIG. 7 compares the efficiency of the algorithms LCTC, ACC and LocATC with that of the RACSF-ACS algorithm on the feature search problem for the heterogeneous network data set with different query nodes;
FIG. 8 compares the efficiency of the algorithms LCTC, ACC and LocATC with that of the RACSF-ACS algorithm on the feature search problem for the Texas data set with different query nodes;
FIG. 9 compares the efficiency of the algorithms LCTC, ACC and LocATC with that of the RACSF-ACS algorithm on the feature search problem for the Texas data set with different query nodes;
FIG. 10 shows how the effectiveness of the RACSF-ACS algorithm changes as the value of DeRatioL in the loop termination condition is varied;
FIG. 11 shows how the efficiency of the RACSF-ACS algorithm changes as the value of DeRatioL in the loop termination condition is varied;
FIG. 12 shows how the effectiveness of the RACSF-ACS algorithm changes as the value of ConDeNumL in the loop termination condition is varied;
FIG. 13 shows how the efficiency of the RACSF-ACS algorithm changes as the value of ConDeNumL in the loop termination condition is varied.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
Embodiment one: this embodiment is described below with reference to fig. 1. The efficient network characteristic community searching method of this embodiment uses the RACSF-ACS algorithm (an attributed community search algorithm based on a reasonable attributed community scoring function) to achieve fast community search and characteristic community search, and specifically comprises the following steps:
step one, establishing an internal connection density evaluation function for community nodes according to the internal structural characteristics of real communities;
step two, establishing an external connection density evaluation function for community nodes according to the external structural characteristics of real communities;
step three, quantifying the correlation between the attributes of the network community to be searched and the given attributes, and establishing an attribute correlation evaluation function for the network community;
step four, fusing the internal connection density evaluation function, the external connection density evaluation function and the attribute correlation evaluation function to obtain the function RACSF(S, W_q);
step five, optimizing the RACSF(S, W_q) function using the NSS node selection strategy and the elastic ISC loop termination condition, and taking the optimal solution for the target community structure as the characteristic community finally found.
The NSS node selection strategy of the invention is greedy: at each step, a node that can improve the RACSF score of the current node set is selected from the adjacent nodes and added as a member node.
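A single greedy selection step of this kind can be sketched as follows. The sketch assumes the candidate set is every neighbour of the current node set and that the neighbour yielding the highest score is chosen; the text above only states that a node that can improve the score is selected, so the tie-breaking and the "best neighbour" rule are assumptions. The function `select_next_node` is hypothetical, and `avg_internal_degree` is a stand-in quality function, not RACSF.

```python
import math


def select_next_node(adj, group, score):
    """Return (node, new_score) for the neighbour whose addition scores best.

    score is any callable that maps a node set to a number.
    Returns (None, -inf) when the group has no neighbours left.
    """
    group = set(group)
    frontier = {u for v in group for u in adj[v]} - group
    best_node, best_score = None, -math.inf
    for u in sorted(frontier):  # sorted only to make ties deterministic
        candidate = score(group | {u})
        if candidate > best_score:
            best_node, best_score = u, candidate
    return best_node, best_score


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}


def avg_internal_degree(nodes):
    """Stand-in quality function: average internal degree of the node set."""
    twice_edges = sum(1 for v in nodes for u in adj[v] if u in nodes)
    return twice_edges / len(nodes)


node, new_score = select_next_node(adj, {1}, avg_internal_degree)
improves = new_score > avg_internal_degree({1})
```

The caller compares `new_score` with the score of the current set to decide whether the step is an improvement.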
The elastic ISC loop termination condition is as follows: when adding a new member node greatly reduces the RACSF score, or when adding new member nodes has failed to improve the RACSF score many times in a row, the node selection step stops and the node set with the highest RACSF score is output. Optimizing the RACSF(S, W_q) function in this way yields the optimal solution for the target community structure, which is the characteristic community finally found.
Further, the specific method for establishing the internal link density evaluation function of the community nodes in step one is as follows:
let G(V, E) be an undirected graph and S a set of nodes in G; the value range of the existing Density function is converted to [0, n_S - 1], and the modified DensityB function is:
Figure SMS_8
the internal link density evaluation function Inner(S) is established using the modified DensityB function:
Figure SMS_9
where v is a node in the graph G(V, E); AvgDeg(S) is the average degree of the node set S; m_S is the number of edges within the node set S, m_S = |{(u, v) ∈ E : u ∈ S, v ∈ S}|; n_S is the number of nodes in the node set S, n_S = |S|; d(v, S) is the number of internal edges of node v within S, d(v, S) = |{(u, v) ∈ E : u ∈ S}|; and d^2(v, S) is the square of d(v, S).
This embodiment is based on the definition of a high-quality community, namely that the community node set has denser internal links and sparser external links. Considering also that the relation between the community structure found and the given attributes (RAS) is an important factor affecting the quality of the community found, the RACSF function of the invention contains three function blocks: (1) a function evaluating the internal link density of a node set; (2) a function evaluating the external link density of a node set; (3) a function quantifying the correlation between the given attributes and a node set. For the internal link density evaluation function, the invention first modifies the Density function from previous studies so that it has the same value range as AvgDeg, and then combines the two to obtain the final internal link density evaluation function Inner(S).
Further, in step two, the external connection density evaluation function of the community nodes is established as follows:
the external link density evaluation function Average-ODF(S) is:
Figure SMS_10
where D(u) is the degree of node u, D(u) = |{(u, v) ∈ E}|.
Further, the specific method for quantifying the correlation between the attributes of the community to be searched and the node set in step three is as follows:
the formula:
Figure SMS_11
is used to quantify the relevance between a given characteristic attribute set W_q and a node group S, where Attriccore(S, W_q) is the correlation function between the attribute set W_q of the community to be searched and the node set S; Nodes(S, w) = {v : v ∈ S, w ∈ attr(v)} is the set of all nodes in S that cover attribute w; and the attribute w belongs to the attribute set W_q of the community to be searched.
The RAS quantization function, which quantifies the correlation between the attributes and the node set, must satisfy the following three properties, where f(S, W_q) denotes the degree of correlation between the given features W_q and the node group S. (1) If each feature in W_q is covered by most nodes in S, the value of f(S, W_q) is large; conversely, if few nodes cover the features, the value is small. (2) The value of f(S, W_q) is larger if more distinct features in W_q are covered by the nodes. (3) If more nodes in S are unrelated to the features in W_q, the value of f(S, W_q) is smaller.
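To make the three properties concrete, the sketch below checks them on a simple stand-in score, the sum over query attributes of the squared number of covering nodes divided by |S|. This score is only a hypothetical illustration chosen because it behaves as the properties require; it is not claimed to be the formula used by the invention, and the names `toy_attribute_score` and `attr` are assumptions.

```python
def toy_attribute_score(S, attr, Wq):
    """Hypothetical stand-in for f(S, Wq): sum_w |Nodes(S, w)|^2 / |S|."""
    S = set(S)
    if not S:
        return 0.0
    total = 0
    for w in Wq:
        covered = sum(1 for v in S if w in attr.get(v, set()))
        total += covered ** 2
    return total / len(S)


# Hypothetical attributed nodes.
attr = {"a": {"ml", "db"}, "b": {"ml"}, "c": {"db"}, "d": set()}

# Property 1: an attribute covered by most nodes scores higher.
well_covered = toy_attribute_score({"a", "b"}, attr, {"ml"})
poorly_covered = toy_attribute_score({"a", "c"}, attr, {"ml"})

# Property 2: covering more distinct query attributes scores higher.
two_attrs = toy_attribute_score({"a", "b"}, attr, {"ml", "db"})

# Property 3: adding a node unrelated to the query lowers the score.
with_unrelated = toy_attribute_score({"a", "b", "d"}, attr, {"ml", "db"})
```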
Further, in step four, the RACSF(S, W_q) function obtained is:
Figure SMS_12
Figure SMS_13
where RACSF(S, W_q) denotes the RACSF value of the node set S when the target attribute set is W_q. The RACSF value is used to evaluate the quality of the node set S found: the higher the score, the higher the quality of the structure found and the more likely it is to be the target community structure that satisfies the search conditions.
The invention is inspired by the local expansion model in community discovery and adopts the most common seed node expansion method, NSMF (neighbor set merging framework). The idea is simple and feasible: starting from the input query nodes, adjacent nodes are examined continuously to expand the community until a loop stop condition is met. NSMF mainly comprises two core parts: the NSS node selection strategy and the ISC loop termination condition. To improve the accuracy and efficiency of NSMF, the invention focuses on improving the community quality function value reduction strategy (CQFR). Two parameters, ConDeNumL and DeRatioL, are introduced so that the number of loop iterations executed after the quality function value (RACSF) first decreases can be controlled manually. The RACSF function is then combined with the improved strategy, OCQFR, to obtain the loop termination condition RISC in NSMF, while the NSS node selection strategy combines the RACSF function with CQFO (optimization strategy). The definitions of the relevant NSMF symbols are shown in fig. 2. The specific loop termination condition RISC is: if the current loop iteration satisfies either of the following two conditions, subsequent iterations are stopped and OptimalGroup(t) is taken as the final result of NSMF.
Further, the specific method for optimizing the RACSF(S, W_q) function using the NSS node selection strategy and the elastic ISC loop termination condition, so as to obtain the optimal solution for the target community structure as the characteristic community finally found, is as follows:
when the loop satisfies the condition:
Figure SMS_14
Figure SMS_15
subsequent iterations are stopped, and the set of community member nodes found is the characteristic community finally found for the ACS problem (OptimalGroup(t) at the time the loop stops is taken as the optimal solution for the target community structure). DeRatioL and ConDeNumL are constants, and the algorithm performs best when they are set to 0.7 and 50 respectively. DecreaseRatio(t) is the ratio between Score(t) and OptimalScore(t); OptimalScore(t) is the maximum RACSF score obtained by the node set before the t-th node recommended by the NSS node selection strategy is added; Score(t_i) is the RACSF score of the node set Group(t_i); Score(t) is the RACSF score of the node set Group(t); Group(t_i) is the community structure obtained after t_i nodes have been added using the NSS node selection strategy; Group(t) is the community structure obtained after t nodes have been added using the NSS node selection strategy; and min{argmax_{t_i<t} Score(t_i)} is the smallest number of nodes added by the NSS node selection strategy at which the node set attains its current maximum RACSF score.
The condition DecreaseRatio(t) < DeRatioL detects the case where adding a new member node greatly reduces the RACSF score; the condition ConDecreaseNum(t) > ConDeNumL detects the case where the RACSF score has not improved over many consecutive additions of new member nodes. Neither case helps raise the RACSF score, so both are used as termination conditions for the NSS node selection strategy.
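The expansion loop with the two termination conditions above can be sketched as follows. This is an illustration under stated assumptions, not the patented pseudo code: ConDecreaseNum(t) is taken to be the number of consecutive additions since the best score was last reached, the best-scoring neighbour is added at each step even when it lowers the score, and the quality function is passed in as a parameter. The names `elastic_greedy_search` and `avg_internal_degree` are hypothetical, and the latter is a stand-in for RACSF.

```python
def elastic_greedy_search(adj, query_nodes, score,
                          de_ratio_l=0.7, con_de_num_l=50):
    """Greedy neighbour expansion with an elastic stop rule (sketch).

    Stops when Score(t) / OptimalScore(t) < de_ratio_l, or when more than
    con_de_num_l consecutive additions failed to beat the best score.
    Returns (OptimalGroup, OptimalScore).
    """
    group = set(query_nodes)
    optimal_group, optimal_score = set(group), score(group)
    con_decrease_num = 0  # assumed meaning of ConDecreaseNum(t)
    while True:
        frontier = {u for v in group for u in adj[v]} - group
        if not frontier:
            break  # nothing left to add
        # NSS step (assumption): add the neighbour with the best score.
        nxt = max(sorted(frontier), key=lambda u: score(group | {u}))
        group.add(nxt)
        current = score(group)
        # DecreaseRatio(t) uses the best score seen before this addition.
        ratio = current / optimal_score if optimal_score > 0 else 1.0
        if current > optimal_score:
            optimal_group, optimal_score = set(group), current
            con_decrease_num = 0
        else:
            con_decrease_num += 1
        if ratio < de_ratio_l or con_decrease_num > con_de_num_l:
            break
    return optimal_group, optimal_score


# Toy graph: a 4-clique {1, 2, 3, 4} with a tail 4-5-6.
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
       4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5}}


def avg_internal_degree(nodes):
    """Stand-in quality function: average internal degree of the node set."""
    return sum(1 for v in nodes for u in adj[v] if u in nodes) / len(nodes)


best_group, best_score = elastic_greedy_search(adj, {1}, avg_internal_degree)
```

With the default parameters the loop keeps adding the tail nodes 5 and 6 after the score has peaked, but still returns the 4-clique as the best group seen. Lower values of `con_de_num_l` or higher values of `de_ratio_l` stop the loop earlier.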
Finally, the above design results are combined to give the pseudo code (figure 4) of the RACSF-ACS algorithm (an attributed community search algorithm based on a reasonable attributed community scoring function), and the ACS algorithm is verified on 16 real-world networks with ground-truth communities. Six of the data sets (Facebook, Cora, Cornell, Texas, Washington and Wisconsin) have actual attribute features; for the remaining 10 graph networks, which have no attributes, the invention synthesizes the required attribute sets artificially. The subsequent experimental evaluation is carried out on these two kinds of data sets and consists of an effectiveness (performance) evaluation and an efficiency evaluation. The effectiveness evaluation uses the F1 score, which better reflects how an actual community compares with the community found. As shown in fig. 2 and fig. 3, the algorithm is compared with three existing ACS algorithms: ACC (Attributed Community CL-tree index method), LocATC (attributed truss index-based query processing algorithm by means of local exploration) and LCTC (closest truss community in the local neighborhood of query nodes).
In fig. 2, the abscissa labels fe0, fe107, fe348, fe414, fe686, fe698, fe1684, fe1912, fe3437 and fe3980 denote data sets collected from a social networking platform (facebook-ego-network-X, where X is a number), and the remaining labels denote the Cora, Cornell, Texas, Washington and Wisconsin data sets; the ordinate is the effectiveness evaluation score on the feature search problem. In fig. 3, the abscissa lists the real-world graph data sets used: Dolphins, Football, Karate, Polbooks (from the Miscellaneous Networks collection), Email, Amazon, DBLP (computer science bibliography database), Youtube, LiveJournal (directed LiveJournal friendship social network) and Orkut (Orkut friendship social network with ground-truth communities).
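The F1 score used for the effectiveness evaluation compares the community found with the ground-truth community as two node sets. The sketch below uses the standard definition (harmonic mean of precision and recall); the function name `f1_score` and the sample sets are hypothetical.

```python
def f1_score(found, truth):
    """F1 between a found community and a ground-truth community (node sets)."""
    found, truth = set(found), set(truth)
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0
    precision = overlap / len(found)  # share of found nodes that are correct
    recall = overlap / len(truth)     # share of true nodes that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 of the 4 found nodes belong to a 5-node community.
example_f1 = f1_score({1, 2, 3, 4}, {2, 3, 4, 5, 6})
```

Here precision is 0.75 and recall is 0.6, so the F1 score is 2/3.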
The efficiency evaluation records the query time of the community search (figures 4 and 5). In addition, the invention performs a parameter sensitivity test on the algorithm by varying, in turn, the number of query nodes |Vq| (fig. 6 to 9), the number of query features |Wq|, the parameter value DeRatioL on the data sets Texas, Dolphins and Amazon (fig. 10 to 11), and the parameter value ConDeNumL (fig. 12 and 13).
The key point of the invention is that, on the basis of existing community search, it mainly considers characteristic communities related to node attributes; that is, the communities obtained by the search not only have a specific topological structure such as k-clique, k-core or k-truss, but their internal nodes also have highly correlated attributes. In addition, the key to the RACS-ACS algorithm design is the design of the RACS function, which comprehensively considers the important characteristics of a high-quality community: high internal connection density, low external connection density, and the influence of RAS (the correlation between the given features and the searched community structure) on community quality. On this basis, the RACSF function is proposed, which fuses the functions related to these three elements.
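The basic quantities behind these three elements can be sketched as follows, using the definitions stated in claim 1: n_S = |S|, m_S (edges with both endpoints in S), d(v, S) (internal edges of v), D(u) (degree of u) and Nodes(S, w) (nodes of S covering attribute w). The Average-ODF formula itself appears only as a figure in the claims, so the version below is the commonly used definition (the mean fraction of each node's edges that leave S) and is an assumption. All function names and the toy graph are hypothetical, and S is assumed to be a non-empty Python set.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Adjacency sets of an undirected graph G(V, E) given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def internal_degree(adj, v, S):
    """d(v, S): number of edges connecting v to nodes inside S."""
    return len(adj[v] & S)


def internal_edge_count(adj, S):
    """m_S: number of edges with both endpoints in S (each edge is seen twice)."""
    return sum(internal_degree(adj, v, S) for v in S) // 2


def average_odf(adj, S):
    """Assumed standard Average-ODF: mean fraction of a node's edges leaving S."""
    total = 0.0
    for v in S:
        degree = len(adj[v])  # D(v)
        if degree:
            total += (degree - internal_degree(adj, v, S)) / degree
    return total / len(S)


def nodes_covering(attr, S, w):
    """Nodes(S, w) = {v in S : w in attr(v)}."""
    return {v for v in S if w in attr.get(v, set())}


# Toy graph (hypothetical): a triangle {1, 2, 3} with a tail 3-4-5.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])
attr = {1: {"db"}, 2: {"db", "ml"}, 3: {"ml"}, 4: {"db"}}
S = {1, 2, 3}
print(len(S), internal_edge_count(adj, S), internal_degree(adj, 3, S))
print(average_odf(adj, S), nodes_covering(attr, S, "db"))
```

A dense, well-separated community shows a large m_S relative to n_S and a small Average-ODF value; how the patent combines these quantities into Inner(S) and RACSF is given by the formulas in the claims and is not reproduced here.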
Based on the idea of NSMF (neighbor set merge framework), the invention designs a more efficient NSS (node selection strategy) and ISC (cycle termination condition) for the RACS function. The improved OCQPR condition in the ISC overcomes a defect of the CQPR condition used in prior studies, namely that the loop stops immediately once the condition is not met, so the search cannot continue toward a node set that yields a higher quality function value. The OCQFR method is also more flexible: the number of loop iterations can be regulated by setting different ConDeNumL and DeRatioL parameter values, so that both the accuracy requirement and the time performance of a practical community search problem can be met. Furthermore, Score(t) in the definitions of DecreaseRatio(t) and ConDecreaseNum(t) in the OCQFR method can be replaced with a different quality function, so that a matching cycle termination condition can be constructed for different problems.
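The elastic termination idea can be sketched as a greedy loop. The exact conditions are given only as figures in the claims, so the form used here is an assumption: DecreaseRatio(t) = Score(t) / OptimalScore(t), ConDecreaseNum(t) is the number of nodes added since the best score was first reached, and the loop stops when DecreaseRatio(t) < DeRatioL and ConDecreaseNum(t) >= ConDeNumL. The function name, the fixed candidate order standing in for the NSS, and the toy score sequence are all hypothetical; scores are assumed to be positive.

```python
def search_with_elastic_stop(candidates, score_fn, de_ratio_l, con_de_num_l):
    """Greedy expansion with an assumed elastic cycle termination condition.

    candidates: nodes in the order a node selection strategy would recommend.
    score_fn:   quality function evaluated on the current node list.
    """
    group = []
    optimal_group, optimal_score = [], float("-inf")
    con_decrease_num = 0  # nodes added since the best score was first reached
    for node in candidates:
        group.append(node)
        score = score_fn(group)
        if score > optimal_score:
            # New best: remember it and reset the counter.
            optimal_group, optimal_score = list(group), score
            con_decrease_num = 0
            continue
        con_decrease_num += 1
        decrease_ratio = score / optimal_score if optimal_score > 0 else 0.0
        # Assumed stop rule: the score has dropped far enough for long enough.
        if decrease_ratio < de_ratio_l and con_decrease_num >= con_de_num_l:
            break
    return optimal_group, optimal_score


# Toy quality values per group size (hypothetical): a dip, then a higher peak.
scores = [1.0, 2.0, 3.0, 2.9, 2.0, 1.5, 1.0, 5.0]
toy_score = lambda g: scores[len(g) - 1]

# Strict setting: stops early and returns the first peak.
print(search_with_elastic_stop(range(8), toy_score, 0.8, 2))
# Tolerant setting: keeps searching through the dip and finds the later peak.
print(search_with_elastic_stop(range(8), toy_score, 0.1, 10))
```

In this sketch, smaller DeRatioL and larger ConDeNumL values let the loop run longer, trading query time for the chance of reaching a higher-scoring node set, which matches the flexibility described above.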
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that the different dependent claims and the features described herein may be combined in ways other than as described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.

Claims (4)

1. A network characteristic community searching method is characterized by comprising the following specific steps:
step one, establishing an internal connection density evaluation function of community nodes according to the internal structural characteristics of a real community;
step two, establishing an external connection density evaluation function of community nodes according to the external structural characteristics of the real communities;
step three, quantifying the correlation between the attributes of the network community to be searched and the given attributes, and establishing an attribute correlation evaluation function of the network community;
fusing the internal connection density evaluation function of the community nodes, the external connection density evaluation function of the community nodes and the attribute correlation evaluation function to obtain a function RACCF(S, Wq); wherein S is a group of nodes in G(V, E), G(V, E) is an undirected graph, and Wq is a characteristic attribute set;
the RACCF(S, Wq) function is obtained as:
Figure QLYQS_1
wherein RACCF(S, Wq) denotes the RACCF value of the node set S when the target attribute set is W_q; the RACCF value is used for evaluating the quality of the searched node set S; AttriScore(S, W_q) is the correlation function between the characteristic attribute set Wq and the node set of the community to be searched; Nodes(S, w) = {v : v ∈ S, w ∈ attr(v)} is the set of all nodes in S covering the attribute w, where the attribute w belongs to the characteristic attribute set Wq of the search community; n_S is the number of nodes in the node set S, n_S = |S|; D(u) is the degree of node u, D(u) = |{(u, v) ∈ E}|; Average-ODF(S) is the external link density evaluation function, and Inner(S) is the established internal link density evaluation function; v is a node in the graph G(V, E); m_S is the number of edges in the node set S, m_S = |{(u, v) ∈ E : u ∈ S, v ∈ S}|; d(v, S) is the number of internal connection edges of node v in the node set S, d(v, S) = |{(u, v) ∈ E : u ∈ S}|; d²(v, S) is d(v, S) to the power of 2;
optimizing the RACCF(S, Wq) function by using an NSS node selection strategy and an elastic ISC cycle termination condition, and obtaining an optimal solution of the target community structure as the finally searched characteristic community;
the specific method for optimizing the RACCF(S, Wq) function by using the NSS node selection strategy and the elastic ISC cycle termination condition to obtain the optimal solution of the target community structure as the finally searched characteristic community comprises the following steps:
when the cycle satisfies the condition:
Figure QLYQS_2
Figure QLYQS_3
stopping the subsequent loop, and taking OptimalGroup(t) after the loop stops as the optimal solution of the target community structure; wherein DeRatioL and ConDeNumL are constants; DecreaseRatio(t) is the ratio between Score(t) and OptimalScore(t); OptimalScore(t) is the maximum RACCF score obtained by the node set before the t-th node recommended by the NSS node selection strategy is added; Score(t_i) is the RACCF score of the node set Group(t_i), and Score(t) is the RACCF score of the node set Group(t); Group(t_i) is the community structure obtained after adding t_i nodes by using the NSS node selection strategy, and Group(t) is the community structure obtained after adding t nodes by using the NSS node selection strategy; min{argmax_{t_i<t} Score(t_i)} is the smallest number of nodes added by using the NSS node selection strategy at which the node set obtains the current maximum RACCF score.
2. The network feature community searching method according to claim 1, wherein the specific method for establishing the internal connection density evaluation function of the community nodes in step one is as follows:
using the DensityB (S) function:
Figure QLYQS_4
constructing the internal link density evaluation function Inner(S):
Figure QLYQS_5
wherein DensityB(S) is in the range [0, n_S − 1], and AvgDeg(S) is the average degree of the node set S.
3. The network feature community searching method according to claim 2, wherein the establishing the external connection density evaluation function of the community node in the second step is specifically:
external link density assessment function Average-ODF (S):
Figure QLYQS_6
4. The network feature community searching method according to claim 3, wherein the specific method for quantifying the correlation between the attributes of the network community to be searched and the given attributes in step three is as follows:
using the formula:
Figure QLYQS_7
the degree of correlation AttriScore(S, Wq) between a given characteristic attribute set Wq and a node group S is quantified.
CN202010075210.2A 2020-01-22 2020-01-22 Network characteristic community searching method Active CN111274498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010075210.2A CN111274498B (en) 2020-01-22 2020-01-22 Network characteristic community searching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010075210.2A CN111274498B (en) 2020-01-22 2020-01-22 Network characteristic community searching method

Publications (2)

Publication Number Publication Date
CN111274498A CN111274498A (en) 2020-06-12
CN111274498B true CN111274498B (en) 2023-06-23

Family

ID=71003324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010075210.2A Active CN111274498B (en) 2020-01-22 2020-01-22 Network characteristic community searching method

Country Status (1)

Country Link
CN (1) CN111274498B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898039B (en) * 2020-07-03 2023-12-19 哈尔滨工程大学 Attribute community searching method integrating hidden relations

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7761447B2 (en) * 2004-04-08 2010-07-20 Microsoft Corporation Systems and methods that rank search results
US20130268595A1 (en) * 2012-04-06 2013-10-10 Telefonaktiebolaget L M Ericsson (Publ) Detecting communities in telecommunication networks
US9749406B1 (en) * 2013-03-13 2017-08-29 Hrl Laboratories, Llc System and methods for automated community discovery in networks with multiple relational types
CN103425868B (en) * 2013-07-04 2016-12-28 西安理工大学 Complex network community discovery method based on fractal characteristic
US20150134402A1 (en) * 2013-11-11 2015-05-14 Yahoo! Inc. System and method for network-oblivious community detection
US9455874B2 (en) * 2013-12-30 2016-09-27 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for detecting communities in a network
US10210280B2 (en) * 2014-10-23 2019-02-19 Sap Se In-memory database search optimization using graph community structure
CN109859065A (en) * 2019-02-28 2019-06-07 桂林理工大学 Multiple target complex network community discovery method based on spectral clustering
CN110119462B (en) * 2019-04-03 2021-07-23 杭州中科先进技术研究院有限公司 Community search method of attribute network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
面向属性网络的重叠社区发现算法;杜航原;《计算机应用》;第3151-3157页 *

Also Published As

Publication number Publication date
CN111274498A (en) 2020-06-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant