CN111274498A - Network characteristic community searching method - Google Patents


Info

Publication number
CN111274498A
Authority
CN
China
Prior art keywords
community
node
function
attribute
racsf
Legal status
Granted
Application number
CN202010075210.2A
Other languages
Chinese (zh)
Other versions
CN111274498B (en)
Inventor
王宏志
王春楠
陈含笑
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN202010075210.2A
Publication of CN111274498A
Application granted
Publication of CN111274498B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A network characteristic community searching method belongs to the technical field of network community construction. It addresses the low efficiency and poor adaptability of existing community search and characteristic community search methods. The method has five parts:

1. According to the internal structural characteristics of real communities, an internal link density evaluation function is established for community nodes.
2. According to the external structural characteristics of real communities, an external link density evaluation function is established for community nodes.
3. The correlation between the attributes of the network community to be searched and the given attributes is quantified, which yields an attribute correlation evaluation function of the network community.
4. The three functions are fused into a single function, RACSF.
5. RACSF is optimized with the NSS node selection strategy and the elastic ISC cycle termination condition. The resulting optimal solution of the target community structure is the characteristic community that the search returns.

The invention is suitable for characteristic community search in networks.

Description

Network characteristic community searching method
Technical Field
The invention belongs to the technical field of network community construction.
Background
In network-related research, the concept of a community has long attracted attention. Networks consist of many nodes and the connections between them, and they are widely used in computer science, biology, sociology and other fields. Examples include the World Wide Web, where web pages are nodes and the links between pages are edges, and social networks, where people are nodes and interpersonal relationships are edges. Generally, a community is a sub-network whose internal nodes are more closely connected to each other than to nodes outside it. Finding community structures in a network, through community discovery or community search, helps with friend recommendation, criminal group identification and protein function prediction. It can also effectively support hotspot selection and betweenness centrality updates in the network.
Unlike community discovery (community detection), community search aims to find the communities that contain a given set of one or more nodes, so that personalized community information can be obtained quickly.

However, nodes in real-world networks are not featureless; they also carry characteristic attributes. For example, consider a Facebook social network in which each user is a node and each personal relationship is an edge. Each user has different interests, so every node in the network graph has characteristics. Finding a community that contains a given node set and whose characteristics are highly correlated has opened a new research direction in community search, namely characteristic community search. This research has great practical value in several areas:

- social activities, such as organizing scientific workshops;
- market decisions, such as adjusting where product advertisements are delivered;
- friend recommendation on social platforms.
Existing community search methods fall into two groups:

- Methods that consider only the network topology. These aim to find communities that contain a given node set and satisfy a specific topological structure such as a k-clique, k-core or k-truss.
- Methods that also consider node attributes. These take both the topology and the node attributes into account when searching for a community containing a given node set. The returned community must satisfy a specific topological structure, and the attributes of its internal nodes must also be as close as possible.

Community search related to node attributes is the characteristic community search problem. The methods proposed for it so far include ACC (an attributed community query method based on the CL-tree index) and LocATC (a local-exploration query processing algorithm based on the attributed truss index). Both achieve good search results on real graph networks.
Previous research shows that community search extends the community discovery problem in several directions. In general, community discovery aims to find all community structures in a graph, with no other constraints. Proposed community discovery models include spectral clustering, label propagation and local expansion models, but none of them searches for communities based on a query. Their most significant drawback is therefore that they are unsuitable for fast online community search, and discovering communities in a large graph network takes a long time.

Existing community search algorithms have a different limitation. They can only find communities that contain a given node set and satisfy a specific topological structure such as a k-clique, k-core or k-truss. Because they pay little attention to attributed graphs, they ignore a great deal of node attribute information that would help the search.
However, the two best existing algorithms for the characteristic community search problem, ACC and LocATC, have certain shortcomings.

(1) They lack generality and cannot process all the query nodes a user provides. For example, when ACC searches for a target community, it considers all the query attributes but only a single query node and ignores the others. It therefore cannot fully account for the user's requirements, and the searched community can deviate greatly from the target community.

(2) The algorithms are theoretically complex and slow on large graph networks. For example, before searching for a community, LocATC must build an AT index (Attributed Truss Index) for the whole graph. Building this index takes a long time on large graphs, which makes LocATC inefficient. It also makes LocATC difficult to apply to the many dynamically changing graph structures found in real life.
Disclosure of Invention
The invention provides a network characteristic community searching method. It aims to solve the low efficiency and poor adaptability of existing community search and characteristic community search methods.
The invention relates to an efficient network characteristic community searching method that performs fast community search and characteristic community search based on the RACSF-ACS algorithm. The specific steps are as follows:
step one, establishing an internal link density evaluation function of a community node according to the internal structure characteristics of a real community;
step two, establishing an external link density evaluation function of the community nodes according to the external structure characteristics of the real community;
step three, quantifying the correlation between the attributes of the network community to be searched and the given attributes, and establishing an attribute correlation evaluation function of the network community;
step four, fusing the internal link density evaluation function, the external link density evaluation function and the attribute correlation evaluation function to obtain a function RACSF (S, Wq);
step five, optimizing the RACSF (S, Wq) function by using the NSS node selection strategy and the elastic ISC cycle termination condition, and taking the optimal solution of the target community structure as the finally searched characteristic community.
The RACSF (S, Wq) function described in this embodiment is used to comprehensively evaluate the community quality of the node set S and the matching degree between the node set S and the target attribute set Wq.
Furthermore, the NSS node selection strategy is greedy. At each step, it selects from the nodes adjacent to the current node set the node that improves the RACSF score of that set, and adds it as a member node.
The elastic ISC cycle termination condition stops node selection in either of two cases: adding a new member node lowers the quality score of the node set given by RACSF (S, Wq), or several successive additions fail to improve that score. When selection stops, the node set with the highest RACSF score is output.
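A single greedy NSS step can be sketched as follows. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions:

- The graph is stored as a dictionary of neighbour sets.
- The callable `score` is a hypothetical stand-in for RACSF(S, Wq) with Wq fixed.
- `nss_step` is an illustrative name.

The sketch returns the neighbour with the highest resulting score. Whether expansion should continue is left to the termination condition.

```python
from typing import Callable, Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]       # adjacency sets of an undirected graph
ScoreFn = Callable[[Set[Hashable]], float]  # hypothetical stand-in for RACSF(S, Wq)


def nss_step(graph: Graph, s: Set[Hashable], score: ScoreFn) -> Optional[Hashable]:
    """Greedy NSS sketch: pick the neighbour of S whose addition scores highest.

    Returns None when S has no neighbours outside itself.
    """
    # Candidate member nodes: every node adjacent to S but not yet in S.
    frontier = set().union(*(graph[v] for v in s)) - s
    best_node, best_score = None, float("-inf")
    # Sorting by repr only makes tie-breaking deterministic (an assumption).
    for v in sorted(frontier, key=repr):
        candidate_score = score(s | {v})
        if candidate_score > best_score:
            best_node, best_score = v, candidate_score
    return best_node
```

With a score that counts internal edges, the step prefers a neighbour adjacent to several current members over one adjacent to a single member.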
Further, the specific method for establishing the internal link density evaluation function of the community node in the step one is as follows:
using the DensityB (S) function:
[Formula image: definition of DensityB(S)]
constructing an internal link density evaluation function Inner (S):
[Formula image: definition of Inner(S)]
where:

- $G(V, E)$ is an undirected graph and $S$ is a set of nodes in $G$;
- DensityB(S) takes values in $[0, n_S - 1]$;
- $v$ is a node in $G(V, E)$;
- $\mathrm{AvgDeg}(S)$ is the average degree of the vertex set $S$;
- $m_S = |\{(u, v) \in E : u \in S, v \in S\}|$ is the number of edges within the node set $S$;
- $n_S = |S|$ is the number of nodes in $S$;
- $d(v, S) = |\{(u, v) \in E : u \in S\}|$ is the number of edges connecting node $v$ to nodes in $S$;
- $d^2(v, S)$ is the square of $d(v, S)$.
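The quantities defined above can be computed directly from an adjacency structure. The sketch below rests on two assumptions:

- The graph is undirected, has no self-loops, and is stored as a dictionary of neighbour sets.
- AvgDeg(S) means the average internal degree $2m_S/n_S$.

The formulas for DensityB(S) and Inner(S) are not reproduced in the text. The hypothetical helper `internal_stats` therefore stops at the building blocks $n_S$, $m_S$, $d(v, S)$ and $\sum_{v \in S} d^2(v, S)$.

```python
from typing import Dict, Hashable, Set


def internal_stats(graph: Dict[Hashable, Set[Hashable]], s: Set[Hashable]) -> dict:
    """Building blocks of the internal link density terms for node set S (sketch)."""
    n_s = len(s)
    # d(v, S): number of edges from v to nodes inside S
    d_in = {v: len(graph[v] & s) for v in s}
    # Each internal edge is counted once from each endpoint, hence the halving.
    m_s = sum(d_in.values()) // 2
    # Assumed meaning of AvgDeg(S): average internal degree 2 * m_S / n_S.
    avg_deg = 2 * m_s / n_s if n_s else 0.0
    # Sum of d^2(v, S) over v in S.
    sum_d2 = sum(d * d for d in d_in.values())
    return {"n_S": n_s, "m_S": m_s, "d": d_in, "AvgDeg": avg_deg, "sum_d2": sum_d2}
```

For a triangle {a, b, c} in which c also has an outside neighbour d, each member has $d(v, S) = 2$. This gives $m_S = 3$, $\mathrm{AvgDeg}(S) = 2$ and $\sum d^2(v, S) = 12$.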
Further, the step two of establishing the external link density evaluation function of the community node specifically includes:
[Formula image: definition of the external link density evaluation function Average-ODF(S)]
where $d(u) = |\{(u, v) \in E\}|$ is the degree of node $u$.
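The Average-ODF formula survives only as an image. Average-ODF (average out-degree fraction) is an established community-scoring measure, usually defined as follows:

$$\text{Average-ODF}(S) = \frac{1}{n_S}\sum_{u \in S}\frac{|\{(u, v) \in E : v \notin S\}|}{d(u)}$$

This is the mean, over the members of $S$, of the fraction of each member's edges that leave $S$. The sketch assumes the patent uses this standard definition. The name `average_odf` and the adjacency-dictionary input are illustrative.

```python
from typing import Dict, Hashable, Set


def average_odf(graph: Dict[Hashable, Set[Hashable]], s: Set[Hashable]) -> float:
    """Average out-degree fraction of S (standard definition, assumed here)."""
    if not s:
        return 0.0
    total = 0.0
    for u in s:
        degree = len(graph[u])          # d(u): degree of u in the whole graph
        if degree == 0:
            continue                    # an isolated node contributes no external edges
        outside = len(graph[u] - s)     # edges from u that leave S
        total += outside / degree
    return total / len(s)
```

Lower values mean looser external links. For the triangle {a, b, c} whose member c has one outside neighbour, the score is $(0/2 + 0/2 + 1/3)/3 = 1/9$.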
Further, in step three, the correlation between the attributes of the community to be searched and the node set is quantified as follows:
using the formula:
[Formula image: definition of AttriScore(S, Wq)]
This formula quantifies the degree of correlation between a given feature attribute set $W_q$ and a node set $S$. Here $\mathrm{AttriScore}(S, W_q)$ is the correlation function between the attribute set $W_q$ and the node set of the community to be searched. The set $\mathrm{Nodes}(S, w) = \{v : v \in S, w \in \mathrm{attr}(v)\}$ contains all nodes in $S$ that cover attribute $w$, where $w$ belongs to the attribute set $W_q$ of the searched community.
Further, the RACSF (S, Wq) function obtained in step four is:
[Formula image: definition of RACSF(S, Wq)]
where $\mathrm{RACSF}(S, W_q)$ is the RACSF value of the node set $S$ when the target attribute set is $W_q$. The RACSF value evaluates the quality of the searched node set $S$. The higher the score, the higher the quality of the searched structure, the better it matches the search condition, and the more likely it is to be the target community structure.
Further, in step five, the optimal solution of the target community structure is obtained as follows: member nodes of the target community structure are selected repeatedly with the NSS node selection strategy until the ISC cycle termination condition is met:
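$$\mathrm{DecreaseRatio}(t) = \frac{\mathrm{Score}(t)}{\mathrm{OptimalScore}(t)} < \mathrm{DeRatioL}$$
$$\mathrm{ConDecreaseNum}(t) = t - \min\{\arg\max_{t_i < t} \mathrm{Score}(t_i)\} > \mathrm{ConDeNumL}$$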
If either of the two conditions holds, subsequent cycles stop, and OptimalGroup(t) at the moment the cycle stops is taken as the optimal solution of the target community structure. The symbols are defined as follows:

- DeRatioL and ConDeNumL are constants.
- DecreaseRatio(t) is the ratio of Score(t) to OptimalScore(t).
- OptimalScore(t) is the maximum RACSF score reached by the node set before the t-th node recommended by the NSS node selection strategy is added.
- $\mathrm{Score}(t_i)$ is the RACSF score of the node set $\mathrm{Group}(t_i)$.
- $\mathrm{Group}(t_i)$ is the community structure obtained after the NSS node selection strategy has added $t_i$ nodes, and Group(t) is the structure obtained after it has added $t$ nodes.
- $\min\{\arg\max_{t_i < t} \mathrm{Score}(t_i)\}$ is the smallest number of nodes added by the NSS node selection strategy at which the node set reached the current maximum RACSF score.
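The two stop conditions can be checked from the history of scores alone. The sketch below makes several assumptions:

- `scores[t]` holds Score(t), and `scores[0]` is the score of the query node set itself.
- DecreaseRatio(t) = Score(t)/OptimalScore(t) is compared with DeRatioL.
- ConDecreaseNum(t) is read as $t - \min\{\arg\max_{t_i<t}\mathrm{Score}(t_i)\}$, i.e. the number of additions since the current best score was first reached. This reading is an interpretation.

The defaults 0.7 and 50 are the values reported as best in the detailed description. `isc_should_stop` is a hypothetical name.

```python
from typing import Sequence


def isc_should_stop(scores: Sequence[float], de_ratio_l: float = 0.7,
                    con_de_num_l: int = 50) -> bool:
    """Check both ISC stop conditions for the latest step t = len(scores) - 1.

    scores[t] is Score(t); scores[0] is assumed to score the query nodes alone.
    """
    t = len(scores) - 1
    if t < 1:
        return False                            # nothing earlier to compare against
    earlier = list(scores[:t])                  # Score(t_i) for all t_i < t
    optimal_score = max(earlier)                # OptimalScore(t)
    t_star = earlier.index(optimal_score)       # min{argmax_{t_i < t} Score(t_i)}
    # DecreaseRatio(t) < DeRatioL; the ratio is only meaningful for positive scores.
    ratio_hit = optimal_score > 0 and scores[t] / optimal_score < de_ratio_l
    # ConDecreaseNum(t) > ConDeNumL, under the interpretation stated above.
    con_decrease_num = t - t_star
    return ratio_hit or con_decrease_num > con_de_num_l
```

For example, a trace ending 1.0, 2.0, 1.2 stops on the ratio test, because 1.2/2.0 = 0.6 < 0.7. A trace ending 1.0, 2.0, 1.8 does not stop.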
Compared with the prior art, the method overcomes two defects of existing characteristic community search algorithms: they are time-consuming, and they cannot support fast online search. The method performs well on searches in real large-scale graphs. The RACSF-ACS algorithm is flexible and efficient, and it adapts well to dynamic changes in large-scale graph networks. Its running time depends only on the size of the searched community. It is not affected by the size of the whole graph or by changes to large numbers of nodes, so it supports fast online community search on large graph networks.
RACSF-ACS is also easy to tune. Researchers can control the number of cycles manually, according to the time and accuracy requirements of a given community search problem, by adjusting the parameters DeRatioL and ConDeNumL. The method supports large-scale online characteristic (ACS) community search and achieves high result accuracy with short query times.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIGS. 2 and 3 compare the effectiveness of characteristic community search on different data sets for the existing algorithms LCTC, ACC and LocATC and for the RACSF-ACS algorithm;
FIGS. 4 and 5 compare the efficiency of LCTC, ACC, LocATC and RACSF-ACS on the characteristic community search problem;
FIG. 6 compares the effectiveness of LCTC, ACC, LocATC and RACSF-ACS on the miscellaneous-network data set for different numbers of query nodes;
FIG. 7 compares the efficiency of LCTC, ACC, LocATC and RACSF-ACS on the miscellaneous-network data set for different numbers of query nodes;
FIG. 8 compares the efficiency of LCTC, ACC, LocATC and RACSF-ACS on the Texas data set for different numbers of query nodes;
FIG. 9 compares the efficiency of LCTC, ACC, LocATC and RACSF-ACS on the Texas data set for different numbers of query nodes;
FIG. 10 shows how the effectiveness of the RACSF-ACS algorithm varies with the value of DeRatioL in the cycle termination condition;
FIG. 11 shows how the efficiency of the RACSF-ACS algorithm varies with the value of DeRatioL in the cycle termination condition;
FIG. 12 shows how the effectiveness of the RACSF-ACS algorithm varies with the value of ConDeNumL in the cycle termination condition;
FIG. 13 shows how the efficiency of the RACSF-ACS algorithm varies with the value of ConDeNumL in the cycle termination condition.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
The first embodiment: this embodiment is described with reference to FIG. 1. The efficient network characteristic community search method of this embodiment is implemented on the basis of RACSF-ACS (attributed community search based on a reasonable attributed community scoring function). The specific steps are as follows:
step one, establishing an internal link density evaluation function of a community node according to the internal structure characteristics of a real community;
step two, establishing an external link density evaluation function of the community nodes according to the external structure characteristics of the real community;
step three, quantifying the correlation between the attributes of the network community to be searched and the given attributes, and establishing an attribute correlation evaluation function of the network community;
step four, fusing the internal link density evaluation function, the external link density evaluation function and the attribute correlation evaluation function to obtain a function RACSF (S, Wq);
step five, optimizing the RACSF (S, Wq) function by using the NSS node selection strategy and the elastic ISC cycle termination condition, and taking the optimal solution of the target community structure as the finally searched characteristic community.
The NSS node selection strategy of the invention is greedy. At each step, it selects from the adjacent nodes a node that improves the RACSF score of the current node set, and adds it as a member node.
Elastic ISC cycle termination condition: node selection stops when adding a new member node greatly reduces the RACSF score, or when several successive additions fail to improve it. The node set with the highest RACSF score is then output. Optimizing the RACSF (S, Wq) function in this way yields the optimal solution of the target community structure, which is the characteristic community that the search returns.
Further, the specific method for establishing the internal link density evaluation function of the community node in the step one is as follows:
Definition: G(V, E) is an undirected graph and S is a set of nodes in G. The value range of the existing Density function is rescaled to $[0, n_S - 1]$, which gives the improved DensityB function:
[Formula image: definition of DensityB(S)]
an internal link density evaluation function Inner (S) is established by using the improved DensityB function:
[Formula image: definition of Inner(S)]
where:

- $v$ is a node in the graph $G(V, E)$;
- $\mathrm{AvgDeg}(S)$ is the average degree of the vertex set $S$;
- $m_S = |\{(u, v) \in E : u \in S, v \in S\}|$ is the number of edges within the node set $S$;
- $n_S = |S|$ is the number of nodes in $S$;
- $d(v, S) = |\{(u, v) \in E : u \in S\}|$ is the number of edges connecting node $v$ to nodes in $S$;
- $d^2(v, S)$ is the square of $d(v, S)$.
This embodiment builds on the definition of a high-quality community: its node set has dense internal links and sparse external links. The relation between the searched community structure and the given attributes (RAS) is also an important factor in the quality of the searched community. The RACSF function of the invention therefore comprises three function blocks:

(1) a function evaluating the internal link density of a node set;
(2) a function evaluating the external link density of a node set;
(3) a function quantifying the degree of correlation between the given attributes and the node set.

For the internal link density evaluation function, the invention first modifies the Density function from previous studies so that its value range matches that of AvgDeg. It then combines the two to obtain the final internal link density evaluation function Inner(S).
Further, the step two of establishing the external link density evaluation function of the community node specifically includes:
external link density evaluation function Average-ODF (S):
[Formula image: definition of Average-ODF(S)]
where $d(u) = |\{(u, v) \in E\}|$ is the degree of node $u$.
Further, in step three, the correlation between the attributes of the community to be searched and the node set is quantified as follows:
using the formula:
[Formula image: definition of AttriScore(S, Wq)]
This formula quantifies the degree of correlation between a given feature attribute set $W_q$ and a node set $S$. Here $\mathrm{AttriScore}(S, W_q)$ is the correlation function between the attribute set $W_q$ and the node set of the community to be searched. The set $\mathrm{Nodes}(S, w) = \{v : v \in S, w \in \mathrm{attr}(v)\}$ contains all nodes in $S$ that cover attribute $w$, where $w$ belongs to the attribute set $W_q$ of the searched community.
The RAS quantization function, which quantifies the correlation between attributes and a node set, is designed to satisfy three properties. Let $f(S, W_q)$ denote the degree of correlation between the given features $W_q$ and the node group $S$.

(1) If each feature in $W_q$ is covered by most nodes in $S$, $f(S, W_q)$ is large. The fewer nodes cover the features, the smaller the value.
(2) The more distinct features in $W_q$ the nodes cover, the larger $f(S, W_q)$ is.
(3) The more nodes in $S$ are unrelated to the features in $W_q$, the smaller $f(S, W_q)$ is.
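The AttriScore formula itself is only available as an image. As a hedged illustration, the sketch below assumes the following form, which satisfies the three properties listed above:

$$\mathrm{AttriScore}(S, W_q) = \sum_{w \in W_q} \frac{|\mathrm{Nodes}(S, w)|^2}{|S|}$$

It is not confirmed to be the patent's formula. The name `attri_score` and the attribute dictionary are illustrative.

```python
from typing import Dict, Hashable, Set


def attri_score(attrs: Dict[Hashable, Set[str]], s: Set[Hashable], wq: Set[str]) -> float:
    """Assumed attribute correlation: sum over w in Wq of |Nodes(S, w)|^2, divided by |S|."""
    if not s:
        return 0.0
    total = 0
    for w in wq:
        nodes_w = {v for v in s if w in attrs[v]}  # Nodes(S, w)
        total += len(nodes_w) ** 2                 # rewards wide coverage of w (property 1)
    # Summing over distinct w rewards diversity (property 2);
    # dividing by |S| penalises unrelated members (property 3).
    return total / len(s)
```

For attributes {1: {db}, 2: {db}, 3: {db, ml}, 4: {}} and $W_q$ = {db, ml}, the set {1, 2, 3} scores $(9 + 1)/3 \approx 3.33$. Adding the unrelated node 4 lowers the score to $10/4 = 2.5$.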
Further, the RACSF (S, Wq) function obtained in step four is:
[Formula images: definition of RACSF(S, Wq)]
where $\mathrm{RACSF}(S, W_q)$ is the RACSF value of the node set $S$ when the target attribute set is $W_q$. The RACSF value evaluates the quality of the searched node set $S$. The higher the score, the higher the quality of the searched structure, the better it matches the search condition, and the more likely it is to be the target community structure.
The invention draws on the local expansion model from community discovery and adopts the most common seed-node expansion method, NSMF (neighbor set merging framework). The idea is simple: starting from the input query nodes, adjacent nodes are examined repeatedly and the community is expanded until a cycle stop condition occurs. NSMF has two core parts: the NSS node selection strategy and the ISC cycle termination condition.

To improve the accuracy and efficiency of NSMF, the invention mainly improves the community quality function value reduction strategy (CQFR). It introduces two parameters into CQFR, ConDeNumL and DeRatioL, which allow manual control over how many further cycles are executed after the quality function value (RACSF) first decreases. The RACSF function is then combined with the improved OCQFR to obtain RISC, the cycle termination condition of NSMF. The NSS node selection strategy combines the RACSF function with CQFO (an optimization strategy). The symbols associated with the NSMF strategy are defined as shown in FIG. 2.

The cycle stop condition RISC is as follows: if the current cycle satisfies either of the following two conditions, subsequent cycles stop, and OptimalGroup(t) is taken as the final result of NSMF.
Further, in step five, the RACSF (S, Wq) function is optimized with the NSS node selection strategy and the elastic ISC cycle termination condition, and the optimal solution of the target community structure is taken as the finally searched characteristic community, as follows:
when the cycle satisfies the condition:
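$$\mathrm{DecreaseRatio}(t) = \frac{\mathrm{Score}(t)}{\mathrm{OptimalScore}(t)} < \mathrm{DeRatioL}$$
$$\mathrm{ConDecreaseNum}(t) = t - \min\{\arg\max_{t_i < t} \mathrm{Score}(t_i)\} > \mathrm{ConDeNumL}$$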
If either condition holds, subsequent cycles stop. The set of searched community member nodes is then the characteristic community found for the ACS problem, and OptimalGroup(t) after the cycle stops is taken as the optimal solution of the target community structure. The symbols are defined as follows:

- DeRatioL and ConDeNumL are constants. The algorithm performs best when they are set to 0.7 and 50, respectively.
- DecreaseRatio(t) is the ratio of Score(t) to OptimalScore(t).
- OptimalScore(t) is the maximum RACSF score reached by the node set before the t-th node recommended by the NSS node selection strategy is added.
- $\mathrm{Score}(t_i)$ is the RACSF score of the node set $\mathrm{Group}(t_i)$.
- $\mathrm{Group}(t_i)$ is the community structure obtained after the NSS node selection strategy has added $t_i$ nodes, and Group(t) is the structure obtained after it has added $t$ nodes.
- $\min\{\arg\max_{t_i < t} \mathrm{Score}(t_i)\}$ is the smallest number of nodes added by the NSS node selection strategy at which the node set reached the current maximum RACSF score.
The condition DecreaseRatio(t) < DeRatioL detects the case where a newly added member node greatly reduces the RACSF score. The condition ConDecreaseNum(t) > ConDeNumL detects the case where several successive additions fail to improve the RACSF score. Neither case helps increase the RACSF score, so both serve as termination conditions for the NSS node selection strategy.
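Greedy NSS selection and the two stop conditions together give an NSMF-style expansion loop. This is a hedged sketch, not the patent's pseudocode. The following are all assumptions:

- the adjacency-dictionary graph;
- the callable `score` standing in for RACSF(S, Wq);
- the reading of ConDecreaseNum(t) as the number of additions since the best score was first reached;
- the name `nsmf_search`.

The loop returns the best-scoring node set it has seen, which matches the rule that the node set with the highest RACSF score is output.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def nsmf_search(graph: Graph, query: Set[Hashable],
                score: Callable[[Set[Hashable]], float],
                de_ratio_l: float = 0.7, con_de_num_l: int = 50
                ) -> Tuple[Set[Hashable], float]:
    """Expand from the query nodes greedily until a stop condition fires (sketch)."""
    group = set(query)                      # Group(0): the query nodes themselves
    scores = [score(group)]                 # scores[t] = Score(t)
    best_group, best_score = set(group), scores[0]
    while True:
        frontier = set().union(*(graph[v] for v in group)) - group
        if not frontier:
            break                           # nothing left to add
        # NSS: add the neighbour giving the highest score (first one on ties).
        chosen = max(sorted(frontier, key=repr), key=lambda u: score(group | {u}))
        group = group | {chosen}
        t = len(scores)                     # number of nodes added so far
        scores.append(score(group))
        if scores[t] > best_score:          # track the best node set seen
            best_group, best_score = set(group), scores[t]
        # Stop conditions, comparing Score(t) with the best Score(t_i), t_i < t.
        optimal = max(scores[:t])           # OptimalScore(t)
        t_star = scores[:t].index(optimal)  # min{argmax_{t_i < t} Score(t_i)}
        ratio_hit = optimal > 0 and scores[t] / optimal < de_ratio_l
        if ratio_hit or t - t_star > con_de_num_l:
            break
    return best_group, best_score
```

The default 0.7 and 50 are the parameter values reported above. Raising ConDeNumL lets the loop explore longer after the score stops improving, trading time for accuracy.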
Finally, combining the above designs yields the pseudocode of the RACSF-ACS algorithm (attributed community search based on a reasonable attributed community scoring function) (FIG. 4).

The ACS algorithm was verified on 16 real-world networks with ground-truth communities:

- Six data sets have real attributes: Facebook, Cora, Cornell, Texas, Washington and Wisconsin.
- The remaining 10 graph networks have no attributes, so the invention synthesizes the required attribute sets artificially.

The experiments evaluate effectiveness and efficiency on these two groups of data sets. They compare RACSF-ACS with three existing ACS algorithms:

- ACC (an attributed community query method based on the CL-tree index);
- LocATC (a local-exploration query processing algorithm based on the attributed truss index);
- LCTC (truss in the local vicinity of the query nodes).

Effectiveness is measured with the F1-score, which reflects how closely the searched community matches the actual community, as shown in FIG. 2 and FIG. 3.
In FIG. 2, the abscissa labels fe0, fe107, fe348, fe414, fe686, fe698, fe1684, fe1912, fe3437 and fe3980 denote data sets collected on the Facebook social networking platform (facebook-ego-network-X, where X is a number). They are followed by the Cora, Cornell, Texas, Washington and Wisconsin data sets. The ordinate is the effectiveness score on the characteristic community search problem.

In FIG. 3, the abscissa lists the real-world graph data sets with ground-truth communities: Dolphins, Football, Karate, Polbooks (Miscellaneous Networks), Email, Amazon, DBLP (the computer science bibliography database), Youtube, LiveJournal (social network) and Orkut (social network).
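The text does not spell out how the F1-score is computed. The usual set-based form compares the searched community $C$ with the ground-truth community $T$:

- precision $= |C \cap T| / |C|$;
- recall $= |C \cap T| / |T|$;
- F1 is their harmonic mean.

The sketch below assumes that form. `community_f1` is an illustrative name.

```python
from typing import Hashable, Set


def community_f1(found: Set[Hashable], truth: Set[Hashable]) -> float:
    """Set-based F1 between a searched community and a ground-truth community (assumed form)."""
    overlap = len(found & truth)
    if overlap == 0:
        return 0.0                       # no shared members: precision and recall are both 0
    precision = overlap / len(found)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For example, a found community {1, 2, 3, 4} against a ground truth of {2, 3, 4, 5, 6} gives precision 0.75, recall 0.6 and F1 = 2/3.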
Efficiency is evaluated by recording the query time of community search (FIG. 4 and FIG. 5). In addition, a parameter sensitivity test was carried out on the data sets Texas, Dolphins and Amazon, varying in turn the number of query nodes |Vq| and the number of query attributes |Wq| (FIG. 6 to FIG. 9), the parameter DeRatioL (FIG. 10 and FIG. 11), and the parameter ConDeNumL (FIG. 12 and FIG. 13).
The key point of the invention concerns the attributed (characteristic) communities considered in existing community search, i.e., searched communities that have a specific topological structure such as a k-clique, k-core or k-truss and whose nodes' attributes are highly correlated. The key to designing the RACSF-ACS algorithm lies in the design of the RACSF function, which comprehensively considers the important characteristics of high-quality communities: dense internal connections, sparse external connections, and the influence of RAS (the correlation between the given attributes and the searched community structure) on community quality. The proposed RACSF function integrates functions corresponding to these three elements.
Based on the idea of NSMF (proximity set merging framework), the invention designs a more efficient NSS (node selection strategy) and ISC (loop termination condition) for the RACSF function. First, the improved OCQPR condition in the ISC remedies a defect of CQPR in previous research: CQPR stops the loop as soon as its condition is not met, so the search cannot continue toward node sets that yield higher quality-function values. Second, the OCQPR method is more flexible: the number of loop iterations can be adjusted by setting different values of the parameters ConDeNumL and DeRatioL, which better meets the accuracy and time-performance requirements of practical community search problems. In addition, Score(t) in the definitions of DecreaseRatio(t) and ConDecreaseNum(t) in the OCQPR method can be replaced by other quality functions, so that loop termination conditions matched to different problems can be constructed.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (6)

1. A network characteristic community searching method, characterized by comprising the following steps:
step one, establishing an internal link density evaluation function for community nodes according to the internal structural characteristics of real communities;
step two, establishing an external link density evaluation function for community nodes according to the external structural characteristics of real communities;
step three, quantifying the correlation between the attributes of the network community to be searched and the given attributes, and establishing an attribute correlation evaluation function for the network community;
step four, fusing the internal link density evaluation function, the external link density evaluation function and the attribute correlation evaluation function to obtain the function RACSF(S, Wq);
step five, optimizing the RACSF(S, Wq) function using the NSS node selection strategy and the elastic ISC loop termination condition, and taking the resulting optimal solution of the target community structure as the finally searched characteristic community.
2. The method according to claim 1, wherein the internal link density evaluation function of the community nodes in step one is established as follows:
using the DensityB (S) function:
Figure FDA0002378320240000011
constructing an internal link density evaluation function Inner (S):
Figure FDA0002378320240000012
where G(V, E) is an undirected graph, S is a set of nodes in G, and v is a node in G; DensityB(S) takes values in $[0, n_S - 1]$; AvgDeg(S) is the average degree of the node set S; $m_S$ is the number of edges within S, $m_S = |\{(u,v) \in E : u \in S, v \in S\}|$; $n_S$ is the number of nodes in S, $n_S = |S|$; $d(v,S)$ is the number of edges connecting node v to nodes in S, $d(v,S) = |\{(u,v) \in E : u \in S\}|$; and $d^2(v,S)$ is the square of $d(v,S)$.
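The quantities defined in claim 2 can be computed directly from an adjacency-set representation of G. The sketch below assumes an undirected graph without self-loops stored as a dict of neighbour sets, and assumes that AvgDeg(S) means the average internal degree 2·m_S/n_S. The DensityB(S) and Inner(S) formulas appear only as images and are not reproduced, so the sketch stops at the building blocks.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph, no self-loops, as adjacency sets


def d_in(graph: Graph, v: int, S: Set[int]) -> int:
    """d(v, S): number of edges joining node v to nodes of S."""
    return len(graph[v] & S)


def m_in(graph: Graph, S: Set[int]) -> int:
    """m_S: number of edges with both endpoints in S."""
    # Every internal edge is counted once from each endpoint, hence // 2.
    return sum(d_in(graph, v, S) for v in S) // 2


def avg_deg(graph: Graph, S: Set[int]) -> float:
    """AvgDeg(S), assumed here to be the average internal degree 2 * m_S / n_S."""
    return 2 * m_in(graph, S) / len(S) if S else 0.0


def d_squared(graph: Graph, S: Set[int]) -> Dict[int, int]:
    """Per-node d^2(v, S); how DensityB(S) combines them is not reproduced."""
    return {v: d_in(graph, v, S) ** 2 for v in S}


# Usage: a triangle {1, 2, 3} with a pendant node 4 attached to node 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
S = {1, 2, 3}
print(m_in(graph, S), avg_deg(graph, S))  # 3 2.0
```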
3. The method according to claim 1, wherein the external link density evaluation function of the community nodes in step two is established as follows:
external link density evaluation function Average-ODF (S):
Figure FDA0002378320240000013
where d(u) is the degree of node u, $d(u) = |\{(u,v) \in E\}|$.
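Claim 3 gives the Average-ODF(S) formula only as an image. As a hedged illustration, the sketch below uses the definition of Average-ODF (average out-degree fraction) that is common in the community-scoring literature: the mean, over the nodes u of S, of the fraction of u's d(u) edges that leave S. Lower values indicate fewer external links. That the patent's formula matches this exactly is an assumption.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def average_odf(graph: Graph, S: Set[int]) -> float:
    """Average out-degree fraction of S (assumed common definition).

    For each u in S, take the fraction of u's d(u) edges whose other endpoint
    lies outside S, then average over the nodes of S. Lower is better.
    """
    if not S:
        return 0.0
    total = 0.0
    for u in S:
        degree = len(graph[u])  # d(u) = |{(u, v) in E}|
        if degree:  # isolated nodes contribute 0
            total += len(graph[u] - S) / degree
    return total / len(S)


graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(average_odf(graph, {1, 2, 3}))  # only node 3 has an outside edge: (1/3) / 3
```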
4. The method for searching for the network characteristic community according to claim 1, wherein the function quantifying the correlation between the node set of the community to be searched and the given attributes in step three is established as follows:
using the formula:
Figure FDA0002378320240000021
to quantify the degree of correlation between the given attribute set Wq and the node set S, where Attricore(S, Wq) is the correlation function between the attribute set Wq and the node set of the community to be searched; $\mathrm{Nodes}(S, w) = \{v : v \in S,\ w \in \mathrm{attr}(v)\}$ is the set of all nodes in S that cover attribute w, and w belongs to the attribute set Wq of the searched community.
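Nodes(S, w) follows directly from claim 4. The Attricore(S, Wq) formula itself is an image, so the score below is a stand-in modelled on the attribute score of the cited "Attribute-Driven Community Search" paper (the sum over w in Wq of |Nodes(S, w)|² / |S|); treat it as an assumption, not the patent's formula. The helper names and the attribute representation (a dict from node ID to a set of attribute strings) are hypothetical.

```python
from typing import Dict, Iterable, Set

Attrs = Dict[int, Set[str]]  # hypothetical: node ID -> attr(v)


def nodes_covering(S: Set[int], attrs: Attrs, w: str) -> Set[int]:
    """Nodes(S, w) = {v : v in S, w in attr(v)}."""
    return {v for v in S if w in attrs.get(v, set())}


def attribute_score(S: Set[int], attrs: Attrs, Wq: Iterable[str]) -> float:
    """Stand-in for Attricore(S, Wq): sum over w in Wq of |Nodes(S, w)|^2 / |S|.

    Modelled on the attribute score of 'Attribute-Driven Community Search';
    the patent's own formula is not reproduced here.
    """
    if not S:
        return 0.0
    return sum(len(nodes_covering(S, attrs, w)) ** 2 for w in Wq) / len(S)


attrs = {1: {"db", "ml"}, 2: {"db"}, 3: {"ml"}, 4: {"db"}}
S = {1, 2, 3}
print(sorted(nodes_covering(S, attrs, "db")))  # [1, 2]
print(attribute_score(S, attrs, {"db", "ml"}))  # (2**2 + 2**2) / 3
```

Squaring |Nodes(S, w)| rewards attributes that are shared by many nodes of S, while dividing by |S| penalizes padding S with nodes that cover none of the query attributes.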
5. The method according to claim 1, wherein the RACSF(S, Wq) function obtained in step four is:
Figure FDA0002378320240000022
where RACSF(S, Wq) denotes the RACSF value of the node set S when the target attribute set is Wq; the RACSF value is used to evaluate the quality of the searched node set S.
6. The method according to claim 1, wherein step five, optimizing the RACSF(S, Wq) function using the NSS node selection strategy and the elastic ISC loop termination condition to obtain the optimal solution of the target community structure as the finally searched characteristic community, specifically comprises:
the loop checks the following two conditions:
Figure FDA0002378320240000023
Figure FDA0002378320240000024
if either of the two conditions is satisfied, the subsequent loop iterations are stopped, and OptimalGroup(t) at the time the loop stops is taken as the optimal solution of the target community structure; where DeRatioL and ConDeNumL are constants; DecreaseRatio(t) is the ratio of Score(t) to OptimalScore(t); OptimalScore(t) is the maximum RACSF score obtained by the node set before the t-th node recommended by the NSS node selection strategy is added; Score(t_i) is the score of the node set Group(t_i), where Group(t_i) is the community structure obtained after t_i nodes have been added using the NSS node selection strategy; Group(t) is the community structure obtained after t nodes have been added using the NSS node selection strategy; and $\min\{\arg\max_{t_i < t} \mathrm{Score}(t_i)\}$ denotes the minimum number of nodes added by the NSS node selection strategy at which the node set attains the current maximum RACSF score.
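The elastic termination logic of claim 6 can be sketched as a greedy expansion loop. Because the two condition formulas are images, this sketch assumes the loop stops when DecreaseRatio(t) = Score(t)/OptimalScore(t) falls below DeRatioL, or when ConDecreaseNum(t), taken here as t - min{argmax Score(t_i)}, reaches ConDeNumL. The NSS strategy is not specified in this section, so the hypothetical stand-in simply adds the frontier node that maximizes the score. Any quality function can be passed as `score`, in line with the remark that Score(t) may be replaced by other quality functions; the toy score below is for illustration only and is not RACSF.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets


def expand_with_termination(
    graph: Graph,
    query_nodes: Iterable[int],
    score: Callable[[FrozenSet[int]], float],
    de_ratio_l: float,
    con_de_num_l: int,
) -> Tuple[FrozenSet[int], float]:
    """Greedy expansion with an (assumed) OCQPR-style elastic stop rule."""
    group: Set[int] = set(query_nodes)
    optimal_group = frozenset(group)
    optimal_score = score(optimal_group)  # best score before any node is added
    best_t = 0  # min argmax: first step at which the current maximum was reached
    t = 0
    while True:
        frontier = {u for v in group for u in graph[v]} - group
        if not frontier:
            break  # nothing left to add
        # Hypothetical NSS stand-in: pick the frontier node with the highest
        # score, breaking ties in favour of the smaller node ID.
        nxt = max(frontier, key=lambda u: (score(frozenset(group | {u})), -u))
        group.add(nxt)
        t += 1
        current = score(frozenset(group))  # Score(t)
        if current > optimal_score:
            # New maximum: record OptimalGroup and reset the decrease counter.
            optimal_group, optimal_score, best_t = frozenset(group), current, t
            continue
        # DecreaseRatio(t) = Score(t) / OptimalScore(t); guard a non-positive best.
        decrease_ratio = current / optimal_score if optimal_score > 0 else 1.0
        con_decrease_num = t - best_t  # assumed meaning of ConDecreaseNum(t)
        if decrease_ratio < de_ratio_l or con_decrease_num >= con_de_num_l:
            break
    return optimal_group, optimal_score


# Usage: two triangles {1, 2, 3} and {4, 5, 6} joined by the bridge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}


def internal_fraction(S: FrozenSet[int]) -> float:
    """Toy quality score (not RACSF): share of S's incident edges that are internal."""
    internal = sum(len(graph[v] & S) for v in S) // 2
    cut = sum(len(graph[v] - S) for v in S)
    total = internal + cut
    return internal / total if total else 0.0


best, best_score = expand_with_termination(graph, {1}, internal_fraction, 0.5, 2)
print(sorted(best), best_score)  # [1, 2, 3] 0.75
```

Unlike a rule that stops at the first non-improving step, the loop here keeps exploring for up to ConDeNumL non-improving additions and returns the best group seen, which is the flexibility the description attributes to OCQPR.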
CN202010075210.2A 2020-01-22 2020-01-22 Network characteristic community searching method Active CN111274498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010075210.2A CN111274498B (en) 2020-01-22 2020-01-22 Network characteristic community searching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010075210.2A CN111274498B (en) 2020-01-22 2020-01-22 Network characteristic community searching method

Publications (2)

Publication Number Publication Date
CN111274498A true CN111274498A (en) 2020-06-12
CN111274498B CN111274498B (en) 2023-06-23

Family

ID=71003324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010075210.2A Active CN111274498B (en) 2020-01-22 2020-01-22 Network characteristic community searching method

Country Status (1)

Country Link
CN (1) CN111274498B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050234904A1 (en) * 2004-04-08 2005-10-20 Microsoft Corporation Systems and methods that rank search results
US20130268595A1 (en) * 2012-04-06 2013-10-10 Telefonaktiebolaget L M Ericsson (Publ) Detecting communities in telecommunication networks
US9749406B1 (en) * 2013-03-13 2017-08-29 Hrl Laboratories, Llc System and methods for automated community discovery in networks with multiple relational types
CN103425868A (en) * 2013-07-04 2013-12-04 Xi'an University of Technology Complex network community detection method based on fractal feature
US20150134402A1 (en) * 2013-11-11 2015-05-14 Yahoo! Inc. System and method for network-oblivious community detection
US20150188783A1 (en) * 2013-12-30 2015-07-02 Telefonaktiebolaget L M Ericsson (Publ) Method and apparatus for detecting communities in a network
US20160117414A1 (en) * 2014-10-23 2016-04-28 Sudhir Verma In-Memory Database Search Optimization Using Graph Community Structure
CN109859065A (en) * 2019-02-28 2019-06-07 Guilin University of Technology Multiple target complex network community discovery method based on spectral clustering
CN110119462A (en) * 2019-04-03 2019-08-13 Hangzhou Zhongke Advanced Technology Research Institute Co., Ltd. A kind of community search method of net with attributes

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHUNNAN WANG: "ECOQUG An Effective Ensemble Community Scoring Function", pages 1702 - 1705 *
XIN HUANG: "Attribute-Driven Community Search", pages 949 - 960 *
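杜航原 (Du Hangyuan): "Overlapping community discovery algorithm based on network node centrality measures", pages 1619 - 1630 *
杜航原 (Du Hangyuan): "Overlapping community discovery algorithm for attributed networks", Journal of Computer Applications (《计算机应用》), pages 3151 - 3157 *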

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898039A (en) * 2020-07-03 2020-11-06 Harbin Engineering University Attribute community searching method fusing hidden relations
CN111898039B (en) * 2020-07-03 2023-12-19 Harbin Engineering University Attribute community searching method integrating hidden relations

Also Published As

Publication number Publication date
CN111274498B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US11423086B2 (en) Data processing system and method of associating internet devices based upon device usage
Harenberg et al. Community detection in large‐scale networks: a survey and empirical evaluation
CN104731962B (en) Friend recommendation method and system based on similar corporations in a kind of social networks
Li et al. An improved collaborative filtering recommendation algorithm and recommendation strategy
Dominguez-Sal et al. A discussion on the design of graph database benchmarks
Benouaret et al. Selecting skyline web services from uncertain qos
CN114218400A (en) Semantic-based data lake query system and method
CN107451210B (en) Graph matching query method based on query relaxation result enhancement
CN112131261B (en) Community query method and device based on community network and computer equipment
Jalali et al. A web usage mining approach based on lcs algorithm in online predicting recommendation systems
CN108520035A (en) SPARQL parent map pattern query processing methods based on star decomposition
He et al. A fuzzy clustering based method for attributed graph partitioning
Sun et al. How we collaborate: characterizing, modeling and predicting scientific collaborations
CN111274498A (en) Network characteristic community searching method
CN107133274B (en) Distributed information retrieval set selection method based on graph knowledge base
Xiao et al. Latent neighborhood-based heterogeneous graph representation
Chen et al. PBSM: an efficient top-K subgraph matching algorithm
CN114722304A (en) Community search method based on theme on heterogeneous information network
CN112015854B (en) Heterogeneous data attribute association method based on self-organizing mapping neural network
Feng et al. An interlayer feature fusion-based heterogeneous graph neural network
Ayoub et al. Link prediction using betweenness centrality and graph neural networks
An Data mining analysis method of consumer behaviour characteristics based on social media big data
Xie et al. Correlation-based top-k recommendation for web services
Cao et al. Heterogeneous information network embedding with meta-path based graph attention networks
Cui Research on the filtering recommendation technology of network information based on big data environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant