CN112084424A - Social network community discovery method and system based on attribute graph information


Info

Publication number
CN112084424A
Authority
CN
China
Legal status
Pending
Application number
CN202010947352.3A
Other languages
Chinese (zh)
Inventor
许明
潘翔宇
胡伦
Current Assignee
Shenzhen Wanjia'an Artificial Intelligence Data Technology Co ltd
Original Assignee
Shenzhen Wanjia'an Artificial Intelligence Data Technology Co ltd
Priority date
2020-09-10
Filing date
2020-09-10
Publication date
2020-12-15

Classifications

    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06Q50/01 Social networking


Abstract

The invention provides a social network community discovery method and system based on attribute graph information. The system comprises a network construction module, a data preprocessing module, a model construction module, a model solving module, a community discovery module and a result display module. The network construction module constructs social network data into an attribute network. The data preprocessing module executes a server computation instruction to obtain the adjacency matrix, the attribute association matrix and the membership matrix of the network. After obtaining the input parameters, the model construction module constructs, on the server, an optimization problem with respect to the membership matrix. Once the model construction module signals completion, the model solving module iteratively solves the optimization problem and computes the optimal membership matrix. The community discovery module then executes a community discovery instruction based on the optimal membership matrix, and the result display module finally outputs and displays the community discovery result. The method and system operate directly on attributed social network data sets, perform community discovery on social networks with high accuracy, and address the problem of community discovery in social networks.

Description

Social network community discovery method and system based on attribute graph information
Technical Field
The invention relates to the technical field of computer data processing, and in particular to community discovery in social networks.
Background
Existing social network community discovery methods perform community discovery based only on the topology of the social network, that is, they consider only the connections between users. Such methods ignore the attribute information associated with users, which may leave some communities of practical interest undetected. For example, a user of a social application may join a group out of personal interest without being friends with anyone in that group; such a community is difficult to discover from the network topology alone.
Although some community discovery methods that additionally consider node attribute information have been proposed, the communities they identify perform poorly with respect to accuracy.
Disclosure of Invention
In view of the above drawbacks and deficiencies, an object of the present invention is to provide a social network community discovery method and system based on attribute graph information, which implement community discovery by comprehensively considering the adjacency degree and the attribute association degree between nodes in a social network.
The invention is realized by the following technical scheme:
the invention discloses a social network community discovery method based on attribute graph information, which comprises the following steps:
step one, preprocessing data in a social network, which comprises: abstracting the users in the social network into nodes of a network, and obtaining the set of all nodes, the set of edges between pairs of nodes and the set of all attributes associated with the nodes;
step two, calculating the attribute association degree between each pair of nodes in the social network by applying statistics and information theory to the preprocessing result of step one;
step three, forming the adjacency matrix of the nodes and the node attribute association degree matrix from all the nodes and edges obtained in step one and the attribute association degree between each pair of nodes obtained in step two;
step four, defining the number of communities to be discovered, and defining the membership degree matrix of the nodes to the communities according to that number and all the nodes obtained in step one;
step five, inputting model parameters, and constructing an optimization problem with respect to the membership matrix based on the adjacency matrix and the attribute association matrix obtained in step three and the membership matrix obtained in step four;
step six, solving the optimization problem obtained in step five; its solution is the optimal membership matrix;
step seven, obtaining the discovered communities according to the optimal membership matrix obtained in step six.
In step six, the optimization problem is solved by the Lagrange multiplier method: Lagrange multipliers are introduced and the Karush-Kuhn-Tucker (KKT) conditions are constructed to derive update formulas. Starting from an initialized membership matrix, the matrix is updated iteratively according to these formulas until the objective function of the optimization problem from step five converges, and the membership matrix at that point is taken as the optimal membership matrix.
The invention also discloses a social network community discovery system based on attribute graph information, which comprises the following modules:
the network construction module, which constructs a social network containing a large amount of attribute information into an attribute network;
the data preprocessing module, which preprocesses the network obtained by the previous module to obtain the adjacency matrix and the node attribute association degree matrix, defines the number of communities, and initializes the membership degree matrix of the nodes to the communities;
the model building module, which takes the input model parameters and builds an optimization model with respect to the membership matrix from the adjacency matrix, the attribute association matrix and the membership matrix obtained by the previous module;
the model solving module, which solves the optimization model obtained by the previous module to obtain the optimal membership matrix;
the community discovery module, which performs community discovery according to the optimal membership matrix obtained by the previous module;
and the result display module, which outputs the communities discovered by the previous module.
Compared with the prior art, the invention has the following beneficial technical effects:
the social network community discovery method disclosed by the invention overcomes the shortcomings of the prior art as follows: in the algorithm design, the community discovery task is completed using both the network topology and the attribute information of the nodes, and the accuracy of social network community discovery is improved by constructing an optimization problem.
The invention also discloses a system capable of implementing the social network community discovery method. The system mainly comprises six parts: a network construction module, a data preprocessing module, a model construction module, a model solving module, a community discovery module and a result display module. First, the network construction module constructs the social network as an attribute network. Second, the data preprocessing module preprocesses the attribute network to obtain the adjacency matrix, the attribute association degree matrix, the number of communities and the membership degree matrix. The model construction module then builds these matrices into an optimization model with respect to the membership matrix. Next, the model solving module solves the optimization model using the Lagrange multiplier method to obtain the optimal membership matrix. The community discovery module then divides the nodes into communities according to the optimal membership matrix, achieving community discovery. Finally, the result display module displays the community discovery result.
Drawings
FIG. 1 is a logical block diagram of a social network community discovery system based on attribute graph information according to the present invention;
FIG. 2 is a schematic diagram of an optimization problem.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and examples.
The invention discloses a social network community discovery system based on attribute graph information, whose functional structure is shown in FIG. 1. The system mainly comprises six parts: a network construction module, a data preprocessing module, a model construction module, a model solving module, a community discovery module and a result display module.
The social network community discovery method based on the attribute graph information comprises the following steps:
step one, preprocessing data in the social network: users in the social network are abstracted into nodes and connections between users into edges, and the set of all nodes, the set of all edges and the set of all attributes associated with the nodes are collected;
step two, based on the preprocessing result of step one, calculating the attribute association degree between each pair of nodes in the social network using statistics and information theory;
step three, constructing the adjacency matrix and the node attribute association degree matrix from the node set and edge set obtained in step one and the attribute association degree between each pair of nodes obtained in step two;
step four, defining the number of communities to be discovered, and defining the membership degree matrix of the nodes to the communities from the number of communities and all the nodes obtained in step one;
step five, inputting model parameters, and constructing a constrained optimization problem with respect to the membership matrix based on the adjacency matrix and the attribute association matrix obtained in step three and the membership matrix obtained in step four;
step six, solving the optimization problem constructed in step five to obtain the optimal membership matrix;
step seven, dividing all the nodes into communities according to the optimal membership matrix obtained in step six to obtain the finally discovered communities.
The optimization problem of the system of the present invention is shown in FIG. 2. Here, matrices D and A are the adjacency matrix and the attribute association degree matrix described in step three, respectively, and matrix U is the membership matrix described in step four. The elements of matrix S are given by [formula image]; [formula image], β, θ and [formula image] are model parameters. The optimization model is designed so that the discovered communities satisfy two conditions: users within a community are closely connected, and the attributes of users within a community are closely associated.
The following describes each module in detail:
1. Network construction module
In the first step, a network is constructed. Users in a social network are abstracted as nodes in the network, and connections between users are abstracted as edges in the network.
Second, store the network information: collect the set of all nodes in the network, $V = \{v_i\}$ ($1 \le i \le n_V$), the set of all edges, $E = \{e_{ij}\}$, and the set of all attributes associated with the nodes, $\Lambda = \{\Lambda_m\}$ ($1 \le m \le n_\Lambda$), and store them.
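A minimal sketch of the two steps above, assuming the raw data arrive as a list of user identifiers, a list of user-to-user connections, and a per-user dictionary of attribute values. The function name `build_attribute_network` and this data layout are hypothetical illustrations, not taken from the patent.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def build_attribute_network(
    users: List[Hashable],
    connections: Iterable[Tuple[Hashable, Hashable]],
    user_attributes: Dict[Hashable, Dict[str, Hashable]],
):
    """Abstract users into nodes 0..n_V-1 and collect V, E and Lambda.

    Returns (V, E, Lambda, node_values), where node_values maps each node
    index to its set of (attribute, value) pairs.
    """
    index = {u: i for i, u in enumerate(users)}
    V = list(range(len(users)))
    # Undirected edges are stored once as (smaller index, larger index);
    # self-connections are dropped (an assumption of this sketch).
    E: Set[Tuple[int, int]] = set()
    for a, b in connections:
        i, j = index[a], index[b]
        if i != j:
            E.add((min(i, j), max(i, j)))
    Lambda: Set[str] = set()
    node_values: Dict[int, Set[Tuple[str, Hashable]]] = {i: set() for i in V}
    for user, attrs in user_attributes.items():
        node_values[index[user]] = set(attrs.items())
        Lambda.update(attrs.keys())
    return V, E, Lambda, node_values
```

Duplicate connections listed in both directions collapse into a single edge, which matches treating user connections as undirected.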
2. Data preprocessing module
First, obtain the adjacency matrix. The adjacency matrix D of the network is constructed from the node set and edge set obtained by the previous module: if there is an edge $e_{ij}$ between nodes i and j, then $d_{ij} = 1$; otherwise $d_{ij} = 0$.
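A minimal sketch of this rule, assuming nodes are indexed 0..n_V-1 and edges are undirected (so D is symmetric). The name `adjacency_matrix` is hypothetical, and ignoring self-loops is an assumption the patent does not state.

```python
from typing import Iterable, List, Tuple


def adjacency_matrix(n_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Build the n_V x n_V 0/1 adjacency matrix D, treating edges as undirected."""
    D = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        if i != j:  # assumption: self-loops are ignored
            D[i][j] = 1
            D[j][i] = 1
    return D
```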
And secondly, obtaining an attribute association degree matrix.
The specific method for calculating the attribute association degree matrix is as follows:
2-1. Suppose $\mathrm{val}_p$ and $\mathrm{val}_q$ are values of attributes $\Lambda_m$ and $\Lambda_n$, respectively ($\Lambda_m, \Lambda_n \in \Lambda$). Let [formula image] denote the number of pairs of neighboring nodes whose attribute values are $\mathrm{val}_p$ and $\mathrm{val}_q$ respectively, [formula image] the number of pairs of neighboring nodes in which either node has attribute value $\mathrm{val}_p$, and [formula image] the number of pairs of neighboring nodes in which either node has attribute value $\mathrm{val}_q$. Then, by a statistical criterion, if [formula image] holds, the attribute values $\mathrm{val}_p$ and $\mathrm{val}_q$ are considered to be associated;
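The three counts in step 2-1 can be gathered in a single pass over the edge list. A minimal sketch, assuming each edge is one "neighboring node pair", that each node's attributes are stored as a set of (attribute, value) tuples, and that "either one of them has" means at least one endpoint carries the value. The name `neighbor_pair_counts` is hypothetical, and the association criterion itself (a formula image) is not reproduced here.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

AttrValue = Tuple[str, Hashable]  # (attribute name, value), e.g. ("city", "Shenzhen")


def neighbor_pair_counts(
    edges: Iterable[Tuple[int, int]],
    node_values: Dict[int, Set[AttrValue]],
    val_p: AttrValue,
    val_q: AttrValue,
) -> Tuple[int, int, int, int]:
    """Count neighboring node pairs (edges) relevant to val_p and val_q.

    Returns (n_pq, n_p, n_q, n_pairs):
      n_pq    pairs whose two nodes carry val_p and val_q respectively
      n_p     pairs in which at least one node carries val_p
      n_q     pairs in which at least one node carries val_q
      n_pairs total number of neighboring node pairs
    """
    n_pq = n_p = n_q = n_pairs = 0
    for i, j in edges:
        vi = node_values.get(i, set())
        vj = node_values.get(j, set())
        n_pairs += 1
        # Either orientation counts, since edges are undirected.
        if (val_p in vi and val_q in vj) or (val_q in vi and val_p in vj):
            n_pq += 1
        if val_p in vi or val_p in vj:
            n_p += 1
        if val_q in vi or val_q in vj:
            n_q += 1
    return n_pq, n_p, n_q, n_pairs
```

A node carrying both values does not count toward n_pq by itself; only a pair of distinct neighboring nodes does.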
2-2. Calculate the degree of association between attribute values $\mathrm{val}_p$ and $\mathrm{val}_q$ through mutual information (MI):
[formula image]
where
[formula image]
[formula image]
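A minimal sketch of an MI-style association score computed from the pair counts of step 2-1. Because the patent's MI formula is only available as an image, this assumes the pointwise form log(P(p,q) / (P(p) P(q))) with each probability estimated as a fraction of all neighboring node pairs; both that form and the zero-count convention are assumptions, and `pointwise_mi` is a hypothetical name.

```python
import math


def pointwise_mi(n_pq: int, n_p: int, n_q: int, n_pairs: int) -> float:
    """Pointwise mutual information log(P(p,q) / (P(p) * P(q))), natural log.

    Probabilities are estimated as fractions of all neighboring node pairs.
    Returns 0.0 when any count is zero (a convention of this sketch).
    """
    if min(n_pq, n_p, n_q, n_pairs) <= 0:
        return 0.0
    p_pq = n_pq / n_pairs
    p_p = n_p / n_pairs
    p_q = n_q / n_pairs
    return math.log(p_pq / (p_p * p_q))
```

A positive score means the two values co-occur on neighboring pairs more often than independence would predict; a score of 0 means they co-occur exactly as often as expected.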
2-3. Calculate the attribute association degree between nodes:
[formula image]
2-4. Obtain the attribute association degree matrix A from the attribute association degrees between nodes, where $a_{ij} = r(v_i, v_j)$ for $i \neq j$ and the diagonal elements are $a_{ii} = 0$.
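Step 2-4 can be sketched as filling an n_V x n_V matrix from a node-pair function r while keeping the diagonal at zero. The node-level formula for r in step 2-3 is an image, so here r is a hypothetical caller-supplied callable; the toy r in the test (counting shared attribute values) is only a stand-in, not the patent's definition. `attribute_association_matrix` is likewise a hypothetical name.

```python
from typing import Callable, Hashable, List, Sequence


def attribute_association_matrix(
    nodes: Sequence[Hashable],
    r: Callable[[Hashable, Hashable], float],
) -> List[List[float]]:
    """Return A with a_ij = r(v_i, v_j) for i != j and a_ii = 0."""
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]  # diagonal stays a_ii = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i][j] = float(r(nodes[i], nodes[j]))
    return A
```

If r is symmetric, as an association measure normally is, A comes out symmetric as well.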
Third, define the number k of communities to be discovered.
Fourth, define the membership degree matrix U, whose number of rows is the number of nodes $n_V$ and whose number of columns is the number of communities k defined in the previous step. Element $u_{if}$ of U represents the degree of membership of node i to community f.
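The model solving module later initializes U randomly. A minimal sketch of such an n_V x k initialization, assuming entries are nonnegative and that the equality constraint makes each row sum to 1; the constraints appear only in FIG. 2, so the row-sum convention is an assumption, and `init_membership` is a hypothetical name.

```python
import random
from typing import List, Optional


def init_membership(n_nodes: int, k: int, seed: Optional[int] = None) -> List[List[float]]:
    """Random n_V x k membership matrix with positive rows summing to 1 (assumed constraint)."""
    rng = random.Random(seed)
    U = []
    for _ in range(n_nodes):
        # Small offset keeps every entry strictly positive.
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        U.append([x / total for x in row])
    return U
```

Passing a fixed `seed` makes the initialization reproducible across runs.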
3. Model building module
First, input the model parameters [formula image], β, θ and [formula image].
Second, construct the optimization problem shown in FIG. 2 from the adjacency matrix, the attribute association degree matrix and the membership matrix obtained by the previous module.
4. Model solving module
First, Lagrange multipliers λ and ω are introduced to incorporate the equality and inequality constraints of the optimization problem into the objective. The Lagrangian is:
[formula image]
Second, write the KKT conditions of the Lagrangian:
[formula image]
Third, solve the KKT conditions to obtain the update formulas for matrix U:
[formula image]
[formula image]
[formula image]
[formula image]
Fourth, randomly initialize matrix U and update it by applying formulas (1-8), (1-9), (1-7) and (1-6), in that order, until the objective function in FIG. 2 converges; the U obtained at that point is taken as the optimal solution of the optimization problem shown in FIG. 2.
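The iteration scheme of this fourth step (apply a fixed sequence of update formulas, repeat until the objective converges) can be sketched independently of the formulas themselves, which appear only as images. In this sketch, `iterate_until_converged`, the dictionary-based state and the relative-change stopping rule are all hypothetical; each element of `steps` stands in for one update formula and must be supplied by the reader.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = Dict[str, object]  # e.g. {"U": ..., plus any auxiliary quantities}


def iterate_until_converged(
    state: State,
    steps: Sequence[Callable[[State], State]],
    objective: Callable[[State], float],
    tol: float = 1e-6,
    max_iter: int = 500,
) -> Tuple[State, List[float]]:
    """Apply the update steps in a fixed order until the objective stops changing.

    Convergence test (an assumption of this sketch): the absolute change in
    the objective between two sweeps is at most tol * max(1, |previous value|).
    Returns the final state and the objective value after every sweep.
    """
    history = [objective(state)]
    for _ in range(max_iter):
        for step in steps:  # e.g. one callable per formula, in the order (1-8), (1-9), (1-7), (1-6)
            state = step(state)
        history.append(objective(state))
        if abs(history[-2] - history[-1]) <= tol * max(1.0, abs(history[-2])):
            break
    return state, history
```

Keeping all quantities in one state dictionary lets a later step in a sweep read values that an earlier step in the same sweep has just updated, which is what a fixed application order implies.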
5. Community discovery module
In this module, each node is assigned to the community to which it has the highest membership degree. Specifically, matrix U is written in row-vector form
$U = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{n_V} \end{bmatrix}$,
where the row vector $u_i$ represents the distribution of the membership of node i over all k communities.
The module traverses all rows of matrix U; for each row $u_i$ it finds the community with the largest element value and assigns node i to that community, until all nodes have been assigned. Any community that then contains no nodes is eliminated. Finally, the remaining communities are stored for use by the result display module.
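A minimal sketch of this assignment, assuming U is given as a list of rows. Breaking ties toward the lower community index is an assumption (the patent does not specify tie handling), and `assign_communities` is a hypothetical name.

```python
from typing import List, Sequence


def assign_communities(U: Sequence[Sequence[float]]) -> List[List[int]]:
    """Assign each node to its highest-membership community; drop empty communities."""
    if not U:
        return []
    k = len(U[0])
    members: List[List[int]] = [[] for _ in range(k)]
    for i, row in enumerate(U):
        # max() returns the first index on ties (a tie-breaking assumption).
        f = max(range(k), key=lambda c: row[c])
        members[f].append(i)
    # Eliminate communities that received no nodes.
    return [m for m in members if m]
```

For instance, with k = 3 and no node preferring the third community, only two communities are returned.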
6. Result display module
Based on the result of the community discovery module, this module writes each community as one line, with the nodes of that community as the elements of the line, and outputs all communities as a text file for display.
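A minimal sketch of this output format: one line per community, listing its nodes. The space separator, the optional node labels and the UTF-8 encoding are assumptions, and `format_communities` and `write_communities` are hypothetical names.

```python
from typing import List, Optional, Sequence


def format_communities(
    communities: Sequence[Sequence[int]],
    labels: Optional[Sequence[str]] = None,
    sep: str = " ",
) -> str:
    """Render one community per line; nodes are shown by label if given, else by index."""
    lines: List[str] = []
    for members in communities:
        names = [str(labels[i]) if labels is not None else str(i) for i in members]
        lines.append(sep.join(names))
    return "".join(line + "\n" for line in lines)


def write_communities(path: str, communities, labels=None) -> None:
    """Write the formatted communities to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(format_communities(communities, labels))
```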
The foregoing shows and describes the basic principles, main features and advantages of the invention. The invention is not limited to the above embodiments, which merely illustrate its principles; various changes and modifications may be made without departing from the principles of the invention, and all such changes and modifications fall within the scope of the claimed invention.

Claims (5)

1. A social network community discovery method based on attribute graph information comprises the following steps:
S1, a server sends a social network community discovery instruction to a community discovery device;
S2, the community discovery device receives massive data from the social network and abstracts the social network into network data;
S3, the community discovery device preprocesses the network data using statistics and information theory, and constructs an adjacency matrix, an attribute association matrix and a membership matrix;
S4, the community discovery device constructs an optimization model with respect to the membership matrix based on the adjacency matrix, the attribute association matrix and the membership matrix from S3;
S5, for the optimization model obtained in S4, the community discovery device introduces Lagrange multipliers according to the Lagrange multiplier method and derives update formulas from the Karush-Kuhn-Tucker (KKT) conditions;
S6, the community discovery device initializes the membership matrix and iteratively updates it with the update formulas obtained in S5 until the optimization model in S4 converges, and records the membership matrix at that point as the optimal membership matrix;
S7, the community discovery device obtains the discovered communities from the optimal membership matrix obtained in S6.
2. The method as claimed in claim 1, wherein in step S3, when calculating the attribute association matrix, whether the attributes are associated is first determined, the degree of association between the attributes is then calculated using information theory, and the attribute association degree between nodes is finally obtained, so that the attribute information associated with the nodes in the social network is fully taken into account.
3. A social network community discovery device for the social network community discovery method based on attribute graph information as claimed in claim 1 or 2, comprising a network construction module, a data preprocessing module, a model construction module, a model solving module, a community discovery module and a result display module, wherein:
the network construction module is connected with the data preprocessing module, constructs a social network containing a large amount of attribute information into an attribute network, and transmits the constructed attribute network to the data preprocessing module;
the data preprocessing module is connected with the model construction module, preprocesses the network data, computes the adjacency matrix and the attribute association matrix, defines the membership matrix, and transmits these matrices to the model construction module;
the model construction module is connected with the model solving module, builds an optimization model with respect to the membership matrix from the input parameters and the data preprocessing results, and transmits the model to the model solving module;
the model solving module is connected with the community discovery module, solves the optimization model by the Lagrange multiplier method to obtain the optimal membership matrix, and transmits the optimal membership matrix to the community discovery module;
the community discovery module is connected with the result display module, assigns each node to its corresponding community based on the optimal membership matrix using a partition-based method to complete the community discovery task, and finally transmits the community discovery result to the result display module.
4. The device as claimed in claim 3, wherein the community discovery problem is transformed into an optimization problem, so that users within each discovered community are closely connected and their attributes are closely associated.
5. The device as claimed in claim 3, wherein the optimization model accounts for the fuzziness of node-to-community membership and discovers communities by solving for the membership matrix, so as to find communities of greater practical significance.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010947352.3A CN112084424A (en) 2020-09-10 2020-09-10 Social network community discovery method and system based on attribute graph information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010947352.3A CN112084424A (en) 2020-09-10 2020-09-10 Social network community discovery method and system based on attribute graph information

Publications (1)

Publication Number Publication Date
CN112084424A (en) 2020-12-15

Family

ID=73731718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010947352.3A Pending CN112084424A (en) 2020-09-10 2020-09-10 Social network community discovery method and system based on attribute graph information

Country Status (1)

Country Link
CN (1) CN112084424A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052371A1 (en) * 2006-08-28 2008-02-28 Evolution Artists, Inc. System, apparatus and method for discovery of music within a social network
CN103729475A (en) * 2014-01-24 2014-04-16 福州大学 Multi-label propagation discovery method of overlapping communities in social network
CN109584095A (en) * 2018-12-14 2019-04-05 北京智明星通科技股份有限公司 The determination method, apparatus and terminal of a kind of game social activity group
CN111128301A (en) * 2019-12-06 2020-05-08 北部湾大学 Overlapped protein compound identification method based on fuzzy clustering


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
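朱智幸: "Research on Gene Co-expression Network Analysis Based on Fuzzy Clustering" (基于模糊聚类的基因共表达网络分析研究), China Master's Theses Full-text Database, Basic Sciences, pages 12 - 16 *
林慧娴: "Research on User Modeling and Community Partitioning Algorithms in Personalized Services" (个性化服务中用户建模及社区划分算法研究), China Master's Theses Full-text Database, pages 9 - 17 *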

Similar Documents

Publication Publication Date Title
Mao et al. On mixed memberships and symmetric nonnegative matrix factorizations
CN112232925A (en) Method for carrying out personalized recommendation on commodities by fusing knowledge maps
CN112613602A (en) Recommendation method and system based on knowledge-aware hypergraph neural network
CN108763376B (en) Knowledge representation learning method for integrating relationship path, type and entity description information
Baek et al. Personalized subgraph federated learning
US20150213360A1 (en) Crowdsourcing system with community learning
US20130124502A1 (en) Method and apparatus for facilitating answering a query on a database
CN114639483A (en) Electronic medical record retrieval method and device based on graph neural network
CN109919172A (en) A kind of clustering method and device of multi-source heterogeneous data
CN108737491B (en) Information pushing method and device, storage medium and electronic device
Xu et al. Interaction between epidemic spread and collective behavior in scale-free networks with community structure
CN110390014A (en) A kind of Topics Crawling method, apparatus and storage medium
Li et al. Detecting dynamic community by fusing network embedding and nonnegative matrix factorization
Tutumlu et al. A MIP model and a hybrid genetic algorithm for flexible job-shop scheduling problem with job-splitting
US8874615B2 (en) Method and apparatus for implementing a learning model for facilitating answering a query on a database
Ma et al. Identification of multi-layer networks community by fusing nonnegative matrix factorization and topological structural information
Liu et al. Ising-cf: A pathbreaking collaborative filtering method through efficient ising machine learning
Zeng et al. Influential simplices mining via simplicial convolutional networks
Sözdinler et al. Finding maximum edge biclique in bipartite networks by integer programming
CN109859063B (en) Community discovery method and device, storage medium and terminal equipment
CN112084424A (en) Social network community discovery method and system based on attribute graph information
WO2023173550A1 (en) Cross-domain data recommendation method and apparatus, and computer device and medium
CN113470738B (en) Overlapping protein complex identification method and system based on fuzzy clustering and gene ontology semantic similarity
CN106815653B (en) Distance game-based social network relationship prediction method and system
CN112084425A (en) Community discovery method and system based on node connection and attribute similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201215