CN114707044A - Extraction method and system of collective social behaviors based on community discovery - Google Patents
- Publication number
- CN114707044A (application CN202111638174.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and system for extracting collective social behaviors based on community discovery. The method comprises the following steps: capturing posts published by a plurality of users in a social network as an initial data set, and preprocessing the initial data set to obtain a data set; processing the data set with an LDA model to generate topic distributions; constructing a similarity calculation function based on sparse representation to compute the similarity between posts from their topic distributions and obtain an affinity matrix; constructing a community discovery algorithm based on an adaptive loss function and determining an objective function; optimizing the objective function by alternating iteration to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure; and introducing a node2vec model to visualize the community structure and extracting collective social behaviors according to the distribution of nodes in the community structure. The method can accurately extract collective social behaviors that differ markedly from individual semantic behavior features, and it is highly robust.
Description
Technical Field
The invention relates to the technical field of social network analysis, in particular to a community discovery-based method and system for extracting collective social behaviors in an online social network.
Background
A social network is a network structure made up of participants and their interrelationships; it can be represented as a set of nodes and a set of links representing the connections between them. The nodes are individuals, groups, organizations, and related systems that are connected through shared values, environments, and ideas, or through events such as social contact, disputes, financial transactions, and business dealings, and that form one or more groups reflecting many aspects of interpersonal relationships. Once these relationships are formed, social networks can leverage broader social processes by capturing human, social, natural, material, and financial capital and the related information content. In development work, they can influence policies, strategies, plans, and projects, as well as the partnerships that underlie them. Because of these characteristics, online social network analysis is an effective entry point for dealing with many problems.
Social network analysis is commonly referred to as analytical research, with the goal of revealing relevant information about nodes and connections between nodes in a social network. By treating these relationships as information for social network analysis, a better understanding of the network structure may be ensured. Social network analysis is now used in almost all areas, such as detection of personal and social group structure and behavior (component decomposition, clustering, relationship determination), e-commerce online advertising (customer profiling and trend analysis, personalized advertising and proposal submission), large data set analysis (media tracking, academic publication analysis, genetic research), etc. Researchers may employ a variety of data mining techniques to achieve goals in social network analysis.
Community discovery is a class of algorithms based on network topology, which can be divided into the following types according to research focus. Hierarchical clustering algorithms divide communities based on the similarity or connection strength between nodes; the most common include the Newman fast algorithm, the Newman greedy algorithm, and spectrum-based clustering algorithms. Spectral clustering algorithms find communities in the network by analyzing the eigenvalues and eigenvectors of the Laplacian matrix, or of a normalized matrix, formed from the adjacency matrix. Modularity-based algorithms comprise modularity optimization algorithms and improved modularity algorithms. Modularity optimization algorithms detect communities by taking a modularity function as the optimization target; common examples include greedy algorithms, simulated annealing, and the Louvain algorithm. Improved modularity algorithms adopt a modified modularity function and apply modularity to different types of networks to achieve community discovery.
Research on collective social behavior is key to analyzing communities and network foundations in social networks, and accurately extracting collective social behavior from online social networks is of great significance. Examples include studying the herd mentality of online shoppers through repurchase rate, sales volume, and regional origin; building a feature model of the collective behavior of social communities to reveal the relationship between collective behaviors and community topics; and analyzing collective behavior in social data, which finds that users can convey their own preferences and feelings to connected users so that they gradually come to share the same or similar subjective feelings.
Existing methods have the following problem: when extracting social behaviors, they consider only the structural features of communities in the social network and ignore the semantic information of the nodes, so it is difficult to accurately extract collective social behaviors that differ markedly from individual semantic behavior features. It is therefore necessary to extract the semantic information in the social network and to use community discovery to group users with similar behaviors into communities, so that the collective social behavior in the social network can be extracted accurately.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the present invention is to provide a method for extracting collective social behaviors based on community discovery. The method addresses the technical problem in the prior art that collective social behaviors differing markedly from individual semantic behavior features are difficult to extract accurately, so that the extracted behaviors representing an online social network have low accuracy and insufficient robustness.
Another object of the present invention is to provide a system for extracting collective social behaviors based on community discovery.
In order to achieve the above object, an embodiment of the present invention provides a method for extracting collective social behaviors based on community discovery, including the following steps: step S1, capturing posts published by a plurality of users in a social network as an initial data set, and cleaning and word-segmenting the initial data set to obtain a data set; step S2, processing the data set using an LDA model to generate a plurality of topics and the topic distribution of each post; step S3, constructing a similarity calculation function based on sparse representation to compute the similarity between posts from their topic distributions and obtain an affinity matrix; step S4, constructing a community discovery algorithm based on an adaptive loss function and the affinity matrix to determine an objective function; step S5, optimizing the objective function by alternating iteration to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network; and step S6, introducing a node2vec model to visualize the community structure, and extracting collective social behaviors according to the distribution of nodes in the community structure.
According to the method for extracting collective social behaviors based on community discovery of the embodiment of the invention, the similarity matrix is learned with an adaptive loss function, so that the initial data of the social network is processed with high quality and the reconstruction of the social network and the community discovery are completed. This ensures that the output community structure has high cohesion and stability, realizes the extraction of collective social behaviors from the online social network, and yields results with good accuracy and robustness.
In addition, the method for extracting collective social behaviors based on community discovery according to the above embodiment of the present invention may further have the following additional technical features:
Further, in one embodiment of the present invention, the affinity matrix is:
where $c_{i,j}$ is the value in row $i$, column $j$ of the affinity matrix, $m$ is the adaptive number of neighbors of a user, and the distance term is the $l_2$-norm of the difference between the topic distributions of nodes $i$ and $j$.
Further, in one embodiment of the present invention, the objective function is:
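$$\min_{S,F}\; \|C^{(v)} - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\;\; \mathbf{1}^{T} s_i = 1,\;\; s_{i,j} \ge 0,\;\; F^{T} F = I$$
where $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\mathrm{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T} s_i$ is the sum of the entries of $s_i$ (the $i$-th row of $S$ written as a column vector), $s_{i,j}$ is the value in row $i$, column $j$ of $S$, and $I$ is the identity matrix.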
Further, in an embodiment of the present invention, step S5 specifically includes: using alternating iteration, first fixing the clustering indicator matrix to solve for the target variable, and then fixing the target variable to solve for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components formed by the posts under the same topic are thereby obtained, and the target similarity matrix is then constructed to determine the community structure in the community network.
Further, in an embodiment of the present invention, the collective social behavior is extracted in step S6 as follows: if the nodes in the community structure are sparsely distributed, all nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, the collective social behavior is extracted using centrality.
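The sparse-distribution rule above can be illustrated with a minimal Python sketch. It assumes that each node already has a two-dimensional coordinate (for example, from a visualized embedding); the function names `min_enclosing_circle` and `representative_node` are hypothetical, and the brute-force search over point pairs and triples is chosen only for clarity, not because the source prescribes it.

```python
import math
from itertools import combinations


def _circle_from_two(p, q):
    # Circle whose diameter is the segment p-q.
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)


def _circle_from_three(p, q, r):
    # Circumcircle of three points; None if they are (nearly) collinear.
    ax, ay = p
    bx, by = q
    cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def min_enclosing_circle(points):
    """Smallest circle covering all points (brute force, fine for small sets)."""
    if len(points) == 1:
        return (points[0][0], points[0][1], 0.0)
    candidates = [_circle_from_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for cx, cy, r in candidates:
        covers_all = all(math.dist((cx, cy), p) <= r + 1e-9 for p in points)
        if covers_all and (best is None or r < best[2]):
            best = (cx, cy, r)
    return best


def representative_node(embedding):
    """embedding: dict mapping node id -> (x, y). Returns (node id, circle)."""
    circle = min_enclosing_circle(list(embedding.values()))
    center = (circle[0], circle[1])
    node = min(embedding, key=lambda k: math.dist(embedding[k], center))
    return node, circle
```

For the dense case the source only names "centrality" without fixing a particular measure, so no example is given for it here.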
In order to achieve the above object, another embodiment of the present invention provides a system for extracting collective social behaviors based on community discovery, including: an acquisition and preprocessing module, configured to capture posts published by a plurality of users in a social network as an initial data set, and to clean and word-segment the initial data set to obtain a data set; a topic distribution generation module, configured to process the data set using an LDA model to generate a plurality of topics and the topic distribution of each post; an affinity matrix construction module, configured to construct a similarity calculation function based on sparse representation to compute the similarity between posts from their topic distributions and obtain an affinity matrix; an objective function determination module, configured to construct a community discovery algorithm based on an adaptive loss function and the affinity matrix to determine an objective function; an iterative learning module, configured to optimize the objective function by alternating iteration to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network; and a collective social behavior extraction module, configured to introduce a node2vec model to visualize the community structure and to extract collective social behaviors according to the distribution of nodes in the community structure.
According to the system for extracting collective social behaviors based on community discovery of the embodiment of the invention, the similarity matrix is learned with an adaptive loss function, so that the initial data of the social network is processed with high quality and the reconstruction of the social network and the community discovery are completed. This ensures that the output community structure has high cohesion and stability, realizes the extraction of collective social behaviors from the online social network, and yields results with good accuracy and robustness.
In addition, the extraction system of collective social behaviors based on community discovery according to the above embodiment of the present invention may also have the following additional technical features:
Further, in one embodiment of the present invention, the affinity matrix is:
where $c_{i,j}$ is the value in row $i$, column $j$ of the affinity matrix, $m$ is the adaptive number of neighbors of a user, and the distance term is the $l_2$-norm of the difference between the topic distributions of nodes $i$ and $j$.
Further, in one embodiment of the present invention, the objective function is:
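$$\min_{S,F}\; \|C^{(v)} - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\;\; \mathbf{1}^{T} s_i = 1,\;\; s_{i,j} \ge 0,\;\; F^{T} F = I$$
where $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\mathrm{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T} s_i$ is the sum of the entries of $s_i$ (the $i$-th row of $S$ written as a column vector), $s_{i,j}$ is the value in row $i$, column $j$ of $S$, and $I$ is the identity matrix.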
Further, in an embodiment of the present invention, the iterative learning module is specifically configured to: using alternating iteration, first fix the clustering indicator matrix to solve for the target variable, and then fix the target variable to solve for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components formed by the posts under the same topic are thereby obtained, and the target similarity matrix is then constructed to determine the community structure in the community network.
Further, in an embodiment of the present invention, the collective social behavior extraction module extracts the collective social behavior as follows: if the nodes in the community structure are sparsely distributed, all nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, the collective social behavior is extracted using centrality.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a method for extracting collective social behaviors based on community discovery according to an embodiment of the invention;
FIG. 2 is a graphical representation of the results of modularity as a function of the number of topics for one embodiment of the present invention;
FIG. 3 is a diagram of the visualization result of the node2vec model on the similarity matrix according to one embodiment of the present invention;
FIG. 4 is a diagram of collective social behavior extraction results, according to an embodiment of the invention;
FIG. 5 is a diagram comparing the modularity of the existing Ncut, Louvain, and CAN algorithms with that of the method of the present application, according to one embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an extraction system of collective social behaviors based on community discovery according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The method and system for extracting collective social behaviors based on community discovery according to the embodiments of the present invention are described below with reference to the accompanying drawings; the method is described first.
FIG. 1 is a flowchart of a method for extracting collective social behaviors based on community discovery according to an embodiment of the present invention.
As shown in FIG. 1, the method for extracting collective social behaviors based on community discovery includes the following steps:
In step S1, posts published by a plurality of users in the social network are captured as an initial data set, and the initial data set is cleaned and word-segmented to obtain a data set.
Specifically, the online social network data may be obtained with a crawler written in Python that crawls posts from social web pages, for example from a microblog, a social media platform based on user relationships. After the data is obtained, in order to ensure the accuracy of the experimental results, the data set is cleaned (for example, by removing advertisements, repeated posts, and short posts) and segmented into words (jieba word segmentation) to obtain the data set.
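The cleaning and segmentation described in step S1 might look like the following minimal sketch. The advertisement pattern, the length threshold, and the function name `clean_posts` are all hypothetical choices made for illustration; the source does not specify the actual cleaning rules.

```python
import re

# Hypothetical advertisement filter; the real criteria are not given in the source.
AD_PATTERN = re.compile(r"(advert|promo|http\S+)", re.IGNORECASE)


def clean_posts(posts, min_len=5, tokenize=str.split):
    """Drop short posts, advertisements and duplicates, then tokenize the rest.

    posts:    iterable of raw post strings
    min_len:  posts shorter than this many characters are discarded (assumed value)
    tokenize: callable turning a string into a list of tokens
    """
    seen = set()
    cleaned = []
    for text in posts:
        text = text.strip()
        if len(text) < min_len:        # overly short post
            continue
        if AD_PATTERN.search(text):    # looks like an advertisement
            continue
        if text in seen:               # repeated post
            continue
        seen.add(text)
        cleaned.append(tokenize(text))
    return cleaned
```

The default tokenizer splits on whitespace so that the sketch runs without extra dependencies. For Chinese microblog text, the jieba segmentation mentioned above could be plugged in by passing `tokenize=jieba.lcut`, provided the jieba package is installed.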
In step S2, the data set is processed using the LDA model to generate a plurality of topics and the topic distribution of each post.
Specifically, the data set is processed with an LDA model to generate $T$ topics. Any node $v_i$ may belong to one topic or to several topics (i.e., it has a topic distribution), and floating-point numbers represent the probability that node $v_i$ belongs to each topic. The generation process of the LDA model corresponds to the following joint distribution:
where $\theta_d \sim \mathrm{Dirichlet}(\alpha)$ is the topic distribution and $\beta_d \sim \mathrm{Dirichlet}(\eta)$ is the word distribution; $Z_{d,n}$ is the topic index and $w_{d,n}$ is the word; the parameters $\alpha$ and $\eta$ are hyperparameter vectors; $d \in D$; the topic $Z_{d,n}$ depends on the topic distribution $\theta_d$ of the text published by a user; and the word $w_{d,n}$ depends on the word distributions $\beta_{1:k}$ of all topics and on the topic $Z_{d,n}$. The data is stored as a matrix, denoted $X$, whose rows correspond to the topics $Z_{d,n}$ of the nodes $v_i$ and whose columns are the node feature vectors.
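The displayed joint distribution does not survive in this text. For reference, the standard LDA joint distribution, written with the symbols defined above, is given below. That the patent's formula has exactly this form is an assumption; the indexing of the word distributions by topic ($\beta_k$, $k = 1, \ldots, K$) and the lower-case $z_{d,n}$ follow the usual textbook notation rather than the source.

```latex
p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})
  = \prod_{k=1}^{K} p(\beta_k \mid \eta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\,
                    p(w_{d,n} \mid \beta_{1:K}, z_{d,n})
```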
In step S3, a similarity calculation function based on sparse representation is constructed to compute the similarity between posts from their topic distributions, and an affinity matrix is obtained.
Specifically, the embodiment of the present invention obtains the affinity matrix by calculating the similarity between feature vectors, so that users with a high degree of association in the social network (the distance between the feature vectors of the semantic information they publish is small) receive a high similarity value, while users with a low degree of association receive a small or even zero similarity value. The affinity matrix is thus obtained and the reconstruction of the social network is completed. The affinity matrix can be obtained by solving the following problem:
where $X$ is the data matrix obtained in step S2, $d$ is the feature dimension of the semantic information in the social network (the number of topics), and $n$ is the number of data points in the data matrix (the number of users in the social network); its $j$-th column vector is denoted $x_j$ and its $(i,j)$-th element is denoted $x_{i,j}$; $\alpha$ is the sparsity adjustment factor. Solving the problem gives the following result:
where the distances are sorted in ascending order so that $c_{i,j}$ satisfies the stated constraints, and $m$ is the adaptive number of neighbors of a user. The affinity matrix $C$ of the social network is obtained from formula (3), and the existing connections between users are obtained from the affinity matrix. Compared with fixed graph structures such as the fully connected graph and the K-nearest-neighbor graph (obtained by computing cosine similarity, Gaussian kernel similarity, and the like), the calculation in formula (3) adapts to the number of neighbors $m$ of each user. An affinity matrix constructed in this way accurately reflects the relationships among users in the social network and compensates for the high demands that spectral clustering places on node similarity, so that the subsequent community discovery performs better.
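Formula (3) itself is not legible in this text. The sketch below shows one closed form that matches the description (distances sorted in ascending order, $m$ nonzero neighbors per user, sparse rows), namely the adaptive-neighbor weights known from the CAN (clustering with adaptive neighbors) literature. Treating this as the patent's formula is an assumption, as are the function name `adaptive_affinity` and the use of squared Euclidean distances.

```python
def adaptive_affinity(X, m):
    """Sparse affinity matrix with m nonzero neighbors per node.

    X: list of n feature vectors (e.g. per-node topic distributions)
    m: number of neighbors kept for each node (requires n >= m + 2)

    Assumed closed form: with d_(1) <= d_(2) <= ... the sorted squared
    distances from node i to the other nodes,
        c_ij = (d_(m+1) - d_ij) / (m * d_(m+1) - sum_{h<=m} d_(h))
    for the m nearest neighbors and 0 otherwise.
    """
    n = len(X)
    assert n >= m + 2, "need at least m + 2 nodes"
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Squared Euclidean distance from node i to every other node.
        dist = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        d_next = dist[m][0]  # (m+1)-th smallest distance
        denom = m * d_next - sum(d for d, _ in dist[:m])
        for d, j in dist[:m]:
            # Fall back to uniform weights if all m+1 distances coincide.
            C[i][j] = (d_next - d) / denom if denom > 1e-12 else 1.0 / m
    return C
```

Each row of the result is nonnegative and sums to 1, closer nodes receive larger weights, and everything beyond the $m$ nearest neighbors is exactly zero, which is the sparsity property the paragraph above relies on.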
In step S4, a community discovery algorithm is constructed based on the adaptive loss function and the affinity matrix to determine an objective function.
Specifically, the embodiment of the invention uses the $l_1$-norm and the $l_2$-norm to construct the loss function. A loss function built on the $l_1$-norm is insensitive to large outliers but sensitive to small ones; the $l_2$-norm behaves in exactly the opposite way; the adaptive loss function balances the two. The function is defined as follows:
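The definition of the adaptive loss (formula (4)) is not legible in this text. As an illustration only, the sketch below uses an element-wise form of the adaptive loss found in the robust-learning literature, $\sum_i (1+\sigma)\,x_i^2 / (|x_i| + \sigma)$; that the patent uses exactly this form is an assumption, and the function name `adaptive_loss` is hypothetical.

```python
def adaptive_loss(values, sigma):
    """Element-wise adaptive loss: sum of (1 + sigma) * x^2 / (|x| + sigma).

    sigma > 0 is the adaptive parameter. For sigma -> 0 the loss approaches
    the l1-norm (sum of |x|); for sigma -> infinity it approaches the squared
    l2-norm (sum of x^2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return sum((1.0 + sigma) * v * v / (abs(v) + sigma) for v in values)
```

With residuals `[3, -4]`, a very small `sigma` gives a value close to 7 (the $l_1$-norm) and a very large `sigma` gives a value close to 25 (the squared $l_2$-norm), which shows how a single parameter interpolates between the two behaviors described above.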
After the affinity matrix $C$ of the social network has been reconstructed by formula (3), the following objective function is proposed in order to learn the optimal similarity matrix $S$:
where $L$ is the Laplacian matrix of $S$ and $\mathrm{rank}(L) = n - k$ is the rank constraint, which makes the similarity matrix $S$ have $k$ connected components. To avoid isolated nodes (nodes without any neighbors), the constraint $\mathbf{1}^{T} s_i = 1$ is imposed so that each row of $S$ sums to 1.
However, $L$ depends on the target variable $S$ and the rank constraint is nonlinear, which makes equation (5) difficult to solve. Let $\lambda_i(L)$ denote the $i$-th smallest eigenvalue of $L$; the rank constraint is satisfied if the $k$ smallest eigenvalues of $L$ satisfy $\sum_{i=1}^{k} \lambda_i(L) = 0$. Given the balance factor $\varepsilon$, equation (5) can be expressed as:
According to Ky Fan's theorem:
$$\sum_{i=1}^{k} \lambda_i(L) = \min_{F \in \mathbb{R}^{n \times k},\; F^{T} F = I} \mathrm{Tr}(F^{T} L F) \tag{7}$$
where $F = \{f_1, f_2, \ldots, f_k\}$ is the clustering indicator matrix. Substituting equation (7) into equation (6) yields:
$$\min_{S,F}\; \|C^{(v)} - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F) \quad \text{s.t.}\;\; \mathbf{1}^{T} s_i = 1,\;\; s_{i,j} \ge 0,\;\; F^{T} F = I \tag{8}$$
Formula (8) is the final objective function. Because the target variable $S$ has $k$ connected components, the final community discovery result can be obtained directly from the algorithm. Here $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\mathrm{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T} s_i$ is the sum of the entries of $s_i$ (the $i$-th row of $S$ written as a column vector), $s_{i,j}$ is the value in row $i$, column $j$ of $S$, and $I$ is the identity matrix.
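Equations (5) and (6) are not legible in this text. Read together with the rank constraint described above, Ky Fan's theorem, and the final objective stated in the disclosure, they plausibly take the following form. This is a reconstruction under assumptions: in particular, the constant multiplying the eigenvalue sum in (6) is taken to be $\varepsilon$ so that it matches the final objective, and $C^{(v)}$ is written as in the disclosure, where it is defined simply as the affinity matrix.

```latex
% (5): adaptive-loss fit of S to the affinity matrix under a rank constraint
\min_{S}\; \|C^{(v)} - S\|_{\sigma}
\quad \text{s.t.}\;\; \mathbf{1}^{T} s_i = 1,\;\; s_{i,j} \ge 0,\;\;
\mathrm{rank}(L) = n - k

% (6): rank constraint relaxed into a penalty on the k smallest eigenvalues
\min_{S}\; \|C^{(v)} - S\|_{\sigma} + \varepsilon \sum_{i=1}^{k} \lambda_i(L)
\quad \text{s.t.}\;\; \mathbf{1}^{T} s_i = 1,\;\; s_{i,j} \ge 0
```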
In step S5, the objective function is optimized by alternating iteration to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network.
Further, in an embodiment of the present invention, step S5 specifically includes: using alternating iteration, first fixing the clustering indicator matrix to solve for the target variable, and then fixing the target variable to solve for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components formed by the posts under the same topic are thereby obtained, and a target similarity matrix is then constructed to determine the community structure in the community network.
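Once a similarity matrix with several connected components is available, reading off the communities amounts to a graph traversal. The following minimal sketch assumes the learned matrix is given as a list of rows and treats two nodes as connected when either direction has a weight above a small threshold; the function name `connected_components` and the threshold `eps` are hypothetical.

```python
def connected_components(S, eps=1e-10):
    """Label the connected components of the graph induced by matrix S.

    S:   square similarity matrix as a list of rows
    eps: weights at or below this value are treated as "no edge" (assumed)
    Returns a list where entry i is the community index of node i.
    """
    n = len(S)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        # Depth-first traversal from an unlabeled node.
        stack = [start]
        label[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and (S[i][j] > eps or S[j][i] > eps):
                    label[j] = current
                    stack.append(j)
        current += 1
    return label
```

For a block-diagonal matrix with two blocks, the function returns two labels, one per block, so each connected component corresponds to one community.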
Specifically, the objective function is solved by alternating iteration, in which one variable is updated while the other variables are kept unchanged, as follows:
(1) Fix the clustering indicator matrix $F$ and solve for the target variable $S$.
After the clustering indicator matrix $F$ is fixed, equation (8) can be rewritten, using a property of the Laplacian matrix, as:
Define a matrix $E$, specified through the $j$-th element of its $i$-th column. Because the rows of $S$ are independent of one another, equation (9) can be written in vector form:
where $s_i$ is the column vector formed from the elements of the $i$-th row of the target similarity matrix $S$, and $c_i$ is the column vector formed from the elements of the $i$-th row of the affinity matrix; the values of $u_i$ are as follows:
Equation (10) simplifies to:
Applying the method of Lagrange multipliers to this problem gives a Lagrangian in which $\eta$ and $\xi$ are the Lagrange multipliers, the former a scalar and the latter a vector. According to the KKT conditions:
Since $\mathbf{1}^{T} s_i = 1$, equation (13) yields an expression that, substituted back into equation (13), gives the optimal solution as follows:
Therefore, only the remaining unknown quantity needs to be determined to obtain the optimal solution. From equations (15) and (13):
Define a function $f$ of $\xi^{*}$; the unknown can be determined when $f(\xi^{*}) = 0$. Since $\xi^{*} \ge 0$ and $f$ is a piecewise linear convex function, the root of $f(\xi^{*}) = 0$ can be found by Newton's method, i.e.
(2) Fix the target variable S and solve for the clustering indication matrix F.
When the target variable S is fixed, it is equivalent to solving the following problem:
In this case, the optimal solution of the clustering indication matrix F consists of the eigenvectors corresponding to the k smallest eigenvalues of the Laplacian matrix L.
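The update of F described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the function name `update_cluster_indicator`, the symmetrization of S, and the use of the unnormalized Laplacian L = D - W are assumptions made for the example.

```python
import numpy as np

def update_cluster_indicator(S, k):
    """Return the eigenvectors of the k smallest eigenvalues of the Laplacian of S."""
    W = (S + S.T) / 2.0                    # symmetrize S (assumed convention)
    L = np.diag(W.sum(axis=1)) - W         # unnormalized Laplacian L = D - W (assumed)
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k], eigvals[:k]

# Toy similarity matrix: two disconnected pairs of posts, i.e. two connected components.
S = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
F, smallest = update_cluster_indicator(S, k=2)
```

In this toy case the two smallest eigenvalues are zero, which reflects the fact that the number of zero eigenvalues of a graph Laplacian equals the number of connected components of the graph.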
These two steps are iterated until the relative change of the target variable S is less than $10^{-3}$ or the number of iterations exceeds 150, at which point the iterative learning is complete.
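The stopping rule of the alternating iteration can be illustrated with a generic loop. The sketch below is only a skeleton under stated assumptions: `update_S` and `update_F` are hypothetical callables standing in for the two sub-problems, and the relative change is measured with the Frobenius norm, which the text does not specify.

```python
import numpy as np

def alternate(S0, update_S, update_F, tol=1e-3, max_iter=150):
    """Alternate between the two updates until S stabilizes or max_iter is reached."""
    S = S0
    F = update_F(S)
    for it in range(1, max_iter + 1):
        S_new = update_S(F, S)             # fix F, solve for S
        rel_change = np.linalg.norm(S_new - S) / max(np.linalg.norm(S), 1e-12)
        S = S_new
        F = update_F(S)                    # fix S, solve for F
        if rel_change < tol:
            break
    return S, F, it

# Toy updates (hypothetical): S moves halfway toward a fixed target at each step.
target = np.eye(2)
S_final, F_final, n_iter = alternate(
    np.ones((2, 2)),
    update_S=lambda F, S: 0.5 * S + 0.5 * target,
    update_F=lambda S: None,
)
```

The toy updates only exercise the loop; in the method described here they would be replaced by the row-wise solution for S and the eigenvector solution for F.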
In step S6, a node2vec model is introduced to visualize the community structure, and a collective social behavior is extracted according to the distribution of nodes in the community structure.
Specifically, to make the collective social behavior of the social network easier to understand and analyze, the embodiment of the present invention may present the community discovery result visually. To this end, the Node2Vec graph embedding model is introduced. It is a node vectorization model that obtains local information from truncated random walks and learns latent representations by treating nodes as words and walks as sentences; it extends the DeepWalk algorithm by changing how the random walk sequences are generated.
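The change that node2vec makes to the walk generation is a second-order bias controlled by a return parameter p and an in-out parameter q. The following pure-Python sketch shows one such biased walk on an unweighted graph; the adjacency dictionary, the parameter values, and the function name are illustrative assumptions, and the subsequent skip-gram training on the walks is not shown.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Generate one biased random walk of at most `length` nodes."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))   # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)     # return to the previous node
            elif x in adj[prev]:
                weights.append(1.0)         # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)     # move farther away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
walk = node2vec_walk(adj, start=0, length=10, p=0.5, q=2.0)
```

With p = q = 1 the walk reduces to the uniform random walk used by DeepWalk.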
Then the collective social behavior is extracted. If the nodes in a community structure are sparsely distributed, a minimum enclosing circle is used to cover all the nodes in the community, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in a community structure are densely distributed, the collective social behavior is extracted by using centrality.
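The two extraction rules can be sketched as follows. Several choices here are assumptions rather than details taken from the text: nodes are assumed to have 2-D coordinates from the visualization, the circle center is approximated with the Badoiu-Clarkson iteration instead of an exact minimum enclosing circle algorithm, and "centrality" is taken to mean weighted degree within the community.

```python
import math

def approx_enclosing_center(points, iters=1000):
    """Approximate the center of the minimum enclosing circle (Badoiu-Clarkson)."""
    cx, cy = points[0]
    for i in range(1, iters + 1):
        # Step toward the current farthest point with a shrinking step size.
        fx, fy = max(points, key=lambda pt: math.hypot(pt[0] - cx, pt[1] - cy))
        cx += (fx - cx) / (i + 1)
        cy += (fy - cy) / (i + 1)
    return cx, cy

def representative_sparse(points):
    """Index of the node closest to the approximate circle center."""
    cx, cy = approx_enclosing_center(points)
    return min(range(len(points)),
               key=lambda i: math.hypot(points[i][0] - cx, points[i][1] - cy))

def representative_dense(S, members):
    """Node with the largest weighted degree inside the community (assumed centrality)."""
    return max(members, key=lambda i: sum(S[i][j] for j in members if j != i))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
sparse_pick = representative_sparse(points)

S = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.8],
     [0.1, 0.8, 0.0]]
dense_pick = representative_dense(S, members=[0, 1, 2])
```

Other centrality measures (betweenness, closeness, eigenvector) could be substituted in `representative_dense` without changing the overall rule.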
The method for extracting collective social behaviors based on community discovery provided by the embodiment of the invention is further explained by two specific embodiments.
Specific embodiment 1
A crawler was used to collect 10,176 posts published by users on Sina Weibo from 3/1/2021 to 3/5/2021. The data were cleaned (advertisements, duplicate posts, very short posts, and the like were removed), leaving 1,584 posts as the initial data set. Applying jieba word segmentation to the text of the posts yields the following data set:
1: national asset investment and stock control group … … general cooperation construction semiconductor nondestructive testing intelligence of formal science and technology strain city
2: curima auto-arrival compression ignition spark plug mechanical pressurization … … stability factor lending ten thousand legal routes
3: median safety of second-stage research of main-ren new crown epidemic … … candidate vaccine of Russian vector science center
4: china successfully develops a plurality of long-distance transport modes of vaccines triphibian storage and transportation … …, namely sea, land and air in Yangzhou base
……
1584: new rule and affair education … … of early warning caution road icing in evening news and yellow early warning is more and more matched with dish market
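The cleaning and segmentation step described above might look like the following sketch. The minimum length, the advertisement keywords, and the function names are hypothetical; the tokenizer is passed in as a parameter so that, for Chinese text, a segmenter such as `jieba.lcut` could be supplied in place of the whitespace split used here.

```python
def clean_posts(posts, min_len=10, ad_keywords=("promotion", "discount")):
    """Drop short posts, posts containing ad keywords, and exact duplicates."""
    seen = set()
    kept = []
    for text in posts:
        text = text.strip()
        if len(text) < min_len:
            continue                        # too short to carry a topic
        if any(kw in text for kw in ad_keywords):
            continue                        # looks like an advertisement
        if text in seen:
            continue                        # exact duplicate
        seen.add(text)
        kept.append(text)
    return kept

def segment(posts, tokenizer=str.split):
    """Tokenize each post; pass a word segmenter for unspaced languages."""
    return [tokenizer(p) for p in posts]

raw = [
    "vaccine storage and transport by sea, land and air",
    "vaccine storage and transport by sea, land and air",
    "short",
    "big discount on phones today only",
]
cleaned = clean_posts(raw)
tokens = segment(cleaned)
```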
The LDA model is used to generate T topics and the topic distribution of each post. The number of topics is determined according to the evaluation index modularity Q; as shown in FIG. 2, 30 topics are selected, and the following data matrix (30 × 1584) is finally obtained:
An affinity matrix (1584 × 1584) of the microblog data is then obtained according to formula (3):
According to the community discovery algorithm based on the adaptive loss function, a similarity matrix (1584 × 1584) with the specified number of connected components is learned as follows:
The node2vec graph embedding model is then introduced to visualize the similarity matrix; the result is shown in FIG. 3. Finally, collective social behaviors are extracted from the social network by the two different methods and marked with two different shapes, a diamond for method 1 and a five-pointed star for method 2, as shown in FIG. 4.
Specific embodiment 2
The existing Ncut, Louvain and CAN algorithms are selected for comparison with the extraction method provided by the invention. The WebKB, BBC news report and 20NGs news document data sets are used as verification data, and the stability and cohesion of the community discovery results are measured by the modularity Q. The verification results are shown in FIG. 5, from which it can be seen that the embodiment of the invention leads the compared algorithms in performance.
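For reference, the modularity Q of a partition can be computed as in the sketch below. It uses the standard Newman definition for an undirected weighted graph, Q = (1/2m) Σ_ij (W_ij − k_i k_j / 2m) δ(c_i, c_j); that the evaluation here uses exactly this variant is an assumption, and the function name and toy graph are illustrative.

```python
def modularity(W, labels):
    """Newman modularity of a partition of an undirected weighted graph."""
    n = len(W)
    degree = [sum(row) for row in W]        # weighted degree k_i
    two_m = sum(degree)                     # 2m = total weight counted in both directions
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += W[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two disconnected pairs of nodes.
W = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
q_split = modularity(W, [0, 0, 1, 1])   # each pair is its own community
q_merged = modularity(W, [0, 0, 0, 0])  # everything in one community
```

Splitting the toy graph into its two natural communities gives Q = 0.5, while placing all nodes in one community gives Q = 0, so a higher Q indicates a more cohesive partition.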
According to the method for extracting collective social behaviors based on community discovery provided by the embodiment of the invention, the initial data of the social network are processed with high quality by learning the similarity matrix with an adaptive loss function, the reconstruction of the social network and the community discovery are completed, the output community structure is ensured to have higher cohesion and stability, and the collective social behaviors of the online social network are extracted with excellent accuracy and robustness.
Next, a system for extracting collective social behaviors based on community discovery according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 6 shows a system for extracting collective social behaviors based on community discovery according to an embodiment of the present invention.
As shown in FIG. 6, the system 10 includes: an acquisition and preprocessing module 100, a topic distribution generation module 200, an affinity matrix construction module 300, an objective function determination module 400, an iterative learning module 500 and a collective social behavior extraction module 600.
The acquisition and preprocessing module 100 is configured to capture posts published by a plurality of users in a social network as an initial data set, and to perform cleaning and word segmentation on the initial data set to obtain a data set. The topic distribution generation module 200 is configured to process the data set by using an LDA model to generate a plurality of topics and the topic distribution of each post. The affinity matrix construction module 300 is configured to construct a similarity calculation function based on sparse representation to solve for the similarity between the topic distributions of the posts, so as to obtain an affinity matrix. The objective function determination module 400 is configured to construct a community discovery algorithm based on the adaptive loss function and the affinity matrix, so as to determine the objective function. The iterative learning module 500 is configured to learn the objective function iteratively by using an alternating iteration method to obtain the connected components among posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network. The collective social behavior extraction module 600 is configured to introduce a node2vec model to visualize the community structure, and to extract collective social behaviors according to the distribution of nodes in the community structure.
Further, in one embodiment of the present invention, the affinity matrix is:
where $c_{i,j}$ is the value in the ith row and jth column of the affinity matrix, m is the number of adaptive neighbors of a user, and the distance term is the $\ell_2$-norm between the topic distributions of nodes i and j.
Further, in one embodiment of the present invention, the objective function is:
$$\min_{S,F}\ \|C^{(v)} - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\quad 1^{T} s_i = 1,\quad s_{i,j} \ge 0,\quad F^{T} F = I$$
where S is the target variable, C is the affinity matrix, σ is an adaptive parameter, ε is a balance factor, F is the clustering indication matrix, L is the Laplacian matrix of the target variable, Tr(·) is the trace, $1^{T}s_i$ is the sum of all entries of $s_i$, $s_{i,j}$ is the value in the ith row and jth column of S, and I is the identity matrix.
Further, in an embodiment of the present invention, the iterative learning module 500 is specifically configured to:
using an alternating iteration method, first fix the clustering indication matrix to solve for the target variable, and then fix the target variable to solve for the clustering indication matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components among posts under the same topic are thereby obtained, and the target similarity matrix is constructed to determine the community structure in the community network.
Further, in an embodiment of the present invention, the collective social behavior extraction module 600 extracts collective social behaviors as follows: if the nodes in a community structure are sparsely distributed, all the nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in a community structure are densely distributed, the collective social behavior is extracted by using centrality.
It should be noted that the explanation of the embodiment of the method for extracting a collective social behavior based on community discovery is also applicable to the system of the embodiment, and is not repeated here.
According to the system for extracting collective social behaviors based on community discovery provided by the embodiment of the invention, the initial data of the social network are processed with high quality by learning the similarity matrix with an adaptive loss function, the reconstruction of the social network and the community discovery are completed, the output community structure is ensured to have higher cohesion and stability, and the collective social behaviors of the online social network are extracted with excellent accuracy and robustness.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, the various embodiments or examples described in this specification, and the features of different embodiments or examples, can be combined by one skilled in the art provided they do not contradict one another.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A community discovery based collective social behavior extraction method is characterized by comprising the following steps:
step S1, capturing posts published by a plurality of users in a social network as an initial data set, and performing cleaning and word segmentation on the initial data set to obtain a data set;
step S2, processing the data set by using an LDA model to generate a plurality of topics and the topic distribution of each post;
step S3, constructing a similarity calculation function based on sparse representation to solve for the similarity between the topic distributions of the posts to obtain an affinity matrix;
step S4, constructing a community discovery algorithm based on an adaptive loss function and the affinity matrix to determine an objective function;
step S5, iteratively learning the objective function by using an alternating iteration method to obtain connected components among posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine a community structure in a community network;
and step S6, introducing a node2vec model to visualize the community structure, and extracting collective social behaviors according to the distribution of nodes in the community structure.
2. The method for extracting collective social behavior based on community discovery as claimed in claim 1, wherein the affinity matrix is:
3. The method for extracting collective social behaviors based on community discovery according to claim 1, wherein the objective function is:
$$\min_{S,F}\ \|C^{(v)} - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\quad 1^{T} s_i = 1,\quad s_{i,j} \ge 0,\quad F^{T} F = I$$
wherein S is the target variable, C is the affinity matrix, σ is an adaptive parameter, ε is a balance factor, F is the clustering indication matrix, L is the Laplacian matrix of the target variable, Tr(·) is the trace, $1^{T}s_i$ is the sum of all entries of $s_i$, $s_{i,j}$ is the value in the ith row and jth column of S, and I is the identity matrix.
4. The method for extracting collective social behaviors based on community discovery according to claim 1, wherein the step S5 is specifically as follows:
using an alternating iteration method, first fixing the clustering indication matrix to solve for the target variable, and then fixing the target variable to solve for the clustering indication matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, thereby obtaining the connected components among posts under the same topic, and then constructing the target similarity matrix to determine the community structure in the community network.
5. The method for extracting collective social behavior based on community discovery as claimed in claim 1, wherein the method for extracting collective social behavior in step S6 is as follows:
if the nodes in the community structure are sparsely distributed, all the nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior;
and if the nodes in the community structure are densely distributed, the collective social behavior is extracted by using centrality.
6. An extraction system of collective social behaviors based on community discovery, comprising:
an acquisition and preprocessing module, configured to capture posts published by a plurality of users in a social network as an initial data set, and to perform cleaning and word segmentation on the initial data set to obtain a data set;
a topic distribution generation module, configured to process the data set by using an LDA model to generate a plurality of topics and the topic distribution of each post;
an affinity matrix construction module, configured to construct a similarity calculation function based on sparse representation to solve for the similarity between the topic distributions of the posts to obtain an affinity matrix;
an objective function determination module, configured to construct a community discovery algorithm based on an adaptive loss function and the affinity matrix so as to determine an objective function;
an iterative learning module, configured to iteratively learn the objective function by using an alternating iteration method to obtain connected components among posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine a community structure in a community network;
and a collective social behavior extraction module, configured to introduce a node2vec model to visualize the community structure and to extract collective social behaviors according to the distribution of nodes in the community structure.
7. The community discovery based extraction system of collective social behaviors of claim 6, wherein the affinity matrix is:
8. The system for extracting collective social behavior based on community discovery of claim 6, wherein the objective function is:
$$\min_{S,F}\ \|C^{(v)} - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\quad 1^{T} s_i = 1,\quad s_{i,j} \ge 0,\quad F^{T} F = I$$
wherein S is the target variable, C is the affinity matrix, σ is an adaptive parameter, ε is a balance factor, F is the clustering indication matrix, L is the Laplacian matrix of the target variable, Tr(·) is the trace, $1^{T}s_i$ is the sum of all entries of $s_i$, $s_{i,j}$ is the value in the ith row and jth column of S, and I is the identity matrix.
9. The system for extracting collective social behavior based on community discovery of claim 6, wherein the iterative learning module is specifically configured to:
using an alternating iteration method, first fix the clustering indication matrix to solve for the target variable, and then fix the target variable to solve for the clustering indication matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, thereby obtaining the connected components among posts under the same topic, and then construct the target similarity matrix to determine the community structure in the community network.
10. The community discovery-based collective social behavior extraction system according to claim 6, wherein the method for extracting collective social behaviors in the collective social behavior extraction module is as follows:
if the nodes in the community structure are sparsely distributed, all the nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior;
and if the nodes in the community structure are densely distributed, the collective social behavior is extracted by using centrality.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111638174.7A CN114707044B (en) | 2021-12-29 | 2021-12-29 | Method and system for extracting collective social behavior based on community discovery |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114707044A (en) | 2022-07-05 |
CN114707044B (en) | 2023-06-23 |
Family
ID=82166741
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111638174.7A Active CN114707044B (en) | 2021-12-29 | 2021-12-29 | Method and system for extracting collective social behavior based on community discovery |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114707044B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN114707044B (en) | 2023-06-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||