CN114707044A - Extraction method and system of collective social behaviors based on community discovery - Google Patents


Info

Publication number
CN114707044A
Authority
CN
China
Prior art keywords
community
matrix
collective
extracting
collective social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111638174.7A
Other languages
Chinese (zh)
Other versions
CN114707044B (en)
Inventor
杨海陆
刘乾
张建林
张金
陈晨
王莉莉
丁晓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology
Priority to CN202111638174.7A
Publication of CN114707044A
Application granted
Publication of CN114707044B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and system for extracting collective social behaviors based on community discovery. The method comprises the following steps: capturing posts published by a plurality of users in a social network as an initial data set, and preprocessing the initial data set to obtain a data set; processing the data set with an LDA model to generate topic distributions; constructing a similarity calculation function based on sparse representation to compute the similarity between posts from their topic distributions, obtaining an affinity matrix; constructing a community discovery algorithm based on an adaptive loss function, and determining an objective function; iteratively optimizing the objective function with an alternating iteration method to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure; and introducing a node2vec model to visualize the community structure, and extracting collective social behaviors according to the distribution of nodes in the community structure. The method can accurately extract collective social behaviors that differ markedly from individual semantic behavior features, and is highly robust.

Description

Extraction method and system of collective social behaviors based on community discovery
Technical Field
The invention relates to the technical field of social network analysis, in particular to a community discovery-based method and system for extracting collective social behaviors in an online social network.
Background
A social network is a network structure made up of participants and the relationships among them, and can be represented as a set of nodes and a set of links representing the connections between them. The nodes are individuals, groups, organizations, and related systems that are connected to one another through shared values, environments, and ideas, or through events such as social contact, disputes, financial transactions, and business dealings, forming one or more groups that reflect many aspects of interpersonal relationships. Once these relationships are formed, social networks can leverage broader social processes by capturing human, social, natural, material, and financial capital and the related information content. In development work, they can influence policies, strategies, plans, and projects, as well as the partnerships that underlie them. Because of these characteristics, online social network analysis has become an effective way of dealing with many problems.
Social network analysis generally refers to analytical research whose goal is to reveal information about the nodes of a social network and the connections between them. Treating these relationships as the input of the analysis gives a better understanding of the network structure. Social network analysis is now used in almost all areas, such as detecting the structure and behavior of individuals and social groups (component decomposition, clustering, relationship determination), e-commerce online advertising (customer profiling and trend analysis, personalized advertising and proposal submission), and large data set analysis (media tracking, academic publication analysis, genetic research). Researchers may employ a variety of data mining techniques to achieve the goals of social network analysis.
Community discovery is a class of algorithms based on network topology, which can be divided into the following types according to the research content. Hierarchical clustering algorithms divide communities based on the similarity or connection strength between nodes; the most common include the Newman fast algorithm, the Newman greedy algorithm, and spectrum-based clustering algorithms. Spectral clustering algorithms find communities in the network by analyzing the eigenvalues and eigenvectors of the Laplacian matrix or of a normalized matrix derived from the adjacency matrix. Modularity-based algorithms comprise modularity optimization algorithms and improved modularity algorithms. Modularity optimization algorithms detect communities in the network by taking a modularity function as the optimization objective; common examples include greedy algorithms, simulated annealing, and the Louvain algorithm. Improved modularity algorithms adopt a modified modularity function and apply modularity to different types of networks to achieve community discovery.
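As an illustration of the modularity function that modularity optimization algorithms take as their objective, the following minimal sketch computes the standard Newman modularity of a partition of an undirected, unweighted graph. The function name `modularity`, the edge-list input format, and the toy graph are assumptions made for this example and do not come from the patent.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q for an undirected, unweighted graph partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    intra_edges = defaultdict(int)  # edges whose two ends share a community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            intra_edges[community_of[u]] += 1
    degree_sum = defaultdict(int)  # total degree of each community
    for node, deg in degree.items():
        degree_sum[community_of[node]] += deg
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives a higher modularity than placing all nodes in one community, which is the signal that modularity optimization methods search for.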
Research on collective social behavior is key to analyzing communities and the underlying structure of social networks, and accurately extracting collective social behavior in online social networks is of great significance. For example, the herd mentality in online shopping has been studied in terms of repurchase rate, sales volume, and the regional origin of buyers; collective behavior feature models of social communities have been established to reveal the relationship between collective behaviors and community topics; and analyses of collective behavior in social data have found that users can convey their own preferences and feelings to connected users, so that they gradually come to share the same or similar subjective feelings.
The existing methods have the following problem: when extracting social behaviors, they consider only the structural features of communities in the social network and ignore the semantic information of the nodes, so collective social behaviors that differ markedly from individual semantic behavior features are difficult to extract accurately. It is therefore necessary to extract the semantic information in the social network and to group users with similar behaviors into communities through community discovery, so that the collective social behavior in the social network can be extracted accurately.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the present invention is to provide a method for extracting collective social behaviors based on community discovery. The method addresses the problem in the prior art that collective social behaviors differing markedly from individual semantic behavior features are difficult to extract accurately, so that the extracted collective social behavior representing an online social network has low accuracy and insufficient robustness.
Another object of the present invention is to provide a system for extracting collective social behaviors based on community discovery.
In order to achieve the above object, an embodiment of the present invention provides a method for extracting collective social behaviors based on community discovery, including the following steps:
step S1, capturing posts published by a plurality of users in a social network as an initial data set, and cleaning and word-segmenting the initial data set to obtain a data set;
step S2, processing the data set with an LDA model to generate a plurality of topics and the topic distribution of each post;
step S3, constructing a similarity calculation function based on sparse representation to compute the similarity between posts from their topic distributions, obtaining an affinity matrix;
step S4, constructing a community discovery algorithm based on the adaptive loss function and the affinity matrix to determine an objective function;
step S5, iteratively optimizing the objective function with an alternating iteration method to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network; and
step S6, introducing a node2vec model to visualize the community structure, and extracting collective social behaviors according to the distribution of nodes in the community structure.
According to the method for extracting collective social behaviors based on community discovery provided by the embodiment of the invention, the similarity matrix is learned with an adaptive loss function so that the initial data of the social network are processed with high quality; the reconstruction of the social network and the community discovery are completed; the output community structure is ensured to have high cohesiveness and stability; and the collective social behaviors of the online social network are extracted with excellent accuracy and robustness.
In addition, the method for extracting collective social behaviors based on community discovery according to the above embodiment of the present invention may further have the following additional technical features:
Further, in one embodiment of the present invention, the affinity matrix is:
Figure BDA0003442089710000031
wherein $c_{i,j}$ is the value in row $i$ and column $j$ of the affinity matrix, $m$ is the adaptively determined number of neighbors of a user, and
Figure BDA0003442089710000032
is the $\ell_2$-norm of the topic distributions for nodes $i$ and $j$.
Further, in one embodiment of the present invention, the objective function is:
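$$\min_{S,F}\ \lVert C^{(v)} - S \rVert_{\sigma} + \varepsilon\,\operatorname{Tr}(F^{T} L F)$$
$$\text{s.t.}\quad \mathbf{1}^{T}s_i = 1,\quad s_{i,j} \ge 0,\quad F^{T}F = I$$
wherein $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\operatorname{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T}s_i$ is the sum of all values in the $i$-th column vector $s_i$, $s_{i,j}$ is the value in row $i$ and column $j$ of $S$, and $I$ is the identity matrix.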
Further, in an embodiment of the present invention, step S5 specifically includes: using an alternating iteration method, first fixing the clustering indicator matrix to solve for the target variable, and then fixing the target variable to solve for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components of the posts under the same topic are thereby obtained, and the target similarity matrix is constructed to determine the community structure in the community network.
Further, in an embodiment of the present invention, the collective social behavior is extracted in step S6 as follows: if the nodes in the community structure are sparsely distributed, all nodes in the community are covered with a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, the collective social behavior is extracted using centrality.
In order to achieve the above object, another embodiment of the present invention provides a system for extracting collective social behaviors based on community discovery, including: an acquisition and preprocessing module, configured to capture posts published by a plurality of users in a social network as an initial data set, and to clean and word-segment the initial data set to obtain a data set; a topic distribution generation module, configured to process the data set with an LDA model to generate a plurality of topics and the topic distribution of each post; an affinity matrix construction module, configured to construct a similarity calculation function based on sparse representation to compute the similarity between posts from their topic distributions, obtaining an affinity matrix; an objective function determination module, configured to construct a community discovery algorithm based on an adaptive loss function and the affinity matrix to determine an objective function; an iterative learning module, configured to iteratively optimize the objective function with an alternating iteration method to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network; and a collective social behavior extraction module, configured to introduce a node2vec model to visualize the community structure and to extract collective social behaviors according to the distribution of nodes in the community structure.
According to the system for extracting collective social behaviors based on community discovery provided by the embodiment of the invention, the similarity matrix is learned with an adaptive loss function so that the initial data of the social network are processed with high quality; the reconstruction of the social network and the community discovery are completed; the output community structure is ensured to have high cohesiveness and stability; and the collective social behaviors of the online social network are extracted with excellent accuracy and robustness.
In addition, the extraction system of collective social behaviors based on community discovery according to the above embodiment of the present invention may also have the following additional technical features:
Further, in one embodiment of the present invention, the affinity matrix is:
Figure BDA0003442089710000041
wherein $c_{i,j}$ is the value in row $i$ and column $j$ of the affinity matrix, $m$ is the adaptively determined number of neighbors of a user, and
Figure BDA0003442089710000042
is the $\ell_2$-norm of the topic distributions for nodes $i$ and $j$.
Further, in one embodiment of the present invention, the objective function is:
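$$\min_{S,F}\ \lVert C^{(v)} - S \rVert_{\sigma} + \varepsilon\,\operatorname{Tr}(F^{T} L F)$$
$$\text{s.t.}\quad \mathbf{1}^{T}s_i = 1,\quad s_{i,j} \ge 0,\quad F^{T}F = I$$
wherein $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\operatorname{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T}s_i$ is the sum of all values in the $i$-th column vector $s_i$, $s_{i,j}$ is the value in row $i$ and column $j$ of $S$, and $I$ is the identity matrix.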
Further, in an embodiment of the present invention, the iterative learning module is specifically configured to: using an alternating iteration method, first fix the clustering indicator matrix to solve for the target variable, and then fix the target variable to solve for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components of the posts under the same topic are thereby obtained, and the target similarity matrix is constructed to determine the community structure in the community network.
Further, in an embodiment of the present invention, the collective social behavior extraction module extracts the collective social behavior as follows: if the nodes in the community structure are sparsely distributed, all nodes in the community are covered with a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, the collective social behavior is extracted using centrality.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a method for extracting collective social behaviors based on community discovery according to an embodiment of the invention;
FIG. 2 is a graphical representation of the results of modularity as a function of the number of topics for one embodiment of the present invention;
FIG. 3 is a diagram of the visualization result of the node2vec model on the similarity matrix according to one embodiment of the present invention;
FIG. 4 is a diagram of collective social behavior extraction results, according to an embodiment of the invention;
FIG. 5 is a diagram comparing the modularity of the existing Ncut, Louvain, and CAN algorithms with that of the present application, according to one embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an extraction system of collective social behaviors based on community discovery according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The method and system for extracting collective social behaviors based on community discovery provided by the embodiments of the invention are described below with reference to the accompanying drawings. The method is described first.
FIG. 1 is a flowchart of a method for extracting collective social behaviors based on community discovery according to an embodiment of the present invention.
As shown in FIG. 1, the method for extracting collective social behaviors based on community discovery includes the following steps:
In step S1, posts published by a plurality of users in the social network are captured as an initial data set, and the initial data set is cleaned and word-segmented to obtain a data set.
Specifically, the online social network data may be obtained by crawling posts from social web pages with a crawler written in Python, for example from a microblog, a social media platform based on user relationships. After the data are obtained, in order to ensure the accuracy of the experimental results, the data set is cleaned (for example, advertisements, duplicate posts, and overly short posts are removed) and word segmentation (jieba word segmentation) is performed to obtain the data set.
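The cleaning and segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the advertisement keywords, the minimum length threshold, the function names `clean_posts` and `segment`, and the sample posts are all hypothetical. Segmentation uses `jieba.lcut` when the jieba package is installed and otherwise falls back to whitespace splitting so that the sketch still runs.

```python
import re

# Hypothetical markers and threshold; the patent does not specify them.
AD_KEYWORDS = ("广告", "促销", "优惠券")
MIN_LENGTH = 5


def clean_posts(posts, ad_keywords=AD_KEYWORDS, min_length=MIN_LENGTH):
    """Drop advertisements, duplicate posts, and overly short posts."""
    seen = set()
    cleaned = []
    for post in posts:
        text = re.sub(r"\s+", " ", post).strip()  # normalize whitespace
        if len(text) < min_length:
            continue  # too short to carry semantic information
        if any(keyword in text for keyword in ad_keywords):
            continue  # looks like an advertisement
        if text in seen:
            continue  # exact duplicate of an earlier post
        seen.add(text)
        cleaned.append(text)
    return cleaned


def segment(text):
    """Word-segment a post with jieba if available, else split on whitespace."""
    try:
        import jieba
    except ImportError:
        return text.split()
    return jieba.lcut(text)


raw_posts = [
    "今天天气很好，适合出门散步",
    "今天天气很好，适合出门散步",   # duplicate
    "限时促销，点击领取优惠券",     # advertisement
    "好",                           # too short
    "周末去看了一场电影",
]
cleaned = clean_posts(raw_posts)
tokens = [segment(post) for post in cleaned]
```

Keeping the filters as plain predicates makes it easy to swap in a different advertisement detector or length threshold for another platform.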
In step S2, the data set is processed with the LDA model to generate a plurality of topics and the topic distribution of each post.
Specifically, the data set is processed with the LDA model to generate $T$ topics. Any node $v_i$ may belong to one topic or to several topics (i.e., it has a topic distribution); each entry of the distribution is a floating-point number giving the probability that node $v_i$ belongs to the corresponding topic. The generation process of the LDA model corresponds to the following joint distribution:
Figure BDA0003442089710000051
wherein $\theta_d \sim \mathrm{Dirichlet}(\alpha)$ is the topic distribution and $\beta_k \sim \mathrm{Dirichlet}(\eta)$ is the word distribution; $z_{d,n}$ is the topic index and $w_{d,n}$ is the word; the parameters $\alpha$ and $\eta$ are hyperparameter vectors, and $d \in D$. The topic $z_{d,n}$ depends on the topic distribution $\theta_d$ of the text published by the user; the word $w_{d,n}$ depends on the word distributions of all topics $\beta_{1:K}$ and on the topic $z_{d,n}$. The data are stored as a matrix $X$, whose rows correspond to the topics $z_{d,n}$ of the nodes $v_i$ and whose columns are the node feature vectors.
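The dependencies just described (a topic index drawn from the topic distribution of a text, then a word drawn from the word distribution of that topic) can be illustrated by sampling from the standard LDA generative process. This is only a sketch under stated assumptions: it uses symmetric scalar hyperparameters instead of hyperparameter vectors, and the function names, vocabulary, and parameter values are hypothetical. It shows the generative direction only; fitting an LDA model to real posts is not shown.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet distribution by normalizing gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_docs, doc_length, vocab, num_topics, alpha, eta, seed=0):
    """Sample documents from the LDA generative process.

    alpha, eta: symmetric Dirichlet hyperparameters (an assumption here).
    Returns (beta, thetas, docs).
    """
    rng = random.Random(seed)
    # beta_k ~ Dirichlet(eta): one word distribution per topic
    beta = [sample_dirichlet([eta] * len(vocab), rng) for _ in range(num_topics)]
    thetas, docs = [], []
    for _ in range(num_docs):
        # theta_d ~ Dirichlet(alpha): topic distribution of document d
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_length):
            # z_{d,n} depends on theta_d
            z = rng.choices(range(num_topics), weights=theta)[0]
            # w_{d,n} depends on the word distributions and on z_{d,n}
            w = rng.choices(vocab, weights=beta[z])[0]
            words.append(w)
        thetas.append(theta)
        docs.append(words)
    return beta, thetas, docs


vocab = ["movie", "ticket", "match", "goal", "score"]  # hypothetical vocabulary
beta, thetas, docs = generate_corpus(
    num_docs=3, doc_length=8, vocab=vocab, num_topics=2, alpha=0.5, eta=0.5, seed=42
)
```

Each `theta` plays the role of one post's topic distribution, that is, one feature vector of the data matrix used in the following step.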
In step S3, a similarity calculation function based on sparse representation is constructed to compute the similarity between posts from their topic distributions, and an affinity matrix is obtained.
Specifically, the embodiment of the present invention obtains the affinity matrix by calculating the similarity between feature vectors. Users with a high degree of association in the social network (the distance between the feature vectors of the semantic information they publish is small) receive a large similarity value, while users with a low degree of association receive a small or even zero similarity value. This yields the affinity matrix and completes the reconstruction of the social network. The affinity matrix can be obtained by solving the following problem:
Figure BDA0003442089710000061
wherein
Figure BDA0003442089710000062
$d$ is the dimension of the semantic features in the social network (the number of topics), and $n$ is the number of data points in the data matrix obtained in step S2 (the number of users in the social network); the $j$-th column vector of the matrix is denoted $x_j$, and its $(i,j)$-th element is denoted $x_{i,j}$; $\alpha$ is the sparsity adjustment factor. The result is computed and derived as follows:
Figure BDA0003442089710000063
wherein
Figure BDA0003442089710000064
which is sorted in ascending order, so that $c_{i,j}$ satisfies
Figure BDA0003442089710000065
and
Figure BDA0003442089710000066
Figure BDA0003442089710000066
where $m$ is the adaptively determined number of neighbors of a user. The affinity matrix $C$ of the social network is obtained from formula (3), and the existing connections between users are obtained from the affinity matrix. Compared with fixed graph structures such as the fully connected graph and the K-nearest-neighbor graph (obtained by computing cosine similarity, Gaussian kernel similarity, and the like), the calculation in formula (3) adapts to the number of neighbors $m$ of each user. An affinity matrix constructed in this way accurately reflects the relationships among users in the social network and compensates for the high demands that spectral clustering places on node similarity, so that the subsequent community discovery performs better.
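Formula (3) appears only as an image in this text, so the following sketch cannot reproduce it exactly. It assumes the closed-form solution used in clustering with adaptive neighbors, which matches the description above: squared distances are sorted in ascending order, only the $m$ nearest neighbors of each node receive a nonzero weight, and each row sums to 1. The function names and the toy feature vectors are hypothetical.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def adaptive_affinity(features, m):
    """Affinity matrix with m nonzero neighbors per row (assumed closed form).

    features: list of n topic-distribution vectors.
    m: number of neighbors, with 0 < m < n - 1.
    """
    n = len(features)
    if not 0 < m < n - 1:
        raise ValueError("m must satisfy 0 < m < n - 1")
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # distances from node i to every other node, ascending
        dist = sorted(
            (squared_distance(features[i], features[j]), j)
            for j in range(n)
            if j != i
        )
        d_next = dist[m][0]  # distance to the (m+1)-th nearest neighbor
        denom = m * d_next - sum(d for d, _ in dist[:m])
        for d, j in dist[:m]:
            if denom > 0:
                C[i][j] = (d_next - d) / denom
            else:
                C[i][j] = 1.0 / m  # all candidates tie: spread weight evenly
    return C


# Hypothetical topic distributions of four posts over two topics.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
C = adaptive_affinity(features, m=2)
```

In this toy case node 0 gives most of its weight to node 1, a small weight to node 3, and zero weight to node 2, so weakly associated users end up with zero similarity, as the text describes.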
In step S4, a community discovery algorithm is constructed based on the adaptive loss function and the affinity matrix to determine an objective function.
Specifically, the embodiment of the invention uses the $\ell_1$-norm and the $\ell_2$-norm to construct the loss function. A loss function built on the $\ell_1$-norm is insensitive to large outliers but sensitive to small ones; the $\ell_2$-norm is exactly the opposite. The adaptive loss function balances the two and mitigates both problems. The function is defined as follows:
Figure BDA0003442089710000071
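The exact definition is given only as an image in this text. As a hedged illustration of how an adaptive loss can interpolate between the two norms, the sketch below uses one commonly cited element-wise form, $(1+\sigma)x^{2}/(\lvert x\rvert+\sigma)$ summed over all entries; whether the patent uses this form or a row-wise variant is an assumption. The function name and the sample residual matrix are hypothetical.

```python
def adaptive_loss(matrix, sigma):
    """Element-wise adaptive loss (assumed form), with sigma > 0.

    Behaves like the l1-norm as sigma -> 0 and like the squared
    l2 (Frobenius) norm as sigma -> infinity.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return sum(
        (1.0 + sigma) * x * x / (abs(x) + sigma)
        for row in matrix
        for x in row
    )


# Hypothetical residual matrix, standing in for the difference C - S.
residual = [[0.5, -2.0], [0.0, 3.0]]
l1_value = sum(abs(x) for row in residual for x in row)       # 5.5
l2_squared = sum(x * x for row in residual for x in row)      # 13.25
near_l1 = adaptive_loss(residual, sigma=1e-9)
near_l2 = adaptive_loss(residual, sigma=1e9)
middle = adaptive_loss(residual, sigma=1.0)
```

For intermediate values of `sigma` the loss lies between the two extremes, which is the sense in which the adaptive parameter trades off the behavior of the two norms.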
After the affinity matrix $C$ of the social network is reconstructed by formula (3), the following objective function is proposed in order to learn the optimal similarity matrix $S$:
Figure BDA0003442089710000072
where $L$ is the Laplacian matrix of $S$, and $\operatorname{rank}(L) = n - k$ is the rank constraint, which ensures that the similarity matrix $S$ has $k$ connected components. To avoid isolated nodes (nodes without any neighbors), the constraint $\mathbf{1}^{T}s_i = 1$ is imposed, so that each row of $S$ sums to 1.
However, $L$ depends on the target variable $S$ and the rank constraint is nonlinear, which makes equation (5) difficult to solve. Let $\lambda_i(L)$ denote the $i$-th smallest eigenvalue of $L$. Since $L$ is positive semidefinite, every $\lambda_i(L) \ge 0$, so if the $k$ smallest eigenvalues of $L$ satisfy
$$\sum_{i=1}^{k} \lambda_i(L) = 0,$$
the rank constraint is satisfied. Given the balance factor $\varepsilon$, equation (5) can be expressed as:
Figure BDA0003442089710000074
According to Ky Fan's theorem,
$$\sum_{i=1}^{k} \lambda_i(L) = \min_{F \in \mathbb{R}^{n \times k},\ F^{T}F = I} \operatorname{Tr}(F^{T} L F) \tag{7}$$
where $F = \{f_1, f_2, \ldots, f_k\}$ is the clustering indicator matrix. Substituting equation (7) into equation (6) yields:
$$\min_{S,F}\ \lVert C^{(v)} - S \rVert_{\sigma} + \varepsilon\,\operatorname{Tr}(F^{T} L F) \quad \text{s.t.}\quad \mathbf{1}^{T}s_i = 1,\ s_{i,j} \ge 0,\ F^{T}F = I \tag{8}$$
Formula (8) is the final objective function. Because the target variable $S$ has $k$ connected components, the final community discovery result can be obtained directly from the algorithm. Here $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\operatorname{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T}s_i$ is the sum of all values in the $i$-th column vector $s_i$, $s_{i,j}$ is the value in row $i$ and column $j$ of $S$, and $I$ is the identity matrix.
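Reading the communities directly off a learned similarity matrix amounts to finding its connected components. The sketch below does this with a depth-first traversal; the function name, the tolerance `tol`, and the toy matrix are assumptions for illustration, and two nodes are treated as linked if either $s_{i,j}$ or $s_{j,i}$ is nonzero.

```python
def connected_components(S, tol=1e-12):
    """Label the connected components of the graph defined by matrix S.

    Returns a list where labels[i] is the community index of node i.
    """
    n = len(S)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a component
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                linked = S[i][j] > tol or S[j][i] > tol
                if linked and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical similarity matrix with two blocks (k = 2 communities).
S = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
labels = connected_components(S)
num_communities = max(labels) + 1
```

Because the component structure already encodes the partition, no separate clustering step (such as k-means on an embedding) is needed once the similarity matrix has exactly $k$ components.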
In step S5, the objective function is iteratively optimized with an alternating iteration method to obtain the connected components formed by posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network.
Further, in an embodiment of the present invention, step S5 specifically includes: using an alternating iteration method, first fixing the clustering indicator matrix to solve for the target variable, and then fixing the target variable to solve for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components of the posts under the same topic are thereby obtained, and the target similarity matrix is constructed to determine the community structure in the community network.
Specifically, the objective function is solved by using an alternating iteration method, and one variable is updated while the other variables are kept unchanged, as shown below:
(1) Fix the clustering indicator matrix $F$ and solve for the target variable $S$.
After the clustering indicator matrix $F$ is fixed, by the property of the Laplacian matrix
Figure BDA0003442089710000081
equation (8) becomes:
Figure BDA0003442089710000082
Define the matrix
Figure BDA0003442089710000083
where
Figure BDA0003442089710000084
is the $i$-th column of $E$, whose $j$-th element is
Figure BDA0003442089710000085
Because the rows of $S$ are independent of one another, equation (9) can be written in vector form:
Figure BDA0003442089710000086
where $s_i$ is the column vector formed from the elements of the $i$-th row of the target similarity matrix $S$, and $c_i$ is the column vector formed from the elements of the $i$-th row of the affinity matrix; the values of $u_i$ are as follows:
Figure BDA0003442089710000087
Equation (10) simplifies to:
Figure BDA0003442089710000088
Let
Figure BDA0003442089710000089
Using the Lagrange multiplier method, we have
Figure BDA00034420897100000810
where $\eta$ and $\xi$ are Lagrange multipliers, the former a scalar and the latter a vector. According to the KKT conditions:
Figure BDA0003442089710000091
Since $\mathbf{1}^{T}s_i = 1$, formula (13) gives
Figure BDA0003442089710000092
Substituting this into equation (13) gives the optimal solution
Figure BDA0003442089710000093
as follows:
Figure BDA0003442089710000094
Let
Figure BDA0003442089710000095
For any $j$, we have
Figure BDA0003442089710000096
According to equation (13):
Figure BDA0003442089710000097
Therefore, it is only necessary to determine
Figure BDA0003442089710000098
to obtain the optimal solution
Figure BDA0003442089710000099
From equations (15) and (13), we obtain:
Figure BDA00034420897100000910
Since
Figure BDA00034420897100000911
the optimal solution of
Figure BDA00034420897100000912
is
Figure BDA00034420897100000913
Define a function of $\xi^{*}$:
Figure BDA00034420897100000914
When $f(\xi^{*}) = 0$, the following can be determined:
Figure BDA00034420897100000917
Since $\xi^{*} \ge 0$, $f'(\xi^{*}) \le 0$, and $f(\xi^{*})$ is a piecewise linear convex function, the root of $f(\xi^{*}) = 0$ can be found by Newton's method, i.e.
Figure BDA00034420897100000915
(2) Fixing target variable S, solving clustering indication matrix F
When the target variable S is fixed, it is equivalent to solving the following problem:
Figure BDA00034420897100000916
In this case, the optimal solution of the clustering indicator matrix F consists of the eigenvectors corresponding to the k smallest eigenvalues of the Laplacian matrix L.
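A minimal sketch of this eigenvector step follows. It assumes that the Laplacian is the unnormalized one built from the symmetrized similarity matrix (S + S^T) / 2; that construction and the name `cluster_indicator` are assumptions made for illustration.

```python
import numpy as np

def cluster_indicator(S, k):
    """Return (F, eigenvalues) for the k smallest eigenvalues of the Laplacian of S."""
    W = (S + S.T) / 2.0                   # symmetrize S (assumption)
    L = np.diag(W.sum(axis=1)) - W        # unnormalized graph Laplacian D - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, :k], eigvals[:k]
```

The returned F has orthonormal columns, so the constraint that F^T F equals the identity holds by construction. When the graph has exactly k connected components, the k returned eigenvalues are all zero.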
The two steps are iterated alternately until the relative change of the target variable S is less than $10^{-3}$ or the number of iterations exceeds 150, at which point iterative learning is complete.
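The stopping rule can be sketched as a generic alternating loop. The callables `update_S` and `update_F` are hypothetical placeholders for the two sub-steps, and measuring the relative change with the Frobenius norm is an assumption, since the text does not say which norm is used.

```python
import numpy as np

def alternate(S0, update_S, update_F, tol=1e-3, max_iter=150):
    """Alternate the S-step and F-step until S stabilizes or max_iter is reached."""
    S = np.asarray(S0, dtype=float)
    F = update_F(S)
    for iteration in range(1, max_iter + 1):
        S_new = update_S(F)          # step (1): fix F, solve for S
        rel_change = np.linalg.norm(S_new - S) / max(np.linalg.norm(S), 1e-12)
        S = S_new
        F = update_F(S)              # step (2): fix S, solve for F
        if rel_change < tol:
            break
    return S, F, iteration
```

The function returns the final S and F together with the number of iterations performed, which makes it easy to tell whether the tolerance or the iteration cap ended the loop.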
In step S6, a node2vec model is introduced to visualize the community structure, and a collective social behavior is extracted according to the distribution of nodes in the community structure.
Specifically, to make the collective social behavior of the social network easier to understand and analyze, the embodiment of the present invention may present the community discovery result visually. To this end, the Node2Vec graph embedding model is introduced. Node2Vec is a node vectorization model that obtains local information from truncated random walks, treating nodes as words and walks as sentences in order to learn latent representations; it extends the DeepWalk algorithm by changing the way random walk sequences are generated.
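The change to walk generation can be sketched as follows. This is a minimal illustration of the second-order biased walk from the node2vec literature, with return parameter p and in-out parameter q, on an unweighted graph stored as a dictionary of neighbor sets. The function name and the graph representation are assumptions; the patent does not state which parameter values it uses.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Generate one biased random walk of at most `length` nodes."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break                              # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))      # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)        # return to the previous node
            elif x in adj[prev]:
                weights.append(1.0)            # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)        # move outward, away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

With p = q = 1 the walk reduces to the uniform walk of DeepWalk. The walks would then be passed to a skip-gram model to obtain the node vectors, a step not shown here.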
The collective social behavior is then extracted. If the nodes within a community structure are sparsely distributed, all nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes within the community structure are densely distributed, the collective social behavior is extracted using centrality.
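Both extraction rules can be sketched for nodes embedded in two dimensions. The sketch below finds the minimum enclosing circle by brute force over all pairs and triples of points, which is only practical for small communities. Using weighted degree as the centrality measure is an assumption, because the text does not name a specific centrality, and the function names are illustrative. How a community is judged sparse or dense is not specified in the text and is left to the caller.

```python
import itertools
import math

def _circumcircle(a, b, c):
    """Circle through three points, or None if they are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2)
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), math.hypot(a[0] - ux, a[1] - uy)

def min_enclosing_circle(points, eps=1e-9):
    """Smallest circle containing all 2-D points (brute force, O(n^4))."""
    if len(points) == 1:
        return tuple(points[0]), 0.0
    candidates = []
    for a, b in itertools.combinations(points, 2):
        center = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        candidates.append((center, math.hypot(a[0] - center[0], a[1] - center[1])))
    for a, b, c in itertools.combinations(points, 3):
        circle = _circumcircle(a, b, c)
        if circle is not None:
            candidates.append(circle)
    enclosing = [(r, ctr) for ctr, r in candidates
                 if all(math.hypot(p[0] - ctr[0], p[1] - ctr[1]) <= r + eps for p in points)]
    radius, center = min(enclosing)
    return center, radius

def representative_sparse(points):
    """Index of the node closest to the center of the minimum enclosing circle."""
    center, _ = min_enclosing_circle(points)
    return min(range(len(points)),
               key=lambda i: math.hypot(points[i][0] - center[0], points[i][1] - center[1]))

def representative_dense(weights):
    """Index of the node with the largest weighted degree (assumed centrality)."""
    return max(range(len(weights)), key=lambda i: sum(weights[i]))
```

For larger communities, a linear-time method such as Welzl's algorithm would be a more suitable way to compute the enclosing circle.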
The method for extracting collective social behaviors based on community discovery provided by the embodiment of the invention is further explained by two specific embodiments.
Specific Embodiment 1
A crawler was used to collect 10176 posts published by users on Sina Weibo from 3/1/2021 to 3/5/2021. The data were cleaned (advertisements, duplicate posts, overly short posts and the like were removed), leaving 1584 posts as the initial data set. Applying jieba word segmentation to the text of these posts yields the following data set:
1: national asset investment and stock control group … … general cooperation construction semiconductor nondestructive testing intelligence of formal science and technology strain city
2: curima auto-arrival compression ignition spark plug mechanical pressurization … … stability factor lending ten thousand legal routes
3: median safety of second-stage research of main-ren new crown epidemic … … candidate vaccine of Russian vector science center
4: china successfully develops a plurality of long-distance transport modes of vaccines triphibian storage and transportation … …, namely sea, land and air in Yangzhou base
……
1584: new rule and affair education … … of early warning caution road icing in evening news and yellow early warning is more and more matched with dish market
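The cleaning and segmentation described above can be sketched as follows. The minimum length, the advertisement keywords and the function names are hypothetical, since the text does not give the actual filtering rules. The tokenizer is passed in as a parameter; for Chinese text one would pass a segmenter such as `jieba.lcut` in place of the whitespace split used by default here.

```python
def clean_posts(posts, min_length=10, ad_keywords=("广告", "推广")):
    """Drop short posts, posts containing ad keywords, and exact duplicates.

    min_length and ad_keywords are hypothetical values for illustration.
    """
    seen = set()
    cleaned = []
    for text in posts:
        text = text.strip()
        if len(text) < min_length:
            continue                      # too short to carry a topic
        if any(keyword in text for keyword in ad_keywords):
            continue                      # looks like an advertisement
        if text in seen:
            continue                      # exact duplicate
        seen.add(text)
        cleaned.append(text)
    return cleaned

def segment(posts, tokenize=str.split):
    """Turn each post into a list of tokens using the supplied tokenizer."""
    return [tokenize(post) for post in posts]
```

Keeping the tokenizer pluggable lets the same cleaning code be reused for corpora in other languages.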
An LDA model is used to generate T topics and the topic distribution of each post. The number of topics is determined according to the evaluation index modularity Q; as shown in FIG. 2, the number of topics is chosen to be 30, and a data matrix (30 × 1584) is finally obtained as follows:
Figure BDA0003442089710000111
An affinity matrix (1584 × 1584) of the microblog data is obtained according to formula (3):
Figure BDA0003442089710000112
According to the community discovery algorithm based on the adaptive loss function, a similarity matrix (1584 × 1584) with the specified number of connected components is learned as follows:
Figure BDA0003442089710000113
A node2vec graph embedding model is introduced to visualize the similarity matrix; the result is shown in FIG. 3. Finally, collective social behaviors are extracted from the social network using the two different methods and are marked with two different symbols, a diamond for method 1 and a five-pointed star for method 2, as shown in FIG. 4.
Specific Embodiment 2
The existing Ncut, Louvain and CAN algorithms are selected for comparison with the extraction method of the present invention. The WebKB, BBC news report and 20NGs news document data sets are used as verification data, and the stability and cohesion of the community discovery results are measured by the modularity Q. The verification results are shown in FIG. 5, from which it can be seen that the embodiment of the present invention leads in performance.
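The modularity Q used as the evaluation index can be computed as in the sketch below. It assumes the standard Newman definition for an undirected, possibly weighted graph given as a symmetric adjacency matrix; the text does not state which variant of Q is used, and the function name is illustrative.

```python
def modularity(adjacency, labels):
    """Newman modularity Q of a partition of an undirected (weighted) graph."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    two_m = sum(degree)                   # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected under random rewiring
                q += adjacency[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

For example, on a graph made of two disjoint edges, assigning each edge to its own community gives Q = 0.5, while placing all four nodes in a single community gives Q = 0.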
According to the method for extracting collective social behaviors based on community discovery provided by the embodiment of the present invention, the similarity matrix is learned with an adaptive loss function so that the initial data of the social network is processed with high quality; the reconstruction of the social network and community discovery are completed; the output community structure is ensured to have higher cohesion and stability; and the collective social behaviors of the online social network are extracted, with results that are accurate and robust.
Next, a system for extracting collective social behaviors based on community discovery according to an embodiment of the present invention will be described with reference to the drawings.
FIG. 6 shows a system for extracting collective social behaviors based on community discovery according to an embodiment of the present invention.
As shown in FIG. 6, the system 10 includes: an acquisition and preprocessing module 100, a topic distribution generation module 200, an affinity matrix construction module 300, an objective function determination module 400, an iterative learning module 500 and a collective social behavior extraction module 600.
The acquisition and preprocessing module 100 is configured to capture posts published by a plurality of users in a social network as an initial data set, and to clean and segment the initial data set to obtain a data set. The topic distribution generation module 200 is configured to process the data set using an LDA model to generate a plurality of topics and the topic distribution of each post. The affinity matrix construction module 300 is configured to construct a similarity calculation function based on sparse representation to solve the similarity between each post and the topic distribution, so as to obtain an affinity matrix. The objective function determination module 400 is configured to construct a community discovery algorithm based on the adaptive loss function and the affinity matrix, so as to determine the objective function. The iterative learning module 500 is configured to repeatedly learn the objective function using an alternating iteration method, obtaining the connected components between posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network. The collective social behavior extraction module 600 is configured to introduce a node2vec model to visualize the community structure, and to extract collective social behaviors according to the distribution of nodes in the community structure.
Further, in one embodiment of the present invention, the affinity matrix is:
Figure BDA0003442089710000121
where $c_{i,j}$ is the value in row $i$, column $j$ of the affinity matrix, $m$ is the number of adaptive neighbors of a user, and
Figure BDA0003442089710000122
is the $\ell_2$-norm of the topic distributions of nodes $i$ and $j$.
Further, in one embodiment of the present invention, the objective function is:
$$\min_{S,F}\ \|C(v) - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\ \mathbf{1}^{T} s_i = 1,\ s_{i,j} \ge 0,\ F^{T} F = I$$
where $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\mathrm{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T} s_i$ is the sum of all entries of the $i$-th column vector $s_i$, $s_{i,j}$ is the value in row $i$, column $j$ of $S$, and $I$ is the identity matrix.
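The role of the trace term can be made explicit with a standard identity. The identity assumes that $L = D - W$ is the unnormalized Laplacian of a symmetric weight matrix $W$ (for example $W = (S + S^{T})/2$, which is an assumption about how $L$ is built from $S$), with $D$ the diagonal matrix of row sums and $f_i$ the $i$-th row of $F$:

```latex
\operatorname{Tr}(F^{T} L F)
  = \sum_{i} d_{ii}\,\lVert f_i \rVert_2^{2} - \sum_{i,j} w_{ij}\, f_i^{T} f_j
  = \frac{1}{2} \sum_{i,j} w_{ij}\,\lVert f_i - f_j \rVert_2^{2},
\qquad d_{ii} = \sum_{j} w_{ij}.
```

Minimizing this term therefore pulls the indicator rows of strongly connected posts together. Since the number of zero eigenvalues of $L$ equals the number of connected components of the graph, driving the sum of the $k$ smallest eigenvalues to zero yields a similarity matrix with $k$ connected components.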
Further, in an embodiment of the present invention, the iterative learning module 500 is specifically configured to:
using an alternating iteration method, first fix the clustering indicator matrix and solve for the target variable, then fix the target variable and solve for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150; the connected components of all posts under the same topic are thereby obtained, and the target similarity matrix is then constructed to determine the community structure in the community network.
Further, in an embodiment of the present invention, the collective social behavior extraction module 600 extracts collective social behaviors as follows: if the nodes within the community structure are sparsely distributed, all nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes within the community structure are densely distributed, the collective social behavior is extracted using centrality.
It should be noted that the explanation of the embodiment of the method for extracting a collective social behavior based on community discovery is also applicable to the system of the embodiment, and is not repeated here.
According to the system for extracting collective social behaviors based on community discovery provided by the embodiment of the present invention, the similarity matrix is learned with an adaptive loss function so that the initial data of the social network is processed with high quality; the reconstruction of the social network and community discovery are completed; the output community structure is ensured to have higher cohesion and stability; and the collective social behaviors of the online social network are extracted, with results that are accurate and robust.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A community discovery based collective social behavior extraction method is characterized by comprising the following steps:
step S1, capturing posts published by a plurality of users in a social network as an initial data set, and cleaning and word segmentation processing the initial data set to obtain a data set;
step S2, processing the data set by using an LDA model to generate a plurality of topics and the topic distribution of each post;
step S3, constructing a similarity calculation function based on sparse representation to solve the similarity between each post and the topic distribution, so as to obtain an affinity matrix;
step S4, constructing a community discovery algorithm based on the adaptive loss function and the affinity matrix to determine an objective function;
step S5, repeatedly learning the objective function by using an alternating iteration method to obtain the connected components between posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine a community structure in a community network;
and step S6, introducing a node2vec model to visualize the community structure, and extracting collective social behaviors according to the distribution condition of nodes in the community structure.
2. The method for extracting collective social behavior based on community discovery as claimed in claim 1, wherein the affinity matrix is:
Figure FDA0003442089700000011
where $c_{i,j}$ is the value in row $i$, column $j$ of the affinity matrix, $m$ is the number of adaptive neighbors of a user, and
Figure FDA0003442089700000012
is the $\ell_2$-norm of the topic distributions of nodes $i$ and $j$.
3. The method for extracting collective social behaviors based on community discovery according to claim 1, wherein the objective function is:
$$\min_{S,F}\ \|C(v) - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\ \mathbf{1}^{T} s_i = 1,\ s_{i,j} \ge 0,\ F^{T} F = I$$
where $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\mathrm{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T} s_i$ is the sum of all entries of the $i$-th column vector $s_i$, $s_{i,j}$ is the value in row $i$, column $j$ of $S$, and $I$ is the identity matrix.
4. The method for extracting collective social behaviors based on community discovery according to claim 1, wherein the step S5 is specifically as follows:
using an alternating iteration method, first fixing the clustering indicator matrix and solving for the target variable, then fixing the target variable and solving for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, thereby obtaining the connected components of all posts under the same topic, and then constructing the target similarity matrix to determine the community structure in the community network.
5. The method for extracting collective social behavior based on community discovery as claimed in claim 1, wherein the method for extracting collective social behavior in step S6 is as follows:
if the nodes within the community structure are sparsely distributed, all nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior;
and if the nodes within the community structure are densely distributed, the collective social behavior is extracted using centrality.
6. An extraction system of collective social behaviors based on community discovery, comprising:
an acquisition and preprocessing module, configured to capture posts published by a plurality of users in a social network as an initial data set, and to clean and segment the initial data set to obtain a data set;
a topic distribution generation module, configured to process the data set using an LDA model to generate a plurality of topics and the topic distribution of each post;
an affinity matrix construction module, configured to construct a similarity calculation function based on sparse representation to solve the similarity between each post and the topic distribution, so as to obtain an affinity matrix;
an objective function determination module, configured to construct a community discovery algorithm based on an adaptive loss function and the affinity matrix, so as to determine an objective function;
an iterative learning module, configured to repeatedly learn the objective function using an alternating iteration method to obtain the connected components between posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine a community structure in a community network;
and a collective social behavior extraction module, configured to introduce a node2vec model to visualize the community structure and to extract collective social behaviors according to the distribution of nodes in the community structure.
7. The community discovery based extraction system of collective social behaviors of claim 6, wherein the affinity matrix is:
Figure FDA0003442089700000021
where $c_{i,j}$ is the value in row $i$, column $j$ of the affinity matrix, $m$ is the number of adaptive neighbors of a user, and
Figure FDA0003442089700000022
is the $\ell_2$-norm of the topic distributions of nodes $i$ and $j$.
8. The system for extracting collective social behavior based on community discovery of claim 6, wherein the objective function is:
$$\min_{S,F}\ \|C(v) - S\|_{\sigma} + \varepsilon\,\mathrm{Tr}(F^{T} L F)$$
$$\text{s.t.}\ \mathbf{1}^{T} s_i = 1,\ s_{i,j} \ge 0,\ F^{T} F = I$$
where $S$ is the target variable, $C$ is the affinity matrix, $\sigma$ is the adaptive parameter, $\varepsilon$ is the balance factor, $F$ is the clustering indicator matrix, $L$ is the Laplacian matrix of the target variable, $\mathrm{Tr}(\cdot)$ is the trace, $\mathbf{1}^{T} s_i$ is the sum of all entries of the $i$-th column vector $s_i$, $s_{i,j}$ is the value in row $i$, column $j$ of $S$, and $I$ is the identity matrix.
9. The system for extracting collective social behavior based on community discovery of claim 6, wherein the iterative learning module is specifically configured to:
using an alternating iteration method, first fixing the clustering indicator matrix and solving for the target variable, then fixing the target variable and solving for the clustering indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, thereby obtaining the connected components of all posts under the same topic, and then constructing the target similarity matrix to determine the community structure in the community network.
10. The community discovery-based collective social behavior extraction system according to claim 6, wherein the method for extracting collective social behaviors in the collective social behavior extraction module is as follows:
if the nodes within the community structure are sparsely distributed, all nodes in the community are covered by a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior;
and if the nodes within the community structure are densely distributed, the collective social behavior is extracted using centrality.
CN202111638174.7A 2021-12-29 2021-12-29 Method and system for extracting collective social behavior based on community discovery Active CN114707044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111638174.7A CN114707044B (en) 2021-12-29 2021-12-29 Method and system for extracting collective social behavior based on community discovery

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111638174.7A CN114707044B (en) 2021-12-29 2021-12-29 Method and system for extracting collective social behavior based on community discovery

Publications (2)

Publication Number Publication Date
CN114707044A true CN114707044A (en) 2022-07-05
CN114707044B CN114707044B (en) 2023-06-23

Family

ID=82166741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111638174.7A Active CN114707044B (en) 2021-12-29 2021-12-29 Method and system for extracting collective social behavior based on community discovery

Country Status (1)

Country Link
CN (1) CN114707044B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760426A (en) * 2016-01-28 2016-07-13 仲恺农业工程学院 Subject community mining method for online social networking service
US20180253485A1 (en) * 2017-03-01 2018-09-06 Yahoo! Inc. Latent user communities
CN107705213A (en) * 2017-07-17 2018-02-16 西安电子科技大学 A kind of overlapping Combo discovering method of static social networks
CN108776844A (en) * 2018-04-13 2018-11-09 中国科学院信息工程研究所 Social network user behavior prediction method based on context-aware tensor resolution
CN110457477A (en) * 2019-08-09 2019-11-15 东北大学 A kind of Interest Community discovery method towards social networks
CN111143704A (en) * 2019-12-20 2020-05-12 北京理工大学 Online community friend recommendation method and system fusing user influence relationship
CN111292197A (en) * 2020-01-17 2020-06-16 福州大学 Community discovery method based on convolutional neural network and self-encoder
CN112269922A (en) * 2020-10-14 2021-01-26 西华大学 Community public opinion key character discovery method based on network representation learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨海陆: ""在线社会网络的结构化分析方法及应用研究"", 《中国博士学位论文全文数据库信息科技辑》, pages 139 - 22 *

Also Published As

Publication number Publication date
CN114707044B (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant