CN114707044B

CN114707044B - Method and system for extracting collective social behavior based on community discovery

Info

Publication number: CN114707044B
Application number: CN202111638174.7A
Authority: CN
Inventors: 杨海陆; 刘乾; 张建林; 张金; 陈晨; 王莉莉; 丁晓宇
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2021-12-29
Filing date: 2021-12-29
Publication date: 2023-06-23
Anticipated expiration: 2041-12-29
Also published as: CN114707044A

Abstract

The invention discloses a method and a system for extracting collective social behavior based on community discovery. The method comprises the following steps: capturing posts published by a plurality of users in a social network as an initial data set, and preprocessing the posts to obtain a data set; processing the data set with an LDA model to generate topic distributions; constructing a similarity calculation function based on sparse representation to solve the similarity between each post and the topic distribution and obtain an affinity matrix; constructing a community discovery algorithm based on an adaptive loss function and determining an objective function; continuously learning the objective function with an alternating iteration method to obtain the connected components among the posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure; and introducing a node2vec model to visualize the community structure, and extracting collective social behavior according to the distribution of the nodes in the community structure. The method can accurately extract collective social behavior that differs markedly from individual semantic behavior characteristics, and is highly robust.

Description

Method and system for extracting collective social behavior based on community discovery

Technical Field

The invention relates to the technical field of social network analysis, in particular to a method and a system for extracting collective social behaviors in an online social network based on community discovery.

Background

A social network is composed of participants and the relationships among them, and can be represented as a set of nodes together with a set of links representing the connections between them. The nodes, which may be individuals, communities, organizations, and related systems, are interconnected through shared values, environments, and ideas; events such as social contacts, disputes, financial and securities transactions, and business dealings may also link them into one or more groups across many aspects of personal relationships. Once these relationships are formed, social networks can influence wider social processes by capturing human, social, natural, physical, and financial capital and the related information content. In development work, they can influence policies, strategies, plans, and projects, as well as the partnerships that underlie them. Because of these features, online social network analysis has become an effective entry point for addressing many problems.

Social network analysis is analytical research whose purpose is to reveal relevant information about the nodes in a social network and the connections between them. Treating these relationships as the input of social network analysis gives a better understanding of the network structure. Social network analysis is now used in many areas, such as the detection of personal and social group structure and behavior (component decomposition, clustering, relationship determination), e-commerce online advertising (customer profile and trend analysis, personalized advertising and proposal submission), and large data set analysis (media tracking, academic publication analysis, genetic research). Researchers may use a variety of data mining techniques to achieve the goals of social network analysis.

Community discovery is a class of algorithms based on network topology, which can be divided into the following categories according to research focus. Hierarchical clustering algorithms divide communities based on the similarity or connection strength between nodes; the most common are the Newman fast algorithm, the Newman greedy algorithm, and spectrum-based clustering algorithms. Spectral clustering algorithms find communities in the network by analyzing the eigenvalues and eigenvectors of the Laplacian matrix or of the normalized matrix formed from the adjacency matrix. Modularity-based algorithms include modularity optimization algorithms and improved modularity algorithms. Modularity optimization algorithms detect communities by taking the modularity function as the optimization target; common examples include greedy algorithms, simulated annealing, and the Louvain algorithm. Improved modularity algorithms adapt the modularity function so that it can be applied to different types of networks for community discovery.

Research on collective social behavior is key to analyzing communities and the underlying network in social networks, and accurately extracting collective social behavior in online social networks is of great significance. Examples include: studying the popular psychology of online shopping through return rates, sales, regional origin, and similar aspects; establishing a feature model of collective behavior in social communities to reveal the relationship between collective behavior and community topics; and analyzing collective behavior in social data, which shows that users can convey their preferences to connected users so that they gradually share the same or similar subjective experiences.

Existing methods have the following problem: when extracting social behavior, they consider only the structural features of communities in the social network and ignore the semantic information of the nodes, so collective social behavior whose semantic features differ markedly from those of individuals is difficult to extract accurately. The semantic information in the social network should therefore be extracted, and users with similar behavior should be grouped into communities through community discovery, so that the collective social behavior in the social network can be extracted accurately.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems in the related art to some extent.

Therefore, one object of the invention is to provide a method for extracting collective social behavior based on community discovery. The method addresses the problem in the prior art that collective social behavior which differs markedly from individual semantic behavior characteristics is difficult to extract accurately, which leads to low accuracy and insufficient robustness of the collective social behavior extracted to represent an online social network.

Another object of the present invention is to provide an extraction system for collective social behavior based on community discovery.

In order to achieve the above objective, an embodiment of one aspect of the present invention provides a method for extracting collective social behavior based on community discovery, including the following steps: step S1, capturing posts published by a plurality of users in a social network as an initial data set, and cleaning and word-segmenting the initial data set to obtain a data set; step S2, processing the data set with an LDA model to generate a plurality of topics and the topic distribution of each post; step S3, constructing a similarity calculation function based on sparse representation to solve the similarity between each post and the topic distribution and obtain an affinity matrix; step S4, constructing a community discovery algorithm based on the adaptive loss function and the affinity matrix to determine an objective function; step S5, continuously learning the objective function with an alternating iteration method to obtain the connected components among the posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network; and step S6, introducing a node2vec model to visualize the community structure, and extracting collective social behavior according to the distribution of the nodes in the community structure.

According to the method for extracting collective social behavior based on community discovery of the embodiment of the invention, the similarity matrix is learned with the adaptive loss function, the initial data of the social network is processed with high quality, and the reconstruction and community discovery of the social network are completed. This ensures that the output community structure has higher cohesiveness and stability, realizes the extraction of collective social behavior in the online social network, and gives the result excellent accuracy and robustness.

In addition, the method for extracting the collective social behavior based on community discovery according to the embodiment of the invention can also have the following additional technical characteristics:

Further, in one embodiment of the present invention, the affinity matrix is:

wherein $c_{i,j}$ is the value in the i-th row and j-th column of the affinity matrix, m is the adaptive number of neighbors of a user, and the distance term is the l2-norm between the topic distributions of nodes i and j.

Further, in one embodiment of the present invention, the objective function is:

$$\min_{S,F} \; \|C^{(v)} - S\|_{\sigma} + \varepsilon \, \mathrm{Tr}(F^{T} L F)$$

$$\text{s.t.} \quad \mathbf{1}^{T} s_i = 1, \quad s_{i,j} \ge 0, \quad F^{T} F = I$$

wherein S is the target variable, C is the affinity matrix, σ is the adaptive parameter, ε is the balance factor, F is the cluster indicator matrix, L is the Laplacian matrix of the target variable, Tr(·) is the trace, $\mathbf{1}^{T} s_i$ is the sum of all values in the i-th row of S, $s_{i,j}$ is the value in the i-th row and j-th column of S, and I is the identity matrix.

Further, in one embodiment of the present invention, step S5 specifically includes: using an alternating iteration method, first fixing the cluster indicator matrix to solve for the target variable, and then fixing the target variable to solve for the cluster indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, so as to obtain the connected components among the posts under the same topic, and then constructing the target similarity matrix to determine the community structure in the community network.

Further, in one embodiment of the present invention, the collective social behavior is extracted in step S6 as follows: if the nodes in the community structure are sparsely distributed, all the nodes in the community are covered with a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, the collective social behavior is extracted using centrality.

To achieve the above objective, an embodiment of another aspect of the present invention provides a system for extracting collective social behavior based on community discovery, including: an acquisition and preprocessing module, used for capturing posts published by a plurality of users in a social network as an initial data set, and cleaning and word-segmenting the initial data set to obtain a data set; a topic distribution generation module, used for processing the data set with an LDA model to generate a plurality of topics and the topic distribution of each post; an affinity matrix construction module, used for constructing a similarity calculation function based on sparse representation to solve the similarity between each post and the topic distribution and obtain an affinity matrix; an objective function determination module, used for constructing a community discovery algorithm based on the adaptive loss function and the affinity matrix to determine an objective function; an iterative learning module, used for continuously learning the objective function with an alternating iteration method to obtain the connected components among the posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network; and a collective social behavior extraction module, used for introducing a node2vec model to visualize the community structure and extracting collective social behavior according to the distribution of the nodes in the community structure.

According to the system for extracting collective social behavior based on community discovery of the embodiment of the invention, the similarity matrix is learned with the adaptive loss function, the initial data of the social network is processed with high quality, and the reconstruction and community discovery of the social network are completed. This ensures that the output community structure has higher cohesiveness and stability, realizes the extraction of collective social behavior in the online social network, and gives the result excellent accuracy and robustness.

In addition, the extraction system of collective social behavior based on community discovery according to the embodiment of the invention may further have the following additional technical features:

Further, in one embodiment of the present invention, the affinity matrix is:

wherein the distance term is the l2-norm between the topic distributions of nodes i and j.

Further, in one embodiment of the present invention, the objective function is:

$$\min_{S,F} \; \|C^{(v)} - S\|_{\sigma} + \varepsilon \, \mathrm{Tr}(F^{T} L F)$$

$$\text{s.t.} \quad \mathbf{1}^{T} s_i = 1, \quad s_{i,j} \ge 0, \quad F^{T} F = I$$

wherein S is the target variable, C is the affinity matrix, σ is the adaptive parameter, ε is the balance factor, F is the cluster indicator matrix, L is the Laplacian matrix of the target variable, Tr(·) is the trace, $\mathbf{1}^{T} s_i$ is the sum of all values in the i-th row of S, $s_{i,j}$ is the value in the i-th row and j-th column of S, and I is the identity matrix.

Further, in one embodiment of the present invention, the iterative learning module is specifically configured to: using an alternating iteration method, first fix the cluster indicator matrix to solve for the target variable, and then fix the target variable to solve for the cluster indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, so as to obtain the connected components among the posts under the same topic, and then construct the target similarity matrix to determine the community structure in the community network.

Further, in one embodiment of the present invention, the collective social behavior extraction module extracts the collective social behavior as follows: if the nodes in the community structure are sparsely distributed, all the nodes in the community are covered with a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, the collective social behavior is extracted using centrality.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow chart of a method of extracting collective social behavior based on community discovery in accordance with one embodiment of the present invention;

FIG. 2 is a graphical representation of the results of modularity versus topic number for one embodiment of the present invention;

FIG. 3 is a graph of the node2vec visualization of the similarity matrix according to an embodiment of the present invention;

FIG. 4 is a graph of collective social behavior extraction results for one embodiment of the invention;

FIG. 5 is a chart comparing the modularity of the existing Ncut, Louvain, and CAN algorithms with that of the present application for one embodiment of the invention;

FIG. 6 is a schematic diagram of a system for extracting collective social behavior based on community discovery according to one embodiment of the invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.

The method and system for extracting collective social behavior based on community discovery according to the embodiments of the invention are described below with reference to the accompanying drawings; the method is described first.

FIG. 1 is a flow chart of a method of extracting collective social behavior based on community discovery in accordance with one embodiment of the invention.

As shown in FIG. 1, the method for extracting collective social behavior based on community discovery comprises the following steps:

In step S1, posts published by a plurality of users in a social network are captured as an initial data set, and the initial data set is cleaned and word-segmented to obtain a data set.

Specifically, online social network data can be obtained by writing a crawler program in Python to capture posts from a social web page, for example from a microblog, a social media platform based on user relationships. After the data is obtained, to ensure the accuracy of the experimental results, the data is cleaned (for example, advertisements, duplicate posts, and very short posts are removed) and segmented into words (with jieba) to obtain the data set.
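The cleaning and segmentation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the ad markers, the minimum length, and the function names are hypothetical, and whitespace splitting stands in for jieba so that the sketch runs without extra packages.

```python
import re

# Hypothetical ad markers; the source does not say how advertisements are detected.
AD_MARKERS = ("buy now", "discount", "promo")

def clean_posts(posts, min_chars=10, ad_markers=AD_MARKERS):
    """Drop advertisements, exact duplicates, and very short posts."""
    seen = set()
    kept = []
    for post in posts:
        text = re.sub(r"\s+", " ", post).strip()  # normalize whitespace
        if len(text) < min_chars:
            continue  # too short to carry a topic
        if any(marker in text.lower() for marker in ad_markers):
            continue  # looks like an advertisement
        if text in seen:
            continue  # duplicate post
        seen.add(text)
        kept.append(text)
    return kept

def segment(posts, tokenize=str.split):
    """Turn each post into a token list.

    For Chinese text, pass tokenize=jieba.lcut (requires the jieba package);
    whitespace splitting is only a stand-in for this sketch.
    """
    return [tokenize(p) for p in posts]

raw = [
    "New vaccine candidate enters phase two trials",
    "New vaccine candidate enters   phase two trials",  # duplicate after normalization
    "ok",                                               # too short
    "Buy now and get a big discount on phones",         # advertisement
]
dataset = segment(clean_posts(raw))
```

Only the first post survives in this toy input; the other three are removed as a duplicate, a short post, and an advertisement respectively.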

In step S2, the data set is processed using the LDA model, generating a plurality of topics and a topic distribution for each post.

Specifically, the data set is processed with the LDA model to generate T topics. Any node $v_i$ may belong to one topic or to several topics (that is, it has a topic distribution), and the probability that node $v_i$ belongs to each topic is represented by a floating-point number. The generative process of the LDA model corresponds to the following joint distribution:

wherein $\theta_d \sim \mathrm{Dirichlet}(\alpha)$ is the topic distribution and $\beta_k \sim \mathrm{Dirichlet}(\eta)$ is the word distribution; $Z_{d,n}$ is the topic index and $w_{d,n}$ is the word; the parameters α and η are hyperparameter vectors, and $d \in D$. The topic $Z_{d,n}$ depends on the topic distribution $\theta_d$ of the text published by the user; the word $w_{d,n}$ depends on the word distributions $\beta_{1:K}$ of all topics and on the topic $Z_{d,n}$. The data are stored as a matrix X, in which the rows correspond to the topics and each column is the feature vector of a node $v_i$.
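To make the dependencies in the generative process concrete, the following sketch samples a toy corpus in the order just described: word distributions per topic, then a topic distribution per post, then a topic and a word for each position. It illustrates the generative model only, not LDA inference; the function names, corpus sizes, and symmetric hyperparameter values are assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def generate_corpus(n_docs, n_words, n_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    rng = random.Random(seed)
    # beta_k ~ Dirichlet(eta): word distribution of each topic
    beta = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(n_topics)]
    thetas, docs = [], []
    for _ in range(n_docs):
        # theta_d ~ Dirichlet(alpha): topic distribution of post d
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(n_words):
            z = rng.choices(range(n_topics), weights=theta)[0]      # topic Z_{d,n}
            w = rng.choices(range(vocab_size), weights=beta[z])[0]  # word w_{d,n}
            words.append(w)
        thetas.append(theta)
        docs.append(words)
    return thetas, docs

thetas, docs = generate_corpus(n_docs=5, n_words=8, n_topics=3, vocab_size=10)
# Data matrix X: one row per topic, one column per post (node feature vector)
X = [[thetas[d][k] for d in range(5)] for k in range(3)]
```

In practice the topic distributions are inferred from the segmented posts by an LDA implementation rather than sampled; the matrix layout (topics by posts) is the part that carries over to the later steps.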

In step S3, a similarity calculation function based on sparse representation is constructed to solve the similarity between each post and the topic distribution, and an affinity matrix is obtained.

Specifically, the embodiment of the invention obtains the affinity matrix by calculating the similarity between feature vectors. Users with a higher degree of association in the social network (a smaller distance between the feature vectors of the semantic information they publish) receive a larger similarity value, while users with a lower degree of association receive a smaller value or even zero. Obtaining the affinity matrix in this way completes the reconstruction of the social network. The affinity matrix can be obtained by solving the following problem:

wherein $X \in \mathbb{R}^{d \times n}$ is the data matrix obtained in step S2, d is the feature dimension of the semantic information in the social network (the number of topics), and n is the number of data points (the number of users in the social network); its j-th column vector is denoted $x_j$ and its (i, j)-th element is denoted $x_{i,j}$; α is the sparsity adjustment factor. Calculation and derivation give:

The result is sorted in ascending order, so that the learned $c_{i,j}$ satisfies

and

where m is the adaptive number of neighbors of a user. The affinity matrix C of the social network is obtained with formula (3), and the connection relationships between users can be read from it. Compared with fixed graph structures such as the fully connected graph and the K-nearest-neighbor graph (computed with cosine similarity, Gaussian kernel similarity, and the like), the calculation in formula (3) adapts the number of neighbors m of each user. An affinity matrix constructed in this way accurately reflects the relationships among users in the social network and compensates for the high demands that spectral clustering places on node similarity, which improves the subsequent community discovery.
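Formula (3) itself is not reproduced in this text, so the sketch below is an assumption: it uses the closed form known from adaptive-neighbor graph learning (the CAN algorithm that the second embodiment compares against), in which each node keeps its m nearest neighbors and the weights in each row sum to 1. The function name and the toy data are hypothetical.

```python
def adaptive_affinity(X_cols, m):
    """X_cols: list of n feature vectors (the columns of X); m: neighbors kept.

    Assumed closed form: c_ij = (d_{i,m+1} - d_ij) / (m * d_{i,m+1} - sum_{h<=m} d_ih),
    where d_ij is the squared l2 distance and distances are sorted ascending.
    Requires n >= m + 2.
    """
    n = len(X_cols)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # squared l2 distances from node i to every other node, ascending
        dist = sorted(
            (sum((a - b) ** 2 for a, b in zip(X_cols[i], X_cols[j])), j)
            for j in range(n) if j != i
        )
        d_next = dist[m][0]  # distance to the (m+1)-th nearest neighbor
        denom = m * d_next - sum(d for d, _ in dist[:m])
        for d, j in dist[:m]:
            # fall back to uniform weights when the m+1 nearest distances tie
            C[i][j] = (d_next - d) / denom if denom > 1e-12 else 1.0 / m
    return C

# Toy topic distributions over two topics for five posts
X_cols = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
C = adaptive_affinity(X_cols, m=2)
```

Each row of `C` is sparse (at most m nonzero entries) and nonnegative, which matches the description of closely related users receiving larger values and weakly related users receiving zero.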

In step S4, a community discovery algorithm is constructed based on the adaptive loss function and the affinity matrix to determine an objective function.

Specifically, loss functions constructed with the l1-norm or the l2-norm each have a defect: a loss function built on the l1-norm is insensitive to large outliers but very sensitive to small ones, while one built on the l2-norm is exactly the opposite. The embodiment of the invention therefore selects an adaptive loss function, which balances the two problems. The function is defined as follows:

After the affinity matrix C of the social network is reconstructed through formula (3), the following objective function is proposed in order to learn the optimal similarity matrix S:

where L is the Laplacian matrix of S, and rank(L) = n - k is a rank constraint that makes the similarity matrix S have k connected components. To avoid abnormal nodes (nodes without any neighbors), the constraint $\mathbf{1}^{T} s_i = 1$ is set, so that each row of S sums to 1.
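The definition of the adaptive loss is not reproduced in this text. As an illustration of how one function can sit between the l1-norm and the l2-norm, the sketch below assumes the element-wise adaptive loss from the robust learning literature, $(1+\sigma)\,r^2 / (|r| + \sigma)$ summed over the residuals r; whether the patent uses exactly this form is an assumption.

```python
def adaptive_loss(residuals, sigma):
    """Assumed adaptive loss: sum of (1 + sigma) * r^2 / (|r| + sigma), sigma > 0.

    sigma -> 0   recovers the l1-norm (sum of |r|);
    sigma -> inf recovers the squared l2-norm (sum of r^2).
    """
    return sum((1.0 + sigma) * r * r / (abs(r) + sigma) for r in residuals)

residuals = [3.0, -0.5]  # one large and one small residual
near_l1 = adaptive_loss(residuals, sigma=1e-9)   # close to 3.5
balanced = adaptive_loss(residuals, sigma=1.0)   # between the two extremes
near_l2 = adaptive_loss(residuals, sigma=1e9)    # close to 9.25
```

The adaptive parameter σ in the objective function plays the role of `sigma` here: it controls how strongly large residuals are penalized relative to small ones.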

However, L depends on the target variable S and the rank constraint is nonlinear, which makes formula (5) difficult to solve. Let $\lambda_i(L)$ denote the i-th smallest eigenvalue of L; if the k smallest eigenvalues of L satisfy

the rank constraint is satisfied. Given the balance factor ε, formula (5) can be expressed as:

According to Ky Fan's theorem,

where $F = \{f_1, f_2, \dots, f_k\}$ is the cluster indicator matrix. Substituting formula (7) into formula (6) yields:

Formula (8) is the final objective function. The target variable S has k connected components, that is, the final community discovery result can be obtained directly with this algorithm. Here S is the target variable, C is the affinity matrix, σ is the adaptive parameter, ε is the balance factor, F is the cluster indicator matrix, L is the Laplacian matrix of the target variable, Tr(·) is the trace, $\mathbf{1}^{T} s_i$ is the sum of all values in the i-th row of S, $s_{i,j}$ is the value in the i-th row and j-th column of S, and I is the identity matrix.
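The statement that communities can be read directly from S can be sketched as follows: once S has exactly k connected components, a graph traversal assigns one community label per component. The symmetrized Laplacian $L = D - (S + S^{T})/2$ used here is an assumption (the text only says L is the Laplacian of S), and the function names are hypothetical.

```python
def laplacian(S):
    """L = D - W with W = (S + S^T) / 2 and D the diagonal degree matrix of W."""
    n = len(S)
    W = [[(S[i][j] + S[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def communities(S, tol=1e-12):
    """Label each node by the connected component of S it belongs to."""
    n = len(S)
    label = [-1] * n
    k = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = k
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and (S[i][j] > tol or S[j][i] > tol):
                    label[j] = k
                    stack.append(j)
        k += 1
    return label, k

# Toy similarity matrix with two blocks (rows sum to 1)
S_toy = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
labels, k = communities(S_toy)
L_toy = laplacian(S_toy)
```

The number of components found this way equals the number of zero eigenvalues of the Laplacian, which is what the rank constraint rank(L) = n - k enforces.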

In step S5, the objective function is continuously learned with an alternating iteration method to obtain the connected components among the posts under the same topic in the affinity matrix, so as to construct a target similarity matrix and determine the community structure in the community network.

Further, in one embodiment of the present invention, step S5 specifically includes: using an alternating iteration method, first fixing the cluster indicator matrix to solve for the target variable, and then fixing the target variable to solve for the cluster indicator matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, so as to obtain the connected components among the posts under the same topic, and then constructing a target similarity matrix to determine the community structure in the community network.

Specifically, the objective function is solved with an alternating iteration method, in which one variable is updated while the other variables are kept unchanged, as follows:

(1) Fix the cluster indicator matrix F and solve for the target variable S.

When the cluster indicator matrix F is fixed, the following property of the Laplacian matrix is used:

so that formula (8) becomes:

Define the matrix

wherein

is the i-th column of E, whose j-th element is

Because the rows of S are independent of one another, formula (9) can be written in vector form:

wherein $s_i$ is the column vector formed by the elements of the i-th row of the target similarity matrix S, and $c_i$ is the column vector formed by the elements of the i-th row of the affinity matrix; the value of $u_i$ is as follows:

Simplifying formula (10) gives:

Let

By the Lagrangian multiplier method,

where η and ξ are the Lagrangian multipliers, the former a scalar and the latter a vector. According to the KKT conditions:

and since $\mathbf{1}^{T} s_i = 1$, according to the first equation in formula (13),

Substituting this into formula (13) gives the optimal solution

as follows:

Let

Then

According to formula (13):

Thus it is only necessary to determine

to obtain the optimal solution

From formula (15) and formula (13):

Since

it follows that

is

Define a function of $\xi^{*}$:

When $f(\xi^{*}) = 0$,

Since $\xi^{*} \ge 0$, $f'(\xi^{*}) \le 0$, and $f(\xi^{*})$ is a piecewise linear convex function, the root of $f(\xi^{*}) = 0$ can be found with Newton's method, i.e.

(2) Fix the target variable S and solve for the cluster indicator matrix F.

When the target variable S is fixed, this corresponds to solving the following problem:

In this case, the optimal solution of the cluster indicator matrix F consists of the eigenvectors corresponding to the k smallest eigenvalues of the Laplacian matrix L.
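The F update can be sketched with NumPy as follows. The symmetrization of S before building the Laplacian is an assumption, and `update_F` is a hypothetical name; `numpy.linalg.eigh` returns eigenvalues in ascending order for a symmetric matrix, so the first k eigenvectors are the ones required.

```python
import numpy as np

def update_F(S, k):
    """Return the eigenvectors of the k smallest eigenvalues of the Laplacian of S."""
    W = (S + S.T) / 2.0                      # assumed symmetrization of S
    L = np.diag(W.sum(axis=1)) - W           # Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalues, orthonormal vectors
    return eigvecs[:, :k], eigvals[:k]

# Two disconnected pairs of nodes: the two smallest eigenvalues are zero
S_blocks = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
F, smallest = update_F(S_blocks, k=2)
```

The returned columns are orthonormal, so the constraint $F^{T} F = I$ holds by construction.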

The two steps are iterated until the relative change of the target variable S is less than $10^{-3}$ or the number of iterations exceeds 150, at which point the iterative learning is complete.
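The stopping rule can be sketched as a generic alternating loop. The two update functions are hypothetical placeholders supplied by the caller (the actual S update follows the derivation above and is not reproduced here), and measuring the relative change with the Frobenius norm is an assumption.

```python
def frobenius(A):
    """Frobenius norm of a matrix given as a list of rows."""
    return sum(x * x for row in A for x in row) ** 0.5

def alternate(S, update_S, update_F, tol=1e-3, max_iter=150):
    """Alternate the two updates until S stabilizes or max_iter is reached."""
    F = update_F(S)
    it = 0
    for it in range(1, max_iter + 1):
        S_new = update_S(S, F)   # step (1): F fixed, solve for S
        F = update_F(S_new)      # step (2): S fixed, solve for F
        diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(S_new, S)]
        rel_change = frobenius(diff) / max(frobenius(S), 1e-12)
        S = S_new
        if rel_change < tol:
            break
    return S, F, it

# Toy placeholder updates: S moves halfway toward a fixed target each round
target = [[0.5, 0.5], [0.5, 0.5]]
toy_update_S = lambda S, F: [[s + 0.5 * (t - s) for s, t in zip(rs, rt)]
                             for rs, rt in zip(S, target)]
toy_update_F = lambda S: None
S_final, F_final, n_iter = alternate([[1.0, 0.0], [0.0, 1.0]],
                                     toy_update_S, toy_update_F)
```

With the toy updates the loop stops well before the iteration cap because the relative change shrinks geometrically.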

In step S6, a node2vec model is introduced to visualize the community structure, and collective social behavior is extracted according to the distribution condition of nodes in the community structure.

Specifically, to facilitate the understanding and analysis of the collective social behavior of a social network, the embodiment of the invention presents the community discovery result visually. To this end, the node2vec graph embedding model is introduced. It is a node vectorization model that obtains local information from truncated random walks, treats nodes as words and walks as sentences in order to learn latent representations, and extends the DeepWalk algorithm by changing how the random walk sequences are generated.
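The change that node2vec makes to the walk generation can be sketched as a second-order biased random walk. The parameters p (return) and q (in-out) are node2vec's standard walk parameters; the graph, the function name, and the parameter values below are illustrative assumptions. Training the embedding from the walks (for example with a skip-gram model) is not shown.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """adj: dict node -> dict neighbor -> weight. Returns a list of visited nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(adj[cur])
        if not nbrs:
            break  # dead end
        if len(walk) == 1:
            weights = [adj[cur][x] for x in nbrs]  # first step: plain weighted walk
        else:
            prev = walk[-2]
            weights = []
            for x in nbrs:
                if x == prev:
                    bias = 1.0 / p   # return to the previous node
                elif x in adj[prev]:
                    bias = 1.0       # stay at distance 1 from the previous node
                else:
                    bias = 1.0 / q   # move farther away
                weights.append(adj[cur][x] * bias)
        walk.append(rng.choices(nbrs, weights=weights)[0])
    return walk

# Small undirected graph: a triangle 0-1-2 with a tail 2-3
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0},
}
walk = node2vec_walk(adj, start=0, length=10, p=0.5, q=2.0, rng=random.Random(0))
```

Setting p = q = 1 reduces the walk to the uniform weighted walk used by DeepWalk; a small p keeps walks local, while a small q pushes them outward.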

The collective social behavior is then extracted. If the nodes in the community structure are sparsely distributed, all the nodes in the community are covered with a minimum enclosing circle, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, the collective social behavior is extracted using centrality.
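Both extraction rules can be sketched on 2D embedded nodes. The brute-force minimum enclosing circle below checks every pair and triple of points, which is simple but only practical for small communities; using weighted degree as the centrality measure is an assumption, since the text does not say which centrality is used. All function names are hypothetical.

```python
import math
from itertools import combinations

def _circle_from_two(a, b):
    center = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return center, math.dist(a, b) / 2.0

def _circle_from_three(a, b, c):
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None  # collinear points have no circumcircle
    sa, sb, sc = (a[0]**2 + a[1]**2), (b[0]**2 + b[1]**2), (c[0]**2 + c[1]**2)
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)

def min_enclosing_circle(points, eps=1e-9):
    """Smallest circle covering all points (brute force over pairs and triples)."""
    if len(points) == 1:
        return points[0], 0.0
    candidates = [_circle_from_two(a, b) for a, b in combinations(points, 2)]
    candidates += [c for c in (_circle_from_three(*t)
                               for t in combinations(points, 3)) if c]
    best = None
    for center, radius in candidates:
        if all(math.dist(center, p) <= radius + eps for p in points):
            if best is None or radius < best[1]:
                best = (center, radius)
    return best

def representative_sparse(points):
    """Sparse community: index of the node closest to the circle center."""
    center, _ = min_enclosing_circle(points)
    return min(range(len(points)), key=lambda i: math.dist(points[i], center))

def representative_dense(S):
    """Dense community: index of the node with the largest weighted degree."""
    n = len(S)
    return max(range(n),
               key=lambda i: sum(S[i][j] + S[j][i] for j in range(n) if j != i))

pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.5), (1.0, -0.2)]
center, radius = min_enclosing_circle(pts)
sparse_rep = representative_sparse(pts)
dense_rep = representative_dense([[0.0, 0.5, 0.5], [0.9, 0.0, 0.1], [0.9, 0.1, 0.0]])
```

For larger communities, a linear-time method such as Welzl's algorithm would be a natural replacement for the brute-force search.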

The method for extracting the collective social behavior based on community discovery provided by the embodiment of the invention is further described by two specific embodiments.

First embodiment

10176 posts published by users between March 1, 2021 and March 5, 2021 were captured from Sina Weibo by a crawler. The data was cleaned (advertisements, duplicate posts, very short posts, and the like were removed), leaving 1584 posts as the initial data set. After jieba word segmentation of the semantic information, the data set is obtained as follows:

1: intelligent nondestructive testing method for construction of semiconductors by using … … full-force cooperation of asset investment control groups in Zhengzhou scientific and technological, utility and national province

2: curie-mada compression ignition spark plug mechanical boost … … stability factor lender ten thousand legal routes

3: median safety in second-stage research of novel epidemic … … candidate vaccine of Russian vector science center

4: chinese successfully develops vaccine triphibian storage and transportation … … service Yangzhou base sea, land, air and space multiple long-distance transportation modes

……

1584: evening news yellow early warning careless road icing new law education … … increasingly matches the vegetable market

The LDA model is used to generate T topics and the topic distribution of each post. The number of topics is determined by the evaluation index modularity Q; as shown in FIG. 2, the number of topics is set to 30, and the resulting data matrix (30×1584) is as follows:

The affinity matrix (1584×1584) of the microblog data is obtained according to formula (3):

Using the community discovery algorithm based on the adaptive loss function, the similarity matrix (1584×1584) with the specified number of connected components is learned as follows:

The similarity matrix is visualized with the node2vec graph embedding model, and the result is shown in FIG. 3. Finally, the collective social behavior of the social network is extracted with the two different methods and marked with two different symbols, a diamond for method 1 and a five-pointed star for method 2, as shown in FIG. 4.

Second embodiment

The existing Ncut, Louvain, and CAN algorithms are compared with the extraction method provided by the invention, using the WebKB, BBC news report, and 20NGs news document data sets as verification data and the modularity Q to measure the stability and cohesiveness of the community discovery results. The verification results are shown in FIG. 5, from which it can be seen that the embodiment of the invention leads in performance.
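The modularity Q used as the evaluation index can be computed as in the sketch below, which implements the standard Newman definition for an undirected weighted graph. The function name and the toy graph are illustrative; how the patent builds the graph from S before computing Q is not stated.

```python
def modularity(A, labels):
    """Newman modularity Q for a symmetric adjacency matrix A and community labels."""
    n = len(A)
    degree = [sum(row) for row in A]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two disjoint triangles: the natural split has Q = 0.5
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
```

A higher Q means more edge weight falls inside communities than would be expected at random, which is why it serves as a measure of cohesiveness here.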

According to the method for extracting collective social behavior based on community discovery provided by the embodiment of the invention, the similarity matrix is learned with the adaptive loss function, the initial data of the social network is processed with high quality, and the reconstruction and community discovery of the social network are completed. This ensures that the output community structure has higher cohesiveness and stability, realizes the extraction of collective social behavior in the online social network, and gives the result excellent accuracy and robustness.

Next, an extraction system of collective social behavior based on community discovery according to an embodiment of the present invention is described with reference to the accompanying drawings.

FIG. 6 is a schematic structural diagram of a system for extracting collective social behavior based on community discovery according to one embodiment of the invention.

As shown in FIG. 6, the system 10 includes: an acquisition and preprocessing module 100, a topic distribution generation module 200, an affinity matrix construction module 300, an objective function determination module 400, an iterative learning module 500 and a collective social behavior extraction module 600.

The acquisition and preprocessing module 100 is used for capturing posts published by a plurality of users in a social network as an initial data set, and for cleaning and word-segmenting the initial data set to obtain a data set. The topic distribution generation module 200 is configured to process the data set by using an LDA model to generate a plurality of topics and the topic distribution of each post. The affinity matrix construction module 300 is configured to construct a similarity calculation function based on sparse representation to solve the similarity between each post and the topic distribution, so as to obtain an affinity matrix. The objective function determination module 400 is configured to construct a community discovery algorithm based on the adaptive loss function and the affinity matrix to determine an objective function. The iterative learning module 500 is configured to continuously learn the objective function by using an alternating iteration method, so as to obtain the connected components among posts under the same topic in the affinity matrix and thereby construct a target similarity matrix that determines the community structure in the community network. The collective social behavior extraction module 600 is used for introducing a node2vec model to visualize the community structure, and for extracting collective social behavior according to the distribution of nodes in the community structure.
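The division into six modules can be pictured as a chain of stages. The following is only a structural sketch under the simplifying assumption that each module consumes the output of the previous one; the class name `ExtractionSystem`, its field names and the `run` method are hypothetical and do not appear in the specification, and the stage bodies are left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable

Stage = Callable[[Any], Any]


@dataclass
class ExtractionSystem:
    """Hypothetical wiring of modules 100 to 600 as a linear pipeline."""
    acquire_and_preprocess: Stage      # module 100: capture, clean, segment posts
    generate_topics: Stage             # module 200: LDA topics per post
    build_affinity: Stage              # module 300: affinity matrix
    determine_objective: Stage         # module 400: objective function
    iterative_learning: Stage          # module 500: target similarity matrix
    extract_behavior: Stage            # module 600: collective social behavior

    def run(self, source: Any) -> Any:
        data = source
        for stage in (self.acquire_and_preprocess, self.generate_topics,
                      self.build_affinity, self.determine_objective,
                      self.iterative_learning, self.extract_behavior):
            data = stage(data)  # each stage receives the previous stage's output
        return data
```

Keeping each module behind a plain callable makes it possible to replace one stage, for example the topic model, without touching the others.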

Further, in one embodiment of the invention, the affinity matrix is:

where the norm term is the l2-norm of the topic distributions of nodes i and j.

Further, in one embodiment of the invention, the objective function is:

$$\min_{S,F} \; \|C^{(v)} - S\|_{\sigma} + \varepsilon \, \mathrm{Tr}(F^{T} L F)$$

$$\text{s.t.} \quad \mathbf{1}^{T} s_i = 1, \quad s_{i,j} \ge 0, \quad F^{T} F = I$$
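To make the constraints and the trace term of the objective more concrete, the sketch below checks them numerically on a toy example. It assumes that the Laplacian is formed as L = D − W with the symmetrized weights W = (S + Sᵀ)/2, which is a common convention but is not stated in this passage; the adaptive loss term is deliberately left out. The helper names `laplacian`, `trace_ftlf` and `satisfies_constraints` are hypothetical.

```python
import math


def laplacian(S):
    """L = D - W with W = (S + S^T) / 2 (assumed symmetrization of S)."""
    n = len(S)
    W = [[(S[i][j] + S[j][i]) / 2 for j in range(n)] for i in range(n)]
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def trace_ftlf(F, L):
    """Tr(F^T L F), summed over the columns of F."""
    n, c = len(F), len(F[0])
    return sum(F[i][k] * L[i][j] * F[j][k]
               for k in range(c) for i in range(n) for j in range(n))


def satisfies_constraints(S, F, tol=1e-9):
    """Check 1^T s_i = 1, s_ij >= 0 and F^T F = I."""
    n, c = len(F), len(F[0])
    rows_ok = all(abs(sum(row) - 1.0) <= tol and min(row) >= 0 for row in S)
    gram_ok = all(
        abs(sum(F[i][a] * F[i][b] for i in range(n)) - (1.0 if a == b else 0.0)) <= tol
        for a in range(c) for b in range(c))
    return rows_ok and gram_ok


# Toy S with two connected components {0, 1} and {2, 3}.
S = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
h = 1 / math.sqrt(2)
F = [[h, 0], [h, 0], [0, h], [0, h]]  # normalized component indicators
feasible = satisfies_constraints(S, F)
smoothness = trace_ftlf(F, laplacian(S))
```

In this toy case the trace term vanishes, because the columns of F are normalized indicators of the two connected components of S and no edge crosses between them.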

Further, in one embodiment of the present invention, the iterative learning module 500 is specifically configured to:

Using an alternating iteration method, the clustering indication matrix is first fixed to solve for the target variable, and the target variable is then fixed to solve for the clustering indication matrix, until the relative change of the target variable is smaller than $10^{-3}$ or the number of iterations exceeds 150. The connected components among posts under the same topic are thereby obtained, and the target similarity matrix is then constructed to determine the community structure in the community network.
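The control flow of the alternating iteration and the read-out of communities as connected components can be sketched as follows. This is a skeleton only: the two sub-problem solvers are passed in as the hypothetical callables `update_S` and `update_F`, since their closed forms are not given in this passage, and the relative change is assumed to be measured with the Frobenius norm.

```python
import math


def _frobenius(M):
    return math.sqrt(sum(x * x for row in M for x in row))


def alternate(S, F, update_S, update_F, tol=1e-3, max_iter=150):
    """Alternate the two sub-problems until S stabilizes or the budget is spent."""
    iteration = 0
    for iteration in range(1, max_iter + 1):
        S_new = update_S(S, F)   # F fixed, solve for the target variable S
        F = update_F(S_new)      # S fixed, solve for the indication matrix F
        diff = [[a - b for a, b in zip(r_new, r_old)]
                for r_new, r_old in zip(S_new, S)]
        change = _frobenius(diff) / max(_frobenius(S), 1e-12)
        S = S_new
        if change < tol:         # relative change below 10^-3
            break
    return S, F, iteration


def connected_components(S, eps=1e-12):
    """Label each node by the connected component of the graph with weights S."""
    n = len(S)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and (S[i][j] > eps or S[j][i] > eps):
                    label[j] = current
                    stack.append(j)
        current += 1
    return label
```

Once the learned matrix has the specified number of connected components, the labels returned by `connected_components` can be read directly as community assignments, without a separate clustering step.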

Further, in one embodiment of the present invention, the collective social behavior extraction module 600 extracts the collective social behavior as follows: if the nodes in the community structure are sparsely distributed, a minimum enclosing circle is used to cover all nodes in the community, and the node closest to the center of the circle is taken as the collective social behavior; if the nodes in the community structure are densely distributed, centrality is used to extract the collective social behavior.
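The two extraction rules can be illustrated with a small sketch. It assumes that each node of a community has a two-dimensional coordinate (for example after projecting the node2vec embedding to the plane) and that the centrality in the dense case is weighted degree centrality; neither choice is fixed by this passage. The brute-force search over pairs and triples of points is chosen for clarity and is only suitable for small communities. All function names are hypothetical.

```python
import math
from itertools import combinations


def _circumcircle(a, b, c):
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None  # collinear points have no circumcircle
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)


def min_enclosing_circle(points, tol=1e-9):
    """Smallest circle covering all 2-D points, found by exhaustive search."""
    if len(points) == 1:
        return points[0], 0.0
    candidates = [(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), math.dist(a, b) / 2)
                  for a, b in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circumcircle(*triple)
        if circle is not None:
            candidates.append(circle)
    covering = [(c, r) for c, r in candidates
                if all(math.dist(c, p) <= r + tol for p in points)]
    return min(covering, key=lambda cr: cr[1])


def representative_sparse(points):
    """Sparse case: index of the node closest to the circle center."""
    center, _ = min_enclosing_circle(points)
    return min(range(len(points)), key=lambda i: math.dist(center, points[i]))


def representative_dense(adjacency):
    """Dense case: index of the node with the highest weighted degree (assumed centrality)."""
    return max(range(len(adjacency)), key=lambda i: sum(adjacency[i]))
```

For the points (0, 0), (4, 0), (0, 4), (4, 4) and (2.2, 1.9), the enclosing circle is centered at (2, 2), so `representative_sparse` selects the last point (index 4). How a community is judged sparse or dense is left to the caller here.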

It should be noted that the foregoing explanation of the embodiment of the method for extracting the collective social behavior based on the community discovery is also applicable to the system of the embodiment, and will not be repeated herein.

According to the system for extracting collective social behavior based on community discovery provided by the embodiment of the invention, the similarity matrix is learned with the adaptive loss function, so that the initial data of the social network is processed with high quality and the reconstruction and community discovery of the social network are completed. This ensures that the output community structure has higher cohesion and stability, realizes the extraction of collective social behavior in the online social network, and gives the result excellent accuracy and robustness.

Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.

In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict each other.

While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims

1. The extraction method of the collective social behavior based on community discovery is characterized by comprising the following steps of:

step S1, capturing posts published by a plurality of users in a social network as an initial data set, and cleaning and word segmentation are carried out on the initial data set to obtain a data set;

step S2, processing the data set by using an LDA model to generate a plurality of topics and the topic distribution of each post;

step S3, constructing a similarity calculation function based on sparse representation, and solving the similarity between each post and the topic distribution to obtain an affinity matrix, wherein the affinity matrix is as follows:

where the norm term is the l2-norm of the topic distributions of nodes i and j;

step S4, constructing a community discovery algorithm based on the adaptive loss function and the affinity matrix to determine an objective function, wherein the objective function is:

$$\min_{S,F} \; \|C^{(v)} - S\|_{\sigma} + \varepsilon \, \mathrm{Tr}(F^{T} L F)$$

$$\text{s.t.} \quad \mathbf{1}^{T} s_i = 1, \quad s_{i,j} \ge 0, \quad F^{T} F = I$$

wherein S is the target variable, C is the affinity matrix, σ is the adaptive parameter, ε is the balance factor, F is the clustering indication matrix, L is the Laplacian matrix of the target variable, Tr(·) is the trace, $\mathbf{1}^{T} s_i$ is the sum of all values in the i-th row of S, $s_{i,j}$ is the value in the i-th row and j-th column of S, and I is the identity matrix;

step S5, continuously learning the objective function by using an alternating iteration method to obtain the connected components among posts under the same topic in the affinity matrix, so as to construct a target similarity matrix to determine a community structure in a community network, which specifically comprises: using the alternating iteration method, first fixing the clustering indication matrix to solve for the target variable, and then fixing the target variable to solve for the clustering indication matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, so that the connected components among posts under the same topic are obtained, and then constructing the target similarity matrix to determine the community structure in the community network;

step S6, introducing a node2vec model to visualize the community structure, and extracting collective social behavior according to the distribution of nodes in the community structure, wherein if the nodes in the community structure are sparsely distributed, a minimum enclosing circle is used to cover all nodes in the community, and the node closest to the center of the circle is taken as the collective social behavior; and if the nodes in the community structure are densely distributed, the collective social behavior is extracted by using centrality.

2. A community discovery-based extraction system for collective social behavior, comprising:

the acquisition and preprocessing module is used for capturing posts published by a plurality of users in the social network as an initial data set, and cleaning and word segmentation are carried out on the initial data set to obtain a data set;

the topic distribution generation module is used for processing the data set by utilizing an LDA model to generate a plurality of topics and topic distribution of each post;

an affinity matrix building module, configured to construct a similarity calculation function based on sparse representation, to solve the similarity between each post and the topic distribution, so as to obtain an affinity matrix, where the affinity matrix is:

where the norm term is the l2-norm of the topic distributions of nodes i and j;

the objective function determination module is used for constructing a community discovery algorithm based on the adaptive loss function and the affinity matrix to determine an objective function, wherein the objective function is as follows:

$$\min_{S,F} \; \|C^{(v)} - S\|_{\sigma} + \varepsilon \, \mathrm{Tr}(F^{T} L F)$$

$$\text{s.t.} \quad \mathbf{1}^{T} s_i = 1, \quad s_{i,j} \ge 0, \quad F^{T} F = I$$

the iterative learning module is used for continuously learning the objective function by using an alternating iteration method to obtain the connected components among posts under the same topic in the affinity matrix, so as to construct a target similarity matrix to determine a community structure in a community network, and is specifically used for: using the alternating iteration method, first fixing the clustering indication matrix to solve for the target variable, and then fixing the target variable to solve for the clustering indication matrix, until the relative change of the target variable is less than $10^{-3}$ or the number of iterations exceeds 150, so that the connected components among posts under the same topic are obtained, and then constructing the target similarity matrix to determine the community structure in the community network;

the collective social behavior extraction module is used for introducing a node2vec model to visualize the community structure, and for extracting collective social behavior according to the distribution of nodes in the community structure, wherein if the nodes in the community structure are sparsely distributed, a minimum enclosing circle is used to cover all nodes in the community, and the node closest to the center of the circle is taken as the collective social behavior; and if the nodes in the community structure are densely distributed, the collective social behavior is extracted by using centrality.