CN114020999A

CN114020999A - Community structure detection method and system for movie social network

Info

Publication number: CN114020999A
Application number: CN202111221461.8A
Authority: CN
Inventors: 杜航原; 姚倩; 白亮
Original assignee: Shanxi University
Current assignee: Shanxi University
Priority date: 2021-10-20
Filing date: 2021-10-20
Publication date: 2022-02-08

Abstract

The invention discloses a community structure detection method and system for a movie social network. The method comprises the following steps: acquiring a user data set, constructing a movie social network structure from the follow relationships among users, taking the users' viewing data as user node attributes, and establishing an adjacency matrix and an attribute matrix based on the movie social network structure and the user node attributes; constructing a movie social network community structure detection model with a graph autoencoder, based on the established adjacency matrix and attribute matrix; designing a joint optimization objective function for the constructed model and training the model by minimizing that function; and detecting the community structure of the movie social network with the trained model and outputting the detection result. The method and system can effectively and reliably partition the community structure of a movie social network.

Description

Community structure detection method and system for movie social network

Technical Field

The invention relates to the technical field of data mining, in particular to a community structure detection method and system for a movie social network.

Background

With the rapid development and wide adoption of computer network technology, more and more social networking platforms, such as Facebook, Douyin (TikTok) and Weibo, have appeared on the internet; they have grown rapidly and have gradually become an indispensable part of people's social life. Large-scale social networks of many forms have arisen on these platforms. They reflect the interactions among social individuals and make it far more convenient for people to connect and exchange information. The movie social network, a common kind of virtual social network, has become the most popular social platform for tens of millions of movie lovers. For example, Douban is a community website that provides information on books, movies, music and other works, and is one of the more distinctive Web 2.0 websites. Douban Movie, a Douban product, is the largest movie sharing and review community in China; it brings together tens of millions of movie lovers, who connect with one another by following each other and thereby pass information along. Users with the same or similar interests tend to be associated with one another and so gather into communities. Users within the same community are closely connected and interact frequently, which facilitates the spread of information and the exchange of interests. Detecting the community structure of a movie social network helps researchers study related tasks such as user interest analysis, interest community analysis and prediction of user viewing behavior; it helps websites promptly recommend movies of interest to users; and it has significant commercial value for cinema chains that need to identify user demand for precision marketing.

Real-world social networks contain rich node attribute information, and this attribute information also contributes to the formation of community structure. Early community discovery methods mainly include graph partitioning, hierarchical clustering, modularity optimization and label propagation. These methods generally discover communities from the topological structure of the network alone and ignore the important role that node attributes play in forming community structure. The invention therefore provides a method and a system that effectively fuse structural information with node attribute information to partition the community structure of a movie social network reliably.

Disclosure of Invention

The invention aims to provide a method and a system for detecting a community structure of a movie social network, which can effectively and reliably divide the community structure in the movie social network.

To solve the above technical problem, an embodiment of the present invention provides the following solutions:

In one aspect, a community structure detection method for a movie social network is provided, which includes the following steps:

S10, acquiring a user data set, constructing a movie social network structure from the follow relationships among users, taking the users' viewing data as user node attributes, and establishing an adjacency matrix and an attribute matrix based on the movie social network structure and the user node attributes; the viewing data includes: movie name, movie genre, starring actors, region;

S20, constructing a movie social network community structure detection model with a graph autoencoder, based on the established adjacency matrix and attribute matrix;

S30, designing a joint optimization objective function for the constructed movie social network community structure detection model, and training the model by minimizing the joint optimization objective function;

S40, detecting the community structure of the movie social network with the trained movie social network community structure detection model, and outputting the detection result of the community structure of the movie social network.

Preferably, the step S10 specifically includes the following steps:

S11, acquiring a user data set from the movie social platform and constructing a movie social network structure from the follow relationships among users: social network users are represented as user nodes in the network, the follow relationships among users are represented as edges between user nodes, and the users' viewing data are taken as the attributes of the user nodes; let the network be $G = (V, E, X)$, where $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of $N$ user nodes in the network, the $n$-th user being denoted as user node $v_n$, $1 \le n \le N$; $E = \{e_1, e_2, \ldots, e_M\}$ denotes the $M$ edges existing between user nodes, the $m$-th edge being denoted as $e_m$, $1 \le m \le M$; $X$ is the user node attribute matrix of dimension $N \times D$, whose $n$-th row $X_n = [x_{n1}, x_{n2}, \ldots, x_{nD}]$ holds the $D$ attributes of user node $v_n$, where element $x_{nd}$ is the $d$-th attribute of user node $v_n$, $1 \le d \le D$;

S12, constructing the $N \times N$ adjacency matrix of the network $G = (V, E, X)$, denoted $A$, in which the value of each element represents the adjacency relationship between the two corresponding user nodes; that is, the element $A_{ij}$ in the $i$-th row and $j$-th column of $A$ represents the relationship between the $i$-th user node $v_i$ and the $j$-th user node $v_j$, $1 \le i \le N$, $1 \le j \le N$: if there is an edge between $v_i$ and $v_j$, then $A_{ij} = 1$; otherwise $A_{ij} = 0$.
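Steps S11 and S12 can be illustrated with a small sketch. It is only one possible reading: the source does not say how viewing data are turned into numeric attributes or whether follow edges are directed, so the sketch assumes a multi-hot encoding over a fixed vocabulary and treats a follow in either direction as one undirected edge. The function name `build_matrices` and the toy data are hypothetical.

```python
def build_matrices(users, follows, viewing, vocabulary):
    """Build the N x N adjacency matrix A and the N x D attribute matrix X."""
    index = {user: n for n, user in enumerate(users)}
    n_users = len(users)
    # A[i][j] = 1 if there is an edge between v_i and v_j, otherwise 0.
    A = [[0] * n_users for _ in range(n_users)]
    for follower, followee in follows:
        i, j = index[follower], index[followee]
        if i != j:
            A[i][j] = 1
            A[j][i] = 1  # assumption: a follow in either direction is one undirected edge
    # Assumption: X[n][d] = 1 if vocabulary term d occurs in user n's viewing data.
    X = [[1 if term in viewing.get(user, set()) else 0 for term in vocabulary]
         for user in users]
    return A, X


# Toy data (hypothetical): three users, two follow relationships.
users = ["u1", "u2", "u3"]
follows = [("u1", "u2"), ("u3", "u2")]
viewing = {"u1": {"sci-fi", "US"}, "u2": {"sci-fi"}, "u3": {"drama", "CN"}}
vocabulary = ["sci-fi", "drama", "US", "CN"]
A, X = build_matrices(users, follows, viewing, vocabulary)
```

Here `A` has N = 3 rows and columns and `X` has D = 4 columns, one per vocabulary term.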

Preferably, the movie social network community structure detection model constructed in step S20 includes four parts, namely an encoder, a structure decoder, an attribute decoder, and a modularity optimizer; the step S20 specifically includes the following steps:

S21, the encoder encodes the movie social network $G = (V, E, X)$ into embedded vectors in a low-dimensional space; a 2-layer graph attention network whose layers have the same structure is used as the encoder, with $h_i^{(0)} = x_i$ as input, and the encoding process is formalized as follows:

$$h_i^{(1)} = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W^{(0)} h_j^{(0)}\Big) \qquad (1)$$

$$h_i^{(2)} = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W^{(1)} h_j^{(1)}\Big) \qquad (2)$$

$$Z = [Z_1, Z_2, \ldots, Z_N], \quad Z_t = h_t^{(2)} \qquad (3)$$

where $h_i^{(1)}$ and $h_i^{(2)}$ are the low-dimensional embedded vectors of user node $v_i$ obtained after the first and second graph attention layers, respectively; $\sigma$ is a non-linear activation function; $N_i$ denotes the neighbor nodes of user node $v_i$; $\alpha_{ij}$ is the normalized attention coefficient, defined by equation (4); $W^{(0)}$ and $W^{(1)}$ are the connection weight matrices of the first and second graph attention layers, respectively, which are parameters to be determined by training on the input movie social network in the subsequent steps; $Z$ is the set of encoded embedded vectors, and $Z_t$ denotes the embedded vector in the low-dimensional space obtained by encoding user node $v_t$; and

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{T}[W x_i \,\Vert\, W x_j])\big)}{\sum_{k \in N_i} \exp\big(\mathrm{LeakyReLU}(a^{T}[W x_i \,\Vert\, W x_k])\big)} \qquad (4)$$

where $\mathrm{LeakyReLU}(\cdot)$ is a non-linear activation function defined by equation (5); $a$ is a weight vector; $W$ is a weight matrix; $x_i$ denotes the attributes of user node $v_i$; and $\Vert$ is the concatenation operation;

$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x \ge 0 \\ \lambda x, & x < 0 \end{cases} \qquad (5)$$

where $\lambda$ is the slope for negative inputs, set to 0.2;
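A graph attention layer of the kind described in step S21 can be sketched in plain Python. This is a minimal illustration under assumptions, not the patented implementation: it follows the standard graph attention form (attention scores from LeakyReLU with slope 0.2, normalized over each node's neighbors), takes tanh as the unspecified non-linear activation, and lets a node without neighbors attend to itself. The helper names `matvec`, `attention_coefficients` and `gat_layer`, and all weight values, are hypothetical toy choices rather than trained parameters.

```python
import math


def leaky_relu(x, slope=0.2):
    # Negative inputs are scaled by the slope (0.2 in the source).
    return x if x > 0 else slope * x


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def attention_coefficients(i, neighbors, H, W, a):
    """Normalized attention alpha_ij over the neighbors j of node i."""
    wh_i = matvec(W, H[i])
    scores = []
    for j in neighbors:
        concat = wh_i + matvec(W, H[j])  # list concatenation: [W h_i || W h_j]
        scores.append(leaky_relu(sum(p * q for p, q in zip(a, concat))))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_layer(A, H, W, a, activation=math.tanh):
    """One attention layer: each node aggregates its projected neighbors."""
    n = len(A)
    out = []
    for i in range(n):
        neighbors = [j for j in range(n) if A[i][j] == 1]
        if not neighbors:  # assumption: an isolated node attends to itself
            neighbors = [i]
        alpha = attention_coefficients(i, neighbors, H, W, a)
        projected = [matvec(W, H[j]) for j in neighbors]
        agg = [sum(al * p[d] for al, p in zip(alpha, projected))
               for d in range(len(W))]
        out.append([activation(v) for v in agg])
    return out


# Toy path graph v0 - v1 - v2 with 2-dimensional attributes (hypothetical values).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0, a0 = [[0.5, -0.2], [0.1, 0.3]], [0.1, 0.2, 0.3, 0.4]
W1, a1 = [[0.3, 0.7], [-0.4, 0.2]], [0.2, -0.1, 0.4, 0.3]
H1 = gat_layer(A, X, W0, a0)   # first layer
Z = gat_layer(A, H1, W1, a1)   # second layer: the embedded vectors
```

In practice the weights would be learned by the joint training described later, typically with a tensor library rather than Python lists.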

S22, the structure decoder reconstructs the network relationships from the set of embedded vectors $Z$. The structure decoder is defined by equation (6), in which $\delta(\cdot)$ is a Dirac function. The cross-entropy function is used as the loss function for structure reconstruction, $L_a$, and is defined by equation (7);
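As an illustration of a structure decoder trained with a cross-entropy loss, the sketch below uses the decoder that is common in graph autoencoders: a sigmoid applied to the inner product of two embedded vectors. That choice is an assumption and may differ from the function used in equation (6); averaging the loss over all node pairs is likewise an assumption, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reconstruct_adjacency(Z):
    """Assumed decoder: A_hat[i][j] = sigmoid(z_i . z_j)."""
    return [[sigmoid(sum(p * q for p, q in zip(zi, zj))) for zj in Z] for zi in Z]


def structure_loss(A, A_hat, eps=1e-12):
    """Binary cross-entropy between A and A_hat, averaged over all pairs."""
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = min(max(A_hat[i][j], eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= A[i][j] * math.log(p) + (1 - A[i][j]) * math.log(1.0 - p)
    return total / (n * n)


# Toy embeddings (hypothetical): v0 and v1 are close, v2 is far from both.
Z = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
A_hat = reconstruct_adjacency(Z)
A_match = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]      # edge only between v0 and v1
A_mismatch = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]   # edges only to v2
loss_match = structure_loss(A_match, A_hat)
loss_mismatch = structure_loss(A_mismatch, A_hat)
```

The loss is lower when the edges agree with the embedding geometry, which is what drives connected users toward nearby embedded vectors.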

S23, the attribute decoder uses a 2-layer graph attention network symmetric to the encoder to reconstruct the attribute information of the user nodes; each layer reconstructs the attributes of a node from the representations of its neighboring user nodes. The decoding process is formalized by equations (8) to (10), whose terms are: the low-dimensional embedded vectors obtained after the first and second graph attention layers of the attribute decoder; the non-linear activation function $\sigma$; the neighbor nodes $N_i$ of user node $v_i$; the normalized attention coefficients; and the connection weight matrices of the first and second graph attention layers of the decoder. The output of the last layer of the attribute decoder is taken as the reconstructed attributes of user node $v_i$. The loss function for attribute reconstruction, $L_x$, is defined by equation (11);

S24, detecting the social network community structure with the modularity optimizer; the low-dimensional embedded vectors $Z$ of the nodes are classified with a softmax function to obtain the community allocation matrix $P$:

P＝softmax(Z) (12)

To make the detected communities internally more compact, the community structure is optimized with modularity. The modularity function is defined as the difference between the number of edges inside communities and the number of edges expected over all pairs of user nodes, expressed as:

$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) \qquad (13)$$

where $c_i$ denotes the community to which user node $v_i$ is assigned; $\delta(c_i, c_j)$ is 1 if $c_i = c_j$ and 0 otherwise; $\frac{k_i k_j}{2m}$ is the expected number of edges between user node $v_i$ and user node $v_j$; $k_i$ is the degree of user node $v_i$; and $m$ is the total number of edges in the social network;

the matrix form of the modularity can be expressed as:

$$Q = \frac{1}{2m} \mathrm{Tr}(P^{T} B P) \qquad (14)$$

where $P$ is the community allocation matrix and $B = [B_{ij}]$ is the modularity matrix, with $B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$.

To optimize equation (14), the modularity loss $L_{mod}$ is defined by equation (15), where $\mathrm{Tr}(\cdot)$ is the trace of a matrix and $\mathrm{Tr}(P^{T} P) = N$.
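The softmax allocation of equation (12) and the modularity of a given allocation can be sketched as follows. The sketch assumes the standard Newman form of modularity, with degree $k_i$, total edge count $m$ and expected edge count $k_i k_j / (2m)$, and evaluates $\mathrm{Tr}(P^{T} B P) / (2m)$ entry by entry. Applying softmax directly to $Z$ presupposes that the embedding dimension equals the number of communities. The function names and the toy graph are hypothetical.

```python
import math


def softmax_rows(Z):
    """Row-wise softmax: P[i][u] is the probability that node i is in community u."""
    P = []
    for row in Z:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        s = sum(exps)
        P.append([e / s for e in exps])
    return P


def modularity(A, P):
    """Tr(P^T B P) / (2m) with B[i][j] = A[i][j] - k_i * k_j / (2m)."""
    n = len(A)
    degree = [sum(row) for row in A]
    two_m = sum(degree)  # every undirected edge is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            b_ij = A[i][j] - degree[i] * degree[j] / two_m
            q += b_ij * sum(pi * pj for pi, pj in zip(P[i], P[j]))
    return q / two_m


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

P_good = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3   # one community per triangle
P_bad = [[1.0, 0.0], [0.0, 1.0]] * 3           # alternating assignment
q_good = modularity(A, P_good)
q_bad = modularity(A, P_bad)
P_soft = softmax_rows([[2.0, 0.0], [0.0, 2.0]])
```

The allocation that follows the two triangles scores higher than the alternating one, which is the behavior the modularity optimizer rewards.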

Preferably, the step S30 specifically includes the following steps:

S31, jointly training the four parts, namely the encoder, the structure decoder, the attribute decoder and the modularity optimizer, with the joint optimization objective function defined as in equation (16):

$$L = L_a + L_x - \beta L_{mod} \qquad (16)$$

where $L_a$ is the structure reconstruction loss, $L_x$ is the attribute reconstruction loss, $L_{mod}$ is the modularity loss, and $\beta$ is a hyper-parameter that weights the importance of the modularity loss;

S32, performing back propagation with a gradient method and updating the connection weight matrices in the movie social network community structure detection model.
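The sign convention of equation (16) and the weight update of step S32 can be made concrete with a short sketch. The modularity term enters with a minus sign, so minimizing $L$ pushes the modularity term up while pushing both reconstruction losses down. The update shown is plain gradient descent, which is an assumption: the source only says "a gradient method". The names `joint_loss` and `gradient_step`, the learning rate and all numbers are hypothetical.

```python
def joint_loss(l_a, l_x, l_mod, beta):
    """Equation (16): reconstruction losses minus the weighted modularity term."""
    return l_a + l_x - beta * l_mod


def gradient_step(W, grad, learning_rate):
    """Assumed update rule: W <- W - learning_rate * dL/dW, element by element."""
    return [[w - learning_rate * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]


# Toy numbers (hypothetical): a larger modularity term lowers the joint loss.
loss_low_mod = joint_loss(0.5, 0.2, 0.1, beta=0.5)
loss_high_mod = joint_loss(0.5, 0.2, 0.4, beta=0.5)

W = [[1.0, 2.0]]
grad = [[0.5, -1.0]]  # in practice obtained by back propagation
W_new = gradient_step(W, grad, learning_rate=0.1)
```

A larger $\beta$ gives more weight to community compactness relative to reconstruction quality.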

Preferably, the step S40 specifically includes the following steps:

S41, dividing users with similar interests in the movie social network into the same community; the community label $t$ of user node $v_i$ is obtained by equation (17):

$$t = \arg\max_{u} \, p_{iu} \qquad (17)$$

where $p_{iu}$ is an element of the community allocation matrix $P$ and represents the probability that user node $v_i$ belongs to community $u$;
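Reading a community label off the allocation matrix amounts to picking, for each user node, the community with the largest probability. A minimal sketch follows; the function name `community_labels` and the example matrix are hypothetical, and ties are resolved in favor of the lowest community index, which is a choice of this sketch rather than something the source specifies.

```python
def community_labels(P):
    """For each row of P, return the index u with the largest probability p_iu."""
    return [max(range(len(row)), key=row.__getitem__) for row in P]


# Toy allocation matrix (hypothetical): 3 user nodes, 3 communities.
P = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
]
labels = community_labels(P)
```

Users that receive the same label are placed in the same community.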

S42, sending the detection result of the movie social network community structure to the relevant analysts or researchers for related tasks, including user interest analysis, interest community analysis, prediction of user viewing behavior and diversified movie recommendation.

In another aspect, a community structure detection system for a movie social network is provided, comprising a movie social network structure construction and adjacency matrix and attribute matrix construction unit, a movie social network community structure detection model training unit and a movie social network community structure detection result output unit, wherein the movie social network structure construction and adjacency matrix and attribute matrix construction unit is connected with a computer processor and a memory;

the movie social network structure construction and adjacency matrix and attribute matrix construction unit is configured to execute step S10: acquiring a user data set, constructing a movie social network structure from the follow relationships among users, taking the users' viewing data as user node attributes, establishing an adjacency matrix and an attribute matrix based on the movie social network structure and the user node attributes, and loading the adjacency matrix and the attribute matrix into computer memory; the viewing data includes: movie name, movie genre, starring actors, region;

the movie social network community structure detection model training unit is configured to perform steps S20 and S30: constructing a movie social network community structure detection model with a graph autoencoder, based on the established adjacency matrix and attribute matrix; designing a joint optimization objective function for the constructed movie social network community structure detection model, and training the model by minimizing the joint optimization objective function;

the movie social network community structure detection result output unit is configured to execute step S40: detecting the community structure of the movie social network with the trained movie social network community structure detection model, and outputting the detection result of the community structure of the movie social network.

Preferably, the movie social network structure construction and adjacency matrix and attribute matrix construction unit is specifically configured to perform the following steps:

S11, acquiring a user data set from the movie social platform and constructing a movie social network structure from the follow relationships among users: social network users are represented as user nodes in the network, the follow relationships among users are represented as edges between user nodes, and the users' viewing data are taken as the attributes of the user nodes; let the network be $G = (V, E, X)$, where $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of $N$ user nodes in the network, the $n$-th user being denoted as user node $v_n$, $1 \le n \le N$; $E = \{e_1, e_2, \ldots, e_M\}$ denotes the $M$ edges existing between user nodes, the $m$-th edge being denoted as $e_m$, $1 \le m \le M$; $X$ is the user node attribute matrix of dimension $N \times D$, whose $n$-th row $X_n = [x_{n1}, x_{n2}, \ldots, x_{nD}]$ holds the $D$ attributes of user node $v_n$, where element $x_{nd}$ is the $d$-th attribute of user node $v_n$, $1 \le d \le D$;

S12, constructing the $N \times N$ adjacency matrix of the network $G = (V, E, X)$, denoted $A$, in which the value of each element represents the adjacency relationship between the two corresponding user nodes; that is, the element $A_{ij}$ in the $i$-th row and $j$-th column of $A$ represents the relationship between the $i$-th user node $v_i$ and the $j$-th user node $v_j$, $1 \le i \le N$, $1 \le j \le N$: if there is an edge between $v_i$ and $v_j$, then $A_{ij} = 1$; otherwise $A_{ij} = 0$.

Preferably, the constructed movie social network community structure detection model comprises four parts, namely an encoder, a structure decoder, an attribute decoder and a modularity optimizer, and the movie social network community structure detection model training unit is specifically configured to execute the following steps:

steps S21 to S24 as set out above for step S20, comprising encoding by equations (1) to (5), structure decoding by equations (6) and (7), attribute decoding by equations (8) to (11), and community allocation and modularity optimization by equations (12) to (15).

Preferably, the movie social network community structure detection model training unit is further configured to perform the following steps:

L＝L_a+L_x-βL_mod (16)

Preferably, the movie social network community structure detection result output unit is specifically configured to execute the following steps:

steps S41 and S42 as set out above for step S40, where $p_{iu}$ is an element of the community allocation matrix $P$ and represents the probability that user node $v_i$ belongs to community $u$;

The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:

(1) The community structure detection method and system for a movie social network provided by the invention construct the adjacency matrix and the attribute matrix of the movie social network, so that the association information between users is recorded and the attribute information of the users is also used effectively, which yields a more robust and more interpretable detection result for the community structure of the movie social network.

(2) The community structure detection method and system for a movie social network provided by the invention build the movie social network community structure detection model on a graph autoencoder structure, so that the model has a certain generative capacity and the detection of the community structure of the movie social network generalizes better.

Drawings

To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a community structure detection method for a social network of movies according to an embodiment of the present invention;

FIG. 2 is a block diagram of a movie social network community structure detection model provided by an embodiment of the present invention;

fig. 3 is a structural diagram of a community structure detection system of a movie social network according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The embodiment of the invention firstly provides a community structure detection method of a movie social network, as shown in fig. 1, the method comprises the following steps:

S10, acquiring a user data set, constructing a movie social network structure from the follow relationships among users, taking the users' viewing data (including movie names, movie genres, starring actors, regions and the like) as user node attributes, and establishing an adjacency matrix and an attribute matrix based on the movie social network structure and the user node attributes.

The method specifically comprises the following steps:

S20, constructing a movie social network community structure detection model with a graph autoencoder, based on the established adjacency matrix and attribute matrix.

The movie social network community structure detection model constructed in the step comprises four parts, namely an encoder, a structure decoder, an attribute decoder and a modularity optimizer, and specifically comprises the following steps as shown in fig. 2:

steps S21 to S24 as set out above for step S20, comprising encoding by equations (1) to (5), structure decoding by equations (6) and (7), attribute decoding by equations (8) to (11), and community allocation and modularity optimization by equations (12) to (15).

S30, designing a joint optimization objective function for the constructed film social network community structure detection model, and performing model training by minimizing the joint optimization objective function.

The method specifically comprises the following steps:

L＝L_a+L_x-βL_mod (16)

The method specifically comprises the following steps:

S41, dividing users with similar interests in the movie social network into the same community; the community label $t$ of user node $v_i$ is obtained by equation (17):

$$t = \arg\max_{u} \, p_{iu} \qquad (17)$$

To verify the effectiveness and advancement of the method, it is compared with several classical community detection methods: the information-theoretic Infomap method, the label propagation algorithm (LPA), the graph autoencoder (GAE) and the GCN-based unsupervised community discovery method (JGE-CD). The average accuracy and the average normalized mutual information over 20 runs are used as evaluation metrics. The comparison results are shown in Table 1:

TABLE 1 comparison of results

As can be seen from the results in the table, the method achieves better accuracy and normalized mutual information when detecting the community structure of the movie social network.
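Normalized mutual information, one of the two evaluation metrics named above, can be computed as in the sketch below. The source does not state which normalization it uses; the sketch assumes the arithmetic mean of the two label entropies as the denominator. The function name and the label vectors are hypothetical and are not the experimental data behind Table 1.

```python
import math
from collections import Counter


def normalized_mutual_information(true_labels, pred_labels):
    """NMI = 2 * I(T; P) / (H(T) + H(P)), using natural logarithms."""
    n = len(true_labels)
    count_t = Counter(true_labels)
    count_p = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mi = sum((c / n) * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    h_t = -sum((c / n) * math.log(c / n) for c in count_t.values())
    h_p = -sum((c / n) * math.log(c / n) for c in count_p.values())
    if h_t == 0.0 and h_p == 0.0:
        return 1.0  # both partitions put every node in a single community
    return 2.0 * mi / (h_t + h_p)  # assumption: arithmetic-mean normalization


# The same partition with community names swapped, and an unrelated partition.
nmi_same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
nmi_unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the score depends only on how nodes are grouped, it is unaffected by how the communities happen to be numbered.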

Correspondingly, an embodiment of the present invention further provides a community structure detection system for a movie social network, as shown in fig. 3, the system includes: the system comprises a movie social network structure construction and adjacency matrix and attribute matrix construction unit, a movie social network community structure detection model training unit and a movie social network community structure detection result output unit, wherein the movie social network structure construction and adjacency matrix and attribute matrix construction unit is connected with a computer processor and a memory;

Further, the movie social network structure construction and adjacency matrix and attribute matrix construction unit is specifically configured to perform the following steps:

Further, the constructed movie social network community structure detection model comprises four parts, namely an encoder, a structure decoder, an attribute decoder and a modularity optimizer, and the movie social network community structure detection model training unit is specifically used for executing the following steps:

S21, the encoder encodes the movie social network $G = (V, E, X)$ into embedded vectors in a low-dimensional space, using a 2-layer graph attention network whose layers have the same structure as the encoder, with $h_i^{(0)} = x_i$ as input; the encoding process is formalized by equations (1) to (5), in which $\lambda$, the slope for negative inputs of LeakyReLU, is 0.2;

steps S22 to S24 as set out above for step S20, comprising structure decoding by equations (6) and (7), attribute decoding by equations (8) to (11), and community allocation and modularity optimization by equations (12) to (15).

Further, the movie social network community structure detection model training unit is further configured to perform the following steps:

L＝L_a+L_x-βL_mod (16)

Further, the movie social network community structure detection result output unit is specifically configured to execute the following steps:

Compared with the prior art, the community structure detection method and system for a movie social network provided by the invention construct the adjacency matrix and the attribute matrix of the movie social network, so that the association information among users is recorded and the attribute information of the users is used effectively, which helps obtain a more robust and more interpretable detection result for the community structure of the movie social network. In addition, the invention builds the movie social network community structure detection model on a graph autoencoder structure, so that the model has a certain generative capacity and the detection of the community structure of the movie social network generalizes better.

The above describes only preferred embodiments of the invention and is not to be construed as limiting it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A community structure detection method of a movie social network is characterized by comprising the following steps:

2. The method for detecting community structure of social networking services of claim 1, wherein the step S10 specifically comprises the following steps:

3. The community structure detection method for social networks of movies as claimed in claim 2, wherein the community structure detection model for social networks of movies constructed in the step S20 comprises four parts, namely an encoder, a structure decoder, an attribute decoder, and a modularity optimizer; the step S20 specifically includes the following steps:

steps S21 to S24 as set out in the description, comprising encoding by equations (1) to (5), structure decoding by equations (6) and (7), attribute decoding by equations (8) to (11), and community allocation and modularity optimization by equations (12) to (15).

4. The method for detecting community structure of social networking services of claim 3, wherein the step S30 specifically comprises the following steps:

L＝L_a+L_x-βL_mod (16)

where $L_a$ is the structure reconstruction loss, $L_x$ is the attribute reconstruction loss, $L_{mod}$ is the modularity loss, and $\beta$ is a hyper-parameter that weights the importance of the modularity loss;

5. The method for detecting community structure of social networking services of claim 1, wherein the step S40 specifically comprises the following steps:

6. A community structure detection system of a movie social network is characterized by comprising a movie social network structure construction and adjacency matrix and attribute matrix construction unit, a movie social network community structure detection model training unit and a movie social network community structure detection result output unit, wherein the movie social network structure construction and adjacency matrix and attribute matrix construction unit is connected with a computer processor and a memory;

7. The community structure detection system of a social network of movies as claimed in claim 6, wherein the movie social network structure construction and adjacency matrix and attribute matrix construction unit is specifically configured to perform the following steps:

8. The community structure detection system of the movie social network, according to claim 7, wherein the constructed movie social network community structure detection model comprises four parts, namely an encoder, a structure decoder, an attribute decoder, and a modularity optimizer, and the movie social network community structure detection model training unit is specifically configured to perform the following steps:

steps S21 to S24 as set out in the description, comprising encoding by equations (1) to (5), structure decoding by equations (6) and (7), attribute decoding by equations (8) to (11), and community allocation and modularity optimization by equations (12) to (15).

9. The community structure detection system for social networking movie according to claim 8, wherein the community structure detection model training unit for social networking movie is further configured to perform the following steps:

L＝L_a+L_x-βL_mod (16)

10. The community structure detection system of a social network of movies according to claim 6, wherein the community structure detection result output unit of the social network of movies is specifically configured to execute the following steps: