CN113409157A

CN113409157A - Cross-social network user alignment method and device

Info

Publication number: CN113409157A
Application number: CN202110545701.3A
Authority: CN
Inventors: 蔡晓东; 王鑫岚
Original assignee: Guilin University of Electronic Technology
Current assignee: Guilin University of Electronic Technology
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2021-09-17
Anticipated expiration: 2041-05-19
Also published as: CN113409157B

Abstract

The invention provides a method and a device for aligning users across social networks. The method comprises the following steps: importing social network user data; constructing a training model for feature extraction; optimizing the training model according to the social network user data to obtain an optimized model; importing the social network user data to be tested; and aligning the social network user data to be tested through the optimized model to obtain a user alignment result. The method can extract distinguishing semantic features, reduce the sparsity of network structure information, and greatly improve the accuracy of user alignment across social networks.

Description

Cross-social network user alignment method and device

Technical Field

The invention mainly relates to the technical field of social network analysis, in particular to a cross-social network user alignment method and device.

Background

Social platforms with different functions greatly enrich people's lives, but a user's information is scattered across these platforms and cannot be integrated. Because users are the foundation of every social platform, each platform wants to retain its users exclusively and does not want them to leave; as a result, there is no information sharing mechanism between platforms, and user data becomes fragmented. Because of this fragmentation, a user who joins a new social platform loses their previous social data and must spend time rebuilding a social circle, which makes for a poor user experience. The platform is also disadvantaged: a new user arrives without previous social data, so the platform cannot learn the user's social relationships and preferences and cannot make effective personalized recommendations. User alignment across social networks matches the different accounts that belong to the same real-world person in multiple networks, which is of great significance to many aspects of research and application in the field of social networks.

Research on user alignment across social networks can be roughly divided into three categories: user alignment based on user attributes, user alignment based on network structure, and multi-factor user alignment combining attributes and network structure. User alignment based on network structure and user alignment based on attributes have both achieved good results in recent years. Since each approach has its own advantages, researchers have naturally attempted to combine them for user alignment. In the prior art, some methods realize user alignment by using the social network structure together with user profile attributes; some use the LHNE model for cross-network user alignment, which utilizes the network structure and user text information simultaneously; and some use a deep neural network to complete the user alignment task by simultaneously using the network structure and user location information. However, these methods do not extract distinguishing semantic features, and the sparsity of network structure information greatly affects user alignment accuracy.

Disclosure of Invention

The invention provides a cross-social-network user alignment method and device aiming at the defects of the prior art.

The technical scheme for solving the technical problems is as follows: a method of user alignment across social networks, comprising the steps of:

importing social network user data, constructing a training model for feature extraction, and optimizing the training model according to the social network user data to obtain an optimized model;

and importing the social network user data to be tested, and aligning the social network user data to be tested through the optimization model to obtain a user alignment result.

The invention has the following beneficial effects: a training model for feature extraction is constructed, the optimized model is obtained by optimizing the training model according to the social network user data, and the user alignment result is obtained by aligning the social network user data to be tested through the optimized model, so that distinguishing semantic features can be extracted, the sparsity of network structure information is reduced, and the accuracy of cross-social-network user alignment is greatly improved.

Drawings

FIG. 1 is a schematic flowchart of a cross-social-network user alignment method according to an embodiment of the present invention;

FIG. 2 is a block diagram of a device for aligning users across social networks according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

Fig. 1 is a schematic flowchart of a cross-social-network user alignment method according to an embodiment of the present invention.

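As shown in FIG. 1, a method for aligning users across social networks comprises the following steps:

importing social network user data, constructing a training model for feature extraction, and optimizing the training model according to the social network user data to obtain an optimized model;

and importing the social network user data to be tested, and aligning the social network user data to be tested through the optimized model to obtain a user alignment result.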

In the above embodiment, a training model for feature extraction is constructed, the optimized model is obtained by optimizing the training model according to the social network user data, and the user alignment result is obtained by aligning the social network user data to be tested through the optimized model, so that distinguishing semantic features can be extracted, the sparsity of network structure information is reduced, and the accuracy of cross-social-network user alignment is greatly improved.

Optionally, as an embodiment of the present invention, the social network user data includes a plurality of social network user sub-data carrying preset user numbers, and each of the social network user sub-data includes user attribute information, structural data, and a true value; the process of constructing the training model for feature extraction includes the following steps:

obtaining user attribute information from each piece of social network user subdata, and respectively extracting attribute features of each piece of user attribute information to obtain an attribute feature vector group corresponding to each preset user number;

obtaining structural data from each piece of social network user subdata, and respectively extracting structural features of each piece of structural data to obtain structural feature vectors corresponding to each preset user number;

obtaining the true value from each piece of social network user sub-data, and performing fusion loss calculation on all attribute feature vector groups, all true values and all structural feature vectors together to obtain a fusion loss function;

the process of optimizing the training model according to the social network user data to obtain an optimized model comprises the following steps:

and updating parameters of the training model according to the fusion loss function to obtain an optimized model.

Specifically, step S1: extract attribute features from the social network user data, which contains attribute information and structural information, to obtain the attribute feature vectors of the user nodes (i.e., the attribute feature vector group); step S2: extract structural features from the same social network user data to obtain the structural feature vectors of the user nodes; step S3: taking the outputs of step S1 and step S2 as inputs, fuse the attribute feature vectors (i.e., the attribute feature vector group) and the structural feature vectors, determine the user alignment result, compare it with the ground-truth value (i.e., the true value), and calculate the fusion loss.

In the above embodiment, the attribute features of each user attribute information are extracted to obtain the attribute feature vector group corresponding to each preset user number, the structural feature of each structural data is extracted to obtain the structural feature vector corresponding to each preset user number, and the fusion loss function is obtained by performing fusion loss calculation on all the attribute feature vector groups, all the true values, and all the structural feature vectors together.

Optionally, as an embodiment of the present invention, the user attribute information includes neighbor node information and a plurality of user attribute parameters, and the pieces of user attribute information correspond pairwise; the process of respectively extracting the attribute features of each piece of user attribute information to obtain the attribute feature vector group corresponding to the preset user number comprises the following steps:

respectively extracting word features of the user attribute parameters corresponding to the preset user numbers to obtain a plurality of word feature vectors corresponding to the user attribute parameters;

respectively carrying out information balance processing on each word feature vector to obtain a word balance vector corresponding to the word feature vector;

respectively carrying out local feature extraction on the word balance vectors through a TextCNN convolutional network to obtain a local feature vector group corresponding to the user attribute parameters;

evaluating each local feature vector group respectively to obtain semantic feature vectors corresponding to the user attribute parameters;

respectively fusing the plurality of semantic feature vectors corresponding to the preset user number through a first formula to obtain a fusion attribute feature vector corresponding to the preset user number, wherein the first formula is as follows:

where $z_{ik}$ is the semantic feature vector of the $k$-th attribute of preset user number $i$, $\gamma_k \in \mathbb{R}$ is the corresponding weight parameter to be learned, $v_i$ is the fusion attribute feature vector, and $M$ is the number of semantic feature vectors corresponding to the preset user number;

obtaining a preset user number adjacent to the preset user number according to the neighbor node information, and taking a fusion attribute feature vector corresponding to the adjacent preset user number as a neighbor attribute feature vector;

and respectively carrying out vector fusion on the fusion attribute feature vector corresponding to each pair of corresponding user attribute information and the plurality of neighbor attribute feature vectors corresponding to the respective fusion attribute feature vectors to obtain an attribute feature vector group corresponding to the preset user number.

It should be understood that the pairwise correspondence between pieces of user attribute information is known in the social network user data. For example, if the social network user data contains three pieces of user attribute information A, B and C, it is known that A corresponds to B and that B corresponds to C.

It should be understood that the TextCNN convolutional network is a convolutional neural network that extracts semantic features of different granularity using convolution kernels of different sizes.

It should be understood that the semantic feature vectors of different attributes are taken together as input, and the fused attribute features of the user nodes (i.e., the fused attribute feature vectors) are obtained through an attention mechanism.

It should be understood that all word embeddings of an attribute (i.e., the word balance vectors) pass through a TextCNN convolutional network with convolution kernels of different sizes, which captures local features and outputs the semantic information of the attribute at different levels of abstraction (i.e., the local feature vector group).
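The multi-granularity convolution described above can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the patented implementation: the helper names (`conv_feature_map`, `textcnn_local_features`), the ReLU activation, the toy kernel values and the choice of one filter per kernel width are all hypothetical, and the source does not say whether pooling follows the convolution.

```python
def relu(x):
    # ReLU is an assumed activation; the source does not name one.
    return x if x > 0.0 else 0.0


def conv_feature_map(vectors, kernel, bias=0.0):
    """Slide one kernel (a list of `width` weight rows) over the word sequence."""
    width = len(kernel)
    feature_map = []
    for start in range(len(vectors) - width + 1):
        total = bias
        for vec, weights in zip(vectors[start:start + width], kernel):
            total += sum(a * b for a, b in zip(vec, weights))
        feature_map.append(relu(total))
    return feature_map


def textcnn_local_features(vectors, kernels_by_width):
    """Return one list of feature maps per kernel width (per granularity)."""
    return {
        width: [conv_feature_map(vectors, kernel) for kernel in kernels]
        for width, kernels in kernels_by_width.items()
    }


# Toy input: four word balance vectors of dimension 2 (hypothetical values).
word_balance_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels_by_width = {
    1: [[[1.0, -1.0]]],              # one kernel spanning a single word
    2: [[[1.0, 0.0], [0.0, 1.0]]],   # one kernel spanning two adjacent words
}
local_features = textcnn_local_features(word_balance_vectors, kernels_by_width)
```

Kernels of width 1 respond to single words while kernels of width 2 respond to adjacent word pairs, which is the sense in which different kernel sizes capture features of different granularity.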

In the embodiment, the attribute features of each user attribute information are respectively extracted to obtain the attribute feature vector group corresponding to the preset user number, so that a basis is provided for subsequent data processing, the distinguishing semantic features can be extracted, the sparsity of network structure information is reduced, and the accuracy of user alignment across the social network is greatly improved.

Optionally, as an embodiment of the present invention, the process of respectively performing word feature extraction on the multiple user attribute parameters corresponding to the preset user number to obtain multiple word feature vectors corresponding to the user attribute parameters includes:

respectively carrying out word division on a plurality of user attribute parameters corresponding to the preset user numbers to obtain a plurality of word information corresponding to each user attribute parameter, and converting each word information into a word vector;

respectively carrying out character division on each word information to obtain a plurality of character information corresponding to the word information, and converting each character information into a character vector;

respectively extracting the characteristics of each character vector through a preset one-dimensional convolution layer to obtain a character characteristic vector corresponding to the character vector;

screening each character feature vector through a preset max-pooling layer to obtain a plurality of character screening vectors corresponding to the word information;

and respectively carrying out vector splicing on each word vector and the plurality of character screening vectors corresponding to the word vectors to obtain the word characteristic vectors corresponding to the word vectors.

It should be understood that an attribute (i.e., the user attribute parameter) is divided into a word list (i.e., the plurality of pieces of word information), and each word (i.e., the word information) is represented as a word embedding (i.e., the word vector). Each word is also divided into a character list (i.e., the plurality of pieces of character information), and each character (i.e., the character information) is represented as a character embedding (i.e., the character vector). All character embeddings of a word are passed through one-dimensional convolution and max pooling, and the word embedding is concatenated with the pooled character embeddings (i.e., the character screening vectors) to form the final embedding of the word (i.e., the word feature vector).
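The word-plus-character embedding described above can be sketched as follows. This is a minimal illustration under assumptions: the function names, the toy embedding values and the two example kernels are hypothetical, and each word is assumed to contain at least as many characters as the kernel is wide.

```python
def conv1d(char_vectors, kernel):
    """Valid 1-D convolution: one response per window of len(kernel) characters."""
    width = len(kernel)
    responses = []
    for start in range(len(char_vectors) - width + 1):
        total = 0.0
        for offset in range(width):
            for a, b in zip(char_vectors[start + offset], kernel[offset]):
                total += a * b
        responses.append(total)
    return responses


def char_screening_vector(char_vectors, kernels):
    """Max-pool each kernel's responses over all character positions."""
    return [max(conv1d(char_vectors, kernel)) for kernel in kernels]


def word_feature_vector(word_vector, char_vectors, kernels):
    """Concatenate the word embedding with the pooled character features."""
    return list(word_vector) + char_screening_vector(char_vectors, kernels)


# Toy word: a 2-dimensional word embedding and three 2-dimensional character embeddings.
word_vector = [0.5, -0.2]
char_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],   # hypothetical kernel of width 2
    [[0.5, 0.5], [0.5, 0.5]],   # a second hypothetical kernel of width 2
]
h = word_feature_vector(word_vector, char_vectors, kernels)
```

The resulting vector has the word embedding dimension plus one pooled value per kernel, so the character information extends the word embedding rather than replacing it.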

In the above embodiment, the word features of the plurality of user attribute parameters corresponding to the preset user numbers are respectively extracted to obtain the plurality of word feature vectors corresponding to the user attribute parameters, so that a data basis is provided for subsequently extracting the distinguishing semantic features, and the accuracy of user alignment across the social network is greatly improved.

Optionally, as an embodiment of the present invention, the step of performing information balancing processing on each word feature vector to obtain a word balance vector corresponding to the word feature vector includes:

respectively carrying out information balance processing on each word feature vector through a second formula to obtain a word balance vector corresponding to the word feature vector, wherein the second formula is as follows:

$z = t \odot g(W_H h + b_H) + (1 - t) \odot h$,

where $t = \sigma(W_T h + b_T)$,

and where $W_H$ and $W_T$ are square matrices, $b_H$ and $b_T$ are bias vectors, $g$ is the nonlinear function tanh, $h$ is the word feature vector, and $z$ is the word balance vector.

It should be understood that the second equation is a calculation process of the highway network.

It should be understood that the word embedding (i.e., the word feature vector) passes through the highway network, which balances its word information against its character information.
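The second formula can be traced with a small pure-Python sketch of a single highway layer. The function names and toy parameter values are hypothetical, and $\sigma$ is taken to be the sigmoid function, as is usual for the transform gate of a highway network; the passage itself does not define it.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def highway(h, W_H, b_H, W_T, b_T):
    """z = t * tanh(W_H h + b_H) + (1 - t) * h, with t = sigmoid(W_T h + b_T)."""
    t = [sigmoid(a + b) for a, b in zip(matvec(W_T, h), b_T)]
    candidate = [math.tanh(a + b) for a, b in zip(matvec(W_H, h), b_H)]
    return [ti * ci + (1.0 - ti) * hi for ti, ci, hi in zip(t, candidate, h)]


identity = [[1.0, 0.0], [0.0, 1.0]]
h = [0.3, -0.7]

# Strongly negative gate bias: t is close to 0, so the input is carried through.
z_carry = highway(h, identity, [0.0, 0.0], identity, [-50.0, -50.0])

# Strongly positive gate bias: t is close to 1, so the transformed value is used.
z_transform = highway(h, identity, [0.0, 0.0], identity, [50.0, 50.0])
```

The gate $t$ decides, per dimension, how much of the transformed vector and how much of the unchanged input reaches the output, which is how the layer can balance the word part and the character part of the concatenated embedding.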

In the embodiment, the information balance processing of each word feature vector is performed through the second formula, so that the word balance vector corresponding to the word feature vector is obtained, the distinguishing semantic features can be extracted, and the accuracy of user alignment across social networks is greatly improved.

Optionally, as an embodiment of the present invention, the process of respectively performing evaluation processing on each local feature vector group to obtain a semantic feature vector corresponding to the user attribute parameter includes:

evaluating each local feature vector group respectively through a third formula to obtain semantic feature vectors corresponding to the user attribute parameters, where the third formula is:

where $A_{ij} = \alpha(z_i, z_j)$ is the similarity matrix, with $z_j \in \mathbb{R}^d$; $\odot$ denotes element-wise multiplication; $[\,;\,]$ denotes row-wise concatenation of vectors; the weighted sum of the $z_j$ is the context-critical information; $W^{1T}, W^{2T}, W^{3T} \in \mathbb{R}^{2d \times d}$ and $b^1, b^2, b^3 \in \mathbb{R}^d$ are trainable parameters; $\sigma$ is the nonlinear function sigmoid; $z_i$ is a local feature vector group; and the output of the third formula is the semantic feature vector.

It should be understood that semantic information (i.e., the set of local feature vectors) is subject to a self-attention mechanism to evaluate the importance of each information, resulting in a semantic feature (i.e., the semantic feature vector) for that attribute.

Understandably, $\mathbb{R}^{3d}$ denotes dimension $1 \times 3d$, $\mathbb{R}^{2d \times d}$ denotes dimension $2d \times d$, $\mathbb{R}^{d}$ denotes dimension $1 \times d$, and $A_{ij}$ is the similarity matrix calculated by $\alpha(\cdot)$.

Understandably, the $z_j$ are weighted and summed with respect to $z_i$ to represent the information that is more important in context.
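The weighted summation of the $z_j$ with respect to $z_i$ can be sketched as below. This covers only the attention step: the similarity function $\alpha$ is replaced by a plain dot product and the weights are normalized with softmax, both of which are assumptions made for illustration, and the gated combination that uses $W^{1T}$, $W^{2T}$, $W^{3T}$ is left out. All names are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    # Stand-in similarity; the patent's alpha(.) may differ.
    return sum(x * y for x, y in zip(a, b))


def context_vectors(z, alpha=dot):
    """For each z_i, return the attention-weighted sum of all z_j."""
    contexts = []
    for z_i in z:
        weights = softmax([alpha(z_i, z_j) for z_j in z])
        dim = len(z_i)
        contexts.append(
            [sum(w * z_j[d] for w, z_j in zip(weights, z)) for d in range(dim)]
        )
    return contexts


local_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts = context_vectors(local_vectors)
```

Each context vector is a convex combination of the inputs, so vectors that are more similar to $z_i$ contribute more to its contextual summary.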

In the embodiment, the semantic feature vectors corresponding to the user attribute parameters are obtained by respectively evaluating and processing each local feature vector group through the third formula, so that the importance degree of each piece of information can be evaluated, and the accuracy of user alignment across social networks is greatly improved.

Optionally, as an embodiment of the present invention, the process of performing vector fusion on the fusion attribute feature vector corresponding to each of the two pairs of corresponding user attribute information and the plurality of neighbor attribute feature vectors corresponding to the respective fusion attribute feature vectors to obtain the attribute feature vector group corresponding to the preset user number includes:

performing vector fusion on the fusion attribute feature vector corresponding to each of the two pairs of corresponding user attribute information and the plurality of neighbor attribute feature vectors corresponding to the respective fusion attribute feature vectors respectively through a fourth formula to obtain a first attribute feature vector corresponding to the preset user number and a second attribute feature vector corresponding to the first attribute feature vector, where the fourth formula is:

where $e_{ji}$ is the attention coefficient, $v_i$ is the first fusion attribute feature vector, $u_i$ is the second fusion attribute feature vector, $v_j$ is the $j$-th neighbor attribute feature vector corresponding to the first fusion attribute feature vector, $u_j$ is the $j$-th neighbor attribute feature vector corresponding to the second fusion attribute feature vector, $a_{ji}$ is the normalization coefficient, $\sigma(\cdot)$ is a nonlinear function, the two outputs of the fourth formula are the first attribute feature vector and the second attribute feature vector, and $W^T$ and $b$ are model parameters to be learned;

and obtaining an attribute feature vector group corresponding to the preset user number according to each first attribute feature vector and the second attribute feature vector corresponding to the first attribute feature vector.

It should be understood that the fusion attribute features (i.e. the fusion attribute feature vectors) of the user pair to be predicted (i.e. the user attribute information corresponding to each pair) and the neighbor node pair thereof are used as inputs, the influence of different neighbors on the attribute features is fused through an attention mechanism, and the final attribute feature vector (i.e. the attribute feature vector group) of the user to be predicted is output.

Specifically, the calculation process of the attention mechanism is as follows:

$e_{ji} = g(v_j, u_j, v_i, u_i)$,

where the attention coefficient $e_{ji}$ represents the contribution of the neighbor pair $(v_j, u_j)$ to predicting whether the users to be predicted, $v_i$ (i.e., the first fusion attribute feature vector) and $u_i$ (i.e., the second fusion attribute feature vector), are the same person, with $g: \mathbb{R}^K \times \mathbb{R}^K \times \mathbb{R}^K \times \mathbb{R}^K \to \mathbb{R}$. $a_{ji}$ denotes the coefficient normalized over all neighbor pairs. It is used to compute a linear combination of the feature vectors of the user node to be predicted and of its neighbor nodes; a nonlinear function $\sigma(\cdot)$ is applied to this linear combination to obtain the final attribute feature vectors of the user pair to be predicted (i.e., the first attribute feature vector and the second attribute feature vector).

Considering the individual characteristics of each user in the neighbor pair, the similarity between two users in the neighbor pair, and the relationship between the neighbor pair and the focus pair, three specific attention mechanisms of individual attention, differential attention, and relationship attention are proposed:

$e_{ji} = W^T [v_j ; u_j] + b$,

$e_{ji} = W^T |v_j - u_j| + b$,

$e_{ji} = W^T \big|\, |v_j - v_i| - |u_j - u_i| \,\big| + b$,

where $W$ and $b$ are the model parameters to be learned. The model combines the three attention mechanisms into a single unified attention equation.

In the above embodiment, the fusion attribute feature vector corresponding to each of a pair of corresponding pieces of user attribute information is fused with the plurality of neighbor attribute feature vectors corresponding to that fusion attribute feature vector to obtain the attribute feature vector group corresponding to the preset user number, so that the influence of neighbor nodes is incorporated, distinguishing semantic features can be extracted, and the accuracy of user alignment across social networks is greatly improved.
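The three attention scores given in this embodiment (individual, differential and relationship attention) can be computed with a short sketch. The function names and toy vectors are hypothetical, and normalizing the scores with softmax to obtain $a_{ji}$ is an assumption; the unified equation and the final aggregation step are not shown.

```python
import math


def affine(weights, features, bias):
    """W^T x + b for a single scalar output."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def individual_attention(v_j, u_j, weights, bias):
    # e_ji = W^T [v_j ; u_j] + b
    return affine(weights, list(v_j) + list(u_j), bias)


def differential_attention(v_j, u_j, weights, bias):
    # e_ji = W^T |v_j - u_j| + b
    return affine(weights, [abs(a - b) for a, b in zip(v_j, u_j)], bias)


def relational_attention(v_j, u_j, v_i, u_i, weights, bias):
    # e_ji = W^T | |v_j - v_i| - |u_j - u_i| | + b
    features = [
        abs(abs(vj - vi) - abs(uj - ui))
        for vj, uj, vi, ui in zip(v_j, u_j, v_i, u_i)
    ]
    return affine(weights, features, bias)


def normalize(scores):
    """Softmax over all neighbor pairs (assumed form of the normalization)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Focus pair (v_i, u_i) and two neighbor pairs (v_j, u_j), toy values.
v_i, u_i = [0.0, 0.0], [1.0, 1.0]
neighbor_pairs = [([1.0, 2.0], [1.0, 2.0]), ([3.0, 0.0], [0.0, 1.0])]
scores = [
    relational_attention(v_j, u_j, v_i, u_i, [1.0, 1.0], 0.0)
    for v_j, u_j in neighbor_pairs
]
coefficients = normalize(scores)
```

Individual attention looks only at the neighbor pair itself, differential attention at how far apart the two neighbors are, and relationship attention at whether each neighbor stands in a similar relation to its own focus user.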

Optionally, as an embodiment of the present invention, the process of respectively performing structural feature extraction on each piece of structural data to obtain a structural feature vector corresponding to each preset user number includes:

respectively converting each piece of structural data by means of an adjacency matrix to obtain grid structure data corresponding to each preset user number;

respectively carrying out normalization processing on each grid structure data to obtain normalization structure data corresponding to each preset user number;

and respectively extracting the characteristics of the normalized structure data through a preset convolutional neural network to obtain a structure characteristic vector corresponding to each preset user number.

It should be understood that the adjacency matrix is a two-dimensional array that stores the relationships between nodes in the graph: if two nodes are connected, the corresponding entry is 1; otherwise it is 0.

Specifically, the irregular graph structure data (i.e., the structural data) in the matched social network data is converted into regular grid structure data by using the adjacency matrix; normalization processing such as sorting and zero padding is performed on the grid structure data; and the normalized graph (i.e., the normalized structure data) is taken as input, structural features are extracted through a convolutional neural network, and the structural feature vector of the user pair to be predicted is output.
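The conversion from graph data to a fixed-size grid can be sketched as follows. This is an illustration under assumptions: the graph is treated as undirected, nodes are sorted by descending degree (the source only says "sorting"), and the function names and target grid size are hypothetical.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for a, b in edges:
        matrix[a][b] = 1
        matrix[b][a] = 1
    return matrix


def normalize_grid(matrix, size):
    """Reorder nodes by descending degree, then truncate or zero-pad to size x size."""
    n = len(matrix)
    # Ties are broken by node index so that the ordering is deterministic.
    order = sorted(range(n), key=lambda i: (-sum(matrix[i]), i))[:size]
    grid = [[0] * size for _ in range(size)]
    for row, i in enumerate(order):
        for col, j in enumerate(order):
            grid[row][col] = matrix[i][j]
    return grid


# Toy graph: a path 0 - 1 - 2, padded to a 4 x 4 grid.
A = adjacency_matrix(3, [(0, 1), (1, 2)])
grid = normalize_grid(A, 4)
```

After this step every user's neighborhood has the same shape, which is what allows an ordinary convolutional neural network to be applied to it.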

In the above embodiment, each piece of structural data is converted by means of the adjacency matrix to obtain the grid structure data corresponding to each preset user number, each piece of grid structure data is normalized to obtain the normalized structure data corresponding to each preset user number, and features are extracted from each piece of normalized structure data through the preset convolutional neural network to obtain the structural feature vector corresponding to each preset user number, so that data support is provided for subsequent processing, distinguishing semantic features can be extracted, and the accuracy of user alignment across social networks is greatly improved.

Optionally, as an embodiment of the present invention, the process of performing fusion loss calculation on all attribute feature vector groups, all real values, and all structure feature vectors together to obtain a fusion loss function includes:

performing fusion loss calculation on all attribute feature vector groups, all real values and all structural feature vectors together by using a fifth formula to obtain a fusion loss function, wherein the fifth formula is as follows:

$\mathrm{Loss} = \mathrm{Loss}_{CE} + \lambda \, \mathrm{Loss}_{\cos}$,

where Loss is the fusion loss function, $\mathrm{Loss}_{CE}$ is the cross-entropy loss, $\mathrm{Loss}_{\cos}$ is the cosine loss, the first attribute feature vector and the second attribute feature vector are the members of an attribute feature vector group, $y$ is the matching score, $y_i$ is the true value, $n$ is the total number of attribute feature vector groups, and the formula further uses the difference between the attribute feature vectors and the structural feature vector $s_i$.

It should be understood that these quantities are used for predicting the matching score.

Specifically, the attribute feature vector group and the structural feature vector are concatenated as input to predict the matching score; a cosine loss function (i.e., the cosine loss) is constructed from the cosine similarity and fused with the cross-entropy loss by weighting to obtain the fusion loss function; and the predicted matching score is compared with the ground-truth value (i.e., the true value) to calculate the fusion loss.
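The weighted fusion $\mathrm{Loss} = \mathrm{Loss}_{CE} + \lambda\,\mathrm{Loss}_{\cos}$ can be sketched in pure Python. The exact forms of the two terms are not reproduced in this text, so the sketch assumes a mean binary cross-entropy over the predicted matching scores and a cosine term that penalizes low similarity for matching pairs and positive similarity for non-matching pairs. The function names, the default value of `lam` and the toy data are hypothetical, and the feature vectors are assumed to be nonzero.

```python
import math


def cross_entropy(scores, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted scores and 0/1 true values."""
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_loss(pairs, labels):
    """Assumed form: 1 - cos for matching pairs, max(0, cos) otherwise."""
    total = 0.0
    for (v, u), y in zip(pairs, labels):
        c = cosine(v, u)
        total += (1.0 - c) if y == 1 else max(0.0, c)
    return total / len(pairs)


def fusion_loss(scores, pairs, labels, lam=0.5):
    return cross_entropy(scores, labels) + lam * cosine_loss(pairs, labels)


# Two user pairs: the first is a true match, the second is not.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1, 0]
scores = [0.9, 0.1]
loss = fusion_loss(scores, pairs, labels)
```

In this toy case the cosine term is zero because the matching pair is identical and the non-matching pair is orthogonal, so the total equals the cross-entropy term alone.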

In the embodiment, the fusion loss function is obtained by performing fusion loss calculation on all attribute feature vector groups, all true values and all structural feature vectors together through the fifth formula, so that the loss function is optimized, the distinguishing semantic features can be extracted, and the accuracy of user alignment across the social network is greatly improved.

Optionally, as another embodiment of the present invention, as shown in fig. 2, an apparatus for aligning users across social networks includes:

the model optimization module is used for importing social network user data, constructing a training model for feature extraction, and optimizing the training model according to the social network user data to obtain an optimization model;

and the alignment result obtaining module is used for importing the social network user data to be tested, and performing alignment processing on the social network user data to be tested through the optimization model to obtain a user alignment result.

Optionally, another embodiment of the present invention provides a cross-social-network user alignment apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, which when executed by the processor, implements the cross-social-network user alignment method as described above. The device may be a computer or the like.

Optionally, another embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a cross-social network user alignment method as described above.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention essentially or partially contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for aligning users across social networks is characterized by comprising the following steps:

2. The cross-social-network user alignment method of claim 1, wherein the social network user data comprises a plurality of pieces of social network user sub-data carrying preset user numbers, each piece of social network user sub-data comprising user attribute information, structural data, and a true value; the process of constructing the training model for feature extraction includes the following steps:

3. The cross-social-network user alignment method according to claim 2, wherein the user attribute information includes neighbor node information and a plurality of user attribute parameters, the plurality of user attribute information corresponding pairwise; the process of respectively extracting the attribute features of each user attribute information to obtain the attribute feature vector group corresponding to the preset user number comprises the following steps:

4. The method of claim 3, wherein the process of extracting word features from the plurality of user attribute parameters corresponding to the preset user numbers to obtain a plurality of word feature vectors corresponding to the user attribute parameters comprises:

5. The method of claim 3, wherein the step of performing information balancing processing on each word feature vector to obtain a word balance vector corresponding to the word feature vector comprises:

$z = t \odot g(W_H h + b_H) + (1 - t) \odot h$,

where $t = \sigma(W_T h + b_T)$,

6. The method of claim 3, wherein the step of evaluating each local feature vector group to obtain the semantic feature vector corresponding to the user attribute parameter comprises:

where $A_{ij} = \alpha(z_i, z_j)$ is the similarity matrix, with $z_j \in \mathbb{R}^d$; $\odot$ denotes element-wise multiplication; $[\,;\,]$ denotes row-wise concatenation of vectors; and the output of the formula is the semantic feature vector.

7. The method according to claim 3, wherein the process of vector fusing the fusion attribute feature vector corresponding to each of the pairwise corresponding user attribute information and the plurality of neighbor attribute feature vectors corresponding to each of the fusion attribute feature vectors to obtain the attribute feature vector group corresponding to the preset user number comprises:

where one output of the formula is a first attribute feature vector,

8. The method of claim 2, wherein the step of extracting the structural features of the structural data to obtain the structural feature vector corresponding to the preset user number comprises:

9. The method of claim 2, wherein the step of performing fusion loss calculation on all attribute feature vector groups, all real values, and all structural feature vectors to obtain a fusion loss function comprises:

$\mathrm{Loss} = \mathrm{Loss}_{CE} + \lambda \, \mathrm{Loss}_{\cos}$,

where one input of the formula is a first attribute feature vector,

10. An apparatus for aligning users across social networks, comprising: