CN110210540B - Cross-social media user identity recognition method and system based on attention mechanism

Info

Publication number: CN110210540B
Application number: CN201910431115.9A
Authority: CN
Inventors: 崔思伟; 宋雪萌; 陈潇琳; 尹建华; 刘萌; 甘甜
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2019-05-22
Filing date: 2019-05-22
Publication date: 2021-02-26
Anticipated expiration: 2039-05-22
Also published as: CN110210540A

Abstract

The invention discloses a cross-social media user identity recognition method and system based on an attention mechanism. The method comprises the following steps: acquiring data of different modalities of a plurality of users on different social media as training data; for the data of each modality, adopting a different model to learn a latent representation of the data; and training a user identity recognition model by: calculating the similarity of data between users on different social media by combining the time sequence relationship with the confidence levels of the different modalities; mapping the similarity between users to a probability space using a multilayer perceptron to obtain the probability that the users on different social media point to the same user entity; and constructing an objective function using cross entropy and iteratively optimizing the model parameters. The trained model is then used to determine whether data of different modalities on different social media point to the same user. Because the method and system account for the differences in the information conveyed by data of different modalities, user identity recognition is more accurate.

Description

Cross-social media user identity recognition method and system based on attention mechanism

Technical Field

The invention relates to the technical field of user identity identification, in particular to a cross-social media user identity identification method and system based on an attention mechanism.

Background

As social media matures and user-generated data increasingly comes from multiple sources, the multi-source heterogeneous data of users reflects their attributes from different aspects and mirrors their daily lives from different angles. Integrating the behavior data that a user scatters across multiple social media makes it possible to model the user comprehensively, which in turn supports a deeper understanding of user behavior and analysis of user characteristics. Cross-social media user identity recognition is a prerequisite for this integration and has therefore attracted the attention of many researchers. However, the existing technology relies mainly on user profile information (user name, birthday, gender) and social network structure and ignores the richer user-generated data, so the resulting models have poor interpretability.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides a cross-social media user identity recognition method and system based on an attention mechanism, which takes into account the differences in the information conveyed by data of different modalities and achieves higher accuracy of user identity recognition.

In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:

a cross-social media user identity recognition method based on an attention mechanism comprises the following steps:

acquiring data of different modalities of a plurality of users on different social media as training data;

for the data of each modality, adopting a different model to learn a latent representation of the data, and training a user identity recognition model:

calculating the similarity of data between users on different social media by combining the time sequence relationship and the confidence degrees of different modal data;

mapping the similarity between the users to a probability space by using a multilayer perceptron to obtain the probability that the users point to the same user entity on different social media;

constructing an objective function by adopting cross entropy, and performing iterative optimization solution on the model parameters;

the trained model is used to determine whether data of different modalities on different social media point to the same user.

One or more embodiments provide an attention-based cross-social media user identification system, comprising:

the data acquisition module is used for capturing data of different modalities of the user on different social media;

the data representation module is used for learning a latent representation of the data, adopting a different model for the data of each modality;

the model training module, which includes: a user similarity calculation module used for calculating the similarity of data between users on different social media by combining the time sequence relationship with the confidence levels of the different modalities; and a probability calculation module used for mapping the similarity between users to a probability space using a multilayer perceptron to obtain the probability that the users on different social media point to the same user entity, constructing an objective function using cross entropy, and iteratively optimizing the model parameters;

and the user identity identification module receives data of different modalities on different social media and judges whether the data point to the same user.

An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the attention-based cross-social media user identification method when executing the program.

One or more embodiments provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor performs the cross-social media user identification method based on an attention mechanism.

The above one or more technical solutions have the following beneficial effects:

Unlike existing methods that identify the same user on different social media from user profile information, the present invention identifies the same user through the similarity of user-generated data (i.e., posted text and image information) on different social media. The method better addresses limitations such as inconsistent and unreliable user profile information, social network structures that are difficult to obtain, and excessively large data volumes. At the same time, the latent behavior characteristics of the user can be better analyzed from the user-generated data, so that cross-social media user matching is better realized.

The invention takes into account that users typically post similar or even identical content on different social media within the same time period. A time decay parameter is therefore introduced on top of the content similarity when learning the similarity between user-generated contents, which makes the calculation result more interpretable and allows user identity to be recognized more accurately.

User-generated data typically involves multiple modalities such as text, pictures, and video. Data of different modalities often have different confidence levels for cross-social media user identity recognition. The attention mechanism introduced by the invention assigns the confidence levels of the different modalities automatically, which addresses this difference in confidence and improves both the modeling performance of cross-social media user identity recognition and the interpretability of the model.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention. They illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.

FIG. 1 is a general flow diagram of a cross-social media user identification method based on an attention mechanism in one or more embodiments of the invention.

FIG. 2 is a flowchart of a method for calculating similarity between user-generated data on different social media according to one or more embodiments of the present invention.

Detailed Description

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.

Explanation of technical terms:

Attention mechanism: the attention mechanism is derived from the way humans observe their environment. When observing an environment, the brain usually focuses only on a few particularly important parts in order to acquire the needed information and build a description of the environment. Analogously, the attention mechanism learns the importance of different parts and generates corresponding weights, forming a more accurate data representation.

Example one

To achieve the above purpose, this embodiment realizes cross-social media user identity recognition by learning accurate representations of the user's multi-source heterogeneous data and combining them with the time sequence relationship within the user-generated data. Because different modalities in the user-generated data have different confidence levels for cross-social media user identity recognition, an attention mechanism is introduced to assign the confidence levels of the different modalities automatically, which improves both the modeling performance of cross-social media user identity recognition and the interpretability of the model. Specifically, this embodiment provides a cross-social media user identity recognition method based on attentive time-aware user modeling, which includes the following steps:

S1: capturing data of different modalities of a user on different social media and, for the data of each modality, adopting a different model to learn a latent representation of the data;

S2: performing user similarity modeling based on the time sequence relationship in the user-generated data;

S3: based on the representations of the multi-source heterogeneous data learned in S1 and the user similarity modeling of S2, capturing the different confidence levels of different modal data for cross-social media user identity recognition, thereby improving the generalization ability of the model.

The step S1 of forming the data representation further includes:

S11: Two different networks are used to model the text data of the two social media O₁ and O₂, respectively. For the text data in O₁, assume that the k-th piece of data posted by the i-th user in O₁ (abbreviated as t) contains M words, t = [x^1, x^2, …, x^M]. Each word x^z is mapped to a word vector e^z using GloVe (Global Vectors). The data is then modeled by a bidirectional long short-term memory network (BiLSTM), in which the forward hidden state of the z-th word is computed from e^z and the forward hidden state of the previous time step through an update gate u_z, a reset gate r_z and a memory cell state m_z, where σ(·) is the sigmoid function and W_u, W_r, W_m, b_u, b_r and b_m are the model parameters. The backward hidden state of the z-th word is obtained in the same way. The representation f^z of the z-th word is then obtained from its forward and backward hidden states, and the latent representation of the data t is finally obtained from the word representations f^z. In this way the latent representation of each piece of text data posted by the user in O₁ is obtained. Similarly, the latent representation of the text information in the g-th piece of data posted by the user in O₂ is obtained through another BiLSTM network.
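The gate formulas themselves are not reproduced in this text. As a hedged reading only, the named quantities (update gate u_z, reset gate r_z, memory cell state m_z, previous hidden state, sigmoid σ, parameters W_u, W_r, W_m, b_u, b_r, b_m) are consistent with a standard gated recurrence of the following form. The bracket notation for concatenation, the tanh activation, and the concatenation used for f^z are assumptions, not the patent's stated equations.

```latex
\begin{aligned}
u_z &= \sigma\!\left(W_u\,[\overrightarrow{h}_{z-1};\, e^z] + b_u\right) \\
r_z &= \sigma\!\left(W_r\,[\overrightarrow{h}_{z-1};\, e^z] + b_r\right) \\
m_z &= \tanh\!\left(W_m\,[r_z \odot \overrightarrow{h}_{z-1};\, e^z] + b_m\right) \\
\overrightarrow{h}_z &= (1 - u_z) \odot \overrightarrow{h}_{z-1} + u_z \odot m_z \\
f^z &= [\overrightarrow{h}_z;\, \overleftarrow{h}_z]
\end{aligned}
```

Under this reading, the backward state is computed by the same recurrence run from the last word to the first, and the representation of the whole post t would be some aggregate of the f^z (for example their mean), which is likewise an assumption.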

S12: Picture features are extracted using a pre-trained residual neural network (ResNet). The picture information in the g-th piece of data posted by the user in O₂ is first fed into the ResNet network and then passed through a fully connected network, where W_h and b_h are the parameters of the fully connected network, h denotes the ResNet network, and θ_r denotes the parameters of the network h.

The step S2 of modeling similarity further includes:

S21: Based on the latent representations of the data generated in S1, the similarity between the data of each modality in O₂ and the data in O₁ is calculated using cosine similarity, thereby performing user similarity modeling. This yields two similarities for each pair of data: the similarity between the text data in O₁ and the text information in O₂, and the similarity between the text data in O₁ and the picture information in O₂.
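The cosine step can be illustrated with a minimal sketch. The function names and the list-of-lists layout are hypothetical choices for illustration; the only element taken from the description is the use of cosine similarity between latent representations.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(o1_reprs, o2_reprs):
    """Row g, column k: similarity of the g-th O2 item to the k-th O1 item."""
    return [
        [cosine_similarity(g_vec, k_vec) for k_vec in o1_reprs]
        for g_vec in o2_reprs
    ]
```

Calling `similarity_matrix` once with the O₂ text representations and once with the O₂ picture representations would give the two per-modality similarity tables that the later steps operate on.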

S22: Within the same time period, users often post similar or even identical content on different social media. The time sequence relationship within the user-generated data is therefore combined with the user similarity modeling through a time decay parameter r_{k,g}, which is computed from p_k and q_g, the timestamps of the k-th piece of data in O₁ and the g-th piece of data in O₂, respectively. Correcting the similarities from S21 with the time decay parameter gives the text similarity and the picture similarity after the time sequence relationship is taken into account.
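The exact form of the time decay parameter is not reproduced in this text. The sketch below assumes an exponential decay in the absolute timestamp difference and a multiplicative correction; both choices, the `tau` scale (one day in seconds) and the function names are hypothetical and serve only to show how a decay term r_{k,g} can reweight a content similarity.

```python
import math


def time_decay(p_k, q_g, tau=86400.0):
    """Assumed decay r_{k,g} in (0, 1]; equals 1 when the timestamps coincide.

    p_k and q_g are timestamps in seconds; tau is a hypothetical time scale.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return math.exp(-abs(p_k - q_g) / tau)


def time_aware_similarity(sim, p_k, q_g, tau=86400.0):
    """Content similarity corrected by the assumed time decay."""
    return time_decay(p_k, q_g, tau) * sim
```

With this form, two posts made an hour apart keep most of their content similarity, while posts made weeks apart contribute almost nothing, which matches the stated intuition that the same user tends to post similar content on both platforms within the same time period.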

The step S3 of constructing the user identification model further includes:

S31: Based on the time-corrected similarities generated in S2, the global visual similarity distribution and the global text similarity distribution between the g-th piece of data of the user in O₂ and the user in O₁ can be obtained. Considering that data of different modalities often have different confidence levels for cross-social media user identity recognition, an attention mechanism is introduced to assign the confidence levels of the different modalities automatically, as follows:

[α_v, α_c] = softmax(a^T con(h_v, h_c))

where W_v, W_c, b_v and b_c are the model parameters, con(·) denotes the concatenation operation, and α_v and α_c denote the confidence levels that the model assigns to the visual and text modalities, respectively. a represents the question "which of the visual and text modalities conveys more information, given this user's post?". The similarity between the g-th piece of data of the user in O₂ and the user in O₁ is then obtained by weighting the two modalities with α_v and α_c. In this way the similarity between each piece of data of the user in O₂ and the user in O₁ is obtained.
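A minimal sketch of the attention step follows. The softmax over the two modality scores is taken from the formula in the text; everything else is an assumption made for illustration: that h_v and h_c are tanh-activated affine maps of the two similarity distributions (using W_v, b_v, W_c, b_c), that a scores each of them by a dot product, and that the fused similarity is the α-weighted sum of the two distributions.

```python
import math


def affine_tanh(W, b, x):
    """tanh(W x + b) for a matrix W given as a list of rows (assumed form of h)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(s_v, s_c, W_v, b_v, W_c, b_c, a):
    """Return (alpha_v, alpha_c, fused) for visual and text similarity vectors."""
    h_v = affine_tanh(W_v, b_v, s_v)
    h_c = affine_tanh(W_c, b_c, s_c)
    # One scalar score per modality, then a softmax across the two modalities.
    score_v = sum(ai * hi for ai, hi in zip(a, h_v))
    score_c = sum(ai * hi for ai, hi in zip(a, h_c))
    alpha_v, alpha_c = softmax([score_v, score_c])
    # Assumed fusion: confidence-weighted sum of the two distributions.
    fused = [alpha_v * v + alpha_c * c for v, c in zip(s_v, s_c)]
    return alpha_v, alpha_c, fused
```

Because the weights come from a softmax, they are positive and sum to one, so they can be read directly as the relative confidence placed on the picture and text evidence for a given post.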

S32: Based on the global similarities obtained in S31, the similarities of the G pieces of data are integrated to obtain the similarity d = [d₁, d₂, …, d_G] between the user in O₂ and the user in O₁, where each d_g denotes the average pooling of the corresponding similarity from S31. In this embodiment, the similarity between users is mapped to a probability space using a multilayer perceptron, whose output represents the probability that the user in O₂ and the user in O₁ point to the same user entity; w and b are model parameters. Finally, the objective function of the model is obtained. The objective function is implemented with cross entropy and measures the difference between the prediction given by the model and the true label value, where y_i is the label indicating whether the user in O₂ and the user in O₁ point to the same user entity.
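The pooling, probability mapping and objective can be sketched as follows. Since the text names only the parameters w and b, the sketch assumes a single sigmoid layer; a deeper perceptron would follow the same pattern. Function names and the averaging of the loss over user pairs are illustrative assumptions.

```python
import math


def average_pool(values):
    """Mean of a non-empty list of similarity values."""
    return sum(values) / len(values)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def match_probability(d, w, b):
    """Assumed single-layer mapping: sigmoid(w . d + b)."""
    return sigmoid(sum(wi * di for wi, di in zip(w, d)) + b)


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy between labels (0 or 1) and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

Here `d` is the length-G vector of pooled similarities for one candidate user pair, and the label is 1 when the two accounts belong to the same user entity and 0 otherwise.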

S33: Iterative training is performed until the model converges. The parameters are initialized with random values drawn from a normal distribution and are learned automatically by back-propagation over multiple rounds of training, so that the objective function gradually decreases. Training continues until the objective function stabilizes; the parameters of the user identity recognition model are then saved, and the result of cross-social media user identity recognition can be output.
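The training procedure can be illustrated on the final probability layer alone. This is a toy sketch under stated assumptions: only w and b are trained (the full model would also back-propagate into the encoders and attention parameters), plain batch gradient descent stands in for the unspecified optimizer, and the learning rate, tolerance, seed and data are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(d, w, b):
    """Probability that a user pair with similarity vector d is a match."""
    return sigmoid(sum(wi * di for wi, di in zip(w, d)) + b)


def mean_cross_entropy(data, w, b, eps=1e-12):
    """Mean binary cross entropy over (similarity vector, label) pairs."""
    total = 0.0
    for d, y in data:
        p = min(max(predict(d, w, b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(data, dim, lr=0.5, max_epochs=500, tol=1e-6, seed=0):
    """Gradient descent from a normal random start until the loss stabilizes."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    b = rng.gauss(0.0, 0.1)
    history = [mean_cross_entropy(data, w, b)]
    n = len(data)
    for _ in range(max_epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for d, y in data:
            err = predict(d, w, b) - y  # derivative of the loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * d[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
        history.append(mean_cross_entropy(data, w, b))
        if abs(history[-2] - history[-1]) < tol:
            break  # objective has stabilized
    return w, b, history
```

For example, `train([([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)], dim=2)` returns parameters that can then be stored and reused with `predict` on unseen user pairs.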

Example two

The embodiment aims to provide a cross-social media user identification system.

In order to achieve the above object, the present embodiment provides an attention-based cross-social media user identification system, including:

Example three

The embodiment aims at providing an electronic device.

In order to achieve the above object, this embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements, when executing the program, the following:

Example four

An object of the present embodiment is to provide a computer-readable storage medium.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs the steps of:

The steps involved in the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.

One or more of the above embodiments have the following technical effects:

Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. Although the embodiments of the present invention have been described with reference to the accompanying drawings, this description does not limit the scope of the present invention, and various modifications and variations that those skilled in the art can make without inventive effort on the basis of the technical solution of the present invention remain within that scope.

Claims

1. A cross-social media user identity recognition method based on an attention mechanism is characterized by comprising the following steps:

calculating the similarity of data between users on different social media by combining the time sequence relationship and the confidence degrees of different modal data; calculating the similarity of data between users on different social media comprises:

calculating the similarity of different modal data among users on different social media;

calculating time attenuation parameters between user data on different social media by combining the time sequence relation, and correcting the similarity;

distributing confidence coefficients for the data of different modes based on an attention mechanism, and weighting the similarity between the data of different modes to obtain comprehensive similarity;

2. The cross-social media user identity recognition method based on an attention mechanism according to claim 1, wherein the data of different modalities are text data and image data; for text data, a latent representation of the data is learned through a bidirectional long short-term memory network; for image data, a residual neural network and a fully connected network are employed to learn a latent representation of the data.

3. The cross-social media user identity recognition method based on an attention mechanism according to claim 1, wherein the k-th piece of data posted by a user in social media O₁, the text information in the g-th piece of data posted by a user in social media O₂, and the picture information in that g-th piece of data are each represented by a latent representation obtained through learning; and the similarity of data of different modalities between users on different social media is calculated from these latent representations, yielding the similarity between the data in O₁ and the text information in O₂ and the similarity between the data in O₁ and the picture information in O₂, where i denotes the i-th user.

4. The cross-social media user identity recognition method based on an attention mechanism according to claim 3, wherein the time decay parameter is calculated from the timestamps of the k-th piece of data in O₁ and the g-th piece of data in O₂; and

the similarities are corrected according to the time decay parameter to obtain the text similarity and the picture similarity after the time sequence relationship is taken into account.

5. The cross-social media user identity recognition method based on an attention mechanism according to claim 4, wherein the confidence levels of the different modalities are calculated as:

[α_v, α_c] = softmax(a^T con(h_v, h_c)),

where α_v and α_c respectively denote the confidence levels that the model assigns to the visual and text modalities; the global visual similarity distribution and the global text similarity distribution are taken between the g-th piece of data of the user in O₂ and the user in O₁; W_v, W_c, b_v and b_c are model parameters; and con(·) denotes the concatenation operation; and

the similarity between each piece of data of the user in O₂ and the user in O₁ is obtained by weighting the similarities of the two modalities with these confidence levels.

6. The cross-social media user identity recognition method based on an attention mechanism according to claim 5, wherein

the similarities between all G pieces of data posted by the user in O₂ over a period of time and the user in O₁ are integrated into d = [d₁, d₂, …, d_G];

the similarity between users is mapped to a probability space using a multilayer perceptron, whose output represents the probability that the two users point to the same user entity, w and b being model parameters; and

an objective function is constructed, in which y_i is the label indicating whether the two users point to the same user entity.

7. A cross-social media user identification system based on an attention mechanism, comprising:

the model training module and the user similarity calculation module are used for calculating the similarity of data between users on different social media by combining the time sequence relation and the confidence degrees of different modal data; calculating the similarity of data between users on different social media comprises:

and distributing confidence degrees for the data of different modes based on the attention mechanism, and weighting the similarity degrees between the data of different modes to obtain the comprehensive similarity degree.

The probability calculation module is used for mapping the similarity between the users to a probability space by using a multilayer perceptron to obtain the probability that the users point to the same user entity on different social media; constructing an objective function by adopting cross entropy, and performing iterative optimization solution on the model parameters;

8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the attention-based cross-social media user identification method of any one of claims 1-6.

9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs a cross-social media user identification method based on an attention mechanism according to any one of claims 1 to 6.