CN110210540B - Cross-social media user identity recognition method and system based on attention mechanism

Publication number
CN110210540B
Authority
CN
China
Prior art keywords
data
different
social media
user
similarity
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910431115.9A
Other languages
Chinese (zh)
Other versions
CN110210540A (en
Inventor
崔思伟
宋雪萌
陈潇琳
尹建华
刘萌
甘甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cross-social media user identity recognition method and system based on an attention mechanism. The method comprises the following steps: acquiring data of different modalities posted by a plurality of users on different social media as training data; for the data of each modality, learning a latent representation of the data with a modality-specific model, and training a user identity recognition model by: calculating the similarity between the data of users on different social media by combining the temporal relationship with the confidence of each modality; mapping the similarity between users to a probability space with a multilayer perceptron to obtain the probability that users on different social media refer to the same user entity; and constructing an objective function using cross entropy and solving for the model parameters by iterative optimization. The trained model is used to determine whether data of different modalities on different social media belong to the same user. Because the method and system account for the differences in the information conveyed by data of different modalities, user identity recognition is more accurate.

Description

Cross-social media user identity recognition method and system based on attention mechanism
Technical Field
The invention relates to the technical field of user identity identification, in particular to a cross-social media user identity identification method and system based on an attention mechanism.
Background
As social media matures and users generate increasingly multi-source data, a user's multi-source heterogeneous data reflects the user's attributes from different aspects and mirrors the user's daily life from different angles. Integrating a user's behavior data scattered across multiple social media makes comprehensive user modeling possible, which supports a deeper understanding of user behavior and analysis of user characteristics. User identity recognition across social media is a prerequisite for this integration and has therefore attracted the attention of many researchers. However, existing techniques rely mainly on user profile information (user name, birthday, gender) and social network structure and ignore the richer user-generated data, so the resulting models have poor interpretability.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a cross-social media user identity recognition method and system based on an attention mechanism that accounts for the differences in the information conveyed by data of different modalities and achieves higher user identity recognition accuracy.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
A cross-social media user identity recognition method based on an attention mechanism comprises the following steps:
acquiring data of different modalities posted by a plurality of users on different social media as training data;
for the data of each modality, learning a latent representation of the data with a modality-specific model, and training a user identity recognition model as follows:
calculating the similarity between the data of users on different social media by combining the temporal relationship with the confidence of each modality;
mapping the similarity between users to a probability space with a multilayer perceptron to obtain the probability that users on different social media refer to the same user entity;
constructing an objective function using cross entropy and solving for the model parameters by iterative optimization;
using the trained model to determine whether data of different modalities on different social media belong to the same user.
One or more embodiments provide an attention-based cross-social media user identification system, comprising:
a data acquisition module, configured to capture data of different modalities posted by users on different social media;
a data representation module, configured to learn a latent representation of the data of each modality with a modality-specific model;
a model training module, comprising a user similarity calculation module, configured to calculate the similarity between the data of users on different social media by combining the temporal relationship with the confidence of each modality, and a probability calculation module, configured to map the similarity between users to a probability space with a multilayer perceptron to obtain the probability that users on different social media refer to the same user entity; the model training module constructs an objective function using cross entropy and solves for the model parameters by iterative optimization;
and a user identity recognition module, configured to receive data of different modalities on different social media and determine whether the data belong to the same user.
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the attention-based cross-social media user identification method when executing the program.
One or more embodiments provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor performs the cross-social media user identification method based on an attention mechanism.
The above one or more technical solutions have the following beneficial effects:
Unlike existing methods, which use user profile information to decide whether accounts on different social media belong to the same user, the present invention identifies the same user through the similarity of user-generated data (i.e., posted text and images) on different social media. This better addresses limitations such as inconsistent and unreliable user profile information, social network structures that are difficult to obtain, and excessive data volume. At the same time, user-generated data allows the latent behavioral characteristics of users to be analyzed better, which improves cross-social media user matching.
The invention takes into account that users typically post similar or even identical content on different social media within the same time period. A time decay parameter is therefore introduced on top of content similarity when learning the similarity between user-generated content, which makes the calculation result more interpretable and the subsequent user identity recognition more accurate.
User-generated data typically involves multiple modalities, such as text, pictures and video, and data of different modalities often have different confidence levels for cross-social media user identity recognition. The attention mechanism introduced by the invention assigns the confidence of each modality automatically, which addresses this difference and improves both the modeling performance of cross-social media user identity recognition and the interpretability of the model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of the invention. The exemplary embodiments of the invention and their description serve to explain the invention and do not limit it.
FIG. 1 is a general flow diagram of a cross-social media user identification method based on an attention mechanism in one or more embodiments of the invention.
FIG. 2 is a flowchart of a method for calculating similarity between user-generated data on different social media according to one or more embodiments of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Explanation of technical terms:
Attention mechanism: the attention mechanism is derived from the way humans observe their environment. When observing an environment, the brain usually focuses only on a few particularly important parts to acquire the information it needs and build a description of the environment. Analogously, an attention mechanism learns the importance of different parts of the data and generates corresponding weights, producing a more accurate data representation.
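The weighting idea described above can be illustrated with a minimal sketch. It assumes that importance scores are already available and that they are turned into weights with a softmax; the function names `softmax` and `attend` and the example numbers are hypothetical and are not taken from the patent.

```python
import math

def softmax(scores):
    """Turn raw importance scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(parts, scores):
    """Weighted sum of equally sized vectors `parts` using softmax(scores)."""
    weights = softmax(scores)
    dim = len(parts[0])
    return [sum(w * p[j] for w, p in zip(weights, parts)) for j in range(dim)]

# Three "parts" of the input; the first is scored as most important.
weights = softmax([2.0, 0.5, 0.5])
summary = attend([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], [2.0, 0.5, 0.5])
```

The part with the highest score dominates `summary`, which is the sense in which the learned weights focus the representation on the most informative parts.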
Example one
To achieve the above purpose, this embodiment realizes cross-social media user identity recognition by learning accurate representations of a user's multi-source heterogeneous data and combining them with the temporal relationship within the user-generated data. Because different modalities in user-generated data have different confidence levels for cross-social media user identity recognition, an attention mechanism is introduced to assign the confidence of each modality automatically, which improves both the modeling performance and the interpretability of the model. Specifically, this embodiment provides a cross-social media user identity recognition method based on attentive time-aware user modeling, which includes the following steps:
S1: capturing data of different modalities posted by users on different social media and, for the data of each modality, learning a latent representation with a modality-specific model;
S2: modeling user similarity based on the temporal relationship within the user-generated data;
S3: based on the representations of multi-source heterogeneous data learned in S1 and the user similarity modeling of S2, capturing the different confidence levels of different modalities for cross-social media user identity recognition, which improves the generalization ability of the model.
The step S1 of forming the data representation further includes:
S11: Different networks are used to model the two social media, denoted $O_1$ and $O_2$, respectively. For text data in $O_1$, assume that the $k$-th post [Figure GDA0002742381370000052] (abbreviated as $t$) published by user [Figure GDA0002742381370000051] in $O_1$ contains $M$ words, $t = \{x_1, x_2, \ldots, x_M\}$. Each word $x_z$ is mapped to a word vector $e_z$ using Global Vectors (GloVe). The data is modeled with a bidirectional long short-term memory network (BiLSTM), in which the forward hidden state of the $z$-th word, [Figure GDA0002742381370000061], is expressed as:
[Figure GDA0002742381370000062]
[Figure GDA0002742381370000063]
[Figure GDA0002742381370000064]
[Figure GDA0002742381370000065]
where $u_z$ and $r_z$ are the update gate and the reset gate respectively, $i$ denotes the $i$-th user, $m_z$ is the memory cell state, [Figure GDA0002742381370000066] is the hidden state at the previous time step, and $\sigma(x)$ is the sigmoid function. $W_u$, $W_r$, $W_m$, $b_u$, $b_r$ and $b_m$ are model parameters. The backward hidden state of the $z$-th word, [Figure GDA0002742381370000067], is obtained in the same way. The representation $f_z$ of the $z$-th word is then:
[Figure GDA0002742381370000068]
Finally, the latent representation of the data $t$ is obtained as follows:
[Figure GDA0002742381370000069]
This gives the latent representation [Figure GDA00027423813700000611] of [Figure GDA00027423813700000610]. Similarly, a second BiLSTM network yields the latent representation [Figure GDA00027423813700000614] of the text information [Figure GDA00027423813700000613] in the $g$-th post published by user [Figure GDA00027423813700000612] in $O_2$.
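The gate formulas of S11 are only available as images here, so the following is a sketch under explicit assumptions rather than the patent's exact equations. It assumes the standard gated recurrent form that matches the named quantities (update gate $u_z$, reset gate $r_z$, candidate memory state $m_z$, parameters $W_u$, $W_r$, $W_m$, $b_u$, $b_r$, $b_m$), and it assumes that the word representation $f_z$ concatenates the forward and backward hidden states. The function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, b):
    """Compute W @ x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def gated_step(e_z, h_prev, W_u, W_r, W_m, b_u, b_r, b_m):
    """One forward step: new hidden state for word vector e_z (assumed gate form)."""
    x = e_z + h_prev                                   # concatenation [e_z; h_prev]
    u = [sigmoid(v) for v in affine(W_u, x, b_u)]      # update gate u_z
    r = [sigmoid(v) for v in affine(W_r, x, b_r)]      # reset gate r_z
    x_reset = e_z + [ri * hi for ri, hi in zip(r, h_prev)]
    m = [math.tanh(v) for v in affine(W_m, x_reset, b_m)]  # candidate state m_z
    # Interpolate between the previous state and the candidate state.
    return [(1.0 - ui) * hi + ui * mi for ui, hi, mi in zip(u, h_prev, m)]

def encode(word_vectors, hidden, params):
    """Run the step over a word sequence and return every hidden state."""
    h = [0.0] * hidden
    states = []
    for e_z in word_vectors:
        h = gated_step(e_z, h, *params)
        states.append(h)
    return states

def bidirectional(word_vectors, hidden, fwd_params, bwd_params):
    """Concatenate forward and backward states per word (assumed form of f_z)."""
    fwd = encode(word_vectors, hidden, fwd_params)
    bwd = encode(word_vectors[::-1], hidden, bwd_params)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]
```

Here `params` is the tuple `(W_u, W_r, W_m, b_u, b_r, b_m)`, with each weight matrix having one row per hidden unit and one column per entry of the concatenated input. In practice the word vectors would come from pre-trained GloVe embeddings and the parameters would be learned rather than set by hand.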
S12: Picture features are extracted with a pre-trained residual neural network (ResNet). The picture information [Figure GDA00027423813700000616] in the $g$-th post published by user [Figure GDA00027423813700000615] in $O_2$ is first fed to the ResNet network and then passed through a fully connected network, as follows:
[Figure GDA00027423813700000617]
where $W_h$ and $b_h$ are the parameters of the fully connected network, $h$ denotes the ResNet network, and $\theta_r$ denotes the parameters of the network $h$.
The step S2 of modeling similarity further includes:
s21: data potential representation generated based on S1
Figure GDA0002742381370000071
And
Figure GDA0002742381370000072
(respectively omitted as
Figure GDA0002742381370000073
) Separately, O was calculated using the cosine method2Data of different modalities and O1Similarity of data, thereby performing user similarity modeling:
Figure GDA0002742381370000074
Figure GDA0002742381370000075
Figure GDA0002742381370000076
wherein the content of the first and second substances,
Figure GDA0002742381370000077
and
Figure GDA0002742381370000078
are respectively indicated
Figure GDA0002742381370000079
And
Figure GDA00027423813700000710
and
Figure GDA00027423813700000711
the similarity between them.
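The cosine method named in S21 can be sketched as follows. The vectors stand in for the latent representations of individual posts; the function names and the convention of returning 0 for a zero vector are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

def similarity_matrix(posts_o1, posts_o2):
    """Entry [k][g] compares the k-th post on O1 with the g-th post on O2."""
    return [[cosine_similarity(p, q) for q in posts_o2] for p in posts_o1]
```

One such matrix would be computed per modality, for example one comparing the text of each $O_1$ post with the text of each $O_2$ post and another comparing it with the pictures.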
S22: Within the same time period, users often post similar or even identical content on different social media. The temporal relationship within the user-generated data is therefore combined with the user similarity modeling, as follows:
[Figure GDA00027423813700000712]
[Figure GDA00027423813700000713]
[Figure GDA00027423813700000714]
where $r_{k,g}$ denotes the time decay parameter, $p_k$ and $q_g$ are the timestamps of [Figure GDA00027423813700000715] and [Figure GDA00027423813700000716] respectively, and [Figure GDA00027423813700000717] and [Figure GDA00027423813700000718] are the similarities of [Figure GDA00027423813700000719] with [Figure GDA00027423813700000720] and with [Figure GDA00027423813700000721], respectively, after the temporal relationship is taken into account.
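The exact form of the time decay parameter is given only as an image, so the sketch below uses an assumed form: an exponential decay in the absolute timestamp difference, multiplied onto the content similarity. The decay `rate` (one per day, with timestamps in seconds) and the function names are hypothetical choices, not values from the patent.

```python
import math

def time_decay(p_k, q_g, rate=1.0 / 86400.0):
    """Assumed decay r_{k,g}: 1 for simultaneous posts, smaller as the gap grows."""
    return math.exp(-rate * abs(p_k - q_g))

def time_aware_similarity(similarity, p_k, q_g, rate=1.0 / 86400.0):
    """Scale a content similarity by the decay between the two timestamps."""
    return time_decay(p_k, q_g, rate) * similarity
```

Under this assumption, two posts with identical content published a day apart keep about 37% of their content similarity, while posts published at the same moment keep all of it.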
The step S3 of constructing the user identification model further includes:
S31: Based on [Figure GDA0002742381370000081] and [Figure GDA0002742381370000082] generated in S2, the global visual similarity distribution [Figure GDA0002742381370000086] (abbreviated as [Figure GDA0002742381370000087]) and the global text similarity distribution [Figure GDA0002742381370000088] (abbreviated as [Figure GDA0002742381370000089]) between the $g$-th post [Figure GDA0002742381370000084] of user [Figure GDA0002742381370000083] and user [Figure GDA0002742381370000085] can be obtained. Considering that data of different modalities often have different confidence levels for cross-social media user identity recognition, an attention mechanism is introduced to assign the confidence of each modality automatically, as follows:
[Figure GDA00027423813700000810]
[Figure GDA00027423813700000811]
$[\alpha_v, \alpha_c] = \mathrm{softmax}(a^{T}\,\mathrm{con}(h_v, h_c))$
where $W_v$, $W_c$, $b_v$ and $b_c$ are model parameters, con(·) denotes the concatenation operation, and $\alpha_v$ and $\alpha_c$ denote the confidences that the model assigns to the visual and textual modalities respectively. $a$ represents the question "Given this user post, which of the visual and textual modalities conveys more information?" Finally, the similarity between [Figure GDA00027423813700000813] of user [Figure GDA00027423813700000812] and [Figure GDA00027423813700000814] is obtained:
[Figure GDA00027423813700000815]
In this way, the similarity [Figure GDA00027423813700000819] between each post [Figure GDA00027423813700000817] of user [Figure GDA00027423813700000816] and [Figure GDA00027423813700000818] is obtained.
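The confidence assignment of S31 can be sketched as follows, under stated simplifications. The sketch assumes that each modality's hidden vector ($h_v$ for visual, $h_c$ for textual) is scored against its own slice of the query vector $a$, that the two scores are normalized with a softmax, and that the fused similarity is the confidence-weighted sum of the two modality similarities. The fusion formula itself is only available as an image, so the weighted sum is an assumption, and all function names are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def modality_confidences(h_v, h_c, a_v, a_c):
    """Return (alpha_v, alpha_c): softmax over one score per modality."""
    alpha_v, alpha_c = softmax([dot(a_v, h_v), dot(a_c, h_c)])
    return alpha_v, alpha_c

def fused_similarity(sim_v, sim_c, alpha_v, alpha_c):
    """Assumed fusion: confidence-weighted sum of the modality similarities."""
    return alpha_v * sim_v + alpha_c * sim_c

alpha_v, alpha_c = modality_confidences([0.2, 0.9], [0.7, 0.1], [1.0, 1.0], [1.0, 1.0])
d_g = fused_similarity(0.35, 0.80, alpha_v, alpha_c)
```

Because the confidences are positive and sum to 1, the fused value always lies between the visual and textual similarities, and the learned query decides which modality it leans toward for a given post.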
S32: Based on the global similarity obtained in S31, the similarities of the $G$ posts are integrated to obtain the similarity between users [Figure GDA00027423813700000820] and [Figure GDA00027423813700000821], $d = [d_1, d_2, \ldots, d_G]$, where [Figure GDA00027423813700000822] denotes the average pooling of [Figure GDA00027423813700000823]. This embodiment maps the similarity between users to the probability space with a multilayer perceptron:
[Figure GDA0002742381370000091]
where [Figure GDA0002742381370000092] denotes the probability that users [Figure GDA0002742381370000093] and [Figure GDA0002742381370000094] refer to the same user entity, and $w$ and $b$ are model parameters. Finally, the objective function of the model is obtained. It is implemented with cross entropy and measures the difference between the prediction given by the model and the ground-truth label:
[Figure GDA0002742381370000095]
where $y_i$ denotes the label indicating whether users [Figure GDA0002742381370000096] and [Figure GDA0002742381370000097] refer to the same user entity.
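The mapping to a probability and the cross-entropy objective can be sketched as follows. The sketch assumes the simplest possible perceptron, a single sigmoid layer over the similarity vector $d$ with parameters $w$ and $b$, and the standard binary cross entropy averaged over user pairs; the patent's own formulas are only available as images, so both are assumptions, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def same_user_probability(d, w, b):
    """Map a similarity vector d to a probability in (0, 1)."""
    return sigmoid(sum(wi * di for wi, di in zip(w, d)) + b)

def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean binary cross entropy between predictions and 0/1 labels."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

A label of 1 means the two accounts refer to the same user entity and 0 means they do not; the loss is small when confident predictions agree with the labels and large when they disagree.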
S33: Iterative training is performed until the model converges. The parameters are initialized with random values drawn from a normal distribution and are learned automatically by backpropagation over multiple training rounds, which gradually reduces the objective function. Training continues until the objective function stabilizes, after which the parameters of the user identity recognition model are saved and the model can output cross-social media user identity recognition results.
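The training procedure of S33 can be illustrated with a small sketch. It assumes a single sigmoid layer trained by full-batch gradient descent on the cross-entropy objective, with parameters drawn from a normal distribution; the toy data, the learning rate, the number of epochs and the function names are hypothetical, and a real implementation would backpropagate through the whole network with a deep learning framework.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(d, w, b):
    return sigmoid(sum(wi * di for wi, di in zip(w, d)) + b)

def loss(data, labels, w, b, eps=1e-12):
    """Mean binary cross entropy over all user pairs."""
    total = 0.0
    for d, y in zip(data, labels):
        p = min(max(predict(d, w, b), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(data)

def train(data, labels, epochs=1000, lr=0.5, seed=0):
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 0.1) for _ in data[0]]  # normally distributed initial values
    b = rng.gauss(0.0, 0.1)
    n = len(data)
    history = []
    for _ in range(epochs):
        history.append(loss(data, labels, w, b))
        # Gradient of the cross entropy with respect to the pre-sigmoid output.
        errors = [predict(d, w, b) - y for d, y in zip(data, labels)]
        grad_w = [sum(e * d[j] for e, d in zip(errors, data)) / n for j in range(len(w))]
        grad_b = sum(errors) / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    history.append(loss(data, labels, w, b))
    return w, b, history

# Toy similarity vectors: matching pairs score high, non-matching pairs low.
data = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
w, b, history = train(data, labels)
```

Watching `history` flatten out corresponds to training until the objective function stabilizes, at which point `w` and `b` would be saved.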
Example two
The embodiment aims to provide a cross-social media user identification system.
In order to achieve the above object, the present embodiment provides an attention-based cross-social media user identification system, including:
a data acquisition module, configured to capture data of different modalities posted by users on different social media;
a data representation module, configured to learn a latent representation of the data of each modality with a modality-specific model;
a model training module, comprising a user similarity calculation module, configured to calculate the similarity between the data of users on different social media by combining the temporal relationship with the confidence of each modality, and a probability calculation module, configured to map the similarity between users to a probability space with a multilayer perceptron to obtain the probability that users on different social media refer to the same user entity; the model training module constructs an objective function using cross entropy and solves for the model parameters by iterative optimization;
and a user identity recognition module, configured to receive data of different modalities on different social media and determine whether the data belong to the same user.
Example three
The embodiment aims at providing an electronic device.
In order to achieve the above object, this embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements, when executing the program, the following:
for the data of each modality, learning a latent representation of the data with a modality-specific model, and training a user identity recognition model as follows:
calculating the similarity between the data of users on different social media by combining the temporal relationship with the confidence of each modality;
mapping the similarity between users to a probability space with a multilayer perceptron to obtain the probability that users on different social media refer to the same user entity;
constructing an objective function using cross entropy and solving for the model parameters by iterative optimization;
using the trained model to determine whether data of different modalities on different social media belong to the same user.
Example four
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, performs the steps of:
for the data of each modality, learning a latent representation of the data with a modality-specific model, and training a user identity recognition model as follows:
calculating the similarity between the data of users on different social media by combining the temporal relationship with the confidence of each modality;
mapping the similarity between users to a probability space with a multilayer perceptron to obtain the probability that users on different social media refer to the same user entity;
constructing an objective function using cross entropy and solving for the model parameters by iterative optimization;
using the trained model to determine whether data of different modalities on different social media belong to the same user.
The steps involved in the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present invention.
One or more of the above embodiments have the following technical effects:
Unlike existing methods, which use user profile information to decide whether accounts on different social media belong to the same user, the present invention identifies the same user through the similarity of user-generated data (i.e., posted text and images) on different social media. This better addresses limitations such as inconsistent and unreliable user profile information, social network structures that are difficult to obtain, and excessive data volume. At the same time, user-generated data allows the latent behavioral characteristics of users to be analyzed better, which improves cross-social media user matching.
The invention takes into account that users typically post similar or even identical content on different social media within the same time period. A time decay parameter is therefore introduced on top of content similarity when learning the similarity between user-generated content, which makes the calculation result more interpretable and the subsequent user identity recognition more accurate.
User-generated data typically involves multiple modalities, such as text, pictures and video, and data of different modalities often have different confidence levels for cross-social media user identity recognition. The attention mechanism introduced by the invention assigns the confidence of each modality automatically, which addresses this difference and improves both the modeling performance of cross-social media user identity recognition and the interpretability of the model.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code that is executable by computing means, such that they are stored in memory means for execution by the computing means, or they are separately fabricated into individual integrated circuit modules, or multiple modules or steps of them are fabricated into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (9)

1. A cross-social media user identity recognition method based on an attention mechanism is characterized by comprising the following steps:
acquiring data of different modalities posted by a plurality of users on different social media as training data;
for the data of each modality, learning a latent representation of the data with a modality-specific model, and training a user identity recognition model as follows:
calculating the similarity between the data of users on different social media by combining the temporal relationship with the confidence of each modality, wherein calculating the similarity comprises:
calculating the similarity of the data of each modality between users on different social media;
calculating a time decay parameter between user data on different social media from the temporal relationship, and correcting the similarity accordingly;
assigning a confidence to the data of each modality based on an attention mechanism, and weighting the similarities of the different modalities to obtain a comprehensive similarity;
mapping the similarity between users to a probability space with a multilayer perceptron to obtain the probability that users on different social media refer to the same user entity;
constructing an objective function using cross entropy and solving for the model parameters by iterative optimization;
using the trained model to determine whether data of different modalities on different social media belong to the same user.
2. The cross-social media user identity recognition method based on an attention mechanism according to claim 1, wherein the data of different modalities are text data and image data; for text data, the latent representation of the data is learned with a bidirectional long short-term memory network; for image data, the latent representation of the data is learned with a residual neural network and a fully connected network.
3. The method for cross-social media user identification based on attention mechanism as claimed in claim 1, wherein, for social media $O_1$, the user
Figure FDA0002742381360000021
publishes data whose k-th piece is represented as
Figure FDA0002742381360000022
for social media $O_2$, the user
Figure FDA0002742381360000023
publishes data in which the text information of the g-th piece is represented as
Figure FDA0002742381360000024
and the picture information of the g-th piece is represented as
Figure FDA0002742381360000025
the latent representations obtained through learning are respectively represented as
Figure FDA0002742381360000026
and the similarity between data of different modalities for users on different social media is calculated as follows:
Figure FDA0002742381360000027
Figure FDA0002742381360000028
wherein
Figure FDA0002742381360000029
and
Figure FDA00027423813600000210
denote the similarities between
Figure FDA00027423813600000211
and
Figure FDA00027423813600000212
and between
Figure FDA00027423813600000213
and
Figure FDA00027423813600000214
respectively, and i denotes the i-th user.
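The similarity formulas of claim 3 survive here only as image references, so the following is a minimal sketch of one common choice rather than the claimed formula. It assumes cosine similarity between latent vectors, and the names `cosine_similarity` and `pairwise_similarity` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length latent vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with an all-zero vector as 0
    return dot / (norm_a * norm_b)


def pairwise_similarity(posts_o1, post_o2):
    """Similarity of every post of the O_1 user to one post of the O_2 user.

    posts_o1: list of latent vectors, one per piece of data k.
    post_o2:  latent vector of the g-th piece of data (same modality).
    """
    return [cosine_similarity(p, post_o2) for p in posts_o1]
```

Under this assumption the function would be called once per modality, for example once on the text representations and once on the image representations, giving two similarity lists for each g.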
4. The method for cross-social media user identification based on attention mechanism as claimed in claim 3, wherein the time decay parameter is calculated as:
Figure FDA00027423813600000215
wherein $r_{k,g}$ denotes the time decay parameter, and $p_k$ and $q_g$ are the timestamps of
Figure FDA00027423813600000216
and
Figure FDA00027423813600000217
respectively;
the similarity is corrected according to the time decay parameter as follows:
Figure FDA00027423813600000218
Figure FDA00027423813600000219
wherein
Figure FDA00027423813600000220
and
Figure FDA00027423813600000221
are the similarities, corrected for the temporal relationship, between
Figure FDA00027423813600000222
and
Figure FDA00027423813600000223
and between
Figure FDA00027423813600000224
and
Figure FDA00027423813600000225
respectively.
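The time decay formula of claim 4 is likewise available only as an image. As an illustration of the idea, the sketch below assumes an exponential decay in the timestamp gap and a multiplicative correction; both the functional form and the `rate` parameter are assumptions, not the claimed expressions.

```python
import math

SECONDS_PER_DAY = 86400.0


def time_decay(p_k, q_g, rate=1.0 / SECONDS_PER_DAY):
    """Decay weight r_{k,g} in (0, 1] from timestamps p_k and q_g in seconds.

    Assumed form: exp(-rate * |p_k - q_g|), so posts published close
    together in time keep a weight near 1.
    """
    return math.exp(-rate * abs(p_k - q_g))


def corrected_similarity(similarity, p_k, q_g, rate=1.0 / SECONDS_PER_DAY):
    """Scale a raw similarity by the decay weight (assumed correction)."""
    return time_decay(p_k, q_g, rate) * similarity
```

With this choice, two posts published at the same moment keep their raw similarity, while a gap of one day at the default rate scales it by exp(-1).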
5. The method for cross-social media user identification based on attention mechanism as claimed in claim 4, wherein the confidences of the different modalities are calculated as:
Figure FDA0002742381360000031
Figure FDA0002742381360000032
$[\alpha_v, \alpha_c] = \mathrm{softmax}(a^T \mathrm{con}(h_v, h_c))$,
wherein $\alpha_v$ and $\alpha_c$ denote the confidences the model assigns to the visual and text modalities, respectively;
Figure FDA0002742381360000033
and
Figure FDA0002742381360000034
are, respectively, the global visual similarity distribution and the global text similarity distribution between the g-th piece of data
Figure FDA0002742381360000036
of user
Figure FDA0002742381360000035
and user
Figure FDA0002742381360000037
$W_v$, $W_c$, $b_v$ and $b_c$ are model parameters, and $\mathrm{con}(\cdot)$ denotes the concatenation operation;
the similarity between each piece of data
Figure FDA0002742381360000039
of user
Figure FDA0002742381360000038
and
Figure FDA00027423813600000310
is as follows:
Figure FDA00027423813600000311
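The attention step of claim 5 can be sketched as follows. The softmax over a projection of the concatenated vectors follows the one formula that survives in text form. The rest is assumed: `h_v` and `h_c` are taken as given inputs, `a_rows` is a hypothetical two-row layout for $a^T$ so that the softmax yields two confidences, and the fused similarity is taken to be a confidence-weighted sum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_confidences(h_v, h_c, a_rows):
    """Return (alpha_v, alpha_c) = softmax(a^T con(h_v, h_c)).

    a_rows holds the two rows of a^T (an assumed shape); each row has
    len(h_v) + len(h_c) weights.
    """
    concatenated = list(h_v) + list(h_c)  # con(.) as plain concatenation
    scores = [sum(w * x for w, x in zip(row, concatenated)) for row in a_rows]
    alpha_v, alpha_c = softmax(scores)
    return alpha_v, alpha_c


def fused_similarity(s_visual, s_text, alpha_v, alpha_c):
    """Confidence-weighted sum of the two modality similarities (assumed)."""
    return alpha_v * s_visual + alpha_c * s_text
```

Because the two confidences sum to one, the fused value always lies between the visual and text similarities, which lets the model lean on whichever modality is more informative for a given piece of data.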
6. The method for cross-social media user identification based on attention mechanism as claimed in claim 5, wherein
the similarities between all G pieces of data of user
Figure FDA00027423813600000312
over a period of time and user
Figure FDA00027423813600000313
are integrated as $d = [d_1, d_2, \ldots, d_G]$;
the similarity between users is mapped to a probability space using a multilayer perceptron:
Figure FDA00027423813600000314
wherein
Figure FDA00027423813600000315
denotes the probability that users
Figure FDA00027423813600000316
and
Figure FDA00027423813600000317
point to the same user entity, and w and b are model parameters;
an objective function is constructed:
Figure FDA00027423813600000318
wherein $y_i$ denotes the label indicating whether users
Figure FDA00027423813600000319
and
Figure FDA00027423813600000320
point to the same user entity.
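The probability mapping and objective of claim 6 can be illustrated with a minimal sketch. The exact expressions are images in the source, so this assumes a single sigmoid layer with parameters `w` and `b` applied to the similarity vector d, and a mean binary cross-entropy over user pairs; the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def match_probability(d, w, b):
    """Probability that two accounts belong to one entity: sigmoid(w . d + b)."""
    return sigmoid(sum(wi * di for wi, di in zip(w, d)) + b)


def cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy; labels are 1 (same entity) or 0."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

In training, `w` and `b` (together with the encoder and attention parameters) would be updated iteratively to reduce this loss, for example by gradient descent.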
7. A cross-social media user identification system based on an attention mechanism, comprising:
the data acquisition module is used for capturing data of different modalities of the user on different social media;
the data representation module is used for learning latent representations of the data, adopting a separate model for the data of each modality;
the model training module and the user similarity calculation module are used for calculating the similarity of data between users on different social media by combining the temporal relationship and the confidences of the different modalities; calculating the similarity of data between users on different social media comprises:
calculating the similarity of the data of each modality between users on different social media;
calculating a time decay parameter between user data on different social media from the temporal relationship, and correcting the similarity accordingly;
and assigning confidences to the data of the different modalities based on the attention mechanism, and weighting the similarities of the different modalities to obtain the comprehensive similarity;
the probability calculation module is used for mapping the similarity between the users to a probability space by using a multilayer perceptron to obtain the probability that the users on different social media point to the same user entity, constructing an objective function by adopting cross entropy, and iteratively optimizing the model parameters;
and the user identity identification module receives data of different modalities on different social media and judges whether the data point to the same user.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the attention-based cross-social media user identification method of any one of claims 1-6.
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, performs the attention-based cross-social media user identification method of any one of claims 1 to 6.
CN201910431115.9A 2019-05-22 2019-05-22 Cross-social media user identity recognition method and system based on attention mechanism Active CN110210540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910431115.9A CN110210540B (en) 2019-05-22 2019-05-22 Cross-social media user identity recognition method and system based on attention mechanism

Publications (2)

Publication Number Publication Date
CN110210540A (en) 2019-09-06
CN110210540B (en) 2021-02-26

Family

ID=67788116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910431115.9A Active CN110210540B (en) 2019-05-22 2019-05-22 Cross-social media user identity recognition method and system based on attention mechanism

Country Status (1)

Country Link
CN (1) CN110210540B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046166B (en) * 2019-12-10 2022-10-11 Sun Yat-sen University Semi-implicit multi-modal recommendation method based on similarity correction
CN111274491B (en) * 2020-01-15 2021-04-06 Hangzhou Dianzi University Social robot identification method based on graph attention network
CN113297397B (en) * 2021-05-12 2022-08-09 Shandong University Information matching method and system based on hierarchical multi-mode information fusion
CN113779520B (en) * 2021-09-07 2023-06-13 No. 709 Research Institute of China Shipbuilding Industry Corporation Cross-space target virtual identity association method based on multi-layer attribute analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677830A (en) * 2016-01-04 2016-06-15 Peking University Heterogeneous media similarity computing method and retrieving method based on entity mapping
CN107256271A (en) * 2017-06-27 2017-10-17 Ludong University Cross-modal hash retrieval method based on mapping dictionary learning
CN107562812A (en) * 2017-08-11 2018-01-09 Peking University Cross-modal similarity learning method based on modality-specific semantic space modeling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090857B (en) * 2017-12-29 2021-06-22 Fudan University Multi-mode student classroom behavior analysis system and method
CN108805009A (en) * 2018-04-20 2018-11-13 Central China Normal University Classroom learning state monitoring method and system based on multimodal information fusion
CN109753602B (en) * 2018-12-04 2020-12-25 Institute of Computing Technology, Chinese Academy of Sciences Cross-social network user identity recognition method and system based on machine learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
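Modality-Specific Cross-Modal Similarity Measurement With Recurrent Attention Network; Yuxin Peng et al.; IEEE; 2018-11-30; vol. 27, no. 11; pp. 5585-5599 *
Time-decay-guided collaborative filtering similarity computation; Li Yuanxin et al.; Computer Systems & Applications; 2013-12-31; vol. 22, no. 11; pp. 129-134 *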

Also Published As

Publication number Publication date
CN110210540A (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN110210540B (en) Cross-social media user identity recognition method and system based on attention mechanism
CN111931062B (en) Training method and related device of information recommendation model
WO2020094060A1 (en) Recommendation method and apparatus
CN110383299B (en) Memory enhanced generation time model
US8645287B2 (en) Image tagging based upon cross domain context
CN109992773B (en) Word vector training method, system, device and medium based on multi-task learning
CN113395578B (en) Method, device, equipment and storage medium for extracting video theme text
CN110234018B (en) Multimedia content description generation method, training method, device, equipment and medium
CN113301442B (en) Method, device, medium, and program product for determining live broadcast resource
WO2022188773A1 (en) Text classification method and apparatus, device, computer-readable storage medium, and computer program product
CN113761153B (en) Picture-based question-answering processing method and device, readable medium and electronic equipment
CN110580516B (en) Interaction method and device based on intelligent robot
CN113704511B (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
US9875443B2 (en) Unified attractiveness prediction framework based on content impact factor
CN117473951A (en) Text processing method, device and storage medium
CN110704650A (en) OTA picture tag identification method, electronic device and medium
CN115878839A (en) Video recommendation method and device, computer equipment and computer program product
CN113822291A (en) Image processing method, device, equipment and storage medium
CN112116264A (en) Activity evaluation method and apparatus
CN117556149B (en) Resource pushing method, device, electronic equipment and storage medium
CN115470397B (en) Content recommendation method, device, computer equipment and storage medium
CN116523024B (en) Training method, device, equipment and storage medium of recall model
CN109871487B (en) News recall method and system
CN108875928B (en) Multi-output regression network and learning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant