CN110674417B - Label recommendation method based on user attention relationship

Info

Publication number: CN110674417B
Application number: CN201910902974.1A
Authority: CN
Inventors: 赵鑫; 侯宇蓬; 陈俊华; 文继荣
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2022-03-11
Anticipated expiration: 2039-09-24
Also published as: CN110674417A

Abstract

The invention provides a label recommendation method based on a user attention relationship, which specifically comprises the following steps: 1) generating user influence scores and label influence scores using a conventional PageRank algorithm; 2) training on the user attention network and the user-label network with a graph embedding model to generate user vectors and label vectors; 3) combining the user influence scores, the label influence scores, the user vectors and the label vectors to recommend labels for the user. The method mines information from the information-rich user attention network and user-label network, so that user feature information in the social network becomes richer and a service provider can better understand its users.

Description

Label recommendation method based on user attention relationship

Technical Field

The invention relates to the technical field of tag recommendation, and in particular to a method for recommending tags to users by applying graph embedding to the attention (follow) relationships among users in a social network.

Background

In recent years, microblogging services such as Twitter and Sina Weibo have attracted a large number of users and formed social networks of great scale and influence. To better manage, organize and understand microblog users, the academic community has proposed the task of automatically recommending tags for them. Automatically recommended tags reveal hidden interests a user may have and allow the user's preferences and social relationships to be understood along more dimensions. However, previous tag recommendation methods focus mainly on mining the text data generated by users, while the attention relationships among users, another information-rich data type in microblogs, have not been properly mined and utilized.

The PageRank algorithm analyzes the influence of nodes in a network, taking the number and quality of the links between nodes as the main factors. Its basic assumptions are: a more important node is linked to by more nodes, and a node linked to by important nodes is itself more important. The algorithm calculates an influence score for each node in the network, and a high score indicates that the node has a large influence in the network. A schematic diagram of the PageRank algorithm is shown in FIG. 1.
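As an illustration of the two assumptions above, the following is a minimal power-iteration sketch of PageRank in Python. The function name `pagerank`, the damping factor of 0.85 and the convergence tolerance are assumptions made for this example and are not taken from the present application; an edge `(src, dst)` is read as "src links to (follows) dst".

```python
import numpy as np

def pagerank(edges, n, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph with nodes 0..n-1.

    edges   : iterable of (src, dst) pairs, meaning "src links to dst".
    damping : assumed damping factor (a common default, not from the source).
    """
    edges = list(edges)
    out_deg = np.zeros(n)
    for src, _ in edges:
        out_deg[src] += 1

    score = np.full(n, 1.0 / n)  # start from a uniform distribution
    for _ in range(max_iter):
        # Every node receives the same small "teleport" share.
        new = np.full(n, (1.0 - damping) / n)
        # Nodes without out-links spread their score uniformly.
        new += damping * score[out_deg == 0].sum() / n
        # Each node passes its score in equal parts along its out-links.
        for src, dst in edges:
            new[dst] += damping * score[src] / out_deg[src]
        converged = np.abs(new - score).sum() < tol
        score = new
        if converged:
            break
    return score

# Toy graph: nodes 0 and 1 both link to node 2, and node 2 links back to node 0.
scores = pagerank([(0, 2), (1, 2), (2, 0)], n=3)
```

In this toy graph node 2 is linked to by two nodes and ends up with the highest score, while node 1, which nobody links to, ends up with the lowest.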

Graph embedding (network embedding) is a technique that uses machine learning to embed high-dimensional, discrete graph/network data into a low-dimensional, dense real vector space. The embedded real-valued vectors are easier to use in common machine learning models than the high-dimensional, discrete graph data.

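The gradient descent method iteratively computes the minimum of the risk function on a training set $\mathcal{D}$.

The update formula is:

$$\theta_{t+1} = \theta_t - \alpha \, \frac{\partial \mathcal{R}_{\mathcal{D}}(\theta)}{\partial \theta}\bigg|_{\theta = \theta_t}$$

where $\theta_t$ is the parameter value at the t-th iteration, $\alpha$ is the learning rate, and $\mathcal{R}_{\mathcal{D}}(\theta)$ is the risk function on the training set $\mathcal{D}$.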

The stochastic gradient descent (SGD) method builds on gradient descent: in each iteration only one sample is drawn at random, the gradient of the loss function on that sample is calculated, and the parameters are updated. After a sufficient number of iterations, stochastic gradient descent can also converge to a locally optimal solution.
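The per-sample update described above can be sketched as follows. This is only an illustrative example on a toy linear regression problem; the function name `sgd_linear`, the squared loss, the learning rate and the step count are assumptions chosen for the example and do not come from the present application.

```python
import random

def sgd_linear(samples, alpha=0.05, steps=5000, seed=0):
    """Fit y ~ w * x + b by stochastic gradient descent on the squared loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = rng.choice(samples)  # draw ONE random sample per iteration
        err = (w * x + b) - y       # residual on that single sample
        # Gradient of 0.5 * err**2 with respect to w and b.
        w -= alpha * err * x
        b -= alpha * err
    return w, b

# Noise-free toy data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = sgd_linear(data)
```

Because the toy data are noise-free, the estimates approach w = 2 and b = 1; on noisy data a decaying learning rate would usually be needed for the iterates to settle.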

The information disclosed in this background section is only for enhancement of understanding of the general background of the application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide a label recommendation method based on a user attention relationship, so as to solve the technical problems in the prior art.

In order to solve the technical problem, the invention provides a tag recommendation method based on a user attention relationship, which specifically comprises the following steps:

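1) generating user influence scores and label influence scores using a conventional PageRank algorithm;

2) training on the user attention network and the user-label network with a graph embedding model to generate user vectors and label vectors;

3) combining the user influence scores, the label influence scores, the user vectors and the label vectors to recommend labels for the user.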

As a further technical scheme, the graph embedding model is divided into three parts: modeling explicit similarities between users, modeling implicit similarities between users, and modeling tag semantic information.

As a further technical solution, the modeling of explicit similarity between users specifically includes: sampling a user attention relationship u₁→u₂ and optimizing the user vectors by stochastic gradient descent, so that the probability distribution of attention relationships generated in the vector space fits that of the user attention network.

As a further technical solution, the probability distribution for generating attention relationships in the user attention network is characterized by the users' influence scores: users with similar influence scores have a higher probability of forming an attention relationship.

As a further technical solution, the modeling of implicit similarity between users specifically includes: sampling user triples of the "co-following" and "co-followed" types, mapping the original vector space by an affine transformation to a new vector space whose origin is the semantic node, and then optimizing the user vectors by stochastic gradient descent so that the probability distribution of triples generated in the new vector space fits that of the user attention network.

As a further technical solution, the semantic node refers to the node in a "co-following" or "co-followed" triple to which the two other users are both connected.

As a further technical solution, the modeling of tag semantic information specifically includes: sampling a user-label association u-t and optimizing the user vector and the label vector by stochastic gradient descent, so that the probability distribution of user-label associations generated in the vector space fits that of the user attention network and the user-label network.

By adopting the technical scheme, the invention has the following beneficial effects:

The method uses a graph embedding technique to mine the interest transfer relationships of users from the attention relationships among them and recommends labels to users accordingly, so that hidden interests a user may have can be discovered and topics or users that may interest the user can be recommended more accurately.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a diagram of a prior art PageRank algorithm;

FIG. 2 is a schematic diagram of the present invention employing affine transformations on triples.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The present invention will be further explained with reference to specific embodiments.

The invention provides a label recommendation method based on a user attention relationship, which specifically comprises the following steps:

1) generating user influence scores and label influence scores using a conventional PageRank algorithm;

2) training on the user attention network and the user-label network with a graph embedding model to generate user vectors and label vectors;

3) combining the user influence scores, the label influence scores, the user vectors and the label vectors to recommend labels for the user.

The invention provides a novel graph embedding model based on a user attention network and a user label network, and further automatically recommends labels for users according to generated user/label vectors and influence scores.

In this embodiment, as a further technical solution, the graph embedding model is divided into three parts: modeling explicit similarity between users, modeling implicit similarity between users, and modeling tag semantic information. Each user u is represented by two vectors: an in-degree vector and an out-degree vector.

In this embodiment, as a further technical solution, the modeling of explicit similarity between users specifically includes: sampling a user attention relationship u₁→u₂ and optimizing the user vectors by stochastic gradient descent, so that the probability distribution of attention relationships generated in the vector space fits that of the user attention network. Specifically:

A user attention relationship u₁→u₂ is sampled, and the vectors of user u₁ and user u₂ are updated by stochastic gradient descent so that the probability distribution p₁(u₁, u₂) generated by the link u₁→u₂ in the vector space fits the empirical probability distribution of the corresponding links in the user attention network.

In these distributions, a_u denotes the influence score of user u, and Δ_{b,b′} = -α·|b - b′| measures the similarity between two real numbers b and b′. The distributions are defined over 𝒰, the set of all users, and over the set of users followed by user u₁. The optimization function is the KL divergence between the empirical distribution and p₁(u₁, u₂), where KL(·,·) denotes the KL divergence.

In this embodiment, as a further technical solution, the probability distribution for generating attention relationships in the user attention network is characterized by the users' influence scores: users with similar influence scores have a higher probability of forming an attention relationship.
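The two building blocks named in this part, the similarity Δ between two influence scores (the negative scaled absolute difference) and the KL divergence used as the optimization function, can be sketched as follows. The exact form of the empirical distribution is not reproduced in this text, so the helper `followee_distribution`, which normalizes exp(Δ) over the users followed by u₁, is purely a hypothetical illustration of "similar influence scores give a higher probability" and should not be read as the formula of the present application.

```python
import math

def delta(b, b_prime, alpha=1.0):
    """Similarity between two real numbers: -alpha * |b - b'|."""
    return -alpha * abs(b - b_prime)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def followee_distribution(a_u1, followee_scores, alpha=1.0):
    """HYPOTHETICAL empirical distribution over the users followed by u1.

    Assumption for illustration only: the weight of each followee is
    exp(delta(a_u1, a_followee)), normalized to sum to one.
    """
    weights = [math.exp(delta(a_u1, a, alpha)) for a in followee_scores]
    total = sum(weights)
    return [w / total for w in weights]

# u1 has influence score 0.5 and follows two users with scores 0.4 and 0.1.
empirical = followee_distribution(0.5, [0.4, 0.1])
# Divergence between the empirical distribution and a uniform model distribution.
loss = kl_divergence(empirical, [0.5, 0.5])
```

Under this assumed weighting, the followee whose influence score is closer to that of u₁ receives the larger probability, and the KL divergence is zero only when the model distribution matches the empirical one.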

In this embodiment, as a further technical solution, the modeling of implicit similarity between users specifically includes: sampling user triples of the "co-following" and "co-followed" types, mapping the original vector space by an affine transformation to a new vector space whose origin is the semantic node, and then optimizing the user vectors by stochastic gradient descent so that the probability distribution of triples generated in the new vector space fits that of the user attention network. Specifically:

The implicit similarity modeling part samples triples of these two types. Without loss of generality, one type is taken as an example: the triple <u₁, u₂, u₃> indicates that u₁ and u₂ are both followed by u₃. The model applies an affine transformation (as shown in FIG. 2) that maps the original vector space to a new vector space whose origin is the out-degree vector of user u₃, and updates the vectors of u₁ and u₂ by stochastic gradient descent so that the probability distribution p₂(u₁, u₂, u₃) generated by the triple in the new vector space fits the empirical probability distribution of the corresponding triples in the user attention network.

The affine transformation and the remaining quantities are defined analogously to those of the previous part and are not described again. The optimization function is again the KL divergence.
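Mapping the vector space so that the vector of the semantic node becomes the origin amounts to a translation, which is a special case of an affine transformation. The sketch below shows only that translation step; the function name `shift_origin` and the toy two-dimensional vectors are assumptions for illustration, and which of the vectors of u₁ and u₂ are transformed is not specified in this text.

```python
import numpy as np

def shift_origin(vectors, origin):
    """Translate `vectors` so that `origin` becomes the zero vector.

    A translation x -> x - origin is an affine map; any additional linear
    part used by the actual model is not covered by this sketch.
    """
    return np.asarray(vectors, dtype=float) - np.asarray(origin, dtype=float)

# Toy 2-D vectors for u1 and u2, and an assumed out-degree vector for u3.
u1 = np.array([1.0, 2.0])
u2 = np.array([2.0, 1.0])
u3_out = np.array([1.0, 1.0])

shifted = shift_origin([u1, u2], u3_out)
# In the new space, u1 and u2 are described relative to the semantic node u3.
```

After the shift, any similarity between u₁ and u₂ is measured relative to u₃, so the same pair of users can appear similar with respect to one semantic node and dissimilar with respect to another.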

In this embodiment, as a further technical solution, the semantic node refers to the node in a "co-following" or "co-followed" triple to which the two other users are both connected.

In this embodiment, as a further technical solution, the modeling of tag semantic information specifically includes: sampling a user-label association u-t and optimizing the user vector and the label vector by stochastic gradient descent, so that the probability distribution of user-label associations generated in the vector space fits that of the user attention network and the user-label network. Specifically:

First, the model concatenates the in-degree vector and the out-degree vector of a user to obtain the user vector. Then a user-label link u-t is sampled, and the user vector and the label vector are updated by stochastic gradient descent so that the probability distribution p₃(u, t) generated by the link u-t in the vector space fits the empirical probability distribution of the corresponding links in the user attention network and the user-label network.

Finally, a score s(u, t) measuring the similarity between user u and label t is calculated, and for each user u the K labels with the highest s(u, t) are selected for recommendation.
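The concatenation and top-K selection steps can be sketched as follows. The definition of s(u, t) is not reproduced in this text, so the sketch assumes a plain dot product between the concatenated user vector and the label vector and leaves the influence scores out; the function name `recommend_top_k` and the toy vectors are likewise hypothetical.

```python
import numpy as np

def recommend_top_k(in_vec, out_vec, label_vecs, k):
    """Return the indices of the k best-scoring labels and all scores.

    ASSUMPTION: s(u, t) is taken to be the dot product of the user vector
    and the label vector; the actual scoring function may differ.
    """
    # User vector = concatenation of the in-degree and out-degree vectors.
    user_vec = np.concatenate([in_vec, out_vec])
    scores = np.asarray(label_vecs, dtype=float) @ user_vec
    # Sort labels by descending score and keep the first k.
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist(), scores

in_vec = np.array([1.0, 0.0])
out_vec = np.array([0.0, 1.0])
label_vecs = np.array([
    [1.0, 0.0, 0.0, 1.0],   # label 0
    [0.0, 1.0, 1.0, 0.0],   # label 1
    [0.5, 0.0, 0.0, 0.5],   # label 2
])
top, scores = recommend_top_k(in_vec, out_vec, label_vecs, k=2)
```

Note that under this assumption each label vector must have twice the dimension of a single in-degree or out-degree vector so that it can be compared with the concatenated user vector.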

In summary, the invention provides a tag recommendation method based on user attention relations, which is characterized in that interest transfer relations of users are mined from attention relations among the users by using a graph embedding technology, and then tags are recommended to the users, so that hidden interests possibly carried by the users can be known, and further topics or users possibly interested by the users can be better recommended to the users.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A label recommendation method based on a user attention relationship is characterized by specifically comprising the following steps:

1) generating user influence scores and label influence scores using a conventional PageRank algorithm;

2) training on the user attention network and the user-label network with a graph embedding model to generate user vectors and label vectors;

3) combining the user influence scores, the label influence scores, the user vectors and the label vectors to recommend labels for the user;

the graph embedding model is divided into three parts: modeling explicit similarity between users, modeling implicit similarity between users, and modeling tag semantic information;

the modeling of explicit similarity between users specifically comprises: sampling a user attention relationship u₁→u₂ and optimizing the user vectors by stochastic gradient descent, so that the probability distribution of attention relationships generated in the vector space fits that of the user attention network;

wherein a_u denotes the influence score of user u, Δ_{b,b′} = -α·|b - b′| measures the similarity between two real numbers b and b′, 𝒰 denotes the set of all users, and the distributions are further defined over the set of users followed by user u₁; the optimization function is the KL divergence between the two distributions, wherein KL(·,·) denotes the KL divergence.

2. The label recommendation method based on the user attention relationship according to claim 1, wherein the probability distribution for generating the attention relationship in the user attention network is characterized by the influence scores of the users, and the probability of forming the attention relationship between users with similar influence scores is higher.

3. The tag recommendation method based on user attention relationship according to claim 1, wherein the modeling of implicit similarity between users specifically comprises: sampling user triples of the "co-following" and "co-followed" types, mapping the original vector space by an affine transformation to a new vector space whose origin is the semantic node, and then optimizing the user vectors by stochastic gradient descent so that the probability distribution of triples generated in the new vector space fits that of the user attention network.

4. The tag recommendation method based on user attention relationship according to claim 3, wherein the semantic node refers to the node in a "co-following" or "co-followed" triple to which the two other users are both connected.

5. The user attention relationship-based tag recommendation method according to claim 1, wherein the modeling of tag semantic information specifically comprises: sampling a user-label association u-t and optimizing the user vector and the label vector by stochastic gradient descent, so that the probability distribution of user-label associations generated in the vector space fits that of the user attention network and the user-label network.