CN107679242A

CN107679242A - Label recommendation method fusing multi-information source coupling tensor decomposition

Info

Publication number: CN107679242A
Application number: CN201711040886.2A
Authority: CN
Inventors: 杨忆; 韩立新; 刘元珍; 勾智楠
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2017-10-30
Filing date: 2017-10-30
Publication date: 2018-02-09
Anticipated expiration: 2037-10-30
Also published as: CN107679242B

Abstract

The invention discloses a label recommendation method fusing multi-information source coupling tensor decomposition. First, while the tensor constructed from label-resource-user triples undergoes CP decomposition, three auxiliary information matrices (label-label, label-resource and label-user) are added to take part in the joint decomposition. When the label-label similarity matrix is built, the invention considers both the co-occurrence relation between labels and the semantic similarity of the labels in WordNet, and uses a linear integration of the two similarities as the final inter-label similarity measure. Second, after the loss function of the problem is constructed, the parameters of the objective function are optimized with the ADMM algorithm. Finally, Top-N labels are recommended more accurately to each (user, resource) pair according to the prediction tensor completed by the decomposition. The invention fuses homogeneous and heterogeneous information about labels and is generally applicable to social annotation systems.

Description

Label recommendation method fusing multi-information source coupling tensor decomposition

Technical Field

The invention relates to a label recommendation method fusing multi-information source coupling tensor decomposition, and belongs to the technical field of computer network labeling.

Background

With the rapid development of Web 2.0 websites, the amount of information on the Web is growing at an alarming rate, far exceeding people's capacity to process it. Recommendation systems now play an increasingly important role in processing information efficiently. Social annotation systems, a typical application of recommendation systems, have developed rapidly; examples include Last.fm for sharing music, Flickr for sharing pictures, and Delicious for sharing bookmarks. In these systems, users actively generate labels (tags) and use them to identify, manage, and discover information resources. Label recommendation is a current research hotspot for annotation systems; it aims to reduce the user's burden by helping the user select suitable labels to complete the annotation. Unlike a traditional recommendation system, which handles only the binary user-resource (user-item) relation, a social annotation system must handle the ternary user-label-resource (user-tag-item) relation, so models that consider only binary relations are no longer suitable. Meanwhile, tensor models have become a popular way to study latent associations in high-order data, and more and more researchers have begun to study label recommendation based on tensor decomposition models. However, existing tensor decomposition methods still face three problems in label recommendation: extreme data sparsity, missing values, and overfitting.

With respect to these three problems, existing label recommendation methods based on tensor decomposition use only homogeneous auxiliary information between labels and do not consider heterogeneous auxiliary information between labels and resources or between labels and users.

Disclosure of Invention

Purpose: as social annotation systems become widely used, and in order to overcome the defect that conventional label recommendation methods based on tensor decomposition use only homogeneous information between labels, the invention provides a label recommendation method fusing multi-information source coupling tensor decomposition that improves the accuracy of label recommendation.

The technical scheme is as follows: in order to solve the technical problems, the technical scheme adopted by the invention is as follows:

A label recommendation method fusing multi-information source coupling tensor decomposition comprises the following specific steps:

Step1: constructing a label similarity matrix B based on a linear integration of two similarity measures, label co-occurrence and semantic correlation;

Step2: constructing a label-resource (Tag-Item) matrix C;

Step3: constructing a label-user (Tag-User) matrix D;

Step4: constructing the model and optimizing its parameters;

Step5: recommending labels.

Preferably, step1 comprises the following steps:

Step1.1: calculating the label co-occurrence similarity:

Let $t_i$ and $t_j$ be two labels in the data set from which the label similarity matrix B is built. The co-occurrence similarity between the two labels is measured as shown in formula (1):

$$\text{cooccurrence\_B}_{i,j} = \frac{|t_i \cap t_j|}{|t_i \cup t_j|} \tag{1}$$

where $|t_i \cap t_j|$ denotes the number of resources annotated with both $t_i$ and $t_j$, and $|t_i \cup t_j|$ denotes the total number of resources annotated with $t_i$ or $t_j$;

Step1.2: calculating the label semantic similarity:

The semantic similarity of labels is calculated from their semantic relatedness in WordNet. For two labels $t_i$ and $t_j$ in the data set, the semantic similarity is shown in formula (2):

$$\text{semantic\_B}_{i,j} = \frac{2 \times \text{depth}(\text{LCS})}{N_1 + N_2 + 2 \times \text{depth}(\text{LCS})} \tag{2}$$

where LCS is the least common superconcept of $t_i$ and $t_j$; depth(LCS) is the number of nodes on the path from the LCS to the taxonomy root; $N_1$ is the number of nodes on the path from $t_i$ to the LCS, and $N_2$ is the number of nodes on the path from $t_j$ to the LCS;

Step1.3: integrating co-occurrence and semantic similarity as the final similarity:

Step1.1 and Step1.2 capture, respectively, the co-occurrence similarity of labels and their relatedness in the semantic hierarchy. So that the two measures complement each other, they are combined linearly to give the final similarity of $t_i$ and $t_j$:

$$B_{i,j} = \gamma \times \text{cooccurrence\_B}_{i,j} + (1-\gamma) \times \text{semantic\_B}_{i,j}, \quad \gamma \in [0, 1] \tag{3}$$

Step1.4: using the label graph Laplacian as the regularization term:

Label graph regularization assumes that if label i and label j are similar, then the implicit feature row vectors $U^{(1)}_{i*}$ and $U^{(1)}_{j*}$ of label i and label j in the implicit feature factor matrix $U^{(1)}$ of the labels, mined by the tensor decomposition process, will be very close:

$$\frac{1}{2}\sum_{i,j} B_{i,j}\,\bigl\|U^{(1)}_{i*} - U^{(1)}_{j*}\bigr\|^2 = \frac{1}{2}\sum_{i,j}\sum_{d} B_{i,j}\,\bigl(U^{(1)}_{id} - U^{(1)}_{jd}\bigr)^2 = \sum_{d} U^{(1)T}_{*d} L\, U^{(1)}_{*d} = \operatorname{tr}\bigl(U^{(1)T} L\, U^{(1)}\bigr) \tag{4}$$

where $B_{i,j}$ is the final similarity of label i and label j calculated in Step1.3; $U^{(1)}_{id}$ is the element in row i and column d of the implicit feature factor matrix $U^{(1)}$ of the labels, and $U^{(1)}_{*d}$ denotes the entire d-th column of $U^{(1)}$; L is the graph Laplacian matrix, L = D - B, where D is a diagonal matrix whose i-th diagonal element is the sum of the elements in row i of the similarity matrix B, i.e. $D_{ii} = \sum_j B_{i,j}$; and tr(·) denotes the trace of a matrix.

Preferably, step2 comprises the following steps:

All labels in the training data set of the social annotation system are regarded as a document set. The weight with which label $t_h$ annotates resource $v_j$ is:

where Num(h, j) denotes the number of times label $t_h$ occurs in the whole label set, M is the total number of resources in the system, and $d_{hj}$ denotes the number of resources annotated with the label.

Preferably, step3 comprises the following steps:

All labels in the training data set of the social annotation system are regarded as a document set. The weight with which label $t_k$ is used by user $u_i$ is:

where Num(k, i) denotes the number of times label $t_k$ occurs in the whole label set, N is the total number of users in the system, and $W_{ki}$ denotes the number of users who have used the label.

Preferably, step4 comprises the following steps:

Step4.1: formalizing label recommendation as the basic CP tensor decomposition model:

Let $U^{(1)}$, $U^{(2)}$ and $U^{(3)}$ denote the implicit feature factor matrices corresponding to the label (Tag), resource (Item) and user (User) modes, respectively, where k is the rank of the tensor and k < min(|Tag|, |Item|, |User|). The label recommendation model based on CP tensor decomposition is defined as the following constrained optimization problem:

where $\mathcal{X}$ is the initial tensor constructed from the training data set of the social annotation system; λ is a regularization parameter; $\|\cdot\|_F$ is the Frobenius ($L_2$) norm; the Tikhonov regularization term prevents the objective from overfitting and provides a unique solution; and the non-negative indicator tensor has the same size as $\mathcal{X}$, with an entry equal to 1 if the corresponding entry of $\mathcal{X}$ holds an observed annotation behavior value and 0 otherwise;

Step4.2: the auxiliary information constructed in Step1 to Step3 is integrated as regularization terms and added to the model constructed in Step4.1, which transforms the model of Step4.1 into the following constrained optimization problem, formula (8):

where α controls the weight with which the inter-label similarity auxiliary information participates in the decomposition, and β controls the weight with which the label-resource matrix C and the label-user matrix D participate in the decomposition as auxiliary information;

Because formula (8) has no closed-form solution, the invention uses the ADMM (Alternating Direction Method of Multipliers) algorithm to optimize the objective function. The objective function, formula (8), is written in partial Lagrangian form as formula (9):

where $Z^{(n)} \ge 0$, n = 1, 2, 3, are auxiliary variables introduced for solving formula (9); $Y^{(n)}$, n = 1, 2, 3, are the Lagrangian multiplier matrices; η is a penalty parameter; and $\langle \cdot, \cdot \rangle$ is the inner product operation;

Step4.3: optimizing the objective, formula (9), with ADMM:

The invention solves formula (9) with an iterative scheme that updates the variables in turn and then updates $\{Y^{(1)}, Y^{(2)}, Y^{(3)}\}$. The specific steps are as follows:

(1) Updating $Z^{(1)}, Z^{(2)}, Z^{(3)}$:

To update $Z^{(1)}, Z^{(2)}, Z^{(3)}$, only the following optimization problem needs to be solved, as shown in formula (10):

Write Z, Y and U column by column, e.g. $Z = [Z_1 \ldots Z_m]$, where $Z_i$ is the i-th column of Z, so that $Z_i$, $Y_i$ and $U_i$ are all column vectors. $Z^{(n)}$ can then be updated efficiently by solving optimization problems (11) and (12):

Solving (11) gives:

Solving (12) gives:

where I is an identity matrix;

(2) Updating $U^{(1)}, U^{(2)}, U^{(3)}$:

To update $U^{(1)}, U^{(2)}, U^{(3)}$, formula (9) is rewritten as formula (15):

where $E^{(n)} = \bigl(U^{(N)} \odot \cdots \odot U^{(n+1)} \odot U^{(n-1)} \odot \cdots \odot U^{(1)}\bigr)^{T}\big|_{N=3}$, $\odot$ is the Khatri-Rao product, and $X_{(n)}$ is the mode-n unfolding matrix of the tensor. The update formula for $U^{(n)}$ is then obtained as follows:

(3) Updating

WhereinIs thatIs equal to

(4) Updating $Y^{(n)}$:

(5) Updating η:

η is updated adaptively to accelerate the optimization algorithm, according to formula (21):

$$\eta_{t+1} = \min(\rho\,\eta_t,\ \eta_{\max}) \tag{21}$$

Preferably, Step5 comprises the following steps:

After the joint decomposition of formula (9), the completed tensor is obtained. Each of its elements represents a quadruple (t, v, u, p), where p is the likelihood that user u annotates resource v with label t. Labels can then be recommended according to the weights corresponding to (u, v): if N labels are to be recommended when user u annotates resource v, the labels with the N highest weights are selected.

Advantageous effects: the label recommendation method fusing multi-information source coupling tensor decomposition provided by the invention first constructs three auxiliary information matrices (label-label, label-resource and label-user). After the problem model is constructed, the parameters of the training model are optimized with the ADMM algorithm, the implicit feature vectors of labels, users and resources are computed by coupled tensor-matrix decomposition, and labels are recommended to the annotating users. The algorithm makes effective use of both homogeneous and heterogeneous semantic information about labels and effectively alleviates the problems of data sparsity, missing values and overfitting; the method is general and suitable for various social annotation systems.

Drawings

FIG. 1 is a schematic diagram of a coupling tensor-matrix decomposition model;

FIG. 2 is a framework diagram of a tag recommendation algorithm;

FIG. 3 is an ADMM optimization model parameter algorithm diagram.

Detailed Description

The present invention will be further described with reference to the accompanying drawings.

As shown in FIGS. 1 and 2, a label recommendation method fusing multi-information source coupling tensor decomposition includes the following specific steps:

Step1: constructing a label similarity matrix B based on a linear integration of two similarity measures, label co-occurrence and semantic correlation;

If two resources are annotated with similar labels, they are likely to have similar implicit feature vectors, so the coupled tensor-matrix decomposition process can be regularized with label information.

Step1.1: calculating the label co-occurrence similarity:

Let $t_i$ and $t_j$ be two labels in the data set from which the label similarity matrix B is built. The co-occurrence similarity between the two labels is measured as shown in formula (1):

$$\text{cooccurrence\_B}_{i,j} = \frac{|t_i \cap t_j|}{|t_i \cup t_j|} \tag{1}$$

where $|t_i \cap t_j|$ denotes the number of resources annotated with both $t_i$ and $t_j$, and $|t_i \cup t_j|$ denotes the total number of resources annotated with $t_i$ or $t_j$;
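The co-occurrence measure of Step1.1 can be sketched in a few lines, assuming it is the ratio of the number of commonly annotated resources to the number of resources annotated with either label (a Jaccard-style ratio). The function name `cooccurrence_similarity` and the toy data are hypothetical and serve only as illustration.

```python
def cooccurrence_similarity(resources_i, resources_j):
    """Jaccard-style co-occurrence similarity of two labels.

    Each argument is the collection of resources annotated with one label.
    Assumption: |t_i ∪ t_j| is read as the size of the set union.
    """
    a, b = set(resources_i), set(resources_j)
    union = a | b
    if not union:  # neither label annotates any resource
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: label -> resources it annotates
label_resources = {
    "rock": {"song1", "song2", "song3"},
    "metal": {"song2", "song3", "song4"},
}
sim = cooccurrence_similarity(label_resources["rock"], label_resources["metal"])
```

Here the two labels share two of four distinct resources, so the ratio is 2/4.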

Step1.2: calculating the label semantic similarity:

The semantic similarity of labels is calculated from their semantic relatedness in WordNet. For two labels $t_i$ and $t_j$ in the data set, the semantic similarity is shown in formula (2):

$$\text{semantic\_B}_{i,j} = \frac{2 \times \text{depth}(\text{LCS})}{N_1 + N_2 + 2 \times \text{depth}(\text{LCS})} \tag{2}$$

where LCS is the least common superconcept of $t_i$ and $t_j$; depth(LCS) is the number of nodes on the path from the LCS to the taxonomy root; $N_1$ is the number of nodes on the path from $t_i$ to the LCS, and $N_2$ is the number of nodes on the path from $t_j$ to the LCS;
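The quantities defined for Step1.2 (depth(LCS), N1, N2) are those used by the Wu-Palmer measure. Assuming the semantic similarity takes that form, a minimal sketch is given below; looking up the three quantities in WordNet is outside the sketch, and the function name `wu_palmer_similarity` is hypothetical.

```python
def wu_palmer_similarity(depth_lcs, n1, n2):
    """Wu-Palmer style similarity from precomputed taxonomy distances.

    depth_lcs: number of nodes from the LCS to the taxonomy root
    n1, n2:    number of nodes on the paths from each label to the LCS
    Assumption: the measure is 2*depth / (n1 + n2 + 2*depth).
    """
    denominator = n1 + n2 + 2 * depth_lcs
    if denominator == 0:  # degenerate case: both labels sit at the root
        return 0.0
    return 2 * depth_lcs / denominator


# Two labels one step below a common ancestor that is three nodes deep
semantic_sim = wu_palmer_similarity(depth_lcs=3, n1=1, n2=1)
```

The value approaches 1 when both labels are close to a deep common ancestor and falls toward 0 when the only shared ancestor is near the root.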

Step1.3: integrating co-occurrence and semantic similarity as the final similarity:

Step1.1 and Step1.2 capture, respectively, the co-occurrence similarity of labels and their relatedness in the semantic hierarchy. So that the two measures complement each other, they are combined linearly to give the final similarity of $t_i$ and $t_j$:

$$B_{i,j} = \gamma \times \text{cooccurrence\_B}_{i,j} + (1-\gamma) \times \text{semantic\_B}_{i,j}, \quad \gamma \in [0, 1] \tag{3}$$
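Formula (3) is an element-wise convex combination of two matrices. The sketch below shows it with NumPy; the function name `integrate_similarity`, the value of γ and the toy matrices are assumptions chosen for illustration.

```python
import numpy as np


def integrate_similarity(cooccurrence_B, semantic_B, gamma=0.5):
    """Linear integration of two label similarity matrices, as in formula (3)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    cooccurrence_B = np.asarray(cooccurrence_B, dtype=float)
    semantic_B = np.asarray(semantic_B, dtype=float)
    if cooccurrence_B.shape != semantic_B.shape:
        raise ValueError("both similarity matrices must have the same shape")
    return gamma * cooccurrence_B + (1.0 - gamma) * semantic_B


# Hypothetical 2-label example
cooc = [[1.0, 0.5], [0.5, 1.0]]
sem = [[1.0, 0.75], [0.75, 1.0]]
B = integrate_similarity(cooc, sem, gamma=0.4)
```

With γ = 1 only co-occurrence is used, and with γ = 0 only the WordNet-based similarity is used.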

Step1.4: using the label graph Laplacian as the regularization term:

Label graph regularization mainly assumes that if label i and label j are similar, then the implicit feature row vectors $U^{(1)}_{i*}$ and $U^{(1)}_{j*}$ of label i and label j in the implicit feature factor matrix $U^{(1)}$ of the labels, mined by the tensor decomposition process, will also be very close:

$$\frac{1}{2}\sum_{i,j} B_{i,j}\,\bigl\|U^{(1)}_{i*} - U^{(1)}_{j*}\bigr\|^2 = \frac{1}{2}\sum_{i,j}\sum_{d} B_{i,j}\,\bigl(U^{(1)}_{id} - U^{(1)}_{jd}\bigr)^2 = \sum_{d} U^{(1)T}_{*d} L\, U^{(1)}_{*d} = \operatorname{tr}\bigl(U^{(1)T} L\, U^{(1)}\bigr) \tag{4}$$

where $B_{i,j}$ is the final similarity of label i and label j calculated in Step1.3; $U^{(1)}_{id}$ is the element in row i and column d of the implicit feature factor matrix $U^{(1)}$ of the labels, and $U^{(1)}_{*d}$ denotes the entire d-th column of $U^{(1)}$; L is the graph Laplacian matrix, L = D - B, where D is a diagonal matrix whose i-th diagonal element is the sum of the elements in row i of the similarity matrix B, i.e. $D_{ii} = \sum_j B_{i,j}$; and tr(·) denotes the trace of a matrix.
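The link between the pairwise closeness assumption and the trace form can be checked numerically. The sketch below assumes a symmetric similarity matrix B and a factor of 1/2 on the pairwise sum, under which the pairwise penalty equals tr(Uᵀ L U) with L = D - B. All function names and the random test data are hypothetical.

```python
import numpy as np


def graph_laplacian(B):
    """L = D - B, with D the diagonal matrix of row sums of B."""
    B = np.asarray(B, dtype=float)
    D = np.diag(B.sum(axis=1))
    return D - B


def laplacian_penalty(U, B):
    """Trace form of the regularizer: tr(U^T L U)."""
    return np.trace(U.T @ graph_laplacian(B) @ U)


def pairwise_penalty(U, B):
    """Pairwise form: 0.5 * sum_ij B_ij * ||U_i* - U_j*||^2."""
    diff = U[:, None, :] - U[None, :, :]      # shape (n, n, k)
    return 0.5 * np.sum(B * np.sum(diff ** 2, axis=2))


rng = np.random.default_rng(0)
B = rng.random((4, 4))
B = (B + B.T) / 2                              # symmetric similarities
U = rng.random((4, 2))                         # 4 labels, 2 latent features
trace_value = laplacian_penalty(U, B)
pair_value = pairwise_penalty(U, B)
```

Both values agree up to floating-point error, which is why minimizing the trace term pulls the latent vectors of similar labels together.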

Step2: constructing the label-resource (Tag-Item) matrix C:

All labels in the training data set of the social annotation system are regarded as a document set. The weight with which label $t_h$ annotates resource $v_j$ is:

where Num(h, j) denotes the number of times label $t_h$ occurs in the whole label set, M is the total number of resources in the system, and $d_{hj}$ denotes the number of resources annotated with the label;
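The quantities defined for matrix C (a usage count, the total number of resources M, and the number of resources annotated with the label) resemble the ingredients of a TF-IDF weighting. The sketch below assumes the weight is the count multiplied by log(M / d) and assumes Num(h, j) counts how often label t_h is attached to resource v_j; the formula actually used by the method may differ. The function name `tag_item_weights` and the toy data are hypothetical.

```python
import math
from collections import Counter


def tag_item_weights(annotations):
    """TF-IDF-style label-resource weights (an assumed form, see lead-in).

    annotations: iterable of (label, resource) pairs, one per annotation event.
    Returns a dict mapping (label, resource) to its weight.
    """
    annotations = list(annotations)
    counts = Counter(annotations)                      # Num(h, j), assumed
    M = len({resource for _, resource in annotations})  # total resources
    resources_per_label = {}
    for label, resource in counts:
        resources_per_label.setdefault(label, set()).add(resource)
    return {
        (label, resource): n * math.log(M / len(resources_per_label[label]))
        for (label, resource), n in counts.items()
    }


# Hypothetical annotation log
log = [
    ("rock", "s1"), ("rock", "s1"), ("rock", "s2"),
    ("jazz", "s3"),
    ("pop", "s1"), ("pop", "s2"), ("pop", "s3"),
]
weights = tag_item_weights(log)
```

A label attached to every resource (here "pop") carries no discriminative information and receives weight 0 under this assumed form; the same construction with users in place of resources would give a label-user matrix.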

Step3: constructing the label-user (Tag-User) matrix D:

All labels in the training data set of the social annotation system are regarded as a document set. The weight with which label $t_k$ is used by user $u_i$ is:

where Num(k, i) denotes the number of times label $t_k$ occurs in the whole label set, N is the total number of users in the system, and $W_{ki}$ denotes the number of users who have used the label;

Step4: model construction and parameter optimization algorithm:

Step4.1: formalizing label recommendation as the basic CP tensor decomposition model:

Let $U^{(1)}$, $U^{(2)}$ and $U^{(3)}$ denote the implicit feature factor matrices corresponding to the label (Tag), resource (Item) and user (User) modes, respectively, where k is the rank of the tensor and k < min(|Tag|, |Item|, |User|). The label recommendation model based on CP tensor decomposition can then be defined as the following constrained optimization problem:

where $\mathcal{X}$ is the initial tensor constructed from the training data set of the social annotation system; λ is a regularization parameter; $\|\cdot\|_F$ is the Frobenius ($L_2$) norm; the Tikhonov regularization term prevents the objective from overfitting and provides a unique solution; and the non-negative indicator tensor has the same size as $\mathcal{X}$, with an entry equal to 1 if the corresponding entry of $\mathcal{X}$ holds an observed annotation behavior value and 0 otherwise;
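To make the ingredients of Step4.1 concrete, the sketch below evaluates one common form of a masked, Tikhonov-regularized CP objective. The factor of 1/2, the indicator tensor name `W`, and the function names are assumptions; the constraints and exact weighting of the objective used by the method are not reproduced here.

```python
import numpy as np


def cp_reconstruct(U1, U2, U3):
    """Rank-k CP model: X_hat[t, v, u] = sum_r U1[t, r] * U2[v, r] * U3[u, r]."""
    return np.einsum("tr,vr,ur->tvu", U1, U2, U3)


def masked_cp_loss(X, W, U1, U2, U3, lam):
    """Squared error on observed entries plus Tikhonov regularization.

    W is a 0/1 indicator tensor with the same shape as X (assumed name).
    """
    residual = W * (X - cp_reconstruct(U1, U2, U3))
    fit = 0.5 * np.sum(residual ** 2)
    reg = 0.5 * lam * sum(np.sum(U ** 2) for U in (U1, U2, U3))
    return fit + reg


rng = np.random.default_rng(0)
n_tags, n_items, n_users, k = 5, 4, 3, 2       # hypothetical sizes, k < min(...)
U1 = rng.random((n_tags, k))
U2 = rng.random((n_items, k))
U3 = rng.random((n_users, k))
X = cp_reconstruct(U1, U2, U3)                  # a tensor the model fits exactly
W = np.ones_like(X)                             # every entry treated as observed
loss_no_reg = masked_cp_loss(X, W, U1, U2, U3, lam=0.0)
loss_reg = masked_cp_loss(X, W, U1, U2, U3, lam=0.1)
```

Because X is generated from the same factors, the fit term vanishes and only the regularization term remains when λ > 0.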

Step4.2: the auxiliary information constructed in Step1 to Step3 is integrated as regularization terms and added to the model constructed in Step4.1, which transforms the model of Step4.1 into the following constrained optimization problem, formula (8):

where α controls the weight with which the inter-label similarity auxiliary information participates in the decomposition, and β controls the weight with which the label-resource matrix C and the label-user matrix D participate in the decomposition as auxiliary information.

Because formula (8) has no closed-form solution, the invention uses the ADMM (Alternating Direction Method of Multipliers) algorithm to optimize the objective function. The objective function, formula (8), is written in partial Lagrangian form as shown in formula (9):

where $Z^{(n)} \ge 0$, n = 1, 2, 3, are auxiliary variables introduced for solving formula (9); $Y^{(n)}$, n = 1, 2, 3, are the Lagrangian multiplier matrices; η is a penalty parameter; and $\langle \cdot, \cdot \rangle$ is the inner product operation.

As shown in FIG. 3, Step4.3: optimizing the objective with ADMM:

The invention solves formula (9) with an iterative scheme that updates the variables in turn and then updates $\{Y^{(1)}, Y^{(2)}, Y^{(3)}\}$. The specific steps are as follows:

(1) Updating $Z^{(1)}, Z^{(2)}, Z^{(3)}$:

Write Z, Y and U column by column, e.g. $Z = [Z_1 \ldots Z_m]$, where $Z_i$ is the i-th column of Z, so that $Z_i$, $Y_i$ and $U_i$ are all column vectors. $Z^{(n)}$ can then be updated efficiently by solving optimization problems (11) and (12):

Solving (11) gives:

Solving (12) gives:

where I is the identity matrix.

(2) Updating $U^{(1)}, U^{(2)}, U^{(3)}$:

where $E^{(n)} = \bigl(U^{(N)} \odot \cdots \odot U^{(n+1)} \odot U^{(n-1)} \odot \cdots \odot U^{(1)}\bigr)^{T}\big|_{N=3}$, $\odot$ is the Khatri-Rao product, and $X_{(n)}$ is the mode-n unfolding matrix of the tensor. The update formula for $U^{(n)}$ is then obtained as follows:
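As a side illustration of the notation used in this step (not the update formula itself), the sketch below builds the Khatri-Rao product and a mode-n unfolding with NumPy and checks the standard CP identity that the unfolding of a rank-k tensor equals U⁽ⁿ⁾ E⁽ⁿ⁾. The unfolding convention (earlier modes vary fastest along the columns) and all function names are assumptions.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x k) and B (J x k)."""
    I, k = A.shape
    J, k2 = B.shape
    if k != k2:
        raise ValueError("A and B need the same number of columns")
    # Row i * J + j of the result holds A[i, :] * B[j, :].
    return np.einsum("ir,jr->ijr", A, B).reshape(I * J, k)


def unfold(X, n):
    """Mode-n unfolding; remaining modes are flattened with earlier modes fastest."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def E_matrix(factors, n):
    """E^(n): transposed Khatri-Rao product of all factors except U^(n),
    taken from the last mode down to the first."""
    others = [U for m, U in enumerate(factors) if m != n][::-1]
    kr = others[0]
    for U in others[1:]:
        kr = khatri_rao(kr, U)
    return kr.T


rng = np.random.default_rng(0)
factors = [rng.random((4, 2)), rng.random((5, 2)), rng.random((6, 2))]
X = np.einsum("tr,vr,ur->tvu", *factors)       # exact rank-2 CP tensor
unfolding_matches = [
    np.allclose(unfold(X, n), factors[n] @ E_matrix(factors, n)) for n in range(3)
]
```

Modes are indexed from 0 in the code, so `n = 0, 1, 2` correspond to the label, resource and user modes.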

(3) Updating

WhereinIs thatIs equal to

(4) Updating $Y^{(n)}$:

(5) Updating η:

η can be updated adaptively to accelerate the optimization algorithm, according to formula (21):

$$\eta_{t+1} = \min(\rho\,\eta_t,\ \eta_{\max}) \tag{21}$$
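Formula (21) grows the penalty parameter geometrically until it reaches a cap. A short sketch of the resulting schedule follows; the starting value, ρ and η_max are hypothetical values chosen so the arithmetic is easy to follow.

```python
def update_eta(eta, rho, eta_max):
    """One adaptive penalty update, as in formula (21)."""
    return min(rho * eta, eta_max)


# Hypothetical settings: start at 1.0, double each iteration, cap at 10.0
eta = 1.0
schedule = []
for _ in range(5):
    eta = update_eta(eta, rho=2.0, eta_max=10.0)
    schedule.append(eta)
# schedule is [2.0, 4.0, 8.0, 10.0, 10.0]
```

A growth factor ρ > 1 tightens the coupling between the factor matrices and their auxiliary copies as the iterations proceed, while the cap keeps the subproblems well conditioned.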

Step5: label recommendation:

After the joint decomposition of formula (9), the completed tensor is obtained. Each of its elements represents a quadruple (t, v, u, p), where p is the likelihood that user u annotates resource v with label t. Labels can then be recommended according to the weights corresponding to (u, v): if N labels are to be recommended when user u annotates resource v, the labels with the N highest weights are selected.
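The Top-N selection of Step5 amounts to sorting one fiber of the completed tensor. A minimal sketch follows, assuming the tensor axes are ordered (label, resource, user) as in Step4.1; the function name `recommend_top_n` and the toy scores are hypothetical.

```python
import numpy as np


def recommend_top_n(X_hat, user, resource, n, labels=None):
    """Return the n labels with the highest predicted weight for (user, resource).

    X_hat: completed tensor with axes (label, resource, user).
    labels: optional list of label names; indices are returned if omitted.
    """
    scores = X_hat[:, resource, user]
    order = np.argsort(-scores, kind="stable")[:n]   # descending by weight
    if labels is None:
        return order.tolist()
    return [labels[i] for i in order]


# Hypothetical completed tensor: 4 labels, 2 resources, 3 users
X_hat = np.zeros((4, 2, 3))
X_hat[:, 1, 2] = [0.1, 0.9, 0.4, 0.7]
top2 = recommend_top_n(X_hat, user=2, resource=1, n=2)
top2_named = recommend_top_n(
    X_hat, user=2, resource=1, n=2, labels=["rock", "jazz", "pop", "metal"]
)
```

In the toy data the two largest weights for the chosen pair belong to the labels at indices 1 and 3.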

The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims

1. A label recommendation method fusing multi-information source coupling tensor decomposition, characterized by comprising the following specific steps:

Step1: constructing a label similarity matrix B based on a linear integration of two similarity measures, label co-occurrence and semantic correlation;

Step2: constructing a label-resource (Tag-Item) matrix C;

Step3: constructing a label-user (Tag-User) matrix D;

Step4: constructing the model and optimizing its parameters;

Step5: recommending labels.

2. The label recommendation method fusing multi-information source coupling tensor decomposition according to claim 1, wherein Step1 comprises the following steps:

Step1.1: calculating the label co-occurrence similarity:

$$\text{cooccurrence\_B}_{i,j} = \frac{|t_i \cap t_j|}{|t_i \cup t_j|} \tag{1}$$

where $|t_i \cap t_j|$ denotes the number of resources annotated with both $t_i$ and $t_j$, and $|t_i \cup t_j|$ denotes the total number of resources annotated with $t_i$ or $t_j$;

Step1.2: calculating the label semantic similarity:

The semantic similarity of labels is calculated from their semantic relatedness in WordNet. For two labels $t_i$ and $t_j$ in the data set, the semantic similarity is shown in formula (2):

$$\text{semantic\_B}_{i,j} = \frac{2 \times \text{depth}(\text{LCS})}{N_1 + N_2 + 2 \times \text{depth}(\text{LCS})} \tag{2}$$

Step1.3: integrating co-occurrence and semantic similarity as the final similarity:

$$B_{i,j} = \gamma \times \text{cooccurrence\_B}_{i,j} + (1-\gamma) \times \text{semantic\_B}_{i,j}, \quad \gamma \in [0, 1] \tag{3}$$

Step1.4: using the label graph Laplacian as the regularization term:

Label graph regularization assumes that if label i and label j are similar, then the implicit feature row vectors $U^{(1)}_{i*}$ and $U^{(1)}_{j*}$ of label i and label j in the implicit feature factor matrix $U^{(1)}$ of the labels, mined by the tensor decomposition process, will be very close:

$$\frac{1}{2}\sum_{i,j} B_{i,j}\,\bigl\|U^{(1)}_{i*} - U^{(1)}_{j*}\bigr\|^2 = \frac{1}{2}\sum_{i,j}\sum_{d} B_{i,j}\,\bigl(U^{(1)}_{id} - U^{(1)}_{jd}\bigr)^2 = \sum_{d} U^{(1)T}_{*d} L\, U^{(1)}_{*d} = \operatorname{tr}\bigl(U^{(1)T} L\, U^{(1)}\bigr) \tag{4}$$

where $B_{i,j}$ is the final similarity of label i and label j calculated in Step1.3; $U^{(1)}_{id}$ is the element in row i and column d of the implicit feature factor matrix $U^{(1)}$ of the labels, and $U^{(1)}_{*d}$ denotes the entire d-th column of $U^{(1)}$; L is the graph Laplacian matrix, L = D - B, where D is a diagonal matrix whose i-th diagonal element is the sum of the elements in row i of the similarity matrix B, i.e. $D_{ii} = \sum_j B_{i,j}$; and tr(·) denotes the trace of a matrix.

3. The label recommendation method fusing multi-information source coupling tensor decomposition according to claim 1, wherein Step2 comprises the following steps:

4. The label recommendation method fusing multi-information source coupling tensor decomposition according to claim 1, wherein Step3 comprises the following steps:

where Num(k, i) denotes the number of times label $t_k$ occurs in the whole label set, N is the total number of users in the system, and $W_{ki}$ denotes the number of users who have used the label.

5. The label recommendation method fusing multi-information source coupling tensor decomposition according to claim 1, wherein Step4 comprises the following steps:

where $\mathcal{X}$ is the initial tensor constructed from the training data set of the social annotation system; λ is a regularization parameter; $\|\cdot\|_F$ is the Frobenius ($L_2$) norm; the Tikhonov regularization term prevents the objective from overfitting and provides a unique solution; and the non-negative indicator tensor has the same size as $\mathcal{X}$, with an entry equal to 1 if the corresponding entry of $\mathcal{X}$ holds an observed annotation behavior value and 0 otherwise;

where α controls the weight with which the inter-label similarity auxiliary information participates in the decomposition, and β controls the weight with which the label-resource matrix C and the label-user matrix D participate in the decomposition as auxiliary information;

where $Z^{(n)} \ge 0$, n = 1, 2, 3, are auxiliary variables introduced for solving formula (9); $Y^{(n)}$, n = 1, 2, 3, are the Lagrangian multiplier matrices; η is a penalty parameter; and $\langle \cdot, \cdot \rangle$ is the inner product operation;

Step4.3: optimizing the objective, formula (9), with ADMM:

(1) Updating $Z^{(1)}, Z^{(2)}, Z^{(3)}$:

To update $Z^{(1)}, Z^{(2)}, Z^{(3)}$, only the following optimization problem needs to be solved, as shown in formula (10):

Write Z, Y and U column by column, e.g. $Z = [Z_1 \ldots Z_m]$, where $Z_i$ is the i-th column of Z, so that $Z_i$, $Y_i$ and $U_i$ are all column vectors. $Z^{(n)}$ can then be updated efficiently by solving optimization problems (11) and (12):

Solving (11) gives:

Solving (12) gives:

where I is an identity matrix;

(2) Updating $U^{(1)}, U^{(2)}, U^{(3)}$:

To update $U^{(1)}, U^{(2)}, U^{(3)}$, formula (9) is rewritten as formula (15):

where $E^{(n)} = \bigl(U^{(N)} \odot \cdots \odot U^{(n+1)} \odot U^{(n-1)} \odot \cdots \odot U^{(1)}\bigr)^{T}\big|_{N=3}$, $\odot$ is the Khatri-Rao product, and $X_{(n)}$ is the mode-n unfolding matrix of the tensor. The update formula for $U^{(n)}$ is then obtained as follows:

(3) Updating

WhereinIs thatIs equal to

(4) Updating $Y^{(n)}$:

(5) Updating η:

$$\eta_{t+1} = \min(\rho\,\eta_t,\ \eta_{\max}) \tag{21}$$

6. The label recommendation method fusing multi-information source coupling tensor decomposition according to claim 1, wherein Step5 comprises the following steps:

After the joint decomposition of formula (9), the completed tensor is obtained. Each of its elements represents a quadruple (t, v, u, p), where p is the likelihood that user u annotates resource v with label t. Labels are recommended according to the weights corresponding to (u, v): if N labels are to be recommended when user u annotates resource v, the labels with the N highest weights are selected.