CN112784173B

CN112784173B - Recommendation system scoring prediction method based on self-attention confrontation neural network

Info

Publication number: CN112784173B
Application number: CN202110217932.1A
Authority: CN
Inventors: 马康康; 王庆先; 黄庆; 常奥
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2022-06-10
Anticipated expiration: 2041-02-26
Also published as: CN112784173A

Abstract

The invention discloses a recommendation system score prediction method based on a self-attention confrontation neural network, which comprises the following steps of: s1: collecting user information, project information and project scoring data of a user, and constructing a high-dimensional sparse scoring matrix and a corresponding masking matrix; s2: generating distribution information on a high-dimensional sparse scoring matrix; s3: building a scoring prediction model of a recommendation system by using a self-attention antagonistic neural network, and training the scoring prediction model; s4: and evaluating the high-dimensional sparse scoring matrix to complete the scoring prediction of the user on the project. The invention combines the self-attention mechanism and the discrimination automatic encoder and provides a specific method applied to a recommendation system. And extracting the distribution information of the score data from the mask matrix of the high-dimensional sparse matrix by using a self-attention distinguishing self-encoder, and providing more distribution information for subsequent learning score data characteristics and prediction score data.

Description

Recommendation system scoring prediction method based on self-attention confrontation neural network

Technical Field

The invention belongs to the technical field of recommendation systems, and particularly relates to a recommendation system score prediction method based on a self-attention confrontation neural network.

Background

The rapid development of the internet causes the problem of information overload, and the efficiency of obtaining useful information by a user is seriously influenced. To solve this problem, recommendation system technology has gained a great deal of research effort. In a recommendation system, user-item scoring data is the underlying data source. Because there are a large number of users and projects in the system, it is not possible for each user to rate all projects, and so there is very little scoring data. These scoring data are typically represented using a high-dimensional sparse matrix, in which only a small number of elements are known. To solve this high-dimensional sparse matrix, a number of collaborative filtering-based methods are proposed. The methods mainly utilize the existing evaluation data to extract the low-dimensional hidden feature representation of the users and the projects, and have the following defects: firstly, the relationship between the score data of the local regions in the high-dimensional sparse matrix data is not fully utilized; secondly, the overall distribution characteristics of the scoring data in the high-dimensional sparse matrix are not considered.

Disclosure of Invention

The invention aims to solve the problem of score prediction among user items and provides a recommendation system score prediction method based on a self-attention confrontation neural network.

The technical scheme of the invention is as follows: a recommendation system scoring prediction method based on a self-attention confrontation neural network comprises the following steps:

s1: collecting user information, project information and project scoring data of a user, and constructing a high-dimensional sparse scoring matrix and a corresponding masking matrix;

s2: extracting the distribution characteristics of a mask matrix by using a self-attention encoder to generate distribution information about a high-dimensional sparse scoring matrix;

s3: building a scoring prediction model of a recommendation system by using a self-attention antagonistic neural network, and training the scoring prediction model according to distribution information and a high-dimensional sparse scoring matrix;

s4: and evaluating the high-dimensional sparse scoring matrix by using the trained scoring prediction model to complete the scoring prediction of the project by the user.

The invention has the beneficial effects that:

(1) the invention combines the self-attention mechanism and the discrimination automatic encoder and provides a specific method applied to a recommendation system. And extracting the distribution information of the score data from the mask matrix of the high-dimensional sparse matrix by using a self-attention distinguishing self-encoder, and providing more distribution information for subsequent learning score data characteristics and prediction score data. Meanwhile, the model adopts a convolution neural network to extract the distribution characteristics of the local area in the mask matrix. Besides, the dependency relationship between all data in the mask matrix is calculated by using a self-attention mechanism, and the global distribution characteristic is obtained. And finally, the distribution characteristics of the local area and the global distribution characteristics are fused to train the model, so that the distribution information of the mask matrix can be effectively and comprehensively acquired.

(2) The invention establishes a prediction model based on the antagonistic neural network to estimate the missing scoring data in the high-dimensional sparse matrix. The distribution information of the high-dimensional sparse matrix and the scoring data are fused to be used as training data, and a self-attention mechanism is fused into the generator model, so that the dependency relationship between the scoring data sensed by the neural network is favorably resisted, and the characteristics of the scoring data are better learned. Meanwhile, the mean square error between the predicted scoring data and the real scoring data is used as a regularization term of the objective function of the antagonistic neural network, and the prediction precision of the model is improved.

Further, step S1 includes the following sub-steps:

s11: collecting user information, project information and project grading data of users to obtain a user set

Item set

And user scoring set of items

Wherein n denotes the number of users, m denotes the number of items, u₁,u₂,…,u_nRepresenting 1 st to nth users, i₁,i₂,…,i_mDenotes the 1 st to m-th items, s_u,iRepresents the score of the user u on the item i, and v represents the maximum value of the score;

s12: aggregation of ratings from user to items

Constructing a high-dimensional sparse scoring matrix R, wherein each element R_u,iThe expression of (a) is:

s13: according to the high-dimensional sparse scoring matrix R, constructing a corresponding mask matrix H epsilon {0,1}^n×mWherein each element h_u,iThe expression of (a) is:

wherein 1 indicates that the user u has a known score for the item i, 0 indicates that the user u has an unknown score for the item i,

represents the set of known elements in the R,

representing a set of unknown elements.

The beneficial effects of the further scheme are as follows: in the invention, a recommendation system model is designed aiming at the internet application based on the grading feedback, and is used for predicting the missing grading data and providing the possibly interested items for the user. First, user information, project information, and user rating data for projects, such as movie rating, joke rating, and web service quality rating, need to be collected from a real application. Matrix array

In (1),

and

respectively represent a known element set and an unknown element set in R, because

Thus R is a high-dimensional sparse matrix. The mask matrix H can reflect the overall distribution characteristics of the known scores in R, and each row vector H_uE H can reflect the distribution characteristics of the scoring data of the user u.

Further, step S2 includes the following sub-steps:

s21: setting each row vector { H) in the mask matrix H₁,…,h_u,…,h_nObey the first data distribution q (h);

s22: defining a self-attention encoder for distributing samples h from a first data distribution q (h)_uE.g. H turnTransformation into a corresponding low-dimensional hidden feature representation z_uWherein z is_uOne sample of the second data distribution q (z);

s23: representing the low-dimensional hidden features from the second data distribution q (z) as z_uAs input to a self-attention decoder, and generates samples h_uOf the reconstructed sample

S24: calculate sample h_uAnd reconstructing the sample

The reconstruction error rec _ error in between;

s25: setting a distribution p (z) of a known analytic expression, and training a self-attention encoder and a self-attention decoder according to the distance between the distribution p (z) of the known analytic expression and a second data distribution q (z);

s26: and converting the mask vector into a low-dimensional hidden feature representation of the distribution p (z) conforming to the known analytic expression by using a trained self-attention encoder, and converting sample data of the distribution p (z) of the known analytic expression into distribution information by using a trained self-attention decoder to generate the distribution information about the high-dimensional sparse scoring matrix.

The beneficial effects of the further scheme are as follows: in the present invention, assume that each row vector H in H₁,…,h_u,…,h_nObey a data distribution q (h). The corresponding low dimensional hidden feature matrix is represented as

Wherein the row vector z₁,…,z_u,…,z_nD represents the dimension of the hidden feature representation. The low-dimensional hidden feature representation is assumed to follow an analytically unknown data distribution q (z).

Further, step S22 includes the following sub-steps:

s221: randomly sampling t mask vectors { H) from a mask matrix H by using a small batch gradient descent algorithm_u,h_u+1…,h_u+tForming an input matrix X;

s222: defining a self-attention encoder, taking an input matrix X as an input of the self-attention encoder, and distributing samples h from a first data distribution q (h)_uE H is converted into corresponding low-dimensional hidden feature representation z_uWherein z is_uIs a sample of the second data distribution q (z).

Further, in step S222, the self-attention encoder includes a convolutional layer, a self-attention layer, and a pooling layer;

the method for constructing the convolutional layer comprises the following steps: the convolutional layer contains K1 × 1 convolutional kernels, and the feature map E of the input matrix X is extracted by using the convolutional kernels, and the calculation formula is as follows:

wherein, represents a two-dimensional convolution calculation,

the parameters representing the k-th convolution kernel,

represents the deviation of the kth convolution kernel, σ (·) represents the activation function;

the method for constructing the self-attention layer comprises the following steps: calculating a dependency relationship matrix Y between each element in the feature mapping E, and fusing the dependency relationship matrix Y into the feature mapping E to obtain a fused feature I, wherein each element Y in the feature mapping E_pAnd the calculation formula of the fusion characteristic I is respectively as follows:

wherein e is_pP-th one representing a feature map EElement, e_qQ element, y, representing feature map E_pThe p-th element representing Y, f (-) represents a function for calculating any two-point similarity relationship, g (-) represents a mapping function, γ (E) represents a normalization factor,

the parameters representing the kth self-attention layer convolution kernel,

the deviation of the kth self-attention layer convolution kernel;

the method for constructing the pooling layer comprises the following steps: inputting the fusion feature I into a pooling layer with a pooling kernel size of c × c and a sliding step of a, wherein the expression of the pooling layer is as follows:

Z＝MeanPooling2D(I)

where MeanPooling2D (. cndot.) represents average pooling, Z represents a low-dimensional implicit feature matrix, and each row vector { Z ·_u,z_u+1…,z_u+tDenotes a mask vector h_u,h_u+1…,h_u+tThe low dimensional implicit feature representation of.

The beneficial effects of the above further scheme are: in the invention, in a self-attention layer, firstly, a dependency matrix Y between each element in a feature mapping E is calculated, the dependency matrix Y is fused into the feature mapping E, and global features are introduced for local features extracted from each receptive field in convolution operation, so that richer information is brought to a subsequent convolution layer; wherein g (-) represents a mapping function for computing a feature vector of a point.

Further, in step S23, a sample { h } is generated_u,h_u+1…,h_u+tCorresponding reconstructed data

Using matrices formed from reconstructed data

Expressed, the calculation formula is:

wherein D is₀Input of a representation model, D_l-1Denotes the output of layer l-1, D_lRepresents the output of the l-th layer, U_lRepresents the output of the l-th upsampling layer, U_L-1Represents the output of the L-1 st upsampling layer, K represents the number of convolution kernels, σ (-) represents the activation function,

representing the parameters of the kth convolution kernel in layer l-1,

represents the deviation of the kth convolution kernel in the L-1 layer, L represents the number of deconvolution layers, Z represents the low-dimensional hidden feature matrix, UpSamplling 2D (·) represents the upsampled layer,

parameters representing the kth convolution kernel in the lth layer,

indicating the deviation of the kth convolution kernel in the L-th layer.

D_lAnd U_lThe intermediate result of the calculation step is shown, so that the detailed process of the calculation step is convenient to explain, and the calculation reconstruction data has a direct relation.

Further, in step S24, the calculation formula of the reconstruction error rec _ error is:

wherein, X represents an input matrix,

representing a reconstructed data matrix.

Further, in step S25, the step of calculating the distance C (p (z)) between the distribution p (z) of the known analytic expression and the second data distribution q (z), q (z) includes the following sub-steps:

s251: setting a distribution p (z) of a known analytical formula;

s252: establishing a discriminator by utilizing a full-connection neural network;

s253: randomly sampling t low-dimensional hidden feature representations from a distribution p (z) of known analytic expressions

Randomly sampling t low-dimensional hidden feature representations { z ] from the distribution q (z)₁,…,z_t}，

S254: representing low-dimensional hidden features

And

as the input of the discriminator, and outputs the discrimination result;

s253: based on the determination result, the distance C (p (z), q (z)) between the distribution p (z) of the known analytical formula and the second data distribution q (z) is calculated by the following formula:

wherein D (-) represents the discrimination result, log (-) represents the logarithmic function,

indicating that a mathematical expectation is calculated on a second data distribution q (z) of z,

is shown in

Data distribution of

The mathematical expectation value is calculated.

The beneficial effects of the further scheme are as follows: in the present invention, to enable a decoder to generate new mask vectors

The decoder must receive the other vectors from q (z)

However, the analytical formula of q (z) is unknown, and it is difficult to directly obtain other data samples. For indirectly obtaining other samples z of the q (z) distribution^*Assuming that a distribution p (z) for which the analytic expression is known is the equivalent distribution of q (z), the samples in q (z) are obeyed to p (z) and the samples in p (z) are also obeyed to q (z). At this time, a new mask vector can be generated by using the samples in p (z) as input data of the decoder. To satisfy this assumption, the distance between q (z) and p (z) is taken as part of the encoder and decoder objective function, with which the parameters of the encoder and decoder are updated, enabling the encoder to convert the data distribution q (h) to an equivalent distribution p (z) of q (z), and the decoder to convert the low-dimensional hidden feature representation output by the encoder to a new mask vector. After training, the transformed low-dimensional implicit features represent samples that are more and more similar to p (z).

Further, in step S26, the target function expression of the self-attention encoder is rec _ error; the target function expression of the discriminator is dis _ loss ═ C (p (z), q (z)); the objective function expression of the self-attention decoder is

Where rec _ error represents sample h_uAnd reconstructing the sample

C (p (z), q (z)) represents the distance between the distribution p (z) of the known analytical formula and the second data distribution q (z), D (-) represents the discrimination result, log (-) represents the logarithmic function,

is shown in

Data distribution of

The mathematical expectation value is calculated.

The beneficial effects of the further scheme are as follows: in the present invention, after training is completed, the encoder can convert the mask vector into a low-dimensional hidden feature representation that obeys the data distribution p (z), and the decoder can convert the sample data from p (z) into useful distribution information. Since the analytic expression of p (z) is known, a large amount of low-dimensional hidden feature representation data can be uniformly obtained from the low-dimensional hidden feature representation data, and more comprehensive distribution information about a high-dimensional sparse scoring matrix is obtained through the conversion of a decoder.

Further, step S3 includes the following sub-steps:

s31: setting row vectors { R) in high-dimensional sparse scoring matrix R₁,…,r_u,…,r_nObey the true data distribution p_rAnd the row vector r₁,…,r_u,…,r_nAs a true score vector, where r_uA score vector representing user u;

s32: constructing a self-attention confrontation neural network model, wherein a generator of the self-attention confrontation neural network model is in a self-attention self-encoder structure, a discriminator is in a fully-connected neural network, and a prediction score vector generated by the generator is set to obey the generator distribution p_g；

S33: sample to be reconstructed

And a score vector r of the user_uPerforming fusion to obtain fusion data

The calculation formula for fusion is:

wherein h is_uAn indication sample, < > indicates a logical operation;

s34: fusing data

As input to the generator, a prediction score vector for each user is calculated

The calculation formula is as follows:

wherein,

n represents the number of users, SAAE (-) represents the self-attention self-encoder;

s35: vector prediction scores

And a truth score vector { r₁,…,r_u,…,r_nAs the input of the discriminator, the real data distribution p is discriminated_rSum generator distribution p_gThe difference between the grading sample data and the real grading data and outputting a judgment result;

s36: training a scoring prediction model according to the judgment result until the scoring prediction model is converged;

in step S35, the sparsification method is used to facilitate the differentiation by the discriminator, and the expression is:

wherein,

representing a prediction score vector of the user u after sparsification;

in step S36, the discriminator is trained according to the discrimination result, and the objective function expression is:

wherein, J^DisAn objective function of the discriminator is represented,

is shown at r_uTrue data distribution p of_rThe mathematical expectation value is calculated as above,

is shown in

Generator distribution p of_gCalculating a mathematical expected value, and Dis (·) represents a discrimination result input by the discriminator;

training the generator according to the discrimination result, wherein the target function expression is as follows:

wherein, J^GenAn objective function of the generator is represented,

is shown in

Data distribution p of_gUpper calculated mathematical expected value，

λ denotes regularization coefficient, ψ denotes regularization, r_u,iRepresenting elements in a high-dimensional sparse scoring matrix R,

representing the predicted score of user u for item i,

representing a known set of scoring data.

The beneficial effects of the further scheme are as follows: in the present invention, in order to learn the features of the score data in R, first assume a row vector { R ] in R₁,…,r_u,…,r_nObey the true data distribution p_r(ii) a Establishing a self-attention confrontation neural network model, wherein a generator uses a self-attention self-encoder structure, and a discriminator uses a full-connection neural network; to enable the generator to generate a true score vector, it is assumed that the predicted score vector generated by the generator obeys the generator distribution p_g. If can enable p_gAnd p_rThe same, the prediction score vector generated by the generator is real at this time; to make p stand for_gClose to p_rUntil the same, the distance between the two distributions is taken as the objective function of the model. Updating the parameters of the generator by using the target function to ensure that the distribution distance is smaller and smaller; due to p_gAnd p_rThe analytical formula (2) is unknown, and the distance between the two cannot be calculated by a clear formula. Therefore, the difference between the sample data of the two distributions and the sample data of the real distribution is estimated through the discriminator and then is substituted into the Wasserstein distance formula to approximately calculate p_gAnd p_rThe distance of (d); to obtain sample data for the producer distribution, the input to the producer is first obtained. Decoder sample mask vector learned from S2

And the mask vector and the score vector r of the user_uFusion, providing more distribution information for scoring dataThe fused data is used as the input of the generator. The generator calculates the prediction scoring vector of each user

Vector prediction scores

And a truth score vector { r₁,…,r_u,…,r_nThe data is used as the input of a discriminator for evaluating the difference between the two types of data and the real scoring data; since the true score vector has only a small amount of known score data, a large number of unknown scores are filled with 0, and the predicted score values of all the predicted score vectors are difficult for the discriminator to discriminate the true and false of the two. Therefore, the prediction score vector is thinned by using the mask vector, only the prediction score corresponding to the known score data in the prediction score vector is reserved, so that the consent is kept in the form, and only the characteristics of the true and false score data are reserved on the data, thereby being convenient for the distinguishing of a discriminator. The sparse prediction score vector and the real score vector are used as the input of a discriminator, and the discrimination result is output; and finally training a model: the discriminants are trained first, the generators are trained second, and regularization (which represents the mean square error between the prediction scores and the true scores) is used until the model converges.

Drawings

FIG. 1 is a flow chart of a recommendation system score prediction method.

Detailed Description

The embodiments of the present invention will be further described with reference to the accompanying drawings.

As shown in fig. 1, the present invention provides a recommendation system score prediction method based on a self-attention confrontation neural network, comprising the following steps:

s1: collecting user information, project information and project grading data of a user, and constructing a high-dimensional sparse grading matrix and a corresponding masking matrix;

In the embodiment of the present invention, as shown in fig. 1, step S1 includes the following sub-steps:

Item set

And user scoring set of items

s12: aggregation of ratings according to user to items

Constructing a high-dimensional sparse scoring matrix R, wherein each element R_u,iThe expression of (c) is:

s13: according to the high-dimensional sparse scoring matrix R, a corresponding mask matrix H epsilon {0,1} is constructed^n×mWherein each element h_u,iThe expression of (a) is:

represents the set of known elements in the R,

representing a set of unknown elements.

In the invention, a recommendation system model is designed aiming at the internet application based on the grading feedback, and is used for predicting the missing grading data and providing the possibly interested items for the user. First, user information, project information, and user rating data for projects, such as movie rating, joke rating, and web service quality rating, need to be collected from a practical application. Matrix array

In (1),

and

In the embodiment of the present invention, as shown in fig. 1, step S2 includes the following sub-steps:

s22: defining a self-attention encoder to receive a first number from a first numberSample h according to distribution q (h)_uE H is converted into corresponding low-dimensional hidden feature representation z_uWherein z is_uOne sample of the second data distribution q (z);

S24: calculate sample h_uAnd reconstructing the sample

The reconstruction error rec _ error in between;

In the present invention, assume that each row vector H in H₁,…,h_u,…,h_nObey a data distribution q (h). The corresponding low dimensional hidden feature matrix is represented as

Wherein the row vector z₁,…,z_u,…,z_nD represents the dimension of the hidden feature representation. It is assumed that the low dimensional implicit features represent a data distribution q (z) subject to an analytical unknowns.

In the embodiment of the present invention, as shown in fig. 1, step S22 includes the following sub-steps:

In the embodiment of the present invention, as shown in fig. 1, in step S222, the self-attention encoder includes a convolutional layer, a self-attention layer, and a pooling layer;

wherein, represents a two-dimensional convolution calculation,

the parameters representing the k-th convolution kernel,

wherein e is_pP-th element, E, representing a feature map E_qQ-th element, y, representing a feature map E_pThe p-th element representing Y, f (-) represents a function for calculating any two-point similarity relationship, g (-) represents a mapping function, γ (E) represents a normalization factor,

the parameters representing the kth self-attention layer convolution kernel,

the deviation of the kth self-attention layer convolution kernel;

the method for constructing the pooling layer comprises the following steps: inputting the fusion feature I into a pooling layer with a pooling kernel size of c × c and a sliding stride of a, wherein the expression of the pooling layer is as follows:

Z＝MeanPooling2D(I)

In the invention, in the self-attention layer, firstly, a dependency matrix Y between each element in a feature mapping E is calculated, the dependency matrix Y is fused into the feature mapping E, and global features are introduced for local features extracted from each receptive field in convolution operation, so that richer information is brought to the following convolution layer; wherein g (-) represents a mapping function for computing a feature vector of a point.

In the embodiment of the present invention, as shown in fig. 1, in step S23, samples { h } are generated_u,h_u+1…,h_u+tCorresponding reconstructed data

Using matrices formed from reconstructed data

Expressed, the calculation formula is:

representing the parameters of the kth convolution kernel in layer l-1,

parameters representing the kth convolution kernel in the lth layer,

indicating the deviation of the kth convolution kernel in the L-th layer.

In the embodiment of the present invention, as shown in fig. 1, in step S24, the calculation formula of the reconstruction error rec _ error is as follows:

wherein, X represents an input matrix,

representing a reconstructed data matrix.

In the embodiment of the present invention, as shown in fig. 1, in step S25, calculating a distance C (p (z)) between a distribution p (z) of a known analytic expression and a second data distribution q (z), q (z)) includes the following sub-steps:

s251: setting a distribution p (z) of a known analytical formula;

S254: representing low-dimensional hidden features

And

as the input of the discriminator, and outputs the discrimination result;

is shown in

Data distribution of

The mathematical expectation value is calculated.

In the present invention, to enable a decoder to generate new mask vectors

The decoder must receive the other vectors from q (z)

In the embodiment of the present invention, as shown in fig. 1, in step S26, the target function expression of the self-attention encoder is rec _ error; the objective function expression of the discriminator is dis _ loss ═ C (p (z), q (z)); the objective function expression of the self-attention decoder is

Where rec _ error represents sample h_uAnd reconstructing the sample

is shown in

Data distribution of

The mathematical expectation value is calculated.

In the present invention, after training is completed, the encoder can convert the mask vector into a low-dimensional hidden feature representation that obeys the data distribution p (z), and the decoder can convert the sample data from p (z) into useful distribution information. Since the analytic expression of p (z) is known, a large amount of low-dimensional hidden feature representation data can be uniformly obtained from the p (z), and more comprehensive distribution information about a high-dimensional sparse scoring matrix is obtained through conversion of a decoder.

In the embodiment of the present invention, as shown in fig. 1, step S3 includes the following sub-steps:

S33: sample to be reconstructed

And a score vector r of the user_uPerforming fusion to obtain fusion data

The calculation formula for fusion is:

wherein h is_uAn indication sample, < > indicates a logical operation;

s34: fusing data

The calculation formula is as follows:

wherein,

s35: vector prediction scores

wherein,

representing a prediction score vector of the user u after sparsification;

in step S36, training the discriminator according to the discrimination result, where the target function expression is:

wherein, J^DisAn objective function of the discriminator is represented,

is shown in

wherein, J^GenAn objective function of the generator is represented,

is shown in

Data distribution p of_gThe mathematical expectation value is calculated as above,

representing the predicted score of user u for item i,

representing a known set of scoring data.

In the present invention, in order to learn the features of the score data in R, first assume a row vector { R ] in R₁,…,r_u,…,r_nObey the true data distribution p_r(ii) a Establishing a self-attention confrontation neural network model, wherein a generator uses a self-attention self-encoder structure, and a discriminator uses a full-connection neural network; to enable the generator to generate a true score vector, it is assumed that the predicted score vector generated by the generator obeys the generator distribution p_g. If can enable p_gAnd p_rThe same, the prediction score vector generated by the generator is real at this time; to make p be_gClose to p_rUntil the same, the distance between the two distributions is taken as the objective function of the model. Updating the parameters of the generator by using the target function to ensure that the distribution distance is smaller and smaller; due to p_gAnd p_rThe analytical formula (2) is unknown, and the distance between the two cannot be calculated by a clear formula. Therefore, the difference between the sample data of the two distributions and the sample data of the real distribution is estimated through the discriminator and then is substituted into the Wasserstein distance formula to approximately calculate p_gAnd p_rThe distance of (d); to obtain sample data for the producer distribution, the input to the producer is first obtained. Decoder sample mask vector learned from S2

And the mask vector and the score vector r of the user_uAnd fusion, namely providing more distribution information for the scoring data, and taking the fused data as the input of the generator. The generator calculates the prediction scoring vector of each user

Vector prediction scores

And a truth score vector { r₁,…,r_u,…,r_nThe data is used as the input of a discriminator for evaluating the difference between the two types of data and the real scoring data; since the true score vector has only a small amount of known score data, a large number of unknown scores are filled with 0, and the predicted score values of all the predicted score vectors are difficult for the discriminator to discriminate the true and false of the two. Therefore, the prediction score vector is thinned by using the mask vector, only the prediction score corresponding to the known score data in the prediction score vector is reserved, so that the consent is kept in the form, and only the characteristics of the true and false score data are reserved on the data, thereby being convenient for being distinguished by a discriminator. The sparse prediction score vector and the real score vector are used as the input of a discriminator, and the discrimination result is output; and finally training a model: the discriminants are trained first, the generators are trained second, and regularization (which represents the mean square error between the prediction scores and the true scores) is used until the model converges.

In a specific implementation process, the hyper-parameters of the model influence the performance of the model, and the hidden layer dimension and the number of hidden layers of the generator need to be carefully adjusted. In addition, when training the self-attention-confrontation neural network, a regularization term is added to the objective function, the regularization term is quite related to the prediction precision of the model, and the sparseness of the regularization term needs to be carefully selected to balance the proportion between the regularization term and the distribution distance.

In summary, the invention considers the scoring matrix for solving the high-dimensional sparsity from two aspects of the distribution characteristics and the scoring data characteristics of the high-dimensional sparse matrix. First, the present invention combines the self-attention mechanism with the discrimination autoencoder and provides a specific method for application in the recommendation system. And extracting the distribution information of the score data from the mask matrix of the high-dimensional sparse matrix by using a self-attention distinguishing self-encoder, and providing more distribution information for subsequent learning score data characteristics and prediction score data. Meanwhile, the model adopts a convolution neural network to extract the distribution characteristics of the local area in the mask matrix. Besides, the dependency relationship between all data in the mask matrix is calculated by using a self-attention mechanism, and the global distribution characteristic is obtained. And finally, the distribution characteristics of the local area and the global distribution characteristics are fused to train the model, so that the distribution information of the mask matrix can be effectively and comprehensively obtained. Secondly, the method establishes a prediction model based on the antagonistic neural network to estimate the missing score data in the high-dimensional sparse matrix. The distribution information of the high-dimensional sparse matrix and the scoring data are fused to be used as training data, and a self-attention mechanism is fused into the generator model, so that the dependency relationship between the scoring data sensed by the neural network is favorably resisted, and the characteristics of the scoring data are better learned. Meanwhile, the mean square error between the predicted scoring data and the real scoring data is used as a regularization term of the objective function of the antagonistic neural network, and the prediction precision of the model is improved.

The working principle and the process of the invention are as follows: the recommendation system scoring prediction method based on the self-attention confrontation neural network provided by the invention utilizes the whole data distribution characteristic of the self-attention confrontation neural network learning high-dimensional sparse matrix and utilizes the self-attention mechanism and the convolution neural network learning relationship between the local region scoring data of the high-dimensional sparse matrix, thereby being beneficial to improving the prediction precision.

The invention has the beneficial effects that:

(2) The invention establishes a prediction model based on the antagonistic neural network to estimate the missing scoring data in the high-dimensional sparse matrix. The distribution information of the high-dimensional sparse matrix and the scoring data are fused to be used as training data, and a self-attention mechanism is fused into the generator model, so that the dependency relationship between the scoring data sensed by the neural network is favorably resisted, and the characteristics of the scoring data are better learned. Meanwhile, the mean square error between the predicted scoring data and the real scoring data is used as a regularization term of an anti-neural network target function, and the prediction accuracy of the model is improved.

It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims

1. A recommendation system scoring prediction method based on a self-attention confrontation neural network is characterized by comprising the following steps:

s4: evaluating a high-dimensional sparse scoring matrix by using the trained scoring prediction model to complete the scoring prediction of the project by the user;

the step S1 includes the following sub-steps:

Item set

And user scoring set of items

Wherein n denotes the number of users, m denotes the number of items, u₁,u₂,…,u_nRepresenting the 1 st to nth users, i₁,i₂,…,i_mDenotes the 1 st to m-th items, s_u,iRepresents the score of the user u on the item i, and v represents the maximum value of the score;

s12: aggregation of ratings from user to items

represents the set of known elements in the R,

representing a set of unknown elements.

2. The method for predicting the scoring of the recommendation system based on the self-attention-confrontation neural network as claimed in claim 1, wherein the step S2 comprises the following sub-steps:

s21: let each row vector { H in the mask matrix H₁,…,h_u,…,h_nObey the first data distribution q (h);

s22: defining a self-attention encoder, samples h from a first data distribution q (h)_uE H is converted into corresponding low-dimensional hidden feature representation z_uWherein z is_uOne sample of the second data distribution q (z);

S24: calculate sample h_uAnd reconstructing the sample

The reconstruction error rec _ error in between;

3. The self-attention-directed neural network-based recommendation system score prediction method of claim 2, wherein the step S22 comprises the sub-steps of:

4. The method according to claim 3, wherein in step S222, the self-attention encoder comprises a convolutional layer, a self-attention layer and a pooling layer;

the method for constructing the convolutional layer comprises the following steps: the convolutional layer comprises K1 × 1 convolutional kernels, and the feature mapping E of the input matrix X is extracted by using the convolutional kernels, and the calculation formula is as follows:

wherein, represents a two-dimensional convolution calculation,

the parameters representing the k-th convolution kernel,

the method for constructing the self-attention layer comprises the following steps: a dependency matrix Y between each element in the feature map E is calculated,and the dependency relationship matrix Y is fused into a feature mapping E to obtain a fusion feature I, wherein each element Y in the feature mapping E_pAnd the calculation formula of the fusion characteristic I is respectively as follows:

wherein e is_pP-th element, E, representing a feature map E_qQ-th element, y, representing a feature map E_pRepresents the p-th element of Y, f (-) represents a function for calculating any two-point similarity relationship, g (-) represents a mapping function, Y (E) represents a normalization factor,

the parameters representing the kth self-attention layer convolution kernel,

the deviation of the kth self-attention layer convolution kernel;

Z＝MeanPooling2D(I)

5. The method for predicting the recommender system score based on the self-attention-directed neural network as claimed in claim 2, wherein in step S23, samples { h } are generated_u,h_u+1…,h_u+tCorresponding reconstructed data

Using matrices formed from reconstructed data

Expressed, the calculation formula is:

representing the parameters of the kth convolution kernel in layer l-1,

represents the deviation of the kth convolution kernel in the L-1 layer, L represents the number of deconvolution layers, Z represents a low-dimensional hidden feature matrix, UpSampling2D (·) represents an upsampled layer,

parameters representing the kth convolution kernel in the lth layer,

indicating the deviation of the kth convolution kernel in the L-th layer.

6. The method for predicting the scoring of the recommendation system based on the self-attention-confrontation neural network as claimed in claim 2, wherein in the step S24, the calculation formula of the reconstruction error rec _ error is as follows:

wherein, X represents an input matrix,

representing a reconstructed data matrix.

7. The method for predicting the scoring of the recommendation system based on the self-attention-confrontation neural network as claimed in claim 2, wherein the step S25 of calculating the distance C (p (z), q (z)) between the distribution p (z) of the known analytic expression and the second data distribution q (z) comprises the following sub-steps:

s251: setting a distribution p (z) of a known analytical formula;

S254: representing low-dimensional hidden features

And z₁,…,z_tThe result is used as the input of the discriminator, and the discrimination result is output;

s253: based on the determination result, a distance C (p (z), q (z)) between the distribution p (z) of the known analytical formula and the second data distribution q (z) is calculated, wherein the calculation formula is:

is shown in

Data distribution of

The mathematical expectation value is calculated.

8. The method for predicting the scoring of the recommendation system based on the self-attention confrontation neural network as claimed in claim 2, wherein in the step S26, the objective function expression of the self-attention encoder is rec _ error; the target function expression of the discriminator is dis _ loss ═ C (p (z), q (z)); the objective function expression of the self-attention decoder is

Where rec _ error represents sample h_uAnd reconstructing the sample

is shown in

Data distribution of

The mathematical expectation value is calculated.

9. The self-attention-directed neural network-based recommendation system score prediction method of claim 2, wherein the step S3 comprises the sub-steps of:

s31: setting row vectors { R } in high-dimensional sparse scoring matrix R₁,…,r_u,…,r_nObey the true data distribution p_rAnd the row vector r₁,…,r_u,…,r_nAs a true score vector, where r_uA score vector representing user u;

S33: sample to be reconstructed

And a score vector r of the user_uPerforming fusion to obtain fusion data

The calculation formula for fusion is:

wherein h is_uAn indication sample, < > indicates a logical operation;

s34: fusing data

The calculation formula is as follows:

wherein,

u 1, …, n, n denotes the number of users, SAAE (·) denotes a self-attention self-encoder;

s35: vector prediction scores

wherein,

representing a prediction score vector of the user u after sparsification;

in step S36, the discriminator is trained according to the discrimination result, and the target function expression is:

wherein, J^DisAn objective function of the discriminator is represented,

is shown in

Generator distribution p of_gCalculating a mathematical expected value, wherein Dis (·) represents a discrimination result input by the discriminator;

wherein, J^GenAn objective function that represents the generator is determined,

is shown in

Data distribution p of_gThe mathematical expectation value is calculated as above and,

representing the predicted score of user u for item i,

representing a known set of scoring data.