CN112784173B - Recommendation system scoring prediction method based on self-attention confrontation neural network - Google Patents

Recommendation system scoring prediction method based on self-attention confrontation neural network Download PDF

Info

Publication number
CN112784173B
CN112784173B CN202110217932.1A CN202110217932A CN112784173B CN 112784173 B CN112784173 B CN 112784173B CN 202110217932 A CN202110217932 A CN 202110217932A CN 112784173 B CN112784173 B CN 112784173B
Authority
CN
China
Prior art keywords
self
attention
distribution
matrix
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110217932.1A
Other languages
Chinese (zh)
Other versions
CN112784173A (en
Inventor
马康康
王庆先
黄庆
常奥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110217932.1A priority Critical patent/CN112784173B/en
Publication of CN112784173A publication Critical patent/CN112784173A/en
Application granted granted Critical
Publication of CN112784173B publication Critical patent/CN112784173B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a recommendation system score prediction method based on a self-attention confrontation neural network, which comprises the following steps of: s1: collecting user information, project information and project scoring data of a user, and constructing a high-dimensional sparse scoring matrix and a corresponding masking matrix; s2: generating distribution information on a high-dimensional sparse scoring matrix; s3: building a scoring prediction model of a recommendation system by using a self-attention antagonistic neural network, and training the scoring prediction model; s4: and evaluating the high-dimensional sparse scoring matrix to complete the scoring prediction of the user on the project. The invention combines the self-attention mechanism and the discrimination automatic encoder and provides a specific method applied to a recommendation system. And extracting the distribution information of the score data from the mask matrix of the high-dimensional sparse matrix by using a self-attention distinguishing self-encoder, and providing more distribution information for subsequent learning score data characteristics and prediction score data.

Description

Recommendation system scoring prediction method based on self-attention confrontation neural network
Technical Field
The invention belongs to the technical field of recommendation systems, and particularly relates to a recommendation system score prediction method based on a self-attention confrontation neural network.
Background
The rapid development of the internet causes the problem of information overload, and the efficiency of obtaining useful information by a user is seriously influenced. To solve this problem, recommendation system technology has gained a great deal of research effort. In a recommendation system, user-item scoring data is the underlying data source. Because there are a large number of users and projects in the system, it is not possible for each user to rate all projects, and so there is very little scoring data. These scoring data are typically represented using a high-dimensional sparse matrix, in which only a small number of elements are known. To solve this high-dimensional sparse matrix, a number of collaborative filtering-based methods are proposed. The methods mainly utilize the existing evaluation data to extract the low-dimensional hidden feature representation of the users and the projects, and have the following defects: firstly, the relationship between the score data of the local regions in the high-dimensional sparse matrix data is not fully utilized; secondly, the overall distribution characteristics of the scoring data in the high-dimensional sparse matrix are not considered.
Disclosure of Invention
The invention aims to solve the problem of score prediction among user items and provides a recommendation system score prediction method based on a self-attention confrontation neural network.
The technical scheme of the invention is as follows: a recommendation system scoring prediction method based on a self-attention confrontation neural network comprises the following steps:
s1: collecting user information, project information and project scoring data of a user, and constructing a high-dimensional sparse scoring matrix and a corresponding masking matrix;
s2: extracting the distribution characteristics of a mask matrix by using a self-attention encoder to generate distribution information about a high-dimensional sparse scoring matrix;
s3: building a scoring prediction model of a recommendation system by using a self-attention antagonistic neural network, and training the scoring prediction model according to distribution information and a high-dimensional sparse scoring matrix;
s4: and evaluating the high-dimensional sparse scoring matrix by using the trained scoring prediction model to complete the scoring prediction of the project by the user.
The invention has the beneficial effects that:
(1) the invention combines the self-attention mechanism and the discrimination automatic encoder and provides a specific method applied to a recommendation system. And extracting the distribution information of the score data from the mask matrix of the high-dimensional sparse matrix by using a self-attention distinguishing self-encoder, and providing more distribution information for subsequent learning score data characteristics and prediction score data. Meanwhile, the model adopts a convolution neural network to extract the distribution characteristics of the local area in the mask matrix. Besides, the dependency relationship between all data in the mask matrix is calculated by using a self-attention mechanism, and the global distribution characteristic is obtained. And finally, the distribution characteristics of the local area and the global distribution characteristics are fused to train the model, so that the distribution information of the mask matrix can be effectively and comprehensively acquired.
(2) The invention establishes a prediction model based on the antagonistic neural network to estimate the missing scoring data in the high-dimensional sparse matrix. The distribution information of the high-dimensional sparse matrix and the scoring data are fused to be used as training data, and a self-attention mechanism is fused into the generator model, so that the dependency relationship between the scoring data sensed by the neural network is favorably resisted, and the characteristics of the scoring data are better learned. Meanwhile, the mean square error between the predicted scoring data and the real scoring data is used as a regularization term of the objective function of the antagonistic neural network, and the prediction precision of the model is improved.
Further, step S1 includes the following sub-steps:
s11: collecting user information, project information and project grading data of users to obtain a user set
Figure BDA0002954630830000021
Item set
Figure BDA0002954630830000022
And user scoring set of items
Figure BDA0002954630830000023
Wherein n denotes the number of users, m denotes the number of items, u1,u2,…,unRepresenting 1 st to nth users, i1,i2,…,imDenotes the 1 st to m-th items, su,iRepresents the score of the user u on the item i, and v represents the maximum value of the score;
s12: aggregation of ratings from user to items
Figure BDA0002954630830000031
Constructing a high-dimensional sparse scoring matrix R, wherein each element Ru,iThe expression of (a) is:
Figure BDA0002954630830000032
s13: according to the high-dimensional sparse scoring matrix R, constructing a corresponding mask matrix H epsilon {0,1}n×mWherein each element hu,iThe expression of (a) is:
Figure BDA0002954630830000033
wherein 1 indicates that the user u has a known score for the item i, 0 indicates that the user u has an unknown score for the item i,
Figure BDA0002954630830000034
represents the set of known elements in the R,
Figure BDA0002954630830000035
representing a set of unknown elements.
The beneficial effects of the further scheme are as follows: in the invention, a recommendation system model is designed aiming at the internet application based on the grading feedback, and is used for predicting the missing grading data and providing the possibly interested items for the user. First, user information, project information, and user rating data for projects, such as movie rating, joke rating, and web service quality rating, need to be collected from a real application. Matrix array
Figure BDA0002954630830000036
In (1),
Figure BDA0002954630830000037
and
Figure BDA0002954630830000038
respectively represent a known element set and an unknown element set in R, because
Figure BDA0002954630830000039
Thus R is a high-dimensional sparse matrix. The mask matrix H can reflect the overall distribution characteristics of the known scores in R, and each row vector HuE H can reflect the distribution characteristics of the scoring data of the user u.
Further, step S2 includes the following sub-steps:
s21: setting each row vector { H) in the mask matrix H1,…,hu,…,hnObey the first data distribution q (h);
s22: defining a self-attention encoder for distributing samples h from a first data distribution q (h)uE.g. H turnTransformation into a corresponding low-dimensional hidden feature representation zuWherein z isuOne sample of the second data distribution q (z);
s23: representing the low-dimensional hidden features from the second data distribution q (z) as zuAs input to a self-attention decoder, and generates samples huOf the reconstructed sample
Figure BDA0002954630830000041
S24: calculate sample huAnd reconstructing the sample
Figure BDA0002954630830000042
The reconstruction error rec _ error in between;
s25: setting a distribution p (z) of a known analytic expression, and training a self-attention encoder and a self-attention decoder according to the distance between the distribution p (z) of the known analytic expression and a second data distribution q (z);
s26: and converting the mask vector into a low-dimensional hidden feature representation of the distribution p (z) conforming to the known analytic expression by using a trained self-attention encoder, and converting sample data of the distribution p (z) of the known analytic expression into distribution information by using a trained self-attention decoder to generate the distribution information about the high-dimensional sparse scoring matrix.
The beneficial effects of the further scheme are as follows: in the present invention, assume that each row vector H in H1,…,hu,…,hnObey a data distribution q (h). The corresponding low dimensional hidden feature matrix is represented as
Figure BDA0002954630830000043
Wherein the row vector z1,…,zu,…,znD represents the dimension of the hidden feature representation. The low-dimensional hidden feature representation is assumed to follow an analytically unknown data distribution q (z).
Further, step S22 includes the following sub-steps:
s221: randomly sampling t mask vectors { H) from a mask matrix H by using a small batch gradient descent algorithmu,hu+1…,hu+tForming an input matrix X;
s222: defining a self-attention encoder, taking an input matrix X as an input of the self-attention encoder, and distributing samples h from a first data distribution q (h)uE H is converted into corresponding low-dimensional hidden feature representation zuWherein z isuIs a sample of the second data distribution q (z).
Further, in step S222, the self-attention encoder includes a convolutional layer, a self-attention layer, and a pooling layer;
the method for constructing the convolutional layer comprises the following steps: the convolutional layer contains K1 × 1 convolutional kernels, and the feature map E of the input matrix X is extracted by using the convolutional kernels, and the calculation formula is as follows:
Figure BDA0002954630830000044
wherein, represents a two-dimensional convolution calculation,
Figure BDA0002954630830000051
the parameters representing the k-th convolution kernel,
Figure BDA0002954630830000052
represents the deviation of the kth convolution kernel, σ (·) represents the activation function;
the method for constructing the self-attention layer comprises the following steps: calculating a dependency relationship matrix Y between each element in the feature mapping E, and fusing the dependency relationship matrix Y into the feature mapping E to obtain a fused feature I, wherein each element Y in the feature mapping EpAnd the calculation formula of the fusion characteristic I is respectively as follows:
Figure BDA0002954630830000053
Figure BDA0002954630830000054
wherein e ispP-th one representing a feature map EElement, eqQ element, y, representing feature map EpThe p-th element representing Y, f (-) represents a function for calculating any two-point similarity relationship, g (-) represents a mapping function, γ (E) represents a normalization factor,
Figure BDA0002954630830000055
the parameters representing the kth self-attention layer convolution kernel,
Figure BDA0002954630830000056
the deviation of the kth self-attention layer convolution kernel;
the method for constructing the pooling layer comprises the following steps: inputting the fusion feature I into a pooling layer with a pooling kernel size of c × c and a sliding step of a, wherein the expression of the pooling layer is as follows:
Z=MeanPooling2D(I)
where MeanPooling2D (. cndot.) represents average pooling, Z represents a low-dimensional implicit feature matrix, and each row vector { Z ·u,zu+1…,zu+tDenotes a mask vector hu,hu+1…,hu+tThe low dimensional implicit feature representation of.
The beneficial effects of the above further scheme are: in the invention, in a self-attention layer, firstly, a dependency matrix Y between each element in a feature mapping E is calculated, the dependency matrix Y is fused into the feature mapping E, and global features are introduced for local features extracted from each receptive field in convolution operation, so that richer information is brought to a subsequent convolution layer; wherein g (-) represents a mapping function for computing a feature vector of a point.
Further, in step S23, a sample { h } is generatedu,hu+1…,hu+tCorresponding reconstructed data
Figure BDA0002954630830000061
Using matrices formed from reconstructed data
Figure BDA0002954630830000062
Expressed, the calculation formula is:
Figure BDA0002954630830000063
Figure BDA0002954630830000064
wherein D is0Input of a representation model, Dl-1Denotes the output of layer l-1, DlRepresents the output of the l-th layer, UlRepresents the output of the l-th upsampling layer, UL-1Represents the output of the L-1 st upsampling layer, K represents the number of convolution kernels, σ (-) represents the activation function,
Figure BDA0002954630830000065
representing the parameters of the kth convolution kernel in layer l-1,
Figure BDA0002954630830000066
represents the deviation of the kth convolution kernel in the L-1 layer, L represents the number of deconvolution layers, Z represents the low-dimensional hidden feature matrix, UpSamplling 2D (·) represents the upsampled layer,
Figure BDA0002954630830000067
parameters representing the kth convolution kernel in the lth layer,
Figure BDA0002954630830000068
indicating the deviation of the kth convolution kernel in the L-th layer.
DlAnd UlThe intermediate result of the calculation step is shown, so that the detailed process of the calculation step is convenient to explain, and the calculation reconstruction data has a direct relation.
Further, in step S24, the calculation formula of the reconstruction error rec _ error is:
Figure BDA0002954630830000069
wherein, X represents an input matrix,
Figure BDA00029546308300000610
representing a reconstructed data matrix.
Further, in step S25, the step of calculating the distance C (p (z)) between the distribution p (z) of the known analytic expression and the second data distribution q (z), q (z) includes the following sub-steps:
s251: setting a distribution p (z) of a known analytical formula;
s252: establishing a discriminator by utilizing a full-connection neural network;
s253: randomly sampling t low-dimensional hidden feature representations from a distribution p (z) of known analytic expressions
Figure BDA00029546308300000611
Randomly sampling t low-dimensional hidden feature representations { z ] from the distribution q (z)1,…,zt},
S254: representing low-dimensional hidden features
Figure BDA00029546308300000612
And
Figure BDA00029546308300000613
as the input of the discriminator, and outputs the discrimination result;
s253: based on the determination result, the distance C (p (z), q (z)) between the distribution p (z) of the known analytical formula and the second data distribution q (z) is calculated by the following formula:
Figure BDA0002954630830000071
wherein D (-) represents the discrimination result, log (-) represents the logarithmic function,
Figure BDA0002954630830000072
indicating that a mathematical expectation is calculated on a second data distribution q (z) of z,
Figure BDA0002954630830000073
is shown in
Figure BDA0002954630830000074
Data distribution of
Figure BDA0002954630830000075
The mathematical expectation value is calculated.
The beneficial effects of the further scheme are as follows: in the present invention, to enable a decoder to generate new mask vectors
Figure BDA0002954630830000076
The decoder must receive the other vectors from q (z)
Figure BDA0002954630830000077
However, the analytical formula of q (z) is unknown, and it is difficult to directly obtain other data samples. For indirectly obtaining other samples z of the q (z) distribution*Assuming that a distribution p (z) for which the analytic expression is known is the equivalent distribution of q (z), the samples in q (z) are obeyed to p (z) and the samples in p (z) are also obeyed to q (z). At this time, a new mask vector can be generated by using the samples in p (z) as input data of the decoder. To satisfy this assumption, the distance between q (z) and p (z) is taken as part of the encoder and decoder objective function, with which the parameters of the encoder and decoder are updated, enabling the encoder to convert the data distribution q (h) to an equivalent distribution p (z) of q (z), and the decoder to convert the low-dimensional hidden feature representation output by the encoder to a new mask vector. After training, the transformed low-dimensional implicit features represent samples that are more and more similar to p (z).
Further, in step S26, the target function expression of the self-attention encoder is rec _ error; the target function expression of the discriminator is dis _ loss ═ C (p (z), q (z)); the objective function expression of the self-attention decoder is
Figure BDA0002954630830000078
Where rec _ error represents sample huAnd reconstructing the sample
Figure BDA0002954630830000079
C (p (z), q (z)) represents the distance between the distribution p (z) of the known analytical formula and the second data distribution q (z), D (-) represents the discrimination result, log (-) represents the logarithmic function,
Figure BDA00029546308300000710
is shown in
Figure BDA00029546308300000711
Data distribution of
Figure BDA00029546308300000712
The mathematical expectation value is calculated.
The beneficial effects of the further scheme are as follows: in the present invention, after training is completed, the encoder can convert the mask vector into a low-dimensional hidden feature representation that obeys the data distribution p (z), and the decoder can convert the sample data from p (z) into useful distribution information. Since the analytic expression of p (z) is known, a large amount of low-dimensional hidden feature representation data can be uniformly obtained from the low-dimensional hidden feature representation data, and more comprehensive distribution information about a high-dimensional sparse scoring matrix is obtained through the conversion of a decoder.
Further, step S3 includes the following sub-steps:
s31: setting row vectors { R) in high-dimensional sparse scoring matrix R1,…,ru,…,rnObey the true data distribution prAnd the row vector r1,…,ru,…,rnAs a true score vector, where ruA score vector representing user u;
s32: constructing a self-attention confrontation neural network model, wherein a generator of the self-attention confrontation neural network model is in a self-attention self-encoder structure, a discriminator is in a fully-connected neural network, and a prediction score vector generated by the generator is set to obey the generator distribution pg
S33: sample to be reconstructed
Figure BDA0002954630830000081
And a score vector r of the useruPerforming fusion to obtain fusion data
Figure BDA0002954630830000082
The calculation formula for fusion is:
Figure BDA0002954630830000083
wherein h isuAn indication sample, < > indicates a logical operation;
s34: fusing data
Figure BDA0002954630830000084
As input to the generator, a prediction score vector for each user is calculated
Figure BDA0002954630830000085
The calculation formula is as follows:
Figure BDA0002954630830000086
wherein,
Figure BDA0002954630830000087
n represents the number of users, SAAE (-) represents the self-attention self-encoder;
s35: vector prediction scores
Figure BDA0002954630830000088
And a truth score vector { r1,…,ru,…,rnAs the input of the discriminator, the real data distribution p is discriminatedrSum generator distribution pgThe difference between the grading sample data and the real grading data and outputting a judgment result;
s36: training a scoring prediction model according to the judgment result until the scoring prediction model is converged;
in step S35, the sparsification method is used to facilitate the differentiation by the discriminator, and the expression is:
Figure BDA0002954630830000091
wherein,
Figure BDA0002954630830000092
representing a prediction score vector of the user u after sparsification;
in step S36, the discriminator is trained according to the discrimination result, and the objective function expression is:
Figure BDA0002954630830000093
wherein, JDisAn objective function of the discriminator is represented,
Figure BDA0002954630830000094
is shown at ruTrue data distribution p ofrThe mathematical expectation value is calculated as above,
Figure BDA0002954630830000095
is shown in
Figure BDA0002954630830000096
Generator distribution p ofgCalculating a mathematical expected value, and Dis (·) represents a discrimination result input by the discriminator;
training the generator according to the discrimination result, wherein the target function expression is as follows:
Figure BDA0002954630830000097
wherein, JGenAn objective function of the generator is represented,
Figure BDA0002954630830000098
is shown in
Figure BDA0002954630830000099
Data distribution p ofgUpper calculated mathematical expected value,
Figure BDA00029546308300000910
λ denotes regularization coefficient, ψ denotes regularization, ru,iRepresenting elements in a high-dimensional sparse scoring matrix R,
Figure BDA00029546308300000911
representing the predicted score of user u for item i,
Figure BDA00029546308300000912
representing a known set of scoring data.
The beneficial effects of the further scheme are as follows: in the present invention, in order to learn the features of the score data in R, first assume a row vector { R ] in R1,…,ru,…,rnObey the true data distribution pr(ii) a Establishing a self-attention confrontation neural network model, wherein a generator uses a self-attention self-encoder structure, and a discriminator uses a full-connection neural network; to enable the generator to generate a true score vector, it is assumed that the predicted score vector generated by the generator obeys the generator distribution pg. If can enable pgAnd prThe same, the prediction score vector generated by the generator is real at this time; to make p stand forgClose to prUntil the same, the distance between the two distributions is taken as the objective function of the model. Updating the parameters of the generator by using the target function to ensure that the distribution distance is smaller and smaller; due to pgAnd prThe analytical formula (2) is unknown, and the distance between the two cannot be calculated by a clear formula. Therefore, the difference between the sample data of the two distributions and the sample data of the real distribution is estimated through the discriminator and then is substituted into the Wasserstein distance formula to approximately calculate pgAnd prThe distance of (d); to obtain sample data for the producer distribution, the input to the producer is first obtained. Decoder sample mask vector learned from S2
Figure BDA0002954630830000101
And the mask vector and the score vector r of the useruFusion, providing more distribution information for scoring dataThe fused data is used as the input of the generator. The generator calculates the prediction scoring vector of each user
Figure BDA0002954630830000102
Vector prediction scores
Figure BDA0002954630830000103
And a truth score vector { r1,…,ru,…,rnThe data is used as the input of a discriminator for evaluating the difference between the two types of data and the real scoring data; since the true score vector has only a small amount of known score data, a large number of unknown scores are filled with 0, and the predicted score values of all the predicted score vectors are difficult for the discriminator to discriminate the true and false of the two. Therefore, the prediction score vector is thinned by using the mask vector, only the prediction score corresponding to the known score data in the prediction score vector is reserved, so that the consent is kept in the form, and only the characteristics of the true and false score data are reserved on the data, thereby being convenient for the distinguishing of a discriminator. The sparse prediction score vector and the real score vector are used as the input of a discriminator, and the discrimination result is output; and finally training a model: the discriminants are trained first, the generators are trained second, and regularization (which represents the mean square error between the prediction scores and the true scores) is used until the model converges.
Drawings
FIG. 1 is a flow chart of a recommendation system score prediction method.
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides a recommendation system score prediction method based on a self-attention confrontation neural network, comprising the following steps:
s1: collecting user information, project information and project grading data of a user, and constructing a high-dimensional sparse grading matrix and a corresponding masking matrix;
s2: extracting the distribution characteristics of a mask matrix by using a self-attention encoder to generate distribution information about a high-dimensional sparse scoring matrix;
s3: building a scoring prediction model of a recommendation system by using a self-attention antagonistic neural network, and training the scoring prediction model according to distribution information and a high-dimensional sparse scoring matrix;
s4: and evaluating the high-dimensional sparse scoring matrix by using the trained scoring prediction model to complete the scoring prediction of the project by the user.
In the embodiment of the present invention, as shown in fig. 1, step S1 includes the following sub-steps:
s11: collecting user information, project information and project grading data of users to obtain a user set
Figure BDA0002954630830000111
Item set
Figure BDA0002954630830000112
And user scoring set of items
Figure BDA0002954630830000113
Wherein n denotes the number of users, m denotes the number of items, u1,u2,…,unRepresenting 1 st to nth users, i1,i2,…,imDenotes the 1 st to m-th items, su,iRepresents the score of the user u on the item i, and v represents the maximum value of the score;
s12: aggregation of ratings according to user to items
Figure BDA0002954630830000114
Constructing a high-dimensional sparse scoring matrix R, wherein each element Ru,iThe expression of (c) is:
Figure BDA0002954630830000115
s13: according to the high-dimensional sparse scoring matrix R, a corresponding mask matrix H epsilon {0,1} is constructedn×mWherein each element hu,iThe expression of (a) is:
Figure BDA0002954630830000116
wherein 1 indicates that the user u has a known score for the item i, 0 indicates that the user u has an unknown score for the item i,
Figure BDA0002954630830000117
represents the set of known elements in the R,
Figure BDA0002954630830000118
representing a set of unknown elements.
In the invention, a recommendation system model is designed aiming at the internet application based on the grading feedback, and is used for predicting the missing grading data and providing the possibly interested items for the user. First, user information, project information, and user rating data for projects, such as movie rating, joke rating, and web service quality rating, need to be collected from a practical application. Matrix array
Figure BDA0002954630830000121
In (1),
Figure BDA0002954630830000122
and
Figure BDA0002954630830000123
respectively represent a known element set and an unknown element set in R, because
Figure BDA0002954630830000124
Thus R is a high-dimensional sparse matrix. The mask matrix H can reflect the overall distribution characteristics of the known scores in R, and each row vector HuE H can reflect the distribution characteristics of the scoring data of the user u.
In the embodiment of the present invention, as shown in fig. 1, step S2 includes the following sub-steps:
s21: setting each row vector { H) in the mask matrix H1,…,hu,…,hnObey the first data distribution q (h);
s22: defining a self-attention encoder to receive a first number from a first numberSample h according to distribution q (h)uE H is converted into corresponding low-dimensional hidden feature representation zuWherein z isuOne sample of the second data distribution q (z);
s23: representing the low-dimensional hidden features from the second data distribution q (z) as zuAs input to a self-attention decoder, and generates samples huOf the reconstructed sample
Figure BDA0002954630830000125
S24: calculate sample huAnd reconstructing the sample
Figure BDA0002954630830000126
The reconstruction error rec _ error in between;
s25: setting a distribution p (z) of a known analytic expression, and training a self-attention encoder and a self-attention decoder according to the distance between the distribution p (z) of the known analytic expression and a second data distribution q (z);
s26: and converting the mask vector into a low-dimensional hidden feature representation of the distribution p (z) conforming to the known analytic expression by using a trained self-attention encoder, and converting sample data of the distribution p (z) of the known analytic expression into distribution information by using a trained self-attention decoder to generate the distribution information about the high-dimensional sparse scoring matrix.
In the present invention, assume that each row vector H in H1,…,hu,…,hnObey a data distribution q (h). The corresponding low dimensional hidden feature matrix is represented as
Figure BDA0002954630830000127
Wherein the row vector z1,…,zu,…,znD represents the dimension of the hidden feature representation. It is assumed that the low dimensional implicit features represent a data distribution q (z) subject to an analytical unknowns.
In the embodiment of the present invention, as shown in fig. 1, step S22 includes the following sub-steps:
s221: randomly sampling t mask vectors { H) from a mask matrix H by using a small batch gradient descent algorithmu,hu+1…,hu+tForming an input matrix X;
s222: defining a self-attention encoder, taking an input matrix X as an input of the self-attention encoder, and distributing samples h from a first data distribution q (h)uE H is converted into corresponding low-dimensional hidden feature representation zuWherein z isuIs a sample of the second data distribution q (z).
In the embodiment of the present invention, as shown in fig. 1, in step S222, the self-attention encoder includes a convolutional layer, a self-attention layer, and a pooling layer;
the method for constructing the convolutional layer comprises the following steps: the convolutional layer contains K1 × 1 convolutional kernels, and the feature map E of the input matrix X is extracted by using the convolutional kernels, and the calculation formula is as follows:
Figure BDA0002954630830000131
wherein, represents a two-dimensional convolution calculation,
Figure BDA0002954630830000132
the parameters representing the k-th convolution kernel,
Figure BDA0002954630830000133
represents the deviation of the kth convolution kernel, σ (·) represents the activation function;
the method for constructing the self-attention layer comprises the following steps: calculating a dependency relationship matrix Y between each element in the feature mapping E, and fusing the dependency relationship matrix Y into the feature mapping E to obtain a fused feature I, wherein each element Y in the feature mapping EpAnd the calculation formula of the fusion characteristic I is respectively as follows:
Figure BDA0002954630830000134
Figure BDA0002954630830000135
wherein e ispP-th element, E, representing a feature map EqQ-th element, y, representing a feature map EpThe p-th element representing Y, f (-) represents a function for calculating any two-point similarity relationship, g (-) represents a mapping function, γ (E) represents a normalization factor,
Figure BDA0002954630830000136
the parameters representing the kth self-attention layer convolution kernel,
Figure BDA0002954630830000137
the deviation of the kth self-attention layer convolution kernel;
the method for constructing the pooling layer comprises the following steps: inputting the fusion feature I into a pooling layer with a pooling kernel size of c × c and a sliding stride of a, wherein the expression of the pooling layer is as follows:
Z=MeanPooling2D(I)
where MeanPooling2D (. cndot.) represents average pooling, Z represents a low-dimensional implicit feature matrix, and each row vector { Z ·u,zu+1…,zu+tDenotes a mask vector hu,hu+1…,hu+tThe low dimensional implicit feature representation of.
In the invention, in the self-attention layer, firstly, a dependency matrix Y between each element in a feature mapping E is calculated, the dependency matrix Y is fused into the feature mapping E, and global features are introduced for local features extracted from each receptive field in convolution operation, so that richer information is brought to the following convolution layer; wherein g (-) represents a mapping function for computing a feature vector of a point.
In the embodiment of the present invention, as shown in fig. 1, in step S23, samples { h } are generatedu,hu+1…,hu+tCorresponding reconstructed data
Figure BDA0002954630830000141
Using matrices formed from reconstructed data
Figure BDA0002954630830000142
Expressed, the calculation formula is:
Figure BDA0002954630830000143
Figure BDA0002954630830000144
wherein D is0Input of a representation model, Dl-1Denotes the output of layer l-1, DlRepresents the output of the l-th layer, UlRepresents the output of the l-th upsampling layer, UL-1Represents the output of the L-1 st upsampling layer, K represents the number of convolution kernels, σ (-) represents the activation function,
Figure BDA0002954630830000145
representing the parameters of the kth convolution kernel in layer l-1,
Figure BDA0002954630830000146
represents the deviation of the kth convolution kernel in the L-1 layer, L represents the number of deconvolution layers, Z represents the low-dimensional hidden feature matrix, UpSamplling 2D (·) represents the upsampled layer,
Figure BDA0002954630830000147
parameters representing the kth convolution kernel in the lth layer,
Figure BDA0002954630830000148
indicating the deviation of the kth convolution kernel in the L-th layer.
DlAnd UlThe intermediate result of the calculation step is shown, so that the detailed process of the calculation step is convenient to explain, and the calculation reconstruction data has a direct relation.
In the embodiment of the present invention, as shown in fig. 1, in step S24, the calculation formula of the reconstruction error rec _ error is as follows:
Figure BDA0002954630830000151
wherein, X represents an input matrix,
Figure BDA0002954630830000152
representing a reconstructed data matrix.
In the embodiment of the present invention, as shown in fig. 1, in step S25, calculating a distance C (p (z)) between a distribution p (z) of a known analytic expression and a second data distribution q (z), q (z)) includes the following sub-steps:
s251: setting a distribution p (z) of a known analytical formula;
s252: establishing a discriminator by utilizing a full-connection neural network;
s253: randomly sampling t low-dimensional hidden feature representations from a distribution p (z) of known analytic expressions
Figure BDA0002954630830000153
Randomly sampling t low-dimensional hidden feature representations { z ] from the distribution q (z)1,…,zt},
S254: representing low-dimensional hidden features
Figure BDA0002954630830000154
And
Figure BDA0002954630830000155
as the input of the discriminator, and outputs the discrimination result;
s253: based on the determination result, the distance C (p (z), q (z)) between the distribution p (z) of the known analytical formula and the second data distribution q (z) is calculated by the following formula:
Figure BDA0002954630830000156
wherein D (-) represents the discrimination result, log (-) represents the logarithmic function,
Figure BDA0002954630830000157
indicating that a mathematical expectation is calculated on a second data distribution q (z) of z,
Figure BDA0002954630830000158
is shown in
Figure BDA0002954630830000159
Data distribution of
Figure BDA00029546308300001510
The mathematical expectation value is calculated.
In the present invention, to enable a decoder to generate new mask vectors
Figure BDA00029546308300001511
The decoder must receive the other vectors from q (z)
Figure BDA00029546308300001512
However, the analytical formula of q (z) is unknown, and it is difficult to directly obtain other data samples. For indirectly obtaining other samples z of the q (z) distribution*Assuming that a distribution p (z) for which the analytic expression is known is the equivalent distribution of q (z), the samples in q (z) are obeyed to p (z) and the samples in p (z) are also obeyed to q (z). At this time, a new mask vector can be generated by using the samples in p (z) as input data of the decoder. To satisfy this assumption, the distance between q (z) and p (z) is taken as part of the encoder and decoder objective function, with which the parameters of the encoder and decoder are updated, enabling the encoder to convert the data distribution q (h) to an equivalent distribution p (z) of q (z), and the decoder to convert the low-dimensional hidden feature representation output by the encoder to a new mask vector. After training, the transformed low-dimensional implicit features represent samples that are more and more similar to p (z).
In the embodiment of the present invention, as shown in fig. 1, in step S26, the target function expression of the self-attention encoder is rec _ error; the objective function expression of the discriminator is dis _ loss ═ C (p (z), q (z)); the objective function expression of the self-attention decoder is
Figure BDA0002954630830000161
Where rec _ error represents sample huAnd reconstructing the sample
Figure BDA0002954630830000162
C (p (z), q (z)) represents the distance between the distribution p (z) of the known analytical formula and the second data distribution q (z), D (-) represents the discrimination result, log (-) represents the logarithmic function,
Figure BDA0002954630830000163
is shown in
Figure BDA0002954630830000164
Data distribution of
Figure BDA0002954630830000165
The mathematical expectation value is calculated.
In the present invention, after training is completed, the encoder can convert the mask vector into a low-dimensional hidden feature representation that obeys the data distribution p (z), and the decoder can convert the sample data from p (z) into useful distribution information. Since the analytic expression of p (z) is known, a large amount of low-dimensional hidden feature representation data can be uniformly obtained from the p (z), and more comprehensive distribution information about a high-dimensional sparse scoring matrix is obtained through conversion of a decoder.
In the embodiment of the present invention, as shown in fig. 1, step S3 includes the following sub-steps:
s31: setting row vectors { R) in high-dimensional sparse scoring matrix R1,…,ru,…,rnObey the true data distribution prAnd the row vector r1,…,ru,…,rnAs a true score vector, where ruA score vector representing user u;
s32: constructing a self-attention confrontation neural network model, wherein a generator of the self-attention confrontation neural network model is in a self-attention self-encoder structure, a discriminator is in a fully-connected neural network, and a prediction score vector generated by the generator is set to obey the generator distribution pg
S33: sample to be reconstructed
Figure BDA0002954630830000166
And a score vector r of the useruPerforming fusion to obtain fusion data
Figure BDA0002954630830000167
The calculation formula for fusion is:
Figure BDA0002954630830000168
wherein h isuAn indication sample, < > indicates a logical operation;
s34: fusing data
Figure BDA0002954630830000171
As input to the generator, a prediction score vector for each user is calculated
Figure BDA0002954630830000172
The calculation formula is as follows:
Figure BDA0002954630830000173
wherein,
Figure BDA0002954630830000174
n represents the number of users, SAAE (-) represents the self-attention self-encoder;
s35: vector prediction scores
Figure BDA0002954630830000175
And a truth score vector { r1,…,ru,…,rnAs the input of the discriminator, the real data distribution p is discriminatedrSum generator distribution pgThe difference between the grading sample data and the real grading data and outputting a judgment result;
s36: training a scoring prediction model according to the judgment result until the scoring prediction model is converged;
in step S35, the sparsification method is used to facilitate the differentiation by the discriminator, and the expression is:
Figure BDA0002954630830000176
wherein,
Figure BDA0002954630830000177
representing a prediction score vector of the user u after sparsification;
in step S36, training the discriminator according to the discrimination result, where the target function expression is:
Figure BDA0002954630830000178
wherein, JDisAn objective function of the discriminator is represented,
Figure BDA0002954630830000179
is shown at ruTrue data distribution p ofrThe mathematical expectation value is calculated as above,
Figure BDA00029546308300001710
is shown in
Figure BDA00029546308300001711
Generator distribution p ofgCalculating a mathematical expected value, and Dis (·) represents a discrimination result input by the discriminator;
training the generator according to the discrimination result, wherein the target function expression is as follows:
Figure BDA00029546308300001712
wherein, JGenAn objective function of the generator is represented,
Figure BDA00029546308300001713
is shown in
Figure BDA00029546308300001714
Data distribution p ofgThe mathematical expectation value is calculated as above,
Figure BDA00029546308300001715
λ denotes regularization coefficient, ψ denotes regularization, ru,iRepresenting elements in a high-dimensional sparse scoring matrix R,
Figure BDA00029546308300001716
representing the predicted score of user u for item i,
Figure BDA00029546308300001717
representing a known set of scoring data.
In the present invention, in order to learn the features of the score data in R, first assume a row vector { R ] in R1,…,ru,…,rnObey the true data distribution pr(ii) a Establishing a self-attention confrontation neural network model, wherein a generator uses a self-attention self-encoder structure, and a discriminator uses a full-connection neural network; to enable the generator to generate a true score vector, it is assumed that the predicted score vector generated by the generator obeys the generator distribution pg. If can enable pgAnd prThe same, the prediction score vector generated by the generator is real at this time; to make p begClose to prUntil the same, the distance between the two distributions is taken as the objective function of the model. Updating the parameters of the generator by using the target function to ensure that the distribution distance is smaller and smaller; due to pgAnd prThe analytical formula (2) is unknown, and the distance between the two cannot be calculated by a clear formula. Therefore, the difference between the sample data of the two distributions and the sample data of the real distribution is estimated through the discriminator and then is substituted into the Wasserstein distance formula to approximately calculate pgAnd prThe distance of (d); to obtain sample data for the producer distribution, the input to the producer is first obtained. Decoder sample mask vector learned from S2
Figure BDA0002954630830000181
And the mask vector and the score vector r of the useruAnd fusion, namely providing more distribution information for the scoring data, and taking the fused data as the input of the generator. The generator calculates the prediction scoring vector of each user
Figure BDA0002954630830000182
Vector prediction scores
Figure BDA0002954630830000183
And a truth score vector { r1,…,ru,…,rnThe data is used as the input of a discriminator for evaluating the difference between the two types of data and the real scoring data; since the true score vector has only a small amount of known score data, a large number of unknown scores are filled with 0, and the predicted score values of all the predicted score vectors are difficult for the discriminator to discriminate the true and false of the two. Therefore, the prediction score vector is thinned by using the mask vector, only the prediction score corresponding to the known score data in the prediction score vector is reserved, so that the consent is kept in the form, and only the characteristics of the true and false score data are reserved on the data, thereby being convenient for being distinguished by a discriminator. The sparse prediction score vector and the real score vector are used as the input of a discriminator, and the discrimination result is output; and finally training a model: the discriminants are trained first, the generators are trained second, and regularization (which represents the mean square error between the prediction scores and the true scores) is used until the model converges.
In a specific implementation process, the hyper-parameters of the model influence the performance of the model, and the hidden layer dimension and the number of hidden layers of the generator need to be carefully adjusted. In addition, when training the self-attention-confrontation neural network, a regularization term is added to the objective function, the regularization term is quite related to the prediction precision of the model, and the sparseness of the regularization term needs to be carefully selected to balance the proportion between the regularization term and the distribution distance.
In summary, the invention considers the scoring matrix for solving the high-dimensional sparsity from two aspects of the distribution characteristics and the scoring data characteristics of the high-dimensional sparse matrix. First, the present invention combines the self-attention mechanism with the discrimination autoencoder and provides a specific method for application in the recommendation system. And extracting the distribution information of the score data from the mask matrix of the high-dimensional sparse matrix by using a self-attention distinguishing self-encoder, and providing more distribution information for subsequent learning score data characteristics and prediction score data. Meanwhile, the model adopts a convolution neural network to extract the distribution characteristics of the local area in the mask matrix. Besides, the dependency relationship between all data in the mask matrix is calculated by using a self-attention mechanism, and the global distribution characteristic is obtained. And finally, the distribution characteristics of the local area and the global distribution characteristics are fused to train the model, so that the distribution information of the mask matrix can be effectively and comprehensively obtained. Secondly, the method establishes a prediction model based on the antagonistic neural network to estimate the missing score data in the high-dimensional sparse matrix. The distribution information of the high-dimensional sparse matrix and the scoring data are fused to be used as training data, and a self-attention mechanism is fused into the generator model, so that the dependency relationship between the scoring data sensed by the neural network is favorably resisted, and the characteristics of the scoring data are better learned. Meanwhile, the mean square error between the predicted scoring data and the real scoring data is used as a regularization term of the objective function of the antagonistic neural network, and the prediction precision of the model is improved.
The working principle and the process of the invention are as follows: the recommendation system scoring prediction method based on the self-attention confrontation neural network provided by the invention utilizes the whole data distribution characteristic of the self-attention confrontation neural network learning high-dimensional sparse matrix and utilizes the self-attention mechanism and the convolution neural network learning relationship between the local region scoring data of the high-dimensional sparse matrix, thereby being beneficial to improving the prediction precision.
The invention has the beneficial effects that:
(1) the invention combines the self-attention mechanism and the discrimination automatic encoder and provides a specific method applied to a recommendation system. And extracting the distribution information of the score data from the mask matrix of the high-dimensional sparse matrix by using a self-attention distinguishing self-encoder, and providing more distribution information for subsequent learning score data characteristics and prediction score data. Meanwhile, the model adopts a convolution neural network to extract the distribution characteristics of the local area in the mask matrix. Besides, the dependency relationship between all data in the mask matrix is calculated by using a self-attention mechanism, and the global distribution characteristic is obtained. And finally, the distribution characteristics of the local area and the global distribution characteristics are fused to train the model, so that the distribution information of the mask matrix can be effectively and comprehensively acquired.
(2) The invention establishes a prediction model based on the antagonistic neural network to estimate the missing scoring data in the high-dimensional sparse matrix. The distribution information of the high-dimensional sparse matrix and the scoring data are fused to be used as training data, and a self-attention mechanism is fused into the generator model, so that the dependency relationship between the scoring data sensed by the neural network is favorably resisted, and the characteristics of the scoring data are better learned. Meanwhile, the mean square error between the predicted scoring data and the real scoring data is used as a regularization term of an anti-neural network target function, and the prediction accuracy of the model is improved.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (9)

1. A recommendation system scoring prediction method based on a self-attention confrontation neural network is characterized by comprising the following steps:
s1: collecting user information, project information and project scoring data of a user, and constructing a high-dimensional sparse scoring matrix and a corresponding masking matrix;
s2: extracting the distribution characteristics of a mask matrix by using a self-attention encoder to generate distribution information about a high-dimensional sparse scoring matrix;
s3: building a scoring prediction model of a recommendation system by using a self-attention antagonistic neural network, and training the scoring prediction model according to distribution information and a high-dimensional sparse scoring matrix;
s4: evaluating a high-dimensional sparse scoring matrix by using the trained scoring prediction model to complete the scoring prediction of the project by the user;
the step S1 includes the following sub-steps:
s11: collecting user information, project information and project grading data of users to obtain a user set
Figure FDA0003615654490000011
Item set
Figure FDA0003615654490000012
And user scoring set of items
Figure FDA0003615654490000013
Wherein n denotes the number of users, m denotes the number of items, u1,u2,…,unRepresenting the 1 st to nth users, i1,i2,…,imDenotes the 1 st to m-th items, su,iRepresents the score of the user u on the item i, and v represents the maximum value of the score;
s12: aggregation of ratings from user to items
Figure FDA0003615654490000014
Constructing a high-dimensional sparse scoring matrix R, wherein each element Ru,iThe expression of (c) is:
Figure FDA0003615654490000015
s13: according to the high-dimensional sparse scoring matrix R, constructing a corresponding mask matrix H epsilon {0,1}n×mWherein each element hu,iThe expression of (a) is:
Figure FDA0003615654490000016
wherein 1 indicates that the user u has a known score for the item i, 0 indicates that the user u has an unknown score for the item i,
Figure FDA0003615654490000021
represents the set of known elements in the R,
Figure FDA0003615654490000022
representing a set of unknown elements.
2. The method for predicting the scoring of the recommendation system based on the self-attention-confrontation neural network as claimed in claim 1, wherein the step S2 comprises the following sub-steps:
s21: let each row vector { H in the mask matrix H1,…,hu,…,hnObey the first data distribution q (h);
s22: defining a self-attention encoder, samples h from a first data distribution q (h)uE H is converted into corresponding low-dimensional hidden feature representation zuWherein z isuOne sample of the second data distribution q (z);
s23: representing the low-dimensional hidden features from the second data distribution q (z) as zuAs input to a self-attention decoder, and generates samples huOf the reconstructed sample
Figure FDA0003615654490000023
S24: calculate sample huAnd reconstructing the sample
Figure FDA0003615654490000024
The reconstruction error rec _ error in between;
s25: setting a distribution p (z) of a known analytic expression, and training a self-attention encoder and a self-attention decoder according to the distance between the distribution p (z) of the known analytic expression and a second data distribution q (z);
s26: and converting the mask vector into a low-dimensional hidden feature representation of the distribution p (z) conforming to the known analytic expression by using a trained self-attention encoder, and converting sample data of the distribution p (z) of the known analytic expression into distribution information by using a trained self-attention decoder to generate the distribution information about the high-dimensional sparse scoring matrix.
3. The self-attention-directed neural network-based recommendation system score prediction method of claim 2, wherein the step S22 comprises the sub-steps of:
s221: randomly sampling t mask vectors { H) from a mask matrix H by using a small batch gradient descent algorithmu,hu+1…,hu+tForming an input matrix X;
s222: defining a self-attention encoder, taking an input matrix X as an input of the self-attention encoder, and distributing samples h from a first data distribution q (h)uE H is converted into corresponding low-dimensional hidden feature representation zuWherein z isuIs a sample of the second data distribution q (z).
4. The method according to claim 3, wherein in step S222, the self-attention encoder comprises a convolutional layer, a self-attention layer and a pooling layer;
the method for constructing the convolutional layer comprises the following steps: the convolutional layer comprises K1 × 1 convolutional kernels, and the feature mapping E of the input matrix X is extracted by using the convolutional kernels, and the calculation formula is as follows:
Figure FDA0003615654490000031
wherein, represents a two-dimensional convolution calculation,
Figure FDA0003615654490000032
the parameters representing the k-th convolution kernel,
Figure FDA0003615654490000033
represents the deviation of the kth convolution kernel, σ (·) represents the activation function;
the method for constructing the self-attention layer comprises the following steps: a dependency matrix Y between each element in the feature map E is calculated,and the dependency relationship matrix Y is fused into a feature mapping E to obtain a fusion feature I, wherein each element Y in the feature mapping EpAnd the calculation formula of the fusion characteristic I is respectively as follows:
Figure FDA0003615654490000034
Figure FDA0003615654490000035
wherein e ispP-th element, E, representing a feature map EqQ-th element, y, representing a feature map EpRepresents the p-th element of Y, f (-) represents a function for calculating any two-point similarity relationship, g (-) represents a mapping function, Y (E) represents a normalization factor,
Figure FDA0003615654490000036
the parameters representing the kth self-attention layer convolution kernel,
Figure FDA0003615654490000037
the deviation of the kth self-attention layer convolution kernel;
the method for constructing the pooling layer comprises the following steps: inputting the fusion feature I into a pooling layer with a pooling kernel size of c × c and a sliding stride of a, wherein the expression of the pooling layer is as follows:
Z=MeanPooling2D(I)
where MeanPooling2D (. cndot.) represents average pooling, Z represents a low-dimensional implicit feature matrix, and each row vector { Z ·u,zu+1…,zu+tDenotes a mask vector hu,hu+1…,hu+tThe low dimensional implicit feature representation of.
5. The method for predicting the recommender system score based on the self-attention-directed neural network as claimed in claim 2, wherein in step S23, samples { h } are generatedu,hu+1…,hu+tCorresponding reconstructed data
Figure FDA0003615654490000041
Using matrices formed from reconstructed data
Figure FDA0003615654490000042
Expressed, the calculation formula is:
Figure FDA0003615654490000043
Figure FDA0003615654490000044
wherein D is0Input of a representation model, Dl-1Denotes the output of layer l-1, DlRepresents the output of the l-th layer, UlRepresents the output of the l-th upsampling layer, UL-1Represents the output of the L-1 st upsampling layer, K represents the number of convolution kernels, σ (-) represents the activation function,
Figure FDA0003615654490000045
representing the parameters of the kth convolution kernel in layer l-1,
Figure FDA0003615654490000046
represents the deviation of the kth convolution kernel in the L-1 layer, L represents the number of deconvolution layers, Z represents a low-dimensional hidden feature matrix, UpSampling2D (·) represents an upsampled layer,
Figure FDA0003615654490000047
parameters representing the kth convolution kernel in the lth layer,
Figure FDA0003615654490000048
indicating the deviation of the kth convolution kernel in the L-th layer.
6. The method for predicting the scoring of the recommendation system based on the self-attention-confrontation neural network as claimed in claim 2, wherein in the step S24, the calculation formula of the reconstruction error rec _ error is as follows:
Figure FDA0003615654490000049
wherein, X represents an input matrix,
Figure FDA00036156544900000410
representing a reconstructed data matrix.
7. The method for predicting the scoring of the recommendation system based on the self-attention-confrontation neural network as claimed in claim 2, wherein the step S25 of calculating the distance C (p (z), q (z)) between the distribution p (z) of the known analytic expression and the second data distribution q (z) comprises the following sub-steps:
s251: setting a distribution p (z) of a known analytical formula;
s252: establishing a discriminator by utilizing a full-connection neural network;
s253: randomly sampling t low-dimensional hidden feature representations from a distribution p (z) of known analytic expressions
Figure FDA00036156544900000411
Randomly sampling t low-dimensional hidden feature representations { z ] from the distribution q (z)1,…,zt},
S254: representing low-dimensional hidden features
Figure FDA0003615654490000051
And z1,…,ztThe result is used as the input of the discriminator, and the discrimination result is output;
s253: based on the determination result, a distance C (p (z), q (z)) between the distribution p (z) of the known analytical formula and the second data distribution q (z) is calculated, wherein the calculation formula is:
Figure FDA0003615654490000052
wherein D (-) represents the discrimination result, log (-) represents the logarithmic function,
Figure FDA0003615654490000053
indicating that a mathematical expectation is calculated on a second data distribution q (z) of z,
Figure FDA0003615654490000054
is shown in
Figure FDA0003615654490000055
Data distribution of
Figure FDA0003615654490000056
The mathematical expectation value is calculated.
8. The method for predicting the scoring of the recommendation system based on the self-attention confrontation neural network as claimed in claim 2, wherein in the step S26, the objective function expression of the self-attention encoder is rec _ error; the target function expression of the discriminator is dis _ loss ═ C (p (z), q (z)); the objective function expression of the self-attention decoder is
Figure FDA0003615654490000057
Where rec _ error represents sample huAnd reconstructing the sample
Figure FDA0003615654490000058
C (p (z), q (z)) represents the distance between the distribution p (z) of the known analytical formula and the second data distribution q (z), D (-) represents the discrimination result, log (-) represents the logarithmic function,
Figure FDA0003615654490000059
is shown in
Figure FDA00036156544900000510
Data distribution of
Figure FDA00036156544900000511
The mathematical expectation value is calculated.
9. The self-attention-directed neural network-based recommendation system score prediction method of claim 2, wherein the step S3 comprises the sub-steps of:
s31: setting row vectors { R } in high-dimensional sparse scoring matrix R1,…,ru,…,rnObey the true data distribution prAnd the row vector r1,…,ru,…,rnAs a true score vector, where ruA score vector representing user u;
s32: constructing a self-attention confrontation neural network model, wherein a generator of the self-attention confrontation neural network model is in a self-attention self-encoder structure, a discriminator is in a fully-connected neural network, and a prediction score vector generated by the generator is set to obey the generator distribution pg
S33: sample to be reconstructed
Figure FDA0003615654490000061
And a score vector r of the useruPerforming fusion to obtain fusion data
Figure FDA0003615654490000062
The calculation formula for fusion is:
Figure FDA0003615654490000063
wherein h isuAn indication sample, < > indicates a logical operation;
s34: fusing data
Figure FDA0003615654490000064
As input to the generator, a prediction score vector for each user is calculated
Figure FDA0003615654490000065
The calculation formula is as follows:
Figure FDA0003615654490000066
wherein,
Figure FDA0003615654490000067
u 1, …, n, n denotes the number of users, SAAE (·) denotes a self-attention self-encoder;
s35: vector prediction scores
Figure FDA0003615654490000068
And a truth score vector { r1,…,ru,…,rnAs the input of the discriminator, the real data distribution p is discriminatedrSum generator distribution pgThe difference between the grading sample data and the real grading data and outputting a judgment result;
s36: training a scoring prediction model according to the judgment result until the scoring prediction model is converged;
in step S35, the sparsification method is used to facilitate the differentiation by the discriminator, and the expression is:
Figure FDA0003615654490000069
wherein,
Figure FDA00036156544900000610
representing a prediction score vector of the user u after sparsification;
in step S36, the discriminator is trained according to the discrimination result, and the target function expression is:
Figure FDA00036156544900000611
wherein, JDisAn objective function of the discriminator is represented,
Figure FDA00036156544900000612
is shown at ruTrue data distribution p ofrThe mathematical expectation value is calculated as above,
Figure FDA00036156544900000613
is shown in
Figure FDA00036156544900000614
Generator distribution p ofgCalculating a mathematical expected value, wherein Dis (·) represents a discrimination result input by the discriminator;
training the generator according to the discrimination result, wherein the target function expression is as follows:
Figure FDA00036156544900000615
wherein, JGenAn objective function that represents the generator is determined,
Figure FDA00036156544900000616
is shown in
Figure FDA00036156544900000617
Data distribution p ofgThe mathematical expectation value is calculated as above and,
Figure FDA0003615654490000071
λ denotes regularization coefficient, ψ denotes regularization, ru,iRepresenting elements in a high-dimensional sparse scoring matrix R,
Figure FDA0003615654490000072
representing the predicted score of user u for item i,
Figure FDA0003615654490000073
representing a known set of scoring data.
CN202110217932.1A 2021-02-26 2021-02-26 Recommendation system scoring prediction method based on self-attention confrontation neural network Active CN112784173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110217932.1A CN112784173B (en) 2021-02-26 2021-02-26 Recommendation system scoring prediction method based on self-attention confrontation neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110217932.1A CN112784173B (en) 2021-02-26 2021-02-26 Recommendation system scoring prediction method based on self-attention confrontation neural network

Publications (2)

Publication Number Publication Date
CN112784173A CN112784173A (en) 2021-05-11
CN112784173B true CN112784173B (en) 2022-06-10

Family

ID=75762027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110217932.1A Active CN112784173B (en) 2021-02-26 2021-02-26 Recommendation system scoring prediction method based on self-attention confrontation neural network

Country Status (1)

Country Link
CN (1) CN112784173B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486257B (en) * 2021-07-01 2023-07-11 湖北工业大学 Coordinated filtering convolutional neural network recommendation system and method based on countermeasure matrix decomposition
CN114693624B (en) * 2022-03-23 2024-07-26 腾讯科技(深圳)有限公司 Image detection method, device, equipment and readable storage medium
CN115225369B (en) * 2022-07-15 2023-04-28 北京天融信网络安全技术有限公司 Botnet detection method, device and equipment
CN118333054B (en) * 2024-06-12 2024-08-23 之江实验室 Text-to-text system and method based on local-global attention

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129463A (en) * 2011-03-11 2011-07-20 北京航空航天大学 Project correlation fused and probabilistic matrix factorization (PMF)-based collaborative filtering recommendation system
CN102789499B (en) * 2012-07-16 2015-08-12 浙江大学 Based on the collaborative filtering method of implicit relationship situated between article
CN103942288B (en) * 2014-04-10 2017-02-08 南京邮电大学 Service recommendation method based on user risk preferences
CN105844261A (en) * 2016-04-21 2016-08-10 浙江科技学院 3D palmprint sparse representation recognition method based on optimization feature projection matrix
CN106055873A (en) * 2016-05-20 2016-10-26 北京旷视科技有限公司 Fitness auxiliary method and apparatus based on image recognition
CN106446015A (en) * 2016-08-29 2017-02-22 北京工业大学 Video content access prediction and recommendation method based on user behavior preference
CN107122722A (en) * 2017-04-19 2017-09-01 大连理工大学 A kind of self-adapting compressing track algorithm based on multiple features
CN107273349B (en) * 2017-05-09 2019-11-22 清华大学 A kind of entity relation extraction method and server based on multilingual
EP3622521A1 (en) * 2017-10-16 2020-03-18 Illumina, Inc. Deep convolutional neural networks for variant classification
CN108595550A (en) * 2018-04-10 2018-09-28 南京邮电大学 A kind of music commending system and recommendation method based on convolutional neural networks
CN108563640A (en) * 2018-04-24 2018-09-21 中译语通科技股份有限公司 A kind of multilingual pair of neural network machine interpretation method and system
CN108665308A (en) * 2018-05-07 2018-10-16 华东师范大学 Score in predicting method and apparatus
CN108874790A (en) * 2018-06-29 2018-11-23 中译语通科技股份有限公司 A kind of cleaning parallel corpora method and system based on language model and translation model
CN109522372A (en) * 2018-11-21 2019-03-26 北京交通大学 The prediction technique of civil aviaton field passenger value
CN109784806B (en) * 2018-12-27 2023-09-19 北京航天智造科技发展有限公司 Supply chain control method, system and storage medium
CN111160016B (en) * 2019-04-15 2022-05-03 深圳碳云智能数字生命健康管理有限公司 Semantic recognition method and device, computer readable storage medium and computer equipment
CN110188351B (en) * 2019-05-23 2023-08-25 鼎富智能科技有限公司 Sentence smoothness and syntax scoring model training method and device
CN110196946B (en) * 2019-05-29 2021-03-30 华南理工大学 Personalized recommendation method based on deep learning
US11144721B2 (en) * 2019-05-31 2021-10-12 Accenture Global Solutions Limited System and method for transforming unstructured text into structured form
CN110442781B (en) * 2019-06-28 2023-04-07 武汉大学 Pair-level ranking item recommendation method based on generation countermeasure network
CN110866637B (en) * 2019-11-06 2022-07-05 湖南大学 Scoring prediction method, scoring prediction device, computer equipment and storage medium
CN111061951A (en) * 2019-12-11 2020-04-24 华东师范大学 Recommendation model based on double-layer self-attention comment modeling
CN111126864A (en) * 2019-12-26 2020-05-08 中国地质大学(武汉) Street quality assessment method based on man-machine confrontation score
CN111191718B (en) * 2019-12-30 2023-04-07 西安电子科技大学 Small sample SAR target identification method based on graph attention network
CN112328900A (en) * 2020-11-27 2021-02-05 北京工业大学 Deep learning recommendation method integrating scoring matrix and comment text

Also Published As

Publication number Publication date
CN112784173A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN112784173B (en) Recommendation system scoring prediction method based on self-attention confrontation neural network
CN112446591B (en) Zero sample evaluation method for student comprehensive ability evaluation
CN105975573B (en) A kind of file classification method based on KNN
CN111859680A (en) Comprehensive evaluation method for system performance
CN113298230B (en) Prediction method based on unbalanced data set generated against network
CN112541532B (en) Target detection method based on dense connection structure
CN111222847B (en) Open source community developer recommendation method based on deep learning and unsupervised clustering
CN103714148B (en) SAR image search method based on sparse coding classification
CN110880369A (en) Gas marker detection method based on radial basis function neural network and application
CN115047421A (en) Radar target identification method based on Transformer
CN112685591A (en) Accurate picture retrieval method for user interest area and feedback guidance
CN114926693A (en) SAR image small sample identification method and device based on weighted distance
CN114580262A (en) Lithium ion battery health state estimation method
CN115907122A (en) Regional electric vehicle charging load prediction method
Wang et al. Classification and extent determination of rock slope using deep learning
CN111898822B (en) Charging load interval prediction method based on multi-correlation-day scene generation
Salman et al. Creating a cutting-edge neurocomputing model with high precision
CN109063095A (en) A kind of weighing computation method towards clustering ensemble
CN117371511A (en) Training method, device, equipment and storage medium for image classification model
CN117786441A (en) Multi-scene photovoltaic user electricity consumption behavior analysis method based on improved K-means clustering algorithm
Wen et al. Short-term load forecasting based on feature mining and deep learning of big data of user electricity consumption
Mendez-Ruiz et al. SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification
CN118211494B (en) Wind speed prediction hybrid model construction method and system based on correlation matrix
JP2020035042A (en) Data determination device, method, and program
Liu et al. A hybrid model integrating improved fuzzy c-means and optimized mixed kernel relevance vector machine for classification of coal and gas outbursts

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant