Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a movie group recommendation method and system based on convolution collaborative filtering, which addresses the shortcomings of conventional recommendation methods.
The purpose of the invention is realized by the following technical scheme: a movie group recommendation method based on convolution collaborative filtering, the recommendation method comprising:
acquiring user data and commodity content data through an operator, forming user groups, and processing the data into a format that the model can recognize;
processing the data with a recommendation algorithm based on convolution collaborative filtering to obtain a recommendation list for each user group;
recommending commodities to the relevant user groups while acquiring feedback data from the users, returning the feedback data to the system and processing it into the corresponding format, then recomputing the recommendation list with the convolution collaborative filtering algorithm, and continuing to recommend commodities to the user groups.
Further, the data processing based on the convolution collaborative filtering recommendation algorithm comprises a user group embedding aggregation step of the attention neural network, an item group embedding aggregation step, and a feature interactive learning step of convolution collaborative filtering; the user group embedding aggregation step of the attention neural network comprises the following steps:
collecting and processing original data, and cleaning and recombining it to obtain user data and item data;
embedding the user data of each user group with an attention neural network, and learning through the whole model the attention weight of each user belonging to a specific group;
performing weighted aggregation on the embedded features of the users according to the learned weights of the users belonging to a specific user group, thereby obtaining the embedded features of the user group.
Further, the feature interactive learning step of the convolution collaborative filtering includes:
fusing the embedded features of the user group and the embedded features of the item group by element-wise product, and concatenating the fused result with the embedded features of the user group and the embedded features of the item group by columns to obtain the fused feature;
feeding the fused feature into a convolutional neural network for convolution processing;
inputting the output of the convolution operation into a fully connected network, and continuing to train the model so as to improve its accuracy.
Further, the item group embedding aggregation step includes: embedding the high-dimensional sparse data, namely the attribute data and the item ID of each item, through a neural network embedding algorithm, thereby reducing its dimension and converting it into low-dimensional, dense embedded feature vectors.
Further, the method further comprises: combining new valid users into new user groups from time to time according to rules, and acquiring the recommended commodity information of each new user group through the convolution collaborative filtering recommendation algorithm.
Further, the user data and the commodity content data include: user ID, user age, user gender, user selected preferences, user address information, user browsing information, user purchase information, item ID, and category to which the item belongs.
A movie group recommendation system based on convolutional collaborative filtering, the movie group recommendation system comprising:
a data acquisition module: used for acquiring user data and commodity content data, forming user groups, and processing the user data and the commodity content data into a format that the model can recognize;
a data processing module: used for processing the data with a recommendation algorithm based on convolution collaborative filtering to obtain a recommendation list for each user group;
a recommendation module: used for recommending commodities to the relevant user groups while acquiring feedback data from the users, returning the feedback data to the system and processing it into the corresponding format, then recomputing the recommendation list with the convolution collaborative filtering algorithm, and continuing to recommend commodities to the user groups.
Further, the data processing module comprises a user group embedding aggregation unit and a feature interaction learning unit;
the user group embedding aggregation unit: embeds the user data of the user group through an attention neural network to obtain the user weights, and then performs weighted aggregation on the embedded features of the users to obtain the embedded features of the user group;
the feature interaction learning unit: acquires and fuses the embedded features of the user group and the item group, concatenates them by columns, inputs the concatenated features into a single-channel convolutional neural network for convolution, and finally inputs the output of the convolution operation into a fully connected network.
The invention has the following beneficial effects: in the movie group recommendation method and system based on convolution collaborative filtering, after linear fusion of the user embedding (or user group embedding) and the item embedding features, the processed fused embedding vectors are sent directly to a single-layer convolutional neural network. Because the neurons of a convolutional neural network are locally connected and a group of connections shares one weight, the number of parameters is greatly reduced. Meanwhile, the pooling characteristic of the convolutional neural network helps it learn the useful information in the user embedding (or user group embedding) and item embedding features, so that the preference score of a user or a user group for an item can be obtained more effectively. The hit rate of the commodities recommended by the model can therefore be effectively improved.
Detailed Description
In order to make the purpose, technical solution and advantages of the embodiments of the present application clearer, the technical solution in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, but not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, the present invention relates to a movie group recommendation method based on convolution collaborative filtering, which includes:
S1, acquiring newly logged-in user information through a big data operator, the information comprising the user ID, user age, user gender, user-selected preferences, the user's irregularly updated address, and the like;
S2, forming new user groups from the user data acquired in step S1 through a given data processing method, and cleaning and computing statistics on the commodity data, which facilitates the subsequent model processing and the first targeted recommendation of commodities to the new users;
S3, pushing the user and item data obtained in step S2 to the model. Through cleaning and recombination of the data, new valid users are combined into new user groups from time to time according to rules. With the convolution collaborative filtering recommendation algorithm, the embedded features of the user group and the item group are first obtained and then concatenated by columns; the concatenated features are input into a single-channel convolutional neural network, and the output of the convolution is then input into a fully connected network. Training the model continuously improves its accuracy. Finally, the trained model is used to obtain the commodity recommendation list of the user group;
S4, acquiring original data of the new users and the commodities from the operator's big data system, the original data comprising user login information, user age and gender, recent browsing data, historical purchase data, new and old commodity IDs, and the like. The new click data of this batch of users is used to push further commodities to them;
S5, acquiring data on the valid user groups' browsing time, number of views, valid operation types, and the like for the recommended commodities. Model learning and commodity recommendation continue for the existing user groups. Meanwhile, new users are obtained through the big data operator, new user groups are formed regularly, and targeted commodity recommendation is carried out.
As shown in fig. 2, the user group aggregation algorithm based on the attention neural network comprises the following steps:
A1, collecting and processing the original data, and cleaning and recombining it to obtain user data and item data. The user data comprises the number of users, the user IDs, and the group each user belongs to. The item data includes the number of items, the item IDs, and the like;
A2, based on step A1, performing weighted aggregation on the user data of each user group, with the weights produced by an attention neural network. Specifically, the data of each user is embedded, and the attention weight of each user belonging to a given group is learned through the whole model. The weights of the items are learned as well. The learned user weights are:
wherein $h(i, t)$ in the above formula represents the combination of the user and item embeddings propagated through the fully connected layer. On this basis, the attention weight of each individual user embedding is obtained via the softmax activation function.
A3, applying the weights learned in step A2 for the users belonging to a specific group to perform weighted aggregation on the embedded features of the users, thereby obtaining the embedded features of the group and facilitating the subsequent feature fusion. The group embedded features aggregated from the user embedded features are represented as:
Here, the group embedding feature $g_l(i)$ obtained from the CAMRA2011 data has a dimension of 256 × 32.
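Steps A2 and A3 can be sketched in NumPy as follows. Because the weight formula itself is not reproduced in this text, the scoring form used here (a ReLU layer over projected user and item embeddings followed by a projection vector `h`) is an assumption made for illustration only, as are the names `aggregate_group`, `P_u`, `P_v`, `b`, and the hidden size of 16. Only the softmax normalization and the weighted sum follow directly from the description.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def aggregate_group(user_emb, item_emb, P_u, P_v, b, h):
    """Attention-weighted aggregation of member embeddings.

    The scoring function below is a hypothetical form, not the patent's formula.
    """
    # One hidden vector per member, combining that member with the target item.
    hidden = np.maximum(user_emb @ P_u + item_emb @ P_v + b, 0.0)  # (n_users, d_hidden)
    scores = hidden @ h                                             # (n_users,)
    weights = softmax(scores)                                       # attention weights, sum to 1
    # The group embedding is the weighted sum of the member embeddings.
    return weights, weights @ user_emb

rng = np.random.default_rng(0)
d, d_hidden, n_users = 32, 16, 4          # d = 32 matches the stated feature dimension
users = rng.normal(size=(n_users, d))     # placeholder member embeddings
item = rng.normal(size=d)                 # placeholder item embedding
P_u = rng.normal(size=(d, d_hidden))
P_v = rng.normal(size=(d, d_hidden))
b = np.zeros(d_hidden)
h = rng.normal(size=d_hidden)

weights, group = aggregate_group(users, item, P_u, P_v, b, h)
print(weights.sum(), group.shape)
```

Stacking one such 32-dimensional group vector per training record in a batch of 256 would give the 256 × 32 shape mentioned above.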
Further, the embedded features of the item group are obtained as follows: the high-dimensional sparse data, namely the attribute data and the item ID of each item, is embedded through a neural network embedding algorithm, which reduces its dimension and converts it into low-dimensional, dense embedded feature vectors.
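A minimal sketch of this embedding step, assuming an embedding table indexed by item ID; the table size of 1000 items and the random initialization are illustrative assumptions. It shows that looking up a row is equivalent to multiplying the sparse one-hot representation by the table, which is how the high-dimensional sparse input becomes a dense 32-dimensional vector.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, dim = 1000, 32   # illustrative sizes; dim = 32 matches the stated feature dimension
# In a real model this table is a trainable parameter; here it is random.
embedding_table = rng.normal(scale=0.1, size=(num_items, dim))

def embed(item_ids):
    # Row lookup: equivalent to one-hot(item_ids) @ embedding_table.
    return embedding_table[np.asarray(item_ids)]

ids = [3, 17, 256]
dense = embed(ids)

# The same result computed explicitly from the sparse one-hot representation.
one_hot = np.zeros((len(ids), num_items))
one_hot[np.arange(len(ids)), ids] = 1.0
print(dense.shape, np.allclose(one_hot @ embedding_table, dense))
```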
The item refers to a commodity or an article in the recommendation list, and includes many types of commodities, such as commodities on apps of Taobao, Jingdong, Shuduo and the like; or virtual goods such as movies, music, etc.
As shown in fig. 3, the feature interactive learning algorithm based on convolution collaborative filtering is as follows:
B1, fusing the acquired group features and item features by element-wise product, $g_l(i) \odot v_j$, and concatenating the fused result with the group features and the item features by columns. The data dimension after concatenation is 256 × 96.
B2, sending the fused features of dimension 256 × 96 acquired in step B1 into a convolutional neural network for convolution processing.
B3, the model's convolution structure comprises three operations: embedding interaction, the fully connected layer, and the prediction layer.
First, the convolution operation is performed. Assume the convolution kernel is $k \times k$, the boundary padding is $P$, the stride is $S$, and the input dimension is $H \times W$. The output height becomes
$$H' = \frac{H - k + 2P}{S} + 1$$
and the output width becomes
$$W' = \frac{W - k + 2P}{S} + 1$$
Let the convolution kernel size be $k = 3$, the padding $P = 1$, and the stride $S = 1$. The input dimension 256 × 96 then remains unchanged after convolution. After convolution, the convolutional embedding and the original embedding are multiplied element by element:
$$V = F(A, E) = [a_1 \cdot e_1, \cdots, a_f \cdot e_f] = [c_1, \cdots, c_f]$$
Wherein A represents original embedding, E represents convolution embedding, and V represents embedding of original feature importance degree through convolution processing.
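The embedding interaction can be sketched as follows, assuming a single 3 × 3 kernel with zero padding and stride 1 and no activation after the convolution (the text does not state one). The helper `conv2d_same` is a hypothetical name, and it computes the sliding-window product without flipping the kernel, as deep learning frameworks usually do. The matrix `A` is random placeholder data standing in for the 256 × 96 original embedding.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel k x k convolution with zero padding k // 2 and stride 1."""
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(x, p)                    # constant zero padding on all sides
    out = np.zeros_like(x, dtype=float)
    rows, cols = x.shape
    # Accumulate one shifted copy of the input per kernel entry.
    for di in range(k):
        for dj in range(k):
            out += kernel[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out

rng = np.random.default_rng(0)
A = rng.normal(size=(256, 96))       # original embedding (placeholder values)
kernel = rng.normal(size=(3, 3))     # the 9 convolution parameters
E = conv2d_same(A, kernel)           # convolution embedding, same shape as A
V = A * E                            # element-wise product: V = F(A, E)
print(E.shape, V.shape)
```

Because `E` has the same shape as `A`, each entry of `E` acts as a weight on the corresponding entry of `A`.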
The fully connected layer is then applied:
$$s = \mathrm{ReLU}\left(W_2\left(\mathrm{ReLU}(W_1 V + b_1)\right) + b_2\right)$$
wherein $V$ is the data obtained by the embedding interaction; $a_1$ is one column vector of the original embedded features, that is, the features after the user embedded features and the item embedded features are fused (there are $f$ columns in total); $e_1$ is one column vector of the convolution embedding obtained after the original embedded features pass through the convolution layer ($f$ columns in total); $c_1$ is the result of $a_1 \cdot e_1$, computed in the vector manner shown in FIG. 5; $b_1$ and $b_2$ are the bias vectors of the forward propagation, trained by the optimization algorithm jointly with $W_1$ and $W_2$, where $W_1$ and $W_2$ are the mapping parameter matrices.
Finally, a prediction score can be obtained by using the prediction layer.
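A minimal sketch of the fully connected stage, using the shapes stated in this description for $W_1$ (96 × 32) and $W_2$ (32 × 1). With one record per row, the product is written `V @ W1` rather than $W_1 V$; this is only a layout choice. The weights and inputs are random placeholders, and since the prediction layer is not specified further, the output `s` is treated directly as the score here, which is an assumption.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fully_connected(V, W1, b1, W2, b2):
    # s = ReLU(W2 ReLU(W1 V + b1) + b2), written for row-major batches.
    hidden = relu(V @ W1 + b1)       # (batch, 32)
    return relu(hidden @ W2 + b2)    # (batch, 1)

rng = np.random.default_rng(0)
V = rng.normal(size=(256, 96))               # output of the embedding interaction
W1 = rng.normal(scale=0.1, size=(96, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1))
b2 = np.zeros(1)

s = fully_connected(V, W1, b1, W2, b2)
print(s.shape)
```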
The number of parameters is determined by the sizes of $W_1$ and $W_2$. The single-layer convolution adds one convolution kernel with 9 parameters. When $W_1$ is 96 × 32 and $W_2$ is 32 × 1, the fully connected layers have 3104 parameters (96 × 32 + 32 × 1). The number of parameters added by the convolution is therefore small.
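The parameter counts above can be reproduced with a few lines of arithmetic. The helper names are illustrative, and bias terms are left out, as in the figures quoted in the text.

```python
def conv_kernel_params(k, in_channels=1, out_channels=1):
    # Weights of one k x k kernel per (input, output) channel pair, no bias.
    return k * k * in_channels * out_channels

def dense_weight_params(shapes):
    # Sum of rows * cols over the weight matrices, no bias.
    return sum(rows * cols for rows, cols in shapes)

conv_params = conv_kernel_params(3)
mlp_params = dense_weight_params([(96, 32), (32, 1)])
print(conv_params, mlp_params)  # prints: 9 3104
```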
The 256 × 96 data dimension is derived as follows. The batch size of the program is set to 256, that is, each training batch contains 256 user (group) records and 256 item records, and the feature dimension of each entity is set to 32. As shown in FIG. 3, the embedded features of the user (group), of the item, and of the user (group) and item interaction have the same dimension:
$$g_l(i),\; v_j,\; g_l(i) \odot v_j \in \mathbb{R}^{256 \times 32}$$
In the fusion stage, the three groups of embedded features are concatenated by columns, so the number of rows stays unchanged while the columns add up, and the dimension after fusion becomes:
$$\text{Original Embedding} \in \mathbb{R}^{256 \times 96}$$
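The fusion can be sketched in NumPy as follows. The embeddings are random placeholders, and the column order (interaction first, then group, then item) is an assumption, since the text only states that the three blocks are joined by columns.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 256, 32
g = rng.normal(size=(batch, dim))   # user (group) embeddings, placeholder values
v = rng.normal(size=(batch, dim))   # item embeddings, placeholder values

interaction = g * v                 # element-wise product, shape (256, 32)
# Joining by columns keeps 256 rows and adds the widths: 32 + 32 + 32 = 96.
original_embedding = np.concatenate([interaction, g, v], axis=1)
print(original_embedding.shape)     # prints: (256, 96)
```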
As shown in fig. 4, a 3 × 3 convolution kernel is used and the boundary is padded with zeros. With $P = 1$ and a convolution stride of 1, the dimension of the embedded feature remains unchanged after the convolution. As shown in fig. 5, the element-wise product is taken between the original embedding (Original Embedding) and the convolution embedding (CNN Embedding). The resulting formula is:
$$V = F(A, E) = A \odot E = [a_1 \cdot e_1, \cdots, a_f \cdot e_f] = [c_1, \cdots, c_f]$$
Here, both the original embedded feature and the convolution embedded feature are already features obtained after the user (group) and item interaction.
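The statement that a 3 × 3 kernel with padding 1 and stride 1 preserves the 256 × 96 shape can be checked with the output-size formula. The function name is illustrative, and integer division is assumed for strides that do not divide evenly.

```python
def conv_output_size(n, k, p, s):
    # Output length along one axis: (n - k + 2p) / s + 1, with integer division.
    return (n - k + 2 * p) // s + 1

h_out = conv_output_size(256, k=3, p=1, s=1)
w_out = conv_output_size(96, k=3, p=1, s=1)
print(h_out, w_out)  # prints: 256 96
```

With a stride of 2 instead, the same formula gives 128 rows, so keeping the stride at 1 is what allows the element-wise product with the original embedding.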
In a specific experiment, the data is divided into 290 user groups. The data includes 113334 user rating training records, 3010 user rating test records, 143618 group rating training records, and 1450 group rating test records. Negative samples were constructed for the 3010 user test records and for the 1450 group test records: each test record is paired with 100 negative items, that is, items not clicked by the group or the user. The negative samples of users and user groups are used when evaluating the model.
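One common way to use such negative samples is to rank each test item against its 100 negatives and report the hit rate at a cutoff K. The sketch below assumes this protocol and K = 10; neither the cutoff nor the exact metric definition is stated in the text, and the scores are random placeholders rather than model outputs.

```python
import numpy as np

def hit_at_k(pos_score, neg_scores, k=10):
    # The positive item is a hit if fewer than k negatives score strictly higher.
    rank = int(np.sum(np.asarray(neg_scores) > pos_score))
    return rank < k

def hit_rate(pos_scores, neg_scores, k=10):
    hits = [hit_at_k(p, n, k) for p, n in zip(pos_scores, neg_scores)]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
n_test, n_neg = 1450, 100                 # group test records, negatives per record
pos = rng.uniform(size=n_test)            # placeholder scores for the true items
neg = rng.uniform(size=(n_test, n_neg))   # placeholder scores for the negatives
hr = hit_rate(pos, neg, k=10)
print(0.0 <= hr <= 1.0)
```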
The online data is obtained by collecting active user data of a big data platform, and the function of recommending proper commodities to groups or single users can be achieved by processing the data collected regularly in a streaming mode.
With the convolutional neural network, the increase in parameters introduced by the convolution is negligible: the convolutional neural network (CNN) adds 9 parameters, compared with 3104 parameters for the multilayer perceptron (MLP). The convolution embedding is obtained by convolving the original embedding (Original Embedding) and serves as the feature weight of the original embedding; that is, the convolution embedding and the original embedding are multiplied element-wise, and this product models the importance of each original embedded feature.
The foregoing is illustrative of the preferred embodiments of this invention, and it is to be understood that the invention is not limited to the precise form disclosed herein and that various other combinations, modifications, and environments may be resorted to, falling within the scope of the concept as disclosed herein, either as described above or as apparent to those skilled in the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.