CN110688585B

CN110688585B - Personalized movie recommendation method based on neural network and collaborative filtering

Info

Publication number: CN110688585B
Application number: CN201910912752.8A
Authority: CN
Inventors: 杨新武; 熊乐歌; 王羽钧; 董雨萌; 杜欣钰; 宋霖涛
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2022-04-19
Anticipated expiration: 2039-09-25
Also published as: CN110688585A

Abstract

The invention discloses a personalized movie recommendation method based on a neural network and collaborative filtering. The method uses a Bert neural network to extract features from movie plots quickly and effectively, forms an item feature matrix that is linked to Funk-SVD, and uses matrix factorization to generate a complete U-I (user-item) matrix from which all predicted scores are obtained. First, the Bert neural network extracts features from each movie plot to produce a feature matrix for the movie items. The feature matrix is then connected to the Funk-SVD algorithm, and matrix factorization with gradient descent is used to optimize toward a complete U-I matrix with minimum error, from which all predicted scores are finally obtained. On top of the original explicit and implicit feedback, the method adds auxiliary information, namely the movie plot, to obtain a more accurate item feature matrix, so that the minimum error is reduced by 2.40% and prediction accuracy is improved.

Description

Personalized movie recommendation method based on neural network and collaborative filtering

Technical Field

The invention belongs to the field of artificial-intelligence-based personalized recommendation. In particular, it relates to a quick and effective method that uses a Bert neural network to extract features from movie plots, forms an item feature matrix that is linked to Funk-SVD, and uses matrix factorization to generate a complete U-I matrix from which all predicted scores are obtained.

Background

Currently, three main approaches are widely used to implement recommendation systems: content-based recommendation (CB), collaborative filtering recommendation (CF), and hybrid recommendation.

CB: a candidate item is compared with the items the user previously liked, and the best-matching items are recommended. The main problems with this approach are cold start and the reliability of similar users.

CF: collaborative filtering is the most popular algorithm in recommendation systems. It models user-item interaction data to predict a user's preference for an item. Its main obstacle is the sparsity of the user-item interaction data.

Hybrid recommendation: this approach combines several recommendation algorithms so that they compensate for each other's shortcomings and achieve a better recommendation result. In practice, a combination strategy suited to the specific problem can be adopted. Combining content-based recommendation with collaborative filtering is currently a widely researched and applied combination.

When computing the item scoring matrix, natural language processing (NLP) is applied to the movie plot in order to extract more accurate feature vectors. Two strategies exist for applying pre-trained word representations to a specific downstream NLP task: the feature-based approach (e.g., ELMo) and the fine-tuning approach (e.g., the Generative Pre-trained Transformer, OpenAI GPT). Both limit the quality of the pre-trained word vectors, mainly because their standard language models are unidirectional, which restricts the architectures that can be used in training and lowers the accuracy of feature extraction.

Disclosure of Invention

To address the problems in the prior art, the invention provides a personalized movie recommendation method based on a neural network and collaborative filtering. The overall idea is to use the Bert neural network to extract features from the movie plots and obtain a feature matrix for the movie items, then to connect this feature matrix to the Funk-SVD algorithm and optimize with matrix factorization and gradient descent to obtain a complete U-I matrix with minimum error, from which all predicted scores are finally obtained.

To achieve this purpose, the technical scheme adopted by the invention is a personalized recommender based on a neural network and collaborative filtering, which comprises the following parts:

Neural network Bert:

When training its bidirectional language model, Bert replaces a small fraction of words with a mask token or, with small probability, with another random word, which forces the model to rely more heavily on context. The Bert Transformer uses bidirectional self-attention, so sentence information is extracted and encoded in both the left-to-right and right-to-left directions. Pre-training on a very large data set and transferring from the source domain to the target domain effectively improves the representation ability of the model. The data in the experiment is movie plot text. Because the sentences describing a movie plot are long, the Transformer, unlike an RNN that extracts features in time-step order, can effectively ensure that earlier features do not vanish.

Bert comprises two main steps: pre-training and fine-tuning. During pre-training, Bert masks 15% of the input movie plot text, runs the whole sequence through a Transformer encoder, and then predicts only the masked part, thereby achieving a deep bidirectional pre-trained representation. In this method, Bert first converts the plot text into word vectors to obtain a feature matrix, and the resulting matrix is then used in the CF model. Bert uses the Transformer structure, which consists of several stacked layers, each consisting of an attention layer and a non-linear function applied to each input element. The Transformer iteratively applies syntactic parsing and semantic composition steps to resolve their interdependence, and so better generates a vector containing all movie features, i.e., the item feature matrix.
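The masking step of pre-training described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: tokens are whitespace-separated words, every selected position becomes a `[MASK]` placeholder (the occasional random-word replacement is left out), and the helper name `mask_tokens` and the sample plot sentence are hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"  # placeholder string; assumption for illustration


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide roughly mask_rate of the tokens and return (masked_tokens, positions)."""
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    for pos in positions:
        masked[pos] = MASK_TOKEN
    return masked, positions


# Hypothetical plot sentence, split on whitespace for simplicity.
plot = ("a young farm boy joins a rebel alliance to rescue a princess "
        "and destroy a giant space station").split()
masked_plot, masked_positions = mask_tokens(plot)
print(masked_plot)
print(masked_positions)
```

During pre-training the model would be asked to predict only the tokens at `masked_positions`, using the unmasked context on both sides.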

Collaborative filtering model:

The collaborative filtering model adopted in the proposed Bert-SVD model is Funk-SVD. From the perspective of the relationship between users and items, the first focus is explicit feedback, namely data that can be presented directly in numeric form, such as the score a user gives to an item. In the following formulas, r(ui) denotes the predicted score, u denotes the overall average of all scoring data, b_u denotes the scoring bias of a specific user, which captures the influence of a person's subjective factors on the score, and b_i denotes the scoring bias of a specific item, which captures the score differences caused by item attributes. Differentiation between users and items is thus achieved through the bias terms.

$$r(ui) = u + b_i + b_u \tag{1}$$

To compute the bias terms b_i and b_u, the average score n of a specific user (n_u) or item (n_i) is calculated first, and the bias is then obtained as its difference from the overall average u.

$$b_u = n_u - u \tag{2}$$

$$b_i = n_i - u \tag{3}$$
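Equations (1) to (3) can be traced on a toy example. The sketch below is illustrative only: the function names `bias_terms` and `baseline` and the four sample scores are hypothetical, and known scores are assumed to be stored as a dictionary keyed by (user, item).

```python
def bias_terms(ratings):
    """Return the overall average u and the bias dictionaries b_u and b_i."""
    overall = sum(ratings.values()) / len(ratings)
    by_user, by_item = {}, {}
    for (user, item), score in ratings.items():
        by_user.setdefault(user, []).append(score)
        by_item.setdefault(item, []).append(score)
    # Equation (2): user average n_u minus the overall average.
    b_u = {user: sum(s) / len(s) - overall for user, s in by_user.items()}
    # Equation (3): item average n_i minus the overall average.
    b_i = {item: sum(s) / len(s) - overall for item, s in by_item.items()}
    return overall, b_u, b_i


def baseline(overall, b_u, b_i, user, item):
    """Equation (1): overall average plus item bias plus user bias."""
    return overall + b_i[item] + b_u[user]


# Hypothetical toy scores.
toy = {("ann", "m1"): 5, ("ann", "m2"): 3, ("bob", "m1"): 4, ("bob", "m3"): 2}
overall, b_u, b_i = bias_terms(toy)
print(overall, b_u, b_i)
print(baseline(overall, b_u, b_i, "ann", "m3"))  # ann has not scored m3
```

Here ann scores above the overall average and m3 scores below it, so the baseline for the unseen pair lands between the two effects.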

To make fuller use of the data, implicit feedback is also taken into account, such as a user's comment, browsing, and purchase records. Matrix factorization produces two K-dimensional matrices P and Q, which describe the implicit features of users and items respectively; the choice of K reduces the number of implicit feedback types the algorithm needs to handle. With the implicit feedback term added, the prediction no longer depends entirely on explicit feedback and is made from multiple dimensions, which improves precision.

$$r(ui) = u + b_i + b_u + q_i^{T} p_u \tag{4}$$
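As a small worked instance of equation (4), the sketch below adds the inner product of a user factor vector and an item factor vector to the bias baseline. The function name `predict` and all numeric values are hypothetical and chosen only so the arithmetic is easy to follow (K = 3 here rather than 100).

```python
def predict(overall, b_u, b_i, p_u, q_i):
    """Equation (4): bias baseline plus the inner product q_i^T p_u."""
    if len(p_u) != len(q_i):
        raise ValueError("p_u and q_i must both have K entries")
    return overall + b_i + b_u + sum(p * q for p, q in zip(p_u, q_i))


# Inner product: 0.5*1.0 + (-1.0)*0.5 + 2.0*0.25 = 0.5
score = predict(3.5, b_u=0.5, b_i=-1.5, p_u=[0.5, -1.0, 2.0], q_i=[1.0, 0.5, 0.25])
print(score)  # 3.0
```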

Gradient descent method:

Because the initial values of all elements of the P and Q matrices are set randomly by the system, they are updated iteratively by gradient descent until the system converges, reducing the error toward an optimal solution. e_ui denotes the error between a predicted score r(ui) and the known score R(ui), and SSE is the sum of squared errors.

$$e_{ui} = R(ui) - r(ui) \tag{5}$$

$$SSE = \sum_{u,i} e_{ui}^{2} = \sum_{u,i} \left[ R(ui) - \sum_{k=1}^{K} p_{uk} q_{ki} \right]^{2} \tag{6}$$

After solving for the gradient, the update rules are given by the following formulas, where η is the learning rate and λ is the regularization parameter that guards against overfitting. The updated p_uk and q_ki are:

$$p_{uk} = p_{uk} + 2\eta \left( e_{ui} q_{ki} - \lambda p_{uk} \right) \tag{7}$$

$$q_{ki} = q_{ki} + 2\eta \left( e_{ui} p_{uk} - \lambda q_{ki} \right) \tag{8}$$
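The update rules (7) and (8) can be sketched as a per-score stochastic gradient descent loop. This is a toy illustration rather than the patented implementation: the bias terms are folded into a single overall average for brevity, the data, K = 2, the initial value range, and the learning rate are hypothetical, and the name `sgd_epoch` is introduced only for this sketch.

```python
import random


def sgd_epoch(ratings, P, Q, mean, eta, lam):
    """Run one pass over the known scores and return that pass's SSE."""
    sse = 0.0
    for (user, item), known in ratings.items():
        p, q = P[user], Q[item]
        predicted = mean + sum(pk * qk for pk, qk in zip(p, q))
        e = known - predicted  # equation (5)
        sse += e * e
        for k in range(len(p)):
            p_old, q_old = p[k], q[k]  # both rules read the pre-update values
            p[k] = p_old + 2 * eta * (e * q_old - lam * p_old)  # equation (7)
            q[k] = q_old + 2 * eta * (e * p_old - lam * q_old)  # equation (8)
    return sse


# Hypothetical toy data: (user, item) -> known score R(ui).
ratings = {("ann", "m1"): 5, ("ann", "m2"): 3, ("bob", "m1"): 4, ("bob", "m3"): 2}
mean = sum(ratings.values()) / len(ratings)

rng = random.Random(0)
K = 2
P = {u: [rng.uniform(-0.1, 0.1) for _ in range(K)] for u in ("ann", "bob")}
Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(K)] for i in ("m1", "m2", "m3")}

history = [sgd_epoch(ratings, P, Q, mean, eta=0.02, lam=0.01) for _ in range(500)]
print(history[0], history[-1])  # SSE of the first and the last pass
```

Caching `p_old` and `q_old` before writing keeps the two updates symmetric, so rule (8) does not see a value already changed by rule (7).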

The method is characterized in that it comprises the following steps:

Step 1: a 943 × 1682 scoring matrix covering 943 users and 1682 items is obtained from MovieLens for the experiment. It contains only 100,000 scores, so the sparsity of the U-I matrix is 93.7%. In the Bert-SVD model, the plot of each movie item is crawled from IMDB with a Python crawler, and the content features extracted from the plot summary serve as the key item factor vectors of the recommendation model. They are a key component of the model and affect both its training and the prediction of unknown ratings for cold-start (CS) items.
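The sparsity figure in Step 1 follows directly from the matrix dimensions and the number of known scores, as this short check shows (variable names are illustrative):

```python
n_users, n_items, n_scores = 943, 1682, 100_000

density = n_scores / (n_users * n_items)  # fraction of U-I cells that hold a score
sparsity = 1 - density                    # fraction of cells that are empty

print(f"{sparsity:.1%}")  # 93.7%
```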

Step 2: a 943 × 100 user factor matrix and a 1682 × 100 item factor matrix are first initialized with random numbers; the K value of the CF algorithm is set to 100 for this data set. To verify the effect of adding Bert's learning process on the error, two comparison experiments are carried out. One uses the two randomly initialized matrices directly; the other replaces the randomly initialized item feature matrix with the item feature matrix that Bert computes from the movie plots. In both cases the subsequent computation is iterated 800 times so that the system converges to the minimum error, after which the two trained feature matrices can predict the empty entries of the U-I matrix in the data set.
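The two initialization variants compared in Step 2 might be set up as in the sketch below. Several things here are assumptions: the helper `init_factors` is hypothetical, the random initial range of ±0.1 is arbitrary, and `stand_in_features` is random placeholder data standing in for a 1682 × 100 matrix of plot features, since how the Bert output is reduced to K = 100 columns is not described here.

```python
import random


def init_factors(n_rows, k, rng, features=None):
    """Return an n_rows x k factor matrix: random, or copied from given features."""
    if features is None:
        return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]
    if len(features) != n_rows or any(len(row) != k for row in features):
        raise ValueError("feature matrix must be n_rows x k")
    return [list(row) for row in features]  # copy so training does not alter the input


rng = random.Random(0)
N_USERS, N_ITEMS, K = 943, 1682, 100

# Placeholder for plot features; NOT real Bert output.
stand_in_features = [[rng.gauss(0.0, 0.05) for _ in range(K)] for _ in range(N_ITEMS)]

P = init_factors(N_USERS, K, rng)                               # user factors, random in both experiments
Q_random = init_factors(N_ITEMS, K, rng)                        # experiment 1: random item factors
Q_bert = init_factors(N_ITEMS, K, rng, features=stand_in_features)  # experiment 2: feature-based start
print(len(P), len(P[0]), len(Q_random), len(Q_bert))
```

Both variants would then go through the same iterative updates; only the starting point of the item matrix differs.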

Comparison of results after combining the Bert neural network with SVD:

When Bert is used to learn the movie plots, the expected output is a 1682 × 100 item matrix, which directly replaces the item feature matrix that would otherwise be generated from MATLAB random values and refined by iteration.

The error is calculated using RMSE, the root mean square error, also known as the standard error. For a finite number of measurements indexed by i = 1, 2, 3, …, n, the root mean square error is given by:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^{2}}$$

where n is the number of measurements and d_i is the deviation of the i-th measured value from the true value.
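The RMSE definition translates into a few lines of code. The function name `rmse` and the sample values are hypothetical; each deviation d_i is taken as predicted minus actual.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equally long sequences of scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]  # d_i^2
    return math.sqrt(sum(squared) / len(squared))


# Deviations are 0, 2, 0, so RMSE = sqrt(4 / 3).
value = rmse([3.0, 4.0, 5.0], [3.0, 2.0, 5.0])
print(round(value, 4))
```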

Two experimental hypotheses were tested for the feature matrix output by Bert. In the first, the output is used directly as the item feature matrix without iterative computation; the error increases and the hypothesis fails. In the second, the output only replaces the random initial values of the original matrix, and the minimum error is reduced by 2.40%. The specific experimental result is as follows:

Step 3: after 800 iterations, the two factorized feature matrices are obtained. They are then used to fill the empty rating entries of the real 943 × 1682 U-I matrix according to equation (4).
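Filling the empty entries in Step 3 amounts to evaluating equation (4) for every (user, item) pair that has no known score. The sketch below assumes the trained quantities are already available as dictionaries; the function name `fill_matrix` and all values are hypothetical.

```python
def fill_matrix(known, users, items, overall, b_u, b_i, P, Q):
    """Return a complete U-I mapping: known scores kept, empty cells predicted."""
    full = {}
    for user in users:
        for item in items:
            if (user, item) in known:
                full[(user, item)] = known[(user, item)]
            else:
                inner = sum(p * q for p, q in zip(P[user], Q[item]))
                full[(user, item)] = overall + b_i[item] + b_u[user] + inner  # equation (4)
    return full


# Hypothetical trained values for a 1-user, 2-item example with K = 2.
known = {("ann", "m1"): 5}
full = fill_matrix(
    known,
    users=["ann"],
    items=["m1", "m3"],
    overall=3.5,
    b_u={"ann": 0.5},
    b_i={"m1": 1.0, "m3": -1.5},
    P={"ann": [1.0, 0.0]},
    Q={"m1": [0.0, 0.0], "m3": [0.5, 2.0]},
)
print(full)
```

Known scores are left untouched; only the cell ("ann", "m3") is computed from the factors and biases.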

Drawings

FIG. 1 is a schematic diagram of a Bert neural network.

Fig. 2 is a flow chart of the NLP process.

FIG. 3 is a comparison of experimental error for Bert-SVD and Funk-SVD.

Detailed Description

The invention will be further explained with reference to the drawings and examples.

FIG. 1 is a schematic diagram of a Bert neural network.

Fig. 2 is a flow chart of the NLP process.

Fig. 3 compares the experimental errors of Bert-SVD and Funk-SVD. The invention is tested on the 100,000 scores generated by 943 users for 1682 items provided by MovieLens, with the implicit feedback dimension K set to 100, the learning rate to 0.002, the regularization parameter to 0.01, and the number of iterations to 800. The minimum error of the original Funk-SVD is 0.129, and the minimum error of the invention is 0.126.
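The relative reduction implied by the two minimum errors above can be checked with a line of arithmetic. Because both errors are quoted to only three decimals, the result is approximate; it comes out near 2.3%, and the small gap to the 2.40% stated in this document is presumably due to rounding of the reported errors, which is an assumption.

```python
funk_svd_min_error = 0.129  # original Funk-SVD
bert_svd_min_error = 0.126  # Bert-SVD

relative_reduction = (funk_svd_min_error - bert_svd_min_error) / funk_svd_min_error
print(f"{relative_reduction:.2%}")  # 2.33%
```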

Claims

1. A personalized movie recommendation method based on combining a Bert neural network with the Funk-SVD model of the collaborative filtering algorithm, characterized in that it comprises the following steps:

step 1, obtaining from MovieLens, for the experiment, a 943 × 1682 scoring matrix covering 1682 items and 943 users, which contains only 100,000 scores, so that the sparsity of the U-I matrix is 93.7%; in the Bert-SVD model, the movie plot of each item is crawled from IMDB with a Python crawler, and the content features extracted from the plot summary are used as the key item factor vectors of the item recommendation model and become a key component of the recommendation model;

step 2, first generating a 943 × 100 random user feature matrix and a 1682 × 100 random item feature matrix, the K value of the CF algorithm being set to 100 for this data set; to verify the effect of adding Bert's learning process on the error, two comparison experiments are carried out, one using the two random feature matrices directly and the other replacing the randomly initialized item feature matrix with the item feature matrix that Bert computes from the movie plots; the subsequent computation is iterated 800 times so that the system converges to the minimum error, after which the two trained feature matrices can predict the empty entries of the U-I matrix in the data set;

taking the output of the Bert neural network as the input of Funk-SVD:

learning the movie plots with Bert, the expected output being a 1682 × 100 item matrix, which directly replaces the item feature matrix that would otherwise be generated from MATLAB random values and refined by iteration;

the error is calculated using RMSE, the root mean square error, also known as the standard error; for a finite number of measurements indexed by i = 1, 2, 3, …, n, the root mean square error is given by:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^{2}}$$

where n is the number of measurements and d_i is the deviation of the i-th measured value from the true value;

step 3, obtaining the two factorized feature matrices after 800 iterations, and then using the two matrices to fill the empty rating entries of the real 943 × 1682 U-I matrix according to formula (4);

in the collaborative filtering model, the collaborative filtering model adopted in the proposed Bert-SVD model is Funk-SVD, and from the perspective of the relationship between users and items, the first focus is explicit feedback, namely data that can be presented directly in numeric form; in the following formulas, r(ui) denotes the predicted score, u denotes the overall average of all scoring data, b_u denotes the scoring bias of a specific user, capturing the influence of a person's subjective factors on the score, and b_i denotes the scoring bias of a specific item, capturing the score differences caused by item attributes, so that differentiation is achieved through the bias terms;

$$r(ui) = u + b_i + b_u \tag{1}$$

to compute the bias terms b_i and b_u, the average score n of a specific user (n_u) or item (n_i) is calculated first, and the bias is then obtained as its difference from the overall average u;

$$b_u = n_u - u \tag{2}$$

$$b_i = n_i - u \tag{3}$$

to make fuller use of the data, implicit feedback is also taken into account; matrix factorization produces two K-dimensional matrices P and Q, which describe the implicit features of users and items respectively, and the choice of K reduces the number of implicit feedback types the algorithm needs to handle;

$$r(ui) = u + b_i + b_u + q_i^{T} p_u \tag{4}$$

in the gradient descent method, because the initial values of all elements of the P and Q matrices are set randomly by the system, they are updated iteratively by gradient descent until the system converges, reducing the error toward an optimal solution; e_ui denotes the error between a predicted score r(ui) and the known score R(ui), and SSE is the sum of squared errors;

$$e_{ui} = R(ui) - r(ui) \tag{5}$$

$$SSE = \sum_{u,i} e_{ui}^{2} = \sum_{u,i} \left[ R(ui) - \sum_{k=1}^{K} p_{uk} q_{ki} \right]^{2} \tag{6}$$

after solving for the gradient, the update rules are given by the following formulas, where η is the learning rate and λ is the regularization parameter that guards against overfitting, and the updated p_uk and q_ki are:

$$p_{uk} = p_{uk} + 2\eta \left( e_{ui} q_{ki} - \lambda p_{uk} \right) \tag{7}$$

$$q_{ki} = q_{ki} + 2\eta \left( e_{ui} p_{uk} - \lambda q_{ki} \right) \tag{8}$$

2. The personalized movie recommendation method based on combining a Bert neural network with the Funk-SVD model of the collaborative filtering algorithm as claimed in claim 1, wherein: in the neural network Bert, the data in the experiment is movie plot text, and because the sentences describing a movie plot are long, the Transformer, unlike an RNN that extracts features in time-step order, can effectively ensure that earlier features do not vanish; Bert comprises two steps: pre-training and fine-tuning; during pre-training, Bert masks 15% of the input movie plot text, runs the whole sequence through a Transformer encoder, and then predicts only the masked part, thereby achieving a deep bidirectional pre-trained representation; Bert first converts the plot text into word vectors to obtain a feature matrix, and the resulting matrix is then used in the CF model; Bert uses the Transformer structure, which consists of several stacked layers, each consisting of an attention layer and a non-linear function applied to each input element; the Transformer iteratively applies syntactic parsing and semantic composition steps to resolve their interdependence, and so better generates a vector containing all movie features, i.e., the item feature matrix.