CN115687757A

CN115687757A - Recommendation method fusing hierarchical attention and feature interaction and application system thereof

Info

Publication number: CN115687757A
Application number: CN202211322085.6A
Authority: CN
Inventors: 黄发良; 隆广庆; 尹云飞; 宋佩华; 李林; 戴智鹏; 黄恩博; 何广静; 元昌安; 龙连杰
Original assignee: Nanning Normal University
Current assignee: Nanning Normal University
Priority date: 2022-10-28
Filing date: 2022-10-28
Publication date: 2023-02-03

Abstract

The invention discloses a recommendation method for integrating hierarchical attention and feature interaction, which comprises the following steps of: s1, collecting user instance information; s2, according to the user instance information, training a recommendation model by utilizing the low-order to high-order implicit and explicit characteristics interaction by utilizing a second-order interaction operator and a level attention mechanism; s3, calculating the probability of recommending the target item for the user by using a trained recommendation model for the user instance information and the target item to be predicted; and S4, recommending the target item with the recommendation probability larger than or equal to the preset recommendation threshold value to the user. The method of the invention can better depict the potential interest of the user, improve the performance of the recommendation system, can be widely applied to the recommendation system and improve the active service quality of information.

Description

Recommendation method fusing hierarchical attention and feature interaction and application system thereof

Technical Field

The invention relates to the technical field of recommendation systems, in particular to a recommendation method fusing hierarchical attention and feature interaction and an application system thereof.

Background

With the rapid development of information technology, various online services such as e-commerce, news information, social platforms and the like are in a large number, so that people are in a predicament of 'mass data but poor knowledge' while bringing convenience to life of people, and information meeting the needs of people is difficult to find in a data sea. Search engines have been developed to alleviate this problem to some extent, but they require the user to provide a clear need because search engines use keyword matching techniques to feed back to the user a list of items in the search information base that match the user's needs most closely. In reality, people often have difficulty in accurately describing the needs of the people by using a few keywords. In addition, the search engine cannot meet the personalized demands of different users, since the feedback results for the same keyword are identical. Aiming at the difficult problem that the individual requirements of people are difficult to meet, information recommendation technology and a recommendation system are developed.

A core task in the information recommendation method is how to efficiently and accurately calculate the importance degree of each feature and interactive feature. In the existing recommendation method, the feature interaction mainly has two forms of explicit feature interaction and implicit feature interaction. The former mainly utilizes a well-designed interaction network, such as a factorization model, to promote obvious interaction among various features. The latter usually utilizes a deep neural network to implicitly interact with features, such as deep factorization model deep fm. Recently, some more advanced recommendation methods have attempted to fuse explicit and implicit feature interaction information. However, when the models fuse the information of the two aspects, the dimensions of low-order interaction and high-order interaction are not considered, and only the high-order interaction of the features is completely lost to the deep neural network for self-learning, so that the effect of the feature interaction cannot be effectively embodied.

Disclosure of Invention

In order to solve at least the above defects or problems, the present invention proposes an information recommendation method based on feature interaction and hierarchical attention, that is, the recommendation method of the present invention integrates hierarchical attention and feature interaction. According to the method, a factorization machine is used for carrying out explicit feature interaction, and according to an interaction result, a second-order interaction operator is used for extracting interaction information, and feature interaction from a low order to a high order is carried out implicitly through level attention. During which feature interactions from lower order to higher order are accomplished by optimizing the representation of the feature by attention mechanism to provide a feature representation of higher order information for the next explicit feature interaction. Therefore, the implicit and explicit characteristic interaction from low order to high order is realized by utilizing a second-order interaction operator and a hierarchical attention mechanism, and the performance of the recommendation system is effectively improved.

The invention provides a recommendation method based on feature interaction and hierarchical attention, which comprises the following steps:

s1, collecting user instance information, wherein the user instance information comprises user attributes, project attributes and user historical behavior attributes;

s2, according to the user instance information, carrying out interactive training recommendation models on implicit and explicit characteristics from a low order to a high order by utilizing a second-order interaction operator and a level attention mechanism;

s3, calculating the probability of recommending the target item for the user by using the trained recommendation model for the user instance information to be predicted and the target item;

and S4, recommending the target item with the recommendation probability larger than or equal to the preset recommendation threshold value to the user.

Preferably, S2 comprises:

s2-1, initializing a parameter set of the recommendation model;

s2-2, carrying out sparse coding in a one-hot mode or a multi-hot mode on discrete attributes and discretized continuous attributes in the user instance information through a sparse representation layer of the recommendation model to obtain sparse representation of the user instance information;

s2-3, converting the sparse representation of the user instance information into a low-dimensional dense vector through an embedding layer of the recommendation model to obtain embedded representation of the user and the project;

s2-4, collecting a small batch of samples from the conversion expression of all the example samples obtained in the S2-3;

s2-5, calculating a prediction score of the item recommended by the user by using a small batch of samples through a second-order interaction operator and a forward process of a hierarchical attention network;

s2-6, calculating the predicted Loss based on the cross entropy;

s2-7, optimizing cross entropy loss by adopting an Adam optimizer based on error back propagation, adjusting weight parameters of the recommendation model, and searching the learning rate between {0.0001,0.01 };

s2-8, performing loop from S2-4 to S2-7 until the number of iterations is equal to the number of specified iterations.

Preferably, S2-5 comprises:

the hierarchical attention network is formed by stacking L attention layers, wherein the L (L =0,1, …, L) attention layer comprises:

x ⁽⁰⁾ ＝[v ₁ ，v ₂ ，…，v _m ]

…

wherein the superscript (l) indicates that the current attention layer is positioned at the level numbered l, x ⁽⁰⁾ ＝[v ₁ ，v ₂ ，…，v _m ]Is the output of the embedding layer of the recommendation model, v _i (i =1,2, …, m) is the vector of representations of the i-th feature of the embedded layer output, m is the number of attributes of the user instance,

embedding vector v corresponding to ith feature input into ith layer _i The (c) th dimension of (a),

IS the k-dimension, IS, of the interactive feature vector of the l-th layer ^(l) Representing the interactive characteristic vector of the l-th layer, M is the layer number of the multilayer perceptron corresponding to the output module, W _i (i =1,2, …, M) and b _i (i =1,2, …, M) are the weight and offset, respectively, for the i-th layer of the multi-layer perceptron, and h and b are the weight and offset, respectively, of the linear transformation;

to input the embedding vector corresponding to the ith feature of the ith layer,

is the attention coefficient of the first layer of attention, W ^(l) And b ^(l) Respectively the connection weight and the bias of the l-th layer in the depth model,

is the normalized attention coefficient, x, of the ith attention layer ^(l+1) And x ^(l) Respectively, input to the l +1 th layer and the l layer in the depth model, delta is a nonlinear excitation function sigmoid,

is the output of the i-th layer in a multi-layer feedforward network, o ^(l) Representing the output of the last layer in the multi-layer feedforward network, and T representing a vector transposition operator;

then, all the output layers of the hierarchical attention network are linearly integrated, and the recommended prediction score of the item x is calculated by utilizing a softmax function

In particular O = [ O ] ⁽⁰⁾ ，o ⁽¹⁾ ，…，o ^(L) ]And

wherein W _p And b _p Respectively, the weight and the bias of the linear transformation.

Preferably, the cross-entropy penalty in S2-6 is:

wherein Θ is a parameter set of the recommendation model, N is a total number of training examples, and λ is a weight of a regularization term; v. of _i Is the training data label of whether the ith item was recommended,

is the probability that the model calculates that the ith item was recommended, i ranges from i =1,2 … … N.

Still another object of the present invention is to obtain a recommendation system using the above method, comprising:

an acquisition unit configured to acquire user instance information;

a presumption unit configured to input the user instance information into a recommendation model trained according to the method of any one of claims 1-5, and calculate the probability that the user recommends the target item by using the trained recommendation model for the user instance information to be predicted;

and the pushing unit is configured to recommend the target item with the recommendation probability larger than or equal to a preset recommendation threshold value to the terminal of the user.

The invention at least comprises the following beneficial effects:

compared with the traditional recommendation method, the method realizes the implicit and explicit characteristic interaction from low order to high order by utilizing the second-order interaction operator and the level attention mechanism, can better depict the potential interest of the user, can effectively improve the performances such as the accuracy and the like of the recommendation system, can be widely applied to the recommendation system, and improves the active service quality of information.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.

Drawings

FIG. 1 is a flow chart of a recommendation method of the present invention.

FIG. 2 is a schematic diagram of the training of the proposed model of the present invention.

FIG. 3 is a bar graph of the average run time per epoch for different models on the Criteo dataset.

FIG. 4 is a bar graph of the average run time per epoch for different models on an Avazu dataset.

FIG. 5 is a bar graph of the average run time per epoch for different models on the Movielens-1M dataset.

FIG. 6 is a bar graph of the average run time per epoch for different models on the Book-crosslinking dataset.

Detailed Description

The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.

It is to be noted that the experimental methods described in the following embodiments are all conventional methods unless otherwise specified, and the devices and materials described therein are commercially available without otherwise specified.

As shown in fig. 1-2, the present invention discloses a recommendation method based on feature interaction and hierarchical attention, which comprises: encoding input instance information into a sparse feature vector, and embedding the sparse feature vector in a low-dimensional manner; extracting interaction information between the embedded feature vectors by using a second-order interaction operator; realizing multi-level high-order characteristic interaction from front to back through a level attention mechanism; and (3) obtaining interest expression vectors of the users by linearly combining the outputs of different layers of the deep neural network, calculating the recommendation probability of each information item according to the interest expression vectors, and recommending the target item corresponding to the recommendation probability which is greater than or equal to a preset recommendation threshold value to the users. The specific operation is as follows:

step 1: inputting user instance information comprising user attributes, item attributes and user historical behavior attributes;

the user attributes mainly include, but are not limited to, age, gender and occupation, the project attributes mainly include, but are not limited to, category and price, and the user historical behavior attributes mainly include, but are not limited to, online user click behavior, browsing behavior, comment behavior and like behavior. For example, the user instance information may be [ user id:1994, male, hotel waiter; item id: dell notebook or applet notebook, 3800-6000 yuan range; and (4) behavior id: the browsing time of the Tan-Bao-De official flagship shop is more than 10 seconds, the browsing time of the Del-Jingdong-owned official flagship shop is more than 30 seconds, and the like.

Step 2: training a recommendation model according to the user instance information;

the step 2 further comprises the following steps:

step 2.1: initializing a parameter set of the recommendation model;

step 2.2: carrying out sparse coding in a single-hot mode or a multi-hot mode on discrete attributes and discretized continuous attributes in the user instance information through a sparse representation layer of the recommendation model to obtain a sparse representation vector of the user instance information;

step 2.3: converting the sparse representation vector of the user instance information into a low-dimensional dense vector through an embedding layer of the recommendation model to obtain an embedded representation vector of the user instance information;

step 2.4: collecting a set formed by embedded expression vectors of user instance information to obtain a small batch of samples;

step 2.5: calculating a prediction score of the item recommended by the user by using a second-order interaction operator and a forward process of a hierarchical attention network for a small batch of samples;

in said step 2.5, the prediction score may be calculated according to the following method: first, the output o of each layer l of the hierarchical attention network is calculated according to the following formula ^(l) Then, then

x ⁽⁰⁾ ＝[v ₁ ，v ₂ ，…，v _m ]

…

is the attention coefficient of the first layer of attention, W ^(l) And b ^(l) Respectively the connection weight and the offset of the ith layer in the depth model,

is the output of the i-th layer in a multi-layer feedforward network, o ^(l) Representing the output of the last layer in a multi-layer feedforward network, T representing the vector transpose operator

Then, carrying out the operation; linearly integrating all the layer outputs of the hierarchical attention network and calculating the recommended prediction score of the item x by utilizing a softmax function

In particular O = [ O ] ⁽⁰⁾ ，o ⁽¹⁾ ，…，o ^(L) ]And

Step 2.6: the prediction Loss based on the cross entropy is calculated by the formula

Theta is a parameter set of the recommendation model, N is the total number of training examples, and lambda is the weight of the regularization term; y is _i Is the training data label of whether the ith item was recommended,

is the probability that the ith item was recommended as calculated by the model, with i ranging from i =1,2 … … N;

step 2.7: optimizing cross entropy loss by adopting an Adam optimizer based on error back propagation, adjusting weight parameters of the recommendation model, and searching the learning rate between {0.0001,0.01 };

step 2.8: training the recommendation model in a manner that loops from step 2.4 to step 2.7 until the number of iterations equals a specified number of iterations; the number of given iterations is specifically determined by the error stability of the cross-entropy losses obtained by the loop execution, usually with the fluctuation of the cross-entropy losses obtained in the last 10 consecutive times being less than 10 ^-8 After training is completed, the trained guessed model is recorded as HAFM.

And step 3: calculating the probability of recommending the target item for the user by utilizing a trained recommendation model for the user instance information to be predicted and the target item;

and 4, step 4: recommending the target item with the recommendation probability larger than or equal to a preset recommendation threshold value to the user.

< testing of the effect of the trained inference model HAFM >

Experimental Environment

The experiments related to the invention all adopt a uniform experiment environment, and the main parameters of the hardware environment and the software environment used by the experiments are shown in table 1 based on the requirements of the depth model.

Table 1 basic information of experimental environment

Data set

To evaluate the performance and efficiency of the model, experiments were performed with 4 standard datasets for click rate prediction. Table 2 lists the statistics for these 4 data sets, each divided by a ratio of 8: 1, for training, validation and testing.

Table 2 basic statistics of four data sets

Baseline model

Baseline models can be roughly divided into three major categories: logistic regression models, FM-based models, and deep network-based models, are briefly described below.

(1) Logistic regression model

LR: the LR model is the most classical baseline model in click-through estimates.

(2) Model based on FM

FM: on the basis of LR, feature second-order interactive operation is introduced, and model expression is improved.

NFM: and after the FM second-order interaction, canceling the summation operation, and performing deeper feature interaction on the FM result by using the DNN.

FFM: the concept of the feature domain is introduced, and each feature domain learns one embedding for each feature, so that feature representation is enhanced.

AFM: a weight is learned for each interactive feature using the attention network.

Deep FM: the deep learning network and the FM are combined in parallel, and first-order features, explicit second-order feature interaction and implicit feature interaction are learned simultaneously.

xDeepFM: on the basis of deep FM, a CIN network is further designed to replace an explicit interaction module, and the estimation capability of the feature interaction promotion model is carried out at the vector level.

IFM: sample perception is proposed, and each input sample is given different weight, so that the embedded representation of the optimization features is updated.

DIFM: on the basis of IFM, the main idea in the Transformer is introduced, and the vector level interaction is learned through self-attention mechanics.

(3) Deep network based model

FNN: embedding by utilizing FM pre-training characteristics, and then carrying out subsequent deep characteristic interaction by using DNN.

AFN: a logarithmic neural network layer for adaptively adjusting interaction orders is designed, and the orders of characteristic interaction can be selected through the network, so that the performance of the model is improved.

WD: from the perspective of "generalization" and "memory," deep learning networks and LR are combined in parallel.

DCN: and (3) providing explicit interaction of cross network learning characteristics, implicit interaction of learning characteristics by using a neural network module, and finally integrating the output of the two modules.

IPNN: belongs to the PNN model and is in the same family with OPNN. After embedding the layer, performing explicit feature interaction in an inner product mode.

OPNN: belongs to the PNN model and is in the same family with IPNN. After embedding the layer, explicit feature interaction is performed in an outer product mode.

DRM: the importance of each embedding dimension to the model is calculated through an attention structure, so that the embedding quality of the features is improved.

MaskNet: the capability of DNN excavation of complex interaction features can be improved by using MaskBlock, maskNet families have a parallel model and a serial model due to different maskBlock stacking modes, the serial model is MaskNet, and the parallel model is MaskNetx.

Evaluation index

The experiment adopts two most commonly used evaluation indexes of AUC and Logloss in a click rate estimation task. The AUC is an area under an ROC curve, is an evaluation index for measuring the performance of the binary classification model, can represent the probability that a predicted positive case is arranged in front of a predicted negative case, and the higher the AUC value is, the better the performance is. Logloss is cross entropy loss, can represent the difference between real sample and prediction probability, and in the click rate estimation model, the smaller the Logloss, the more accurate the model prediction is represented. It is emphasized that 1% AUC boost and Logloss reduction are significant advances in offline prediction of click rate estimation tasks.

Details of the experiment

All models were trained using the PyTorch framework, and for comparative fairness, all models were learned using an Adam optimizer to optimize cross entropy loss, and the learning rate was searched between {0.0001,0.001,0.01 }. The embedding size is set to 16 and other hyper-parameters are kept consistent with default parameters in their paper reports or open source code. For the model containing the DNN module, the DNN hidden layer depth is uniformly set to be 2, and the number of neurons in each hidden layer is set to be 16. For the Criteo dataset, the Batch Size (Batch Size) is set to 4096, the Avazu dataset Batch Size is set to 2048, and for both the Movielens-1M and Book-Cross datasets, batch Size is set to 128. The regularization parameter λ is searched between {0.00001,0.0001,0.001 }. And respectively fixing a plurality of groups of random number seeds during model comparison, carrying out a plurality of groups of experiments, and averaging the results.

Next, the private hyper-parameters of the partial model are described:

the AFM model attention factor size was set to 16.

The number of cross layers of the DCN model is set to 3.

The CIN network layer parameters of the xDeepFM model are consistent with the default DNN parameters and are set to [16, 16].

The log neural network dimension of the AFN model was set to 1500.

In the multi-head attention module in the diff model, the number of attention heads is set to 4, the number of attention layers is 3, and the dimension size of attention is set to 16.

The maskbock number of the MaskNet model is set to 3.

The HAFM model employs a 3-layer attention network, i.e., L =3.

The maximum iteration period (epoch) of the model is set to 50, and an early stopping method (the maximum performance reduction step number is set to 2) is adopted to avoid overfitting of the model to the training data and shorten the training time. The specific operation is as follows:

(1) continuing to finish the training of the current period, judging whether the maximum training period is reached or not when the training is finished, and if the maximum training period is reached, terminating the training and turning to the step (4); otherwise, jumping to the step (2).

(2) Calculating the AUC of the current model by using the verification set, and if the AUC rises, carrying out next cycle training and skipping to the step (1); otherwise, recording the maximum decrease times of the AUC, and jumping to the step (3).

(3) Judging the maximum AUC (AUC) reduction times, if the times are more than 2, terminating the training, and skipping to the step (4); otherwise, jumping to the step (1).

(4) And terminating the model training and outputting the model result when the AUC of the verification set is the highest.

TABLE 3 AUC and Logloss results for click rate prediction on four datasets

Prediction accuracy analysis

Table 3 shows the overall performance of the different models on four different data sets. From the experimental results it can be observed that: 1) From the whole, compared with the most advanced models based on FM and deep networks, the HAFM model of the invention has great improvement on the indexes of AUC and Logloss, and achieves better performance on all four data sets. 2) Compared with the LR model, all the other feature-containing interaction models have better prediction accuracy, which shows the importance and effectiveness of feature interaction operation on CTR estimation. 3) Compared with an FM model, the NFM achieves performance improvement on both Criteo and Avazu data sets, but achieves opposite effects on small data sets MovieLens-1M and Book-Cross with a small number of features, and conversely achieves larger improvement on other methods for model interaction by utilizing DNN. This aspect illustrates that DNN is a contributing role for deep-level, high-order interactions and non-linear interactions of features, but simply stacking DNN after a second-order interaction of features seems to promote less and is no longer applicable for scenarios with a smaller number of features. On the other hand, it is shown that FM-based explicit second-order feature interaction is still more likely in scenarios with a smaller number of features. Therefore, the HAFM introduces DNN to carry out implicit high-order feature interaction, nonlinear interaction and explicit feature interaction on the features, and the click rate prediction accuracy can be effectively improved. 4) Compared with the FM model, the AFM and FFM models have improved performance except for the MovieLens-1M data set. This indicates that there is some improvement in performance whether feature attention is introduced or feature domain perceptual attention is introduced. This also illustrates that the attention mechanism has some promotion effect on the feature interaction of CTR prediction. The HAFM model introduces a hierarchical attention mechanism in linguistics, and the interaction of capturing features from low-order stratum to high-order stratum obtains remarkable improvement of effect. 5) Compared with other FM-based high-order feature interaction models and deep network advanced models, the HAFM model has the advantages of being remarkably superior to the other FM-based high-order feature interaction models and deep network advanced models. This also indicates the effectiveness of the hierarchical attention network proposed by the present invention.

Time efficiency analysis

FIGS. 3-6 show the run-time comparison of all models per iteration cycle (epoch) over four data sets. The abscissa of each column represents the current model, the ordinate represents the training time of one iteration cycle of the current model, and the darkened column represents the model HAFM provided by the invention. From the figure it can be observed that: 1) As can be seen from the entirety, the run time of the model on the Criteo and Avazu datasets is significantly higher than the MovieLens-1M and Book-Cross datasets due to the large number of features and samples in the first two datasets relative to the second two datasets, as shown in Table 2. 2) The time efficiency of LR is highest on all datasets, and FM is second order, since both models are relatively simple, FM adds second order feature interoperation on the basis of LR. Xdepfm is the least efficient model on Criteo and Avazu datasets, next to FFM. Since these two models have too many model parameters and computational operations. Each layer of CIN network calculation introduced by the xDepFM model needs Hadamard (Hadamard) product operation with all vectors of an input layer pairwise. The FFM model embeds independent expression vectors in different feature domains for each feature, and introduces too many parameters. On the MovieLens-1M and Book-Cross datasets, HAFM was the least efficient model. This is because for a small data set with a small number of features and samples, the number of feature fields is smaller than the feature embedding size, the computation overhead of the model occupies most of the runtime, and the HAFM model includes 3-layer attention computation, which reduces the efficiency of model operation, but it can be ignored for a large data set with a large number of features.

Through the test analysis, the implicit and explicit feature interaction from low order to high order is realized by utilizing a second-order interaction operator and a level attention mechanism, the accuracy, the running speed and other performances of the recommendation system are effectively improved, the recommendation system can be widely applied, and the active service quality of information is improved.

According to the recommendation method of the present invention, the present invention also provides a recommendation system, comprising:

an acquisition unit configured to acquire user instance information; for example, a background such as a mobile phone client or a computer client acquires data such as the age, sex, occupation, types and prices of favorite items, webpage clicking behavior, browsing behavior, comment behavior, praise behavior and the like of a user to obtain user instance information;

the presumption unit is configured to input the user instance information into a recommendation model trained according to the method of any one of claims 1-5, and calculate the probability of the user recommending the target item by using the trained recommendation model for the user instance information to be predicted;

While embodiments of the invention have been disclosed above, it is not intended to be limited to the uses set forth in the specification and examples. It can be applied to all kinds of fields suitable for the present invention. Additional modifications will readily occur to those skilled in the art.

Claims

1. The recommendation method for integrating the interaction between the hierarchical attention and the features is characterized by comprising the following steps of:

s1, collecting user instance information;

s3, calculating the probability of recommending the target item for the user by using a trained recommendation model for the user instance information and the target item to be predicted;

2. The method for recommending fused hierarchical attention and feature interaction of claim 1, wherein said user instance information comprises user attributes, item attributes, and user historical behavior attributes.

3. The recommendation method for fusing hierarchical attention and feature interaction according to claim 2, wherein the S2 comprises:

s2-1, initializing a parameter set of the recommendation model;

s2-4, collecting a small batch of samples from the embedded representation obtained in the S2-3;

s2-6, calculating the predicted Loss based on cross entropy;

4. The method for recommending fusion level attention and feature interaction of claim 3, wherein said S2-5 comprises:

x ⁽⁰⁾ ＝[v ₁ ，v ₂ ，…，v _m ]

…

wherein the superscript (l) indicates that the current attention layer is positioned at the hierarchy number of l, x ⁽⁰⁾ ＝[v ₁ ，v ₂ ，…，v _m ]Is the output of the embedding layer of the recommendation model, v _i (i＝1，2，…，m) Is the expression vector of the ith characteristic output by the embedding layer, m is the attribute number of the user instance,

embedding vector v corresponding to ith feature input into ith layer _i The (d) dimension (k) of (c),

then, all the layer outputs of the hierarchical attention network are linearly integrated and the recommended prediction score of the item x is calculated by utilizing the softmax function

In particular O = [ O ] ⁽⁰⁾ ，o ⁽¹⁾ ，…，o ^(L) ]And

5. A recommendation method for fusing hierarchical attention and feature interaction according to claim 3 or 4, characterized in that the cross-entropy penalty in S2-6 is:

wherein, Θ is a parameter set of the recommendation model, N is a total number of training examples, and λ is a weight of the regularization term; y is _i Is the training data label of whether the ith item was recommended,

6. A recommendation system, comprising:

an acquisition unit configured to acquire user instance information;