CN113590945A

CN113590945A - Book recommendation method and device based on user borrowing behavior-interest prediction

Info

Publication number: CN113590945A
Application number: CN202110846763.8A
Authority: CN
Inventors: 赵雪青
Original assignee: Xian Polytechnic University
Current assignee: Xian Polytechnic University
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2021-11-02
Anticipated expiration: 2041-07-26
Also published as: CN113590945B

Abstract

The invention discloses a book recommendation method and a book recommendation device based on user borrowing behavior-interest prediction, wherein the method comprises the following steps: acquiring user borrowing behavior data; determining a basic feature label based on the user borrowing behavior data, and determining a prediction type feature label by adopting a weight calculation algorithm TFIDF and cosine similarity method; inputting the basic feature labels and the prediction class feature labels into a neural network model DeepFM constructed based on a factorization machine for Embedding feature vectorization, performing feature intersection on the feature vectors, inputting the feature vectors into a deep neural network, and outputting a recommendation result. The method and the system construct the interest prediction tag on the basis of analyzing the borrowing behavior of the library user, and recommend books to the user by adopting the deep FM. The method effectively constructs a user behavior label system, carries out personalized recommendation by combining the user interests, realizes the core requirement of accurately positioning the user and improves the user satisfaction.

Description

Book recommendation method and device based on user borrowing behavior-interest prediction

Technical Field

The invention relates to the field of big data, in particular to a book recommendation method and device based on user borrowing behavior-interest prediction.

Background

Digital libraries are rising in the big data era, and it is a necessary trend to find reading hobbies of users and recommend books. With the rapid development of mobile internet and self-media, the attention of users is continuously shifting from computer to mobile. How to effectively grasp the focus of a user in the shortest time and continuously improve the satisfaction degree of the user is always an urgent problem to be solved by a recommendation system.

At present, DeepFM (Deep Factorization Machine), a neural network model built on a factorization machine, is widely used for click-through rate (CTR) prediction in recommendation, advertising and similar fields. The DeepFM model combines the memorization capability of logistic regression with the generalization capability of a neural network. The memorization capability lets it learn user and resource features directly; the generalization capability lets it mine sparse user features and discover their correlation with labels. The neural network then combines features automatically, which yields a more stable recommendation result.

In 2010, Massanari et al propose that a user portrait is a portrait model formed by user features, and can effectively analyze important features of a user. Aiming at the borrowing behavior of a library user, because the user behavior is single, the user characteristics are analyzed by adopting the traditional user portrait, and the various behavior characteristics of the user are difficult to be accurately depicted, so that the application of the user portrait in the field of libraries is still in the exploration period. In recent years, people such as the residual civilization and the like propose a behavior-content fusion model, people such as Yao Yuan and the like propose a method for constructing a user portrait by using a knowledge map, and people such as Chendan and the like consider that a user portrait label can be obtained from three ways of user behavior, user social data and a user label set; he et al propose a mixed model of decision trees and logistic regression; kong et al propose a new context-aware attention convolutional neural network; zhou et al propose a description of a deep interest network that addresses the diverse behavioral characteristics of users with different candidate ads generating different representation vectors. Zhang et al propose a time-interval division algorithm to analyze and quantify the interest distribution of users in a time interval. However, the above research cannot precisely locate the core requirements of the user.

Disclosure of Invention

To make the supply side of library services more precise, the invention provides a book recommendation method and device that combine user borrowing behavior analysis with interest prediction.

The embodiment of the invention provides a book recommendation method based on user borrowing behavior-interest prediction, which comprises the following steps:

acquiring user borrowing behavior data;

determining a basic feature tag based on the user borrowing behavior data;

determining a prediction class feature label by adopting a weight calculation algorithm TFIDF and cosine similarity method based on the user borrowing behavior data;

inputting the basic feature labels and the prediction class feature labels into a neural network model DeepFM constructed based on a factorization machine for Embedding feature vectorization, performing feature intersection on the feature vectors, inputting the feature vectors into a deep neural network, and outputting a recommendation result.

In one embodiment, a book recommendation method based on user borrowing behavior-interest prediction further comprises the following steps:

preprocessing the user borrowing behavior data, specifically including: data deduplication, abnormal value processing, missing value processing and time format normalization; wherein,

the outlier processing includes: normalizing the abnormal data beyond the normal time range;

the missing value processing comprises: deleting the row with the book number being empty;

the time format normalization includes: converting data with non-uniform time formats by means of the strptime() function in Python.

In one embodiment, the determining of the basic feature tag includes:

drawing a histogram and a word cloud picture for the preprocessed user borrowing behavior data, and carrying out data visualization basic characteristic analysis; determining a basic feature label according to a basic feature analysis result of data visualization;

the base feature tag, comprising:

a fact type tag containing the total times of actions generated by the same user;

the rule class label is used for setting a threshold value for the fact class label of the user by combining manual experience on the basis of the fact class label, and different threshold values correspond to different user characteristics;

the text type label comprises text type information generated by a user; wherein the text class label comprises: gender, occupation, address and book name of the user, and a jieba word segmentation method is selected for feature extraction of the text labels.

In one embodiment, the weight calculation algorithm TFIDF includes:

the weight calculation algorithm TFIDF has two logical parts: interaction depth and TFIDF label total weight value;

the interaction depth refers to the depth of part of features used for measuring the user behavior under each interaction behavior: the user behavior type weight, the user behavior times and the time-dependent attenuation change of the behavior; the total weight value of the TFIDF label reflects the importance degree of different labels to the prediction result, the higher the importance degree is, the larger the weight is, and the calculation formula of the total weight of the TFIDF label is as follows:

W＝B_i×I_t×C_i×TFIDF

wherein B_i represents the weight of the behavior type; C_i represents the number of user behaviors, i.e. the total number of behaviors generated by the user; I_t represents the time-decayed interest level; and TFIDF represents the assigned weight of the user behavior label.

In one embodiment, the formula for calculating the user behavior label assignment weight is as follows:

TFIDF(U，L)＝TF(U，L)×IDF(U，L)

wherein TFIDF (U, L) represents the objective weight of the label L with respect to the user U, i.e. the product of the importance of each label L to the user U (TF (U, L)) and the importance of the label among all the labels of the user (IDF (U, L));

the formula for TF(U, L) is as follows:

TF(U，L)＝S(U，L) / Σ S(U，L_i)

where S(U, L) represents the number of times the tag L marks the user U, Σ S(U, L_i) represents the total number of times all tags mark the user U, and TF(U, L) represents the proportion of the marking count of the tag L among the marking counts of all tags of the user U;

the formula for IDF(U, L) is as follows:

IDF(U，L)＝log( Σ S(U_i，L_i) / Σ S(U_i，L) )

wherein Σ S(U_i, L_i) represents the sum of all tags of all users, Σ S(U_i, L) represents the sum over all users marked with the tag L, and IDF(U, L) represents the degree of scarcity of the tag L among all tags, i.e. how rarely this tag occurs.

In one embodiment, the time decay interest level is calculated as follows:

I_t＝e^(-λΔt)

where Δt denotes the number of days between the moment t at which the behavior occurred and the observation point, λ denotes the decay factor, and I_t denotes the interest level at moment t.
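The time decay above can be illustrated with a minimal sketch. The function name `time_decay_interest` and the default decay factor 0.05 are assumptions made for illustration; the source does not give a value for λ.

```python
import math
from datetime import date


def time_decay_interest(behavior_day, observation_day, decay=0.05):
    """Return I_t = exp(-decay * delta_t), with delta_t measured in days.

    `decay` plays the role of the decay factor lambda; 0.05 is a
    hypothetical value, not one taken from the source.
    """
    delta_t = (observation_day - behavior_day).days
    return math.exp(-decay * delta_t)


# A borrowing made 10 days before the observation point.
interest = time_decay_interest(date(2021, 7, 16), date(2021, 7, 26), decay=0.1)
```

With this form, a behavior on the observation day has interest level 1, and the level falls toward 0 as the behavior gets older.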

In one embodiment, the cosine similarity method specifically includes:

the input of the cosine similarity during weight classification is user borrowing behavior data, the output is a label with the highest recommendation score, and the calculation process is as follows:

calculating the pairwise combinations of labels by cross-joining the user-label table with itself, obtaining every combination of two labels under each user;

calculating the number of users corresponding to each label, namely the number of the labels appearing in different users;

calculating the similarity of every two labels by using the cosine similarity, and finally obtaining a cosine similarity weight p;

calculating the relative labels recommended to the user, and making the user correspond to all the relative labels, wherein the recommendation score calculation formula is as follows:

R＝W×p

wherein W is the total weight of the label and p is the cosine similarity weight, calculated as follows:

p＝σ / √(η×λ)

where σ represents the number of users paying attention to both resource X and resource Y, η represents the number of users paying attention to resource X, λ represents the number of users paying attention to resource Y, and p is a similarity coefficient measuring the two resources the users pay attention to.

carrying out normalization processing on numerical type characteristics in the basic characteristic labels and the prediction type characteristic labels by adopting a logarithmic function conversion method; and converting the class type characteristics in the basic characteristic label and the prediction class characteristic label into numerical vectors by adopting one-hot coding.

In one embodiment, the determining of the recommendation result includes:

the DeepFM model comprises an input layer, an Embedding layer, a model layer and an output layer;

the basic feature labels and the prediction class feature labels are input into the DeepFM model and vectorized by the Embedding layer; the model layer concatenates the linear features through a linear model, the linear model being the sum of the products of each feature and its corresponding weight; and the output layer outputs the recommendation result through two fully connected layers.

A book recommendation apparatus based on user borrowing behavior-interest prediction, comprising:

the data acquisition module is used for acquiring data of the user borrowing behaviors;

the basic tag determining module is used for determining a basic feature tag based on the user borrowing behavior data;

the category label determining module is used for determining a prediction category feature label by adopting a weight calculation algorithm TFIDF and cosine similarity method based on the user borrowing behavior data;

and the recommendation result determining module is used for inputting the basic feature labels and the prediction class feature labels into a neural network model DeepFM constructed based on a factorization machine for Embedding feature vectorization, performing feature intersection on the feature vectors, inputting the feature vectors into a deep neural network, and outputting a recommendation result.

Compared with the prior art, the book recommendation method and device based on the user borrowing behavior-interest prediction provided by the embodiment of the invention have the following beneficial effects:

the invention provides a book recommendation method based on user borrowing behavior-interest prediction, which aims at the problem that the result is not accurate enough when books are recommended in a library. The method effectively constructs a user behavior label system, carries out personalized recommendation by combining the user interests, realizes the core requirement of accurately positioning the user and improves the user satisfaction.

Drawings

FIG. 1 is a flow diagram illustrating a method for user borrowing behavior-interest prediction provided in one embodiment;

FIG. 2 is a user type distribution histogram provided in one embodiment;

FIG. 3 is a family readership word cloud provided in one embodiment;

FIG. 4 is an individual user word cloud provided in one embodiment;

FIG. 5 is a comparison graph of the Accuracy change curves of the book recommendation method without the prediction class label and the book recommendation method of the present invention (including the book recommendation method with the prediction class label) in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, a book recommendation method based on user borrowing behavior-interest prediction is provided, and the method comprises the following steps:

Step 1: read the data.

Step 2: preprocess the data. The data read in step 1 are preprocessed.

Step 3: perform data visualization analysis and construct basic labels. The data preprocessed in step 2 are analyzed visually by drawing histograms and word clouds, and three types of basic labels are constructed: fact class, rule class and text class.

Step 4: construct prediction class labels. Prediction class labels are constructed for the data preprocessed in step 2 using the TFIDF and cosine similarity methods.

Step 5: normalize the feature labels. The numerical feature labels and categorical feature labels generated in steps 3 and 4 are normalized.

Step 6: recommend books. The feature labels obtained in step 5 are input into the DeepFM model for Embedding feature vectorization; feature crossing is performed on the embedded feature vectors, which are then input into a deep neural network, and a recommendation result is output.

Step 7: compare and analyze the book recommendation results. Using recommendation accuracy (Accuracy) as the objective evaluation index, the results of a book recommendation method without prediction class labels are compared with those of the method of the invention (the book recommendation method with prediction class labels).

The data preprocessing in the step 2 specifically comprises:

The data read in step 1 are preprocessed, specifically by data deduplication, abnormal value processing, missing value processing and time format normalization. Abnormal value processing mainly normalizes abnormal data beyond the normal time range; missing value processing deletes rows whose book number is empty; time format normalization converts data with non-uniform time formats using the strptime() function in Python.
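The preprocessing of step 2 can be sketched with the Python standard library. This is an illustration under stated assumptions, not the patented implementation: the field names `user_id`, `book_id` and `time`, the list of candidate time formats, and the choice to drop (rather than repair) records outside the normal time range are all hypothetical.

```python
from datetime import datetime

# Hypothetical time formats found in the raw records; the source lists none.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y%m%d")


def parse_time(text):
    """Try each candidate format with strptime; return None if none matches."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def preprocess(records, start, end):
    """Deduplicate, drop empty book numbers, unify time formats and
    discard records whose time lies outside [start, end]."""
    seen = set()
    cleaned = []
    for rec in records:
        if not rec.get("book_id"):            # missing value processing
            continue
        when = parse_time(rec.get("time") or "")
        if when is None or not (start <= when <= end):
            continue                          # abnormal value (assumed: dropped)
        key = (rec["user_id"], rec["book_id"], when)
        if key in seen:                       # data deduplication
            continue
        seen.add(key)
        cleaned.append({**rec, "time": when.strftime("%Y-%m-%d %H:%M:%S")})
    return cleaned
```

Deduplication is done after the time has been parsed, so two records that differ only in how the same timestamp is written count as duplicates.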

The data visualization analysis in the step 3 specifically comprises:

The data preprocessed in step 2 are visualized using the third-party wordcloud library, which renders word clouds in Python, and the matplotlib plotting library.

The basic label construction in step 3 specifically comprises:

On the basis of the data visualization analysis, three types of basic labels (fact class, rule class and text class) are constructed from the preprocessed data. A fact class label contains the total number of actions generated by the same user. A rule class label is obtained by setting thresholds on the user's fact class label based on manual experience, with different thresholds corresponding to different user characteristics. A text class label comprises text information generated by the user, such as the user's gender, occupation, address and book titles; the jieba word segmentation method is used to extract features from text class labels.
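The relationship between fact class and rule class labels can be sketched as follows. The thresholds and the label names in `RULES` are hypothetical examples of "manual experience"; the source does not state concrete values.

```python
from collections import Counter


def fact_labels(records):
    """Fact class label: total number of borrowing actions per user."""
    return Counter(rec["user_id"] for rec in records)


# Hypothetical thresholds and characteristic names, highest threshold first.
RULES = ((20, "heavy reader"), (5, "regular reader"), (0, "occasional reader"))


def rule_label(total_actions, rules=RULES):
    """Rule class label: map a fact class value to a user characteristic."""
    for threshold, name in rules:
        if total_actions >= threshold:
            return name
    return rules[-1][1]
```

For example, a user with 6 recorded borrowings would receive the fact label 6 and, under these assumed thresholds, the rule label "regular reader".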

The process of constructing the prediction class label in the step 4 specifically comprises the following steps:

Prediction class labels are constructed for the data preprocessed in step 2 using the TFIDF and cosine similarity methods. The specific process is as follows:

The TFIDF weight calculation logic has two main parts: the interaction depth and the TFIDF label total weight value. The interaction depth refers to the depth of user behavior under each interaction; it is measured by features such as the user behavior type weight, the number of user behaviors and the decay of the behavior over time. The TFIDF weight reflects the importance of different labels to the prediction result: the more important the label, the larger its weight. The TFIDF label weight is calculated as follows:

TFIDF(U，L)＝TF(U，L)×IDF(U，L) (1)

where TFIDF (U, L) represents the objective weight of the label L with respect to the user U, i.e. the product of the importance of each label L to the user U (TF (U, L)) and the importance of the label among all the labels of the user (IDF (U, L)).

The formula for TF(U, L) is as follows:

TF(U，L)＝S(U，L) / Σ S(U，L_i) (2)

where S(U, L) represents the number of times the tag L marks the user U, Σ S(U, L_i) represents the total number of times all tags mark the user U, and TF(U, L) represents the proportion of the marking count of the tag L among the marking counts of all tags of the user U.

The formula for IDF(U, L) is as follows:

IDF(U，L)＝log( Σ S(U_i，L_i) / Σ S(U_i，L) ) (3)

wherein Σ S(U_i, L_i) represents the sum of all tags of all users, Σ S(U_i, L) represents the sum over all users marked with the tag L, and IDF(U, L) represents the degree of scarcity of the tag L among all tags, i.e. how rarely this tag occurs. If a tag L occurs rarely and is nevertheless used to mark the user U, the relationship between the user U and the tag L is closer.
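A minimal sketch of the TF and IDF terms defined above. It assumes the IDF is the natural logarithm of the ratio between all tag marks of all users and the marks of tag L across users, as in standard inverse document frequency; the nested-dictionary layout of `counts` and the function names are hypothetical.

```python
import math


def tf(counts, user, label):
    """TF(U, L): share of label L among all label marks on user U."""
    total = sum(counts[user].values())
    return counts[user].get(label, 0) / total if total else 0.0


def idf(counts, label):
    """IDF(U, L): log(all marks of all users / marks of label L over users).

    The logarithm is an assumption based on the usual IDF definition.
    """
    all_marks = sum(sum(labels.values()) for labels in counts.values())
    label_marks = sum(labels.get(label, 0) for labels in counts.values())
    return math.log(all_marks / label_marks) if label_marks else 0.0


def tfidf(counts, user, label):
    """TFIDF(U, L) = TF(U, L) * IDF(U, L)."""
    return tf(counts, user, label) * idf(counts, label)


# counts[user][label] = number of times the label marks the user
counts = {"u1": {"history": 3, "fiction": 1}, "u2": {"fiction": 4}}
weight = tfidf(counts, "u1", "history")
```

In this toy data "history" marks u1 often and is rare overall, so it receives a higher weight for u1 than the common tag "fiction".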

In analyzing how behavior decays over time, the interest level at each moment is given by the time decay function:

I_t＝e^(-λΔt) (4)

The calculation formula of the total weight of the user label is as follows:

W＝B_i×I_t×C_i×TFIDF (5)
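The total label weight W multiplies the four factors named in the text. The sketch below assumes a small table of behavior type weights and a decay factor, both hypothetical values chosen only to make the example concrete.

```python
import math

# Hypothetical behavior type weights B_i; the source does not list them.
BEHAVIOR_WEIGHTS = {"borrow": 1.0, "renew": 1.5}


def total_label_weight(behavior, count, days_ago, tfidf_weight, decay=0.05):
    """W = B_i * I_t * C_i * TFIDF, with I_t = exp(-decay * days_ago)."""
    b_i = BEHAVIOR_WEIGHTS[behavior]
    i_t = math.exp(-decay * days_ago)
    return b_i * i_t * count * tfidf_weight


recent = total_label_weight("borrow", count=2, days_ago=1, tfidf_weight=0.5)
old = total_label_weight("borrow", count=2, days_ago=30, tfidf_weight=0.5)
```

Everything else being equal, the more recent behavior yields the larger weight, which is how the decay term lets the weight track changes in interest.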

The method for utilizing the cosine similarity in the step 4 specifically comprises the following steps:

Cosine similarity measures the difference between two vectors by the cosine of the angle between them: the closer the value is to 1, the closer the angle is to 0 degrees and the greater the correlation. The calculation formula is as follows:

p＝σ / √(η×λ) (6)

where σ represents the number of users paying attention to both resource X and resource Y, η represents the number of users paying attention to resource X, λ represents the number of users paying attention to resource Y, and p is a similarity coefficient measuring the two resources the users pay attention to. The larger the value of p, the higher the probability that a user pays attention to both resources at the same time.

When cosine similarity is used for weight classification, the input is the preprocessed data feature list and the output is the label with the highest recommendation score. The calculation process is as follows:

1. Calculate the pairwise combinations of labels: cross-join the two tables to obtain every combination of two labels under each user.

2. Calculate the number of users corresponding to each label, i.e. the number of different users for whom the label appears.

3. Calculate the similarity of every two labels using cosine similarity, finally obtaining the cosine similarity weight p.

4. Calculate the related labels to recommend to the user, associating the user with all labels related to the user's own labels. The recommendation score is calculated as follows:

R＝W×p (7)

where W is the total weight of the label and p is the cosine similarity weight.
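The four-step calculation above can be sketched as follows, assuming p = (users having both labels) / sqrt(users having the first × users having the second), i.e. cosine similarity over binary user-label vectors. The data layout, the function names, and the choice to keep the maximum score when several of a user's labels point to the same candidate are assumptions.

```python
import math
from itertools import combinations


def label_similarity(user_labels):
    """Map each label pair (a, b), a < b, to its cosine similarity weight p."""
    users_per_label = {}
    pair_users = {}
    for labels in user_labels.values():
        for label in labels:                              # step 2
            users_per_label[label] = users_per_label.get(label, 0) + 1
        for a, b in combinations(sorted(labels), 2):      # step 1
            pair_users[(a, b)] = pair_users.get((a, b), 0) + 1
    return {                                              # step 3
        (a, b): n / math.sqrt(users_per_label[a] * users_per_label[b])
        for (a, b), n in pair_users.items()
    }


def recommend_label(user_weights, similarity):
    """Step 4: score labels the user lacks by R = W * p; return the best."""
    scores = {}
    for (a, b), p in similarity.items():
        for own, other in ((a, b), (b, a)):
            if own in user_weights and other not in user_weights:
                score = user_weights[own] * p
                scores[other] = max(scores.get(other, 0.0), score)
    return max(scores, key=scores.get) if scores else None


user_labels = {
    "u1": {"history", "war"},
    "u2": {"history", "war"},
    "u3": {"history", "cooking"},
    "u4": {"cooking"},
}
sim = label_similarity(user_labels)
best = recommend_label({"history": 1.0}, sim)
```

Here "war" co-occurs with "history" for two users and "cooking" for one, so a user who only carries the "history" label is recommended "war".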

The normalization of the feature tag in the step 5 is specifically as follows:

The numerical feature labels and categorical feature labels generated in steps 3 and 4 are normalized. Numerical features (including user number, book number, age and the like) are normalized by logarithmic function conversion, as shown in formula (8); categorical features (including the user's gender, occupation, personal information and resource category) are converted into numerical vectors by one-hot encoding.

f(a)＝log₁₀(a) (8)

where a is the specific value of the numerical feature to be processed, and f(a) is the value normalized by the base-10 logarithmic function.
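A small sketch of the two transformations in step 5. The function names and the example category list are hypothetical; the guard for non-positive values is added because log10 is undefined there, a case the source does not discuss.

```python
import math


def log_normalize(value):
    """f(a) = log10(a) for a numerical feature; requires a > 0."""
    if value <= 0:
        raise ValueError("log10 normalization requires a positive value")
    return math.log10(value)


def one_hot(value, categories):
    """Encode a categorical feature as a 0/1 vector over `categories`."""
    return [1 if value == category else 0 for category in categories]


occupations = ["student", "teacher", "worker"]  # hypothetical category list
age_feature = log_normalize(100)
occupation_feature = one_hot("teacher", occupations)
```

The logarithm compresses large numerical ranges, while one-hot encoding gives each category its own position so that the Embedding layer can look up one vector per active category.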

The book recommendation in the step 6 comprises the following specific processes:

The DeepFM model comprises an input layer, an Embedding layer, a model layer and an output layer. The feature labels obtained in step 5 are input into the DeepFM model and vectorized by the Embedding layer.

The model layer first concatenates the linear features through a generalized linear model, given by formula (9); feature crossing is then performed on the embedded feature vectors, which are input into a deep neural network:

y＝ω₁x₁+ω₂x₂+...+ω_nx_n (9)

where x_i denotes each input feature and ω_i denotes the weight of each feature; the weights are learned by back propagation until they reach values consistent with the model's prediction performance.

And the output layer outputs the recommendation result through the two fully-connected layers.
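The forward pass of such a model can be sketched in plain Python. This is an untrained toy with random weights, not the model used in the patent: the class name `TinyDeepFM`, the layer sizes and the initialization are assumptions. It follows the general DeepFM layout: a linear term, a factorization-machine crossing term computed as inner products between field embeddings, and a deep part with two fully connected layers, summed and passed through a sigmoid.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, relu=True):
    """Fully connected layer; weights[j] holds the weights of output unit j."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out


class TinyDeepFM:
    def __init__(self, field_sizes, embed_dim=4, hidden=8, seed=0):
        rng = random.Random(seed)

        def init():
            return rng.uniform(-0.1, 0.1)

        # One linear weight and one embedding vector per category of a field.
        self.linear = [[init() for _ in range(n)] for n in field_sizes]
        self.embed = [[[init() for _ in range(embed_dim)] for _ in range(n)]
                      for n in field_sizes]
        self.bias = 0.0
        in_dim = len(field_sizes) * embed_dim
        self.w1 = [[init() for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[init() for _ in range(hidden)]]
        self.b2 = [0.0]

    def predict(self, indices):
        """indices[f] is the active category of field f (one-hot input)."""
        vectors = [self.embed[f][i] for f, i in enumerate(indices)]
        # Linear model: sum of feature weights (each active feature is 1).
        first_order = self.bias + sum(
            self.linear[f][i] for f, i in enumerate(indices))
        # Feature crossing: inner product of every pair of field embeddings.
        second_order = sum(
            sum(a * b for a, b in zip(vectors[p], vectors[q]))
            for p in range(len(vectors))
            for q in range(p + 1, len(vectors)))
        # Deep part: concatenated embeddings through two dense layers.
        flat = [v for vec in vectors for v in vec]
        hidden = dense(flat, self.w1, self.b1)
        deep = dense(hidden, self.w2, self.b2, relu=False)[0]
        return sigmoid(first_order + second_order + deep)


# Three hypothetical fields, e.g. age bucket, gender, book category.
model = TinyDeepFM(field_sizes=[3, 2, 5])
score = model.predict([0, 1, 4])
```

Training by back propagation is omitted; in practice a deep learning framework would be used to learn the linear weights, embeddings and dense layers jointly.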

Step 7, the specific process of comparing and analyzing the book recommendation result is as follows:

Recommendation accuracy (Accuracy) is used as the objective evaluation index. Accuracy values are calculated and Accuracy curves are drawn for the book recommendation method without prediction class labels and for the method of the invention (the book recommendation method with prediction class labels), and the recommendation results are compared.

Example 1

Executing steps 1 and 2:

Using the offline borrowing data of a provincial library, 100,000 user behavior records and 600,000 book resource records of 10,000 users are randomly selected as the experimental data set. Each user has attributes such as user number, age, gender and occupation, and each book has attributes such as book number, title, category, author and publisher. Data preprocessing comprises deduplication, abnormal value processing, missing value processing and time format normalization.

Executing step 3:

The preprocessed data are visualized using the third-party wordcloud library, which renders word clouds in Python, and the matplotlib plotting library. On the basis of the visualization, basic labels are built for the data: according to how the labels are generated, they are divided into three types of basic labels (fact class, text class and rule class), and each user's labels are stored in a two-dimensional table structure.

Executing step 4:

Prediction class labels are constructed for the preprocessed data using the TFIDF and cosine similarity methods. A time decay function is added when the label weight is calculated with TFIDF, so that the decay of user behavior over time indirectly reflects the change in the user's interest in the corresponding resources. The pairwise similarity of labels is calculated with cosine similarity to obtain the cosine similarity weight, and the label with the highest recommendation score is output.

Executing steps 5 and 6:

Numerical features are normalized by logarithmic function conversion, and categorical features are converted into numerical vectors by one-hot encoding. The processed feature labels are vectorized by the Embedding layer. To let the model learn the various features in the data, the linear features are concatenated using a generalized linear model, and feature crossing is then performed on the embedded feature vectors; for example, recommending books on the basis of the crossed age and occupation features of a user works far better than relying on age or occupation alone. Finally, the embedded feature vectors are input into a deep neural network for training, and the output is obtained through two fully connected layers.

Executing step 7:

When the DeepFM model is tested, both the basic-label results and the results with prediction class labels added are used; 80% of the samples form the training set and 20% form the test set used to evaluate the model. 20% of the training set is used as a validation set to show the result of each training round. The Accuracy curves show that the method of the invention (the book recommendation method with prediction class labels) recommends better than the book recommendation method without prediction class labels.
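The split described above can be sketched as follows. The function name, the fixed random seed and the use of shuffling are assumptions; the source only states the proportions.

```python
import random


def split_dataset(samples, test_ratio=0.2, val_ratio=0.2, seed=42):
    """Hold out `test_ratio` of the samples for testing, then take
    `val_ratio` of the remaining training samples for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, train_full = shuffled[:n_test], shuffled[n_test:]
    n_val = int(len(train_full) * val_ratio)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
```

With 100 samples this gives 20 test samples, 16 validation samples (20% of the 80 training samples) and 64 samples actually used for fitting.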

Here, the experimental data set consists of 100,000 user behavior records and 600,000 book resource records of 10,000 users, randomly selected from the offline borrowing data of a provincial library.

For objective evaluation, the commonly used Accuracy index is selected to verify the experimental results. Accuracy is calculated as follows:

Accuracy＝(TP+TN) / (TP+TN+FP+FN)

where TP represents the number of samples labeled positive and predicted positive; TN represents the number of samples labeled negative and predicted negative; FP represents the number of samples labeled negative and predicted positive; FN represents the number of samples labeled positive and predicted negative.
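The Accuracy index, i.e. the share of correctly predicted samples (TP + TN) among all samples, can be computed from the four counts as in this short sketch. Encoding positive samples as 1 and negative samples as 0 is an assumption made for the example.

```python
def accuracy(labels, predictions):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) for 0/1 labels."""
    pairs = list(zip(labels, predictions))
    if not pairs:
        raise ValueError("accuracy is undefined for an empty sample set")
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)


# 2 true positives, 1 true negative, 1 false positive, 1 false negative.
result = accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```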

As can be seen from the Accuracy curves in FIG. 5, the method of the invention achieves a better recommendation effect than the book recommendation method without prediction class labels and can be used for the book recommendation service of a digital library.

In one embodiment, a book recommendation apparatus based on user borrowing behavior-interest prediction is provided, the apparatus comprising:

a data acquisition module, used for acquiring user borrowing behavior data;

a basic tag determining module, used for determining a basic feature tag based on the user borrowing behavior data;

a category label determining module, used for determining a prediction class feature label based on the user borrowing behavior data by the TFIDF weight calculation algorithm and the cosine similarity method.

For specific limitations of a book recommendation device based on user borrowing behavior-interest prediction, reference may be made to the above limitations of a book recommendation method based on user borrowing behavior-interest prediction, which are not described herein again. The various modules in the above-mentioned book recommendation device based on user borrowing behavior-interest prediction can be realized in whole or in part by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features. Furthermore, the above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A book recommendation method based on user borrowing behavior-interest prediction is characterized by comprising the following steps:

acquiring user borrowing behavior data;

determining a basic feature tag based on the user borrowing behavior data;

2. The book recommendation method based on user borrowing behavior-interest prediction as claimed in claim 1, further comprising:

3. The book recommendation method based on user borrowing behavior-interest prediction as claimed in claim 2, wherein the determination of the base feature tag comprises:

the base feature tag, comprising:

4. The book recommendation method based on user borrowing behavior-interest prediction according to claim 3, wherein the weight calculation algorithm TFIDF comprises:

W＝B_i×I_t×C_i×TFIDF

5. The book recommendation method based on user borrowing behavior-interest prediction as claimed in claim 3, wherein the calculation formula of the user behavior tag assignment weight is as follows:

TFIDF(U，L)＝TF(U，L)×IDF(U，L)

the formula for TF (U, L) is as follows:

the formula for IDF (U, L) is as follows:

6. The book recommendation method based on user borrowing behavior-interest prediction as claimed in claim 3, wherein the time decay interestingness calculation formula is as follows:

I_t＝e^(-λΔt)

7. The book recommendation method based on user borrowing behavior-interest prediction as claimed in claim 3, wherein the cosine similarity method specifically comprises:

R＝W×p

8. The book recommendation method based on user borrowing behavior-interest prediction as claimed in claim 4, further comprising:

9. The book recommendation method based on user borrowing behavior-interest prediction as claimed in claim 4, wherein the determination of recommendation result comprises:

10. A book recommendation apparatus based on user borrowing behavior-interest prediction, comprising: