CN113590945B - Book recommendation method and device based on user borrowing behavior-interest prediction - Google Patents

Book recommendation method and device based on user borrowing behavior-interest prediction

Info

Publication number
CN113590945B
Authority
CN
China
Prior art keywords
user
label
labels
behavior
tfidf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110846763.8A
Other languages
Chinese (zh)
Other versions
CN113590945A (en
Inventor
赵雪青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Polytechnic University
Original Assignee
Xian Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a book recommendation method and device based on user borrowing behavior-interest prediction. The method comprises the following steps: acquiring user borrowing behavior data; determining basic feature labels based on the user borrowing behavior data, and determining prediction-class feature labels by using the TFIDF weight calculation algorithm and a cosine similarity method; inputting the basic feature labels and the prediction-class feature labels into DeepFM, a neural network model built on a factorization machine, for Embedding feature vectorization; and, after feature crossing, inputting the feature vectors into a deep neural network and outputting a recommendation result. The method constructs interest prediction labels from an analysis of library users' borrowing behavior and uses DeepFM to recommend books to users. It thereby builds a user behavior label system, makes personalized recommendations that take user interests into account, locates users' core needs more accurately, and improves user satisfaction.

Description

Book recommendation method and device based on user borrowing behavior-interest prediction
Technical Field
The invention relates to the field of big data, in particular to a book recommendation method and device based on user borrowing behavior-interest prediction.
Background
Digital libraries are growing rapidly in the big data era, and mining users' reading preferences to recommend books has become an inevitable trend. With the rapid development of the mobile internet and self-media, users' attention is steadily shifting from desktop computers to mobile devices. How to capture the user's focus in the shortest time and keep improving user satisfaction remains an open problem for recommendation systems.
Currently, DeepFM (Deep Factorization Machine), a neural network model built on a factorization machine, is widely used in click-through-rate (CTR) prediction tasks such as recommendation and advertising. Because the DeepFM model combines the memorization capacity of logistic regression with the generalization capacity of a neural network, it can learn user and resource features directly through memorization, while its generalization capacity lets it mine rare user features, discover their correlation with the label, and combine features automatically through the neural network, yielding a relatively stable recommendation result.
In 2010, Massanari et al. proposed that a user portrait is a representation model formed from user features, which can effectively analyze the important features of a user. For the borrowing behavior of library users, because the behavior is of a single type, a traditional user portrait has difficulty describing users' diverse behavior features in fine detail, so the application of user portraits in the library field is still at an exploratory stage. In recent years, Yu Chuanming et al. proposed a behavior-content fusion model; Yao Yuan et al. proposed a method for constructing user portraits with a knowledge graph; Chen Dan et al. held that user portrait labels can be obtained from three sources: user behavior, user social data and user label sets; He et al. proposed a hybrid model of decision trees and logistic regression; Kong et al. proposed a new context-aware attention convolutional neural network; Zhou et al. proposed a deep interest network that generates different representation vectors for different candidate advertisements, to describe users' diverse behavior features; and Zhang et al. proposed a time period division algorithm that analyzes and quantifies the user's interest distribution within a period. However, none of the above studies accurately locates the core needs of the user.
Disclosure of Invention
To make the supply side of library services more precise, the invention provides a book recommendation method and device that combine user borrowing behavior analysis with interest prediction.
The book recommendation method based on the user borrowing behavior-interest prediction provided by the embodiment of the invention comprises the following steps:
acquiring data of borrowing behaviors of a user;
determining a basic feature tag based on the user borrowing behavior data;
based on the borrowing behavior data of the user, determining a prediction type feature tag by adopting a weight calculation algorithm TFIDF and cosine similarity method;
inputting the basic feature tag and the prediction type feature tag into DeepFM, a neural network model built on a factorization machine, to perform Embedding feature vectorization, and, after feature crossing, inputting the feature vectors into a deep neural network and outputting a recommendation result.
In one embodiment, a book recommendation method based on user borrowing behavior-interest prediction further includes:
preprocessing the user borrowing behavior data, which specifically comprises: data deduplication, outlier processing, missing value processing and time format normalization; wherein
the outlier processing includes: normalizing abnormal data that fall outside the normal time range;
the missing value processing includes: deleting rows whose book number is empty;
the time format normalization includes: converting data whose time format is inconsistent to a uniform time format through the strptime() function in Python.
In one embodiment, the determining of the basic feature tag includes:
drawing a histogram and a word cloud of the preprocessed user borrowing behavior data to perform a visual basic feature analysis; and determining the basic feature labels according to the result of that analysis;
the basic feature labels comprise:
a fact class label, which contains the total number of behaviors generated by the same user;
a rule class label, which is obtained by setting thresholds on the user's fact class label in combination with manual experience, different thresholds corresponding to different user characteristics;
a text class label, which contains text information generated by the user; wherein the text class label comprises the gender, occupation, address and book name of the user, and the features of the text class label are extracted by jieba word segmentation.
In one embodiment, the weight calculation algorithm TFIDF includes:
the weight calculation algorithm TFIDF has two logical parts: the interaction depth and the TFIDF label total weight value;
the interaction depth refers to the depth of the user's behavior, measured by several features under each interaction behavior: the user behavior type weight, the number of user behaviors, and the decay of the behavior over time; the TFIDF label total weight value reflects the importance of different labels to the prediction result, a more important label receiving a larger weight, and the TFIDF label total weight is calculated as follows:
$$W = B_i \times I_t \times C_i \times \mathrm{TFIDF}$$
wherein $B_i$ represents the behavior type weight; $C_i$ represents the number of user behaviors, i.e. the total number of behaviors generated by the user; $I_t$ represents the time-decay interestingness; and TFIDF represents the user behavior label assignment weight.
In one embodiment, the calculation formula of the user behavior label assignment weight is as follows:
$$\mathrm{TFIDF}(U, L) = \mathrm{TF}(U, L) \times \mathrm{IDF}(U, L)$$
wherein TFIDF(U, L) represents the objective weight of label L with respect to user U, i.e. the product of the importance of label L to user U, TF(U, L), and the importance of that label among the total labels of the users, IDF(U, L);
the calculation formula of TF(U, L) is as follows:
$$\mathrm{TF}(U, L) = \frac{S(U, L)}{\sum_i S(U, L_i)}$$
wherein $S(U, L)$ represents the number of times label L marks user U, $\sum_i S(U, L_i)$ represents the number of all label marks on user U, and TF(U, L) represents the proportion of the marks of label L among all label marks of user U;
the calculation formula of IDF(U, L) uses the following quantities:
wherein $\sum_i \sum_j S(U_i, L_j)$ represents the sum of all labels of all users, $\sum_i S(U_i, L)$ represents the sum over all users marked with label L, and IDF(U, L) represents the scarcity of label L among all labels, i.e. the probability of occurrence of this label.
In one embodiment, the calculation formula of the time-decay interestingness is as follows:
$$I_t = e^{-\lambda \Delta t}$$
wherein $\Delta t$ represents the number of days between the behavior occurrence time t and the observation point, $\lambda$ represents the decay factor, and $I_t$ represents the interestingness at each moment.
In one embodiment, the cosine similarity method specifically includes:
when the cosine similarity is used for weighted classification, the input is the user borrowing behavior data and the output is the label with the highest recommendation score; the calculation process is as follows:
combining the labels pairwise (a cross join of the two label tables) to obtain every two-label combination under each user, in preparation for calculating the similarity of every two labels;
calculating the number of users corresponding to each label, i.e. the number of different users carrying each label;
calculating the similarity of every two labels by cosine similarity, finally obtaining the cosine similarity weight p;
calculating the related labels recommended to the user, and associating the user with all of the related labels, wherein the recommendation score is calculated as follows:
R=W×p
wherein W is the total weight of the label, and p is the cosine similarity weight, calculated as follows:
$$p = \frac{\sigma}{\sqrt{\eta \times \lambda}}$$
wherein σ represents the number of users who pay attention to both resource X and resource Y, η represents the number of users who pay attention to resource X, λ represents the number of users who pay attention to resource Y, and p is the similarity coefficient measuring how strongly the two resources are attended to together by users.
In one embodiment, a book recommendation method based on user borrowing behavior-interest prediction further includes:
carrying out normalization processing on the numerical value type characteristics in the basic characteristic labels and the prediction type characteristic labels by adopting a logarithmic function conversion method; and converting the category type features in the basic feature tag and the predicted category feature tag into numerical vectors by using one-hot coding.
In one embodiment, the determining of the recommendation result includes:
the DeepFM model comprises an input layer, an Embedding layer, a model layer and an output layer;
inputting the basic feature tag and the prediction type feature tag into the DeepFM model, and carrying out Embedding feature vectorization through the Embedding layer; the model layer first splices the linear features together through a linear model, wherein the linear model is the sum of the products of the features and their corresponding feature weights; the output layer outputs the recommendation result through two fully connected layers.
A book recommendation device based on user borrowing behavior-interest prediction, comprising:
the data acquisition module is used for acquiring the borrowing behavior data of the user;
the basic tag determining module is used for determining basic feature tags based on the borrowing behavior data of the user;
the category label determining module is used for determining a prediction category characteristic label by adopting a weight computing algorithm TFIDF and cosine similarity method based on the borrowing behavior data of the user;
the recommendation result determining module is used for inputting the basic feature tag and the prediction type feature tag into DeepFM, a neural network model built on a factorization machine, to perform Embedding feature vectorization, and, after feature crossing, inputting the feature vectors into the deep neural network and outputting a recommendation result.
Compared with the prior art, the book recommendation method and device based on the user borrowing behavior-interest prediction provided by the embodiment of the invention have the following beneficial effects:
To address the problem that library book recommendations are not accurate enough, the invention provides a book recommendation method based on user borrowing behavior-interest prediction, in which interest prediction labels are constructed from an analysis of library users' borrowing behavior and DeepFM is used to recommend books to users. The method builds a user behavior label system, makes personalized recommendations that take user interests into account, locates users' core needs more accurately, and improves user satisfaction.
Drawings
FIG. 1 is a flow diagram of the construction of a user borrowing behavior-interest prediction method provided in one embodiment;
FIG. 2 is a histogram of user type distributions provided in one embodiment;
FIG. 3 is a word cloud of the family reader community provided in one embodiment;
FIG. 4 is a word cloud of an individual user provided in one embodiment;
FIG. 5 is a graph comparing the Accuracy curves of the book recommendation method without prediction class labels and the method of the present invention (the book recommendation method including prediction class labels) provided in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, a book recommendation method based on user borrowing behavior-interest prediction is provided, the method comprising:
Step 1: reading data.
Step 2: data preprocessing. The data read in step 1 are preprocessed.
Step 3: data visualization analysis and basic label construction. A histogram and a word cloud are drawn from the data preprocessed in step 2 for visual analysis, and three classes of basic labels (fact, rule and text) are constructed.
Step 4: prediction-class label construction. Prediction-class labels are constructed from the data preprocessed in step 2 using the TFIDF and cosine similarity methods.
Step 5: feature label normalization. The numerical and categorical feature labels generated in steps 3 and 4 are normalized.
Step 6: book recommendation. The feature labels obtained in step 5 are input into the DeepFM model for Embedding feature vectorization; feature crossing is performed on the embedded feature vectors, which are then input into a deep neural network to output the recommendation result.
Step 7: comparative analysis of book recommendation results. With recommendation accuracy (Accuracy) as the objective evaluation index, the results of the book recommendation method without prediction-class labels and of the method including prediction-class labels are compared and analyzed.
The data preprocessing in the step 2 specifically comprises the following steps:
The data read in step 1 are preprocessed; the preprocessing comprises data deduplication, outlier processing, missing value processing and time format normalization. Outlier processing normalizes abnormal data that fall outside the normal time range; missing value processing deletes rows whose book number is empty; time format normalization converts data whose time format is inconsistent to a uniform format through the strptime() function in Python.
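The four preprocessing operations can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the record fields (`user_id`, `book_id`, `borrow_time`), the candidate time formats and the valid date range are hypothetical, and out-of-range timestamps are clamped to the nearest bound as one possible reading of "normalization".

```python
from datetime import datetime

# Hypothetical time formats and valid range; the patent does not list them.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y%m%d")
MIN_TIME = datetime(2000, 1, 1)
MAX_TIME = datetime(2021, 12, 31, 23, 59, 59)

def parse_time(text):
    """Try each known format with strptime() and return a datetime."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized time format: {text!r}")

def preprocess(records):
    """Drop rows without a book number, unify and clamp times, deduplicate."""
    seen, cleaned = set(), []
    for rec in records:
        if not rec.get("book_id"):                   # missing value processing
            continue
        when = parse_time(rec["borrow_time"])        # time format normalization
        when = min(max(when, MIN_TIME), MAX_TIME)    # outlier processing (assumed: clamp)
        key = (rec["user_id"], rec["book_id"], when)
        if key in seen:                              # data deduplication
            continue
        seen.add(key)
        cleaned.append({"user_id": rec["user_id"], "book_id": rec["book_id"],
                        "borrow_time": when.strftime("%Y-%m-%d %H:%M:%S")})
    return cleaned

raw = [
    {"user_id": "u1", "book_id": "b1", "borrow_time": "2020-05-01 10:00:00"},
    {"user_id": "u1", "book_id": "b1", "borrow_time": "2020/05/01 10:00"},  # duplicate
    {"user_id": "u2", "book_id": "", "borrow_time": "20200601"},            # no book number
    {"user_id": "u3", "book_id": "b2", "borrow_time": "19990101"},          # out of range
]
cleaned = preprocess(raw)
```

Deduplication runs after the time conversion so that two rows recording the same moment in different formats are recognized as duplicates.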
The data visualization analysis in the step 3 specifically includes:
The data preprocessed in step 2 are visualized in Python using the third-party word cloud library wordcloud and the plotting library matplotlib.
The basic label construction in step 3 specifically comprises the following:
On the basis of the data visualization analysis, the three classes of basic labels (fact class, rule class and text class) are constructed from the preprocessed data. The fact class label contains the total number of behaviors generated by the same user. The rule class label builds on the fact class label: thresholds are set on the user's fact class label in combination with manual experience, and different thresholds correspond to different user characteristics. The text class label contains text information generated by the user, such as the user's gender, occupation, address and book names, and its features are extracted by jieba word segmentation.
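A small sketch of how fact class and rule class labels might be derived from borrowing records. The thresholds and reader-type names are hypothetical, chosen only to illustrate the idea of mapping a count to a rule label; text class labels (which would need a word segmenter such as jieba) are not covered here.

```python
from collections import Counter

# Hypothetical thresholds mapping borrow counts to reader types, highest first.
RULE_THRESHOLDS = [(20, "heavy reader"), (5, "regular reader"), (0, "occasional reader")]

def fact_labels(records):
    """Fact class label: total number of borrowing behaviors per user."""
    return Counter(rec["user_id"] for rec in records)

def rule_label(borrow_count):
    """Rule class label: the first threshold that the count reaches."""
    for threshold, label in RULE_THRESHOLDS:
        if borrow_count >= threshold:
            return label
    return RULE_THRESHOLDS[-1][1]

records = [{"user_id": "u1"}] * 6 + [{"user_id": "u2"}] * 2
facts = fact_labels(records)
rules = {user: rule_label(count) for user, count in facts.items()}
```

Keeping the thresholds in one table makes the manual-experience part of the rule explicit and easy to adjust.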
The process of constructing the prediction type label in the step 4 specifically includes:
Prediction-class labels are constructed from the data preprocessed in step 2 using the TFIDF and cosine similarity methods. The specific process is as follows:
The TFIDF weight calculation logic is divided into two main parts: the interaction depth and the TFIDF label total weight value. The interaction depth refers to the depth of the user's behavior, which can be measured by several features under each interaction behavior, such as the user behavior type weight, the number of user behaviors and the decay of the behavior over time. The TFIDF weight reflects the importance of different labels to the prediction result: the more important the label, the larger the weight. The TFIDF label weight is calculated as follows:
$$\mathrm{TFIDF}(U, L) = \mathrm{TF}(U, L) \times \mathrm{IDF}(U, L) \tag{1}$$
where TFIDF (U, L) represents the objective weight of the labels L with respect to the user U, i.e. the product of the importance of each label L to the user U (TF (U, L)) and the importance of that label in the total labels of the user (IDF (U, L)).
The calculation formula of TF(U, L) is as follows:
$$\mathrm{TF}(U, L) = \frac{S(U, L)}{\sum_i S(U, L_i)} \tag{2}$$
where $S(U, L)$ represents the number of times label L marks user U, $\sum_i S(U, L_i)$ represents the number of all label marks on user U, and TF(U, L) represents the proportion of the marks of label L among all label marks of user U.
The calculation formula of IDF(U, L) uses the following quantities:
wherein $\sum_i \sum_j S(U_i, L_j)$ represents the sum of all labels of all users, $\sum_i S(U_i, L)$ represents the sum over all users marked with label L, and IDF(U, L) represents the scarcity of label L among all labels, i.e. the probability of occurrence of this label. If a label L occurs rarely and is nevertheless used to mark user U, the relationship between user U and label L is correspondingly closer.
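The label-level TF and IDF quantities can be illustrated as follows. TF is the share of one label among all marks on a user, as the text describes. For IDF, this sketch assumes the conventional TF-IDF form, the logarithm of the total mark count divided by the mark count of label L; the logarithm and the exact ratio are assumptions, since the formula itself is not reproduced in the text. The nested-dictionary layout `marks[user][label]` is also hypothetical.

```python
import math

def tf(marks, user, label):
    """TF(U, L): share of label L among all label marks on user U."""
    total = sum(marks[user].values())
    return marks[user].get(label, 0) / total

def idf(marks, label):
    """IDF(U, L), assumed form: log(all marks of all users / marks of label L)."""
    all_marks = sum(sum(labels.values()) for labels in marks.values())
    label_marks = sum(labels.get(label, 0) for labels in marks.values())
    return math.log(all_marks / label_marks)

def tfidf(marks, user, label):
    """TFIDF(U, L) = TF(U, L) * IDF(U, L)."""
    return tf(marks, user, label) * idf(marks, label)

# marks[user][label] = S(U, L), the number of times label L marks user U.
marks = {
    "u1": {"history": 3, "fiction": 1},
    "u2": {"fiction": 4},
}
history_weight = tfidf(marks, "u1", "history")
fiction_weight = tfidf(marks, "u1", "fiction")
```

In this toy data "history" is both frequent for u1 and rare overall, so it receives the larger weight, which matches the remark that a scarce label marking a user indicates a closer relationship.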
In analyzing the decay of behavior over time, the interestingness at each moment is given by the time decay function:
$$I_t = e^{-\lambda \Delta t} \tag{4}$$
wherein $\Delta t$ represents the number of days between the behavior occurrence time t and the observation point, $\lambda$ represents the decay factor, and $I_t$ represents the interestingness at each moment.
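As a worked illustration of the exponential decay, the function below computes the interestingness from two dates. The decay factor 0.1 and the dates are hypothetical values chosen for the example.

```python
import math
from datetime import date

def time_decay(behavior_day, observation_day, decay_factor):
    """I_t = exp(-lambda * delta_t), with delta_t measured in days."""
    delta_t = (observation_day - behavior_day).days
    return math.exp(-decay_factor * delta_t)

observed = date(2021, 7, 26)
recent = time_decay(date(2021, 7, 19), observed, decay_factor=0.1)   # 7 days ago
older = time_decay(date(2021, 4, 17), observed, decay_factor=0.1)    # 100 days ago
```

A behavior on the observation day itself keeps full weight (1.0), and older behaviors shrink toward zero at a rate set by the decay factor.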
The calculation formula of the total weight of the user tag is as follows:
$$W = B_i \times I_t \times C_i \times \mathrm{TFIDF} \tag{5}$$
wherein $B_i$ represents the behavior type weight; $C_i$ represents the number of user behaviors, i.e. the total number of behaviors generated by the user; $I_t$ represents the time-decay interestingness; and TFIDF represents the user behavior label assignment weight.
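The product of the four factors can be sketched for a single user-label pair. The behavior types, their weights and the default decay factor below are hypothetical; the patent does not specify them.

```python
import math

# Hypothetical behavior type weights B_i.
BEHAVIOR_WEIGHTS = {"borrow": 1.0, "renew": 0.8, "reserve": 0.5}

def total_label_weight(behavior_type, behavior_count, days_since, tfidf_weight,
                       decay_factor=0.05):
    """W = B_i * I_t * C_i * TFIDF for one (user, label) pair."""
    b_i = BEHAVIOR_WEIGHTS[behavior_type]
    i_t = math.exp(-decay_factor * days_since)   # time-decay interestingness
    return b_i * i_t * behavior_count * tfidf_weight

w_today = total_label_weight("borrow", behavior_count=4, days_since=0, tfidf_weight=0.5)
w_stale = total_label_weight("borrow", behavior_count=4, days_since=60, tfidf_weight=0.5)
```

Because the factors are multiplied, the same borrowing history contributes much less once it is two months old, which is how the weight indirectly tracks changes in interest.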
The cosine similarity method in the step 4 specifically comprises the following steps:
Cosine similarity measures the difference between two vectors by the cosine of the angle between them: the closer the cosine is to 1, the closer the angle is to 0 degrees and the stronger the correlation. The calculation formula is as follows:
$$p = \frac{\sigma}{\sqrt{\eta \times \lambda}} \tag{6}$$
wherein σ represents the number of users who pay attention to both resource X and resource Y, η represents the number of users who pay attention to resource X, λ represents the number of users who pay attention to resource Y (this λ is distinct from the decay factor in formula (4)), and p is the similarity coefficient measuring how strongly the two resources are attended to together by users. The larger the value of p, the higher the probability that a user pays attention to both resources at the same time.
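For resources represented as 0/1 attention vectors over users, the cosine of the angle reduces to the co-attention count divided by the geometric mean of the two attention counts. The sketch below uses that reduction, with each resource given as a set of user IDs; it assumes η and λ count all users attending to X and to Y respectively, and the IDs are made up.

```python
import math

def cosine_similarity(users_x, users_y):
    """p = sigma / sqrt(eta * lambda) for two sets of user IDs."""
    sigma = len(users_x & users_y)           # users attending to both X and Y
    eta, lam = len(users_x), len(users_y)    # users attending to X, to Y
    if eta == 0 or lam == 0:
        return 0.0
    return sigma / math.sqrt(eta * lam)

p = cosine_similarity({"u1", "u2", "u3", "u4"}, {"u3", "u4", "u5", "u6"})
```

Two resources with identical audiences score 1.0, and resources with no shared users score 0.0.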
When the cosine similarity is used for weighted classification, the input is the preprocessed data feature columns and the output is the label with the highest recommendation score. The calculation process is as follows:
1. Combine the labels pairwise (a cross join of the two label tables) to obtain every two-label combination under each user, in preparation for calculating the similarity of every two labels.
2. Calculate the number of users corresponding to each label, i.e. the number of different users carrying that label.
3. Calculate the similarity of every two labels by cosine similarity, finally obtaining the cosine similarity weight p.
4. Calculate the related labels recommended to the user and associate the user with all of the labels related to that user, wherein the recommendation score is calculated as follows:
R=W×p (7)
wherein W is the total weight of the label, and p is the cosine similarity weight.
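The four steps above can be sketched end to end. This is one possible reading, not the patent's implementation: labels are held as a set per user, the total weights W are supplied as a hypothetical dictionary keyed by (user, label), and when several of a user's labels point to the same candidate label the highest score is kept (an assumption).

```python
import math
from collections import Counter
from itertools import combinations

def label_similarities(user_labels):
    """Steps 1-3: pairwise label combinations per user, per-label user
    counts, then cosine similarity p for every co-occurring label pair."""
    pair_counts, label_counts = Counter(), Counter()
    for labels in user_labels.values():
        label_counts.update(labels)
        pair_counts.update(combinations(sorted(labels), 2))
    return {pair: both / math.sqrt(label_counts[pair[0]] * label_counts[pair[1]])
            for pair, both in pair_counts.items()}

def recommend_label(user, user_labels, weights, sims):
    """Step 4: score related labels with R = W * p and return the best one."""
    owned, scores = user_labels[user], {}
    for (a, b), p in sims.items():
        for own, other in ((a, b), (b, a)):
            if own in owned and other not in owned:
                r = weights[(user, own)] * p
                scores[other] = max(scores.get(other, 0.0), r)
    return max(scores, key=scores.get) if scores else None

user_labels = {
    "u1": {"history", "biography"},
    "u2": {"history", "biography", "fiction"},
    "u3": {"history", "fiction"},
    "u4": {"fiction", "poetry"},
}
weights = {("u1", "history"): 2.0, ("u1", "biography"): 1.0}  # hypothetical W values
sims = label_similarities(user_labels)
best = recommend_label("u1", user_labels, weights, sims)
```

Sorting each user's labels before pairing keeps (a, b) and (b, a) from being counted as different pairs.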
The normalization of the characteristic label in the step 5 is specifically as follows:
The numerical and categorical feature labels generated in steps 3 and 4 are normalized. Numerical features (including user numbers, book numbers, ages and the like) are normalized by logarithmic function conversion, as shown in formula (8); categorical features (including the user's gender, occupation, personal information and resource categories) are converted into numerical vectors by one-hot encoding.
$$f(a) = \log_{10}(a) \tag{8}$$
wherein a is the specific value of the numerical feature to be processed, and f(a) is the value after normalization by the base-10 logarithm.
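Both conversions are short enough to show directly. The category list below is hypothetical; the logarithm is only defined for positive inputs, so the sketch rejects other values rather than guessing how they should be handled.

```python
import math

def log_normalize(value):
    """f(a) = log10(a); defined only for a > 0."""
    if value <= 0:
        raise ValueError("log10 normalization needs a positive value")
    return math.log10(value)

def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of the category."""
    return [1 if value == category else 0 for category in categories]

OCCUPATIONS = ["student", "teacher", "engineer"]  # hypothetical category list
age_feature = log_normalize(100)
occupation_feature = one_hot("teacher", OCCUPATIONS)
```

The logarithm compresses large identifiers and counts into a narrow range, while one-hot encoding avoids implying any order among categories.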
The specific process of book recommendation in the step 6 is as follows:
The DeepFM model comprises an input layer, an Embedding layer, a model layer and an output layer. The feature labels obtained in step 5 are input into the DeepFM model and vectorized by the Embedding layer.
The model layer first splices the linear features together through a generalized linear model, shown in formula (9); feature crossing is then performed on the embedded feature vectors, which are input into the deep neural network:
$$y = \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_n x_n \tag{9}$$
wherein $x_i$ denotes each input feature and $\omega_i$ denotes the weight of that feature, which is learned through back propagation until it reaches a value consistent with the model's prediction effect.
The output layer outputs the recommendation result through two fully connected layers.
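A forward pass through a DeepFM-style model can be sketched in plain Python. This follows the commonly published DeepFM layout (a linear term, a factorization-machine crossing term and a small fully connected network whose outputs are summed and passed through a sigmoid); the patent's exact wiring may differ. The sizes, random initialization and the choice of one active feature index per field are all assumptions, and no training is shown.

```python
import math
import random

def fm_second_order(embeddings):
    """Sum of pairwise dot products of the field embeddings (feature crossing)."""
    total = 0.0
    for f in range(len(embeddings[0])):
        column = [e[f] for e in embeddings]
        total += 0.5 * (sum(column) ** 2 - sum(c * c for c in column))
    return total

def dense(inputs, weights, biases, relu=True):
    """One fully connected layer; weights[j] holds the weights of output unit j."""
    out = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, v) for v in out] if relu else out

def deepfm_forward(active, params):
    """active: one active feature index per field (one-hot encoded input)."""
    linear = sum(params["w"][i] for i in active)      # y = sum(w_i * x_i) with x_i = 1
    embeddings = [params["emb"][i] for i in active]   # Embedding layer
    crossed = fm_second_order(embeddings)             # FM feature crossing
    flat = [v for e in embeddings for v in e]         # input to the deep part
    hidden = dense(flat, params["W1"], params["b1"])  # first fully connected layer
    deep = dense(hidden, params["W2"], params["b2"], relu=False)[0]  # second layer
    return 1.0 / (1.0 + math.exp(-(linear + crossed + deep)))

def init_params(n_features, n_fields, k=4, hidden=8, seed=0):
    """Small random parameters; sizes are illustrative only."""
    rng = random.Random(seed)
    r = lambda: rng.uniform(-0.1, 0.1)
    return {"w": [r() for _ in range(n_features)],
            "emb": [[r() for _ in range(k)] for _ in range(n_features)],
            "W1": [[r() for _ in range(n_fields * k)] for _ in range(hidden)],
            "b1": [0.0] * hidden,
            "W2": [[r() for _ in range(hidden)]],
            "b2": [0.0]}

params = init_params(n_features=10, n_fields=3)
score = deepfm_forward([0, 4, 7], params)
```

The crossing term uses the identity that the sum of all pairwise dot products equals half of (square of the sum minus sum of the squares), computed per embedding dimension, which avoids an explicit loop over feature pairs.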
The specific process of the book recommendation result comparison analysis in the step 7 is as follows:
With recommendation accuracy (Accuracy) as the objective evaluation index, the book recommendation results of the method without prediction-class labels and of the method including prediction-class labels are compared and analyzed by calculating the Accuracy value and drawing the Accuracy curve for each.
Example 1
Executing steps 1 and 2:
Using the offline borrowing data of a provincial library, 100,000 user behavior records of 10,000 users and 600,000 book resource records are randomly selected as the experimental data set. Each user has attributes such as user number, age, gender and occupation, and each book has attributes such as book number, book name, category, author and publisher. Preprocessing of the data comprises deduplication, outlier processing, missing value processing and time format normalization.
Executing the step 3:
The preprocessed data are visualized in Python using the third-party word cloud library wordcloud and the plotting library matplotlib. On the basis of the visualization, the data are given basic labels, which are divided by generation type into three classes: fact class labels, text class labels and rule class labels. Each user's labels are stored in a two-dimensional table structure.
Executing step 4:
Prediction-class labels are constructed from the preprocessed data using the TFIDF and cosine similarity methods. A time decay function is added when the label weight is calculated with TFIDF, so that the decay of user behavior over time indirectly reflects the change in the user's interest in the corresponding resource. The similarity of every two labels is calculated by cosine similarity to obtain the cosine similarity weight, and the label with the highest recommendation score is output.
Executing steps 5 and 6:
The numerical features are normalized by logarithmic function conversion, and the categorical features are converted into numerical vectors by one-hot encoding. The processed feature labels are vectorized by the Embedding layer. So that the model can learn every feature in the data, the generalized linear model splices the linear features together, and feature crossing is then performed on the embedded feature vectors; for example, recommending books according to the two crossed features of user age and occupation works far better than relying on age or occupation alone. Finally, the embedded feature vectors are input into the deep neural network for training, and the output result is obtained through two fully connected layers.
Step 7 is executed:
When testing with the DeepFM model, the label results with basic labels only and the label results that also include prediction-class labels are each selected; 80% of the samples are used as the training set and 20% as the test set to obtain the test results of the model. 20% of the training set is used as a validation set, which shows the result of each training round. The Accuracy curves show that the method of the invention (the book recommendation method including prediction-class labels) achieves a better recommendation effect than the book recommendation method without prediction-class labels.
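The 80/20 split with a further 20% of the training portion held out for validation can be sketched as below. Shuffling with a fixed seed is an assumption made for reproducibility; the text does not say how the samples are assigned.

```python
import random

def split_dataset(samples, test_ratio=0.2, validation_ratio=0.2, seed=42):
    """Shuffle, hold out test_ratio for testing, then take validation_ratio
    of the remaining training samples as the validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    test_set, train_full = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(train_full) * validation_ratio)
    validation_set, train_set = train_full[:n_val], train_full[n_val:]
    return train_set, validation_set, test_set

train_set, validation_set, test_set = split_dataset(range(1000))
```

With 1,000 samples this yields 640 for training, 160 for validation and 200 for testing, with no sample appearing in more than one set.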
For objective evaluation, the commonly used Accuracy index is selected as the verification index of the experimental results. Accuracy is calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
wherein TP represents the number of samples labeled positive and predicted positive; TN represents the number of samples labeled negative and predicted negative; FP represents the number of samples labeled negative but predicted positive; FN represents the number of samples labeled positive but predicted negative.
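As a small worked example, Accuracy in its standard form, (TP + TN) / (TP + TN + FP + FN), can be computed from the four counts defined above. The labels and predictions below are made up for illustration and are not the experimental data.

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions, built from the TP/TN/FP/FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model predictions
acc = accuracy(y_true, y_pred)     # TP=3, TN=3, FP=1, FN=1
```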
As can be seen from the Accuracy curve in Fig. 5, the method gives better recommendations than the book recommendation method without the prediction type label, and can be used for the book recommendation service of a digital library.
In one embodiment, a book recommendation device based on user borrowing behavior-interest prediction is provided, the device comprising:
a data acquisition module, configured to acquire user borrowing behavior data;
a basic tag determining module, configured to determine basic feature tags based on the user borrowing behavior data;
a category label determining module, configured to determine a prediction type feature tag from the user borrowing behavior data using the weight calculation algorithm TFIDF and the cosine similarity method; and
a recommendation result determining module, configured to input the basic feature tag and the prediction type feature tag into the neural network model DeepFM, which is built on a factorization machine, for Embedding feature vectorization, and, after feature crossing, to input the feature vectors into a deep neural network and output a recommendation result.
For the specific limitations of the book recommendation device based on user borrowing behavior-interest prediction, refer to the limitations of the book recommendation method based on user borrowing behavior-interest prediction above; they are not repeated here. The modules in the book recommendation device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in a processor of the computer device or be independent of it, or may be stored in software form in a memory of the computer device, so that the processor can call them and execute the operations corresponding to each module.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description. The above examples represent only a few embodiments of the present application; although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (7)

1. A book recommendation method based on user borrowing behavior-interest prediction, comprising:
acquiring user borrowing behavior data;
determining a basic feature tag based on the user borrowing behavior data;
based on the borrowing behavior data of the user, determining a prediction type feature tag by adopting a weight calculation algorithm TFIDF and cosine similarity method;
inputting the basic feature tag and the prediction type feature tag into the neural network model DeepFM, which is built on a factorization machine, to perform Embedding feature vectorization, and, after feature crossing, inputting the feature vectors into a deep neural network to output a recommendation result;
the determining of the basic feature tag comprises the following steps:
drawing a histogram and a word cloud picture on the preprocessed user borrowing behavior data, and performing basic feature analysis of data visualization; determining a basic feature label according to a basic feature analysis result of data visualization;
the basic feature tag comprises:
a fact class label, comprising the total number of behaviors generated by the same user;
a rule class label, which builds on the fact class label by setting thresholds on the user's fact class labels in combination with manual experience, different thresholds corresponding to different user characteristics;
a text class label, containing text information generated by the user, wherein the text class label comprises the gender, occupation, address and book title of the user, and the features of the text class label are extracted using the jieba word segmentation method;
the weight calculation algorithm TFIDF includes:
the weight calculation algorithm TFIDF has two logical parts: the interaction depth and the TFIDF label total weight value;
the interaction depth measures how deep the user's behavior is, using the following features of each interaction behavior: the user behavior type weight, the number of user behaviors, and the decay of the behavior over time; the TFIDF label total weight value reflects the importance of different labels to the prediction result, a more important label receiving a larger weight, and the TFIDF label total weight is calculated as follows:
W = B_i × I_t × C_i × TFIDF
wherein B_i represents the behavior type weight; C_i represents the number of user behaviors, i.e. the total number of behaviors generated by a user; I_t represents the time-decay interestingness; and TFIDF represents the user behavior label allocation weight;
the cosine similarity method specifically comprises the following steps:
the input of the cosine-similarity weighted classification is the user borrowing behavior data, the output is the label with the highest recommendation score, and the calculation process is as follows:
calculating the similarity of every two labels, and orthogonalizing the two labels to obtain every two combinations of all the labels under each user;
calculating the number of users corresponding to each label, namely the number of each label in different users;
calculating the similarity of every two labels by using cosine similarity, and finally obtaining cosine similarity weight p;
calculating related labels recommended to the user, and corresponding the user to all related labels, wherein a recommendation score calculation formula is as follows:
R=W×p
wherein W is the total weight of the tag and p is the cosine similarity weight, calculated as follows:
p = σ / √(η × λ)
wherein σ represents the number of users who pay attention to both resource X and resource Y, η represents the number of users who pay attention to resource X, λ represents the number of users who pay attention to resource Y, and p is a similarity coefficient measuring the two resources that are paid attention to by the users.
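The scoring in claim 1 can be illustrated with a short sketch. It assumes the usual set-based form of cosine similarity, p = σ / √(η × λ), and assumes that each label the user already holds scores every other label with R = W × p, the best-scoring label being returned. All function names, labels and numbers are hypothetical.

```python
import math

def total_weight(behavior_weight, time_decay, behavior_count, tfidf):
    """W = B_i × I_t × C_i × TFIDF."""
    return behavior_weight * time_decay * behavior_count * tfidf

def cosine_weight(users_x, users_y):
    """p = sigma / sqrt(eta * lambda) over the user sets of two labels."""
    sigma = len(users_x & users_y)
    eta, lam = len(users_x), len(users_y)
    if eta == 0 or lam == 0:
        return 0.0
    return sigma / math.sqrt(eta * lam)

def recommend_label(user_weights, label_users):
    """Return the label not yet held by the user with the highest R = W × p."""
    best_label, best_score = None, 0.0
    for owned, w in user_weights.items():
        owned_users = label_users.get(owned, set())
        for candidate, users in label_users.items():
            if candidate in user_weights:
                continue
            score = w * cosine_weight(owned_users, users)
            if score > best_score:
                best_label, best_score = candidate, score
    return best_label, best_score

# Hypothetical data: which user ids carry which label.
label_users = {
    "history": {1, 2, 3, 4},
    "biography": {2, 3, 4, 5},
    "cooking": {1, 6},
}
w_history = total_weight(behavior_weight=1.0, time_decay=0.5, behavior_count=4, tfidf=1.0)
label, score = recommend_label({"history": w_history}, label_users)
```

Here "history" and "biography" share three of four users each, so p = 0.75 and the recommended label is "biography" with R = 2.0 × 0.75 = 1.5.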
2. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, further comprising:
preprocessing the user borrowing behavior data, which specifically comprises: data deduplication, outlier processing, missing value processing and time format normalization; wherein
the outlier processing includes: normalizing abnormal data that falls outside the normal time range;
the missing value processing includes: deleting rows whose book number is empty;
the time format normalization includes: converting data with non-uniform time formats through the strptime() function in Python.
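A minimal sketch of the time format normalization with `datetime.strptime`. The list of input formats and the target format are assumptions, since the claim does not state which formats occur in the borrowing data.

```python
from datetime import datetime

# Hypothetical set of formats that might appear in raw borrowing records.
FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y%m%d"]
TARGET = "%Y-%m-%d %H:%M:%S"

def normalize_time(raw):
    """Try each known format in turn; return a uniform string, or None if none match."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime(TARGET)
        except ValueError:
            continue
    return None
```

For example, `normalize_time("2021/07/26 09:30")` returns `"2021-07-26 09:30:00"`. A value that matches no format comes back as `None`, so it can be handled separately.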
3. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, wherein the calculation formula of the user behavior label assignment weight is as follows:
TFIDF(U,L)=TF(U,L)×IDF(U,L)
wherein TFIDF(U, L) represents the objective weight of the label L with respect to the user U, i.e. the product of the importance of each label L to the user U, TF(U, L), and the importance of that label in the total labels of the user, IDF(U, L);
the calculation formula of TF(U, L) is as follows:
TF(U, L) = S(U, L) / Σ S(U, L_i)
wherein S(U, L) represents the number of times the label L marks the user U, Σ S(U, L_i) represents the number of times all labels mark the user U, and TF(U, L) represents the proportion of the marking times of the label L in the marking times of all labels of the user U;
the calculation formula of IDF(U, L) is as follows:
wherein ΣΣ S(U_i, L_i) represents the sum of all labels of all users, Σ S(U_i, L) represents the sum over all users marked with the label L, and IDF(U, L) represents the degree of scarcity of the label L among all the labels of the user U, i.e. the probability of occurrence of this label.
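The label weights in claim 3 can be sketched as below. TF follows the description directly (marks of label L on user U divided by all marks on user U). The IDF formula itself is not reproduced in the text, so the plain ratio of all marks to the marks of label L used here is an assumption; common TF-IDF variants also take a logarithm of that ratio. The data layout and names are hypothetical.

```python
def tf(marks, user, label):
    """TF(U, L): share of label L among all label marks on user U."""
    user_marks = marks[user]
    return user_marks.get(label, 0) / sum(user_marks.values())

def idf(marks, label):
    """IDF(U, L), ASSUMED to be total marks / marks of label L (no logarithm)."""
    total = sum(sum(m.values()) for m in marks.values())
    with_label = sum(m.get(label, 0) for m in marks.values())
    return total / with_label if with_label else 0.0

def tfidf(marks, user, label):
    """TFIDF(U, L) = TF(U, L) × IDF(U, L)."""
    return tf(marks, user, label) * idf(marks, label)

# Hypothetical mark counts: marks[user][label] = times the label marks the user.
marks = {
    "u1": {"history": 3, "fiction": 1},
    "u2": {"fiction": 4},
}
history_weight = tfidf(marks, "u1", "history")  # 0.75 × 8/3
fiction_weight = tfidf(marks, "u1", "fiction")  # 0.25 × 8/5
```

The rarer label "history" ends up with the larger weight for u1, which matches the stated intent that scarce labels count for more.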
4. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, wherein the calculation formula of the time-decay interest level is as follows:
I_t = e^(−λΔt)
wherein Δt represents the number of days between the time t at which the behavior occurred and the observation point, λ represents the decay factor, and I_t represents the interestingness at each moment.
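The exponential decay I_t = e^(−λΔt) is straightforward to compute; the sketch below assumes a decay factor of 0.1 per day purely for illustration, since the claim does not fix its value.

```python
import math
from datetime import date

def time_decay(behavior_day, observation_day, decay=0.1):
    """I_t = exp(-decay * delta_t), with delta_t in days before the observation point."""
    delta_t = (observation_day - behavior_day).days
    return math.exp(-decay * delta_t)

today = date(2021, 7, 26)                        # hypothetical observation point
fresh = time_decay(date(2021, 7, 26), today)     # behavior on the observation day
older = time_decay(date(2021, 7, 16), today)     # behavior ten days earlier
```

A behavior on the observation day keeps its full weight of 1.0, while one from ten days earlier is scaled by e^(−1), about 0.37.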
5. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, further comprising:
carrying out normalization processing on the numerical value type characteristics in the basic characteristic labels and the prediction type characteristic labels by adopting a logarithmic function conversion method; and converting the category type features in the basic feature tag and the predicted category feature tag into numerical vectors by using one-hot coding.
6. The book recommendation method based on user borrowing behavior-interest prediction as recited in claim 1, wherein the determining of the recommendation result comprises:
the DeepFM model comprises an input layer, an Embedding layer, a model layer and an output layer;
the basic feature tag and the prediction type feature tag are input into the DeepFM model, and Embedding feature vectorization is carried out through the Embedding layer; the model layer first splices the linear features together through a linear model, wherein the linear model is the sum of the features multiplied by their corresponding feature weights; the output layer outputs the recommendation result through two fully connected layers.
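To make the layer structure concrete, here is a toy forward pass in plain Python. It is a sketch under several assumptions and not the patented model: the weights are random and untrained, every input feature is one-hot (value 1), the feature crossing uses the standard factorization-machine second-order term, and the three parts are summed before a sigmoid. The class name and sizes are hypothetical.

```python
import math
import random

def fm_cross(embeds):
    """Second-order FM term: 0.5 * sum_f((sum_i v_if)^2 - sum_i v_if^2)."""
    total = 0.0
    for f in range(len(embeds[0])):
        s = sum(e[f] for e in embeds)
        sq = sum(e[f] ** 2 for e in embeds)
        total += 0.5 * (s * s - sq)
    return total

def dense(x, weights, bias, relu=True):
    """One fully connected layer, optionally followed by ReLU."""
    out = [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out

class TinyDeepFM:
    def __init__(self, n_features, n_fields, k=4, hidden=8, seed=0):
        rng = random.Random(seed)
        r = lambda: rng.uniform(-0.1, 0.1)
        self.w0 = 0.0
        self.w = [r() for _ in range(n_features)]                    # linear weights
        self.V = [[r() for _ in range(k)] for _ in range(n_features)]  # embeddings
        self.W1 = [[r() for _ in range(n_fields * k)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.W2 = [[r() for _ in range(hidden)]]
        self.b2 = [0.0]

    def predict(self, active):
        """active: index of the one-hot feature that is set in each field."""
        embeds = [self.V[i] for i in active]              # Embedding layer lookup
        linear = self.w0 + sum(self.w[i] for i in active)  # sum of feature weights
        cross = fm_cross(embeds)                           # pairwise feature crossing
        flat = [v for e in embeds for v in e]
        deep = dense(dense(flat, self.W1, self.b1), self.W2, self.b2, relu=False)[0]
        return 1.0 / (1.0 + math.exp(-(linear + cross + deep)))

model = TinyDeepFM(n_features=10, n_fields=3)
score = model.predict([0, 4, 7])  # e.g. one age bucket, one occupation, one label
```

The `fm_cross` term equals the sum of dot products between all pairs of embeddings, which is how combinations such as age with occupation contribute jointly rather than separately.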
7. A book recommendation device based on user borrowing behavior-interest prediction, comprising:
the data acquisition module is used for acquiring the borrowing behavior data of the user;
the basic tag determining module is used for determining basic feature tags based on the borrowing behavior data of the user;
the category label determining module is used for determining a prediction category characteristic label by adopting a weight computing algorithm TFIDF and cosine similarity method based on the borrowing behavior data of the user;
the recommendation result determining module is used for inputting the basic feature tag and the prediction type feature tag into the neural network model DeepFM, which is built on a factorization machine, to perform Embedding feature vectorization, and, after feature crossing, inputting the feature vectors into a deep neural network and outputting a recommendation result;
the basic tag determining module is specifically configured to:
drawing a histogram and a word cloud picture on the preprocessed user borrowing behavior data, and performing basic feature analysis of data visualization; determining a basic feature label according to a basic feature analysis result of data visualization;
the basic feature tag comprises:
a fact class label, comprising the total number of behaviors generated by the same user;
a rule class label, which builds on the fact class label by setting thresholds on the user's fact class labels in combination with manual experience, different thresholds corresponding to different user characteristics;
a text class label, containing text information generated by the user, wherein the text class label comprises the gender, occupation, address and book title of the user, and the features of the text class label are extracted using the jieba word segmentation method;
the weight calculation algorithm TFIDF includes:
the weight calculation algorithm TFIDF has two logical parts: the interaction depth and the TFIDF label total weight value;
the interaction depth measures how deep the user's behavior is, using the following features of each interaction behavior: the user behavior type weight, the number of user behaviors, and the decay of the behavior over time; the TFIDF label total weight value reflects the importance of different labels to the prediction result, a more important label receiving a larger weight, and the TFIDF label total weight is calculated as follows:
W = B_i × I_t × C_i × TFIDF
wherein B_i represents the behavior type weight; C_i represents the number of user behaviors, i.e. the total number of behaviors generated by a user; I_t represents the time-decay interestingness; and TFIDF represents the user behavior label allocation weight;
the cosine similarity method specifically comprises the following steps:
the input of the cosine-similarity weighted classification is the user borrowing behavior data, the output is the label with the highest recommendation score, and the calculation process is as follows:
calculating the similarity of every two labels, and orthogonalizing the two labels to obtain every two combinations of all the labels under each user;
calculating the number of users corresponding to each label, namely the number of each label in different users;
calculating the similarity of every two labels by using cosine similarity, and finally obtaining cosine similarity weight p;
calculating related labels recommended to the user, and corresponding the user to all related labels, wherein a recommendation score calculation formula is as follows:
R=W×p
wherein W is the total weight of the tag and p is the cosine similarity weight, calculated as follows:
p = σ / √(η × λ)
wherein σ represents the number of users who pay attention to both resource X and resource Y, η represents the number of users who pay attention to resource X, λ represents the number of users who pay attention to resource Y, and p is a similarity coefficient measuring the two resources that are paid attention to by the users.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110846763.8A CN113590945B (en) 2021-07-26 2021-07-26 Book recommendation method and device based on user borrowing behavior-interest prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110846763.8A CN113590945B (en) 2021-07-26 2021-07-26 Book recommendation method and device based on user borrowing behavior-interest prediction

Publications (2)

Publication Number Publication Date
CN113590945A CN113590945A (en) 2021-11-02
CN113590945B true CN113590945B (en) 2023-07-28

Family

ID=78250153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110846763.8A Active CN113590945B (en) 2021-07-26 2021-07-26 Book recommendation method and device based on user borrowing behavior-interest prediction

Country Status (1)

Country Link
CN (1) CN113590945B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant