CN109918563B - Book recommendation method based on public data - Google Patents

Info

Publication number
CN109918563B
Authority
CN
China
Prior art keywords
books
book
recommendation
readers
reader
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910066005.7A
Other languages
Chinese (zh)
Other versions
CN109918563A (en
Inventor
王会进
朱蔚恒
龙舜
涂能彬
李田章
黄穗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201910066005.7A
Publication of CN109918563A
Application granted
Publication of CN109918563B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a book recommendation method based on public data, in the field of digital libraries. Using collective-intelligence book recommendation technology, the method continuously collects, over the network, the book sales and recommendation information published by the major online bookstores. Treating this public data as the reference standard for recommendation, it pools the reading experience and judgment of a user population far larger than that of any single library, analyzes the reading value of each book so as to recommend the most suitable books to readers, and provides targeted recommendation schemes that account for the different characteristics of each library's collection and the personalized needs of users.

Description

Book recommendation method based on public data
Technical Field
The invention relates to the technical field of digital libraries, in particular to a book recommendation method based on public data.
Background
Currently, the rapid growth of the publishing industry offers people more and more books, readers' demand for reading keeps rising, and, because readers differ in background, their requirements for book recommendation are diverse. Take libraries as an example: apart from growing collections, the main changes have been in the interior environment and the service quality of staff, while the response to users' increasingly strong demand for personalized, proactive and efficient information services remains inadequate, particularly in providing good book consultation services. Readers often want the library to recommend relevant books to choose from in response to a lookup request. Most current library systems offer only very limited services of this kind: they either depend on the expertise of library staff or recommend books based on the borrowing records of the library's limited user base. Even when one library or several libraries share their data, the result is still a small data set, so accurate and comprehensive book recommendation is difficult to achieve.
The direction of development for library information services is to provide personalized information services that help readers find books, raise the utilization and borrowing rates of books, and address the growing problem of "book information overload". Work in this area currently concentrates on digital libraries; personalized book information services for borrowing in traditional libraries have seen no significant development in practical applications. Research on library information recommendation systems began earlier abroad, for example the Fab recommendation system of Stanford University and the MyLibrary system of Cornell University. In China, the "My Library" system of the Zhejiang University library was developed and deployed relatively early. Related research and development mainly studies association rules among books in a library or an online bookstore, designs recommendation algorithms that compute relatedness, and recommends a list of related books to readers. Two types of technique are mostly used: content-based recommendation and collaborative filtering recommendation.
Content-based recommendation: this recommends to a user products similar to those the user liked in the past. The process generally comprises three steps: 1) extracting features for each item; 2) learning the user's preference features from the feature data of items the user liked (and disliked) in the past; 3) comparing the user preference features obtained in the previous step with the features of candidate items and recommending the group of items with the greatest correlation. Its advantages are: 1) users are independent of one another; 2) the results are easy to explain; 3) a new item can be recommended immediately. Its disadvantages are: 1) item features may be hard to extract; 2) the user's latent interests cannot be discovered; 3) no recommendation can be made for a new user.
Collaborative filtering recommendation: collaborative filtering algorithms are used by the Amazon (amazon.com) online bookstore and for Facebook advertisement recommendation. By analyzing user interests, the system finds users in the user group who are similar (in interest) to a specified user and combines those similar users' evaluations of an item to predict the specified user's degree of preference for that item. Collaborative filtering has the following advantages: (1) it can filter information that is difficult for a machine to analyze automatically from content; (2) it can filter on complex, hard-to-express concepts (such as information quality and taste); (3) its recommendations are novel. Its disadvantages are: (1) users' evaluations of goods are very sparse, so user similarity computed from them may be inaccurate (the sparsity problem); (2) system performance may degrade as the numbers of users and goods grow; (3) an item that no user has ever rated cannot be recommended (the initial-rating problem). For these reasons, current e-commerce recommendation systems combine several techniques.
Patent application publication CN104679835A discloses a book recommendation method based on multi-view hashing, comprising the following steps: 1) extracting from the log collection system the user's behavior data on two views, namely book click data and search data; 2) constructing user feature vectors on the click view and the search view; 3) obtaining user hash codes, hash functions and the weights of the two views through a multi-view hashing algorithm using the behavior data of both views; 4) searching for users similar to the target user using the resulting user hash codes; 5) taking the set of books clicked by similar users as the recommendation candidate list, calculating the target user's degree of preference for each book, and returning the top N books by preference. This method integrates the user's behavior data from both views into the hash code, which improves recommendation accuracy; in addition, Hamming distances between hash codes are fast to compute, which improves recommendation efficiency.
Patent application publication CN103886067A discloses a method for recommending books using the latent topics of tags. Books are treated as documents and book tags as the words in those documents; an LDA-Gibbs algorithm performs topic modeling on the book tags to obtain a tag-topic model; the correspondence between users and tags is then obtained from users' reading records, and an LDA-Inference algorithm yields a user-topic model; finally, users with similar interests are found from the similarity of their topic distributions, and books are recommended by collaborative filtering. That invention mines the semantic information in book tags and uses topics to reduce the dimensionality needed to represent a user, which lowers the amount of computation, helps improve the quality of the recommendations, and has some practical value.
Patent application publication CN103886048A discloses a clustering-based incremental digital book recommendation method, comprising the following steps: (1) obtaining information on the books a user has read from the user's website access log and generating a user representation vector; (2) selecting the clusters to be computed using the dimension array, then calculating the cosine similarity between the user and those clusters to form a candidate set; (3) finding in the candidate set the cluster most similar to the target user, clustering according to the merging result, and incrementally updating the cluster centers and cluster diameters; (4) using the cluster core value as a ranking function to sort the items in the cluster and taking the highest-ranked items as the recommendation results. That invention mines users' book preferences from their book access logs and then makes recommendations, which improves the scalability and timeliness of the recommendation method and enhances both the utilization of digital book resources and the user's reading experience.
Patent application publication CN103488714A discloses a book recommendation method and system based on a social network. The method comprises: step 1, extracting the interaction information between a user and other users in the social network, constructing several interaction-type friend groups for the user, and assigning the other users who have a successful interaction relation with the user to different friend groups according to interaction type, where a successful interaction is one in which the user has responded to the interaction; and step 2, counting the successful interactions between the user and each friend in each interaction-type friend group, selecting from each group the friends with the most successful interactions, and recommending to the user the books those friends have read most. That invention belongs to the field of network communication and makes personalized book recommendations according to users' interaction behavior in a social network.
The above schemes adopt conventional content-based or collaborative filtering recommendation methods, and their main problems are:
(1) There is no easy way to obtain a large amount of sufficiently good data, and some schemes even lack the means to obtain the data they need in actual use. For example, some rely on users' personal information and usage behavior for analysis; in practice, a user who only wants to search for books in the library search system seldom logs in to a personal account, so the system cannot collect data on the user's usage and the personalized service is unusable. Even when readers do log in, the search system still lacks a module that recommends related books. Other schemes need tags related to book content; tag quality directly affects recommendation accuracy, and the quality of automatic tagging that relies only on content such as book keywords cannot be guaranteed;
(2) Retrieval is based on topic keywords and the like. Results are obtained by keyword matching and often number in the hundreds, while the user is usually interested in only a small part of them; and because there is no technique for recommending related books, books with related content that do not contain the keyword cannot be found, so the search results are unsatisfactory;
(3) The inherent defects of content-based and collaborative filtering recommendation cannot be overcome, including: item features are hard to extract; users' latent interests cannot be analyzed; no recommendation can be made for a new user; sparse user ratings distort the analysis of similarity between users; system performance degrades as users and books increase; and unrated items cannot be recommended.
Disclosure of Invention
To overcome the above defects in the prior art, embodiments of the invention provide a book recommendation method based on public data. The method collects massive information directly from the major online bookstores through a web crawler, which solves the problem of where the analysis data comes from. The intrinsic associations between books are obtained by analyzing the real behavior of a large number of readers; they do not depend on shared keywords or on other manual or automatic tags, and they can reach books whose content is related but which do not share a keyword. Book recommendation therefore no longer depends on the borrowing records of a limited number of readers but is derived directly from the reviews of the many users of online bookstores, which makes it more reasonable. The invention builds a simulated library arranged in the way a library arranges its books, wraps the books' reading-value information in a service interface, and offers that interface to readers, public libraries and other institutions; given keywords or a specified book from the user, the system recommends the related books with the best reading value, so that users obtain a better book information service.
To achieve the above purpose, the invention provides the following technical solution: a book recommendation method based on public data, comprising the following steps:
Step one, acquiring book data from online bookstores. A web crawler is developed to collect book data from the Joyo Amazon online bookstore, and a network of association relations among books is built from the information the online bookstore provides. This information includes "customers who bought this book also bought …", "frequently bought together with …", "customers who viewed this book ultimately bought …", and users' star ratings. Apart from readers' specific ratings, each of these associations is treated as a recommendation relation between books, and together the recommendation relations form a huge association network. In addition to basic book data, the association data among books is collected, and all data is cleaned, organized and stored in a database;
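As an illustration of step one, the following minimal sketch turns crawled association records into a directed recommendation network. The record layout, the relation labels and the cleaning rules (dropping duplicates and self-links) are assumptions made for the example; the source does not specify them.

```python
# Hypothetical crawled records: (source_book, relation, target_book).
# The relation labels are illustrative names, not fields of any real bookstore page.
RECORDS = [
    ("A", "also_bought", "B"),
    ("A", "bought_together", "C"),
    ("B", "also_bought", "C"),
    ("B", "also_bought", "C"),        # duplicate row, collapsed during cleaning
    ("C", "viewed_then_bought", "A"),
    ("C", "also_bought", "C"),        # self-link, dropped during cleaning
]


def build_association_network(records):
    """Return {book: set of books it recommends}, one node per book seen."""
    graph = {}
    for source, _relation, target in records:
        if source == target:
            continue                                  # a book does not recommend itself
        graph.setdefault(source, set()).add(target)   # the set removes duplicate links
        graph.setdefault(target, set())               # make sure the target is a node too
    return graph


network = build_association_network(RECORDS)
```

Every relation type is treated here as the same kind of directed link, which matches the statement that each association is regarded as a recommendation relation between books.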
Step two, evaluating the reading value of books for which online bookstores provide information. Based on the data collected in step one, an evaluation algorithm (hereinafter the BookRank algorithm) measures the reading value of each book (hereinafter its BookRank) from the quantity and quality of the book's inbound and outbound links. The core idea of the BookRank algorithm is that each link to a book is a vote endorsing it: 1) the more books endorse a book, the more it is worth recommending; 2) the higher the quality of the endorsing source books, the more the endorsed book is worth recommending. The more a book is linked to, the more votes it has received from other books and the higher its reading value; it is then given a higher BookRank value, which measures the importance of the book, that is, how strongly it is recommended;
Step three, acquiring data on the library's special-collection books. Acting in the capacity of a borrower and offering information services in exchange, data on the special-collection books is obtained from a library. The internal relations between books, and between readers and books, are then mined by analyzing the reader behavior log table (specifically reservations, loans and renewals); the closeness of these relations is measured by calculating the contribution weights between books and readers, the degree of influence between books, and the similarity support between books. The collected borrowing data is cleaned, organized and stored in a database;
Step four, evaluating the reading value of the special-collection books. Based on the data from step three, the recommendation value of these books is measured according to the following principles: 1) the more often a book has associated behaviors with readers of higher authority, the higher its recommendation value; 2) the more readers a book has associated behaviors with, the more it is worth recommending; 3) the more readers two books share through associated behaviors, the higher the recommendation weight between the two books;
The reading value of a book is evaluated from several aspects: the identity of its borrowers, the number of loans, and the other books those borrowers have borrowed. First, the influence of different readers on different book categories is computed from the raw data. The more often reader A's behavior log shows an association with books of a category C, the greater reader A's influence on category C relative to other categories; and the more specialized and authoritative reader A's identity, the greater reader A's influence on category C relative to other readers. To measure both how strongly a reader influences the categories of the books the reader is associated with and how the authority of the reader's identity affects that influence, a "reader-category" weight is introduced;
Then the recommendation weights between books are calculated. Building on the previous step, if books X and Y are both associated with a number of readers, the contribution weight of book X to book Y is the weighted sum of those readers' "reader-category" weights for the category to which book Y belongs; the similarity between books X and Y is the ratio of the number of readers associated with both books to the number of readers in the union of those associated with X and those associated with Y; and the recommendation weight of book X to book Y is the product of the contribution weight of X to Y and the similarity of X and Y. The invention introduces a recommendation weight w(y-x) between books, for which only the group of readers associated with both x and y is considered;
Step five, designing book recommendation strategies on the basis of the book values obtained. There are two strategies, one for keyword retrieval and one for a specific book; in both, recommended books are presented to the user in descending order of book value;
Recommendation strategy for search keywords: when a user searches by keyword, the system groups the resulting books by category and counts the books in each category, allocates the number of recommendations to each category according to those counts, selects books from each category by BookRank value, and, after merging the selections, sorts them by BookRank value;
Recommendation strategy for a specific book: when a user selects a book, a certain number of related books are recommended according to that book's classification. The reasoning is as follows: a user who selects a book is probably interested in the category to which that book belongs, so the books worth recommending should be concentrated in the current category, and that category is taken as the book selection directory. In the book database, all books are classified in a tree structure: the general directory, whose name is the general classification number, is the root node; subdirectories are intermediate nodes; and books are leaf nodes. When the book selection directory has too few leaves, or none, its parent directory becomes the current book selection directory, and the directory is expanded level by level in this way until the number of selected books reaches the recommendation count N specified by the system;
Thus, when too few books can be recommended under the current classification, the recommendation range is widened to the parent directory. When a user needs to look for a book, the system first searches by the keywords the user enters, and then presents books whose content is related to the search results, ordered by a specific ranking algorithm;
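The level-by-level expansion of the book selection directory can be sketched as follows. The `Directory` class, the directory names and the book values are hypothetical, and the choice to rank all candidates of the expanded directory by value before cutting to N is an assumption; the source only states that the directory is widened until N books are available.

```python
class Directory:
    """A node of the classification tree; books are its leaves."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.books = []                 # (title, book_value) pairs filed directly here
        if parent is not None:
            parent.children.append(self)

    def all_books(self):
        """Collect the books of this directory and of all its subdirectories."""
        found = list(self.books)
        for child in self.children:
            found.extend(child.all_books())
        return found


def recommend_for_book(selected_title, start, n):
    """Widen the selection directory toward the root until n candidates exist."""
    current = start
    while True:
        candidates = [b for b in current.all_books() if b[0] != selected_title]
        if len(candidates) >= n or current.parent is None:
            break
        current = current.parent        # too few leaves: move up one level
    candidates.sort(key=lambda b: b[1], reverse=True)
    return [title for title, _value in candidates[:n]]


root = Directory("root")
sub = Directory("sub", root)
leaf = Directory("leaf", sub)
leaf.books = [("Selected", 0.9), ("X", 0.5)]
sub.books = [("Y", 0.8)]
root.books = [("Z", 0.7)]

two = recommend_for_book("Selected", leaf, 2)
three = recommend_for_book("Selected", leaf, 3)
```

With N = 2 the search stops at the first parent directory; with N = 3 it has to widen to the root before enough candidates are found.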
Finally, an interface for book reading values and recommendation services is provided: the book reading values and related recommendation information are stored in a back-end database, and a set of query interfaces is exposed over the network to provide the related information services to libraries.
In a preferred embodiment, the iterative formula of BookRank in step two is as follows:
$$BR_n(A) = (1-d) + d \sum_{i} \frac{BR_{n-1}(T_i)}{C(T_i)}$$
where $BR_n(A)$ is the BookRank value of book A; $BR_{n-1}(T_i)$ is the BookRank value, at the previous iteration, of a book $T_i$ in whose related-book list book A appears, that is, $T_i$ links to A; $C(T_i)$ is the total number of books in the related-book list of $T_i$; and d is the damping coefficient. The term $d \cdot BR_{n-1}(T_i)/C(T_i)$ expresses that, as books are visited at random by following links in the association network, a share d of the BookRank value of $T_i$ is distributed evenly among the books in its related-book list, so that book A receives $1/C(T_i)$ of it.
In a preferred embodiment, in step two, after the algorithm has iterated a number of times, the BookRank value of each book converges to a stable value.
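The convergence described above can be observed on a toy network. The sketch below assumes the PageRank-style update BR_n(A) = (1 - d) + d · Σ BR_{n-1}(T_i) / C(T_i), a damping coefficient of 0.85 and a three-book network; none of these values is given in the source, and `bookrank` is a hypothetical helper name.

```python
def bookrank(links, d=0.85, tol=1e-9, max_iter=1000):
    """links maps each book to the set of books in its related-book list."""
    books = list(links)
    rank = {b: 1.0 for b in books}                 # initial value 1 for every book
    for _ in range(max_iter):
        new_rank = {}
        for a in books:
            # Sum the shares passed on by every book T whose list contains a.
            incoming = sum(rank[t] / len(links[t]) for t in books if a in links[t])
            new_rank[a] = (1 - d) + d * incoming
        # Stop once no value moved by more than the tolerance.
        if max(abs(new_rank[b] - rank[b]) for b in books) < tol:
            return new_rank
        rank = new_rank
    return rank


# Toy network: A lists B and C, B lists C, C lists A.
links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = bookrank(links)
```

Book C, which is listed by both A and B, ends with the highest value, and B, which receives only half of A's share, ends with the lowest.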
In a preferred embodiment, in step two, the algorithm is implemented following the power method used to compute PageRank: the above formula is recast as solving $\lim_{n\to\infty} A^{n}x$, where x is the vector of initial BookRank values, which may be set arbitrarily and is set here to 1 for every book. The matrix A is:
$$A = (1-d)\, e e^{T} + d\, P^{T}$$
where d is the damping coefficient, e is an n-dimensional column vector, $e^{T}$ is an n-dimensional row vector, and $P^{T}$ is the probability transition matrix [S Krishnaswany]. In each iterative calculation, the matrix A changes, and the power iteration proceeds as follows:
(1) x ← init    // give an initial BookRank value
(2) r ← Ax    // r is the BookRank value, A is the link relation matrix (as shown in formula 5.5)
(3) if f(||x − r||) < ε, return r    // compare BookRank values before and after iteration
(4) else x ← r, goto (2)    // tolerance ε not met, continue iterating
where init is the initial BookRank value, x is the current BookRank value, r is the BookRank value after the iteration, and ε is the tolerance.
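The four steps of the power iteration translate almost line for line into code. The sketch below makes two assumptions that the source does not state: the uniform term of A is scaled by 1/n so that the iteration stays bounded when every initial value is 1, and the distance f(||x − r||) is taken as the largest absolute change. The helper names and the 3 × 3 transition matrix are illustrative.

```python
def build_matrix(P, d=0.85):
    """Assumed form A = (1 - d)/n * e e^T + d * P^T, with P row-stochastic."""
    n = len(P)
    return [[(1 - d) / n + d * P[j][i] for j in range(n)] for i in range(n)]


def power_iteration(A, x, eps=1e-10, max_iter=10_000):
    """Repeat r <- A x until the largest change between x and r is below eps."""
    for _ in range(max_iter):
        r = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]   # step (2)
        if max(abs(x_i - r_i) for x_i, r_i in zip(x, r)) < eps:            # step (3)
            return r
        x = r                                                              # step (4)
    return x


# Row i of P spreads book i's vote evenly over its related-book list:
# book 0 lists books 1 and 2, book 1 lists book 2, book 2 lists book 0.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
A = build_matrix(P)
book_rank = power_iteration(A, [1.0, 1.0, 1.0])    # step (1): init = 1 for each book
```

Under the 1/n scaling, one multiplication by A with initial values of 1 reproduces the per-book update (1 − d) + d · Σ BR(T_i)/C(T_i), so the matrix form and the per-book form agree.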
In a preferred embodiment, in step two, to construct the probability transition matrix A, the book association link network is represented as a link relation matrix with n rows and (n+1) columns, which simplifies implementation of the system program.
In a preferred embodiment, the "reader-category" weight in step four is as follows:
where i denotes a reader associated with books; c denotes the category of the books associated with reader i; h(i, c) is the number of behaviors by reader i on books of category c; f(i) is the total number of behaviors by reader i on books; k is the weight coefficient for the authority of the reader's identity, ranging from 0 to 1, with readers of higher authority having a larger k; and λ(i, c) is the "reader-category" weight, that is, the degree of influence of reader i on category c of the books the reader is associated with.
In a preferred embodiment, in step four, the recommendation weight between books is as follows:
where i is the number of readers associated with book x; j denotes a reader associated with book x; c(x) is the book category to which book x belongs; λ(j, c) is the contribution weight of reader j to book category c; k is the number of readers associated with both book x and book y; m is the number of readers associated with book x; and n is the number of readers associated with book y.
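The two weights of step four can be illustrated with a small numeric sketch. The closed forms used here are assumptions inferred from the symbol definitions and the prose description, not formulas given in the source: λ(i, c) = k · h(i, c) / f(i), and w(y-x) as the sum of λ(j, c(x)) over the readers shared by x and y, multiplied by the similarity k / (m + n − k). The reader names and counts are made up.

```python
def reader_category_weight(h_ic, f_i, k):
    """Assumed form: lambda(i, c) = k * h(i, c) / f(i)."""
    return k * h_ic / f_i


def recommendation_weight(readers_x, readers_y, lam_for_category_x):
    """Assumed form: (sum of lambda over shared readers) * (shared / union)."""
    shared = readers_x & readers_y
    if not shared:
        return 0.0                                  # no common readers, no weight
    union = readers_x | readers_y
    contribution = sum(lam_for_category_x[j] for j in shared)
    similarity = len(shared) / len(union)           # equals k / (m + n - k)
    return contribution * similarity


# Hypothetical data: readers r2 and r3 are associated with both books.
readers_x = {"r1", "r2", "r3"}
readers_y = {"r2", "r3", "r4"}
lam = {
    "r2": reader_category_weight(h_ic=3, f_i=6, k=0.8),   # 0.8 * 3 / 6
    "r3": reader_category_weight(h_ic=1, f_i=4, k=0.4),   # 0.4 * 1 / 4
}
w = recommendation_weight(readers_x, readers_y, lam)
```

Only the shared readers contribute to the sum, in line with the statement that the recommendation weight considers only the reader group associated with both books.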
In a preferred embodiment, the book recommendation formula in step four is as follows:
where RMD(W) is the total number of recommendations made for search keyword W; n is the number of distinct book categories appearing in the search results; N is the total number of search results; $N_i$ is the number of books in the search results that belong to the ith category; $A_i$ is the total number of books of the ith category in the entire book database; the formula also gives the range of the number of books selected from the ith category for recommendation; u is a coefficient between 0 and 1 that keeps the number of books selected from a category within a reasonable range; and M is the upper limit on recommendations for each category.
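Because the allocation formula itself is not reproduced in this text, the following is only a sketch of the keyword strategy under an assumed rule: each category i receives a quota of min(M, ceil(u · N_i)), at least 1. The function names, the quota rule and the sample results are hypothetical; only the overall flow (group by category, pick by BookRank within each category, merge and sort by BookRank) comes from the description.

```python
import math


def allocate_quotas(category_counts, u=0.5, cap=5):
    """Assumed rule: quota_i = min(M, ceil(u * N_i)), never below 1."""
    return {c: min(cap, max(1, math.ceil(u * n_i)))
            for c, n_i in category_counts.items()}


def recommend_for_keyword(results, u=0.5, cap=5):
    """results: (title, category, bookrank) tuples matching the keyword."""
    counts = {}
    for _title, category, _rank in results:
        counts[category] = counts.get(category, 0) + 1     # N_i per category
    picked = []
    for category, quota in allocate_quotas(counts, u, cap).items():
        in_category = sorted((r for r in results if r[1] == category),
                             key=lambda r: r[2], reverse=True)
        picked.extend(in_category[:quota])                 # best books of the category
    picked.sort(key=lambda r: r[2], reverse=True)          # final order by BookRank
    return [title for title, _category, _rank in picked]


results = [
    ("a", "c1", 0.9), ("b", "c1", 0.4), ("c", "c1", 0.7), ("d", "c1", 0.2),
    ("e", "c2", 0.8), ("f", "c2", 0.3),
]
recommended = recommend_for_keyword(results)
```

In this example category c1 has four hits and receives two slots, category c2 has two hits and receives one, and the merged list is ordered by BookRank.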
The technical effects and advantages of the invention are:
1. Using collective-intelligence book recommendation technology, the invention continuously collects over the network the book sales and recommendation information published by the major online bookstores and, treating public data as the reference standard for recommendation, pools the reading experience and judgment of a user population far larger than that of any single library to analyze the reading value of books and recommend the most suitable books to readers. Libraries are thus no longer limited to their own local borrowing data when making recommendations, and their level of reader service can be improved markedly; targeted recommendation schemes can be provided according to the different characteristics of each library's collection and the personalized needs of users, and are delivered to libraries through service interfaces;
2. The invention collects massive information directly from the major online bookstores through a web crawler, which solves the problem of where the analysis data comes from. The intrinsic associations between books are obtained by analyzing the real behavior of a large number of readers; they do not depend on shared keywords or on other manual or automatic tags, and they can reach books whose content is related but which do not share a keyword. Book recommendation therefore no longer depends on the borrowing records of a limited number of readers but is derived directly from the reviews of the many users of online bookstores, which makes it more reasonable;
3. The invention builds a simulated library arranged in the way a library arranges its books, wraps the books' reading-value information in a service interface, and offers that interface to readers, public libraries and other institutions; given keywords or a specified book from the user, the system recommends the related books with the best reading value, so that users obtain a better book information service.
Drawings
Fig. 1 is an overall flow chart of the present invention.
Fig. 2 is a flowchart of a book recommendation step according to search keywords provided by a user in embodiment 2 of the present invention.
Fig. 3 is a flowchart showing the recommended steps for a specific book in embodiment 3 of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
the invention provides a book recommendation method based on public data as shown in fig. 1, which specifically comprises the following recommendation methods:
Firstly, acquiring book data of online bookstores, namely developing a web crawler, acquiring book data from excellent-amazon online bookstores, constructing a network of association relations among books according to information provided by the online bookstores, wherein the information of the association relations among books comprises '… books are purchased by people who purchase the books', '… books are purchased frequently', '… books are purchased by people who browse the books', and star rating of users, wherein the association relations are regarded as a recommendation relation among the books except for specific rating of readers, the recommendation relations can form a huge association network, and acquiring the association data among books except basic data, and storing the data into a database after cleaning and finishing;
Step two, evaluating the reading value of books for which online bookstores provide information: based on the data collected in step one, an evaluation algorithm (hereafter the BookRank algorithm) is provided. It measures the reading value of each book (hereafter its BookRank value) from the quantity and quality of the book's in-links and out-links. The core idea of the BookRank algorithm is that each link pointing to a book is a vote of approval for it: 1) the more books approve of a book, the more it is worth recommending; 2) the higher the quality of the approving source books, the more the approved book is worth recommending. The more a book is linked to, the more votes it has received from other books and the higher its reading value, so it is given a higher BookRank value, which measures the importance of the book, i.e. how strongly it is recommended;
The iterative formula for BookRank is as follows:
$$BR_{n}(A) = (1-d) + d \times \sum_{i=1}^{n} \frac{BR_{n-1}(T_i)}{C(T_i)}$$
wherein $BR_n(A)$ represents the BookRank value of book A; $BR_{n-1}(T_i)$ represents the BookRank value of book $T_i$ at the previous iteration, where book A appears in the related-book list of $T_i$, i.e. $T_i$ links to A; $C(T_i)$ represents the total number of books in the related-book list of $T_i$; d represents the damping coefficient. The term $d \times \sum_{i=1}^{n} BR_{n-1}(T_i)/C(T_i)$ represents that, when books are visited at random by following links in the association network, the fraction d of the BookRank value of $T_i$ is distributed evenly among the books in its related list, so that book A obtains $1/C(T_i)$ of that share of the BookRank value of $T_i$;
in the calculation process, after a plurality of iterations, the BookRank value of each book finally converges to a more stable value;
In the algorithm implementation, BookRank follows the computational principle of PageRank's power method and computes the model in the same way, so the above formula is abstracted and converted into solving $\lim_{n\to\infty} A^{n}x$, wherein x is the vector of initial BookRank values of the books; the initial values can be set arbitrarily and are here all set to 1; the matrix A is:
$$A = (1-d) \times ee^{T} + d \times P^{T}$$
wherein d is the damping coefficient, e is an n-dimensional column vector, $e^{T}$ is an n-dimensional row vector, and $P^{T}$ is the probability transition matrix [S Krishnaswany]; in each iterative calculation the matrix A changes, and the calculation process of the power iteration method is as follows:
(1) x ← init // give an initial BookRank value
(2) r ← Ax // r is the BookRank value, A is the link relation matrix (as shown in formula 5.5)
(3) if f(||x - r||) < ε, return r // compare the BookRank values before and after the iteration
(4) else x ← r, goto (2) // tolerance ε not met, continue iterating
Wherein init is a set initial BookRank value, x represents a current BookRank value, r represents an iterated BookRank value, and epsilon represents a tolerance value;
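The iteration described above can be illustrated with a minimal Python sketch. It applies the per-book form of the BookRank formula directly rather than building the matrix A. The function name `bookrank`, the damping value 0.85, the tolerance and the iteration cap are assumptions made for illustration; the text does not specify them. The convergence test uses the largest per-book change as a stand-in for f(||x - r||).

```python
def bookrank(related, d=0.85, eps=1e-8, max_iter=200):
    """Iterate BR_n(A) = (1 - d) + d * sum(BR_{n-1}(T_i) / C(T_i)).

    related maps each book to its related-book list (its out-links).
    d, eps and max_iter are illustrative defaults, not values from the text.
    """
    books = set(related)
    for targets in related.values():
        books.update(targets)
    rank = {b: 1.0 for b in books}  # initial values all set to 1, as in the text
    for _ in range(max_iter):
        new = {b: 1.0 - d for b in books}
        for src, targets in related.items():
            if not targets:
                continue  # a book with an empty related list passes nothing on
            share = d * rank[src] / len(targets)  # d * BR(T_i) / C(T_i)
            for t in targets:
                new[t] += share
        diff = max(abs(new[b] - rank[b]) for b in books)
        rank = new
        if diff < eps:  # values before and after the iteration agree within eps
            break
    return rank


ranks = bookrank({"A": ["B"], "B": ["A"], "C": ["A"]})
```

In this toy network book A is linked from both B and C, so it ends with the highest value, while C, which nothing links to, keeps only the base value 1 - d.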
in order to construct the probability transition matrix A, the book association relation link network is expressed as a link relation matrix of n rows (n+1) columns, so that the realization of a system program is facilitated, and a BookRank algorithm is as follows (BookRank program flow pseudo code):
Step three, acquiring data on the collected feature books: online bookstores lack information on the feature books held in library collections (ancient books, old books, periodicals and the like), so the reading value of these books must be evaluated in a different way, namely according to the identity of the borrowers. Desensitized data on the collected feature books (with any data that could leak personal information removed or obfuscated) is obtained from libraries in exchange for information services. The internal associations between books and readers (reservation, borrowing and renewal) and between books are then mined by analyzing the reader behavior log table, and the closeness of these relations is measured by calculating the contribution weight between books and readers, the degree of influence between books, and the similarity support between books. The borrowing data obtained for the feature books is cleaned, organized and stored in a database;
Step four, evaluating the reading value of the collected feature books: based on the data from step three, the recommendation value of the collected feature books is measured according to the following principles: 1) the more often a book has associated behaviors with readers of higher authority, the higher its recommendation value; 2) the more readers a book has associated behaviors with, the more it is worth recommending; 3) the more readers two books share associated behaviors with, the higher the recommendation weight between the two books;
The reading value of a book is evaluated from the identity of its borrowers (which relates to their professional authority), the number of borrowings, and the other books those borrowers have borrowed. First, the influence of different readers on different book categories is calculated from the raw data: the more often reader A's behavior log shows an associated behavior with books of a certain category C, the greater reader A's influence on category C compared with other categories; and the more professional and authoritative reader A's identity, the greater reader A's influence on category C compared with that of other readers. To measure both the influence of different readers on the categories of the books they are associated with and the effect of reader identity authority on a category, the 'reader-category' weight is introduced as follows:
wherein i represents a reader having an association with books; c represents a category of books associated with reader i; h(i, c) represents the number of behaviors of reader i on associated books of category c; f(i) represents the total number of behaviors of reader i on associated books; k represents the weight coefficient of reader identity authority, with a value range of 0 to 1, a reader of higher authority having a larger k; and λ(i, c) represents the 'reader-category' weight, i.e. the degree of influence of reader i on the associated book category c;
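As an illustration of how the quantities h(i, c), f(i) and k could combine, the following Python sketch assumes the product form λ(i, c) = k · h(i, c) / f(i). This form is an assumption chosen to match the stated definitions (more behaviors in a category and higher authority both raise the weight); it is not taken from the formula of the invention. The function name and the log layout are likewise hypothetical.

```python
from collections import Counter


def reader_category_weights(log, authority):
    """Compute an assumed lambda(i, c) = k_i * h(i, c) / f(i).

    log: iterable of (reader, category) behavior records
         (reservation, borrowing or renewal events).
    authority: dict mapping reader -> k in [0, 1].
    """
    per_reader = Counter()           # f(i): all behaviors of reader i
    per_reader_category = Counter()  # h(i, c): behaviors of reader i in category c
    for reader, category in log:
        per_reader[reader] += 1
        per_reader_category[(reader, category)] += 1
    return {
        (reader, category): authority[reader] * count / per_reader[reader]
        for (reader, category), count in per_reader_category.items()
    }


log = [("r1", "history"), ("r1", "history"), ("r1", "poetry"), ("r2", "history")]
lam = reader_category_weights(log, {"r1": 0.9, "r2": 0.3})
```

Here reader r1 spends two of three behaviors on history and has high authority, so r1's weight for history exceeds both r1's weight for poetry and r2's weight for history.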
Next, the recommendation weights between books are calculated. Building on the previous step, if books X and Y each have associations with a number of the same readers, the contribution weight of book X to book Y is the weighted sum of the 'reader-category' weights of those readers for the category to which book Y belongs; the similarity of books X and Y is the ratio of the number of readers associated with both books to the number of readers in the union of the readers associated with X and the readers associated with Y; and the recommendation weight of book X to book Y is the product of the contribution weight of X to Y and the similarity of X and Y. The invention introduces the recommendation weight w(y->x) between books, in which only the group of readers associated with both books x and y is considered:
wherein i and j are the indices of readers having an association with book x; c(x) is the book category to which book x belongs; λ(j, c) represents the contribution weight of reader j to book category c; k represents the number of readers associated with both book x and book y; m represents the number of readers associated with book x; and n represents the number of readers associated with book y;
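The prose definition above (contribution weight times similarity, restricted to shared readers) can be sketched in Python as follows. The sketch assumes an unweighted sum of the reader-category weights as the contribution and uses the ratio of shared readers to the union of readers, k / (m + n - k), as the similarity. The function name, argument layout and the plain sum are illustrative assumptions, not the formula of the invention.

```python
def recommendation_weight(readers_x, readers_y, lam, category_y):
    """Recommendation weight of book X toward book Y, per the prose description.

    readers_x, readers_y: sets of readers associated with each book.
    lam: dict mapping (reader, category) -> reader-category weight.
    category_y: category to which book Y belongs.
    """
    common = readers_x & readers_y  # only readers associated with both books count
    if not common:
        return 0.0
    union = readers_x | readers_y
    # Contribution: sum of the shared readers' weights for Y's category (assumed unweighted).
    contribution = sum(lam.get((r, category_y), 0.0) for r in common)
    similarity = len(common) / len(union)  # k / (m + n - k)
    return contribution * similarity


lam = {("r2", "history"): 0.5, ("r3", "history"): 0.25}
w = recommendation_weight({"r1", "r2", "r3"}, {"r2", "r3", "r4"}, lam, "history")
```

With two shared readers out of four in the union, the similarity is 0.5 and the contribution is 0.75, giving a weight of 0.375.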
Step five, designing book recommendation strategies on the basis of the obtained book values: there are two strategies, one for keyword retrieval and one for a specific book, and recommended books are presented to the user in descending order of book value;
Recommendation strategy for search keywords: when a user enters keywords to search, the system groups the books in the results by category and counts the books in each category, allocates the number of recommendations to each category according to those counts, selects books from each category by BookRank value, and, after pooling the selected books, sorts them by BookRank value;
wherein RMD(W) represents the total number of recommendations made for the search keyword W; n represents the number of different book categories appearing in the search results; N represents the total number of search results; $N_i$ represents the number of books in the search results belonging to the i-th category; $A_i$ represents the total number of books of the i-th category in the entire book database; one term of the formula gives the range of the number of books selected from the i-th category for recommendation; u is a quantity coefficient with a value between 0 and 1, which keeps the number of books selected from a category within a reasonable range; and M is the upper limit on recommendations for each category of books;
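The keyword strategy (group by category, allocate a per-category quota, pick by BookRank, pool and re-sort) can be sketched as below. The quota rule `min(M, max(1, ceil(u * N_i)))` is a hypothetical stand-in, since the allocation formula is not available in this text; the function name and the tuple layout of the search results are also assumptions.

```python
import math
from collections import defaultdict


def recommend_for_keyword(results, u=0.5, M=3):
    """results: list of (title, category, bookrank) tuples matched by the keyword.

    u is the quantity coefficient in (0, 1]; M caps recommendations per category.
    The quota rule below is an illustrative assumption.
    """
    by_category = defaultdict(list)
    for title, category, rank in results:
        by_category[category].append((title, rank))
    picked = []
    for books in by_category.values():
        quota = min(M, max(1, math.ceil(u * len(books))))  # assumed allocation rule
        books.sort(key=lambda b: b[1], reverse=True)        # best BookRank first
        picked.extend(books[:quota])
    picked.sort(key=lambda b: b[1], reverse=True)            # pool, then re-sort
    return [title for title, _ in picked]


results = [
    ("a", "cs", 0.9), ("b", "cs", 0.8), ("c", "cs", 0.2), ("d", "cs", 0.1),
    ("e", "math", 0.5), ("f", "math", 0.95),
]
titles = recommend_for_keyword(results)
```

In this example the larger "cs" category contributes two books and "math" contributes one, and the pooled list is ordered by BookRank value.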
Recommendation strategy for a specific book: when a user selects a book, a certain number of related books are recommended according to the book's classification information. The reasoning is as follows: when a user selects a book, the books the user is interested in are likely to lie in the same category as the selected book, so the books worth recommending should be concentrated in the current category, which is taken as the book-selection directory. In the book database, all books are classified in a tree structure: the general catalog is the overall classification, named by the general classification number, and serves as the root node; sub-catalogs are intermediate nodes; and books are leaf nodes. When the book-selection directory has too few leaves, or none, its parent directory becomes the current book-selection directory, and the directory is expanded level by level in this way until the selected books reach the recommended number N specified by the system;
Thus, when too few books can be recommended under the current classification, the recommendation range is widened to the parent catalog. When a user searches for a book, the system first retrieves books by the search keywords the user entered, and then presents to the user, ordered by a specific ranking algorithm, the books whose content is related to the retrieved results;
Finally, an interface for book reading values and recommendation services is provided: the book reading values and related recommendation information are stored in a back-end database, and a set of query interfaces is exposed over the network to provide related information services to libraries.
The key idea of the invention is as follows. First, a network of association relations between books is constructed from the information provided by online bookstores; this information includes 'people who purchased this book also purchased …', 'frequently purchased together with …', 'people who browsed this book also purchased …' and the like, but does not include readers' specific ratings of books. A PageRank-like algorithm, similar to Google's, is then applied to the association network to analyze and calculate the reading value of the books. For the large number of feature books held in library collections (such as ancient books, old books and periodicals), the invention evaluates reading value from the identity of the borrowers (which relates to their professional authority), the number of borrowings, the other books borrowed, and so on;
On the basis, the invention provides two query modes based on the keywords and the book names, and the related books suitable for the requirements of the readers are recommended to the readers according to the query of the readers and the reading value of the books.
Example 2:
The book recommendation steps according to search keywords provided by the user are as follows (see fig. 2 of the specification):
1) Start;
2) Input keywords;
3) If book information matching the search terms exists, proceed to acquire books with a recommendation association from the matching results; if none exists, output a corresponding prompt and end;
4) Acquire books with a recommendation association from the matching results; if such books are acquired, filter the recommended book information; if not, end directly;
5) After the recommended book information is filtered, output the books with high recommendation weight, and end.
Example 3:
The recommendation steps for a specific book are as follows (see fig. 3 of the specification):
1) Look up the recommendation list of the current book; if one is found, screen out the books whose classification matches that of the current book; if none is found, set the directory to which the current book belongs as the book-selection directory;
2) After the books matching the current book's classification are screened out, recommend directly if there are at least N of them;
3) After the directory to which the current book belongs is set as the book-selection directory, select books from the book-selection directory to make up N books;
4) If N books are reached, recommend directly; if not, check whether the parent directory is the root node;
5) If the parent directory is the root node, recommend directly; if not, set the parent directory as the book-selection directory, return to the step of selecting books from the book-selection directory to make up N books, and continue.
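The directory-expansion part of these steps can be sketched in Python as below. The `Directory` class, the `(title, value)` layout of books and the function name are hypothetical; the sketch covers only the fallback path in which books are drawn from the book-selection directory and its ancestors until N books are collected, and it stops once the root has been searched.

```python
class Directory:
    """A node of the classification tree; books are its leaves."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.books = []  # list of (title, reading_value) tuples
        if parent is not None:
            parent.children.append(self)

    def all_books(self):
        """Return every book in this directory and its sub-directories."""
        found = list(self.books)
        for child in self.children:
            found.extend(child.all_books())
        return found


def recommend_for_book(selected_title, directory, n):
    """Pick up to n books, widening to parent directories when short of n."""
    picked = []
    current = directory
    while current is not None and len(picked) < n:
        candidates = [b for b in current.all_books()
                      if b[0] != selected_title and b not in picked]
        candidates.sort(key=lambda b: b[1], reverse=True)  # highest value first
        picked.extend(candidates[: n - len(picked)])       # supplement up to n
        current = current.parent                           # expand one level
    return [title for title, _ in picked]


root = Directory("root")
science = Directory("science", root)
physics = Directory("physics", science)
chemistry = Directory("chemistry", science)
history = Directory("history", root)
physics.books = [("P1", 0.9), ("P2", 0.4)]
chemistry.books = [("C1", 0.8), ("C2", 0.7)]
history.books = [("H1", 0.95)]
titles = recommend_for_book("P1", physics, 3)
```

Books from the selected book's own directory are kept first and only supplemented from wider directories, which mirrors the "make up N books" wording of the steps rather than re-ranking everything after each expansion.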
The last points to be described are: first, in the description of the present application, it should be noted that, unless otherwise specified and defined, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be mechanical or electrical, or may be a direct connection between two elements, and "upper," "lower," "left," "right," etc. are merely used to indicate relative positional relationships, which may be changed when the absolute position of the object being described is changed;
secondly: in the drawings of the disclosed embodiments, only the structures related to the embodiments of the present disclosure are referred to, and other structures can refer to the common design, so that the same embodiment and different embodiments of the present disclosure can be combined with each other under the condition of no conflict;
Finally: the foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A book recommendation method based on public data, characterized in that the recommendation method specifically comprises the following steps:
step one, acquiring book data from online bookstores: a web crawler is developed to collect book data from the Joyo Amazon online bookstore, and a network of association relations among books is constructed from the information the online bookstore provides, the information on association relations among books including 'people who purchased this book also purchased …', 'frequently purchased together with …', 'people who browsed this book also purchased …', and users' star ratings, wherein, apart from readers' specific ratings, each association relation is regarded as a recommendation relation between books, the recommendation relations together form a large association network, and, in addition to the basic book data, the association data among books is collected, cleaned, organized and stored in a database;
step two, evaluating the reading value of books for which online bookstores provide information: based on the data collected in step one, an evaluation algorithm, hereinafter referred to as the BookRank algorithm, is provided, which measures the reading value of each book, hereinafter referred to as its BookRank value, from the quantity and quality of the book's in-links and out-links; the core idea of the BookRank algorithm is that each link pointing to a book is a vote of approval for it: 1) the more books approve of a book, the more it is worth recommending; 2) the higher the quality of the approving source books, the more the approved book is worth recommending; the more a book is linked to, the more votes it has received from other books and the higher its reading value, so it is given a higher BookRank value, which measures the importance of the book, i.e. how strongly it is recommended;
step three, acquiring data on the collected feature books: according to the identity of the borrowers, related data on the collected feature books is obtained from libraries in exchange for information services; the internal associations between books and readers, specifically reservation, borrowing and renewal, and between books are then mined by analyzing the reader behavior log table; the closeness of these relations is measured by calculating the contribution weight between books and readers, the degree of influence between books, and the similarity support between books; and the borrowing data obtained for the feature books is cleaned, organized and stored in a database;
step four, evaluating the reading value of the collected feature books: based on the data from step three, the recommendation value of the collected feature books is measured according to the following principles: 1) the more often a book has associated behaviors with readers of higher authority, the higher its recommendation value; 2) the more readers a book has associated behaviors with, the more it is worth recommending; 3) the more readers two books share associated behaviors with, the higher the recommendation weight between the two books;
the reading value of a book is evaluated from the identity of its borrowers, the number of borrowings, and the other books those borrowers have borrowed: first, the influence of different readers on different book categories is calculated from the raw data, wherein the more often reader A's behavior log shows an associated behavior with books of a certain category C, the greater reader A's influence on category C compared with other categories; and the more professional and authoritative reader A's identity, the greater reader A's influence on category C compared with that of other readers; to measure both the influence of different readers on the categories of the books they are associated with and the effect of reader identity authority on a category, a 'reader-category' weight is introduced;
next, the recommendation weights between books are calculated: building on the previous step, if books X and Y each have associations with a number of the same readers, the contribution weight of book X to book Y is the weighted sum of the 'reader-category' weights of those readers for the category to which book Y belongs; the similarity of books X and Y is the ratio of the number of readers associated with both books to the number of readers in the union of the readers associated with X and the readers associated with Y; the recommendation weight of book X to book Y is the product of the contribution weight of X to Y and the similarity of X and Y; a recommendation weight w(y->x) between books is introduced, in which only the group of readers associated with both books x and y is considered;
step five, designing book recommendation strategies on the basis of the obtained book values: there are two strategies, one for keyword retrieval and one for a specific book, and recommended books are presented to the user in descending order of book value;
recommendation strategy for search keywords: when a user enters keywords to search, the system groups the books in the results by category and counts the books in each category, allocates the number of recommendations to each category according to those counts, selects books from each category by BookRank value, and, after pooling the selected books, sorts them by BookRank value;
recommendation strategy for a specific book: when a user selects a book, a certain number of related books are recommended according to the book's classification information, the reasoning being as follows: when a user selects a book, the books the user is interested in are likely to lie in the same category as the selected book, so the books worth recommending should be concentrated in the current category, which is taken as the book-selection directory; in the book database, all books are classified in a tree structure, in which the general catalog is the overall classification, named by the general classification number, and serves as the root node, sub-catalogs are intermediate nodes, and books are leaf nodes; when the book-selection directory has too few leaves, or none, its parent directory becomes the current book-selection directory, and the directory is expanded level by level in this way until the selected books reach the recommended number N specified by the system;
thus, when too few books can be recommended under the current classification, the recommendation range is widened to the parent catalog; when a user searches for a book, the system first retrieves books by the search keywords the user entered, and then presents to the user, ordered by a specific ranking algorithm, the books whose content is related to the retrieved results;
and an interface for book reading values and recommendation services is provided: the book reading values and related recommendation information are stored in a back-end database, and a set of query interfaces is exposed over the network to provide related information services to libraries.
2. The method for book recommendation based on public data according to claim 1, wherein: the iterative formula of the BookRank in the second step is as follows:
$$BR_{n}(A) = (1-d) + d \times \sum_{i=1}^{n} \frac{BR_{n-1}(T_i)}{C(T_i)}$$
wherein $BR_n(A)$ represents the BookRank value of book A; $BR_{n-1}(T_i)$ represents the BookRank value of book $T_i$ at the previous iteration, where book A appears in the related-book list of $T_i$, i.e. $T_i$ links to A; $C(T_i)$ represents the total number of books in the related-book list of $T_i$; d represents the damping coefficient; the term $d \times \sum_{i=1}^{n} BR_{n-1}(T_i)/C(T_i)$ represents that, when books are visited at random by following links in the association network, the fraction d of the BookRank value of $T_i$ is distributed evenly among the books in its related list, so that book A obtains $1/C(T_i)$ of that share of the BookRank value of $T_i$.
3. The method for book recommendation based on public data according to claim 2, wherein: in the second step, after the algorithm is iterated for a plurality of times in the calculation process, the BookRank value of each book finally converges to a more stable value.
4. A method of book recommendation based on public data as claimed in claim 3, wherein: in the second step, in the algorithm implementation, BookRank follows the computational principle of PageRank's power method and computes in the same way, so that the above formula is abstracted and converted into solving $\lim_{n\to\infty} A^{n}x$, wherein x is the vector of initial BookRank values of the books, and the initial values can be set arbitrarily and are all set to 1; the matrix A is:
$$A = (1-d) \times ee^{T} + d \times P^{T}$$
wherein d is the damping coefficient, e is an n-dimensional column vector, $e^{T}$ is an n-dimensional row vector, and $P^{T}$ is the probability transition matrix [S Krishnaswany]; in each iterative calculation, the matrix A changes.
5. The method for book recommendation based on public data of claim 4, wherein: in the second step, in order to construct the probability transition matrix A, the book association relationship link network is expressed as a link relationship matrix of n rows (n+1) columns, so that the realization of a system program is facilitated.
6. The method for book recommendation based on public data according to claim 1, wherein: the reader-class weight in the fourth step is as follows:
wherein i represents a reader having an association with books; c represents a category of books associated with reader i; h(i, c) represents the number of behaviors of reader i on associated books of category c; f(i) represents the total number of behaviors of reader i on associated books; k represents the weight coefficient of reader identity authority, with a value range of 0 to 1, a reader of higher authority having a larger k; and λ(i, c) represents the 'reader-category' weight, i.e. the degree of influence of reader i on the associated book category c.
7. The method for book recommendation based on public data according to claim 1, wherein: in the fourth step, the recommended weights between books are as follows:
wherein i and j are the indices of readers having an association with book x; c(x) is the book category to which book x belongs; λ(j, c) represents the contribution weight of reader j to book category c; k represents the number of readers associated with both book x and book y; m represents the number of readers associated with book x; and n represents the number of readers associated with book y.
8. The method for book recommendation based on public data according to claim 1, wherein: the book recommendation formula in the fourth step is as follows:
wherein RMD(W) represents the total number of recommendations made for the search keyword W; n represents the number of different book categories appearing in the search results; N represents the total number of search results; $N_i$ represents the number of books in the search results belonging to the i-th category; $A_i$ represents the total number of books of the i-th category in the entire book database; one term of the formula gives the range of the number of books selected from the i-th category for recommendation; u is a quantity coefficient with a value between 0 and 1, which keeps the number of books selected from a category within a reasonable range; and M is the upper limit on recommendations for each category of books.
CN201910066005.7A 2019-01-24 2019-01-24 Book recommendation method based on public data Active CN109918563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910066005.7A CN109918563B (en) 2019-01-24 2019-01-24 Book recommendation method based on public data

Publications (2)

Publication Number Publication Date
CN109918563A CN109918563A (en) 2019-06-21
CN109918563B true CN109918563B (en) 2023-10-20

Family

ID=66960668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910066005.7A Active CN109918563B (en) 2019-01-24 2019-01-24 Book recommendation method based on public data

Country Status (1)

Country Link
CN (1) CN109918563B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674448A (en) * 2019-09-27 2020-01-10 浙江树联智能科技有限公司 Book pushing method and device
CN112800341B (en) * 2021-04-15 2021-06-22 广州华赛数据服务有限责任公司 Education resource transmission system based on big data
CN113505154B (en) * 2021-05-31 2022-12-02 南京分布文化发展有限公司 Digital reading statistical analysis method and system based on big data
CN114398540B (en) * 2021-12-17 2024-04-26 广州诺图计算机科技有限公司 Intelligent book order recommending method for book management
CN115098803B (en) * 2022-08-24 2022-12-06 深圳市华图测控系统有限公司 Book recommendation method and system based on mobile library
CN116911926A (en) * 2023-06-26 2023-10-20 杭州火奴数据科技有限公司 Advertisement marketing recommendation method based on data analysis
CN116578726B (en) * 2023-07-10 2023-09-29 悦读天下(北京)国际教育科技有限公司 Personalized book recommendation system
CN116720927B (en) * 2023-08-08 2023-11-03 北京人天书店集团股份有限公司 Book recommendation method, system and storage medium
CN116992093B (en) * 2023-09-14 2024-05-28 东北农业大学 Library intelligent indexing method, device and storage medium based on reader borrowing behaviors

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105183727A (en) * 2014-05-29 2015-12-23 上海研深信息科技有限公司 Method and system for recommending book
CN106067134A (en) * 2016-06-03 2016-11-02 朱志伟 A kind of network self-service type books are recommended and are purchased and borrow method
CN106202184A (en) * 2016-06-27 2016-12-07 华中科技大学 A kind of books personalized recommendation method towards libraries of the universities and system
CN106897430A (en) * 2017-02-27 2017-06-27 温州职业技术学院 The implementation method of digital library

Also Published As

Publication number Publication date
CN109918563A (en) 2019-06-21

Similar Documents

Publication Publication Date Title
CN109918563B (en) Book recommendation method based on public data
CN107391687B (en) Local log website-oriented hybrid recommendation system
Mohamed et al. Recommender systems challenges and solutions survey
CN110162706B (en) Personalized recommendation method and system based on interactive data clustering
CN103678635A (en) Network music aggregation recommendation method based on label digraphs
Amami et al. A graph based approach to scientific paper recommendation
US20110320442A1 (en) Systems and Methods for Semantics Based Domain Independent Faceted Navigation Over Documents
Ortega et al. Artificial intelligence scientific documentation dataset for recommender systems
CN114254201A (en) Recommendation method for science and technology project review experts
Liu et al. Fast recommendation on latent collaborative relations
CN110795613A (en) Commodity searching method, device and system and electronic equipment
Zhang et al. Multi-view dynamic heterogeneous information network embedding
CN116431895A (en) Personalized recommendation method and system for safety production knowledge
Yoshida et al. New performance index “attractiveness factor” for evaluating websites via obtaining transition of users’ interests
CN114861079A (en) Collaborative filtering recommendation method and system fusing commodity features
Joseph et al. A Comparative Study of Collaborative Movie Recommendation System
Liu et al. Multi-domain collaborative recommendation with feature selection
Utama et al. SCIENTIFIC ARTICLES RECOMMENDATION SYSTEM BASED ON USER’S RELATEDNESS USING ITEM-BASED COLLABORATIVE FILTERING METHOD
Baral et al. PERS: A personalized and explainable POI recommender system
Cui et al. Method of Collaborative Filtering Based on Uncertain User Interests Cluster.
Miranda et al. Towards the Use of Clustering Algorithms in Recommender Systems.
Xiao et al. Hybrid Embedding of Multi-Behavior Network and Product-Content Knowledge Graph for Tourism Product Recommendation.
Rahman et al. A conceptual model for the E-commerce application recommendation framework using exploratory search
Wu et al. Clustering technology application in e-commerce recommendation system
Martín Muñoz et al. Development of a travel recommender system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant