CN110619084B

CN110619084B - Method for recommending books according to borrowing behaviors of library readers

Info

Publication number: CN110619084B
Application number: CN201910807495.1A
Authority: CN
Inventors: 宋爱香
Original assignee: Xian Polytechnic University
Current assignee: Xian Polytechnic University
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2023-05-05
Anticipated expiration: 2039-08-29
Also published as: CN110619084A

Abstract

The invention discloses a method for recommending books according to the borrowing behavior of library readers, comprising the following steps: step 1, acquiring a reader borrowing data set as the mining sample through a consultation process based mainly on preliminary expert interviews, in combination with library management authorities; step 2, preprocessing the data of the mining sample; step 3, clustering by book type to find reading commonalities among readers, using the k-Means algorithm with book borrowing status as the parameter to divide the samples into four groups, each group representing a reading pattern; and step 4, building and training a model with the GRNN algorithm, taking readers' book borrowing information as input and the four cluster groups as output. The method can be used to recommend books to readers' personal digital libraries and to guide readers toward strongly correlated books; it also uncovers hidden relationships between disciplines, helping to optimize readers' disciplinary knowledge structure.

Description

Method for recommending books according to borrowing behaviors of library readers

Technical Field

The invention belongs to the technical field of book informatization management, and particularly relates to a method for recommending books according to library reader borrowing behaviors.

Background

Libraries cultivate character, improve literacy, optimize classes and broaden knowledge, and are widely established in universities. With the rapid development of society and the steady rise in living standards, the current level of informatization of book borrowing can no longer meet readers' demands. In recent years, information technology has developed rapidly, ushering in the big data era, and data mining technology has been applied in many fields. Reader borrowing records are rich in information, and extracting the valuable data from them with data mining technology lays a foundation for building an intelligent library.

Data mining is the process of extracting and exploring knowledge from large amounts of seemingly irregular data, and of searching for deeply hidden knowledge. Common data mining methods include classification, clustering, regression analysis, association rules, deviation analysis and web mining, and offer advantages such as high data utilization and good book classification results. However, data mining techniques also have drawbacks, such as the inability to learn continuously over long periods.

Neural network algorithms offer strong self-learning capability, high running speed and good adaptability, and can handle problems with complex environmental information, unclear background knowledge and ambiguous inference rules. Therefore, building on data mining, books can be recommended to library readers by incorporating the GRNN (generalized regression neural network) algorithm.

Disclosure of Invention

The invention aims to provide a method for recommending books according to the borrowing behavior of library readers, which can make effective recommendations to readers and enhance their reading experience.

The technical scheme adopted by the invention is as follows: a method for recommending books according to library reader borrowing behaviors comprises the following steps:

step 1, acquiring a reader borrowing data set as the mining sample through a consultation process based mainly on preliminary expert interviews, in combination with library management authorities;

step 2, preprocessing the data of the mining sample;

step 3, clustering by book type to find reading commonalities among readers, using the k-Means algorithm with book borrowing status as the parameter to divide the samples into four groups, each group representing a reading pattern;

step 4, building and training a model with the GRNN algorithm, taking readers' book borrowing information as input and the four cluster groups as output, and recommending the books contained in each of the four output cluster groups.

The present invention is also characterized in that,

In step 1, the library's reader borrowing data is taken as the object of study, MS SQL Server 2008 is used as the underlying data framework, and the data is pre-processed with the help of Excel, yielding 1000 transaction records from the reader borrowing data set as the mining sample.

The step 2 specifically comprises the following steps:

step 2.1: first, cleaning the data of the mining sample: finding and correcting identifiable errors in the data file, checking data consistency, and handling invalid and missing values;

step 2.2: then, converting the cleaned data: the books borrowed by readers are divided into 22 categories according to the Chinese Library Classification (中图法), so that related book types can be predicted from a reader's borrowing of certain books;

step 2.3: finally, integrating the converted data: readers' borrowing histories are integrated into transaction data that meets the data mining requirements, where each transaction represents one borrowing action of a reader and comprises a unique identifier and the set of the various books borrowed by that reader.

The step 3 specifically comprises the following steps:

step 3.1: clustering by book type, randomly selecting four data points as the initial centroids, one for each of the four groups into which the data is finally divided;

step 3.2: assigning every other object to the class whose centroid is nearest, i.e. the class it most resembles;

step 3.3: once all objects have been assigned, computing for each class the mean of all its objects as the new centroid of that group;

step 3.4: reassigning all objects according to the nearest-centroid principle;

step 3.5: returning to step 3.3 until none of the generated classes changes.

The GRNN model training step of the step 4 specifically comprises the following steps:

step 4.1: dividing the 1000 samples into 20 subsets of equal size, selecting 19 subsets for training to build the GRNN model, using the remaining subset, i.e. 50 samples, as the test set for verification, and calculating the mean squared error between the test results and the true values;

step 4.2: setting the value range of the smoothing factor σ to [0.01, 1] and increasing it gradually in steps of 0.01;

step 4.3: reassigning the training and test samples and repeating steps 4.1 and 4.2, 20 times in total, to obtain 20 mean squared error values for each smoothing factor, and calculating the average of these 20 values.

The beneficial effects of the invention are as follows: in the method for recommending books according to library reader borrowing behaviors, a cluster analysis model divides readers into several groups with similar attributes, and analysing the cluster model reveals the readers' reading patterns; the GRNN algorithm then analyses the characteristics shared within these groups to obtain the correlations in book borrowing, so that effective recommendations can be made to readers, enhancing their reading experience and improving book management efficiency and the intelligence level of the library.

Drawings

FIG. 1 is a flowchart of a K-means algorithm in a method for book recommendation based on library reader borrowing behavior in accordance with the present invention;

FIG. 2 is a block diagram of the GRNN algorithm in the method of book recommendation based on library reader borrowing.

Detailed Description

The invention will be described in detail with reference to the accompanying drawings and detailed description.

The invention provides a method for recommending books according to library reader borrowing behaviors, which is implemented according to the following steps:

Step 1, acquiring a reader borrowing data set as the mining sample through a consultation process based mainly on preliminary expert interviews, in combination with library management authorities.

The data set takes library reader borrowing data as the object of study, uses MS SQL Server 2008 as the underlying data framework, and pre-processes the data with the help of Excel. Processing the original data yields 1000 transaction records from the reader borrowing data set, which serve as the mining sample.

Step 2, before mining, the data must be preprocessed, for example by screening and denoising. This is carried out as follows:

(1) Data cleaning includes detecting null values, checking the relevance and uniqueness of the sample data, finding and correcting identifiable errors in the data file, checking data consistency, and handling invalid and missing values.

(2) Data conversion divides the books borrowed by readers into 22 categories according to the Chinese Library Classification (中图法), so that related book types can be predicted from a reader's borrowing of books in certain categories.

(3) Data integration consolidates readers' borrowing histories into transaction data that meets the data mining requirements; each transaction represents one borrowing action of a reader and comprises a unique identifier and the set of the various books borrowed by that reader.
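The three preprocessing stages above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout, the sample values and the helper names `clean`, `to_category` and `integrate` are assumptions made for the sketch, and the call-number rule is a simplification of the Chinese Library Classification rather than the procedure actually used in the invention.

```python
from collections import defaultdict

# Hypothetical raw loan records: (reader_id, call_number). All values are made up.
RAW_LOANS = [
    ("R001", "TP312/45"),
    ("R001", "TP312/45"),   # exact duplicate, removed during cleaning
    ("R001", "O1/12"),
    ("R002", "I247.5/88"),
    ("R002", None),         # missing call number, dropped
    ("", "H31/7"),          # missing reader id, dropped
]

def clean(records):
    """(1) Cleaning: drop records with missing fields and exact duplicates."""
    seen, cleaned = set(), []
    for reader_id, call_number in records:
        if not reader_id or not call_number:
            continue
        if (reader_id, call_number) not in seen:
            seen.add((reader_id, call_number))
            cleaned.append((reader_id, call_number))
    return cleaned

def to_category(call_number):
    """(2) Conversion: reduce a call number to a coarse category code.

    Assumed rule: keep the leading letters; if there is only one letter,
    append the first digit (e.g. 'TP312/45' -> 'TP', 'I247.5/88' -> 'I2').
    """
    letters = ""
    for ch in call_number:
        if not ch.isalpha():
            break
        letters += ch
    rest = call_number[len(letters):]
    if len(letters) == 1 and rest[:1].isdigit():
        return letters + rest[0]
    return letters

def integrate(records):
    """(3) Integration: one transaction per reader id, holding a set of categories."""
    transactions = defaultdict(set)
    for reader_id, call_number in records:
        transactions[reader_id].add(to_category(call_number))
    return dict(transactions)

transactions = integrate(clean(RAW_LOANS))
print(transactions)
```

Keying the transactions by reader id gives each one its unique identifier, and storing categories in a set means repeated loans in the same category collapse into a single entry.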

Step 3, clustering by book type to find reading commonalities among readers. Using the k-Means algorithm with book borrowing status as the parameter, the samples are divided into four groups, each representing a reading pattern.

The k-Means cluster analysis, shown in fig. 1, is carried out as follows:

(1) Randomly select four data points as the initial centroids, one for each of the four groups into which the data is finally divided;

(2) Assign every other object to the class whose centroid is nearest, i.e. the class it most resembles;

(3) Once all objects have been assigned, compute for each class the mean of all its objects as the new centroid of that group;

(4) Reassign all objects according to the nearest-centroid principle;

(5) Return to step (3) until none of the generated classes changes.
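Steps (1) to (5) can be sketched in plain Python as below. The function name `k_means`, the fixed random seed, the iteration cap and the toy borrowing vectors are assumptions for illustration; the sketch follows the steps as listed and is not the implementation used in the invention.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k=4, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step (1): pick k data points at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Steps (2) and (4): assign every object to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step (5): stop once no class has changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Step (3): the mean of each class becomes its new centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty class keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical 0/1 borrowing vectors (one column per book category).
points = [
    [1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 0], [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0], [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 1],
]
labels, centroids = k_means(points, k=4)
print(labels)
```

Because the initial centroids are random, different seeds can lead to different final groupings; running the procedure several times and keeping the tightest result is a common safeguard.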

Step 4, building and training a model with the GRNN algorithm, taking the book information borrowed by readers as input and the four cluster groups as output, and recommending the books contained in each of the four output cluster groups. This allows collection development and information services to be carried out in a targeted manner, strengthens librarians' knowledge of the relevant disciplines, and enables accurate subject consultation.

A model is built with the GRNN algorithm, as shown in fig. 2. The neurons of the input layer act as distribution units: their number depends on the feature data of the samples, and they pass the learning samples on to the pattern layer without processing. The number of neurons in the pattern layer equals the total number of learning samples, with each neuron corresponding to a different sample, and the pattern layer is connected to the summation layer through the connection coefficients Y^i. The summation layer sums over the pattern layer using two different types of neurons and passes the results to the output layer. Each neuron in the output layer divides the two results it receives to obtain the final output; the number of output neurons depends on the output types of the samples, and the final output corresponds to the index of the estimated result.
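The four layers described above correspond to the standard GRNN estimate, in which each learning sample contributes a Gaussian kernel weight and the output is a weighted average of the sample targets. The sketch below assumes a Gaussian kernel with smoothing factor `sigma` and one-hot cluster targets; the function name `grnn_predict` and the toy data are hypothetical.

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """Return the GRNN output vector for query x."""
    # Input layer: x is passed on unchanged.
    # Pattern layer: one neuron per learning sample, driven by squared distance.
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_x]
    # Subtracting the smallest distance leaves the final ratio unchanged
    # but prevents every weight from underflowing to zero for small sigma.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in sq_dists]
    # Summation layer: one plain sum (denominator) and one target-weighted
    # sum per output dimension (numerators).
    denominator = sum(weights)
    n_out = len(train_y[0])
    numerators = [
        sum(w * y[j] for w, y in zip(weights, train_y)) for j in range(n_out)
    ]
    # Output layer: divide the two kinds of sums.
    return [num / denominator for num in numerators]

# Hypothetical toy data: two samples with one-hot targets.
train_x = [[1, 0], [0, 1]]
train_y = [[1.0, 0.0], [0.0, 1.0]]
output = grnn_predict([1, 0], train_x, train_y, sigma=0.1)
predicted_index = max(range(len(output)), key=output.__getitem__)
print(output, predicted_index)
```

Taking the index of the largest output is one plausible reading of "the index of the estimated result"; the source does not spell out how the final cluster number is selected.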

Examples

Borrowing data of readers at the library of Xi'an Polytechnic University is selected as the object of study. Each loan by a reader produces one record; a program written with the stored procedures provided by SQL Server 2008 integrates the record set so that each reader's borrowing records form one transaction, with the reader's identity, borrowed book types, number of loans and book class numbers displayed in the same row. Each book type is marked '1' if it was borrowed and '0' otherwise. The reader borrowing transaction data is shown in Table 1.
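A minimal sketch of this 0/1 encoding follows. The column order uses the nine category codes named later in this example; the helper name `encode` and the sample reader are assumptions for illustration.

```python
# Column order assumed from the nine category codes listed in this example.
CATEGORIES = ["I1", "I2", "TM", "TN", "TQ", "H3", "TP", "O1", "O4"]

def encode(borrowed_categories):
    """Mark each category 1 if the reader borrowed from it, else 0."""
    return [1 if c in borrowed_categories else 0 for c in CATEGORIES]

# Hypothetical reader who borrowed books in TP and O1.
row = encode({"TP", "O1"})
print(row)
```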

Table 1 Reader borrowing transaction data

Clustering is carried out by book type to find reading commonalities among readers. Using the k-Means algorithm with book borrowing status as the parameter, the samples are divided into 4 groups, each representing 1 reading pattern.

Table 2 Clustering of book borrowing patterns

The GRNN samples are book borrowing records of university students obtained by librarians. The book types comprise I1, I2, TM, TN, TQ, H3, TP, O1 and O4, and there are 1000 groups of data in total. I1, I2, TM, TN, TQ, H3, TP, O1 and O4 are taken as the feature values, so each sample has 9 feature values and corresponds to one of 4 cluster types, namely cluster-1, cluster-2, cluster-3 and cluster-4. Part of the sample data is shown in Table 3.

Table 3 Book borrowing sample data

A GRNN is used to build the model and train it on the samples, completing the nonlinear mapping from input parameters to output results. The GRNN model takes the 9 feature values as input parameters and cluster-1, cluster-2, cluster-3 and cluster-4 as output results. In essence, training the network model means searching for the optimal smoothing factor so as to improve the model's evaluation performance. The GRNN training procedure is as follows:

1. The 1000 samples are divided into 20 subsets of equal size; 19 subsets are selected for training to build the GRNN model, and the remaining subset, i.e. 50 samples, is used as the test set for verification. The mean squared error between the test results and the true values is calculated.

2. The value range of the smoothing factor σ is set to [0.01, 1], and σ is increased gradually in steps of 0.01.

3. The training and test samples are reassigned, and steps 1 and 2 are repeated. After 20 rounds, 20 mean squared error values are obtained for each smoothing factor, and the average of these 20 values is calculated.
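The procedure above reads like a 20-fold cross-validation combined with a grid search over σ, and can be sketched as follows. This interpretation, the helper names `grnn_predict` and `cross_validate`, the Gaussian kernel and the small synthetic data set are all assumptions for illustration; the synthetic data is not the patent's data, and the printed result says nothing about the reported figures.

```python
import math
import random

def grnn_predict(sq_dists, train_y, sigma):
    """GRNN output from squared distances between a query and each training sample."""
    d_min = min(sq_dists)  # shifting by d_min avoids 0/0 underflow; the ratio is unchanged
    w = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(w)
    return [sum(wi * y[j] for wi, y in zip(w, train_y)) / total
            for j in range(len(train_y[0]))]

def cross_validate(xs, ys, n_folds=20, sigmas=None):
    """Return {sigma: mean squared error averaged over n_folds held-out subsets}."""
    if sigmas is None:
        sigmas = [round(0.01 * i, 2) for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00
    fold_size = len(xs) // n_folds  # assumes len(xs) is a multiple of n_folds
    mse_sum = {s: 0.0 for s in sigmas}
    for f in range(n_folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        train_x, train_y = xs[:lo] + xs[hi:], ys[:lo] + ys[hi:]
        for i in range(lo, hi):
            # Distances do not depend on sigma, so compute them once per test sample.
            sq_dists = [sum((a - b) ** 2 for a, b in zip(xs[i], t)) for t in train_x]
            for s in sigmas:
                pred = grnn_predict(sq_dists, train_y, s)
                sq_err = sum((p - t) ** 2 for p, t in zip(pred, ys[i])) / len(pred)
                mse_sum[s] += sq_err / fold_size
    return {s: total / n_folds for s, total in mse_sum.items()}

# Hypothetical toy data: 4 made-up borrowing prototypes over 9 categories,
# each bit flipped with probability 0.05, 60 samples in total.
PROTOTYPES = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
]
rng = random.Random(0)
xs, ys = [], []
for n in range(60):
    c = n % 4
    xs.append([bit ^ (rng.random() < 0.05) for bit in PROTOTYPES[c]])
    ys.append([1.0 if j == c else 0.0 for j in range(4)])

mean_mse = cross_validate(xs, ys)
best_sigma = min(mean_mse, key=mean_mse.get)
print(best_sigma, mean_mse[best_sigma])
```

With the 1000 samples described in the text, the same call would use folds of 50 samples each; a pure-Python loop of that size is slow, so a vectorised implementation would be the practical choice.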

These training steps show that the model performs well when the smoothing factor σ is less than or equal to 0.1. To improve the generalization capability of the model, the optimal smoothing factor is finally set to 0.1, for which the minimum mean squared error is 3.4e-4. The book borrowing model therefore uses the optimal smoothing factor σ = 0.1.

The training accuracy of the intelligent analysis algorithm for the borrowing behavior of the library reader is 94.00%.

The intelligent analysis algorithm for library reader borrowing behavior studies the borrowing behavior recorded in the library's circulation data and mines it with the k-Means cluster analysis algorithm and the GRNN algorithm, dividing readers into several groups with similar attributes, i.e. similar borrowing preferences. The results of the cluster analysis and the GRNN algorithm can be used to recommend books to readers' personal digital libraries and to guide readers toward strongly correlated books; they also uncover hidden relationships between disciplines, helping to optimize readers' disciplinary knowledge structure. In practical work such as database purchasing, book purchasing and subject development, knowing the borrowing tendencies of each reader group supports targeted collection development and information services, enables accurate subject consultation, and promotes wider application of the mining results.
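One way the cluster output could drive a recommendation is sketched below: compute how often each category is borrowed within each cluster, then suggest the most popular categories in the reader's cluster that the reader has not yet borrowed. The ranking rule, the function names `cluster_profiles` and `recommend`, and the sample rows are assumptions; the source only states that books contained in the output cluster group are recommended.

```python
CATEGORIES = ["I1", "I2", "TM", "TN", "TQ", "H3", "TP", "O1", "O4"]

def cluster_profiles(rows, labels, k=4):
    """Per-cluster share of readers who borrowed each category."""
    profiles = []
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if members:
            profiles.append([sum(col) / len(members) for col in zip(*members)])
        else:
            profiles.append([0.0] * len(CATEGORIES))
    return profiles

def recommend(reader_row, cluster, profiles, top_n=2):
    """Suggest categories popular in the reader's cluster but not yet borrowed."""
    candidates = [
        (rate, cat)
        for rate, cat, borrowed in zip(profiles[cluster], CATEGORIES, reader_row)
        if not borrowed and rate > 0
    ]
    candidates.sort(key=lambda rc: -rc[0])  # stable sort: ties keep category order
    return [cat for _, cat in candidates[:top_n]]

# Hypothetical 0/1 borrowing rows and cluster labels (two clusters for brevity).
rows = [
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
]
labels = [0, 0, 1]
profiles = cluster_profiles(rows, labels, k=2)

# A new reader who has only borrowed TP and was assigned to cluster 0.
suggestions = recommend([0, 0, 0, 0, 0, 0, 1, 0, 0], 0, profiles)
print(suggestions)
```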

Claims

1. A method for recommending books according to library reader borrowing behavior, comprising the steps of:

step 1, acquiring a reader borrowing data set as the mining sample through a consultation process based mainly on preliminary expert interviews, in combination with library management authorities; taking the library's reader borrowing data as the object of study, using MS SQL Server 2008 as the underlying data framework, and pre-processing the data with the help of Excel to obtain 1000 transaction records from the reader borrowing data set as the mining sample;

step 2, preprocessing the data of the mining sample, specifically comprising:

step 2.3: finally, integrating the converted data: readers' borrowing histories are integrated into transaction data that meets the data mining requirements, where each transaction represents one borrowing action of a reader and comprises a unique identifier and the set of the various books borrowed by that reader;

step 3, clustering by book type to find reading commonalities among readers, using the k-Means algorithm with book borrowing status as the parameter to divide the samples into four groups, each group representing a reading pattern, specifically comprising:

step 3.5: returning to step 3.3 until none of the generated classes changes;

step 4, building and training a model with the GRNN algorithm, taking the book information borrowed by readers as input and the four cluster groups as output, and recommending the books contained in each of the four output cluster groups; the GRNN model training specifically comprises: