CN104750845A

CN104750845A - Apriori algorithm-based electronic book resource combined cataloguing method

Info

Publication number: CN104750845A
Application number: CN201510166306.9A
Authority: CN
Inventors: 葛君伟; 顾小龙; 方义秋; 贺茜
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2015-04-09
Filing date: 2015-04-09
Publication date: 2015-07-01

Abstract

The invention relates to an Apriori algorithm-based electronic book resource combined cataloguing method. Using the interfaces provided by the electronic book resource databases of various colleges, the method integrates the various database resources of electronic book resources onto an electronic book resource combined retrieval platform. The method has two parts. First, a classification replica catalog is generated with the Apriori algorithm by mining the association rules among the data. Second, a central replica catalog is generated with the Apriori algorithm: on the basis of the classification replica catalog, the Apriori algorithm is applied again to mine association rules and obtain the final central replica catalog. Combined cataloguing of the data resources further improves the efficiency of data resource retrieval and query.

Description

An Apriori algorithm-based electronic book resource combined cataloguing method

Technical field

The invention belongs to the field of electronic book resource management methods, and specifically relates to an Apriori algorithm-based electronic book resource combined cataloguing method.

Background art

Electronic book resources come from many different platforms. How to manage these massive data resources better, how to mine valuable information from huge databases, how to raise the level of library management, and how to better serve soldiers are all questions worth careful consideration. Data mining (DM) technology provides methods for extracting patterns from massive data stores and for discovering the rules governing data changes and the relationships among data. Association rule mining is one of the important tasks of data mining: it finds latent associations between data items and uncovers previously unknown dependencies within large volumes of data.

However, search results show that research on data mining for electronic book resources is still relatively scarce. Library management of electronic book resources remains at the stage of primitive data labelling and use; data resource retrieval is slow and query efficiency is low. How to use data mining technology to improve the efficiency of data resource retrieval and query has therefore become a problem in urgent need of a solution.

In the present patent application, the interfaces provided by the electronic book resource databases of the various colleges are first used to integrate the various database resources of electronic book resources onto an electronic book resource combined retrieval platform. This resolves the fragmentation of the individual application systems and concentrates database resources that differ in source, structure and usage on one unified platform. A data mining algorithm is then used to mine the association rules of the data resources and to generate a classification replica catalog and a central replica catalog. Combined cataloguing of the data resources in turn improves the efficiency of data resource retrieval and query.

Summary of the invention

To address the slow retrieval and low query efficiency of library data resources in the prior art, the invention provides an Apriori algorithm-based electronic book resource combined cataloguing method that retrieves data resources quickly and improves query efficiency. The technical scheme of the invention is as follows: an Apriori algorithm-based electronic book resource combined cataloguing method, comprising the following steps:

101. Using the interfaces provided by the electronic book resource databases of the various colleges, integrate the various database resources of electronic book resources onto an electronic book resource combined retrieval platform;

102. Apply the Apriori algorithm to the transaction database resources integrated in step 101 to generate the classification replica catalog, that is, generate the classification replica catalog by mining the association rules among the data. The specific steps are:

A. Preset the minimum support count mincount, scan the transaction database DB and count the items to obtain C1, then find the frequent 1-itemsets that meet the minimum support count, denoted L1;

B. Recombine the itemsets in L1 to produce the candidate set C2, scan the transaction database again, and find the frequent 2-itemsets L2 that meet the minimum support count; L2 is the set of frequent itemsets forming the classification replica catalog;

103. Based on the classification replica catalog L2 obtained in step 102, recombine the itemsets in L2 through Apriori_gen(L2) to produce the candidate set C3, and delete from C3 the candidates whose 2-item subsets do not all belong to L2. Continue in this way until no new frequent k-itemsets can be found, that is, until a scan of the transaction database yields an empty Lk, at which point the algorithm terminates. The frequent itemsets finally obtained serve as the central replica catalog. The classification replica catalog and the central replica catalog are then used together for combined cataloguing, and resource retrieval of electronic book resources is carried out through the combined catalogue.
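The candidate generation in step 103 can be sketched as follows. This is a minimal Python illustration, not the patent's own implementation: the function name `apriori_gen` mirrors the Apriori_gen named in the text, its join-and-prune body follows the textbook Apriori procedure (an assumption about what the patent intends), and the L2 value is the one reported in the worked example of the embodiment.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune.

    frequent_k: a set of frozensets that all have the same size k.
    Assumption: textbook join/prune, not taken from the patent text.
    """
    frequent_k = set(frequent_k)
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        # Join step: two k-itemsets that differ in exactly one item
        # combine into an itemset with k + 1 items.
        if len(union) != k + 1:
            continue
        # Prune step: keep the candidate only if every k-item subset
        # is itself frequent.
        if all(frozenset(s) in frequent_k for s in combinations(union, k)):
            candidates.add(union)
    return candidates


# L2 as reported in the embodiment: {A,C}, {B,C}, {B,F}, {C,F}
L2 = {frozenset(p) for p in ("AC", "BC", "BF", "CF")}
C3 = apriori_gen(L2)
print(sorted("".join(sorted(c)) for c in C3))  # prints ['BCF']
```

With pruning applied before the scan, only {B, C, F} remains to be counted against the transaction database.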

The advantages and beneficial effects of the invention are as follows:

The invention proposes an Apriori algorithm-based electronic book resource combined cataloguing method. Electronic book resources are integrated, and the Apriori algorithm is used to generate the central replica catalog and the classification replica catalog. When electronic book resources are searched, retrieval is performed directly within the classification replica catalog and the central replica catalog, which greatly improves the speed and efficiency of electronic book resource searches.

Brief description of the drawings

Fig. 1 is a schematic diagram of the overall catalogue generation scheme of the preferred embodiment of the invention;

Fig. 2 is a flow chart of the Apriori algorithm of the preferred embodiment of the invention;

Fig. 3 is a schematic diagram of the Apriori algorithm generating the classification replica catalog in the preferred embodiment of the invention;

Fig. 4 shows the classification replica catalog result generated by the Apriori algorithm in the preferred embodiment of the invention;

Fig. 5 is a schematic diagram of the Apriori algorithm generating the central replica catalog in the preferred embodiment of the invention;

Fig. 6 shows the central replica catalog result generated by the Apriori algorithm in the preferred embodiment of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the accompanying drawings:

The invention consists of two parts. In the first part, the Apriori algorithm generates the classification replica catalog, that is, the classification replica catalog is generated by mining the association rules among the data. In the second part, the Apriori algorithm generates the central replica catalog: on the basis of the classification replica catalog, the Apriori algorithm is applied again to mine association rules and obtain the final central replica catalog. The overall design is shown in Fig. 1.

Detailed description of the scheme

1) Generating the classification replica catalog with the Apriori algorithm

The Apriori algorithm is one of the most influential algorithms for mining the frequent itemsets of Boolean association rules. Its core is a recursive algorithm based on the two-stage frequent itemset idea. In terms of classification, the association rules it mines are single-dimensional, single-level Boolean association rules. Here, every itemset whose support is not less than the minimum support is called a frequent itemset, or frequent set for short.

The Apriori algorithm produces frequent itemsets through an iterative level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found and denoted L1. L1 is used to produce the set of frequent 2-itemsets L2, L2 is used to produce the set of frequent 3-itemsets L3, and so on until no new frequent k-itemsets can be found. Finding each Lk requires one scan of the database. The algorithm flow chart is shown in Fig. 2.
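The level-wise loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names `support_count` and `apriori` and the three toy transactions are hypothetical and chosen only for illustration, and "frequent" is taken to mean a support count of at least `mincount`.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, mincount):
    """Return {k: set of frequent k-itemsets}, built one level at a time."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # Level 1: one pass over the database to find frequent single items.
    level = {frozenset([i]) for i in items
             if support_count(frozenset([i]), transactions) >= mincount}
    result, k = {}, 1
    while level:  # stop as soon as a level comes back empty
        result[k] = level
        # Candidates: unions of two frequent k-itemsets with k + 1 items.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == k + 1}
        # Keep candidates whose k-subsets are all frequent and whose own
        # support count, obtained by scanning the database, is high enough.
        level = {c for c in candidates
                 if all(frozenset(s) in level for s in combinations(c, k))
                 and support_count(c, transactions) >= mincount}
        k += 1
    return result


# Hypothetical toy data, not taken from the patent.
toy_db = [{"x", "y"}, {"x", "y", "z"}, {"y", "z"}]
levels = apriori(toy_db, mincount=2)
for k, itemsets in levels.items():
    print(k, sorted("".join(sorted(s)) for s in itemsets))
```

On this toy data the loop stops after level 2, because the only 3-item candidate contains the infrequent pair {x, z}.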

Suppose that the electronic book resource transaction database DB contains 4 transactions, i.e. |DB| = 4, and that the minimum support count is mincount = 2, i.e. the minimum support is minsup = 2/4 = 50%. The detailed process of mining frequent itemsets is as follows:
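The relation between the support count and the support ratio used here is simple arithmetic; the short Python sketch below makes it explicit. The helper name `mincount_from_minsup` is hypothetical, and rounding up when the product is not a whole number is an assumption, since the text only covers the case 2/4.

```python
import math
from fractions import Fraction

db_size = 4    # |DB| from the text
mincount = 2   # minimum support count from the text

# minsup is the minimum support count divided by the database size.
minsup = Fraction(mincount, db_size)
print(f"minsup = {float(minsup):.0%}")  # prints "minsup = 50%"


def mincount_from_minsup(minsup, db_size):
    """Smallest whole count that reaches the given support ratio.

    Assumption: round up when minsup * db_size is not an integer.
    """
    return math.ceil(minsup * db_size)


print(mincount_from_minsup(minsup, db_size))  # prints 2
```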

(1) Data filtering

First, the transaction database DB is scanned and the items are counted to obtain C1. The itemsets whose count is not less than the minimum support count are then retained (the support count of itemset {D} is 1, which is less than the minimum support count of 2, so {D} is deleted), producing L1 = {{A}, {B}, {C}, {F}}. Because the association rules produced by the first database scan may contain redundant data, a second scan of the database is then carried out.
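The first scan can be sketched in Python as follows. The four transactions are an assumption: the actual transactions appear only in Fig. 3, which is not reproduced in the text, so this hypothetical database was chosen to agree with the counts the text states ({D} occurs once, and A, B, C and F each occur at least twice).

```python
from collections import Counter

# Hypothetical transaction database (assumption, see note above).
DB = [
    {"A", "C", "D"},
    {"B", "C", "F"},
    {"A", "B", "C", "F"},
    {"B", "F"},
]
MINCOUNT = 2

# C1: support count of every single item, from one pass over DB.
C1 = Counter(item for transaction in DB for item in transaction)

# L1: items whose count reaches the minimum support count.
L1 = {item for item, count in C1.items() if count >= MINCOUNT}

print(sorted(C1.items()))  # [('A', 2), ('B', 3), ('C', 3), ('D', 1), ('F', 3)]
print(sorted(L1))          # ['A', 'B', 'C', 'F']
```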

(2) Generation of the classification replica catalog

C2 is now generated by Apriori_gen(L1), which produces the corresponding candidates. The database DB is scanned and each itemset in C2 is counted. The support counts of {A, B} and {A, F} are 1, which is less than the minimum support count of 2, so these two itemsets are deleted. The remaining candidates in C2 form L2, and the frequent itemsets in L2, namely {A, C}, {B, C}, {B, F} and {C, F}, serve as the classification replica catalog of the item database. The generation process is shown in Fig. 3, and the classification replica catalog of frequent data items extracted from the electronic book resource data is shown in Fig. 4.
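The second scan can be sketched in the same way. As before, the four transactions are a hypothetical database chosen to be consistent with the results stated in the text, and pairing every two items of L1 stands in for Apriori_gen(L1), which for 1-itemsets yields exactly these pairs.

```python
from itertools import combinations

# Hypothetical transaction database (assumption; Fig. 3 is not in the text).
DB = [
    {"A", "C", "D"},
    {"B", "C", "F"},
    {"A", "B", "C", "F"},
    {"B", "F"},
]
MINCOUNT = 2
L1 = ["A", "B", "C", "F"]  # frequent 1-itemsets from the first scan

# C2: every pair of frequent items, with its support count from a DB scan.
C2 = {pair: sum(1 for t in DB if set(pair) <= t)
      for pair in combinations(L1, 2)}

# L2: pairs that reach the minimum support count.
L2 = [pair for pair, count in C2.items() if count >= MINCOUNT]
deleted = [pair for pair, count in C2.items() if count < MINCOUNT]

print(L2)       # [('A', 'C'), ('B', 'C'), ('B', 'F'), ('C', 'F')]
print(deleted)  # [('A', 'B'), ('A', 'F')]
```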

2) Generating the central replica catalog with the Apriori algorithm

Based on the generated classification replica catalog L2, C3 is generated by Apriori_gen(L2). The transaction database DB is scanned, each itemset in C3 is counted, and the itemsets in C3 whose count is not less than the minimum support count are retained. The support counts of the three itemsets {A, B, C}, {A, B, F} and {A, C, F} are 1, which is less than the minimum support count of 2, so these three are deleted. This finally yields L3, in which {B, C, F} is the frequent itemset finally obtained and serves as the central replica catalog. The generation process is shown in Fig. 5, and the central replica catalog generated from the electronic book resource data is shown in Fig. 6 (non-matching fields such as series editor and volume number have been filtered out).
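The third scan, as narrated above, can be sketched as follows. The transactions are again a hypothetical database consistent with the stated counts, and enumerating all 3-item combinations of the items in L2 is an assumption made to reproduce the four candidates the text counts; a textbook Apriori_gen with subset pruning would pass only {B, C, F} to the scan.

```python
from itertools import combinations

# Hypothetical transaction database (assumption; Fig. 5 is not in the text).
DB = [
    {"A", "C", "D"},
    {"B", "C", "F"},
    {"A", "B", "C", "F"},
    {"B", "F"},
]
MINCOUNT = 2
L2 = [("A", "C"), ("B", "C"), ("B", "F"), ("C", "F")]

# Items that still occur in some frequent 2-itemset.
items = sorted({item for pair in L2 for item in pair})

# C3: every 3-item combination of those items, counted against DB.
C3 = {triple: sum(1 for t in DB if set(triple) <= t)
      for triple in combinations(items, 3)}

# L3: triples that reach the minimum support count.
L3 = [triple for triple, count in C3.items() if count >= MINCOUNT]

print(C3)
print(L3)  # [('B', 'C', 'F')]
```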

Experiments with the invention have produced the classification replica catalog and the central replica catalog. When they are used on the electronic book resource combined retrieval platform, the retrieval results are satisfactory and consistent with the design expectations.

The above embodiments should be understood as merely illustrating the invention and not as limiting its scope of protection. After reading the content of the invention, a person skilled in the art may make various changes or modifications to it, and such equivalent changes and modifications likewise fall within the scope of the claims of the invention.

Claims

1. An Apriori algorithm-based electronic book resource combined cataloguing method, characterized in that it comprises the following steps: