CN104750845A - Apriori algorithm-based electronic book resource combined cataloguing method - Google Patents


Info

Publication number
CN104750845A
Authority
CN
China
Prior art keywords
cataloguing
copy
books
apriori algorithm
sourcing
Prior art date
Legal status
Pending
Application number
CN201510166306.9A
Other languages
Chinese (zh)
Inventor
葛君伟
顾小龙
方义秋
贺茜
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
2015-04-09
Filing date
2015-04-09
Publication date
2015-07-01
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201510166306.9A
Publication of CN104750845A
Current legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an Apriori algorithm-based combined cataloguing method for electronic book resources. Using the interfaces provided by the electronic book resource databases of individual universities, the method integrates these database resources onto a unified electronic book resource retrieval platform. The method consists of two parts: first, a classification replica catalog is generated with the Apriori algorithm by mining the association rules among the data; second, a central replica catalog is generated by applying the Apriori algorithm again, on the basis of the classification replica catalog, to mine further association rules and obtain the final central replica catalog. Combined cataloguing of the data resources further improves the efficiency of data resource retrieval and querying.

Description

Apriori algorithm-based combined cataloguing method for electronic book resources
Technical field
The invention belongs to the field of electronic book resource management and specifically relates to an Apriori algorithm-based combined cataloguing method for electronic book resources.
Background
Because electronic book resources come from different platforms, several questions deserve attention: how to manage massive data resources better, how to mine valuable information from huge databases, and how to raise the level of library management so as to serve readers better. Data mining (DM) technology provides methods for extracting patterns from massive stored data and for discovering the rules by which data change and the relationships among data. Association rule mining is one of the important tasks of data mining: it seeks latent associations between data items and uncovers previously unknown dependencies within large volumes of data.
However, literature searches show that there is still relatively little research on data mining for electronic book resources. Library management of electronic book resources remains at the stage of basic data labelling and use, so data resource retrieval is slow and querying is inefficient. Using data mining technology to improve the efficiency of data resource retrieval and querying has therefore become a pressing problem.
In the present application, the interfaces provided by each university library's electronic book resource database are first used to integrate the various database resources of electronic book resources onto a unified electronic book resource retrieval platform. This overcomes the fragmentation of electronic book resources across separate application systems by bringing database resources that differ in origin, structure and usage together on a single platform. A data mining algorithm is then used to mine association rules in the data resources and to generate a classification replica catalog and a central replica catalog; combined cataloguing of the data resources in turn improves the efficiency of data resource retrieval and querying.
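The patent does not specify how records from the different library databases are mapped onto the unified platform. The following is a minimal sketch under that gap: the source names, field names and the FIELD_MAPS table are hypothetical, and each source record is simply renamed into one shared schema before it is stored on the platform.

```python
from typing import Dict, List

# Hypothetical mapping: source database -> {source field name: unified field name}.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"title": "title", "author": "creator", "isbn": "isbn"},
    "source_b": {"book_name": "title", "writer": "creator", "ISBN": "isbn"},
}


def unify(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename one source record's fields to the unified schema; unmapped fields are dropped."""
    mapping = FIELD_MAPS[source]
    unified = {mapping[field]: value for field, value in record.items() if field in mapping}
    unified["source"] = source  # keep track of which database the record came from
    return unified


def integrate(batches: Dict[str, List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Collect the records of every source database into one list on the unified platform."""
    return [unify(source, record) for source, records in batches.items() for record in records]


platform = integrate({
    "source_a": [{"title": "Data Mining", "author": "Han", "isbn": "111", "shelf": "X1"}],
    "source_b": [{"book_name": "Data Mining", "writer": "Han", "ISBN": "111"}],
})
```

Both sample records end up with the same field names (title, creator, isbn), and the unmapped "shelf" field is discarded; a real deployment would need mappings for every participating database.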
Summary of the invention
To address the slow retrieval and inefficient querying of library data resources in the prior art, the invention provides an Apriori algorithm-based combined cataloguing method for electronic book resources that speeds up data resource retrieval and improves query efficiency. The technical solution of the invention is as follows. An Apriori algorithm-based combined cataloguing method for electronic book resources comprises the following steps:
101. Using the interfaces provided by each university library's electronic book resource database, integrate the various database resources of electronic book resources onto a unified electronic book resource retrieval platform;
102. Apply the Apriori algorithm to the transaction database resources integrated in step 101 to generate the classification replica catalog, i.e. generate the classification replica catalog by mining the association rules among the data. The specific steps are:
A. Preset the minimum support count mincount; scan the transaction database DB and count each item to obtain C1; find the frequent 1-itemsets that satisfy the minimum support count, denoted L1;
B. Join the itemsets in L1 to produce the candidate set C2; scan the transaction database again and find the frequent 2-itemsets that satisfy the minimum support count, denoted L2; L2 gives the frequent itemsets of the classification replica catalog;
103. Based on the classification replica catalog L2 obtained in step 102, join the itemsets in L2 via Apriori_gen(L2) to produce the candidate set C3, and delete from C3 every candidate that has a 2-item subset not belonging to L2. Repeat in this way until no new frequent k-itemset can be found, i.e. until a scan of the transaction database yields an empty Lk, at which point the algorithm terminates. The frequent itemsets finally obtained serve as the central replica catalog. The classification replica catalog and the central replica catalog thus obtained are then used for combined cataloguing, and electronic book resources are retrieved through the combined catalog.
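The three steps above can be sketched in Python as follows. This is an illustrative reading, not the patent's implementation: the function names (count_support, apriori_gen, build_catalogs) are hypothetical, itemsets are modeled as frozensets, the classification replica catalog is taken to be L2 and the central replica catalog the last non-empty level Lk, and the four-transaction database is a made-up example chosen to reproduce the counts quoted in the embodiment.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def count_support(db: List[Set[str]], candidates: Set[Itemset]) -> Dict[Itemset, int]:
    """One scan of the transaction database: support count of each candidate."""
    counts = {c: 0 for c in candidates}
    for transaction in db:
        for c in candidates:
            if c <= transaction:
                counts[c] += 1
    return counts


def apriori_gen(prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any with an infrequent (k-1)-subset."""
    joined = {a | b for a in prev for b in prev if len(a | b) == k}
    return {c for c in joined if all(frozenset(s) in prev for s in combinations(c, k - 1))}


def build_catalogs(db: List[Set[str]], mincount: int) -> Tuple[Set[Itemset], Set[Itemset]]:
    """Return (classification catalog L2, central catalog = last non-empty Lk)."""
    c1 = {frozenset([item]) for t in db for item in t}
    level = {c for c, n in count_support(db, c1).items() if n >= mincount}  # L1
    levels, k = [], 1
    while level:  # stop as soon as some Lk is empty
        levels.append(level)
        k += 1
        candidates = apriori_gen(level, k)
        level = {c for c, n in count_support(db, candidates).items() if n >= mincount}
    classification = levels[1] if len(levels) > 1 else set()
    central = levels[-1] if levels else set()
    return classification, central


# Hypothetical four-transaction database (the patent does not list its transactions).
DB = [{"A", "C", "D"}, {"B", "C", "F"}, {"A", "B", "C", "F"}, {"B", "F"}]
classification, central = build_catalogs(DB, mincount=2)
```

With this database and mincount = 2 the sketch returns {A, C}, {B, C}, {B, F}, {C, F} as the classification catalog and {B, C, F} as the central catalog. The join here simply unions any two frequent (k-1)-itemsets; after the prune step this yields the same candidates as the classic prefix-based join.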
The advantages and beneficial effects of the invention are as follows:
The invention proposes an Apriori algorithm-based combined cataloguing method for electronic book resources. The electronic book resources are integrated, and the Apriori algorithm is used to generate the central replica catalog and the classification replica catalog. When electronic book resources are searched, the search is performed directly within the classification replica catalog and the central replica catalog, which improves the speed and efficiency of electronic book resource retrieval.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall catalog generation scheme of the preferred embodiment of the invention;
Fig. 2 is a flow chart of the Apriori algorithm of the preferred embodiment of the invention;
Fig. 3 is a schematic diagram of generating the classification replica catalog with the Apriori algorithm in the preferred embodiment of the invention;
Fig. 4 shows the classification replica catalog generated by the Apriori algorithm in the preferred embodiment of the invention;
Fig. 5 is a schematic diagram of generating the central replica catalog with the Apriori algorithm in the preferred embodiment of the invention;
Fig. 6 shows the central replica catalog generated by the Apriori algorithm in the preferred embodiment of the invention.
Embodiments
The invention is further described below with reference to the accompanying drawings.
The invention comprises two parts. In the first part, the Apriori algorithm generates the classification replica catalog by mining the association rules among the data. In the second part, the Apriori algorithm generates the central replica catalog: building on the classification replica catalog, the Apriori algorithm is applied again to mine association rules and obtain the final central replica catalog. The overall layout is shown in Fig. 1.
Detailed description of the scheme
1) Generating the classification replica catalog with the Apriori algorithm
The Apriori algorithm is one of the most influential algorithms for mining the frequent itemsets of Boolean association rules. At its core is a recursive, two-phase approach based on frequent itemsets. The association rules it mines are single-dimensional, single-level Boolean association rules. Here, every itemset whose support is not less than the minimum support is called a frequent itemset.
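As a minimal illustration of the definition above, relative support can be computed as the fraction of transactions that contain an itemset. The sketch assumes the usual Apriori convention that an itemset whose support equals minsup is already frequent; the function names and the sample database are hypothetical.

```python
from typing import List, Set


def support(db: List[Set[str]], itemset: Set[str]) -> float:
    """Relative support: the fraction of transactions containing every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def is_frequent(db: List[Set[str]], itemset: Set[str], minsup: float) -> bool:
    # Assumed convention: support equal to minsup already counts as frequent.
    return support(db, itemset) >= minsup


# Hypothetical sample database, not taken from the patent.
DB = [{"A", "C", "D"}, {"B", "C", "F"}, {"A", "B", "C", "F"}, {"B", "F"}]
```

Here {A, C} appears in 2 of 4 transactions, so its support is 0.5 and it is frequent at minsup = 50%, while {D} (support 0.25) is not.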
The Apriori algorithm produces frequent itemsets through an iterative, level-wise search in which the frequent k-itemsets are used to explore the (k+1)-itemsets. First, the frequent itemsets of length 1 are found and denoted L1; L1 is used to generate the set L2 of frequent 2-itemsets, L2 is used to generate the set L3 of frequent 3-itemsets, and so on, until no new frequent k-itemset can be found. Finding each Lk requires one scan of the database. The algorithm flow chart is shown in Fig. 2.
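The step from Lk to the candidates for level k+1 is what Apriori_gen performs. A minimal sketch of the classic join-and-prune candidate generation follows, assuming each itemset is stored as a sorted tuple of item labels; the function name mirrors the Apriori_gen notation in the text, but the code itself is illustrative.

```python
from itertools import combinations
from typing import List, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order


def apriori_gen(lk: List[Itemset]) -> List[Itemset]:
    """Candidate (k+1)-itemsets from the frequent k-itemsets lk."""
    frequent = set(lk)
    candidates = []
    for i, a in enumerate(lk):
        for b in lk[i + 1:]:
            # Join step: merge two itemsets that agree on all but their last item.
            if a[:-1] == b[:-1]:
                cand = a[:-1] + tuple(sorted((a[-1], b[-1])))
                # Prune step: every k-item subset of the candidate must be frequent.
                if all(sub in frequent for sub in combinations(cand, len(a))):
                    candidates.append(cand)
    return sorted(candidates)


L1 = [("A",), ("B",), ("C",), ("F",)]
L2 = [("A", "C"), ("B", "C"), ("B", "F"), ("C", "F")]
```

Applied to L1 = {A}, {B}, {C}, {F} it yields all six pairs; applied to the L2 of the embodiment it yields only (B, C, F), because {B, C} and {B, F} are the only frequent 2-itemsets sharing a first item.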
Assume that the electronic book resource transaction database DB contains 4 transactions, i.e. |DB| = 4, and that the minimum support count is mincount = 2, i.e. the minimum support is minsup = 2/4 = 50%. Frequent itemsets are mined as follows:
(1) Data filtering
First, the transaction database DB is scanned and every item is counted, giving C1. The itemsets whose count reaches the minimum support count are kept (the itemset {D} has a support count of 1, less than the minimum support count of 2, so {D} is deleted), producing L1 = {{A}, {B}, {C}, {F}}. Because the association rules produced by the first scan of the database would still contain redundant data, the database is scanned a second time.
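The patent does not list the four transactions of DB. The sketch below uses a hypothetical database, {A, C, D}, {B, C, F}, {A, B, C, F} and {B, F}, chosen only because it reproduces every count quoted in this embodiment, to show the first scan that yields C1 and L1.

```python
from collections import Counter

# Hypothetical database reproducing the counts quoted in the text; not given in the patent.
DB = [{"A", "C", "D"}, {"B", "C", "F"}, {"A", "B", "C", "F"}, {"B", "F"}]
MINCOUNT = 2  # minimum support count, i.e. minsup = 2/4 = 50%

# First scan: C1 maps each single item to its support count.
C1 = Counter(item for transaction in DB for item in transaction)
# Keep the items that reach the minimum support count: this is L1.
L1 = sorted(item for item, n in C1.items() if n >= MINCOUNT)
```

C1 gives D a count of 1, so D is dropped and L1 = [A, B, C, F], as in the text.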
(2) Generating the classification replica catalog
Next, C2 is generated by Apriori_gen(L1), which produces the corresponding candidates. The database DB is scanned and each itemset in C2 is counted ({A, B} and {A, F} each have a support count of 1, less than the minimum support count of 2, so these two itemsets are deleted), giving L2. The frequent itemsets now in L2, {A, C}, {B, C}, {B, F} and {C, F}, serve as the classification replica catalog of the item database. The generation process is shown in Fig. 3, and the classification replica catalog of frequent itemsets extracted from the electronic book resource data is shown in Fig. 4.
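Continuing with the same hypothetical four-transaction database, the second scan can be sketched as follows. For k = 2 the join step produces every pair of frequent items and the prune step removes nothing, since every 1-item subset of such a pair is already in L1.

```python
from itertools import combinations

# Same hypothetical database as in the previous step (not given in the patent).
DB = [{"A", "C", "D"}, {"B", "C", "F"}, {"A", "B", "C", "F"}, {"B", "F"}]
MINCOUNT = 2
L1 = ["A", "B", "C", "F"]

# Second scan: C2 holds every pair of frequent items with its support count.
C2 = {pair: sum(1 for t in DB if set(pair) <= t) for pair in combinations(L1, 2)}
# Pairs reaching the minimum support count form L2, the classification replica catalog.
L2 = sorted(pair for pair, n in C2.items() if n >= MINCOUNT)
```

{A, B} and {A, F} each reach only a count of 1 and are dropped, leaving L2 = (A, C), (B, C), (B, F), (C, F).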
2) Generating the central replica catalog with the Apriori algorithm
From the generated classification replica catalog L2, C3 is generated by Apriori_gen(L2). The transaction database DB is scanned, each itemset in C3 is counted, and the itemsets in C3 whose count reaches the minimum support count are kept (the 3-itemsets {A, B, C}, {A, B, F} and {A, C, F} each have a support count of 1, less than the minimum support count of 2, so these three are deleted). This yields L3, in which {B, C, F} is the frequent itemset finally obtained and serves as the central replica catalog. The generation process is shown in Fig. 5, and the central replica catalog generated from the electronic book resource data is shown in Fig. 6 (fields that do not take part in matching, such as series editor and volume number, have been filtered out).
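The third level can be sketched the same way, again on the hypothetical database. One difference from the text is worth noting: with the pruning rule of step 103, {A, B, C} and {A, C, F} are discarded before the database scan because {A, B} and {A, F} are not in L2, and {A, B, F} is never produced by joining two itemsets of L2, so only {B, C, F} needs to be counted. The resulting L3 = {{B, C, F}} is the same either way.

```python
from itertools import combinations

# Same hypothetical database; L2 is the classification replica catalog from step (2).
DB = [{"A", "C", "D"}, {"B", "C", "F"}, {"A", "B", "C", "F"}, {"B", "F"}]
MINCOUNT = 2
L2 = {frozenset(p) for p in [("A", "C"), ("B", "C"), ("B", "F"), ("C", "F")]}


def count(itemset) -> int:
    """Support count of an itemset in DB."""
    return sum(1 for t in DB if set(itemset) <= t)


# Join: unions of two frequent 2-itemsets that form a 3-itemset.
joined = {tuple(sorted(a | b)) for a in L2 for b in L2 if len(a | b) == 3}
# Prune: drop candidates with a 2-item subset outside L2 (step 103).
C3 = sorted(c for c in joined if all(frozenset(s) in L2 for s in combinations(c, 2)))
# Third scan: candidates reaching the minimum support count form L3, the central catalog.
L3 = [c for c in C3 if count(c) >= MINCOUNT]
```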
Experiments with the invention produced the classification replica catalog and the central replica catalog; applied to the unified electronic book resource retrieval platform, they gave satisfactory retrieval results, consistent with the design expectations.
The above embodiments should be understood as merely illustrating the invention and not as limiting its scope. After reading this description, a person skilled in the art may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope defined by the claims of the invention.

Claims (1)

1. An Apriori algorithm-based combined cataloguing method for electronic book resources, characterized by comprising the following steps:
101. Using the interfaces provided by each university library's electronic book resource database, integrating the various database resources of electronic book resources onto a unified electronic book resource retrieval platform;
102. Applying the Apriori algorithm to the transaction database resources integrated in step 101 to generate a classification replica catalog, i.e. generating the classification replica catalog by mining the association rules among the data, the specific steps being:
A. presetting the minimum support count mincount, scanning the transaction database DB and counting each item to obtain C1, and finding the frequent 1-itemsets that satisfy the minimum support count, denoted L1;
B. joining the itemsets in L1 to produce the candidate set C2, scanning the transaction database again, and finding the frequent 2-itemsets that satisfy the minimum support count, denoted L2, which form the frequent itemsets of the classification replica catalog;
103. Based on the classification replica catalog L2 obtained in step 102, joining the itemsets in L2 via Apriori_gen(L2) to produce the candidate set C3 and deleting from C3 every candidate having a 2-item subset that does not belong to L2; repeating in this way until no new frequent k-itemset can be found, i.e. until a scan of the transaction database yields an empty Lk, whereupon the algorithm terminates; taking the frequent itemsets finally obtained as the central replica catalog; and then performing combined cataloguing with the classification replica catalog and the central replica catalog so obtained, and retrieving electronic book resources through the combined catalog.
CN201510166306.9A 2015-04-09 2015-04-09 Apriori algorithm-based electronic book resource combined cataloguing method Pending CN104750845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510166306.9A CN104750845A (en) 2015-04-09 2015-04-09 Apriori algorithm-based electronic book resource combined cataloguing method

Publications (1)

Publication Number Publication Date
CN104750845A true CN104750845A (en) 2015-07-01

Family

ID=53590529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510166306.9A Pending CN104750845A (en) 2015-04-09 2015-04-09 Apriori algorithm-based electronic book resource combined cataloguing method

Country Status (1)

Country Link
CN (1) CN104750845A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101149751A (en) * 2007-10-29 2008-03-26 浙江大学 (Zhejiang University) Generalized association rule mining method for analyzing drug compatibility rules in traditional Chinese medicine prescriptions
US20150052101A1 (en) * 2013-08-16 2015-02-19 Hon Hai Precision Industry Co., Ltd. Electronic device and method for transmitting files

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
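张海燕 (Zhang Haiyan): "Research on the Application of Data Mining Technology to University Library Systems", China Master's Theses Full-text Database (Information Science and Technology) *
林郎碟 (Lin Langdie): "Application and Research of the Apriori Algorithm in Book Recommendation Services", Computer Technology and Development *
梁子乐 (Liang Zile) et al.: "Book Information Management System Based on the Apriori Algorithm", Microcomputer Information *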

Similar Documents

Publication Publication Date Title
CN103631909B (en) System and method for combined processing of large-scale structured and unstructured data
CN103617217B (en) Hierarchical index based image retrieval method and system
CN104133867A (en) DOT in-fragment secondary index method and DOT in-fragment secondary index system
CN103970853A (en) Method and device for optimizing search engine
CN111382226A (en) Database query retrieval method and device and electronic equipment
CN103020281A (en) Data storage and search method based on numerical indexing of spatial data
CN105550375A (en) Heterogeneous data integrating method and system
CN111506621A (en) Data statistical method and device
JP2019512125A (en) Database archiving method and apparatus, archived database search method and apparatus
CN104834650A (en) Method and system for generating effective query tasks
Kricke et al. Graph data transformations in Gradoop
CN105095436A (en) Automatic modeling method for data of data sources
CN103870489B (en) Self-extending Chinese personal name recognition method based on search logs
CN106294792A (en) Method for establishing an associated query system, and system so established
CN106462591A (en) Partition filtering using smart index in memory
CN101894161B (en) Recurring event access method and device for real-time monitoring
CN103984700A (en) Heterogeneous data analysis method for vertical search of scientific information
CN104714956A (en) Comparison method and device for isomerism record sets
CN110825792A (en) High-concurrency distributed data retrieval method based on golang middleware coroutine mode
CN104750845A (en) Apriori algorithm-based electronic book resource combined cataloguing method
CN107577690B (en) Recommendation method and recommendation device for mass information data
CN105512161A (en) Thangka image interesting area semantic annotation and retrieval system
Olawumi et al. Scientometric review and analysis: A case example of smart buildings and smart cities
CN106952198A (en) Student employment data analysis method based on Apriori algorithm
CN111107493B (en) Method and system for predicting position of mobile user

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150701
