CN107741993A

CN107741993A - A method for university digital library data mining

Info

Publication number: CN107741993A
Application number: CN201711077156.XA
Authority: CN
Inventors: 崔垒
Original assignee: Foshan Zhangyang Technology Co Ltd
Current assignee: Foshan Zhangyang Technology Co Ltd
Priority date: 2017-11-06
Filing date: 2017-11-06
Publication date: 2018-02-27

Abstract

An embodiment of the present invention provides a method for university digital library data mining. Universities have large student populations and numerous, complicated campus systems. Each separate subsystem has its own database, and the information is locked inside that subsystem: the subsystem databases cannot interact, information cannot be shared, and only simple query, insert, update, delete and statistics functions are available. The subsystems become information islands and application islands in the full sense, so that the school's resources are seriously wasted and cannot be used effectively and reasonably. An object of the present invention is to provide a method for university digital library data mining that avoids these problems by using the large volume of borrowing-related data generated in the digital library, mining from it the useful information of interest, and providing a basis for decisions on teaching management and the optimized allocation of resources.

Description

A method for university digital library data mining

Technical field

The present invention relates to the field of library data retrieval, and in particular to a method for university digital library data mining.

Background art

Computer science, and database and network technology in particular, has advanced rapidly. The ways in which people obtain and disseminate information have become broader, faster and more varied. The widespread use of bar codes and credit cards has accelerated informatization in fields such as commerce, insurance and finance, and the whole world seems to have entered an entirely new information age overnight. As information technology develops, the volume of information to be stored and disseminated keeps growing and its forms and types become richer, and the mechanisms of traditional libraries clearly cannot meet these needs. The idea of the digital library was therefore proposed. A digital library is a repository of digitized information that can store large amounts of information in many forms. Users can conveniently access it over a network to obtain this information, and neither the storage of the information nor user access is restricted by geography.

Data mining technology is widely applied in profit-making businesses, but its application in the non-profit education sector remains very limited. Educational informatization is an indispensable part of China's informatization effort and a necessary path toward modern education. The education and research network that China has built and put into service, the national broadband satellite distance-education network, the "Digital Campus" construction projects of universities, and the "Campus Interconnectivity" project for primary and secondary schools are all important components of China's educational informatization. Universities are the top priority of the education sector, which indicates how important building the digital campus is. The development of the digital campus also inevitably brings the predicament of "a mass of information but very little knowledge", and how to make good use of this information has become a practical problem that must be faced. Data mining can conveniently, quickly and efficiently extract implicit, useful information from the vast body of digital campus information and provide it to decision makers as a basis for decisions. It is therefore of theoretical significance and, more importantly, of practical value to informatization in the education sector.

A method for university digital library data mining is therefore needed that can use the large volume of borrowing-related data generated in a digital library to obtain information of interest and to provide a basis for decisions on teaching management and the optimized allocation of resources.

Summary of the invention

An object of the present invention is to provide a method for university digital library data mining that uses the large volume of borrowing-related data generated in the digital library, mines from it the useful information of interest, and provides a basis for decisions on teaching management and the optimized allocation of resources.

A method for university digital library data mining, the method comprising:

Step S101: Obtain information-base data from the library's borrowing records;

Step S102: Preprocess the data;

Step S103: Mine the data;

Step S104: Store the results in an association database;

Step S105: Output book recommendation information.

Specifically, in step S101 (obtaining information-base data from the library's borrowing records), the information bases include a reader information base and a book information base.

Specifically, the reader information base includes: the category of the reader, the age of the reader, the major of the reader, and the hobbies and interests of the reader.

Specifically, the book information base includes: the call number, bar code, title, publisher, publication date and author of each book.

Specifically, step S102 (preprocessing the data) includes physically or logically integrating data that differ in format, source, geographic location and characteristics, to form a data set with a unified standard.

Specifically, step S102 (preprocessing the data) includes cleaning erroneous and incomplete data.

Specifically, step S102 (preprocessing the data) further includes normalizing all the data in a uniform way: finding the features the data have in common, then finding a suitable way of describing them and applying a reduction transformation to the data.

Specifically, step S103 (mining the data) includes mining with a grouped Apriori algorithm.

As can be seen from the above technical solution, an embodiment of the present invention provides a method for university digital library data mining. Universities have large student populations and numerous, complicated campus systems. Each separate subsystem has its own database, and the information is locked inside that subsystem: the subsystem databases cannot interact, information cannot be shared, and only simple query, insert, update, delete and statistics functions are available. The subsystems become information islands and application islands in the full sense, so that the school's resources are seriously wasted and cannot be used effectively and reasonably. An object of the present invention is to provide a method for university digital library data mining that avoids these problems by using the large volume of borrowing-related data generated in the digital library, mining from it the useful information of interest, and providing a basis for decisions on teaching management and the optimized allocation of resources.

Brief description of the drawings

Some specific embodiments of the present invention are described in detail below, by way of example and not limitation, with reference to the accompanying drawings. The same reference numerals in the drawings denote the same or similar components or parts. Those skilled in the art will understand that the drawings are not necessarily drawn to scale. In the drawings:

Fig. 1 is a flowchart of a method for university digital library data mining according to an embodiment of the present invention.

Detailed description of the embodiments

The traditional process by which a reader borrows books and periodicals is roughly as follows. In the first case, a reader logs in to the borrowing system without a specific borrowing goal (in most such cases the reader has a general direction in mind but has not decided on particular titles). By browsing the library home page, the reader may look up the ranking of the most borrowed titles for a recent period, or browse the library's newly shelved books, and in this way decide which titles to borrow. The reader then logs in to the library's book retrieval system and fills in a borrowing form to complete the loan. In the other case, the reader already has definite books in mind, logs in to the library's book retrieval system directly, and fills in the borrowing form to complete the loan. A careful analysis of the borrowing process readily shows that readers are often uncertain at most of these stages. If suitable recommendations are given to the reader in time at these points, the reader's needs can be determined more quickly, the time wasted on aimless queries is reduced, and the whole process feels as though a dedicated assistant were accompanying the reader, which gives the reader a better interactive experience.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Referring to Fig. 1, which is a flowchart of a method for university digital library data mining according to an embodiment of the present application, the method comprises:

Step S101: Obtain information-base data from the library's borrowing records;

Step S102: Preprocess the data;

Data in various formats, from various sources and geographic locations and with various characteristics, are integrated physically or logically to form a data set with a unified standard.
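By way of illustration only, the integration described above might be sketched in Python as follows. The two subsystems, their field names and their date formats are hypothetical; this disclosure does not specify any schema. The sketch only shows the idea of mapping differently shaped records onto one unified structure.

```python
from datetime import datetime


def from_circulation_system(row):
    """Map a row exported by a hypothetical circulation subsystem."""
    return {
        "reader_id": row["card_no"].strip(),
        "call_number": row["call_no"].strip().upper(),
        "borrowed_on": datetime.strptime(row["date"], "%Y-%m-%d").date(),
    }


def from_branch_system(row):
    """Map a row from a hypothetical branch-library subsystem that uses
    different field names and a different date format."""
    return {
        "reader_id": row["ReaderID"].strip(),
        "call_number": row["BookCallNumber"].strip().upper(),
        "borrowed_on": datetime.strptime(row["BorrowDate"], "%d/%m/%Y").date(),
    }


def integrate(circulation_rows, branch_rows):
    """Combine both sources into one list with a single, uniform schema."""
    unified = [from_circulation_system(r) for r in circulation_rows]
    unified += [from_branch_system(r) for r in branch_rows]
    return unified
```

Each additional source would need only its own mapping function, while later steps work on the unified records alone.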

Step S103: Mine the data;

It should be noted that data mining has the following characteristics. First, the data source must be a large volume of real, original application data. Such data are very large in scale and contain many incomplete or ambiguous items, and even erroneous items (noise), so data preprocessing is needed during data mining. Second, the purpose of data mining is to discover potentially valuable knowledge. Data mining is meant to make hidden knowledge easy to find, and the knowledge that is mined should be easy to understand and to apply. Third, data mining is targeted. Although data mining methods are diverse, data mining is usually carried out to analyze and study a particular problem, and the knowledge that is mined is specific to that problem.

Step S104: Store the results in an association database;

Step S105: Output book recommendation information.

This yields the books that the reader is interested in.
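As a hedged illustration of the output step, the following Python sketch turns mined association rules into a short recommendation list. The rule format (antecedent, consequent, confidence), the function name `recommend` and the ranking by confidence are assumptions made for this example; the disclosure does not state how the recommendation list is derived from the stored results.

```python
def recommend(borrowed, rules, top_n=3):
    """Suggest books for a reader from association rules.

    borrowed: identifiers of books the reader has already borrowed.
    rules: iterable of (antecedent, consequent, confidence), where both
           sides are frozensets of book identifiers (assumed format).
    """
    borrowed = set(borrowed)
    scores = {}
    for antecedent, consequent, confidence in rules:
        # A rule applies when the reader has borrowed everything on its left side.
        if antecedent <= borrowed:
            for book in consequent - borrowed:
                scores[book] = max(scores.get(book, 0.0), confidence)
    # Highest confidence first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda b: (-scores[b], b))
    return ranked[:top_n]
```

Books the reader has already borrowed are excluded, so the list contains only new suggestions.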

Further, in step S101 (obtaining information-base data from the library's borrowing records), the information bases include a reader information base and a book information base.

Further, the reader information base includes: the category of the reader, the age of the reader, the major of the reader, and the hobbies and interests of the reader.

Every reader who appears in the borrowing system, or who queries or browses the library through the catalog search system, is a potential service object to whom personalized service can be provided. These potential service objects belong to different categories (undergraduates, master's students, doctoral students, teachers, researchers, administrative staff, general teaching and administrative staff, and so on), and they have different category attributes.

Further, the book information base includes: the call number, bar code, title, publisher, publication date and author of each book.
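For illustration only, the reader and book records listed above could be modelled as simple Python data classes. The class and field names are hypothetical; only the attributes themselves (category, age, major, interests; call number, bar code, title, publisher, publication date, author) come from the description.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class ReaderCategory(Enum):
    """Reader categories named in the description."""
    UNDERGRADUATE = "undergraduate"
    MASTERS_STUDENT = "master's student"
    DOCTORAL_STUDENT = "doctoral student"
    TEACHER = "teacher"
    RESEARCHER = "researcher"
    ADMINISTRATOR = "administrative staff"
    GENERAL_STAFF = "general teaching and administrative staff"


@dataclass(frozen=True)
class Reader:
    category: ReaderCategory
    age: int
    major: str
    interests: tuple = ()  # hobbies and interests, as free-text labels


@dataclass(frozen=True)
class Book:
    call_number: str
    barcode: str
    title: str
    publisher: str
    publication_date: date
    author: str
```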

Further, step S102 (preprocessing the data) includes physically or logically integrating data that differ in format, source, geographic location and characteristics, to form a data set with a unified standard.

Further, step S102 (preprocessing the data) includes cleaning erroneous and incomplete data.

The raw data obtained from the data sources inevitably contain errors and omissions of one kind or another. For example, in some tables the library card attribute should be a 14-digit integer, but in some records it appears as 0. Such records make up a very small proportion of the total data, so removing them does not affect the overall result of data mining, and they are therefore simply deleted.
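The cleaning rule in this example can be sketched in Python as follows. This is a minimal sketch that assumes records are dictionaries and that the card number is stored under a hypothetical `reader_id` key; only the 14-digit rule and the decision to delete invalid records come from the description.

```python
import re

# A valid library card number is a 14-digit integer, per the example above.
CARD_PATTERN = re.compile(r"[0-9]{14}")


def clean(records, card_field="reader_id"):
    """Drop records whose card number is not exactly 14 digits.

    Returns the kept records and the number of records dropped, so the
    share of deleted data can be checked against the total."""
    kept = [
        r for r in records
        if CARD_PATTERN.fullmatch(str(r.get(card_field, "")))
    ]
    return kept, len(records) - len(kept)
```

Returning the dropped count makes it possible to confirm that the deleted records really are a small fraction before relying on direct deletion.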

Further, step S102 (preprocessing the data) also includes normalizing all the data in a uniform way: finding the features the data have in common, then finding a suitable way of describing them and applying a reduction transformation to the data.

Further, step S103 (mining the data) includes mining with a grouped Apriori algorithm.

The process by which the Apriori algorithm mines association rules from data is divided into two main steps. In the first step, the algorithm iterates repeatedly to compute all frequent itemsets, which must satisfy one condition: their support is greater than or equal to the minimum support threshold set in advance by the user. In the second step, rules whose confidence is greater than or equal to the minimum confidence set by the user are generated from these frequent itemsets. Searching for frequent itemsets is the core of the Apriori algorithm and accounts for the great majority of the computation of the whole algorithm. The Apriori algorithm has two shortcomings. First, it scans the database frequently: the database is scanned in every iteration, which causes considerable I/O overhead. Second, it generates a large number of potential candidate itemsets.
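The two steps of the classic (ungrouped) Apriori algorithm can be sketched in Python as follows. This is an illustrative baseline rather than the grouped variant of this disclosure; the function names and the representation of support as a fraction of transactions are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support_of(candidates):
        # One full pass over all transactions per level: the I/O cost noted above.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    level = support_of({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: merge (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = support_of(candidates)
        result.update(level)
        k += 1
    return result


def association_rules(itemsets, min_confidence):
    """Step 2: return (antecedent, consequent, confidence) for every rule
    whose confidence is >= min_confidence."""
    rules = []
    for itemset, support in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

In this baseline every level rescans all transactions and the join runs before the prune, which are the two costs that the improvements described next are meant to reduce.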

There are two directions for improving the Apriori algorithm: one is to limit the number of database scans, and the other is to limit the number of potential candidate itemsets. The algorithm is improved here in both respects in order to raise its efficiency. For the first shortcoming, the database is grouped, which reduces the number of data records scanned and lowers the I/O overhead. For the second shortcoming, pruning is performed before joining. The Apriori algorithm joins first and then prunes. The grouped Apriori algorithm does the opposite, which reduces the base set before the join by deleting the itemsets that cannot match, so the number of joins is effectively reduced and the efficiency of the algorithm is improved.

Database grouping: during the first scan of the database, the number of occurrences of each item is counted to produce the candidate 1-itemsets C_1, and the transaction database D is then grouped according to the number of items in each transaction. That is, the set of transactions containing i items is denoted D_i, so that the transaction database D is divided into N groups D_1, D_2, ..., D_N (where N is the largest number of items contained in any transaction). When the candidate 2-itemsets C_2 are generated from the frequent 1-itemsets L_1 and each candidate in C_2 is counted, the whole database D need not be scanned; only D_2 to D_N are scanned. By analogy, the number of records scanned decreases with each pass.
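The grouping just described can be sketched in Python as follows. The function names are hypothetical, and the `scanned` counter is added only to make the saving visible; the underlying idea, that counting k-item candidates needs only the groups D_k to D_N, follows the description.

```python
from collections import defaultdict


def group_by_length(transactions):
    """Partition the transaction database D into groups D_i, where D_i
    holds the transactions that contain exactly i items."""
    groups = defaultdict(list)
    for t in transactions:
        t = frozenset(t)
        groups[len(t)].append(t)
    return dict(groups)


def count_candidates(groups, candidates, k):
    """Count k-item candidates by scanning only D_k .. D_N.

    A transaction with fewer than k items cannot contain a k-itemset,
    so the smaller groups are skipped entirely."""
    counts = {c: 0 for c in candidates}
    scanned = 0
    for size, group in groups.items():
        if size < k:
            continue
        for t in group:
            scanned += 1
            for c in candidates:
                if c <= t:
                    counts[c] += 1
    return counts, scanned
```

The saving grows with k, because every pass leaves out one more group of short transactions.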

Grouped Apriori algorithm: pruning is performed before joining. Joining first produces many infrequent subsets, and the grouped Apriori algorithm avoids producing them.
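The description does not spell out the pruning rule that is applied before the join. The Python sketch below shows one possible reading, stated here as an assumption: an item that belongs to a frequent k-itemset must appear in at least k-1 of the frequent (k-1)-itemsets, so any (k-1)-itemset containing a rarer item can be removed before joining. The function names are hypothetical.

```python
from collections import Counter


def prune_before_join(prev_frequent, k):
    """Remove (k-1)-itemsets that cannot be part of any frequent k-itemset.

    Assumed rule (not stated in the text): each item of a frequent
    k-itemset occurs in k-1 of that itemset's (k-1)-subsets, all of which
    are frequent, so items occurring in fewer than k-1 members of
    prev_frequent can be ruled out."""
    item_counts = Counter(i for s in prev_frequent for i in s)
    return {s for s in prev_frequent if all(item_counts[i] >= k - 1 for i in s)}


def join(prev_frequent, k):
    """Join step: union pairs of (k-1)-itemsets whose union has k items."""
    return {a | b for a in prev_frequent for b in prev_frequent if len(a | b) == k}
```

With the frequent 2-itemsets {A,B}, {A,C}, {B,C} and {C,D}, pruning first removes {C,D}, so the join yields only {A,B,C} instead of also generating {A,C,D} and {B,C,D}.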

Thus, those skilled in the art will recognize that, although a number of exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the present invention can be directly determined or derived from the present disclosure without departing from the spirit and scope of the present invention. The scope of the present invention should therefore be understood and deemed to cover all such other variations or modifications.

Claims

1. A method for university digital library data mining, characterized in that the method comprises:

Step S101: Obtain information-base data from the library's borrowing records;

Step S102: Preprocess the data;

Step S103: Mine the data;

Step S104: Store the results in an association database;

Step S105: Output book recommendation information.

2. The method according to claim 1, characterized in that, in step S101 (obtaining information-base data from the library's borrowing records), the information bases include a reader information base and a book information base.

3. The method according to claim 2, characterized in that the reader information base includes: the category of the reader, the age of the reader, the major of the reader, and the hobbies and interests of the reader.

4. The method according to claim 2, characterized in that the book information base includes: the call number, bar code, title, publisher, publication date and author of each book.

5. The method according to claim 1, characterized in that step S102 (preprocessing the data) includes physically or logically integrating data that differ in format, source, geographic location and characteristics, to form a data set with a unified standard.

6. The method according to claim 1, characterized in that step S102 (preprocessing the data) includes cleaning erroneous and incomplete data.

7. The method according to claim 1, characterized in that step S102 (preprocessing the data) further includes normalizing all the data in a uniform way: finding the features the data have in common, then finding a suitable way of describing them and applying a reduction transformation to the data.

8. The method according to claim 1, characterized in that step S103 (mining the data) includes mining with a grouped Apriori algorithm.