CN107741993A - A kind of method of University Digital Library data mining - Google Patents
A kind of method of University Digital Library data mining
- Publication number
- CN107741993A (application CN201711077156.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- information
- books
- reader
- library
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Fuzzy Systems (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- General Health & Medical Sciences (AREA)
- Economics (AREA)
- Primary Health Care (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An embodiment of the present invention provides a method for mining university digital library data. It addresses the following problem: universities have many students and numerous, complex school systems; each independent subsystem has its own database, and the information stays locked inside that subsystem. The subsystem databases cannot interact and information cannot be shared, so each subsystem can only offer simple query, add, modify, delete and statistics functions. The subsystems become outright information islands and application islands, so the school's resources are seriously wasted and cannot be used effectively and reasonably. The object of the invention is to provide a method for mining university digital library data that avoids these problems: it uses the large volume of borrowing-related data generated in the digital library, mines from it the useful information of interest, and provides a decision-making basis for teaching management and for optimizing resource allocation.
Description
Technical field
The present invention relates to the field of library data retrieval, and more particularly to a method for mining university digital library data.
Background art
With the rapid development of computer science, and in particular the rapid progress of database and network technology, the ways in which people obtain and disseminate information have become broader, faster and more diverse. The widespread use of bar code technology and credit cards has accelerated informatization in related fields such as commerce, insurance and finance, and the whole world seems to have entered a brand-new information age overnight. As information technology develops, the amount of information that must be stored and disseminated keeps growing and its forms and types keep multiplying, and the mechanisms of the traditional library clearly cannot meet these needs. The idea of the digital library was therefore proposed. A digital library is a digitized information repository that can store large amounts of information in many forms. Users can conveniently access it over a network to obtain this information, and neither the storage of the information nor user access is restricted by geography.
Data mining technology is applied fairly widely in for-profit business, but its application in the non-profit education sector is very scarce. Education informatization is an indispensable part of China's informatization and the only path to modern education. The China Education and Research Network, the national broadband satellite distance-education network, the "Digital Campus" construction projects of colleges and universities, and the "Campus Interconnectivity" information project for ordinary primary schools, all of which have been built and put into use, are important parts of China's education informatization. Colleges and universities are the top priority of the education sector, so the importance of building the Digital Campus is evident. The development of the Digital Campus inevitably brings the predicament of "massive information, scarce knowledge", and how to make good use of this information has become a practical problem that must be faced. Data mining technology can conveniently, quickly and efficiently extract implicit, useful information from the vast Digital Campus information and provide it to decision makers as a basis for decisions. This has not only theoretical significance but also important practical value for the informatization of the education sector.
Therefore, a method for mining university digital library data is needed that can use the large volume of borrowing-related data generated in a digital library to obtain information of interest and provide a decision-making basis for teaching management and for optimizing resource allocation.
Summary of the invention
The object of the invention is to provide a method for mining university digital library data that uses the large volume of borrowing-related data generated in the digital library, mines from it the useful information of interest, and provides a decision-making basis for teaching management and for optimizing resource allocation.
A method for mining university digital library data, the method comprising:
Step S101: Obtain the information-base data of the library's borrowing history;
Step S102: Preprocess the data;
Step S103: Mine the data;
Step S104: Store the results in an association database;
Step S105: Output book recommendation information.
Specifically, in step S101 (obtain the information-base data of the library's borrowing history), the information base includes a reader information base and a book information base.
Specifically, the reader information base includes: the reader's category, age, major, and hobbies and interests.
Specifically, the book information base includes: the book's call number, bar code, title, publisher, publication date and author.
Specifically, step S102 (preprocess the data) includes integrating data of different formats, different sources, different geographical locations and different characteristics, physically or logically, into one data set of a unified standard.
Specifically, step S102 (preprocess the data) includes cleaning erroneous and missing data.
Specifically, step S102 (preprocess the data) also includes processing all data to a uniform specification: finding the common features of the data, then finding a suitable description method and performing a reduction conversion on the data.
Specifically, step S103 (mine the data) includes mining with the grouped Apriori algorithm.
As can be seen from the above technical solution, the embodiment of the present invention provides a method for mining university digital library data. It addresses the following problem: universities have many students and numerous, complex school systems; each independent subsystem has its own database, and the information stays locked inside that subsystem. The subsystem databases cannot interact and information cannot be shared, so each subsystem can only offer simple query, add, modify, delete and statistics functions. The subsystems become outright information islands and application islands, so the school's resources are seriously wasted and cannot be used effectively and reasonably. The object of the invention is to provide a method for mining university digital library data that avoids these problems: it uses the large volume of borrowing-related data generated in the digital library, mines from it the useful information of interest, and provides a decision-making basis for teaching management and for optimizing resource allocation.
Brief description of the drawings
Some specific embodiments of the invention are described in detail below, by way of example and not limitation, with reference to the accompanying drawings. Identical reference numerals in the drawings denote the same or similar components or parts. Those skilled in the art should understand that the drawings are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flow chart of a method for mining university digital library data according to an embodiment of the invention.
Detailed description of the embodiments
The traditional process by which a reader borrows books and periodicals is roughly as follows. In one case the reader logs in to the book lending system without a specific borrowing target (usually the reader has a general direction in mind but has not decided on specific titles). By browsing the library home page, the reader may look up the ranking of most-borrowed titles for a recent period, or browse the library's newly shelved books, to decide which titles to borrow. The reader then logs in to the library's book retrieval system, fills in the borrowing form, and completes the loan. In the other case the reader already has a clear target, logs in to the library's book retrieval system directly, fills in the borrowing form, and completes the loan. A careful analysis of the borrowing process readily shows that readers are often uncertain at most of its stages. Giving the reader a timely, suitable recommendation at such a point not only helps to pin down the reader's needs quickly and reduces the time wasted on aimless queries, but also makes the whole process feel as if a dedicated assistant were accompanying the reader, which gives the reader a better interactive experience.
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention, without creative effort, fall within the scope of protection of the invention.
Referring to Fig. 1, a flow chart of a method for mining university digital library data according to an embodiment of the present application:
Step S101: Obtain the information-base data of the library's borrowing history;
Step S102: Preprocess the data;
Data of various formats, sources, geographical locations and characteristics are integrated, physically or logically, into one data set of a unified standard.
Step S103: Mine the data;
It should be noted that data mining has the following characteristics. First, the data source must be massive, real, original application data. Such data are very large in volume and contain many incomplete or fuzzy items, and even erroneous items (noise), so the data must be preprocessed during data mining. Second, the purpose of data mining is to discover knowledge of potential practical value. Because the aim is to make hidden knowledge easy to find, the mined knowledge should be easy to understand and apply. Third, data mining is targeted. Although data mining methods are diverse, data mining usually analyzes a particular problem, so the mined knowledge is specific to that problem.
Step S104: Store the results in an association database;
Step S105: Output book recommendation information.
The books that the reader is interested in are obtained.
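Steps S104 and S105 can be illustrated with a minimal Python sketch. The source does not describe the association database or the recommendation query, so everything here is an assumption made for illustration: the use of SQLite, the `rules` table and its columns, the restriction to single-book antecedents, and the sample titles and confidence values.

```python
import sqlite3

def store_rules(conn, rules):
    """S104 (sketch): persist mined rules. The table layout is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rules ("
        "antecedent TEXT, consequent TEXT, confidence REAL)"
    )
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", rules)
    conn.commit()

def recommend(conn, borrowed, limit=3):
    """S105 (sketch): suggest books implied by what the reader already borrowed."""
    if not borrowed:
        return []
    marks = ",".join("?" for _ in borrowed)
    rows = conn.execute(
        "SELECT consequent, MAX(confidence) AS c FROM rules "
        f"WHERE antecedent IN ({marks}) AND consequent NOT IN ({marks}) "
        "GROUP BY consequent ORDER BY c DESC, consequent LIMIT ?",
        [*borrowed, *borrowed, limit],
    ).fetchall()
    return [title for title, _ in rows]

# Made-up rules of the form (antecedent, consequent, confidence).
conn = sqlite3.connect(":memory:")
store_rules(conn, [
    ("Database Systems", "Data Mining", 0.8),
    ("Database Systems", "SQL Basics", 0.6),
    ("Data Mining", "Machine Learning", 0.7),
])
recommendations = recommend(conn, ["Database Systems"])
```

Books the reader has already borrowed are excluded from the output, and the remaining candidates are ordered by the highest confidence of any rule that produces them.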
Further, in step S101 (obtain the information-base data of the library's borrowing history), the information base includes a reader information base and a book information base.
Further, the reader information base includes: the reader's category, age, major, and hobbies and interests.
Every reader who appears in the borrowing system, or who browses the library through the catalog search system, is a potential service object to whom personalized service can be provided. These potential service objects belong to different categories (undergraduates, master's students, doctoral students, teachers, researchers, administrative staff, general faculty and staff, etc.) and therefore have different category attributes.
Further, the book information base includes: the book's call number, bar code, title, publisher, publication date and author.
Further, step S102 (preprocess the data) includes integrating data of different formats, different sources, different geographical locations and different characteristics, physically or logically, into one data set of a unified standard.
Further, step S102 (preprocess the data) includes cleaning erroneous and missing data.
Raw data obtained from the data source inevitably contain errors and omissions of one kind or another. For example, in some tables the library card attribute should be a 14-digit integer, yet some records show 0. Such records make up a very small proportion of the total data, so removing them does not affect the overall data mining result, and they are simply deleted.
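The cleaning rule in this example can be sketched in Python as follows. The record layout (dictionaries with a `card` field) and the sample values are hypothetical. Only the rule itself, that a library card number must be a 14-digit integer and that offending records are deleted, comes from the text above.

```python
def is_valid_card(card):
    """Assumed validity check: the card number consists of exactly 14 digits."""
    text = str(card)
    return len(text) == 14 and text.isascii() and text.isdigit()

def clean(records):
    """Drop records with an invalid card number; report how many were dropped."""
    kept = [r for r in records if is_valid_card(r.get("card"))]
    return kept, len(records) - len(kept)

# Hypothetical borrowing records; the second and third violate the rule.
records = [
    {"card": 20170000000123, "call_number": "TP311.13"},
    {"card": 0, "call_number": "G250.76"},
    {"card": "2017000000045", "call_number": "TP274"},  # only 13 digits
]
kept, dropped = clean(records)
```

Returning the number of dropped records makes it possible to check that the deleted share really is small before mining continues.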
Further, step S102 (preprocess the data) also includes processing all data to a uniform specification: finding the common features of the data, then finding a suitable description method and performing a reduction conversion on the data.
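The source does not say which description method is used. One plausible choice for association mining, shown here purely as an assumption, is to reduce the borrowing records to one set of call numbers per reader. The row layout `(reader_id, call_number)` and the normalization of call numbers (trimmed, upper-cased) are hypothetical.

```python
from collections import defaultdict

def to_transactions(borrow_rows):
    """Reduce (reader_id, call_number) rows to one item set per reader."""
    baskets = defaultdict(set)
    for reader_id, call_number in borrow_rows:
        # Normalize so that the same book is not counted under two spellings.
        baskets[reader_id].add(call_number.strip().upper())
    return {reader: frozenset(items) for reader, items in baskets.items()}

# Hypothetical borrowing rows; r1 borrowed TP311.13 twice.
rows = [
    ("r1", " tp311.13 "),
    ("r1", "TP311.13"),
    ("r1", "G250.76"),
    ("r2", "G250.76"),
]
transactions = to_transactions(rows)
```

Each resulting set plays the role of one transaction in the mining step, so repeat loans of the same book collapse into a single item.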
Further, step S103 (mine the data) includes mining with the grouped Apriori algorithm.
The Apriori algorithm mines association rules in two main steps. First, it iterates repeatedly to compute all frequent itemsets. Each frequent itemset must satisfy the condition that its support is greater than or equal to the minimum support threshold set in advance by the user. Second, from these frequent itemsets it generates the rules whose confidence is greater than or equal to the minimum confidence set by the user. Searching for frequent itemsets is the core of the Apriori algorithm and accounts for the vast majority of its computation. The Apriori algorithm has two shortcomings. First, it scans the database frequently: every iteration scans the database, which causes considerable I/O overhead. Second, it generates a large number of potential candidate itemsets.
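The two steps of the classic (ungrouped) Apriori algorithm can be sketched in Python as follows. This is a textbook-style illustration rather than the patent's implementation. The function names, the toy transactions and the thresholds are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Step 1: return {itemset: support} for every frequent itemset."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # One full database scan per level to count every candidate.
        level = {}
        for c in current:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

def rules(frequent, min_confidence):
    """Step 2: return (lhs, rhs, confidence) for rules meeting the threshold."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    out.append((lhs, itemset - lhs, confidence))
    return out

# Toy transactions; A, B and C stand for books.
T = [frozenset("AB"), frozenset("ABC"), frozenset("AC"), frozenset("B")]
frequent = apriori(T, min_support=0.5)
strong = rules(frequent, min_confidence=0.9)
```

In this toy data the candidate {A, B, C} is discarded in the prune step because {B, C} is not frequent, and the only rule that reaches 0.9 confidence is C → A. The full scan inside the loop is the source of the I/O overhead named above.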
Improvement strategies for the Apriori algorithm take two directions: one is to control the number of database scans, and the other is to control the size of the potential candidate sets. We improve the algorithm in both respects to raise its execution efficiency. For the first shortcoming, the database is grouped so that fewer records are scanned on each pass, which reduces I/O overhead. For the second shortcoming, pruning is performed before joining. The classic Apriori algorithm joins first and then prunes, and the grouped Apriori algorithm does the reverse. This reduces the number of itemsets that enter the join, because the unmatched itemsets have already been deleted, so the number of join operations falls and the algorithm becomes more efficient.
Grouping the database: on the first scan of the database, the number of occurrences of each item is counted to produce the candidate 1-itemset collection C1. The transaction database D is then grouped by the number of items in each transaction. That is, the set of transactions containing i items is denoted Di, so D is divided into N groups D1, D2, ..., DN, where N is the maximum number of items in any transaction. When the frequent 1-itemsets L1 are used to generate the candidate 2-itemsets C2, counting each candidate in C2 does not require scanning the whole database D, only D2 to DN. By analogy, the number of records scanned falls with each pass.
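The grouping idea can be sketched in Python as follows: transactions are bucketed by their item count, and candidates of size k are counted against groups Dk to DN only. The function names and the toy data are assumptions. The `scanned` counter is added only to make the saving visible.

```python
from collections import defaultdict

def group_by_size(transactions):
    """Bucket transactions by item count: groups[i] corresponds to Di."""
    groups = defaultdict(list)
    for t in transactions:
        groups[len(t)].append(t)
    return groups

def count_candidates(candidates, groups, k):
    """Count k-item candidates while scanning only Dk..DN."""
    counts = dict.fromkeys(candidates, 0)
    scanned = 0
    for size, group in groups.items():
        if size < k:
            continue  # a transaction with fewer than k items cannot contain a k-itemset
        for t in group:
            scanned += 1
            for c in candidates:
                if c <= t:
                    counts[c] += 1
    return counts, scanned

# Toy data: five transactions, two of which hold a single item.
T = [frozenset("A"), frozenset("B"), frozenset("AB"),
     frozenset("ABC"), frozenset("AC")]
groups = group_by_size(T)
C2 = [frozenset("AB"), frozenset("AC")]
counts, scanned = count_candidates(C2, groups, k=2)
```

Here only three of the five transactions are read when the 2-item candidates are counted, and the counts are the same as a full scan would give.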
Pruning in the grouped Apriori algorithm: pruning is performed before joining. Joining directly first produces many non-frequent itemsets, and the grouped Apriori algorithm avoids producing many of them.
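The source does not state which criterion is used to prune before the join. The following Python sketch uses one known criterion as an assumption: an item that belongs to a frequent k-itemset must occur in at least k-1 of the frequent (k-1)-itemsets, so any (k-1)-itemset containing an item that occurs less often can be removed before joining. The function names and the toy itemsets are hypothetical.

```python
from collections import Counter

def prune_before_join(prev_level, k):
    """Drop (k-1)-itemsets containing an item that occurs in fewer than k-1 of them."""
    occurrences = Counter(item for s in prev_level for item in s)
    return [s for s in prev_level if all(occurrences[i] >= k - 1 for i in s)]

def join(level, k):
    """Merge pairs of (k-1)-itemsets whose union has exactly k items."""
    return {a | b for a in level for b in level if len(a | b) == k}

# Toy frequent 2-itemsets; item D occurs in only one of them.
L2 = [frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")]
pruned = prune_before_join(L2, k=3)
candidates_after_pruning = join(pruned, k=3)
candidates_without_pruning = join(L2, k=3)
```

With these toy itemsets, joining directly yields three candidates ({A, B, C}, {A, B, D} and {B, C, D}), while pruning first removes {B, D} and leaves {A, B, C} as the only candidate.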
As can be seen from the above technical solution, the embodiment of the present invention provides a method for mining university digital library data. It addresses the following problem: universities have many students and numerous, complex school systems; each independent subsystem has its own database, and the information stays locked inside that subsystem. The subsystem databases cannot interact and information cannot be shared, so each subsystem can only offer simple query, add, modify, delete and statistics functions. The subsystems become outright information islands and application islands, so the school's resources are seriously wasted and cannot be used effectively and reasonably. The object of the invention is to provide a method for mining university digital library data that avoids these problems: it uses the large volume of borrowing-related data generated in the digital library, mines from it the useful information of interest, and provides a decision-making basis for teaching management and for optimizing resource allocation.
Thus far, those skilled in the art will recognize that, although several exemplary embodiments of the invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can be determined or derived directly from this disclosure without departing from the spirit and scope of the invention. The scope of the invention should therefore be understood and deemed to cover all such other variations or modifications.
Claims (8)
1. A method for mining university digital library data, characterized in that the method comprises:
Step S101: Obtain the information-base data of the library's borrowing history;
Step S102: Preprocess the data;
Step S103: Mine the data;
Step S104: Store the results in an association database;
Step S105: Output book recommendation information.
2. The method according to claim 1, characterized in that, in step S101 (obtain the information-base data of the library's borrowing history), the information base includes a reader information base and a book information base.
3. The method according to claim 2, characterized in that the reader information base includes: the reader's category, age, major, and hobbies and interests.
4. The method according to claim 2, characterized in that the book information base includes: the book's call number, bar code, title, publisher, publication date and author.
5. The method according to claim 1, characterized in that step S102 (preprocess the data) includes integrating data of different formats, different sources, different geographical locations and different characteristics, physically or logically, into one data set of a unified standard.
6. The method according to claim 1, characterized in that step S102 (preprocess the data) includes cleaning erroneous and missing data.
7. The method according to claim 1, characterized in that step S102 (preprocess the data) also includes processing all data to a uniform specification: finding the common features of the data, then finding a suitable description method and performing a reduction conversion on the data.
8. The method according to claim 1, characterized in that step S103 (mine the data) includes mining with the grouped Apriori algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711077156.XA CN107741993A (en) | 2017-11-06 | 2017-11-06 | A kind of method of University Digital Library data mining |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711077156.XA CN107741993A (en) | 2017-11-06 | 2017-11-06 | A kind of method of University Digital Library data mining |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107741993A true CN107741993A (en) | 2018-02-27 |
Family
ID=61234056
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711077156.XA Pending CN107741993A (en) | 2017-11-06 | 2017-11-06 | A kind of method of University Digital Library data mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107741993A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344320A (en) * | 2018-08-03 | 2019-02-15 | 昆明理工大学 | A kind of book recommendation method based on Apriori |
CN115794801A (en) * | 2022-12-23 | 2023-03-14 | 东南大学 | Data analysis method for mining chain relation of automatic driving accident cause |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105183727A (en) * | 2014-05-29 | 2015-12-23 | 上海研深信息科技有限公司 | Method and system for recommending book |
CN105760547A (en) * | 2016-03-16 | 2016-07-13 | 中山大学 | Book recommendation method and system based on user clustering |
CN106202184A (en) * | 2016-06-27 | 2016-12-07 | 华中科技大学 | A kind of books personalized recommendation method towards libraries of the universities and system |
CN106649583A (en) * | 2016-11-17 | 2017-05-10 | 安徽华博胜讯信息科技股份有限公司 | Book borrowing data association rule analysis method based on SAS |
CN107153850A (en) * | 2016-03-04 | 2017-09-12 | 上海阿法迪智能标签系统技术有限公司 | The acquisition methods and system of books relevant information |
-
2017
- 2017-11-06 CN CN201711077156.XA patent/CN107741993A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105183727A (en) * | 2014-05-29 | 2015-12-23 | 上海研深信息科技有限公司 | Method and system for recommending book |
CN107153850A (en) * | 2016-03-04 | 2017-09-12 | 上海阿法迪智能标签系统技术有限公司 | The acquisition methods and system of books relevant information |
CN105760547A (en) * | 2016-03-16 | 2016-07-13 | 中山大学 | Book recommendation method and system based on user clustering |
CN106202184A (en) * | 2016-06-27 | 2016-12-07 | 华中科技大学 | A kind of books personalized recommendation method towards libraries of the universities and system |
CN106649583A (en) * | 2016-11-17 | 2017-05-10 | 安徽华博胜讯信息科技股份有限公司 | Book borrowing data association rule analysis method based on SAS |
Non-Patent Citations (1)
Title |
---|
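Si Guanzhong et al.: "Application research of grouped Apriori in a book borrowing system" (分组 Apriori 在图书借阅系统中的应用研究), Microprocessors (《微处理机》) * |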
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344320A (en) * | 2018-08-03 | 2019-02-15 | 昆明理工大学 | A kind of book recommendation method based on Apriori |
CN115794801A (en) * | 2022-12-23 | 2023-03-14 | 东南大学 | Data analysis method for mining chain relation of automatic driving accident cause |
CN115794801B (en) * | 2022-12-23 | 2023-08-15 | 东南大学 | Data analysis method for mining cause chain relation of automatic driving accidents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180227 |