CN103744945A - Method for rapidly and accurately searching for target book by web crawler technology - Google Patents

Method for rapidly and accurately searching for target book by web crawler technology

Info

Publication number
CN103744945A
Authority
CN
China
Prior art keywords
books
book
web crawlers
precisely
typing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310754637.5A
Other languages
Chinese (zh)
Inventor
朱龙腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI BOSHI INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Original Assignee
SHANGHAI BOSHI INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-12-31
Filing date
2013-12-31
Publication date
2014-04-23
Application filed by SHANGHAI BOSHI INFORMATION SCIENCE & TECHNOLOGY Co Ltd
Priority to CN201310754637.5A
Publication of CN103744945A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for rapidly and accurately searching for a target book using web crawler technology. The method comprises: entering electronic books and establishing an electronic library; classifying the entered books and placing them into different sub-webpages; inputting keywords for the book to be read; crawling the book webpages related to the target book using web crawler technology; analyzing the crawled webpages; outputting the books that remain after analysis and filtering; and selecting the target book to read. With this method, new electronic books can be entered into the electronic library promptly, which keeps the library's range of book categories rich; the target book can be obtained rapidly and accurately by web crawling; the books are not limited to text and may also include pictures, video and the like; and the method suits both electronic libraries and electronic book websites, and is therefore expected to promote the development of electronic books.

Description

Method for rapidly and accurately finding a target book using web crawler technology
Field of the invention
The present invention relates to a method for quickly and accurately selecting a target book while reading e-books, and belongs to the field of network technology.
Background
Electronic libraries emerged gradually with the appearance of electronic publications and the development of network communication technology. An electronic library offers large storage capacity, fast access, long retention time, low cost and easy exchange of material. An optical disc, a mass-storage medium, can hold several thousand times more information than a traditional book and far more than microfilm, and it can hold images, video, sound and so on. Using electronic technology, readers in such a library can quickly find the material they need in a vast sea of books. Such a library also preserves information for much longer, without problems such as mildew and insect damage. Over a network, a reader in an office or home a great distance away can use the same books, which is highly efficient. In so vast a sea of books, however, finding the target book quickly and accurately is not easy; this makes reading e-books more difficult and has hindered their development.
Summary of the invention
The present invention addresses the problem of quickly finding a target book in present-day electronic libraries, and provides a method for rapidly and accurately finding a target book using web crawler technology. The invention comprises the following steps:
Step 1: enter e-books and establish an e-book database;
Step 2: classify the entered books and place them into different sub-webpages;
Step 3: input keywords for the book to be read;
Step 4: use web crawler technology to crawl the book webpages related to the target book;
Step 5: analyze the crawled webpages;
Step 6: output the books that remain after analysis and filtering, and select the target book to read.
Effect of the invention: new e-books can be entered into the e-book database promptly, which keeps the range of book categories in the electronic library rich; web crawler technology obtains the target book more quickly and accurately; the books are not limited to text and may also include pictures, video and the like; the method suits not only electronic libraries but also e-book websites, and will to some extent promote the development of e-books.
Brief description of the drawings
Fig. 1 is a flowchart of the method for rapidly and accurately finding a target book using web crawler technology.
Embodiment
Embodiment: with reference to the flowchart in Fig. 1 of the method for rapidly and accurately finding a target book using web crawler technology, this embodiment consists of the following steps:
Step 1: enter e-books and establish an e-book database;
Step 2: classify the entered books and place them into different sub-webpages;
Step 3: input keywords for the book to be read;
Step 4: use web crawler technology to crawl the book webpages related to the target book;
Step 5: analyze the crawled webpages;
Step 6: output the books that remain after analysis and filtering, and select the target book to read.
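Steps 1, 2 and 4 can be sketched as follows. This is only a minimal illustration, not the patented implementation: the patent gives no data model, so the in-memory `pages` dictionary, the `/index` and `/category/...` addresses, and the function names `build_library` and `crawl` are all assumptions, and a simple breadth-first traversal stands in for the crawler.

```python
from collections import deque

def build_library(books):
    """Steps 1-2: file each (title, category) record under a per-category sub-page."""
    # Hypothetical layout: one index page linking to one sub-page per category.
    pages = {"/index": {"category": "", "links": [], "books": []}}
    for title, category in books:
        url = f"/category/{category}"  # assumed address scheme
        if url not in pages:
            pages[url] = {"category": category, "links": [], "books": []}
            pages["/index"]["links"].append(url)
        pages[url]["books"].append(title)
    return pages

def crawl(pages, start, keyword):
    """Step 4: breadth-first crawl from `start`, collecting books on relevant pages."""
    keyword = keyword.lower()
    seen, queue, hits = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        page = pages[url]
        if page["category"] and keyword in page["category"].lower():
            # The keyword names the subject category: take every book on the page.
            matched = list(page["books"])
        else:
            # Otherwise keep only the titles that contain the keyword.
            matched = [t for t in page["books"] if keyword in t.lower()]
        if matched:
            hits[url] = matched
        for link in page["links"]:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return hits

# Illustrative data only; these titles are made up for the example.
pages = build_library([
    ("Python Crawling Basics", "computing"),
    ("World History Atlas", "history"),
    ("Deep Learning Primer", "computing"),
])
history_hits = crawl(pages, "/index", "history")
```

Keeping one sub-page per category means a category keyword selects a whole page at once, while a title keyword still narrows the result to individual books.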
Each entered e-book must be named according to its category. The sub-webpages holding the entered books are assigned different domain names according to book category. To search for the desired e-book, the user inputs the book's title or the subject category to which it belongs. The web crawler searches for the target book by first quickly crawling the webpages for the target book and then crawling the books on those webpages. The analysis of the crawled webpages mainly identifies the books closest to the input book.
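The analysis in steps 5 and 6 can be sketched in the same spirit. The patent does not say how closeness to the input is measured, so the character-level similarity from Python's standard `difflib.SequenceMatcher` is an assumption here, as are the function names and the `min_score` and `limit` parameters.

```python
from difflib import SequenceMatcher

def closeness(query, title):
    """Similarity in [0, 1] between the input and a crawled title (assumed metric)."""
    return SequenceMatcher(None, query.lower(), title.lower()).ratio()

def rank_books(query, titles, min_score=0.3, limit=5):
    """Steps 5-6: filter out distant titles and output the closest ones first."""
    scored = [(closeness(query, t), t) for t in titles]
    kept = [(s, t) for s, t in scored if s >= min_score]
    # Highest score first; ties are broken alphabetically for a stable order.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in kept[:limit]]

# Illustrative titles only.
crawled = ["Web Crawler Handbook", "Cooking for Two", "Web Crawler"]
ranked = rank_books("web crawler", crawled)
```

The reader then picks the target book from the ranked list; raising `min_score` filters more aggressively, at the risk of dropping titles that differ from the input only in wording.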
To those skilled in the art, it is evident that the invention is not limited to the details of the exemplary embodiment above, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiment should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the invention. No reference numeral in the claims should be construed as limiting the claim concerned.

Claims (6)

1. A method for rapidly and accurately finding a target book using web crawler technology, characterized in that it is carried out by the following steps:
Step 1: enter e-books and establish an e-book database;
Step 2: classify the entered books and place them into different sub-webpages;
Step 3: input keywords for the book to be read;
Step 4: use web crawler technology to crawl the book webpages related to the target book;
Step 5: analyze the crawled webpages;
Step 6: output the books that remain after analysis and filtering, and select the target book to read.
2. The method for rapidly and accurately finding a target book using web crawler technology according to claim 1, characterized in that each e-book entered in step 1 must be named according to its category.
3. The method for rapidly and accurately finding a target book using web crawler technology according to claim 1, characterized in that the sub-webpages holding the books entered in step 2 are assigned different domain names according to book category.
4. The method for rapidly and accurately finding a target book using web crawler technology according to claim 1, characterized in that, to search for the desired e-book in step 3, the book's title or the subject category to which it belongs is input.
5. The method for rapidly and accurately finding a target book using web crawler technology according to claim 1, characterized in that in step 4 the web crawler searches for the target book by first quickly crawling the webpages for the target book and then crawling the books on those webpages.
6. The method for rapidly and accurately finding a target book using web crawler technology according to claim 1, characterized in that the analysis of the crawled webpages in step 5 mainly identifies the books closest to the input book.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310754637.5A CN103744945A (en) 2013-12-31 2013-12-31 Method for rapidly and accurately searching for target book by web crawler technology

Publications (1)

Publication Number Publication Date
CN103744945A true CN103744945A (en) 2014-04-23

Family ID: 50501963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310754637.5A Pending CN103744945A (en) 2013-12-31 2013-12-31 Method for rapidly and accurately searching for target book by web crawler technology

Country Status (1)

Country Link
CN (1) CN103744945A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591992A (en) * 2012-02-15 2012-07-18 苏州亚新丰信息技术有限公司 Webpage classification identifying system and method based on vertical search and focused crawler technology
US8504555B2 (en) * 2008-06-25 2013-08-06 Microsoft Corporation Search techniques for rich internet applications

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓丽君: "基于Deep Web的图书信息集成与查询系统", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Similar Documents

Publication Publication Date Title
JP6505221B2 (en) Method and apparatus for providing multimedia content
JP6381002B2 (en) Search recommendation method and apparatus
US10878020B2 (en) Automated extraction tools and their use in social content tagging systems
CN102270206A (en) Method and device for capturing valid web page contents
CN105574062A (en) File retrieval method and apparatus and terminal
CN102682082B (en) Network Flash searching system and network Flash searching method based on content structure characteristics
CN104268283A (en) Method for automatically analyzing Internet web page
Mezaris et al. Real-life events in multimedia: detection, representation, retrieval, and applications
CN103473275A (en) Automatic image labeling method and automatic image labeling system by means of multi-feature fusion
US10331800B2 (en) Search results modulator
WO2022271319A1 (en) Smart summarization, indexing, and post-processing for recorded document presentation
Truong et al. Video search based on semantic extraction and locally regional object proposal
CN103744944A (en) Method for re-filtering in webpage or data crawling by web crawler
CN104090878A (en) Multimedia checking method, terminal, server and system
CN103744945A (en) Method for rapidly and accurately searching for target book by web crawler technology
CN111401047A (en) Method and device for generating dispute focus of legal document and computer equipment
CN111723177B (en) Modeling method and device of information extraction model and electronic equipment
CN104765885A (en) Expansion method and device for UGC library
KR101862178B1 (en) Method for customized posting and server implementing the same
CN106649337A (en) Method and device for identifying webpage column
CN106599002B (en) Topic evolution analysis method and device
CN104516941A (en) Related document search apparatus and method, and program
Rettberg et al. Mining the Knowledge Base: Exploring Methodologies for Analysing the Field of Electronic Literature
KR101434773B1 (en) Method and apparatus for displaying photo-tag cloud
Kirton et al. Digitisation and dissemination: A reverse image lookup study to assess the reuse of images of paintings from the National Gallery's website

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140423
