CA2272983A1 - A text searching system for searching web pages according to a keyword and classification data provided by a user - Google Patents

A text searching system for searching web pages according to a keyword and classification data provided by a user

Info

Publication number
CA2272983A1
CA2272983A1 CA002272983A CA2272983A CA2272983A1 CA 2272983 A1 CA2272983 A1 CA 2272983A1 CA 002272983 A CA002272983 A CA 002272983A CA 2272983 A CA2272983 A CA 2272983A CA 2272983 A1 CA2272983 A1 CA 2272983A1
Authority
CA
Canada
Prior art keywords
data
text
searching
keyword
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002272983A
Other languages
French (fr)
Inventor
Kuo-Jen Chao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tornado Technology Co Ltd
Original Assignee
Tornado Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tornado Technology Co Ltd
Priority to CA002272983A
Priority to JP11173228A
Publication of CA2272983A1
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Abstract

The invention relates to a text searching system for searching web pages according to keyword and classification data provided by a user. The text searching system comprises a computer having a memory and a processor, a text data file having text data of web pages, a text index file having keyword searching data for searching keywords, a classification index file having classification data, and a searching program for searching text data which are matched with user provided keyword data and user provided classification data.

Description

Text searching system for searching web pages according to a keyword and classification data provided by a user
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to a text searching system, and more particularly, to a text searching system for searching web pages according to a keyword and classification data provided by a user.
2. Description of the Prior Art
As the number of web pages on the Internet increases, a searching system becomes necessary for searching the myriad of web pages for specific information.
Please refer to Fig.1. Fig.1 is a functional block diagram of a prior art searching system 10. The searching system 10 comprises a computer (not shown), a text data file 16, a text index file 20, and a searching program 24. The computer comprises a memory 12 for storing programs and data and a processor 14 for executing the programs stored in the memory 12. The text data file 16, text index file 20, and searching program 24 are stored in the memory 12. The text data file 16 has text data 18 of web pages of a plurality of world wide web sites. The text index file 20 has keyword searching data 22 for searching keywords contained in the text data 18 of each of the web pages of the text data file 16. The searching program 24 is used for searching the text index file 20 according to keyword data provided by a user so as to find text data 18 of all the web pages having the user provided keyword data in the text data file 16.

Please refer to Fig.2. Fig.2 is a perspective diagram of the keyword searching data 22 in Fig.1. The keyword searching data 22 of the text index file 20 is built according to the text data 18 of the text data file 16. Each keyword searching data 22 has a keyword 21 and address data 23 of the keyword 21 in all web pages. As shown in Fig.2, the address data of the keyword "world" in all web pages are a1, a2, a3, ...; the address data of the keyword "world wide web" in all web pages are c1, c2, c3, .... When the user inputs a keyword, the searching program 24 searches the text index file 20 according to the keyword provided by the user to find the keyword searching data 22 corresponding to the keyword and thereby obtain the address data of the keyword in all web pages. Finally, the text data file 16 is used for transmitting the text data 18 of all web pages having the keyword to the user.
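The prior art lookup described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the sample pages, the function names build_text_index and keyword_search, and the choice of single whitespace-separated words as keywords (the patent also shows multi-word keywords such as "world wide web") are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the text data file 16: page address -> text data 18.
TEXT_DATA_FILE = {
    "a1": "hello world",
    "a2": "the world wide web",
    "a3": "wide open spaces",
}


def build_text_index(text_data_file):
    """Build keyword searching data: keyword -> sorted addresses of pages containing it."""
    index = defaultdict(set)
    for address, text in text_data_file.items():
        for word in text.lower().split():
            index[word].add(address)
    return {keyword: sorted(addresses) for keyword, addresses in index.items()}


def keyword_search(text_index, text_data_file, keyword):
    """Look up the keyword in the index, then fetch the text data at each address."""
    addresses = text_index.get(keyword.lower(), [])
    return {address: text_data_file[address] for address in addresses}


text_index = build_text_index(TEXT_DATA_FILE)
result = keyword_search(text_index, TEXT_DATA_FILE, "world")
```

In this sketch every page containing the keyword is returned, whatever its topic, which is the behavior the following paragraphs criticize.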
Because the prior art searching system 10 uses only a keyword for searching web pages, the text data of all web pages containing the keyword are returned, and transmitting them takes an excessive amount of time. When the user is searching for web pages within a specific classification, the searching system 10 still transmits the text data of all the web pages containing the keyword, but most of the transmitted web pages do not match the classification the user has in mind. Therefore, more time must be spent searching and transmitting.
For example, if the user wants to search for web pages of movies containing references to "tornado", the searching system 10 will transmit the text data of all web pages containing the word "tornado" to the user. However, these transmitted web pages will include irrelevant pages concerning unrelated topics such as meteorology, history, and news. Therefore, more time must be spent manually selecting the pages that are actually pertinent.
SUMMARY OF THE INVENTION
It is therefore a primary objective of the present invention to provide a text searching system for searching web pages according to a keyword and classification data provided by a user to solve the mentioned problem.
Briefly, in a preferred embodiment, the present invention provides a text searching system comprising:
a computer having a memory for storing programs and data and a processor for executing the programs stored in the memory;
a text data file stored in the memory having text data of web pages of a plurality of world wide web sites;
a text index file stored in the memory having keyword searching data for searching keywords contained in the text data of each of the web pages of the text data file;
a classification index file stored in the memory having classification data corresponding to the classification of each of the web pages of the text data file; and
a searching program stored in the computer for searching the text index file and the classification index file according to keyword and classification data provided by a user so as to find text data which are matched with the user provided keyword data and contained in a plurality of target web pages whose classifications are matched with the user provided classification data in the text data file.
It is an advantage of the present invention that the text searching system according to the present invention uses a keyword and classification data provided by the user for finding all the web pages which belong to the classification data and have the keyword. The text searching system transmits to the user only the text data of the web pages belonging to the classification data and having the keyword. Therefore, the searching and transmission time is greatly reduced, thus making the text searching system more efficient.
These and other objects and the advantages of the present invention will no doubt become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig.1 is a functional block diagram of a prior art searching system.
Fig.2 is a perspective diagram of the keyword searching data in Fig.1.
Fig.3 is a functional block diagram of a text searching system according to the present invention.
Fig.4 is a perspective diagram of another text searching system according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Please refer to Fig.3. Fig.3 is a functional block diagram of a text searching system 30 according to the present invention. The text searching system 30 comprises a computer (not shown), a text data file 36, a text index file 40, a classification index file 44, and a searching program 48. The computer comprises a memory 32 for storing programs and data and a processor 34 for executing the programs stored in the memory 32. The text data file 36, text index file 40, classification index file 44 and searching program 48 are stored in the memory 32. The text data file 36 has text data 38 of web pages of a plurality of world wide web sites. The text index file 40 has keyword searching data 42 for searching keywords contained in the text data 38 of each of the web pages of the text data file 36. The classification index file 44 has classification data 46 corresponding to the classification of each of the web pages of the text data file 36. The searching program 48 is used for searching the text index file 40 and the classification index file 44 according to keyword and classification data provided by a user so as to find text data 38 which are matched with the user provided keyword data and contained in a plurality of target web pages whose classifications are matched with the user provided classification data in the text data file 36.
The keyword searching data 42 of the text index file 40 is built according to the text data 38 of the text data file 36. Each keyword searching data 42 has a keyword and address data of the keyword in all web pages. Each classification data 46 of the classification index file 44 has a plurality of classifications 54, and each classification 54 has web page data 50 of all the web pages belonging to the classification.
Each web page data 50 comprises keyword position indexing data 52 of the web page. The keyword position indexing data 52 is used for pointing to the positions of the keyword searching data 42 of the specific web page contained in the text index file 40.
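One way to picture the relationship between the classification index file 44 and the text index file 40 is the Python sketch below. The patent does not specify a storage layout, so everything concrete here is an assumption: the text index is modeled as a flat list in which each page's keyword entries are contiguous, and the keyword position indexing data 52 is modeled as a (start, end) pair into that list. The names KeywordEntry, WebPageData and build_indexes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordEntry:
    """One item of keyword searching data: a keyword and the address of its page."""
    keyword: str
    address: str


@dataclass(frozen=True)
class WebPageData:
    """Web page data held in the classification index for one page."""
    address: str
    start: int  # assumed form of position indexing data: first entry of this page
    end: int    # one past the last entry of this page


def build_indexes(pages):
    """pages maps address -> (classification, text). Returns (text_index, classification_index)."""
    text_index = []
    classification_index = {}
    for address, (classification, text) in pages.items():
        start = len(text_index)
        for keyword in sorted(set(text.lower().split())):
            text_index.append(KeywordEntry(keyword, address))
        page_data = WebPageData(address, start, len(text_index))
        classification_index.setdefault(classification, []).append(page_data)
    return text_index, classification_index


# Hypothetical sample pages.
PAGES = {
    "p1": ("movies", "tornado chase film"),
    "p2": ("meteorology", "tornado warning"),
}
text_index, classification_index = build_indexes(PAGES)

# Follow the position indexing data of the first "movies" page into the text index.
movie_page = classification_index["movies"][0]
movie_entries = text_index[movie_page.start:movie_page.end]
```

With this layout, a search restricted to one classification only has to read the slices of the text index that its pages point to.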
When a user inputs keyword and classification data, the searching program 48 searches the classification index file 44 according to the classification data provided to find the web page data 50 of all web pages belonging to the classification data. Then, the searching program 48 searches the position of the keyword searching data 42 of the text data 38 of each web page in the text index file 40 according to the keyword position indexing data 52 of the web page data 50. Then, the searching program 48 searches the keyword searching data 42 of all web pages belonging to the classification data in the text index file 40 according to the keyword provided by the user to find the text data 38 of all web pages which belong to the classification data and have the keyword. Finally, the text data file 36 is used for transmitting the text data 38 of all web pages belonging to the classification data and having the keyword to the user.
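The classification-first procedure of the text searching system 30 can be sketched as follows. This is an illustrative approximation, not the patented implementation: the sample pages, the function name search_system_30, and the simplification that each page's keyword searching data is a set stored at one position of a list are assumptions.

```python
# Hypothetical stand-in for the text data file 36: page address -> text data 38.
TEXT_DATA_FILE = {
    "p1": "a movie about a tornado",
    "p2": "tornado season forecast",
    "p3": "a movie about a shark",
}

# Assumed layout of the text index file 40: one set of keywords per page,
# stored at a known position (0 for p1, 1 for p2, 2 for p3).
PAGE_ORDER = ["p1", "p2", "p3"]
TEXT_INDEX_FILE = [set(TEXT_DATA_FILE[address].split()) for address in PAGE_ORDER]

# Assumed layout of the classification index file 44: classification ->
# list of (page address, position of that page's keywords in the text index).
CLASSIFICATION_INDEX_FILE = {
    "movies": [("p1", 0), ("p3", 2)],
    "meteorology": [("p2", 1)],
}


def search_system_30(keyword, classification):
    """Classification first, then keyword, restricted to pages in the classification."""
    results = {}
    # Step 1: find the web page data of all pages in the classification.
    for address, position in CLASSIFICATION_INDEX_FILE.get(classification, []):
        # Step 2: follow the position indexing data into the text index.
        page_keywords = TEXT_INDEX_FILE[position]
        # Step 3: keep the page only if it also has the keyword.
        if keyword.lower() in page_keywords:
            # Step 4: fetch the text data to transmit to the user.
            results[address] = TEXT_DATA_FILE[address]
    return results
```

For the "tornado" example from the background section, search_system_30("tornado", "movies") returns only page p1 and never touches the meteorology page.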
Please refer to Fig.4. Fig.4 is a perspective diagram of another text searching system 60 according to the present invention. The classification index file 62 of the text searching system 60 contains the classification data 64 of the web pages of each keyword searching data 42 in the text index file 40. When a user inputs keyword and classification data, the searching program 66 searches the text index file 40 according to the keyword provided to find all the keyword searching data 42 matched with the user provided keyword data and the address data of the keyword in all the web pages. Then, the searching program 66 searches the classification index file 62 according to the keyword searching data 42 to find the classification data 64 of the web page of each matched keyword searching data 42. The searching program 66 then selects, from the matched keyword searching data 42, those whose web pages belong to the classification data provided by the user, thereby finding the text data 38 of all web pages which belong to the classification data and have the keyword. Finally, the text data file 36 is used for transmitting the text data 38 of all web pages belonging to the classification data and having the keyword to the user.
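The keyword-first procedure of the text searching system 60 can be sketched in the same spirit. Again this is only an illustration under assumptions: the sample data, the function name search_system_60, and the modeling of the classification index file 62 as a simple mapping from page address to classification are not taken from the patent.

```python
# Hypothetical stand-in for the text data file 36: page address -> text data 38.
TEXT_DATA_FILE = {
    "p1": "a movie about a tornado",
    "p2": "tornado season forecast",
    "p3": "a movie about a shark",
}

# Assumed layout of the text index file 40: keyword -> addresses of pages with it.
TEXT_INDEX_FILE = {
    "tornado": ["p1", "p2"],
    "movie": ["p1", "p3"],
}

# Assumed layout of the classification index file 62: page address -> classification.
CLASSIFICATION_INDEX_FILE = {
    "p1": "movies",
    "p2": "meteorology",
    "p3": "movies",
}


def search_system_60(keyword, classification):
    """Keyword first, then filter the matched pages by classification."""
    # Step 1: find the address data of every page having the keyword.
    matched = TEXT_INDEX_FILE.get(keyword.lower(), [])
    # Step 2: look up each matched page's classification and keep the target pages.
    targets = [a for a in matched if CLASSIFICATION_INDEX_FILE.get(a) == classification]
    # Step 3: fetch the text data of the target pages to transmit to the user.
    return {address: TEXT_DATA_FILE[address] for address in targets}
```

Here search_system_60("tornado", "movies") first finds p1 and p2 by keyword and then discards p2 because its classification is meteorology.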
The text searching system 30 uses the classification index file 44 to find all web pages belonging to the classification data provided by the user, and then uses the text index file 40 and the keyword provided by the user to find all the web pages belonging to the classification data and having the keyword. The text searching system 60 uses the text index file 40 to find all web pages having the keyword provided by the user, and then uses the classification index file 62 and the classification data provided by the user to find all the web pages belonging to the classification data and having the keyword.
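The difference between the two orderings can be made concrete with a small Python sketch that runs both on the same data and reports how many candidate pages the second step has to examine. The sample pages and the function names are hypothetical, and the sketch scans a single dictionary rather than separate index files. The patent does not state which ordering is cheaper; in this sketch that depends entirely on the data.

```python
# Hypothetical pages: address -> (classification, set of keywords on the page).
PAGES = {
    "p1": ("movies", {"tornado", "film"}),
    "p2": ("meteorology", {"tornado", "warning"}),
    "p3": ("movies", {"shark", "film"}),
    "p4": ("news", {"tornado", "damage"}),
}


def classification_first(keyword, classification):
    """Ordering of system 30: narrow by classification, then test the keyword."""
    candidates = [a for a, (c, _) in PAGES.items() if c == classification]
    hits = sorted(a for a in candidates if keyword in PAGES[a][1])
    return hits, len(candidates)


def keyword_first(keyword, classification):
    """Ordering of system 60: narrow by keyword, then test the classification."""
    candidates = [a for a, (_, keywords) in PAGES.items() if keyword in keywords]
    hits = sorted(a for a in candidates if PAGES[a][0] == classification)
    return hits, len(candidates)


hits_30, examined_30 = classification_first("tornado", "movies")
hits_60, examined_60 = keyword_first("tornado", "movies")
```

Both orderings return the same pages, since each computes the pages that are in the classification and also have the keyword; only the size of the intermediate candidate list differs.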
Compared with the prior art searching system 10, the text searching systems 30, 60 according to the present invention use keyword and classification data provided by the user to find all the web pages that belong to the classification data and have the keyword. The text searching systems 30, 60 transmit only the text data of the web pages belonging to the classification data and having the keyword to the user. Therefore, the searching and transmission time is greatly reduced and the text searching system is more efficient.
Those skilled in the art will readily observe that numerous modifications and alterations of the text searching system may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.

Claims (4)

What is claimed is:
1. A text searching system comprising:
a computer having a memory for storing programs and data and a processor for executing the programs stored in the memory;
a text data file stored in the memory having text data of web pages of a plurality of world wide web sites;
a text index file stored in the memory having keyword searching data for searching keywords contained in the text data of each of the web pages of the text data file;
a classification index file stored in the memory having classification data corresponding to the classification of each of the web pages of the text data file; and
a searching program stored in the computer for searching the text index file and the classification index file according to keyword and classification data provided by a user so as to find text data which are matched with the user provided keyword data and contained in a plurality of target web pages whose classifications are matched with the user provided classification data in the text data file.
2. The text searching system of claim 1 wherein the classification index file contains a plurality of classifications and the web page data of all the web pages belonging to each of the classifications, and wherein the searching program searches the classification index file to find all the target web pages whose classifications are matched with the user provided classification data, and then searches the text index file to find text data which are matched with the user provided keyword data and contained in the target web pages of the text data file.
3. The text searching system of claim 2 wherein the web page data of each specific web page in the classification index file contain keyword position indexing data for pointing to the positions of the keyword searching data of the specific web page contained in the text index file, and wherein the searching program searches the classification index file to find the positions of the keyword searching data of the target web pages in the text index file, and then searches the keyword searching data of the target web pages to find the text data which are matched with the user provided keyword data and contained in the target web pages of the text data file.
4. The text searching system of claim 1 wherein the classification index file contains the classification of the web page of each keyword searching data in the text index file, and wherein the searching program searches the text index file to find all the keyword searching data matched with the user provided keyword data, and then searches the classification index file to find the classification of the web page of each matched keyword searching data so as to locate the keyword searching data of the target web pages, and finally finds the text data contained in the text data file using the keyword searching data of the target web pages.
CA002272983A 1999-05-20 1999-05-20 A text searching system for searching web pages according to a keyword and classification data provided by a user Abandoned CA2272983A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CA002272983A CA2272983A1 (en) 1999-05-20 1999-05-20 A text searching system for searching web pages according to a keyword and classification data provided by a user
JP11173228A JP2001014317A (en) 1999-05-20 1999-06-18 Text retrieval system for retrieving web page with keyword and classification data provided by user

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CA002272983A CA2272983A1 (en) 1999-05-20 1999-05-20 A text searching system for searching web pages according to a keyword and classification data provided by a user
JP11173228A JP2001014317A (en) 1999-05-20 1999-06-18 Text retrieval system for retrieving web page with keyword and classification data provided by user

Publications (1)

Publication Number Publication Date
CA2272983A1 (en) 2000-11-20

Family

ID=32094408

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002272983A Abandoned CA2272983A1 (en) 1999-05-20 1999-05-20 A text searching system for searching web pages according to a keyword and classification data provided by a user

Country Status (2)

Country Link
JP (1) JP2001014317A (en)
CA (1) CA2272983A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100520428B1 (en) * 2005-01-13 2005-10-11 엔에이치엔(주) Method and system for managing various kinds of keywords by interworking the keywords depending on user authentication

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0962705A (en) * 1995-08-25 1997-03-07 Sony Corp Recording medium and reproducing device
JPH09265482A (en) * 1996-01-26 1997-10-07 Mitsubishi Electric Corp Database retrieval device and database retrieval method

Also Published As

Publication number Publication date
JP2001014317A (en) 2001-01-19

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued