WO2003005240A1 - Apparatus for searching on internet - Google Patents
Apparatus for searching on internet
- Publication number
- WO2003005240A1 (application PCT/NO2002/000244)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- web
- web pages
- list
- unit
- server
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
Definitions
- the invention relates to publishing and searching on the World Wide Web, in short web.
- the web contains a large number of web pages, provided by content providers and stored on web servers connected together on the internet. End users often use search engines for finding information. Typical examples of search engines are Fast, AltaVista, Google and Yahoo.
- search engines are typically based on submission or crawling, or a combination of these two principles, for providing data for a search index. This search index is then the basis for serving search requests from end users.
- URLs are submitted by end users or content providers, or fetched from directory services like Open Directory. The web pages corresponding to the URLs are then downloaded and used as the basis for the search index.
- Crawling-based search engines are based on a crawler that starts on chosen start web pages, downloads these, finds web pages that are referenced on the downloaded pages, downloads the referenced pages, and so on.
- the downloaded pages are used as basis for a search index.
- the start pages can be submitted pages.
- crawling is typically used. This means that the crawling must be repeated regularly, leading to heavy load on transmission lines, on the web servers of the content providers and on the computers at the search engine. Because of these costs, the crawling of the presumably interesting part of the web is often restricted to a speed corresponding to an average revisit period of a few weeks to a few months. Therefore, news can be available for some time at the web servers before being found by the search engines. New pages that are not linked to can remain unfound.
- These costs apply to all search engines trying to crawl a significant part of the web. It is an object of the invention to provide search engines with a new way of updating their indices based on what is available on the internet.
- Fig. 1 a illustrates a search engine based on prior art.
- Fig. 1 b illustrates a search engine based on the present invention.
- Fig. 2 shows a detailed example of information that can flow from web servers to a central change server and further to search engines.
- Fig. 3 shows several details of the inner structure for an agent.
- Figs. 4 a and b illustrate ways of balancing the communication load for a central site.
- Fig. 5 shows an example of a configuration file for the agent.
- Fig. 6 illustrates one way of interconnecting search engines and cache servers based on the present invention.
- Fig. 7 shows a flexible, error tolerant, scalable architecture for data flow in a web change server.
- Figs. 8 a and b illustrate two possible arrangements for building a list over documents existing on the web.
- the basic principle of the invention is that local agents are installed on or near web servers or groups of web servers, and these agents detect and transmit changes to a central web change server. Web changes are then communicated further to search engines, giving the search engines a basis for downloading only pages that are new or modified, and allowing them to remove parts of the index corresponding to deleted web pages.
- the basic principle is illustrated both for use within a search engine and for providing a standalone change directory service that can serve several search engines.
- Web pages 12 14 are stored on a web server 10.
- a search engine 20 has a crawler 22.
- the crawler 22 reads through the web pages it finds, and produces an index 24.
- a user on an end user machine 30 searches the index 24 with a local browser 32.
- Wanted documents 14 are downloaded from the web server 10.
- Fig. 1b shows a search engine based on the present invention.
- Web pages 52 are stored on a web server 50.
- An agent 54 crawls the web pages 52. For each crawl, a log file 56 is generated. This log file 56 is used for change detection.
- Information about modifications, covering new, changed and deleted pages, is sent from the agent 54 to a loader 62 in a search engine 60.
- the search engine 60 updates an index 64 based on the results from crawling new and changed pages, and deletes index entries based on deleted pages.
- the index 64 can then be queried and the results presented in a browser 72 on an end user machine 70 in the traditional way.
- the number of modified web pages on a web server will usually be significantly smaller than the total number of web pages on the server. Therefore, much less data will be transmitted, and the work load for the search engine will be much smaller when using the present invention. Also, the agent can transmit the modification list to the search engine soon after modifications have been made, instead of waiting for the search engine to find the modifications during a later crawl.
- One way is to implement the agent as a service or daemon that executes regularly according to a time interval and a first starting time, or some other schedule. Another way is to let administrators start the agent manually. Yet another way is to implement the agent so that it can be started from a script, possibly synchronized with other scripts executing on the web servers.
- the web change server can be a component of a search engine.
- the web change server can be used as a stand alone service, serving several search engines.
- the latter has the advantage of allowing publishing data on several search engines while only installing and maintaining one agent.
- Fig. 2 shows an example of information that can be transmitted.
- Two agents, each residing on a web server, send two modification lists 10 20 containing URLs referring to modified web pages to a central server, where the lists 10 20 are combined into one aggregated modification list 30.
- Various extracts 40 50 can then be made from this aggregated list 30 and sent to various search engines.
- new web pages are marked with '+'
- changed web pages are marked with '!'
- deleted web pages are marked with '-'.
- the aggregated list can be produced by concatenating incoming modification lists in order of arrival.
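The aggregation step above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function names, URL values and tuple representation of list entries are assumptions.

```python
# Modification-list markers as described above:
#   '+' = new page, '!' = changed page, '-' = deleted page.

def aggregate(modification_lists):
    """Concatenate incoming modification lists in order of arrival."""
    aggregated = []
    for mod_list in modification_lists:
        aggregated.extend(mod_list)
    return aggregated

def extract(aggregated, predicate):
    """Make a per-subscriber extract, e.g. only URLs from certain hosts."""
    return [(marker, url) for marker, url in aggregated if predicate(url)]

list_a = [("+", "http://a.example/new.html"), ("!", "http://a.example/index.html")]
list_b = [("-", "http://b.example/old.html")]
combined = aggregate([list_a, list_b])
```

An extract for a subscriber interested only in one host would then be `extract(combined, lambda url: "b.example" in url)`.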
- Fig. 3 shows one preferred embodiment for detecting modifications.
- An agent 10 is installed on a web server 5.
- a crawler 15 crawls the web pages 25 on the web server 5.
- a log 35 is made based on the crawling.
- a change detector 40 compares the log 35 from the newest crawling with the log 45 from the previous crawling. The differences between these logs are summarized in a modification list and transmitted.
- One preferred embodiment for the log is to implement it as a table with one row for each web page, one column holding the URL of the web page and one column holding a checksum of the web page.
- the change detector 50 can then compare the newest log 40 with the previous log 60 and report changes as follows: a URL which is present only in the newest log is reported as a new page, a URL which is present only in the previous log is reported as a deleted page, and a URL which is present in both logs but with different checksums is reported as a changed page. URLs which are present in both logs 40 60 with identical checksums are not reported.
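A minimal sketch of the change detector described above, assuming each crawl log is held as a mapping from URL to checksum (the table layout of the preceding paragraphs); names and sample values are illustrative:

```python
def detect_changes(previous_log, newest_log):
    """Compare two crawl logs (dicts mapping URL -> checksum) and return
    a modification list: '+' new, '!' changed, '-' deleted. URLs present
    in both logs with equal checksums are not reported."""
    mods = []
    for url, checksum in newest_log.items():
        if url not in previous_log:
            mods.append(("+", url))
        elif previous_log[url] != checksum:
            mods.append(("!", url))
    for url in previous_log:
        if url not in newest_log:
            mods.append(("-", url))
    return mods

previous = {"http://x/a.html": 111, "http://x/b.html": 222}
newest = {"http://x/a.html": 111, "http://x/b.html": 333, "http://x/c.html": 444}
mods = detect_changes(previous, newest)
```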
- the checksum is generated based on Exclusive Or (XOR) of groups of characters on the web page.
- checksums can be used.
- the method should preferably generate a relatively short checksum, e.g. from 16 to 128 bits, and at the same time there should be a relatively low probability that two web pages with different contents are assigned the same checksum.
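One possible reading of the XOR scheme above is to fold the page content into a fixed-width accumulator, as sketched below. This is an assumed interpretation, not the patent's exact formula; note also that plain XOR folding has weak collision properties, which is why the text allows other checksum methods.

```python
def xor_checksum(content: bytes, width: int = 8) -> int:
    """Fold the page content into a short checksum by XOR-ing consecutive
    groups of `width` bytes together (width=8 gives a 64-bit checksum,
    inside the 16-128 bit range suggested above)."""
    acc = bytearray(width)
    for i, byte in enumerate(content):
        acc[i % width] ^= byte  # XOR each byte into its slot of the group
    return int.from_bytes(acc, "big")
```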
- the same content may be represented by several URLs. These web pages are called duplicates. One such case is when aliases are used.
- the crawl log may be sorted or accessed in checksum order. In cases where two or more pages have the same checksum, one of the URLs can be discarded.
- the selection of which web page to report in case of duplicates can be done using a fixed rule.
- the shortest URL can be used, and in case two or more URLs have the same length, the first in alphabetic order can be used.
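The duplicate-resolution rule above (one URL per checksum; shortest URL wins, ties broken alphabetically) can be sketched as follows; function name and log representation are illustrative:

```python
def resolve_duplicates(crawl_log):
    """From rows of (url, checksum), keep one URL per checksum using the
    fixed rule: shortest URL, ties broken alphabetically."""
    chosen = {}
    for url, checksum in crawl_log:
        best = chosen.get(checksum)
        # (length, string) tuple comparison encodes the rule directly.
        if best is None or (len(url), url) < (len(best), best):
            chosen[checksum] = url
    return sorted(chosen.values())
```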
- Some web pages have mostly static content but also contain some minor automatically changing part, e.g. a clock or a hit counter. Such pages should not be reported as always changing, because this would cause a search engine to download and reindex all such pages each time "changes" are reported, resulting in unnecessary load on servers and network.
- This problem can be solved in several different ways.
- One way is to take certain elements of such pages out of the checksum calculation. E.g., all instances of strings of the forms "99:99:99" or "99/99-9999" can be replaced by blanks before or during checksum generation. It should be possible to control this using a configuration tool, including controlling exactly which strings should be taken out.
- An administrator could be presented with a menu of example strings to take out. Alternatively, the administrator could be given the possibility of specifying such strings, e.g. by using regular expressions as used by the "grep" command in UNIX.
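Blanking out volatile fragments before checksum generation can be sketched with regular expressions as suggested above; the two patterns below match the "99:99:99" and "99/99-9999" forms from the example, and in a real agent they would come from the configuration tool rather than being hard-coded:

```python
import re

# Hypothetical configured patterns for volatile fragments (clocks, date stamps).
VOLATILE_PATTERNS = [
    re.compile(r"\d{2}:\d{2}:\d{2}"),   # e.g. 14:03:59
    re.compile(r"\d{2}/\d{2}-\d{4}"),   # e.g. 03/07-2002
]

def normalize(content: str) -> str:
    """Blank out volatile fragments so that a clock or date stamp does
    not make an otherwise static page look changed."""
    for pattern in VOLATILE_PATTERNS:
        content = pattern.sub(" ", content)
    return content
```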
- A special case is error message pages corresponding to missing pages, also called "dead links".
- the HTTP protocol allows missing pages to be reported with error message 404, without further content.
- a crawler can detect this and stop further actions.
- some web servers are programmed to respond with an error message containing an element such as "Sorry, the page AAAA.htm was not found".
- In that case, every URL that leads to the web server but does not refer to an existing web page would result in a unique web page, which in malign cases could lead to an endless number of web pages, overloading the agent and/or the search engines.
- Such cases can be solved by removing self references from the web pages before or during checksum generation.
- some pages may be dynamically generated. For such cases, it should be possible to implement the agent so that for some pages new, changed and deleted pages are all reported, while for other pages only new and deleted pages are reported.
- One possibility is to discriminate based on file type, e.g. so that pages whose URL has the extension ".html" are checked for all types of modifications, while pages whose URL has the extension ".asp" are only checked with respect to new and deleted pages.
- Another possibility of discrimination is to base this on folders.
- In some cases, a publishing tool is used for preparing web sites. For such cases, it may be more efficient to base the list of modifications directly on results from the publishing tool instead of on crawling, or the two methods may be used in combination.
- This can be a list of existing URLs, so that the agent would perform change detection and transmit afterwards. Alternatively, it can be a list of modifications, suitable for direct transmission.
- the URLs reported by the agent should be the same as the URLs seen from the perspective of a search engine and an end user. This is a major reason for using the HTTP protocol for crawling instead of the FILE protocol. Therefore, the HTTP protocol will be a natural choice in most cases. However, the advantages of the FILE protocol, like faster execution and not depending on a web server, may be significant in some cases.
- In such cases, the agent should have a mechanism for converting FILE-based URLs into HTTP-based URLs that can be used from outside.
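Such a conversion can be sketched as a prefix substitution; the document root and site base are stand-ins for configuration values the agent would be given, and the paths below are illustrative:

```python
def file_url_to_http(file_url: str, doc_root: str, site_base: str) -> str:
    """Map a FILE-based URL under the web server's document root to the
    externally visible HTTP-based URL."""
    prefix = "file://" + doc_root
    if not file_url.startswith(prefix):
        raise ValueError("URL outside the configured document root")
    return site_base + file_url[len(prefix):]
```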
- the agent must transmit the modification list back to the central site.
- Possible protocols include FTP (File Transfer Protocol), mail (e.g. SMTP) and HTTP (Hypertext Transfer Protocol).
- the agent initiates communication by either starting an FTP session, sending an email or issuing an HTTP request. This results in efficient use of the communication channel, in that communication is only initiated when there is something to communicate.
- the central site could initiate the communication.
- One advantage of this is that the central site could achieve a better load distribution over time.
- the agent should be authenticated whether using FTP, email or HTTP. Authentication can be divided into two phases: authenticating an administrator and the corresponding agent when registering, and later authenticating submissions supposedly coming from the given agent.
- the authentication performed during registration can be done manually. Alternatively, automatic support for the process can be added.
- the connection between a crawl area and the name and email address of its administrator can be verified using a lookup in a Whois database. Validity of the email address can be ensured by sending an email to the given address and requesting an answer. Subsequent authentication of submissions can be done using public key encryption, with key pairs generated during registration.
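The text above proposes public key encryption with key pairs generated during registration. As a simpler stand-in for illustrating the submission-tagging idea, the sketch below uses a shared-secret HMAC from Python's standard library (the standard library has no public-key primitives); the secret would take the place of the registration-time key pair, and all names and values are assumptions:

```python
import hashlib
import hmac

def sign_submission(secret: bytes, modification_list: bytes) -> str:
    """Tag a submission so the central site can check its origin."""
    return hmac.new(secret, modification_list, hashlib.sha256).hexdigest()

def verify_submission(secret: bytes, modification_list: bytes, tag: str) -> bool:
    """Constant-time check that the tag matches the submitted data."""
    return hmac.compare_digest(sign_submission(secret, modification_list), tag)
```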
- One method for building a cache on the internet is to provide a mechanism that copies web pages from a content provider, stores these web pages at one or more intermediate locations, and delivers these pages on request.
- Such a caching service needs a mechanism for ensuring cache coherency, that is, ensuring that the copy delivered to users is functionally identical to the original web page residing at the web server.
- One traditional method is based on HTTP headers: Each time a web page is requested, a caching server fetches the corresponding header from the original web server. If the header is identical to the header stored at the cache server, then the rest of the web page is served from the cache server. This method relies on correctly generated HTTP headers, which cannot always be ensured. This method further relies on communication with the content provider for each web page to be delivered, which results in unwanted network traffic. By employing the methods disclosed in the present application, both problems can be reduced.
- the modification list can be used as basis for indicating which web pages can be served from the cache and which web pages have to be refetched from the original web servers.
- a caching service might be able to report hit count for each web page. This can be reported back to the web change server and distributed further to search engines and other interested participants. Hit counts can be valuable for the search engines for selecting which pages to download and index, and also for ranking results to be presented to end users.
- An agent 10 on a web server 15 reports modifications to a web change server 20.
- the web change server 20 further sends URLs to a cache server 25 and a search engine 30. Both will download the modified pages from the web server 15, enabling them to deliver search results and web pages respectively.
- the cache server 25 will maintain hit rates, which are reported back to the web change server 20 and are further reported to the search engine 30, thereby allowing improved search result ranking.
- several different business models are possible.
- Content providers could be requested to pay for receiving a more efficient way to publish their content than what is otherwise possible. Payment could be calculated e.g. based on number of URLs submitted, or based on size of the monitored web site.
- Infrastructure vendors like communication, hosting or caching companies could be requested to pay for reducing stress on their infrastructure or for adding functionality for their customers. Payment could be based on estimated or measured reduction in infrastructure stress, or by splitting income that such vendors might receive from their customers for the added functionality.
- Search engines could be requested to pay for improving the quality of their indices or for reducing their communication costs. Payment could be based on number of URLs received or exploited.
- Some search engines are specialized by category. For such search engines, it is relevant to subscribe to data within the selected categories only.
- a category can be assigned to the crawl area at the time of registration.
- a user can select category from a list or from a tree structure.
- OpenDirectory is one example of a category tree structure that can be used.
- each URL is assigned one or more categories.
- One data format that is useful for URL level categorization is to add a category column to the modification list. This column could then be filled with category codes according to a given list or tree structure.
- the category column can be based on configuration data entered by a web server administrator. E.g., all content within a given folder may be assigned a given category.
- Categorization may also be based on data or metadata.
- the agent can look for given keywords in the header or body part of the web pages.
- search engines could do the same with language. This can be handled in a similar way as categories.
- a language could be registered at crawl area level.
- a language code can be carried in the modification lists. This language code can be configured, based on data, based on meta data, or otherwise supplied by an administrator.
- the agent can be allowed to run without limitation regarding processor load or network traffic.
- In cases where the agent competes with other processes, it may be advantageous to limit its use of resources. E.g., if the agent is executed on the same computer as a web server program, then the web server performance might be degraded while the agent is executing. For such cases, the agent should be limited with respect to resource usage.
- One way is to limit the HTTP requests to a given number of pages or kilobytes per second or minute. Another way is to limit the percentage of CPU time used. Others are to limit the amount of RAM or disk used. It should be possible to set such limits during configuration.
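The first limit above, a cap on HTTP requests per second, can be sketched as a simple pacing loop; the class name and the limit value are illustrative, and in the agent the value would come from the configuration file:

```python
import time

class RequestLimiter:
    """Pace HTTP requests to at most `max_per_second` pages per second."""

    def __init__(self, max_per_second: float):
        self.min_interval = 1.0 / max_per_second
        self.last_request = float("-inf")  # first request is never delayed

    def wait(self):
        """Sleep just long enough to honor the configured rate, then
        record the request time."""
        delay = self.min_interval - (time.monotonic() - self.last_request)
        if delay > 0:
            time.sleep(delay)
        self.last_request = time.monotonic()
```

The crawler would call `limiter.wait()` before each page download.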
- Fig. 7 shows a scalable and error tolerant architecture for a back end system handling modification lists.
- FTP servers 00 05 accept incoming FTP sessions from agents transferring modification lists.
- the modification lists are stored on disk 10 15. As long as there is at least one FTP server running, the agents will be able to transmit their modification lists.
- Each FTP server is essentially independent of the rest of the architecture, making them robust to failures in the rest of the system.
- Aggregation servers 20 25 read modification lists and store them in aggregated modification lists 30 35.
- the aggregation servers can also authenticate the modification lists against the crawl areas registered in a database 70.
- Each aggregation server 20 25 can read modification lists from the disks 10 15 of several FTP servers 00 05. Therefore, the overall system will still function when one or more aggregation servers are out of order, as long as at least one aggregation server still works.
- Extract servers 40 45 extract data from the aggregated modification lists, based on extract profiles stored in the database 70, again storing the results on disk 50 55.
- Playout servers 60 65 distribute extracted data to the respective subscribers.
- the playout servers 60 65 can be FTP servers, email clients, HTTP servers or other means for communicating with subscribers.
- the disks at each stage serve as buffers. If one stage stops or starts running slowly, then the disks will buffer the results from the previous stage until the stage is operating again.
- Fig. 4 a shows one method for scaling and load balancing.
- Various crawlers 10 20 30 each have a list 15 25 35 of prioritized addresses for FTP servers 40 50. When a crawler 10 tries to contact an FTP server, it chooses a prioritized address. If no contact can be made, the next address on the list can be tried.
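The prioritized-address failover just described can be sketched as follows; the function name, address strings and the `send` callback signature are assumptions standing in for the actual FTP transfer:

```python
def transmit_with_failover(modification_list, prioritized_addresses, send):
    """Try servers in priority order until one accepts the upload.
    `send(address, data)` returns True on success; the address that
    accepted the transfer is returned."""
    for address in prioritized_addresses:
        if send(address, modification_list):
            return address
    raise ConnectionError("no FTP server reachable")
```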
- Alternatively, scaling and load balancing can be achieved using Network Address Translation (NAT).
- search engines might also want to have a list of which web pages are available on the internet at a given time.
- One example is when a new search engine is established. Such a new search engine might need a list of available web pages to start its index and to have a baseline for later modifications. A list of web pages available on the internet will from here on be called a baseline list.
- Fig. 8 a shows a way of integrating a baseline database 15 in the pipeline described in fig. 7.
- An FTP server 05 is connected to a network 00, receiving modification lists.
- the modification lists are aggregated by an aggregation server 10, which inserts URLs into an aggregated modification list 20 and also consolidates them into a baseline list 25.
- An extract server 30 and a playout server 35 then handle the data further, distributing it to subscribers over a network 40.
- the method for maintaining a baseline list as shown in fig. 8 a is well suited for real time operation with small or medium amounts of data.
- a file based version for batch based operation is illustrated in fig. 8 b.
- An FTP server 55 receives modification lists over a network 50.
- An aggregation server 60 aggregates the data and stores them into an aggregated modification list 65.
- Batches of the modification lists are collected and sorted by a consolidator module 70. The sorted batches and the previous version of the baseline list 80 are read in parallel, the results are consolidated in a merge process, and the data are written to a new version 85 of the baseline database. Extract 85 and playout 90 servers can then handle the data further for final distribution over a network 95.
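The consolidation step above can be sketched as follows. For brevity this simplifies the file-based parallel merge into an in-memory operation; the semantics assumed (which the patent does not spell out) are that '+' inserts a URL into the baseline, '-' removes it, and '!' leaves it in place:

```python
def consolidate(previous_baseline, batch):
    """Merge a batch of modifications into the previous sorted baseline
    list of URLs, producing the new sorted baseline."""
    adds = {url for marker, url in batch if marker == "+"}
    dels = {url for marker, url in batch if marker == "-"}
    return sorted((set(previous_baseline) - dels) | adds)

baseline = ["http://x/a.html", "http://x/b.html", "http://x/c.html"]
batch = [("+", "http://x/d.html"), ("-", "http://x/b.html"), ("!", "http://x/c.html")]
new_baseline = consolidate(baseline, batch)
```

A production version would stream the sorted batch and the sorted baseline file in parallel, as the figure suggests, rather than holding both in memory.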
- Fig. 9 illustrates a search mechanism where indexing, searching and ranking are outsourced.
- a web server 10 has a number of web pages 15, among these a search page 20.
- An agent 25 reports modifications to a web change server 30.
- the results are reported to a search engine 35, which has a module 40 for downloading and indexing, producing an index 45.
- the search engine has a query motor 50.
- the query is sent to the query motor 50 on the search engine 35, results are generated based on the index 45, and the results are returned back for display on the search page 20.
- the agent can transmit modifications to the central site.
- the agent can transfer a complete list of URLs found on the web site to the central site, and the modifications can be computed at the central site.
- This solution results in an agent with less complexity since operations are carried out on the central site instead of in the agent.
- this also results in more network traffic, since complete lists of URLs have to be transmitted, instead of just modifications.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Control And Other Processes For Unpacking Of Materials (AREA)
Abstract
Description
Claims
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP02736301A EP1412878A1 (en) | 2001-07-03 | 2002-07-02 | Apparatus for searching on internet |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NO20013308 | 2001-07-03 | ||
NO20013308A NO20013308L (en) | 2001-07-03 | 2001-07-03 | Device for searching the Internet |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003005240A1 true WO2003005240A1 (en) | 2003-01-16 |
Family
ID=19912636
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/NO2002/000244 WO2003005240A1 (en) | 2001-07-03 | 2002-07-02 | Apparatus for searching on internet |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1412878A1 (en) |
NO (1) | NO20013308L (en) |
WO (1) | WO2003005240A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5855020A (en) * | 1996-02-21 | 1998-12-29 | Infoseek Corporation | Web scan process |
US5974455A (en) * | 1995-12-13 | 1999-10-26 | Digital Equipment Corporation | System for adding new entry to web page table upon receiving web page including link to another web page not having corresponding entry in web page table |
US6219818B1 (en) * | 1997-01-14 | 2001-04-17 | Netmind Technologies, Inc. | Checksum-comparing change-detection tool indicating degree and location of change of internet documents |
WO2001027793A2 (en) * | 1999-10-14 | 2001-04-19 | 360 Powered Corporation | Indexing a network with agents |
- 2001-07-03 NO NO20013308A patent/NO20013308L/en not_active Application Discontinuation
- 2002-07-02 EP EP02736301A patent/EP1412878A1/en not_active Withdrawn
- 2002-07-02 WO PCT/NO2002/000244 patent/WO2003005240A1/en not_active Application Discontinuation
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005020104A1 (en) * | 2003-08-18 | 2005-03-03 | Sap Aktiengesellschaft | User-requested search or modification of indices for search engines |
GB2417342A (en) * | 2004-08-19 | 2006-02-22 | Fujitsu Serv Ltd | Indexing system for a computer file store |
US8140507B2 (en) | 2007-07-02 | 2012-03-20 | International Business Machines Corporation | Method and system for searching across independent applications |
EP2223202A1 (en) * | 2007-11-02 | 2010-09-01 | Paglo Labs Inc. | Hosted searching of private local area network information with support for add-on applications |
EP2223202A4 (en) * | 2007-11-02 | 2014-02-05 | Paglo Labs Inc | Hosted searching of private local area network information with support for add-on applications |
US10346483B2 (en) * | 2009-10-02 | 2019-07-09 | Akamai Technologies, Inc. | System and method for search engine optimization |
WO2014008468A2 (en) * | 2012-07-06 | 2014-01-09 | Blekko, Inc. | Searching and aggregating web pages |
WO2014008468A3 (en) * | 2012-07-06 | 2014-03-20 | Blekko, Inc. | Searching and aggregating web pages |
US9767206B2 (en) | 2012-07-06 | 2017-09-19 | International Business Machines Corporation | Searching and aggregating web pages |
US11630875B2 (en) | 2012-07-06 | 2023-04-18 | International Business Machines Corporation | Searching and aggregating web pages |
CN105740384A (en) * | 2016-01-27 | 2016-07-06 | 浪潮软件集团有限公司 | Crawler agent automatic switching method and device |
Also Published As
Publication number | Publication date |
---|---|
NO20013308D0 (en) | 2001-07-03 |
NO20013308L (en) | 2003-01-06 |
EP1412878A1 (en) | 2004-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6636854B2 (en) | Method and system for augmenting web-indexed search engine results with peer-to-peer search results | |
EP1706832B1 (en) | Improved user interface | |
US9703885B2 (en) | Systems and methods for managing content variations in content delivery cache | |
US7093012B2 (en) | System and method for enhancing crawling by extracting requests for webpages in an information flow | |
KR100781725B1 (en) | Method and system for peer-to-peer authorization | |
US6360215B1 (en) | Method and apparatus for retrieving documents based on information other than document content | |
US7200665B2 (en) | Allowing requests of a session to be serviced by different servers in a multi-server data service system | |
JP3990115B2 (en) | Server-side proxy device and program | |
US8346753B2 (en) | System and method for searching for internet-accessible content | |
US6625624B1 (en) | Information access system and method for archiving web pages | |
JP4846922B2 (en) | Method and system for accessing information on network | |
JP4704750B2 (en) | Link generation system | |
US7293012B1 (en) | Friendly URLs | |
US20060235873A1 (en) | Social network-based internet search engine | |
US20050091202A1 (en) | Social network-based internet search engine | |
US20070244857A1 (en) | Generating an index for a network search engine | |
WO2004084097A1 (en) | Method and apparatus for detecting invalid clicks on the internet search engine | |
CN101046806B (en) | Search engine system and method | |
AU2001290363A1 (en) | A method for searching and analysing information in data networks | |
JP2004502987A (en) | How to build a real-time search engine | |
JP2000357176A (en) | Contents indexing retrieval system and retrieval result providing method | |
US8055665B2 (en) | Sorted search in a distributed directory environment using a proxy server | |
EP1412878A1 (en) | Apparatus for searching on internet | |
WO2001075668A2 (en) | Search systems | |
KR20010045995A (en) | An apparatus and a method for connecting uniform resource locator using e-mail address |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2002736301 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2002736301 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2002736301 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |