EP1461725A1 - Method and apparatus for information retrieval - Google Patents
Method and apparatus for information retrieval
- Publication number
- EP1461725A1 (application EP02779016A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- information
- retrieved
- url
- text
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
Definitions
- This invention relates to information retrieval, and is directed primarily but not solely to automated retrieval and analysis of information available on the Internet or similar databases such as databases, internal networks and intranets.
- the invention provides a method for automated search and retrieval of information available on a networked database, the method including the steps of
- the network is the Internet.
- the retrieved information is analysed.
- an alert is provided to an entity as a result of the analysis.
- the invention provides an automated information search and retrieval system in which real time selection and retrieval of the information occurs.
- the system includes provision for archiving the retrieved information in a readily accessible manner.
- the information is searched and retrieved from the Internet.
- the invention provides a method for automated searching and retrieval of information, performing real time selection and retrieval of the information.
- the information is archived for subsequent analysis.
- the method preferably includes the step of establishing one or more target resource locations from which information is to be searched and retrieved.
- the target location preferably includes a URL which is spidered by the system to identify underlying links.
- the spidering step is performed in a plurality of passes, each pass being targeted toward certain links, and each pass ignoring links that are unlikely to be relevant.
- the method includes the step of retrieving information from links that appear relevant.
- the method includes the step of assigning or attaching metadata to each item of information to create a database record.
- the database records are archived.
- Preferably retrieved information which is not in a textual format is converted to an editable raw-text data type.
- Preferably data can be provided from other sources, for example hard copies which may be converted to text using optical character recognition processors, or from an audio format using speech recognition applications.
- the method includes the step of analysing text retrieved by the method against predetermined rules.
- the predetermined rules may include literal string (key word) matches, regular expression matches, string patterns or occurrences of text, or other linguistically defined criteria.
- the predetermined rules may additionally involve other text analysis technology to recognise desired matches.
- the rules may be used to implement a criterion against which retrieved items of information are compared to determine their relevance to various topics and therefore the manner in which the information should be indexed, or possibly discarded.
- the method includes the step of discarding or stripping all extraneous information from the information that is retrieved.
- extraneous information may include HTML tags, images and the like.
- relevant information which is the subject of a new record created for immediate analysis or for archiving is stored with associated metadata (for example source URL, date retrieved, string length, HTML headers and the like).
- each record is a distinct and unique item in the database or archive and is assigned a unique identifier.
- the unique identifier may be a thirty two character UUID (universally unique identifier).
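As a rough illustration of the record described above, the following Python sketch attaches metadata to a retrieved text item and assigns it an identifier. The function name `make_record` and the field names are assumptions made for illustration only; the source does not specify a schema. `uuid.uuid4().hex` is used here because it yields 32 hexadecimal characters, which is one plausible reading of a "thirty two character UUID".

```python
import uuid
from datetime import datetime, timezone

def make_record(text, source_url, html_headers=None):
    """Build one database record (a plain dict) for a retrieved text item."""
    return {
        "id": uuid.uuid4().hex,                       # 32 hexadecimal characters
        "source_url": source_url,                     # where the item came from
        "date_retrieved": datetime.now(timezone.utc).isoformat(),
        "string_length": len(text),                   # length of the stored text
        "html_headers": dict(html_headers or {}),     # response headers, if any
        "text": text,
    }

# Hypothetical item and URL, used only to exercise the function.
record = make_record("Example article text.", "http://example.com/story",
                     {"Content-Type": "text/html"})
```

Each call produces a new identifier, so two records holding the same text remain distinct items.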
- the invention also includes apparatus to implement the system or method of one or more of the preceding statements of invention.
- the invention includes a computing machine operable to implement the system or method of one or more of the preceding statements of invention.
- Figure 1 is an overview diagram of an information retrieval and archiving system according to the invention.
- Figure 2 is a diagrammatic time line of internet information search functions according to the invention.
- Figure 3 is a flow diagram of an internet search and retrieval function according to the invention.
- Figures 4a & 4b constitute a single flow diagram showing the search and retrieval function of Figure 3 in greater detail.
- Figure 5 is a diagram showing the action of an agent or bot spidering a target server in accordance with the invention.
- Raw data is shown at a first level referenced 1. It is this data that the present invention searches, selects and then organises or indexes to arrive at relevant timely information. As can be seen from the diagram, this raw data can include a diverse range of data formats such as hard copy documents 10, Internet data 12, audio data 14 and video data 16.
- Sources of hard copy documents include sources such as newspapers and magazine articles or other paper records.
- Internet or other network data can include data contained in or generated by HTML documents, XML documents/feeds, dynamic pages (CGI, ASP, CFM, PHP) and WAP data sources, amongst others.
- Audio data can include radio broadcasts, tape recordings/interviews and streaming audio (for example provided on the Internet).
- Video data can include television broadcasts, tape recordings or streaming video (for example provided on the Internet).
- the application automatically scans each page, converts the document into a raw text form using OCR (optical character recognition), and saves it into the central database.
- the documents may be newspaper articles, magazine journals, printed PDF files, or other hard-copy material.
- HTTP (and similar or subsequent methods and protocols) requests are used to supply the required HTML, or other, documents and these can then be stripped of extraneous information such as HTML tags and the like to arrive at a text document.
- This processing is generally indicated using reference numeral 20 in Figure 1.
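The stripping of HTML tags to arrive at a text document can be sketched as follows. This is a minimal illustration using Python's standard `html.parser` module, not the implementation described in the source; the class name `TextExtractor`, the function `strip_html` and the choice to skip `<script>` and `<style>` content are assumptions.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())

def strip_html(html):
    """Return the visible text of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()

sample = "<html><body><h1>Rates rise</h1><p>The bank <b>lifted</b> rates.</p></body></html>"
plain = strip_html(sample)
```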
- Audio data and video data are processed using speech recognition components to transform the audio information into a textual format.
- This process is generally indicated using reference numeral 22 in Figure 1.
- a computer or series of computers running an application which processes audio from TV broadcasts, video, and other media (streaming, CDROM, etc).
- the audio/video data may be stored digitally on a storage device connected to the computer or captured from an analogue source such as a bank of VCRs or similar playback devices.
- the "audio signal" can be derived from either an audio or video source. Provision is made for additional metadata with video sources that analyses and classifies video & image information.
- the application running on the computer analyses the broadcast using speech recognition software to convert it to a raw text form where it is saved into the central database.
- the result of the processing step in level 2 is a text document, referenced 24, which is provided in electronic form.
- Each text item 24 then has metadata added to it (as will be described further below) so as to create a database record in step 26, and each record is then stored on a database 28.
- the database can then be accessed to review information of interest that has been gathered using the process.
- the information on the database can be archived in a number of convenient formats for use to track changes and patterns over time or to review historical data information.
- a time line having an axis 30 representing time advancing in linear intervals in a direction to the right hand side of the figure shows examples of agents or bots which automatically search target data sources on the Internet.
- Agents or bots are used in the preferred embodiment to automatically search target data sources on the Internet.
- the agents are released periodically.
- a first agent 32 which has the task of extracting information from a specific URL e.g. theage.com may be released.
- Each agent is attached to a specific site and is profiled with information specific to that site. The information determines the method and depth of spidering (this will be explained further below) and how information is extracted.
- Each agent is released at predetermined intervals and they begin harvesting information through a process as will be described further below. Once each agent has finished its automated process, it returns to a "wait" state until it is next triggered.
- another agent 34 may be attached to another URL e.g. SMH.com and be released at 8:00am.
- the agent 36 may be attached to a URL e.g. news.com.au and be released at 9:00am.
- the agent 38 may be attached to yet another URL e.g. ordermail.com.au and be released at 10:00am.
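The timed release of agents might be modelled as below. This is only a sketch under stated assumptions: the `SCHEDULE` mapping and the `agents_due` function are hypothetical names, only the three agents whose release times are stated in the text are included, and a real scheduler would also need to handle dates and repeat intervals.

```python
from datetime import time

# Release times taken from the examples above; the data layout is an assumption.
SCHEDULE = {
    "SMH.com": time(8, 0),
    "news.com.au": time(9, 0),
    "ordermail.com.au": time(10, 0),
}

def agents_due(now, already_released):
    """Return the sites whose release time has passed and whose agent is still waiting."""
    return [site for site, release_at in SCHEDULE.items()
            if release_at <= now and site not in already_released]

# At 9:30am, with the SMH.com agent already released, one agent is due.
due = agents_due(time(9, 30), already_released={"SMH.com"})
```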
- From step 40, the agent makes an HTTP GET request to retrieve the HTML document from its target URL. This is performed in step 42.
- If the agent in step 40 is agent 32, then the URL that the request is sent to would be theage.com.au.
- the document that the agent receives from the target URL will include a number of links. These links will typically consist of links to other URLs. These links are filtered according to certain criteria and information the agent is loaded with and stored on the system server in a "spider list". Certain types of resource are filtered as well as compared to an "exclusion list" on the server. Any URL which is listed on the exclusion list is ignored by the agent. In this way, from a general known website structure, links which are known to be valueless in terms of their information can be readily excluded by the system.
- This step of filtering the relevant links is carried out in step 44 and is generally performed by a parsing process whereby the text and the link is analysed by the agent to look for key words, known words or word patterns such as linguistically defined criteria or "themes" which are likely to indicate a relevant link to the information which is sought.
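The link filtering described above can be sketched in Python as follows. The names `LinkCollector` and `build_spider_list`, the example page and the simple substring keyword test are all assumptions for illustration; the source does not say how the parsing process is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from <a href=...> elements."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def build_spider_list(html, base_url, exclusion_list, keywords):
    """Keep links that are not excluded and whose URL or text mentions a keyword."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    spider_list = []
    for url, text in collector.links:
        if url in exclusion_list:
            continue  # ignored by the agent
        haystack = (url + " " + text).lower()
        if any(k.lower() in haystack for k in keywords):
            spider_list.append(url)
    return spider_list

# Hypothetical page and lists, used only to exercise the filter.
page = ('<a href="/business/rates.html">Interest rates rise</a>'
        '<a href="/sport/cricket.html">Cricket scores</a>'
        '<a href="/business/ads.html">Business offers</a>')
spider_list = build_spider_list(
    page, "http://example.com/",
    exclusion_list={"http://example.com/business/ads.html"},
    keywords=["business", "rates"])
```

Here the sport link fails the keyword test and the advertising link is dropped because it appears on the exclusion list, leaving a single entry in the spider list.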
- the method includes the step of analysing text retrieved by the method against predetermined rules.
- The predetermined rules may include literal string (key word) matches, regular expression matches, string patterns or occurrences of text, or other linguistically defined criteria.
- The predetermined rules may additionally involve other text analysis technology to recognise desired matches.
- the rules may be used to implement a criterion against which retrieved items of information are compared to determine their relevance to various topics and therefore the manner in which the information should be indexed, or possibly discarded.
- The term "spidering" refers to the process of navigating through a series of on line resources and gathering information. Therefore, the spider list which is established by the agent sets forth a pattern of links at the target site which is subsequently visited by the agent to retrieve information as is described further below.
- In step 46 the agent then proceeds to process each parsed URL from step 44 individually until all further links (of which there may be many) are checked in this manner. Again, links which are on the exclusion list are ignored by the agent.
- the agent inserts the relevant URL (or link) into a URL stream table. This occurs in step 48.
- the agent then performs a query in step 50 to retrieve all the URLs from the URL stream table.
- In step 52 the process begins by the agent making an HTTP GET request to retrieve a document from the first URL.
- the agent retrieves a profile for the base URL. This occurs in step 54 and the purpose is to obtain further information about any known document structure or structures at the website of interest. Therefore, profiles tend to be specific toward each target URL. If the profile is known, then this can make the content of the HTML document much easier to accurately retrieve in a desired form. If the structure of the HTML document retrieved does not match the profile then the agent defaults to retrieving the entire text from the HTML document with the HTML tags stripped out.
- In step 56 the agent executes the profile and in step 58 retrieves the relevant material (for example) in text with extraneous content stripped out.
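The profile step and its fallback can be illustrated with a short sketch. The source does not say what a profile contains, so this example assumes, purely for illustration, that a profile is a pair of marker strings delimiting the article body; `strip_tags` and `extract_with_profile` are hypothetical names, and the regular-expression tag removal is deliberately crude.

```python
import re

def strip_tags(html):
    """Crude fallback: drop anything that looks like a tag and collapse whitespace."""
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())

def extract_with_profile(html, profile):
    """Return (text, matched). On a profile mismatch, fall back to the whole document."""
    start_marker, end_marker = profile["start"], profile["end"]
    start = html.find(start_marker)
    end = html.find(end_marker, start + len(start_marker)) if start != -1 else -1
    if end != -1:
        # Profile matched: keep only the delimited body, tags removed.
        return strip_tags(html[start + len(start_marker):end]), True
    # Profile did not match: entire text with tags stripped out.
    return strip_tags(html), False

doc = '<div id="nav">Home</div><div id="story"><p>Rates rose today.</p></div><!--end-->'
text, matched = extract_with_profile(
    doc, {"start": '<div id="story">', "end": "<!--end-->"})
fallback_text, fallback_matched = extract_with_profile(
    doc, {"start": "<article>", "end": "</article>"})
```

With a matching profile the navigation text is excluded; with a mismatched profile the fallback returns everything, including "Home".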
- the next step 60 is for an analysis to be performed of the retrieved document.
- the agent analyses the text retrieved against predetermined rules which may be called "themes" stored on the system server.
- the themes may consist of actual literal string (i.e. key word) matches, regular expression matches, string patterns or occurrences of text or other linguistically defined criteria as determined.
- themes are defined by system users in consultation with analysts and may consist of any of the foregoing, and additionally may involve other text analysis technology to recognise desired matches.
- the word "themes" is broadly used in this document to describe a scheme of criteria against which retrieved items are compared to ascertain or di: documents of relevance to the user.
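A theme check combining literal key word matches and regular expression matches might look like the sketch below. The `THEMES` structure, the theme names and the `match_themes` function are assumptions introduced for illustration; other linguistically defined criteria mentioned in the text are not modelled.

```python
import re

# Hypothetical theme definitions: literal key words plus regular expressions.
THEMES = {
    "interest rates": {"keywords": ["interest rate", "reserve bank"],
                       "patterns": [r"\brates?\s+(rise|rose|cut)\b"]},
    "cricket": {"keywords": ["test match"],
                "patterns": [r"\bwickets?\b"]},
}

def match_themes(text, themes=THEMES):
    """Return the names of all themes that the text matches."""
    lowered = text.lower()
    matched = []
    for name, rules in themes.items():
        keyword_hit = any(k in lowered for k in rules["keywords"])
        pattern_hit = any(re.search(p, text, re.IGNORECASE)
                          for p in rules["patterns"])
        if keyword_hit or pattern_hit:
            matched.append(name)
    return matched

hits = match_themes("The Reserve Bank said rates rose again.")
```

An empty result corresponds to the case where no theme matches the retrieved document.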
- Should the query performed in step 60 result in a match, then the agent inserts the text document that has been retrieved into the system database. This occurs in step 62. If a match is not achieved, then the document is discarded.
- the agent then returns to the next URL in the URL stream table in step 64 so that the process begins to repeat from step 52 until all URLs have been examined.
- In step 66 the agent "returns" to the system server until the next cycle is due to begin.
- As described with reference to Figure 1, as each text item is added to the database, additional metadata is added to the item so that the data is organised or indexed for subsequent retrieval or for further analysis for identification purposes. Therefore, as each new record is created on the system database, the text is stored and any associated metadata (such as source URL, date retrieved, string length, HTML headers etc) is stored with the text.
- Each record created is thus a distinct and unique item in the database and is assigned a unique identifier. This identifier may be a thirty two character UUID (universally unique identifier).
- the system envisages storing text documents regardless of whether a theme is matched or not, so that recursive searches may be made.
- The agent executes in step 70, and an initial query occurs in step 72, which is an HTTP request to get the base URL.
- In step 74 a check is performed from the document returned as a result of the request. This check is to review the header data from the HTML document that is returned to ascertain the last time that the document was updated or modified.
- A comparison occurs in step 76, and if there is no change, then the agent returns to step 70.
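The comparison of the last-modified time can be sketched as below, assuming the header in question is the standard HTTP `Last-Modified` header (the source says only "header data"). The function name `has_changed` and the plain-dict representation of headers are assumptions; a missing header is treated as a change so that the document is fetched anyway.

```python
from email.utils import parsedate_to_datetime

def has_changed(headers, last_seen):
    """Return True if Last-Modified is newer than last_seen, or cannot be compared."""
    value = headers.get("Last-Modified")
    if value is None or last_seen is None:
        return True
    return parsedate_to_datetime(value) > last_seen

# Hypothetical timestamps used only to exercise the comparison.
last_seen = parsedate_to_datetime("Tue, 27 Nov 2001 08:00:00 GMT")
unchanged = has_changed({"Last-Modified": "Tue, 27 Nov 2001 08:00:00 GMT"}, last_seen)
changed = has_changed({"Last-Modified": "Tue, 27 Nov 2001 09:15:00 GMT"}, last_seen)
```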
- If a change has occurred, then the document is received in step 78 and is parsed in step 80 to ascertain relevant links. It is desired (but not absolutely necessary) that only links which relate to text documents are parsed and that the agent ignores links from any exclusion list as described above.
- In step 82 the parsed URL is processed and in step 84 the agent performs a query to check whether the processed URL is present in the URL stream table. If it is not, then in step 86 a further query is performed to check whether the URL is in the URL archive table. If the URL is not present in that table either, then the agent inserts the URL into the URL stream table together with further parameters such as the base URL, the date and time of last modification of the document to which the URL relates and a depth variable.
- The agent continues to process the next URL in step 82 and the process continues until all the URLs have been parsed.
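The two table checks that precede insertion into the URL stream table can be sketched with an in-memory SQLite database. The table and column names, and the functions `open_tables` and `queue_url`, are assumptions for illustration; the source names the tables but does not give their layout.

```python
import sqlite3

def open_tables():
    """Create hypothetical URL stream and URL archive tables in memory."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE url_stream (url TEXT PRIMARY KEY, base_url TEXT, "
               "last_modified TEXT, depth INTEGER)")
    db.execute("CREATE TABLE url_archive (url TEXT PRIMARY KEY)")
    return db

def queue_url(db, url, base_url, last_modified, depth):
    """Insert url into url_stream unless it is already queued or archived."""
    for table in ("url_stream", "url_archive"):
        found = db.execute(f"SELECT 1 FROM {table} WHERE url = ?", (url,)).fetchone()
        if found:
            return False
    db.execute("INSERT INTO url_stream VALUES (?, ?, ?, ?)",
               (url, base_url, last_modified, depth))
    return True

db = open_tables()
db.execute("INSERT INTO url_archive VALUES (?)", ("http://example.com/old.html",))
base = "http://example.com/"
first = queue_url(db, "http://example.com/new.html", base, "2001-11-27T08:00:00", 1)
repeat = queue_url(db, "http://example.com/new.html", base, "2001-11-27T08:00:00", 1)
archived = queue_url(db, "http://example.com/old.html", base, "2001-11-27T08:00:00", 1)
```

Only the first call inserts a row; the repeated URL and the archived URL are both rejected.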
- In step 90 the agent retrieves all the URLs that have been parsed from the URL stream table.
- A GET request is then performed in step 92 for the first URL from the URL stream table.
- a check is then performed in step 94 to see whether the depth variable is greater than 1 i.e. whether there are further links in the document that is retrieved from that URL. If there are, then these links are parsed and the process is performed again beginning at step 80 until all the subsidiary links are parsed and then the agent returns to step 96 where a query is performed to retrieve the profile for the relevant base URL.
- In step 98 the agent attempts to execute the retrieved profile. If there is a profile match failure, as shown in step 100, then the full text of the HTML document is simply retrieved and all the HTML tags are simply stripped from the document. If there is a profile match success as shown in step 102, then the text from the document is easily retrieved with extraneous content removed from it. The resultant text document is then compared with the themes referred to above to see whether a match occurs in step 104. A query is then performed in step 106 to see whether the URL to which the document relates already exists. If it does, then the URL is discarded and the agent turns to the next URL in the URL stream table at step 108.
- If the URL does not already exist, the agent inserts the full text into the content items table (i.e. into the database) together with further metadata such as the base URL and further information for identification and search purposes. This occurs in step 110. If for some reason an article cannot be extracted, then an email is generated in step 112. The agent then continues to repeat the process for subsequent URLs in the URL stream table at step 114.
- Step 106 has the purpose of preventing information being retrieved and stored twice.
- Figure 5 shows a simplified diagrammatic illustration of the spidering process described above in Figures 3, 4a and 4b.
- the system server is referenced 150 and a target server on which the target URL, i.e. the base URL referred to above, is located is referenced 152.
- An agent 154 begins by making a first pass of the base URL of the target server 152. That agent then returns data to the server as shown by arrow 156. If the information returned indicates that there are links to further URLs on the target server, then the agent makes a further pass i.e. a second pass 158. Information from the second pass is returned to the server in step 160.
- A third pass 162 may be made, which will again return further information to the server.
- the method provides a logical and straightforward way of spidering a target server for relevant information.
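The pass-by-pass spidering of a target server can be sketched against a small in-memory link graph instead of a live site. The `SITE` mapping, the paths in it and the `spider` function are hypothetical; each pass visits the links discovered by the previous pass, skipping anything already seen or on the exclusion list.

```python
# Hypothetical target server: each path maps to the links found on that page.
SITE = {
    "/": ["/news", "/ads", "/sport"],
    "/news": ["/news/a", "/news/b"],
    "/sport": [],
    "/news/a": [],
    "/news/b": ["/news"],
}

def spider(base, exclusion_list, max_passes=3):
    """Return the pages visited in each pass, starting from the base URL."""
    seen, frontier, passes = set(), [base], []
    for _ in range(max_passes):
        # Drop duplicates, pages already visited, and excluded pages.
        frontier = list(dict.fromkeys(
            u for u in frontier if u not in seen and u not in exclusion_list))
        if not frontier:
            break
        seen.update(frontier)
        passes.append(frontier)
        # Links found on this pass become candidates for the next pass.
        frontier = [link for u in frontier for link in SITE.get(u, [])]
    return passes

passes = spider("/", exclusion_list={"/ads"})
```

The excluded `/ads` page is never visited, and the back-link from `/news/b` to `/news` does not cause a repeat visit.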
- information on a target server may be represented in a pie chart form.
- the information in an initial state of the server 170 may show that no information has been spidered.
- After a first pass, a certain amount of information will have been retrieved, as indicated in diagram 172.
- After a second pass further information will have been retrieved as shown by diagram 174.
- After a third pass, yet more information has been retrieved as shown by diagram 176.
- the spidered information from the server is shown by the shaded portions of each diagram. As can be seen, a certain amount of information is ignored and this information relates to links that have been parsed by the agent but which have been ignored because they have been determined to be a) irrelevant, b) on a list of URLs to be ignored, or c) not in the required data form (for example do not comprise a text document).
- After a content item has been stored in the database, an "alert" will be generated.
- the alert configuration is definable by the client, and may take the form of an email, an SMS message, the remote updating of a web page, or remote communication with another database system or application.
- the alert may be sent in "real-time" (as soon as the content item is retrieved) or after it has been analysed (after the analyst has processed the content item).
- the alerts may be received singly or in digest form on a different frequency, for example daily, weekly, or even monthly if desired.
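The difference between single alerts and a digest can be illustrated with a small sketch. The `deliver` function, the alert dictionaries and the client names are assumptions; actual delivery by email, SMS or web page update is not modelled.

```python
from collections import defaultdict

def deliver(alerts, mode):
    """Return (client, items) messages: one per alert, or one digest per client."""
    if mode == "real-time":
        return [(a["client"], [a["item"]]) for a in alerts]
    grouped = defaultdict(list)
    for a in alerts:
        grouped[a["client"]].append(a["item"])
    return sorted(grouped.items())

# Hypothetical alerts for two clients.
alerts = [{"client": "acme", "item": "item-1"},
          {"client": "zenith", "item": "item-2"},
          {"client": "acme", "item": "item-3"}]
realtime = deliver(alerts, "real-time")
digest = deliver(alerts, "digest")
```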
- the client may view "real-time" reports showing visually the retrieval, processing and analysis of items that match their keyword themes. These reports consist of dynamic graphs, pie graphs, and other types of chart which display information and metadata pertaining to these content items.
- the client may further manipulate these charts and graphs with different ranges and criteria to produce different results.
- the analysis may be performed by a human analyst or by a software component on the server.
- the analysis metadata is compiled from the client perspective and stored on a per user client basis, so one content item may have many analyses for different clients.
- the analysis allows the user to select many database cross-sections for different reports showing the analysis metadata which is linked to retrieved content items.
- the analysis may also be displayed real-time to the client so as items are updated and analysed the on-screen information is updated with no intervention from the client.
- the analysis enables the user to quickly gain an understanding of the skew of a large volume of content at a glance; instead of perusing each item they are able to view a dissected overview in graphical format, which provides a powerful tool in determining real-time trends as they appear.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AUPR914601 | 2001-11-27 | ||
AUPR9146A AUPR914601A0 (en) | 2001-11-27 | 2001-11-27 | Method and apparatus for information retrieval |
PCT/AU2002/001597 WO2003046755A1 (en) | 2001-11-27 | 2002-11-27 | Method and apparatus for information retrieval |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1461725A1 (en) | 2004-09-29 |
EP1461725A4 (en) | 2005-06-22 |
Family
ID=3832956
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP02779016A Ceased EP1461725A4 (en) | 2001-11-27 | 2002-11-27 | Method and apparatus for information retrieval |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1461725A4 (en) |
AU (1) | AUPR914601A0 (en) |
CA (1) | CA2507279A1 (en) |
NZ (1) | NZ533730A (en) |
WO (1) | WO2003046755A1 (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1996029661A1 (en) * | 1995-03-20 | 1996-09-26 | Interval Research Corporation | Retrieval of hyperlinked information resources using heuristics |
US5835905A (en) * | 1997-04-09 | 1998-11-10 | Xerox Corporation | System for predicting documents relevant to focus documents by spreading activation through network representations of a linked collection of documents |
GB2335761A (en) * | 1998-03-25 | 1999-09-29 | Mitel Corp | Information search using user profile |
WO2000041099A1 (en) * | 1998-12-30 | 2000-07-13 | Microsoft Corporation | Method for analyzing network data |
US6163804A (en) * | 1997-07-08 | 2000-12-19 | Canon Kabushiki Kaisha | Network information searching apparatus and network information searching method |
EP1069515A1 (en) * | 1999-07-15 | 2001-01-17 | Information and Communications University | Method and apparatus for web information extraction service |
US6182072B1 (en) * | 1997-03-26 | 2001-01-30 | Webtv Networks, Inc. | Method and apparatus for generating a tour of world wide web sites |
WO2001050320A1 (en) * | 1999-12-30 | 2001-07-12 | Auctionwatch.Com, Inc. | Minimal impact crawler |
WO2001057711A1 (en) * | 2000-02-02 | 2001-08-09 | Searchlogic.Com Corporation | Combinatorial query generating system and method |
US6304864B1 (en) * | 1999-04-20 | 2001-10-16 | Textwise Llc | System for retrieving multimedia information from the internet using multiple evolving intelligent agents |
US20010032205A1 (en) * | 2000-04-13 | 2001-10-18 | Caesius Software, Inc. | Method and system for extraction and organizing selected data from sources on a network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010044800A1 (en) * | 2000-02-22 | 2001-11-22 | Sherwin Han | Internet organizer |
US6516337B1 (en) * | 1999-10-14 | 2003-02-04 | Arcessa, Inc. | Sending to a central indexing site meta data or signatures from objects on a computer network |
-
2001
- 2001-11-27 AU AUPR9146A patent/AUPR914601A0/en not_active Abandoned
-
2002
- 2002-11-27 EP EP02779016A patent/EP1461725A4/en not_active Ceased
- 2002-11-27 WO PCT/AU2002/001597 patent/WO2003046755A1/en not_active Application Discontinuation
- 2002-11-27 CA CA002507279A patent/CA2507279A1/en not_active Abandoned
- 2002-11-27 NZ NZ533730A patent/NZ533730A/en unknown
Non-Patent Citations (5)
Title |
---|
BALDAZO R: "NAVIGATING WITH A WEB COMPASS" BYTE, MCGRAW-HILL INC. ST PETERBOROUGH, US, vol. 21, no. 3, 1 March 1996 (1996-03-01), pages 97-98, XP000600179 ISSN: 0360-5280 * |
CHAKRABARTI S ET AL: "Focused crawling: a new approach to topic-specific Web resource discovery" COMPUTER NETWORKS, ELSEVIER SCIENCE PUBLISHERS B.V., AMSTERDAM, NL, vol. 31, no. 11-16, 17 May 1999 (1999-05-17), pages 1623-1640, XP004304579 ISSN: 1389-1286 * |
DE BRA P M E ET AL: "Information retrieval in the World-Wide Web: Making client-based searching feasible" COMPUTER NETWORKS AND ISDN SYSTEMS, NORTH HOLLAND PUBLISHING. AMSTERDAM, NL, vol. 27, no. 2, November 1994 (1994-11), pages 183-192, XP004037989 ISSN: 0169-7552 * |
See also references of WO03046755A1 * |
SMITH J R ET AL SOCIETY OF PHOTO-OPTICAL INSTRUMENTATION ENGINEERS (SPIE): "AN IMAGE AND VIDEO SEARCH ENGINE FOR THE WORLD-WIDE WEB" STORAGE AND RETRIEVAL FOR IMAGE AND VIDEO DATABASES 5. SAN JOSE, FEB. 13 - 14, 1997, PROCEEDINGS OF SPIE, BELLINGHAM, SPIE, US, vol. VOL. 3022, 13 February 1997 (1997-02-13), pages 84-95, XP000742373 ISBN: 0-8194-2433-1 * |
Also Published As
Publication number | Publication date |
---|---|
EP1461725A4 (en) | 2005-06-22 |
WO2003046755A9 (en) | 2003-09-12 |
NZ533730A (en) | 2006-04-28 |
WO2003046755A1 (en) | 2003-06-05 |
CA2507279A1 (en) | 2003-06-05 |
AUPR914601A0 (en) | 2001-12-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10210256B2 (en) | Anchor tag indexing in a web crawler system | |
US6490579B1 (en) | Search engine system and method utilizing context of heterogeneous information resources | |
US20050010556A1 (en) | Method and apparatus for information retrieval | |
US10210222B2 (en) | Method and system for indexing information and providing results for a search including objects having predetermined attributes | |
US20020143932A1 (en) | Surveillance monitoring and automated reporting method for detecting data changes | |
US6148289A (en) | System and method for geographically organizing and classifying businesses on the world-wide web | |
US8515954B2 (en) | Displaying autocompletion of partial search query with predicted search results | |
US7167901B1 (en) | Method and apparatus for improved bookmark and histories entry creation and access | |
US7065523B2 (en) | Scoping queries in a search engine | |
EP2321745B1 (en) | Providing posts to discussion threads in response to a search query | |
US9081861B2 (en) | Uniform resource locator canonicalization | |
US7664744B2 (en) | Query categorizer | |
US8095530B1 (en) | Detecting common prefixes and suffixes in a list of strings | |
US8938455B2 (en) | System and method for determining a homepage on the world-wide web | |
US20050149519A1 (en) | Document information search apparatus and method and recording medium storing document information search program therein | |
US20070143317A1 (en) | Mechanism for managing facts in a fact repository | |
US20050086206A1 (en) | System, Method, and service for collaborative focused crawling of documents on a network | |
US20090132529A1 (en) | Method and System for URL Autocompletion Using Ranked Results | |
WO2005010701A9 (en) | Method and system for rule based indexing of multiple data structures | |
JPH11191114A (en) | Meta retrieving method, image retrieving method, meta retrieval engine and image retrieval engine | |
JP2006099341A (en) | Update history generation device and program | |
WO2001024045A2 (en) | Method, system, signals and media for indexing, searching and retrieving data based on context | |
US20050188300A1 (en) | Determination of member pages for a hyperlinked document with link and document analysis | |
US20120109965A1 (en) | System for automatic semantic-based mining | |
WO2003046755A1 (en) | Method and apparatus for information retrieval |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20040628 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LI LU MC NL PT SE SK TR |
AX | Request for extension of the european patent | Extension state: AL LT LV MK RO SI |
A4 | Supplementary search report drawn up and despatched | Effective date: 20050509 |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1072109; Country of ref document: HK |
17Q | First examination report despatched | Effective date: 20050801 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
18R | Application refused | Effective date: 20081027 |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: WD; Ref document number: 1072109; Country of ref document: HK |