EP1254413A2 - System and method for searching a database

Info

Publication number
EP1254413A2
EP1254413A2 EP01902550A EP01902550A EP1254413A2 EP 1254413 A2 EP1254413 A2 EP 1254413A2 EP 01902550 A EP01902550 A EP 01902550A EP 01902550 A EP01902550 A EP 01902550A EP 1254413 A2 EP1254413 A2 EP 1254413A2
Authority
EP
European Patent Office
Prior art keywords
databases
database
information
query
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01902550A
Other languages
German (de)
English (en)
Inventor
Richard David Parratt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Navigateone Ltd
Original Assignee
Navigateone Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Navigateone Ltd
Publication of EP1254413A2

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • The present invention is concerned with systems and methods for retrieving material from computerised information systems. Such systems and methods are sometimes referred to as search engines, as they search through databases such as the Internet, Internet websites or collections of Internet websites.
  • The World-Wide-Web consists of a large number of separate databases or websites which can be accessed via a telecommunications link and then viewed or interrogated.
  • The different World-Wide-Web or Internet databases or websites are stored on computers which can be at any location, provided that they are connected to a telecommunications network.
  • Computers and/or databases within many organisations are also interconnected using the same techniques (so-called "Intranets").
  • Each computer or database in the network may contain information of interest to users of the network. This information may be structured in each computer or database in various ways and using various techniques.
  • The providers or compilers of each Internet website will select and implement a user interface for their website so as to allow users to gain access to information from the website or database.
  • Websites typically include a number of different possible information displays known as web pages. These are generated in response to instructions or requests posed by the user via the website user interface.
  • the information may be delivered as a formatted, human-readable page, or in a computer readable format designed for further processing by users' software applications.
  • For a user to retrieve a particular item of information from a collection of distributed websites such as the Internet or an Intranet, he or she needs to locate the website(s) or database(s) that hold the relevant information, learn to use the user interface of the website(s) and then use the interface to retrieve the web page, database section or display containing that information.
  • Web directories rely on a manual process to locate each relevant individual web page and place it in a directory.
  • a team of editors review websites and input page addresses into categories, the results being stored in a database. Users can then browse through the database. Categories are generally organised hierarchically, e.g. Companies/UK/Telecoms/British Telecom.
  • the directory usually also provides a full-text search facility which allows users to look for information using keywords, phrases or boolean logic.
  • the directory compilation process is expensive due to the large number of editors required. The expense and time involved in reviewing possibly relevant websites also limits the amount of information that can be indexed at a reasonable cost in a reasonable time-scale.
  • the directory process can be quite precise, but can miss out valuable information due to sites not having been found and indexed.
  • Examples of directories include Yahoo (www.yahoo.com), Looksmart (www.looksmart.com) and dmoz Open Directory (www.dmoz.org).
  • Text search engines use a "web crawler" to locate content from the World-Wide-Web. This is a computer program that, starting with a number of initial website or web page addresses with information of relevance to a particular query or type of query, follows the links between web pages to try to locate all linked pages on the World-Wide-Web. Most text search engines also allow web users and website owners to submit pages for indexing.
  • All data located by the web crawler is stored in a searchable database.
  • This database can then be searched or interrogated.
  • the database typically allows full text searching using keywords, phrases or boolean logic.
  • the searched text stored in the database can include the web page title, the full text of the website and "metadata" which is not displayed but provided by the web-site or database creator to aid search engines.
  • Text search engines provide coverage of every web page that is immediately accessible to the "crawler" used by the engine to gather content. This allows a text search to provide a high level of completeness, at the expense of results that may be irrelevant. Text search engines are intrinsically unable to locate information within websites that provide a database of content on multiple objects, such as newspaper archives, stock exchange quote service, airline booking services, etc. The web crawler will typically locate the front page of these sites and be unable to advance beyond the barrier of the user interface used by each particular website or database - preventing such information from being effectively indexed.
  • Natural language processing is used to match the results generated by the search against the initial request entered by the user and obtain a statistical measure of the relevance of information to the user's query.
  • Weighting is given to words in the database according to frequency of appearance, proximity and nearness to the start of the document.
  • Document comparison techniques are used, e.g. by allowing users to select a link and look for similar documents. Such systems typically use statistical matching techniques to pick up possibly similar documents. All of these techniques are limited when locating objects such as companies or places where the context is not understood by the search algorithm.
  • One system (Ask Jeeves - www.askjeeves.co.uk, or www.ask.com) allows users to enter a query in natural language. This is then statistically matched against a database of potential questions indexed against possibly relevant websites. The results are then returned as a list of those websites indexed in the database together with the statistically relevant potential questions.
  • The database of websites and questions is typically maintained manually. Only questions similar to those held in the database are likely to receive a relevant result.
  • Meta-search engines (such as Dogpile, Copernicus, Sherlock) allow users to enter a search request which will then be presented to multiple search engines. This allows users to get the benefit of multiple technologies in finding information. Meta-search techniques tend to return a large amount of information which needs to be manually reviewed by the user.
  • The present invention in a first aspect provides a system for searching a distributed collection of databases comprising a number of databases connected to each other by a communications network, the system including: query entry means for entering a request for information on a subject, object or matter, or a group of subjects, objects or matters; a first memory storing index entries, each index entry including a portion representing a subject, object or matter (or a group of subjects, objects or matters) on which information might be sought and one or more location entries indicating which of the databases may contain information on the respective subject, object or matter or group of subjects, objects or matters; and a second memory storing database interrogation modules, routines or sub-routines for converting a request for information received by the data entry means into a set of appropriate instructions for each of the databases.
  • The present invention in a second aspect provides a method for obtaining information from a collection of databases, comprising: entering a query for information; comparing the query to a database of descriptions of the content or type of context of the databases constituting the collection of databases; generating a list of potentially relevant databases from said comparison of the query to the database descriptions; and converting or translating the query into an enquiry signal or signals recognised or processable by each of the potentially relevant databases.
  • The present invention in a third aspect provides a computer program comprising program code means for performing the method set out above and in claims 12 to 17.
  • The present invention in a fourth aspect provides a computer program product comprising program code means stored on a computer readable medium for performing the method set out above and in claims 12 to 17.
  • Locating relevant information amongst the large amount of irrelevant content on the Internet which may contain words that match the name of the object. For instance, there are over 50,000 directly accessible web pages containing the words "British Telecom".
  • Locating information which is provided by an interactive website and is thus inaccessible to conventional web crawler based search engines.
  • Classifying results according to the type of content, e.g. news articles, charts, quotes, airline timetables, hotel information.
  • Figure 1 is a block diagram illustrating the processing of a query by a system embodying the invention
  • Figure 2 is a block diagram illustrating the architecture of a system embodying the invention
  • Figure 3 is a block diagram illustrating the database management of a system embodying the invention
  • Figure 4 is a diagrammatic illustration of the structure of a reference database used by the system of figures 1 to 4;
  • Figure 5 is a block diagram illustrating the components of a system embodying the invention;
  • Figures 6 and 7 are flow charts illustrating the processing of a query by the system of figures 1 to 5.
  • Preferred embodiments of the present invention are concerned with methods and/or systems for locating information relevant to a specific entity or matter held in a number of separate databases or websites or stored on a network of computers connected to a telecommunications system.
  • entity or matter on which information might be sought can be any real or abstract object or class of objects about which information exists in one or more databases.
  • The databases or websites may be on the public World-Wide-Web or on a private Intranet.
  • The term database when used in this document includes all possible stores of information, including websites and web pages.
  • a descriptions database of descriptions of databases is held on a computer or a number of computers.
  • This database contains descriptions of the type of information held in a number of databases or websites. These descriptions can be entered manually into the descriptions database by the system operator. Alternatively a website or database provider may publish its own description over the World-Wide-Web or over an Intranet in a manner such that this information will be automatically retrieved by the system and stored in the descriptions database.
  • a user accesses the system through a user interface which is used to present a query to the system.
  • the user interface prompts the user to identify the entity or matter they require information on and passes this information onto the system.
  • the system may allow one or more methods to be used to identify an entity or subject matter and present a query about such an entity or subject matter. Examples of these methods include: (a) the identification of companies by company name, symbols such as Reuters RIC codes, S & P codes or stock exchange ticker codes; (b) the identification of places or locations by postcode or latitude and longitude of a place.
  • the data structure identifying an entity or subject matter is described herein as a "Key".
  • the system will then compare the query presented by a user with the descriptions database and select those websites or databases in its descriptions database which are of possible relevance to the entity or subject matter the user has described in his or her query.
  • These selected possibly relevant databases or websites can then be filtered further and ordered according to a number of criteria: for example, a priority level set by the manager of the system, the national language used on the website, or the user's own preferences.
  • the system also includes a database access instructions database which translates queries entered into the system into instructions which are appropriate for interrogating or viewing each of the databases or websites covered by the system.
  • When a query is made by a user, it is compared to the entries in the descriptions database to determine the potentially relevant websites or databases and produce a list of these.
  • The query is then translated into instructions appropriate for these potentially relevant databases, and the user is invited to select those databases he or she wishes to check from the list of potentially relevant sources of information. Any databases, web pages or websites so selected are then interrogated using the appropriate instructions from the access instructions database.
  • Preferred embodiments of the invention produce a consolidated set of instructions which are passed to a user.
  • In a system for searching the Internet, these instructions may be passed to the user's Internet browser application together with a suitable software module or page description that can interpret them.
  • the user can then use a common interface to select from the various sites or databases listed by the system and retrieve the information they require from each one.
  • the user interface may also provide the ability to classify sites into logical categories appropriate for the field of the search being undertaken.
  • A query such as a request for information on a particular company is compared to an indexed directory (1) of company names and company symbols.
  • a symbol representing the company on which information is sought is generated and passed or communicated to a relevance filter or scope matching module (2) comprising a descriptions database (12) of descriptions of the databases covered and indexed by the system.
  • the query represented by the company symbol is then compared to the database descriptions (12) and a list of potentially relevant databases or websites produced and passed to a symbol, key or enquiry mapping module which also receives details of the symbol representing the query posed to the system by a user.
  • the symbol, key or enquiry mapping module then converts the symbol into a signal or signals representing that symbol or company and capable of being recognised and processed by each of the potentially relevant databases.
  • A user selecting from the list of potentially relevant databases produced by the relevance filter or scope matching module can then interrogate the selected database(s) using the signal(s) created by the symbol or enquiry mapping module and the URL(s) (uniform resource locators) stored in the system and identifying the location of the various databases or websites.
  • the databases covered by the system can be either public (e.g. Internet websites) (4) or private (a company Intranet) (5).
  • Figure 2 illustrates a system covering private and public databases or sources of information.
  • the heart of the system is a reference database module (6) including the directory of symbols, the database descriptions and the access instructions.
  • the database module may be implemented using any relational database (such as Microsoft SQL Server, Sybase or Oracle) or similar technology in the prior art.
  • Information on the database derives from a manual analysis of two kinds of information; the set of objects the system is locating information on and the set of private and public databases containing the information.
  • The directory of objects (1) is, for example, a list of companies with a unique stock symbol for each company or, in the case of the "travel" domain, a list of places with their postcode and geographical position.
  • Figure 3 shows one possible embodiment of the database creation process.
  • the reference database (6) consists of a collection of tables. Each table is in turn a collection of rows and each row a collection of columns. A column may hold any numeric or textual value.
  • the database is capable of efficiently locating any row based on the values in its columns.
  • Information on the set of objects can be manually entered into the database by a data input operator (7) using an input form (8). This will require the operator to enter all the information (name, symbol, alternative codes) for each object.
  • all or part of the directory of objects can be imported from another source such as a commercially available dataset (9) using a data translation module (16).
  • Data translation packages are supplied with most commercially available relational database systems.
  • the information is stored in a relational database table called the object directory (1) in the reference database (6).
  • Each private and public database known to the system is manually analysed to understand its description, purpose and structure. This information forms a set of database definitions (11). In a possible embodiment of the system these are written in the Extensible Markup Language (XML). These can then be translated into the database using a commercially available XML parser (13). The result is a database description table (12) in the reference database (6).
  • the Object directory or Object_Table (1) contains descriptions of all the entities the system understands. For example, in the financial domain this is a relational database table of company symbols, while in the travel domain this is a gazetteer of locations. Columns in this table relate directly to fields in the keys understood and used within the system. Rows in this table correspond to entities understood by the system.
  • The descriptions database or Site_Table (12) contains descriptions of each website indexed by the system. Each row in this relational database table corresponds to a website. Each website has a unique identifier (Site_Id) within this table. This identifier takes the form <name>.<name>.<name>... and can be of arbitrary length subject to practical limitations of the underlying technology. This mechanism allows websites to be defined by different organisations without the risk of name clashes.
  • the columns of the site relational database table contain description information. This information is dependent on the domain being indexed. It consists of such fields as a full name and description to identify the site to the user, script modules used by the symbol, key or enquiry Mapping and
  • Scope Tables (15, 16) are defined for each indexed website.
  • the names of the scope tables are held in the fields Site_Scope and Site_Scope_Wild in the site description in relational database table (12).
  • the scope tables define the set of keys which the indexed website provides information on.
  • Site_Scope_Wild points to the wildcard scope relational database table (15).
  • This table holds keys with embedded wildcards, which can match several possible symbols.
  • a wildcard takes the form of a "*" character in the text of the field, which matches any arbitrary string of characters. Keys are held in the columns of the table, with each field of the key stored in a column named to correspond with the name of the field.
  • Site_Scope points to the non- wildcard scope relational database table (16).
  • This table holds discrete keys (without wildcards). Keys are held in the columns of the table, with each field of the key stored in a column named to correspond with the name of the field.
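The wildcard rule described above for the scope tables can be sketched in a few lines of Python. This is only an illustrative sketch: the field names Company and Country, the helper names wildcard_match and key_in_scope, and the sample row are assumptions, and the sketch assumes that "*" may also match an empty string, which the text does not state.

```python
import re


def wildcard_match(pattern: str, value: str) -> bool:
    """Return True if `value` matches `pattern`, where '*' matches any string."""
    # Escape every literal part so that only '*' acts as a wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, value) is not None


def key_in_scope(key: dict, scope_rows: list) -> bool:
    """A key is in scope if some row matches it on every column of that row."""
    return any(
        all(wildcard_match(pat, key.get(field, "")) for field, pat in row.items())
        for row in scope_rows
    )


# Hypothetical wildcard scope table: one row, one column per key field.
scope_wild = [{"Company": "*", "Country": "GB"}]
print(key_in_scope({"Company": "BT-A", "Country": "GB"}, scope_wild))
```

A discrete (non-wildcard) scope table can be tested with the same function, since a pattern without "*" only matches itself.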
  • Mapping tables (17, 18) are defined for each indexed website. The names of mapping tables are held in fields Site_Symbol_Map and Symbol_Map_Wild in the site description in relational database table (12). These tables define a mapping between the keys defined for the domain (object directory or table 1) and the keys used by the individual websites, which may use a different symbology. For instance, in the financial domain a website may use the alphanumeric ISIN system to identify financial instruments. In this case, the table referenced by Site_Symbol_Map will define a mapping from the symbology used for the financial domain into ISIN codes.
  • Site_Symbol_Map points to the non-wildcard mapping relational database table (18). This holds a list of discrete keys in the form used by the domain and their counterparts in the form used by the individual websites. Keys are held in the columns of the table, with each field of the key stored in a column named to correspond with the name of the field.
  • Site_Symbol_Map_Wild points to the wildcard mapping relational database table (17). This holds keys with embedded wildcards, which can match several possible source keys.
  • The rules used to map wildcard symbols are described in Table 1. (See wildcard mapping rules on page 23). Keys are held in the columns of the relational database table, with each field of the key stored in a column named to correspond with the name of the field.
  • The names of the field map tables (19) are defined in the field Site_Field_Map.
  • This relational database table has columns SrcField and DstField. These columns allow field names in the source key (user input or symbol) to be translated to field names in the destination key (enquiry signal in format recognised by selected database or website).
  • Figure 5 presents a block diagram of the information retrieval system of the present invention, where a user (20) makes a request for information on an entity and receives instructions on how to access that entity that are capable of interpretation by the user's browser software.
  • the user inputs a query (21) to the system.
  • the input of the query may be implemented using an HTML form or other technique.
  • The query consists of one or more text strings that identify an entity or set of entities within the system's coverage. For example, a user wanting information on the company British Telecom plc may input the text string "BRITISH TELECOM" as his or her query.
  • The query (21) is interpreted by the key text input module (22). This converts the query to a denormal key (23).
  • This key contains all the information in the user's request in a standardised form. It may reference the requested entity using any valid method understood by the key normalisation module (24). Continuing the example of British Telecom plc, the user's input text string "BRITISH TELECOM" is stored in Name, a field in the key structure. If the user entered a symbol such as BT-A.GB, this would be split and stored in the fields shown in the following table:
  • the key normalisation module (24) is responsible for converting a denormal key (23) to a normalised key (25).
  • A denormal key may reference zero or more entities. For example, a query about companies entered as BRITISH would pick up all entities including the word BRITISH (a large number of possibilities, including BRITISH AEROSPACE, BRITISH TELECOM and ASSOCIATED BRITISH FOODS) and would therefore reference more than one entity.
  • A normal key must reference exactly one entity.
  • If the denormal key (23) references either zero or more than one entity, the user is sent a message (26) to indicate that either no information was found or that their query was ambiguous.
  • The input BRITISH might result in the following prompt:
  • The user is presented with all possible keys and asked to select one. Continuing the example above, if the user wishes to access information on British Telecom plc, he or she will select the relevant symbol (BT-A.GB).
  • The unambiguous key or symbol BT-A.GB is re-presented to the key text input module (22) for further processing.
  • The key normalisation module (24) interacts with the reference database (6) to determine the list of potentially relevant databases or sources of information (websites, web pages).
  • a user will enter a query as a text string.
  • the system takes each text string and stores it in the denormal key as a name / value pair.
  • An example of this for enquiries about places might be: User entered "48° 52' N" into string "Latitude" and "5° 7' E" into string
  • Another embodiment of this invention specific to locating financial information defines a key syntax where the key takes the form <company>.<country> where
  • Denormal or unnormalised keys must be normalised so as to be capable of further processing by the system. This is achieved by the key normalisation module. This module normalises a denormal key and produces a normalised key. This process is performed by determining all the possible key entries in the symbol table stored in the reference database that match the specified key.
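As a rough sketch of this normalisation step, the function below looks up every symbol table row that matches a denormal key and only returns a normalised key when exactly one row matches. The function name normalise, the field names Name and Symbol, and the case-insensitive substring comparison are assumptions made for illustration; only the symbol BT-A.GB and the company names come from the text, and the other symbols are made-up placeholders.

```python
def normalise(denormal_key: dict, symbol_table: list):
    """Return (normalised_key, candidates).

    normalised_key is the single matching row, or None when the key matches
    zero rows (nothing found) or several rows (ambiguous).
    """
    candidates = [
        row
        for row in symbol_table
        if all(
            text.upper() in str(row.get(field, "")).upper()
            for field, text in denormal_key.items()
        )
    ]
    normalised_key = candidates[0] if len(candidates) == 1 else None
    return normalised_key, candidates


# Illustrative symbol table; only BT-A.GB comes from the text, the other
# symbols are made-up placeholders.
SYMBOLS = [
    {"Name": "BRITISH TELECOM", "Symbol": "BT-A.GB"},
    {"Name": "BRITISH AEROSPACE", "Symbol": "PLACEHOLDER-1"},
    {"Name": "ASSOCIATED BRITISH FOODS", "Symbol": "PLACEHOLDER-2"},
]

key, options = normalise({"Name": "BRITISH"}, SYMBOLS)
if key is None:
    # Ambiguous or empty: the caller would prompt the user with `options`.
    print([row["Symbol"] for row in options])
```

Returning the candidate list alongside the result lets the caller distinguish "nothing found" (empty list) from "ambiguous" (several rows) when building the message to the user.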
  • The system may store the symbol table as a table in a relational database, with the columns corresponding to key fields (i.e. characteristics of the entity, query or object) and the rows to valid symbols (i.e. company or country symbols for requests on company information). For example, the symbol table may be of the following format:
  • the reference database includes a listing of symbols indexed together with the possibly relevant sources of information.
  • Information is held in the database in the form of two dimensional indexed tables with an arbitrary number of rows and columns.
  • the index of possibly relevant sources of information might include the following websites:
  • The scope matching module (2) is responsible for determining the range of websites referenced by the normalised key (25) (e.g. BT-A.GB). This module uses the scope definitions (15, 16) from the database (2,6,12) to determine which websites (e.g. websites containing information on telecommunications companies, British companies, news items and/or the companies' own websites) are potentially relevant to the user's request.
  • the result produced by this module is a list of filtered websites (27) such as that shown above.
  • the reference database (6) holds information defining the field or type of possible enquiry (e.g. company or financial information) and describing all the referenced websites. This information may be supplied by an online or offline process through manual entry or automatic information transfer (28).
  • the scope matching module then takes a normalised key and determines the websites that contain potentially relevant information for the entity described by that key.
  • the reference database includes a descriptions database of each website, database or information source covered by the system. This might contain the following information for each such source of information.
  • a non-wildcard relational database table (e.g. 15, 16) has a column corresponding to each key field and a row corresponding to each entity the website has information on.
  • KBWW Scope might be used to describe stocks handled by the Robertson Stephens website and have entries such as:
  • a wildcard relational database table also has a column corresponding to each key field. Each row also has a description of each entity in the form of a list of name / value pairs. However, the values may contain a textual wildcard indicated by the character "*". In testing a key against the value in the relational database table, the module will allow any character or string of characters to be matched by the character "*".
  • each key in the table is tested sequentially.
  • the row data is matched against each key field. If a match occurs, then the key can be said to be matched.
  • the matching process is performed on each website or database table in turn.
  • A pre-defined priority value is then retrieved from the reference database for each source of information (website or database).
  • The list of returned websites is sorted according to this value. Websites with a priority value of -1 are excluded from the result.
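The selection and ordering steps above might look like the following sketch. The site identifiers, priority values and the function name select_sites are hypothetical, scope is simplified to a set of discrete keys per site, and the sort direction (higher priority first) is an assumption, since the text only says the list is sorted by the priority value.

```python
def select_sites(key: str, sites: list) -> list:
    """Return the Site_Ids relevant to `key`, ordered by priority."""
    # Keep sites whose scope contains the key; a priority of -1 excludes a site.
    hits = [s for s in sites if key in s["scope"] and s["priority"] != -1]
    # Assumption: a larger priority value means the site is listed earlier.
    hits.sort(key=lambda s: s["priority"], reverse=True)
    return [s["site_id"] for s in hits]


# Hypothetical site descriptions, for illustration only.
SITES = [
    {"site_id": "example.news", "priority": 5, "scope": {"BT-A.GB"}},
    {"site_id": "example.quotes", "priority": 9, "scope": {"BT-A.GB"}},
    {"site_id": "example.hidden", "priority": -1, "scope": {"BT-A.GB"}},
    {"site_id": "example.other", "priority": 7, "scope": {"XYZ.US"}},
]

print(select_sites("BT-A.GB", SITES))
```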
  • the key mapping module (3) is responsible for converting the normalised key (25) into a form understood by each website selected by the scope matching module.
  • the key mapping module uses key mapping tables (1056) in the reference database (1050) to perform key translations.
  • the single query inputted by the user may thereby be translated into the different instructions appropriate to interrogation of the different selected relevant websites or databases.
  • the result produced by this module is a list of site keys (1062); a tailored query for each database or website.
  • the key mapping module (1060) is responsible for converting the normalised key (1032) into a form understood by each website selected by the scope matching module.
  • the key mapping module takes a normalised key or input and maps it to a form recognised by each website.
  • the module takes as inputs the normalised key and a list of website definitions.
  • Each key may be mapped using a table-driven or algorithmic approach.
  • the type of mapping is defined in the website definition, held in the reference database.
  • the website definition will reference one or more tables in the database, which define mappings between the standard keys and the site specific keys.
  • the following relational database tables are defined for use in a table-driven mapping a) Site_Symbol_Map (18)
  • A flowchart illustrating operation of the key mapping module is shown in Figure 6.
  • Step 31 decides whether a table-driven or algorithmic key mapping is being used. This information is provided in the website definition in the reference database.
  • If a table-driven mapping is used, step 32 accesses the relational database tables in the description.
  • In step 33, the Site_Symbol_Map relational database table referenced by the description is indexed.
  • For a key with fields named N_0 to N_m and containing values V_0 to V_m, the module will perform an SQL query of the form:
  • When step 35 is executed, each destination column with a name of the form To_xxx (corresponding to a source column From_xxx) is copied to the key.
  • Step 36 gets the first entry in the wildcard relational database table.
  • In step 37, a wildcard comparison is performed with each column value using the rules in Table 1. If a match occurs, step 38 maps the values to the result.
  • Step 39 moves to the next row in the Site_Symbol_Map_Wild relational database table.
  • Step 40 checks whether the end of the table has been reached. If so, no match has occurred: terminal step 41 is reached and the website is not processed further. Otherwise, step 37 is executed again.
  • Step 42 allows for field names to be remapped. If a relational database table Site_Field_Map is specified, this table is used to change the names of key fields in step 43. The mapped key is available at step 44.
  • For an algorithmic mapping, a procedural script is defined to translate the key for a website.
  • The text of this script, together with an indication of the programming language, is provided in the website description.
  • Script execution is provided by the basic operating platform on which the system is running (such as Microsoft Windows NT) or by other methods in the prior art.
  • The script execution facility is required to support the calling of methods on objects known to the script.
  • Step 45 retrieves the script from the website definition in the reference database.
  • The SetKey method of this script is called to set the key value.
  • In step 47, the GetDestKey method is called to retrieve the key value. These methods are to be defined and programmed by the person creating the website definition.
  • Step 42 then performs any field mapping in the same way as for a table-driven mapping.
  • The key mapping module uses key mapping tables (17, 18) in the reference database (6, 12) to perform key translations.
  • The single query entered by the user may thereby be translated into the different instructions appropriate to interrogating each of the selected relevant websites or databases.
  • The result produced by this module is a list of site keys (30): a tailored query for each database or website.
  • The key mapping module produces a list of website access instructions (50) which can be interpreted by the database access or interrogation module (51) to interrogate selected websites.
  • The database access or interrogation module (51) is implemented as mobile code which executes in the user's database or Internet browser program. This module is responsible for interpreting a list of website access instructions (50) and providing the user with an interface for selecting the websites from which they wish to display information. The module is then responsible for sending instructions to the websites in the form of uniform resource locators (URLs) to cause the websites to display data on the entity that the user requested.
  • The database access module takes a normalised and mapped key and a site definition as inputs. These are used to create a set of instructions to access information on the entity identified in the query made by the user, as held in the websites and/or databases covered by the system. Website access instructions are created using one of two methods: template or algorithmic.
  • The template method provides a template into which key fields are substituted.
  • An appropriate access script is a data structure containing at least the fields defined in Table 2.
  • ObjText (array of strings): Each element contains displayable data if ObjData is TRUE, or the text of a URL otherwise. This text may contain a field name delimited by escape characters; the field name is replaced by the corresponding field value.
  • ObjFramed (array of boolean flags): Each element is TRUE if the object referenced by ObjText can be loaded in a frame (a sub-window in a web browser); otherwise the object will be loaded in a new browser window.
  • ObjData (array of boolean flags): Each element is TRUE if the object is a page of data for display; otherwise the object is a URL.
  • WaitLoad (array of boolean flags): Each element is TRUE if the browser should wait for the page to load before progressing; otherwise the browser should proceed after Delay.
  • Delay (array of numbers): Each element represents the number of seconds to delay before loading the next frame.
  • The template is retrieved from the website description in the reference database. Any delimited field names in the ObjText array of the template are then replaced by the values of the corresponding fields in the normalised and mapped key.
  • The result is a set of instructions which can be used to access the website.
  • The algorithmic method generates an access script by calling procedures in a scripting language. The text of this script, together with an indication of the programming language, is provided in the website description. Script execution is provided by the basic operating platform on which the system is running (such as Microsoft Windows NT) or by other methods in the prior art. The script execution facility is required to support the calling of methods on objects known to the script.
  • A method, SetKey, is called on the script to set the mapped key fields.
  • A further method, GetAccessScript, is used to retrieve the complete access script structure.
  • This module executes in the user's browser program in a typical embodiment of this invention.
  • The user is presented with an interface for choosing which of the websites selected by the system during the scope matching process to display information from.
  • The database access or interrogation module provides the ability to "drill down" into any website or database and thus simulate a user's manual interaction with the site to access a particular page.
  • The input to this module is an access script as defined in Table 2.
  • A flowchart illustrating the operation of this module is shown in Fig. 7.
  • The module iterates through the arrays in the access script.
  • The iterator variable N is set to 0. This indexes the access script arrays described in Table 2. These arrays are assumed to be zero-based.
  • Steps 56, 57, 58 decide how the text in ObjText is to be displayed.
  • ObjText may hold HTML for immediate display or a URL to be loaded into a browser, according to the state of ObjData.
  • Data or URLs may be displayed in a new browser window or in a browser frame.
  • Step 59 displays HTML as immediate data in a browser frame.
  • Step 60 displays HTML as immediate data in a new window.
  • Step 61 loads a URL into a new browser frame.
  • Step 62 loads a URL into a new browser window.
  • Step 63 decides whether to wait for a page to load in the browser. If so, the script will wait in step 64 before loading the next page. Step 65 implements a delay before loading the next page.
  • The iterator N is incremented in step 66.
  • In step 67, N is tested against the size of the script arrays. If the end has been reached, terminal step 68 is executed.
  • Otherwise, the process is repeated from step 56 until all the script instructions have been processed.
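
The template method and the Fig. 7 loop described above can be sketched together in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the `{name}` delimiter syntax, the `AccessScript` class, the `fill_template` and `plan_actions` helpers, the action labels and the example.com URL are all hypothetical, and ObjData is read as TRUE meaning displayable data, following the ObjData field description. Instead of driving a real browser, the loop only records what would be done for each array element.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AccessScript:
    """Parallel, zero-based arrays mirroring the Table 2 fields."""
    obj_text: List[str]     # ObjText: URL text or displayable data
    obj_framed: List[bool]  # ObjFramed: True = frame, False = new window
    obj_data: List[bool]    # ObjData: True = displayable data (assumed reading)
    wait_load: List[bool]   # WaitLoad: True = wait for the page to load
    delay: List[float]      # Delay: seconds to pause when WaitLoad is False


# Hypothetical delimiter syntax; the source only says "escape characters".
FIELD = re.compile(r"\{(\w+)\}")


def fill_template(template: AccessScript, key: Dict[str, str]) -> AccessScript:
    """Template method: replace each delimited field name with its key value."""
    def substitute(text: str) -> str:
        # A field missing from the key raises KeyError instead of passing silently.
        return FIELD.sub(lambda m: key[m.group(1)], text)

    return AccessScript(
        obj_text=[substitute(t) for t in template.obj_text],
        obj_framed=list(template.obj_framed),
        obj_data=list(template.obj_data),
        wait_load=list(template.wait_load),
        delay=list(template.delay),
    )


def plan_actions(script: AccessScript) -> List[Tuple[str, str, str, str]]:
    """Walk the arrays as in the Fig. 7 loop, recording each browser action."""
    actions = []
    n = 0  # iterator N, starting at zero
    while n < len(script.obj_text):
        kind = "show_html" if script.obj_data[n] else "load_url"
        target = "frame" if script.obj_framed[n] else "new_window"
        pacing = "wait_for_load" if script.wait_load[n] else f"delay:{script.delay[n]}"
        actions.append((kind, target, script.obj_text[n], pacing))
        n += 1
    return actions


# Hypothetical website template and mapped key.
template = AccessScript(
    obj_text=["<p>Loading {symbol}...</p>", "https://example.com/quote?s={symbol}"],
    obj_framed=[True, False],
    obj_data=[True, False],
    wait_load=[False, True],
    delay=[2, 0],
)
script = fill_template(template, {"symbol": "ABC"})
actions = plan_actions(script)
for action in actions:
    print(action)
```

Keeping the substitution separate from the interpretation loop mirrors the split in the text: the access script is built once from the website description, then handed to the module that runs in the browser.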

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a system for searching a distributed set of databases (4, 5) consisting of a plurality of databases linked to one another by a communications network system. The system comprises: query input means (7, 8) for entering a request for information on a subject, object or question, or on a set of subjects, objects or questions; a first memory (2, 6, 12) that stores index entries, each index entry containing, on the one hand, a portion representing a subject, or a set of subjects, objects or questions, about which information may be sought and, on the other hand, one or more location entries indicating which of the databases is likely to contain that information; and a second memory (3, 6) that stores database interrogation modules, namely routines or subroutines for converting a request for information received by the query input means into a set of instructions appropriate to each of the databases.
EP01902550A 2000-02-03 2001-02-02 System and method for searching a database Withdrawn EP1254413A2 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17993400P 2000-02-03 2000-02-03
US179934P 2000-02-03
PCT/GB2001/000446 WO2001057725A2 (fr) 2000-02-03 2001-02-02 System and method for searching a database

Publications (1)

Publication Number Publication Date
EP1254413A2 true EP1254413A2 (fr) 2002-11-06

Family

ID=22658589

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01902550A Withdrawn EP1254413A2 (fr) 2000-02-03 2001-02-02 System and method for searching a database

Country Status (3)

Country Link
EP (1) EP1254413A2 (fr)
AU (1) AU2001230402A1 (fr)
WO (1) WO2001057725A2 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1611534A4 (fr) 2003-04-04 2010-02-03 Yahoo Inc Method for producing search results comprising searching by subdomain optimisation algorithms and providing sponsored results by subdomain
US7752285B2 (en) 2007-09-17 2010-07-06 Yahoo! Inc. Shortcut sets for controlled environments
CN113127490B (zh) 2021-04-23 2023-02-24 山东英信计算机技术有限公司 Key name generation method and apparatus, and computer-readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0829811A1 (fr) * 1996-09-11 1998-03-18 Nippon Telegraph And Telephone Corporation Method and system for information retrieval
US6085186A (en) * 1996-09-20 2000-07-04 Netbot, Inc. Method and system using information written in a wrapper description language to execute query on a network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0157725A2 *

Also Published As

Publication number Publication date
WO2001057725A2 (fr) 2001-08-09
WO2001057725A3 (fr) 2002-06-13
AU2001230402A1 (en) 2001-08-14


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020827

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20040901