MXPA97005582A - Methods and/or systems for accessing information

Methods and/or systems for accessing information

Info

Publication number
MXPA97005582A
Authority
MX
Mexico
Prior art keywords
user
information
keywords
agent
jasper
Prior art date
Application number
MXPA/A/1997/005582A
Other languages
Spanish (es)
Other versions
MX9705582A (en
Inventor
John Davies Nicholas
Weeks Richard
Original Assignee
British Telecommunications Public Limited Company
John Davies Nicholas
Weeks Richard
Priority date
Filing date
Publication date
Priority claimed from PCT/GB1996/000132 (WO1996023265A1)
Application filed by British Telecommunications Public Limited Company, John Davies Nicholas, Weeks Richard
Publication of MX9705582A
Publication of MXPA97005582A

Abstract

A system for accessing information stored in a distributed information database provides a community of intelligent software agents (105). Each agent (105) can be built as an extension of a known viewer (400) for a distributed information system such as the Internet World Wide Web (W3). The agent (105) is effectively integrated with the viewer (400) and can extract pages by means of the viewer (400) for storage in an intelligent page store. The text of each page is summarized and stored with additional information, optionally selected by the user. The agent-based access system uses sets of keywords to locate information of interest to a user, together with user profiles, so that pages stored by one user can be notified to other users whose profiles indicate potential interest. Keyword sets can be extended by using a thesaurus.

Description

METHODS AND/OR SYSTEMS FOR ACCESSING INFORMATION
The present invention relates to methods and/or systems for accessing information by means of a communication system.
The Internet is a known communication system based on a plurality of separate communication networks connected together. It provides a rich source of information from many different providers, but this very richness makes it difficult to access specific information, because there is no central control or supervision.
In 1982, the volume of scientific, corporate and technical information was doubling every 5 years. By 1988 it was doubling every 2.2 years, and by 1992 every 1.6 years. With the expansion of the Internet and other networks, the rate of increase will continue to grow. The key to the viability of such networks will be the ability to manage information and to provide users with the information they want, when they want it.
According to one embodiment of the present invention, there is provided an access system for accessing information that is stored in a distributed manner and is accessible by means of a communication network. The access system comprises a plurality of software agents, such that a user can access the information over the network through an agent. Each agent is provided with an intelligent page store, for storing summaries, together with associated information, of pieces of information accessible through the network, and with multiple keyword stores for storing sets of keywords, such that the agent can identify information for which an entry is made in the intelligent page store by applying either or both of a first and a second set of keywords to entries in that page store.
In a useful configuration, the first and second sets of keywords can be associated with different respective users. An agent can then be triggered, by different circumstances for different users, to apply sets of keywords to pages of information in (or being added to) the page store.
For example, an agent may apply a first set of keywords in the course of a storage request from a first user. The agent can then apply one or more further sets of keywords in order to notify one or more other users of the entry.
Preferably, a group of agents will share an intelligent page store, although there may be multiple intelligent page stores in, or available to, the access system as a whole. Sharing a page store gives an agent a way to monitor new page store entries so that potentially interested users can be notified.
Embodiments of the present invention provide a distributed system of intelligent software agents that can be used to carry out information tasks, for example on the World Wide Web (W3), on behalf of a user or a community of users. That is, software agents are used to store, retrieve, summarize and inform other agents about information found on W3.
Networked systems such as W3 are known and are built according to known architectures, such as the client/server architecture, so no further details are given here.
The present invention is not concerned with providing yet another tool for searching systems such as W3: many of these already exist, and more are frequently added, with increasing coverage of the Web and increasingly sophisticated search engines. Instead, embodiments of the present invention address the following problem: having found useful information on W3, how can it be stored for easy retrieval, and how can other users who may be interested in the information be identified and informed?
Software agents provide a known approach to dealing with distributed, rather than centralized, computer systems. Each agent generally comprises the functionality to perform a task or tasks on behalf of an entity (human or machine-based) in an autonomous way, together with local information, or means of accessing information, to support the task or tasks. In the present specification, agents used for storing or retrieving information in embodiments of the present invention are referred to for simplicity as "Jasper agents", a name derived from the acronym "Joint Access to Stored Pages with Easy Retrieval".
Given the vast amount of information available on W3, it is preferable to avoid copying information from its original location to a local server. Indeed, it could be argued that such an approach is contrary to the whole ethos of the Web. Rather than copying information, therefore, Jasper agents store only relevant "meta-information". As will be seen later, this meta-information can be thought of as sitting at a level above the information itself: it is information about the information rather than the information as such. It can include, for example, keywords, a summary, the document title, the universal resource locator (URL) and the date and time of access. This meta-information is then used to provide a pointer to, or to "index on", the actual information when a retrieval request is made.
Most well-known W3 clients (Mosaic, Netscape, etc.) provide some means of storing pages of interest to the user. Typically, this is done by allowing the user to create a (possibly hierarchical) menu of names associated with particular URLs. Although this menu mechanism is useful, it quickly becomes unmanageable when a reasonably large number of W3 pages are involved. Essentially, the representation provided is not rich enough to capture everything that may be required about the stored information: the user can only supply a string naming the page. Besides the loss of useful meta-information such as the date of access of the page, a single phrase (the name) may not be enough to index a page accurately in all contexts.
Consider as a simple example information about the use of knowledge-based systems (KBS) in information retrieval of pharmacological data: in different contexts, it may be KBS, information retrieval or pharmacology that is of interest. Unless a name is carefully chosen to mention all three aspects, the information will be lost in one or more of its useful contexts.
This problem is analogous to the problem of finding files containing desired information in a Unix (or other) file system, discussed by Jones, W.P., in "On the applied use of human memory models: the Memory Extender personal filing system", Int. J. Man-Machine Studies, 25, 191-228, 1986. In most filing systems, however, there is at least the facility to sort files by creation date.
The solution to this problem adopted in embodiments of the present invention is to allow the user to access information through a much richer set of meta-information. How Jasper agents achieve this, and how the resulting meta-information is exploited, is explained later.
An information access system according to an embodiment of the present invention will now be described, by way of example only, with reference to the accompanying figures, in which:
Figure 1 shows an information access system incorporating a Jasper agent system;
Figure 2 shows, in schematic form, a storage process offered by the access system;
Figure 3 shows the structure of an intelligent page store for use in the storage process of Figure 2;
Figure 4 shows, in schematic form, the retrieval processes offered by the access system;
Figure 5 shows a flowchart for the storage process of Figure 2;
Figures 6, 7 and 8 show flowcharts for three information retrieval processes using a Jasper access system; and
Figure 9 shows a keyword network generated using a clustering technique, for use in extending and/or applying user profiles in a Jasper system.
Referring to Figure 1, an information access system according to one embodiment of the present invention can be built into a known form of information retrieval architecture, such as a client-server architecture connected to the Internet. In more detail, a customer, such as an international company, may have multiple users equipped with personal computers or workstations 405.
These may be connected, through a World Wide Web (WWW) viewer 400 in the customer's client context, to the customer's WWW file server 410. The Jasper agent 105, effectively an extension of the viewer 400, may actually be resident on the WWW file server 410. The customer's WWW file server 410 is connected to the Internet in a known manner, for example through the customer's own network 415 and a router 420. Service providers' file servers 425 can then be accessed through the Internet, again via routers. Also resident on, or accessible by, the customer's file server 410 are a text summarizing tool 120 and two data stores: one holding user profiles (the profile store 430) and the other (the intelligent page store 100) holding principally the meta-information for a collection of documents. In a Jasper agent-based system, the agent 105 itself can be built as an extension of a known viewer such as Netscape. The agent 105 is effectively integrated with the viewer 400, which might be provided by Netscape, Mosaic or the like, and can extract W3 pages from the viewer 400.
As described above, in the client-server architecture the text summarizer 120 and the user profiles both reside on the customer's file server 410, where the Jasper agent 105 is resident. However, the Jasper agent 105 could alternatively reside in the customer's client context. A Jasper agent, being a software agent, can be broadly described as a software entity that incorporates functionality for performing a task or tasks on behalf of a user, together with local data, or access to local data, to support the task or tasks. The relevant tasks in a Jasper system, one or more of which can be performed by a Jasper agent, are described below. The local data will usually include data from the intelligent page store 100 and the profile store 430. The functionality provided by a Jasper agent will generally include means for applying a text summarizing tool and storing the results, means for accessing or reading, and updating, at least one user profile, means for comparing sets of keywords with other sets of keywords or with meta-information, and means for triggering alert messages to users.
In preferred embodiments, a Jasper agent will also be provided with means for monitoring user inputs in order to select a set of keywords to be compared. In a further preferred embodiment, a Jasper agent is provided with means for applying an algorithm to a first and a second set of keywords to generate a measure of similarity between them. Depending on the measure of similarity, either of the first and second sets of keywords can then be updated proactively by the Jasper agent, or the result of comparing the first or second set of keywords with a third set of keywords, or with meta-information, can be modified. Embodiments of the present invention can be built on different software systems. It may be convenient, for example, to apply object-oriented techniques. In the embodiments described below, however, the server is Unix-based and capable of running ConText, a known natural language processing system offered by Oracle Corporation, and a W3 viewer. The system can be implemented generally in "C", although the client can potentially be any machine that can support a W3 viewer.
The following sections discuss the facilities that Jasper agents offer the user for managing information. These fall into two categories: storage and retrieval.
Storage
Figures 2 and 5 show the actions taken when a Jasper agent 105 stores information in an intelligent page store (IPS) 100. The user 110 first finds a W3 page of sufficient interest to be stored by the Jasper system in an IPS 100 associated with that user (STEP 501). The user 110 then transmits a "store" request to the Jasper agent 105, resident on the customer's WWW file server 410, via a menu option on the user's selected W3 client 115 (Mosaic and Netscape versions are currently available on all platforms) (STEP 502). The Jasper agent 105 then invites the user 110 to supply an accompanying annotation, which is also to be stored (STEP 503). Typically, this might be the reason the page is of interest to the user, and it can be very useful to other users in deciding which pages retrieved from the IPS 100 to visit. (The sharing of information is discussed further below.)
The Jasper agent 105 then extracts the source text of the page in question, again via the W3 client 115 on W3 (STEP 504). The source text is provided in a "HyperText" format, and the Jasper agent 105 first strips out the HyperText Markup Language (HTML) tags (STEP 505). The Jasper agent 105 then sends the text to a text summarizer such as "ConText" 120 (STEP 506). ConText 120 first parses a document to determine the syntactic structure of each sentence (STEP 507). The ConText parser is robust and able to deal with a wide range of the syntactic phenomena that occur in English sentences. After sentence-level parsing, ConText 120 enters its "concept processing" phase (STEP 508). Among the facilities offered are:
• Information extraction: a master index of a document's content is computed, indexing over the concepts, facts and definitions in the text.
• Content reduction: several levels of summary are available, ranging from a list of the document's main topics to a précis of the complete document.
• Discourse tracking: by tracking the discourse of a document, ConText can extract all the parts of a document that are particularly relevant to a certain concept.
ConText 120 is used by the Jasper agent 105 in a client-server architecture: after parsing the documents, the server generates application-independent marked-up versions (STEP 509). Calls from the Jasper agent 105 through an Application Programming Interface (API) can then interpret the mark-up. Using these API calls, meta-information is obtained from the source text (STEP 510). The Jasper agent 105 first extracts a summary of the text of the page. The size of the summary can be controlled by parameters passed to ConText 120, and the Jasper agent 105 ensures that a summary of 100 to 150 words is obtained.
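The tag-stripping of STEP 505 can be illustrated with a minimal Python sketch. This is not the implementation described in the specification (which is stated to be written generally in "C"); the class and function names are hypothetical, and the choice to discard script and style content is an assumption made for the example.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text content of an HTML page and discards the markup."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(source: str) -> str:
    """Return the plain text of `source` with whitespace normalized."""
    parser = TagStripper()
    parser.feed(source)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

The resulting plain text is what would then be passed to a summarizer such as ConText in STEP 506.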
Using a further call to ConText 120, the Jasper agent 105 then derives a set of keywords from the source text. After this, the user can optionally be given the opportunity to add further keywords through an HTML form 125 (STEP 511). In this way, keywords of particular relevance to the user can be provided, while the Jasper agent supplies a set of keywords that may be of greater relevance to a wider community of users.
At the end of this process, the Jasper agent has generated the following meta-information about the W3 page of interest:
• the general keywords provided by ConText;
• the user-specific keywords;
• the user's annotation;
• a summary of the content of the page;
• the title of the document;
• the universal resource locator (URL); and
• the date and time of storage.
Referring further to Figure 3, the Jasper agent 105 then adds this meta-information for the page to the files 130 of the IPS 100 (STEP 512). In the IPS 100, the keywords (of both types) are then used to index on the files containing meta-information for other pages.
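As an illustration of the meta-information listed above, the following Python sketch models one IPS entry and a keyword index over entries. The names `PageRecord`, `PageStore`, `add` and `lookup` are hypothetical, and the in-memory dictionaries are an assumption; the specification only says that the meta-information is held in files 130 indexed by keyword.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PageRecord:
    """Meta-information held for one stored W3 page (no page content)."""
    url: str
    title: str
    summary: str
    general_keywords: set   # keywords derived by the summarizer
    user_keywords: set      # keywords added by the storing user
    annotation: str
    stored_at: datetime


class PageStore:
    """Hypothetical in-memory stand-in for an intelligent page store."""

    def __init__(self):
        self.records = {}                # url -> PageRecord
        self.index = defaultdict(set)    # keyword -> set of urls

    def add(self, record: PageRecord) -> None:
        self.records[record.url] = record
        # Both kinds of keyword index the same record.
        for keyword in record.general_keywords | record.user_keywords:
            self.index[keyword.lower()].add(record.url)

    def lookup(self, keyword: str) -> list:
        urls = self.index.get(keyword.lower(), ())
        return [self.records[url] for url in sorted(urls)]
```

Only the pointer (the URL) and the descriptive fields are kept, which reflects the stated preference for storing meta-information rather than copies of pages.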
Retrieval
There are three ways in which information can be retrieved from the IPS 100 using a Jasper agent. One is a standard keyword retrieval facility, while the other two concern the sharing of information among a community of agents and their users. Each is described in the following sections. When a Jasper agent is installed on a user's machine, the user supplies a personal profile: a set of keywords describing the information the user is interested in obtaining through W3. This profile is held by the agent 105 in order to determine which pages are potentially of interest to the user.
Keyword retrieval
As shown in Figures 4, 6, 7 and 8, for direct keyword retrieval the user supplies a set of keywords to the Jasper agent 105 through an HTML form 300 provided by the Jasper agent 105 (STEP 601). The Jasper agent 105 then retrieves the 10 most closely matching pages held in the IPS 100 (STEP 602), using a simple keyword matching and ranking algorithm. Keywords supplied by the user when the page was stored (as opposed to those extracted automatically by ConText) can be given extra weight in the matching process. The user can specify in advance a retrieval threshold below which pages will not be displayed. The agent 105 then dynamically constructs an HTML form 305 with a ranked list of links to the retrieved pages and their summaries (STEP 603). Any annotation made by the original user is also shown, along with the score of each retrieved page. This page is then presented to the user on the user's W3 client (STEP 604).
"What's New?" facility
Any user can ask a Jasper agent "What's new?" (STEP 701). The agent 105 then interrogates the IPS 100 and retrieves the most recently stored pages (STEP 702). It then determines which of these pages best match the user's profile, again using a simple keyword matching and ranking algorithm (STEP 703). An HTML page is then presented to the user showing a ranked list of links to the recently stored pages that best match the user's profile, and also to the other pages most recently stored in the IPS (STEP 704), with annotations where provided. The user is thus given a view both of the recently stored pages likely to be of greatest interest and of a more general selection of recently stored pages (STEP 705). A user can update the profile held by his or her Jasper agent at any time through an HTML form that allows keywords to be added to and/or deleted from the profile. In this way, the user can effectively select different "contexts" in which to work. A context is defined by a set of keywords (those making up the profile, or those specified in a retrieval request) and can be thought of as the types of information in which the user is interested at a given time.
The idea of applying human memory models to the filing of information was explored by Jones, in the paper referred to above, in the context of computer filing systems. As he points out, there is an analogy between a directory in a conventional filing system and a set of pages retrieved by a Jasper agent. The set of pages can be thought of as a dynamically constructed directory, defined by the context in which it was retrieved.
This is a very flexible notion of "directory" in two senses: first, pages that appear in this retrieval may well appear in others, depending on the context; and, second, there is no sharp boundary to the directory: pages are "in" the directory to a greater or lesser degree depending on how well they match the current context. In the present approach, the number of ways of partitioning the information in the pages is therefore limited only by the diversity and richness of the information itself.
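The keyword retrieval and "What's New?" facilities described above both rely on what the text calls a simple keyword matching and ranking algorithm, without specifying it. The Python sketch below shows one possible form under stated assumptions: the scoring formula, the `boost` weight for user-supplied keywords, the default limits and the dictionary layout of a page are all hypothetical choices for illustration.

```python
def match_score(query, general_keywords, user_keywords, boost=2.0):
    """Weighted fraction of query keywords found on a page.

    Keywords supplied by the storing user count `boost` times; the
    weighting scheme itself is an assumption, not taken from the text.
    """
    query = {k.lower() for k in query}
    if not query:
        return 0.0
    general = {k.lower() for k in general_keywords}
    user = {k.lower() for k in user_keywords}
    plain_hits = len(query & (general - user))
    boosted_hits = len(query & user)
    return (plain_hits + boost * boosted_hits) / len(query)


def retrieve(query, pages, threshold=0.0, limit=10):
    """Return up to `limit` (score, url) pairs, best first.

    Pages scoring zero or below `threshold` are not returned.
    """
    scored = [
        (match_score(query, p["general_keywords"], p["user_keywords"]), p["url"])
        for p in pages
    ]
    kept = [s for s in scored if s[0] > 0 and s[0] >= threshold]
    return sorted(kept, key=lambda s: (-s[0], s[1]))[:limit]


def whats_new(profile, pages, recent=20, best=5):
    """Split the most recently stored pages into profile matches and the rest.

    `stored_at` may be any sortable value, e.g. an ISO date string.
    """
    latest = sorted(pages, key=lambda p: p["stored_at"], reverse=True)[:recent]
    best_urls = [url for _, url in retrieve(profile, latest, limit=best)]
    others = [p["url"] for p in latest if p["url"] not in best_urls]
    return best_urls, others
```

Because every page receives a graded score rather than a yes/no answer, the result set behaves like the "directory without a sharp boundary" described above: raising or lowering `threshold` changes how much of the store counts as being in the current context.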
Communicating with other interested agents
Referring to Figure 8, when a page is stored in the IPS 100 by a Jasper agent 105 (STEP 801), the agent 105 checks the profiles of the users of other agents in its "local community" (STEP 802). This local community can be any predetermined community. If the page matches a user's profile with a score above a certain threshold (STEP 803), a message, for example an "e-mail" message, can be generated automatically by the agent 105 and sent to the user concerned (STEP 804), informing him or her of the discovery of the page. The e-mail header can, for example, have the format:
JASPER KW: (keywords)
This allows the user, before reading the body of the message, to identify it as one from the Jasper system. Preferably, a list of keywords is provided so that the user can assess the relative importance of the information to which the message refers. The keywords in the message header vary from user to user, depending on which keywords of the page match keywords in the user's profile, thus tailoring the message to each user's interests. The body of the message itself can give further information, such as the title and URL of the page, who stored the page, and any annotation on the page that the storer provided.
The Jasper agent and system described above provide the basis for an extremely useful way of accessing relevant information in a distributed arrangement such as W3. Variations and extensions can be made to the system without departing from the scope of the present invention. For example, at a relatively simple level, improved retrieval techniques can be used. As examples, probabilistic or vector space models can be used, as described by G. Salton in "Automatic Text Processing", published in 1989 by Addison-Wesley, Reading, Massachusetts, USA. Alternatively, indexing can be made more versatile by providing indexing on meta-information other than keywords.
For example, the additional meta-information used for indexing can be the date of storage of a page and the site of origin of the page (which Jasper can extract from the URL). These additional indexes allow the user (through an HTML form) to frame queries of the type: "Show me all the pages I stored in 1994 from Cambridge University about artificial intelligence and information retrieval." In another alternative version, a thesaurus can be used by Jasper agents to exploit keyword synonyms. This reduces the importance of entering precisely the same keywords that were used when the page was stored. Indeed, a thesaurus can be exploited in several other areas, including the personal profiles that an agent 105 holds for its user.
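The two variations just described, indexing on storage date and site of origin and widening a query with a thesaurus, can be sketched together in Python. The function names, the page dictionary layout (`url`, `stored_year`, `keywords`) and the toy thesaurus format (lower-case keyword mapped to a list of synonyms) are assumptions for the example; the site of origin is taken from the URL with the standard `urllib.parse.urlparse` function.

```python
from urllib.parse import urlparse


def expand_keywords(keywords, thesaurus):
    """Return the keywords plus any synonyms listed in `thesaurus`.

    `thesaurus` is assumed to map a lower-case keyword to its synonyms.
    """
    expanded = set()
    for keyword in keywords:
        keyword = keyword.lower()
        expanded.add(keyword)
        expanded.update(s.lower() for s in thesaurus.get(keyword, ()))
    return expanded


def select_pages(pages, keywords, thesaurus=None, year=None, site_suffix=None):
    """Return URLs of pages matching a keyword and the optional filters."""
    wanted = expand_keywords(keywords, thesaurus or {})
    selected = []
    for page in pages:
        host = urlparse(page["url"]).hostname or ""
        if year is not None and page["stored_year"] != year:
            continue
        if site_suffix is not None and not host.endswith(site_suffix):
            continue
        if wanted & {k.lower() for k in page["keywords"]}:
            selected.append(page["url"])
    return selected
```

A query in the spirit of the example in the text would then be `select_pages(pages, ["artificial intelligence"], thesaurus, year=1994, site_suffix="cam.ac.uk")`, where the host suffix standing for Cambridge University is itself an illustrative assumption.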
Adaptive agents
The use of user profiles by Jasper agents to determine which information is relevant to their users, although powerful, can be improved. When the user wants to change context (perhaps re-focusing from one task to another, or from work to leisure), the user's profile must be re-specified by adding and/or deleting keywords. A better approach is for the agent to change the user's profile as the user's interests change over time. This change of context can occur in two ways. There may be a short-term switch of context, for example from work to leisure. The agent can identify this from a list of current contexts that it holds for a user, and switch to the new context. The switch can be triggered, for example, when the user visits a new page containing different information. There may also be longer-term changes in the contexts held by the agent, based on the evolving interests of the user. These changes can be inferred by the agent from observation of the user. Known techniques that can be employed in an adaptive agent include, for example, genetic algorithms, learning from feedback and memory-based reasoning. Such techniques are described in an internal MIT report available in 1993, by Sheth B. & Maes P., entitled "Evolving Agents for Personalized Information Filtering".
Integrating local and remote information
Another possible variation of a Jasper system is to integrate the filing system of the user's own computer with the IPS 100, so that information found on W3 and on the local machine appears homogeneous to the user at the top level. Files can then be accessed in a manner similar to the way Jasper agents access W3 pages, freeing the user from the problems of name-oriented filing systems and providing a uniform interface to both local and remote information of all types.
Clustering in Jasper systems
The Jasper IPS 100 and its related documents can essentially be regarded as a document collection: a set of documents indexed by keywords. It differs from a "traditional" collection in that the documents are typically located remotely from the index; the index (the IPS 100) actually points to a URL that specifies the location of the document on the Internet. In addition, several further pieces of meta-information are attached to the documents in a Jasper system, such as the user who stored the page, when it was stored, any annotation the user provided, and so on. An important respect in which a Jasper system differs from most document collections is that each document has been entered into the IPS 100 by a user who made a conscious decision to mark it as a piece of information that he or she believes may be useful in the future. This, together with the meta-information held, makes a Jasper IPS 100 a very rich source of information. Whether known Information Retrieval (IR) techniques can be applied beneficially to the Jasper IPS 100 has also been examined. In particular, the use of clustering has been under investigation.
Clustering of documents
Using known IR techniques, the Jasper term-document matrix can be used to calculate a similarity matrix for the documents identified in the Jasper IPS 100. The similarity matrix gives a measure of the similarity of the documents identified in the store. The Dice coefficient is calculated for each pair of documents. For two documents $D_i$ and $D_j$ it is
$\frac{2\,|D_i \cap D_j|}{|D_i| + |D_j|}$
where $|X|$ is the number of terms in $X$ and $|X \cap Y|$ is the number of terms that co-occur in $X$ and $Y$. This coefficient gives a number between 0 and 1. A coefficient of 0 implies that two documents have no terms in common, whereas a coefficient of 1 implies that the sets of terms occurring in each document are identical. The similarity matrix, say Sim, represents the similarity of each pair of documents in the store, such that for each pair of documents i and j:
$\mathrm{Sim}(i, j) = \frac{2\,|D_i \cap D_j|}{|D_i| + |D_j|}$
This matrix can be used to create clusters of related documents automatically, using the hierarchical agglomerative clustering process described in "Hierarchic Agglomerative Clustering Methods for Automatic Document Classification" by Griffiths A. et al., Journal of Documentation, 40:3, September 1984, pp 175-205. In such a process, each document is initially placed in a cluster by itself, and the two most similar clusters are then combined into a larger cluster, for which similarities with each of the other clusters must then be computed. This merging process continues until there is only a single cluster of documents at the highest level. The way in which similarity between clusters (as opposed to individual documents) is calculated can be varied. For a Jasper store, "complete-link clustering" can be used. In complete-link clustering, the similarity between the least similar pair of documents from the two clusters is used as the cluster similarity. The resulting cluster structures of the Jasper store can then be used to create a three-dimensional (3D) front end to the Jasper system using VRML (Virtual Reality Modeling Language). (VRML is a known language for describing three-dimensional graphical spaces, or virtual worlds, networked via the global Internet and hyperlinked within the World Wide Web.)
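The Dice coefficient and the complete-link merging process described above can be sketched in a few lines of Python. This is a naive illustration that recomputes cluster similarities on every pass rather than maintaining a similarity matrix; the function names and the representation of documents as a dictionary of term sets are assumptions, and ties between equally similar pairs are broken by document name order.

```python
from itertools import combinations


def dice(terms_a, terms_b):
    """Dice coefficient 2|A n B| / (|A| + |B|) of two term sets."""
    total = len(terms_a) + len(terms_b)
    return 2 * len(terms_a & terms_b) / total if total else 0.0


def complete_link_clustering(docs):
    """Agglomerative clustering of `docs` (name -> set of terms).

    Returns the merges in order as (cluster_a, cluster_b, similarity),
    ending when a single cluster remains.
    """
    clusters = [frozenset([name]) for name in sorted(docs)]

    def cluster_similarity(c1, c2):
        # Complete link: similarity of the LEAST similar document pair.
        return min(dice(docs[x], docs[y]) for x in c1 for y in c2)

    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(*pair))
        merges.append((a, b, cluster_similarity(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges
```

The list of merges is effectively the dendrogram: reading it in order gives the cluster hierarchy that a front end could lay out.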
Clustering of keywords
The keywords (terms) that occur in a particular Jasper document collection can also be clustered, in a way that exactly mirrors the document clustering technique described above: a similarity matrix for the keywords in the Jasper store can be built, which gives a measure of the similarity of the keywords in the store. For each pair of keywords, the Dice coefficient is calculated. For two keywords $K_i$ and $K_j$, the Dice coefficient is given by
$\frac{2\,|K_i \cap K_j|}{|K_i| + |K_j|}$
where $|X|$ is the number of documents in which $X$ occurs and $|X \cap Y|$ is the number of documents in which $X$ and $Y$ co-occur. Once the similarity matrix for a Jasper store has been calculated, however, it is not necessary to cluster the keywords as the documents were clustered. Instead, the matrix itself can be exploited in two ways, which are described below.
The first way is profile enhancement. Here, the user's profile can be enhanced by adding the keywords most similar to those already in the profile. For example, if the words virtual, reality and Internet are part of a user's profile but VRML is not, an enhanced profile can add VRML to the original profile (assuming that VRML clusters close to virtual, reality and Internet). In this way, documents that contain VRML but not virtual, reality or Internet can be retrieved, whereas they would not have been with the unenhanced profile.
Figure 9 shows a sample keyword network 900 built from the keyword similarity matrix extracted from an actual Jasper store. The algorithm is straightforward: given an initial starting keyword, find the four words most similar to it from the similarity matrix. Link these four to the original word and repeat the process for each of the four new words. This can be repeated a number of times (in Figure 9, three times). Double lines 901 between two words indicate that each word occurs among the four most similar keywords of the other.
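The keyword similarity measure, the network-building procedure of Figure 9 and profile enhancement can be sketched as follows in Python. The function names and data layout are hypothetical, and excluding keywords with zero similarity from the "most similar" list is an assumption not stated in the text. A pair of words linked in both directions corresponds to the double lines 901.

```python
from collections import defaultdict


def build_occurrences(doc_keywords):
    """Map each keyword to the set of documents in which it occurs."""
    occurs = defaultdict(set)
    for doc, keywords in doc_keywords.items():
        for keyword in keywords:
            occurs[keyword].add(doc)
    return dict(occurs)


def keyword_dice(occurs, k1, k2):
    """Dice coefficient of two keywords over the documents containing them."""
    d1, d2 = occurs.get(k1, set()), occurs.get(k2, set())
    total = len(d1) + len(d2)
    return 2 * len(d1 & d2) / total if total else 0.0


def most_similar(occurs, word, n=4):
    """The `n` keywords most similar to `word` (zero similarity excluded)."""
    scored = [(keyword_dice(occurs, word, k), k) for k in occurs if k != word]
    scored = [s for s in scored if s[0] > 0]
    return [k for _, k in sorted(scored, key=lambda s: (-s[0], s[1]))[:n]]


def keyword_network(occurs, start, n=4, rounds=3):
    """Directed links (word, similar_word) grown outward from `start`."""
    edges, seen, frontier = set(), {start}, [start]
    for _ in range(rounds):
        next_frontier = []
        for word in frontier:
            for other in most_similar(occurs, word, n):
                edges.add((word, other))
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return edges


def enhance_profile(occurs, profile, n=1):
    """Add to `profile` the `n` keywords most similar to each of its words."""
    extra = {k for word in profile for k in most_similar(occurs, word, n)}
    return set(profile) | extra
```

With the defaults `n=4` and `rounds=3`, `keyword_network` follows the procedure described for Figure 9; `enhance_profile` shows how a word such as VRML could enter a profile that contains only its neighbours.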
The particular similarity coefficients can, of course, be attached to each link to give finer information about the degree of similarity between words.
The second way is proactive searching. The keywords that make up a user's profile can be used by Jasper to search proactively for new WWW pages of interest, and Jasper can then present a list of new pages in which the user may be interested without the user having to make an explicit search. These proactive searches can be carried out by a Jasper system at a given interval, such as weekly. Clustering is useful here because a profile may reflect more than one interest. Consider, for example, the following user profile: Internet, WWW, html, football, Manchester, United, linguistics, syntactic analysis, pragmatics. Clearly, three separate interests are represented in this profile, and searching on each separately is likely to yield far better results than simply entering the profile as a whole as a query for the given user. Keyword clustering of the document collection can automate the query generation process for proactive searching by a user's Jasper agent. When search results are obtained using Jasper, they can be summarized and matched against the user's profile in the usual way to give a prioritized list of the new URLs, together with locally held summaries.
Other text summarizers can be used instead of ConText. For example, NetSumm is a summarizing tool made available by British Telecommunications on the Internet at http://www.labs.bt.com/innovate/informat/netsumm/index.htm. Although described in relation to locating information via the Internet, embodiments of the present invention may be found useful for locating information on other systems, such as documents on a user's internal systems that are in HyperText.
In addition to the inventive aspects of the present system described in the introduction to this specification, the following should be considered as expressions of novel and advantageous aspects of the system:
A method of monitoring information entries to a data store, the entries being requested by any of a plurality of users, for the purpose of alerting a first user to an entry by a second user in accordance with alert criteria determined at least in part by the first user, the method comprising: i) storing a user profile for each user, which profile comprises at least one keyword set and an identifier for the user; ii) detecting a request by the second user for an information entry to the data store; iii) processing the request to generate the information entry; iv) comparing the information entry with a keyword set of the user profile for the first user; and v) in the event of a positive result of the comparison, transmitting an alert message addressed to the first user.
A method as above, which further comprises monitoring the information entry requests made by respective users and, on detecting a significant change in the information entry requests made by a particular user, changing the keyword set used in step iv) for that particular user in the event of a request for an information entry by a different user.
A method as above wherein each information entry includes at least one keyword set associated with a respective document, and wherein the method further comprises the steps of generating a similarity matrix for at least two such keyword sets, and using the similarity matrix to extend the scope of a keyword set from a user profile in step iv) so as to increase the number of positive results for the associated user.
A method as above, further comprising the step of applying a clustering algorithm to a keyword set from a user profile to divide the keyword set into subsets of keywords, and applying at least one of the subsets of keywords instead of the whole keyword set in step iv).
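Steps i) to v) of the monitoring method set out above can be sketched in a few lines. This is an illustration under stated assumptions rather than the system's actual design: the class and method names are hypothetical, the "positive result" is assumed to mean that at least `min_overlap` keywords are shared (the text does not fix the matching criterion), and alerts are collected in an in-memory list instead of being sent as messages.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    user_id: str    # identifier for the user
    keywords: set   # keyword set of the profile


@dataclass
class AlertingStore:
    profiles: dict = field(default_factory=dict)  # step i: stored profiles
    entries: list = field(default_factory=list)   # the data store itself
    outbox: list = field(default_factory=list)    # alerts awaiting delivery

    def add_profile(self, profile):
        """Step i: store a profile for a user."""
        self.profiles[profile.user_id] = profile

    def store_page(self, requesting_user, url, page_keywords, min_overlap=1):
        """Steps ii to v, run when a user asks to store a page."""
        # Steps ii and iii: accept the request and generate the entry.
        entry = {
            "url": url,
            "owner": requesting_user,
            "keywords": {k.lower() for k in page_keywords},
        }
        self.entries.append(entry)
        # Step iv: compare the entry with every other user's keyword set.
        for profile in self.profiles.values():
            if profile.user_id == requesting_user:
                continue
            matched = entry["keywords"] & {k.lower() for k in profile.keywords}
            if len(matched) >= min_overlap:
                # Step v: record an alert addressed to the matching user.
                self.outbox.append((profile.user_id, url, sorted(matched)))
        return entry


# Hypothetical users and URL, for illustration only.
store = AlertingStore()
store.add_profile(UserProfile("alice", {"VRML", "Internet"}))
store.add_profile(UserProfile("bob", {"football"}))
store.store_page("bob", "http://example.org/vrml-intro", ["vrml", "3D"])
```

Here the page stored by the second user ("bob") shares the keyword VRML with the first user's profile ("alice"), so one alert addressed to "alice" is queued. Matching is case-insensitive in this sketch, which is a design choice rather than something the text specifies.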

Claims (12)

1. An information access system for accessing information stored in a distributed manner and accessible by means of a communications network, the access system characterized in that it comprises at least one software agent for use in accessing information by means of the network, wherein the agent is provided with, or is provided with access to, a data store for storing meta-information associated with items of information accessible by means of the network and for storing at least one set of keywords, the agent being triggered, on entry of meta-information into the data store, to compare at least one set of keywords with the meta-information and to transmit an alert message in the event of a positive result.
2. The system, according to claim 1, characterized in that at least one set of keywords is associated with a specific user and the system comprises means for directing the alert message to that user.
3. The system, according to any of the preceding claims, for use by a plurality of users, each of the plurality having at least one associated set of keywords, characterized in that the system has means for responding to a request from a first user to enter meta-information into the data store, at least one set of keywords being associated with a second user different from the first user making the request, such that the system responds to the entry of meta-information by the first user by directing an alert message to the second user in the event of a positive match with the set of keywords of the second user.
4. The system, according to any of the preceding claims, characterized in that the agent is provided with a thesaurus of synonyms for keywords of the sets, to increase the number of positive matches with the sets of keywords.
5. The system, according to any of the preceding claims, characterized in that the agent is provided with means to monitor a user's entries, to detect a change in those entries and to modify or replace a set of keywords associated with that user on detection of a change.
6. The system, according to any of claims 1 to 4, characterized in that the system is provided with means for changing a set of keywords associated with a user in response to a request from that user.
7. The system, according to any of the preceding claims, characterized in that the system is provided with means for storing at least one data grouping algorithm and for applying the algorithm to one or more sets of keywords to modify the set or sets of keywords before comparison with meta-information.
8. The system, according to any of the preceding claims, characterized in that it comprises multiple agents, the agents being allocated to different respective users of the system.
9. A method for monitoring information entries to a data store, the entries being requested by any of a plurality of users, for the purpose of alerting a first user to an entry by a second user in accordance with alert criteria determined at least in part by the first user, the method characterized in that it comprises: i) storing a user profile for each user, which profile comprises at least one set of keywords and an identifier for the user; ii) detecting a request by the second user for an information entry to the data store; iii) processing the request to generate the information entry; iv) comparing the information entry with a set of keywords of the user profile for the first user; and v) in the event of a positive result of the comparison, transmitting an alert message addressed to the first user.
10. The method, according to claim 9, characterized in that it further comprises monitoring the information entry requests made by respective users and, on detection of a significant change in the information entry requests made by a particular user, changing the set of keywords used in step iv) for that particular user in the event of a request for an information entry by a different user.
11. The method, according to any of claims 9 or 10, characterized in that each information entry includes at least one set of keywords associated with a respective document, and wherein the method further comprises the steps of generating a similarity matrix for at least two such sets of keywords, and using the similarity matrix to extend the scope of a set of keywords from a user profile in step iv) so as to increase the number of positive results for the associated user.
12. The method, according to any of claims 9 or 10, further characterized in that it comprises the step of applying a clustering algorithm to a set of keywords from a user profile to divide the set of keywords into subsets of keywords, and applying at least one of the subsets of keywords instead of the whole set of keywords in step iv).
MXPA/A/1997/005582A 1995-01-23 1997-07-23 Methods and / or systems to access information MXPA97005582A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP95300420 1995-01-23
EP95300420.7 1995-01-23
PCT/GB1996/000132 WO1996023265A1 (en) 1995-01-23 1996-01-23 Methods and/or systems for accessing information

Publications (2)

Publication Number Publication Date
MX9705582A MX9705582A (en) 1997-11-29
MXPA97005582A MXPA97005582A (en) 1998-07-03

