GB2335761A - Information search using user profile - Google Patents

Info

Publication number
GB2335761A
Authority
GB
United Kingdom
Prior art keywords
user
profile
search
document
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB9806392A
Other versions
GB2335761B (en)
GB9806392D0 (en)
Inventor
Michael Weiss
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsemi Semiconductor ULC
Original Assignee
Mitel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitel Corp
Priority to GB9806392A (GB2335761B)
Publication of GB9806392D0
Priority to CA002265292A (CA2265292C)
Priority to DE19913509A (DE19913509A1)
Publication of GB2335761A
Application granted
Publication of GB2335761B
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/9536 Search customisation based on social or collaborative filtering

Abstract

Electronic searching to identify relevant documents uses background information about each user in a user profile. User profiles are learned from a training set of pages or by observing user behavior. Social filtering between collaborating agents may be used to track user behaviour. The invention increases the number of search results which are likely to be perceived by the user as highly relevant because of a "peer effect". The present invention allows the use of best-first search of the Web which performs better than depth-first or breadth-first search.

Description

AGENT-BASED WEB SEARCH ENGINE
FIELD OF THE INVENTION
The present invention relates generally to the field of information search and retrieval, and more particularly, to an agent-based Web search engine that improves keyword-based searches by utilizing a user profile to further restrict the search to documents of interest to the user. The present invention utilizes a network of collaborating agents that track and use information from the browsing behavior of other similar users.
BACKGROUND OF THE INVENTION
Previous approaches to finding information on the World Wide Web (WWW) have included automated searching programs (search engines) such as WAIS or Web crawler (M. Koster, World Wide Web Wanderers, Spiders and Robots; http://web.nexor.co.uk/mak/doc/robots/robots.html) to locate web sites and information of interest to the user. These automated search engines suffer from the problem of returning too many search results, frequently by including documents of marginal or low relevance, reducing the usefulness of the search to the user. These typical approaches fail to consider any measure of the user interest in conducting the search. These prior art approaches use a measure of interest based only on the search keywords entered by the user. As a consequence, these search approaches return all documents which contain the search terms, including documents which are in subject areas unrelated to the user's area of interest. However, frequently background information is available which can be applied to the search. These prior art approaches do not use this valuable background information to eliminate documents which are not relevant to the user. User satisfaction with these prior art search engines is therefore low.
Lieberman in "Letizia: An Agent that Assists in Web Browsing", International Joint Conference on Artificial Intelligence, 1995, describes an approach that uses a single agent to assist the user browsing the World Wide Web. In the Lieberman approach, the agent tracks user behavior and attempts to anticipate documents of interest by autonomously exploring links from the user's current position. This approach infers search goals from the user's browsing behavior and makes unsolicited recommendations of "interesting" documents. One of the drawbacks of this prior art approach is that it focuses on the behaviour of the individual user without considering other information which can be gleaned from community interest in the documents and information.
It is a reasonable assumption that documents and information of interest to one person in a community would likely be of interest to another. Valuable information obtained from the browsing behaviour of other users can be used to focus the search.
Therefore, an approach which tracks and uses information from the browsing behavior of other users "similar" to the user can be used to facilitate the search process to ascertain relevant documents.
Furthermore, a search mechanism which is augmented by utilizing a network of collaborating agents to track browsing behavior and guide searches would improve the effectiveness of the search.
SUMMARY OF THE INVENTION
According to one aspect of the present invention, there is provided a method for searching that uses social filtering between collaborating agents to track user behavior and to guide the search for documents and information stored in electronic form.
The present invention records background information about each user in a user profile. User profiles are learned from a training set of pages or by observing user behavior.
The method of the present invention increases the number of search results which are likely to be perceived by the user as highly relevant because of a "peer effect". The present invention allows the use of best-first search of the Web which performs better than the depth-first or breadth-first search used in past approaches. The approach according to the present invention is also highly scalable, since each agent manages only a small subset of users and documents, as compared to other known social filtering approaches (e.g. Maes, P. "Agents that Reduce Work and Information Overload", Communications of the ACM, July, 1994).
According to another aspect of the present invention, there is provided a method for searching and identifying relevant documents from an electronic search comprising the steps of:
(a) attaching a user profile to a user desiring to conduct a search;
(b) attaching a document profile to each searchable document;
(c) obtaining search parameters;
(d) attaching the user profile to the search;
(e) initiating the search with search parameters;
(f) searching the searchable documents to identify candidate documents;
(g) comparing the user profile to the document profile corresponding to the candidate document to determine a successful or unsuccessful match;
(h) returning the candidate document if successful match.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a block diagram depicting an overview of a networked system implementing the search mechanism of the present invention.
Figure 2 is a block diagram of a search tree for searching web pages illustrating the use of the present invention.
Figure 3 is a block diagram depicting the present invention implemented in Java.
Figure 4 is a flowchart diagram illustrating the executing of the search commands of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
According to the present invention, there is provided an agent based web search engine where agents are used to assist users and communicate information to facilitate efficient searches. The data that is communicated among agents pertains to user profiles about individual users on the web as well as web page profiles regarding the particulars of various web pages.
The present invention tracks and uses information gathered on the browsing behavior of users that are "similar" to the user searching for information, utilizing a process known as social filtering. The concept of social filtering has been described by Maes (referred to hereinabove), and by Lashkari, Y. et al. "Collaborative Interface Agents", Conference of the American Association of Artificial Intelligence, 1994.
Social filtering, in its basic unmodified form, does not base its correlations on the content of information but rather on correlations solely among the users or viewers of such information. Social filtering uses information about a user's social environment as a guide to locating relevant documents and information.
In the implementation of social filtering, information about the interests of individual users is gathered. Then, during the search phase information is filtered for relevance by exchanging data about the users who have expressed an interest in the information. Social filtering has, for example, been successfully applied to make recommendations about music records.
Turning to Figure 1, a networked system implementing the present invention is shown. Web server 110 is connected to local area network 104. Local area network 104 is in turn connected to the World Wide Web (WWW) 102. Likewise, web server 112 is connected to local area network 106, which is connected to World Wide Web 102. Web servers 110 and 112 are standard Internet or Intranet computing machines, as are well known in the art, that are capable of displaying web pages of hypertext markup language (HTML) format. HTML is a well known markup system used to create hypertext documents that are portable from platform to platform. To accomplish the communication tasks required of the present invention, web servers 110 and 112 use standard network communication schemes and the Internet Hypertext Transfer Protocol (HTTP) which allows the transfer of information from client to server. While the invention hereinafter will be described with respect to HTML web pages using HTTP, it is within the scope of this invention that other document formats and protocols could be used.
HTML web pages are stored in computer memory 114 of web server 110 and are made accessible to local users 118 and 120 as well as remote users 122 and 124 through the World Wide Web 102. Likewise, other web pages are stored in computer memory 116 of web server 112 which are available to local users 118 and 120 as well as remote users 122 and 124 through World Wide Web 102. Local users 118 and 120 and remote users 122 and 124 use a standard web browser, such as Netscape from Netscape Communication Corporation or Microsoft Internet Explorer from Microsoft Corporation, which can read the HTML coded web pages. Each of users 118, 120, 122 and 124, as well as each web page in memory 114 and 116, has an accessible address or universal resource locator (URL) which can be used by users, agents or other network devices for locating and accessing user or web page information.
In the preferred embodiment, agents are used to perform tasks on the user's behalf, train or teach the user, help different users collaborate, and monitor events and procedures. In particular, various user and web page profiles are managed by agents. In addition, searches are facilitated by the communication and collaboration of agents.
In the preferred embodiment, agents employ social filtering relying upon correlations drawn between different users in identifying relevant documents and information.
As shown in Figure 1, agent 126 manages a portfolio of user profiles for local users 118 and 120 on web server 110. Agent 126 also manages a portfolio of web page profiles for each web page stored in memory 114 of web server 110. Agent 128 manages a portfolio of web page profiles for each web page stored in memory 116 of web server 112. In addition, agent 130 manages the user profile for user 122, while agent 132 manages the user profile for user 124. Particulars of the profiles and the information managed and communicated by agents 126, 128, 130 and 132 are described in further detail below.
A user profile, such as managed by agents 126, 128, 130 and 132, is a description of background information about a particular user. It is created by identifying and listing specific features or areas of interest to the user. One method of representing this profile is as a vector where each element represents the user's interest in a particular feature. Therefore, in the general case of the Boolean feature vector $\{u_1, \ldots, u_N\}$, each Boolean value $u_k$ indicates whether the particular feature k is of interest to a user. For example, features for comparison could be defined as: (a) cars; (b) sports; (c) cooking. If a user is interested in cars and sports, but not interested in cooking, their profile would be represented as the feature vector {1, 1, 0}, where 1 represents true and 0 represents false. This profile could then be correlated with the profiles of users with similar interests.
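The Boolean encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FEATURES`, `make_profile` and `shared_interests` are hypothetical and do not appear in the specification, and the feature list is simply the three-feature example from the text.

```python
# Hypothetical feature list, taken from the cars/sports/cooking example in the text.
FEATURES = ("cars", "sports", "cooking")


def make_profile(interests, features=FEATURES):
    """Return a Boolean feature vector: 1 if the feature interests the user, else 0."""
    return tuple(1 if feature in interests else 0 for feature in features)


def shared_interests(u, v, features=FEATURES):
    """Return the names of the features that both profiles mark as interesting."""
    return [name for name, a, b in zip(features, u, v) if a and b]


# A user interested in cars and sports, but not in cooking.
profile = make_profile({"cars", "sports"})
print(profile)  # (1, 1, 0)
```

Keeping the feature order fixed is what allows two profiles to be compared element by element, which the correlation step later relies on.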
A web page profile is merely a list of the profiles of users who have visited that web page document. It may be implemented as a list of the actual user profiles, or merely a pointer to the agent managing each user's profile. For example, in one embodiment, a web profile can be set up by an agent as a table of {feature, user id, interest} tuples.
An example of such is set out in Table 1 below:

TABLE 1
              User id
Feature     a1   a2   a3
cars         1    0    0
sports       1    1    1
cooking      0    1    1
Java         0    0    0
Internet     1    0    1

The web page profile is updated as a new user visits that web page, in a similar manner to which a web page hit counter is updated.
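A minimal sketch of such a page profile, assuming an in-memory store, is given below. The class name `PageProfile` and its methods are hypothetical; the specification only says that the profile may be kept as a table of {feature, user id, interest} tuples and updated on each visit.

```python
class PageProfile:
    """Profiles of the users who have visited one web page (illustrative only)."""

    def __init__(self, features):
        self.features = tuple(features)
        self.visitors = {}  # user id -> Boolean feature vector (one column of Table 1)

    def record_visit(self, user_id, profile):
        """Add or refresh the visitor's column, much like bumping a hit counter."""
        if len(profile) != len(self.features):
            raise ValueError("profile length must match the feature list")
        self.visitors[user_id] = tuple(profile)

    def rows(self):
        """Flatten the profile into (feature, user id, interest) tuples."""
        return [
            (feature, user_id, vector[i])
            for user_id, vector in self.visitors.items()
            for i, feature in enumerate(self.features)
        ]


# Rebuild Table 1 from its three columns.
page = PageProfile(["cars", "sports", "cooking", "Java", "Internet"])
page.record_visit("a1", (1, 1, 0, 0, 1))
page.record_visit("a2", (0, 1, 1, 0, 0))
page.record_visit("a3", (0, 1, 1, 0, 1))
table = page.rows()
```

Keying the store by user id means a repeat visit replaces the existing column instead of adding a duplicate; whether repeat visits should be counted separately is a design choice the text leaves open.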
Before a search can be conducted, it is necessary to create user profiles. While user profiles could be manually generated, the preferred approach is to generate user profiles during a learning operation. During the learning operation, users will be presented with a set of web pages or questions and asked to indicate their interest by responding with either yes or no. Training pages are prepared such that they are associated with a set of features that will be included in the user profile. Therefore, given the set of positive examples of concepts (or web pages) that the user is interested in, and the set of negative examples (or web pages) the user is not interested in, the set of features that distinguish pages rated as interesting from those that are not can be learned through an information theoretic approach. The use of an information theoretic approach has been disclosed by Lang K. "Newsweeder: Learning to filter News." Twelfth International Conference on Learning, 1995 and M. Pazzani et al., "Syskill & Webert: Identifying Web Sites", AAAI, 97; http://www.ics.uci.edu/pazzani/RTF/AAAI.html, 1997. Training pages that are highly matched with a user's interest may subsequently be suggested to the user as starting points for exploration of the web. This provides the additional advantage of directing the user to a web server that supports the agent based web search engine.
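The specification does not say which information theoretic measure is used. As one possibility, the sketch below ranks features by information gain over the yes/no training responses; the choice of information gain, the function names and the four training pages are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of yes/no ratings."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(pages, labels, feature):
    """Reduction in rating entropy obtained by splitting the pages on one feature.

    pages:  list of sets, each holding the features associated with a training page
    labels: list of booleans, True where the user rated the page as interesting
    """
    with_feature = [l for p, l in zip(pages, labels) if feature in p]
    without_feature = [l for p, l in zip(pages, labels) if feature not in p]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_feature, without_feature)
        if part  # skip an empty side of the split
    )
    return entropy(labels) - remainder


# Hypothetical training set: the user said yes to the two sports pages only.
pages = [{"sports", "cars"}, {"sports"}, {"cooking"}, {"cooking", "cars"}]
labels = [True, True, False, False]
gain_sports = information_gain(pages, labels, "sports")
gain_cars = information_gain(pages, labels, "cars")
```

In this toy data "sports" separates the ratings perfectly while "cars" carries no information, so a learner along these lines would keep "sports" as a distinguishing feature and discard "cars".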
The present invention is better illustrated by way of an example. Turning to Figure 2, the search tree depicting the typical search actions of users of Figure 1 is shown in greater detail. The search is represented by numerous nodes 202, 204, 206, 208, 210, 212, 214, 216 and 218. Each of the nodes in the search tree corresponds to a Web page which may be on a different web server. The initial node 202 is the web page currently received by a user such as described in Figure 1. Branches of the tree represent hypertext links between the web pages. Each web page is associated with a web page profile that is used to track which users have visited that particular web page.
When a user visits a web page 204 this is a good indication that it is of interest to that user.
A further, better measure of interest could be employed wherein the actual amount of time spent by a user reading a web page is recorded. This may be implemented by a profiler downloaded when a user first accesses the instrumented pages, as an invisible applet written in a suitable language, such as Java, that collects information on a user's web page traversals and captures usage access patterns including time spent on a particular page. An example of such a profiler has been described by C. Shahabi and V. Shah "Java based profiler for Capturing User Access Patterns" http://imsc.usc.edu/profiler.html, 1997. In this manner, the time spent viewing a web page can be reported back to the agent at the web server by the applet. If the viewing time is in a certain range, the page is considered interesting to the user. The range is bounded by a lower threshold, set and adjustable by the user agent, below which the page is considered not interesting, and optionally by a corresponding upper threshold, beyond which the user is considered to have abandoned the web page rather than to have found it of high interest.
The particular search illustrated is started at the user's initial node 202. The user formulates a search by providing a keyword 222. The user profile 224 describing the user's background is implicitly added to the search specification. The user also has the option to leave the keyword unspecified. If this is done, the Web is searched for every document that matches the user's profile 224. Each node 202 - 208 has a page profile; however, only page profile 220 is illustrated in Figure 2.
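The viewing-time test described in this passage reduces to a range check. The sketch below assumes times are reported in seconds and that the bounds are inclusive; the function name and the sample threshold values are hypothetical, since the specification leaves the actual values to the user agent.

```python
def is_interesting(seconds_on_page, lower, upper=None):
    """Classify a page visit from the reported viewing time.

    lower: minimum viewing time; shorter visits count as not interesting
    upper: optional maximum; longer visits are treated as an abandoned page
    """
    if seconds_on_page < lower:
        return False
    if upper is not None and seconds_on_page > upper:
        return False
    return True


# Illustrative thresholds only: at least 10 seconds, at most 10 minutes.
print(is_interesting(45, lower=10, upper=600))    # True
print(is_interesting(3, lower=10, upper=600))     # False
print(is_interesting(3600, lower=10, upper=600))  # False
```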
As the search evolves, each of nodes 204 - 208 is tested on whether it includes the specified keyword. If it does, the correlation between the user profile 224 and the page profile for the node is computed and compared against a threshold. A high-level formal description of the process by which a node is tested for a match is given below.
With the following definitions:
$u$: user profile, $u = \{u_1, \ldots, u_N\}$, where $u_i$ indicates whether feature $i$ is of interest to the user
$m$: matching vector, $m = \{m_1, \ldots, m_M\}$
$A$: page profile, $A = (a_{ik})$, with $N$ rows and $M$ columns
$\theta$: threshold
$N$: number of features
$M$: number of agents or user profiles in the page profile
The correlation between $u$ and $A$ computes to:
$u \cdot A = m$
A match occurs if:
$|m| \geq \theta$
Each column of A in this description corresponds to the profile of a user who has visited that page. Columns are added to A as a user's visit is tracked by the agent in charge of that page. Alternatively, a link to the user's web page is provided. Any optimizations in how these profiles are actually represented that are obvious to one skilled in the art apply. There are several standard ways to compute this correlation. One standard way to measure the "similarity" between two feature vectors is to compute their normalized vector product, that is, to use a cosine similarity measure. For Boolean feature vectors, this measure takes values between 0 and 1, where values close to 0 indicate low, and values close to 1 high, degrees of similarity. The similarity function is:
$\mathrm{sim}(u, v) = \mathrm{cosine}(u, v)$
where u and v are feature vectors, and cosine(u, v) is their normalized vector product:
$\mathrm{cosine}(u, v) = \frac{u \cdot v}{|u| \, |v|}$
Using this similarity function to calculate the similarity between vectors {0, 1, 1, 0, 1} and {1, 1, 0, 0, 1}, the function is:
$\mathrm{sim}(\{0, 1, 1, 0, 1\}, \{1, 1, 0, 0, 1\}) = \frac{2}{\sqrt{3} \, \sqrt{3}} = .67$
Therefore, using the definitions above where u is the user profile of the new visitor, and $a_k$ is the k-th column in the web page profile A, the average similarity of the user profile u and the page profile gives a measure of the correlation between the user profile u and the page profile A.
The correlation can be expressed by the formula:
$\mathrm{correlation}(u, A) = \frac{1}{M} \sum_{k=1}^{M} \mathrm{sim}(u, a_k)$
This is further illustrated using the web page profile in Table 2 below.
In Table 2 below, a new user with user profile {0, 1, 1, 0, 1} visits the page:

TABLE 2
              User id
Feature     a1   a2   a3    u
cars         1    0    0    0
sports       1    1    1    1
cooking      0    1    1    1
Java         0    0    0    0
Internet     1    0    1    1

The correlation is calculated by comparing the new user profile with each column of the web page profile using the similarity function across the web page profile.
$\mathrm{sim}(u, a_1) = .67$
$\mathrm{sim}(u, a_2) = .82$
$\mathrm{sim}(u, a_3) = 1.00$
$\mathrm{correlation}(u, A) = (.67 + .82 + 1.00)/3 = .83$
A match between a user and a page is determined by comparing the correlation against a threshold which may be pre-defined or optionally set by the user agent. For example, if the threshold theta = .80, the user profile would be considered a match and included in the page profile, because .83 > .80.
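The Table 2 calculation can be checked with a short Python sketch. The function names `sim` and `correlation` follow the notation of the text, but the code itself is an illustration and not part of the specification; returning 0.0 for an all-zero vector or an empty page profile is an assumption made to avoid division by zero.

```python
import math


def sim(u, v):
    """Cosine similarity: the normalized vector product of two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # assumption: an all-zero vector matches nothing


def correlation(u, columns):
    """Average similarity between user profile u and each column of the page profile."""
    if not columns:
        return 0.0  # assumption: a page nobody has visited yields no correlation
    return sum(sim(u, a) for a in columns) / len(columns)


# Columns a1, a2, a3 of Table 2 and the new visitor's profile u.
A = [(1, 1, 0, 0, 1), (0, 1, 1, 0, 0), (0, 1, 1, 0, 1)]
u = (0, 1, 1, 0, 1)
theta = 0.80

score = correlation(u, A)
print([round(sim(u, a), 2) for a in A])  # [0.67, 0.82, 1.0]
print(round(score, 2), score >= theta)   # 0.83 True
```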
If the web page profile is particularly large, it may be necessary to employ some optimization techniques to speed the calculation. Many optimizations are possible, for example, summing over a random sample of columns of the page profile, rather than all.
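The sampling optimization mentioned above might look like the following sketch, which averages the similarity over a random subset of columns instead of all of them. The function name `sampled_correlation`, the `sample_size` parameter and the use of a seeded `random.Random` are assumptions for illustration; the specification does not say how the sample is drawn.

```python
import math
import random


def sim(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sampled_correlation(u, columns, sample_size, rng=None):
    """Estimate the correlation from at most `sample_size` randomly chosen columns."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if not columns:
        return 0.0
    rng = rng or random.Random()
    if sample_size < len(columns):
        columns = rng.sample(columns, sample_size)  # sample without replacement
    return sum(sim(u, a) for a in columns) / len(columns)


A = [(1, 1, 0, 0, 1), (0, 1, 1, 0, 0), (0, 1, 1, 0, 1)]
u = (0, 1, 1, 0, 1)
exact = sampled_correlation(u, A, sample_size=3)
estimate = sampled_correlation(u, A, sample_size=2, rng=random.Random(0))
```

The estimate trades accuracy for speed: with a small sample the result can land on either side of the threshold, so a real deployment would presumably pick the sample size with that risk in mind.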
The results of the match can be presented in several known ways, for example, through colour-coding the links or annotating the link with a numerical rating, a confidence level, a number or a percentage in the recommendation (such as described, for example, in Hill et al., "Recommending and Evaluating in a Virtual Community of Users").
The search continues for each of the pages that meet the search criteria and have not been eliminated by testing for the threshold. The list of relevant pages is refined until the pages no longer match the criteria, they contain no further links to other pages, or some pre-defined limit on the number of results is reached. Circular links can be handled by testing against a list of already successfully matched pages before a match is attempted.
The web pages which meet or exceed the threshold are presented to the user. In this manner, web pages, documents and information which meet the search criteria of the user and also correlate with the interests of similar users are delivered.
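The search loop described above can be sketched as a best-first traversal in which the most highly correlated page is expanded first. Everything concrete here is an assumption: the link graph is an in-memory dictionary standing in for the Web, `score` and `has_keyword` are caller-supplied stand-ins for the correlation and keyword tests, and the `seen` set tracks every queued page, which is a slightly broader guard against circular links than the list of matched pages mentioned in the text.

```python
import heapq


def best_first_search(start, links, score, has_keyword, theta, max_results=10):
    """Return matching pages reachable from `start`, most promising first.

    links:       dict mapping a page to the pages it links to
    score:       function page -> correlation with the user profile, in [0, 1]
    has_keyword: function page -> True if the page contains the search keyword
    """
    results = []
    seen = {start}  # pages already queued, so circular links are not followed twice
    frontier = []   # min-heap of (-score, page); negating pops the best page first

    def enqueue(page):
        if page not in seen:
            seen.add(page)
            heapq.heappush(frontier, (-score(page), page))

    for page in links.get(start, []):
        enqueue(page)

    while frontier and len(results) < max_results:
        negated, page = heapq.heappop(frontier)
        if -negated < theta or not has_keyword(page):
            continue  # eliminated: links from this page are not explored
        results.append(page)
        for linked in links.get(page, []):
            enqueue(linked)
    return results


# Hypothetical link graph with a cycle (a -> c -> a) and made-up correlations.
links = {"index": ["a", "b"], "a": ["c", "index"], "b": [], "c": ["a"]}
scores = {"a": 0.90, "b": 0.50, "c": 0.85}
found = best_first_search("index", links, scores.get, lambda page: True, theta=0.80)
print(found)  # ['a', 'c']
```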
Turning to Figure 3, a preferred embodiment of the present invention, implemented in Java, is shown. The User Agent 302 operates as an applet stored on WWW Client 304. Profile Agent 1 (306) is connected to the same local area network 300 as the User Agent 302. Profile Agent 1 (306) is implemented as a Java application on WWW Server 308 and manages Page Profile database 310. Profile Agent 2 (312) and Profile Agent 3 (318) reside on remote WWW Servers 314 and 320 (www.soccer.com and www.cars.com), respectively. Both WWW Servers 314 and 320 are implemented supporting Java applications. WWW Client 304, Profile Agent 1 (306), Profile Agent 2 (312) and Profile Agent 3 (318) are provided with connections to the World Wide Web 324. WWW Client 304 is a standard Web browser such as Netscape Communicator or Microsoft Explorer and WWW Servers 308, 314 and 320 are standard Web servers such as Netscape FastTrack or Apache server.
Profile Agents 306, 312, and 318 implement the HTTP protocol and behave to WWW Client 304 just as a typical WWW server would. Each Profile Agent 306, 312, and 318 implements the following three commands: search, load, and ask. The search command is initiated by a user to conduct a search. The load command is used to further refine a search. The ask command is used by profile agent 306, 312 and 318 to inquire about interest levels for linked pages. These commands have the following format:
search=uid:profile:keywords
load=uid:profile:keywords
ask=uid:profile
Each user is assigned a unique user id (uid) when the user is set-up with user agent 302. Each command also includes the profile of the user initiating the command.
The keywords which are part of the search and load commands are the words, separated by commas, that the user wishes to search. It is also possible to pass other information with each command, such as a threshold for assessing the relevance of a page as set by the user, as a further embodiment of the invention.
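A parser for the three command formats might look like the sketch below. The `Command` type and `parse_command` function are hypothetical names; the specification defines only the textual formats, and the validation rules (profile made of 0 and 1 characters, empty keyword list allowed) are assumptions drawn from the surrounding description.

```python
from typing import NamedTuple, Tuple


class Command(NamedTuple):
    name: str                  # "search", "load" or "ask"
    uid: str                   # unique user id
    profile: Tuple[int, ...]   # Boolean feature vector
    keywords: Tuple[str, ...]  # empty for "ask" or when no keyword is given


def parse_command(text):
    """Parse 'search=uid:profile:keywords', 'load=...' or 'ask=uid:profile'."""
    name, _, args = text.partition("=")
    name = name.strip()
    if name not in ("search", "load", "ask"):
        raise ValueError("unknown command: " + name)
    fields = [field.strip() for field in args.split(":")]
    expected = 2 if name == "ask" else 3
    if len(fields) != expected:
        raise ValueError("wrong number of fields for " + name)
    uid, bits = fields[0], fields[1]
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("profile must be a string of 0s and 1s")
    profile = tuple(int(bit) for bit in bits)
    keywords = ()
    if name != "ask":
        # Keywords are comma separated; the list may be left empty.
        keywords = tuple(k.strip() for k in fields[2].split(",") if k.strip())
    return Command(name, uid, profile, keywords)


cmd = parse_command("search=1234:01101:worldcup")
print(cmd.uid, cmd.profile, cmd.keywords)  # 1234 (0, 1, 1, 0, 1) ('worldcup',)
```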
Turning to Figure 4, the processing of the commands is described in further detail.
The User Agent 402 provides the mechanism for the user to define his user profile, as previously described with respect to Figure 1, or through the use of a separate combination box for selecting features.
User Agent 402 issues search command 404 on the initial screen 403 from which the user starts his search. The User Agent 402 is embedded into the Web page for the initial screen 403 as a Java applet. This applet displays a typical search form in HTML format with a field for entering keywords 406 and button 408 to set off the search. The applet is downloaded when the initial screen 403 web page is retrieved from the WWW server from which the user starts his search. If the applet code is stored as "SearchForm.class" on the WWW Server, the initial screen 403 HTML web page would have the following statement:
<applet code=SearchForm width=200 height=50></applet>
The User Agent 402 on the initial screen 403 retrieves the user id and profile 410 from the WWW Client. The preferred embodiment utilizes the presence of a "cookie" mechanism by which the web server connections (such as applets or CGI scripts) can both store and retrieve information on the client side of the connection.
Such a mechanism is implemented by the major browsers, for example, Netscape Communicator as described in the preliminary specification of the cookie mechanism in "Persistent Client State HTTP Cookies", http://www.netscape.com/newsref/std/cookie_spec.html.
Cookies are stored as name/value pairs in a designated file on the WWW Client.
Each cookie is associated with a path and an optional expiry date. When the Java applet requests a cookie from the WWW Client, the path component of the applet's document base is compared to the path attribute, and if there is a match, the cookie is visible to the applet. Commercial browsers such as Netscape provide a Java class library for accessing cookies from within an applet.
The cookie mechanism is applied as follows. When a first time user (which can be indicated by passing a special uid with the search command) submits a search through his User Agent 402, the Profile Agent 412 at the server side generates a unique user id and passes it back to the User Agent 402. The User Agent 402 then creates a cookie on the client side of the connection that contains this user id and profile 410. For example, the following cookie represents a user with user id "1234" and user profile "01101" (which corresponds to the feature vector {0, 1, 1, 0, 1} using a straightforward encoding):
uid=1234; profile=01101; path=/
When the User Agent 402 subsequently issues a search command 404 to the Profile Agent 412 that created the cookie, it retrieves the user id and profile 410 from the cookie and sends it to the Profile Agent 412. As discussed above, a complete search command contains the user id, the user profile and the list of keywords. For example, the following is a search command to search the Web on behalf of a user with id "1234" and profile "01101" for the keyword "worldcup":
search=1234:01101:worldcup
Each WWW Server is set up having an index page which contains initial links from which to begin a search. This could be a database of keywords and associated web pages containing those key words. This database could be derived from a typical search engine or robot. For example, if the user connects to a WWW Server to start his search, the index page might contain the following HTML initial links:
<a href=http://www.soccer.com/index.html>Soccer</a>
<a href=http://www.cars.com/marketplace.html>Cars for sale</a>
<a href=http://intranet/home.html>ACM</a>
On receiving a search command, the Profile Agent 412 retrieves the index page from its WWW server, and issues an ask command 414 to one or more other Profile Agents 416 for recommendations on each document linked from the index page. With each ask command the user id and profile are provided. For example:
ask=1234:01101
The Profile Agent 416 for the linked page replies with the level of interest calculated in association with other profile agents 416 using a correlation function such as previously described for the given profile. Pages whose level of interest is above a certain pre-defined threshold, or a threshold optionally set by the user agent 402, are then downloaded and filtered against the optional list of keywords. In a simple filtering scheme, pages that do not include the keywords would be removed from the list of recommended links. The Profile Agent 412 then modifies the links in the original page by encoding and including how interesting each of them would be to the user and sends the modified link page back to the User Agent 402. Each link is annotated, for example, through color coding or a numerical indication of the degree of confidence that the page is relevant to the user.
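The path from the stored cookie to the search and ask command strings shown in this passage can be sketched as follows. The helper names are hypothetical, and the cookie is parsed with plain string splitting as an assumption about its layout; the specification does not describe how the applet reads it beyond using the browser's cookie library.

```python
def parse_cookie(cookie):
    """Split a 'name=value; name=value' cookie string into a dictionary."""
    pairs = {}
    for part in cookie.split(";"):
        name, sep, value = part.partition("=")
        if sep:  # ignore fragments that have no '='
            pairs[name.strip()] = value.strip()
    return pairs


def search_command(cookie, keywords):
    """Build the search command sent by the User Agent to its Profile Agent."""
    fields = parse_cookie(cookie)
    return "search={}:{}:{}".format(fields["uid"], fields["profile"], ",".join(keywords))


def ask_command(cookie):
    """Build the ask command a Profile Agent sends for each linked page."""
    fields = parse_cookie(cookie)
    return "ask={}:{}".format(fields["uid"], fields["profile"])


cookie = "uid=1234; profile=01101; path=/"
print(search_command(cookie, ["worldcup"]))  # search=1234:01101:worldcup
print(ask_command(cookie))                   # ask=1234:01101
```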
For the above example, the Profile Agent 412 might have annotated the links as follows:
<a href=http://www.soccer.com/index.html?load=1234:01101:worldcup><font color="#FF0000">Soccer</font></a>
<a href=http://www.cars.com/marketplace.html>Cars for sale</a>
<a href=http://intranet/home.html?load=1234:01101:worldcup><font color="#FF0000">Intranet homepage</font></a>
Here a color coding scheme with one threshold is used. Any link that was recommended with a correlation at or above threshold is encoded in red (corresponding to the color code FF0000 in RGB format). Each color coded link also contains an embedded load command which contains the user id and profile 410 to be passed to the Profile Agent 412 for that page. The load command is separated from the actual link using a "?", which is a CGI (Common Gateway Interface) convention.
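A single-threshold annotation step of this kind could be sketched as below. The function `annotate_link`, the interest values 0.90 and 0.40 and the threshold 0.80 are illustrative assumptions; only the output format (red font plus an embedded load command after a "?") follows the example in the text.

```python
def annotate_link(url, text, interest, theta, uid, profile, keywords):
    """Return the HTML for one link, highlighted if its interest level reaches theta."""
    if interest < theta:
        return "<a href={}>{}</a>".format(url, text)  # left unchanged
    load = "load={}:{}:{}".format(uid, profile, ",".join(keywords))
    return '<a href={}?{}><font color="#FF0000">{}</font></a>'.format(url, load, text)


# Hypothetical interest levels returned by the ask commands.
recommended = annotate_link(
    "http://www.soccer.com/index.html", "Soccer", 0.90, 0.80, "1234", "01101", ["worldcup"]
)
plain = annotate_link(
    "http://www.cars.com/marketplace.html", "Cars for sale", 0.40, 0.80, "1234", "01101", ["worldcup"]
)
print(recommended)
print(plain)
```

A scheme with several thresholds, or a numerical confidence shown next to the link, would fit the same function shape by mapping the interest level to a colour or label instead of a yes/no decision.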
The search command is only invoked once. Subsequent refinements of the search are performed via load commands. For example:
load=1234:01101:worldcup
The processing of the load and search commands is generally the same, except for two aspects, which warrant the separation into two commands.
First, in the case of the load command, the host of the Profile Agent that is to receive the command is indicated by the host part of the URL which contains the load command. For example, if the user selects link http://intranet/home.html?load=1234:01101:worldcup in the page returned by Profile Agent 412, the following information will be sent (via the WWW client) in the load command 419 to the Profile Agent 420 on the host "intranet":
home.html?load=1234:01101:worldcup
The Profile Agent 420 then extracts from this a local path (home.html) and the actual load command (load=1234:01101:worldcup).
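The split into host, local path and load command can be illustrated with Python's standard `urllib.parse.urlsplit`. The wrapper name `split_load_url` is hypothetical, and the Java embodiment described in the text would use its own URL handling.

```python
from urllib.parse import urlsplit


def split_load_url(url):
    """Return (host, local path, load command) for a link with an embedded load command."""
    parts = urlsplit(url)
    # The host selects the Profile Agent; the query string after "?" is the load command.
    return parts.netloc, parts.path.lstrip("/"), parts.query


host, path, command = split_load_url("http://intranet/home.html?load=1234:01101:worldcup")
print(host)     # intranet
print(path)     # home.html
print(command)  # load=1234:01101:worldcup
```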
Second, the page returned by the Profile Agent 420 in reply to a load command 419 contains a Java applet 422 that monitors the time that the user spends reading the page, which the Profile Agent 420 then uses as an indication of interestingness.
Further information on how recommendations are solicited from other Profile Agents and how the level of interest displayed by a user in a particular page is measured is described below.
On receiving a search command 404 or load command 419, a Profile Agent 412 or 420, as the case may be, first retrieves the appropriate page from the WWW Server on the same site. This is either the index page or the page at the path which was passed together with the search or load command. The Profile Agent 412 or 420, as the case may be, then extracts links to other pages from the document. For each link the Profile Agent 412 or 420 establishes a socket connection to the remote Profile Agent 416 or 426 using the URL for that link. This is not necessary if the page is already on the local WWW Server and monitored by the same Profile Agent. The Profile Agent 412 or 420 then sends an ask command 414 or 424 to the Profile Agent or Agents 416 or 426 for the linked page and waits for it to reply with the interest level (correlation) for that page. The Profile Agent 416 or 426 for the linked page computes the correlation between the specified user profile 410 and the profiles stored for that page in the Page Profile database and returns it to the Profile Agent 412 or 420 (as the case may be) as the interest level. The socket is then closed, if necessary.
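Purely as an illustrative sketch, the computation performed by the Profile Agent for the linked page in reply to an ask command might look as follows in Python. It averages a per-user similarity over all user profiles stored for the page. The similarity measure used here (the fraction of bit positions on which two profiles agree), the function names, and the value 0.0 returned for a page with no stored profiles are all assumptions of this sketch and are not taken from the description.

```python
from typing import Mapping


def similarity(u: str, a: str) -> float:
    """Assumed measure: fraction of positions where two bit strings agree."""
    if len(u) != len(a):
        raise ValueError("profiles must have the same length")
    if not u:
        return 0.0
    return sum(x == y for x, y in zip(u, a)) / len(u)


def interest_level(user_profile: str,
                   page_profile: Mapping[str, str]) -> float:
    """Average the similarity between the user profile and every user
    profile stored for the page (keyed by user id)."""
    if not page_profile:
        return 0.0  # assumption: no stored profiles means no recommendation
    total = sum(similarity(user_profile, stored)
                for stored in page_profile.values())
    return total / len(page_profile)
```

The asking Profile Agent would then compare the returned value with its threshold when deciding how to present the link.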
In the example of Figure 3, the User Agent 302 would first send a search command with uid 1234, profile 01101, and keywords "worldcup" to Profile Agent 306, which would then load the page intranet/index.html from the WWW Server 308.
Profile Agent 1 (306) then sends ask commands with the same uid and profile to Profile Agent 2 (312) on host www.soccer.com and Profile Agent 3 (318) on host www.cars.com. From the replies Profile Agent 1 (306) would then assemble a modified index page with embedded load commands. If the user now selects the link "Soccer", WWW Client 304 would connect to Profile Agent 2 (312), which parses the path part of the URL into the name of a local page (index.html) to be retrieved from WWW Server www.soccer.com and a load command. Profile Agent 2 (312) would then solicit recommendations from the Profile Agents linked from within the index.html page and send a modified page back to the WWW Client 304.
To capture the time spent viewing a page, a User Agent applet that measures time is started on loading the page to WWW Client 304. When the user changes to a different page (by following a link, or using one of the browser buttons like "Back", "Forward" etc.), the User Agent 302 reports the time spent while the page was visible back to the Profile Agent 312 from which the page was loaded. The User Agent 302 applet is implemented as an invisible Java applet embedded into the Web page downloaded from the Profile Agent 312. To illustrate, assuming that code for the User Agent 302 is contained in the file "TimeTracker.class", the page assembled by the Profile Agent 312 must, for example, include the following statements:
<applet code=TimeTracker width=1 height=1>
<param name=uid value="1234">
<param name=profile value="01101">
</applet>
The Profile Agent 312 at the server side records the information sent by the User Agent 302 which includes the user id, the user profile, the time spent reading the page and the URL of the page loaded. This information is then used to update the profile for that page in the Page Profile database 316. As described above, a page is considered interesting to the user if the time spent reading it is in a certain range. For such pages, the user profile is added to the page profile in the Page Profile database 316 at the position indicated by the user id. If an entry for the user previously existed in the profile it is overwritten. This procedural interaction between the User Agent 302 and Profile Agent 312 is followed for all other Profile Agents interacting with the User Agent during a search.
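As a non-limiting illustration, the update of the Page Profile database on receipt of a reading-time report might be sketched in Python as follows. The bounds MIN_SECONDS and MAX_SECONDS, the function name record_visit and the use of a nested dictionary as the database are assumptions of this sketch; the description states only that the time spent must fall within a certain range.

```python
# Assumed bounds of the "interesting" reading-time range, in seconds.
MIN_SECONDS = 5.0
MAX_SECONDS = 600.0


def record_visit(page_profiles, url, uid, profile, seconds):
    """Store the user profile under the page's profile if the reading
    time falls within the assumed range.

    page_profiles maps a page URL to a dict of {user id: user profile}.
    An existing entry for the same user id is overwritten.
    Returns True if the page profile was updated.
    """
    if not (MIN_SECONDS <= seconds <= MAX_SECONDS):
        return False
    page_profiles.setdefault(url, {})[uid] = profile
    return True
```

In this sketch a second report from the same user for the same page simply replaces the earlier entry, mirroring the overwriting behavior described above.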
Although the invention has been described in terms of the preferred and several alternate embodiments, those skilled in the art will appreciate other modifications and alterations that can be made without departing from the spirit and scope of the teachings of the invention. All such modifications are intended to be included within the scope of the claims appended hereto.

Claims (7)

1. A method for searching and identifying relevant documents from an electronic search comprising the steps of:
(a) defining a user profile for a user desiring to conduct a search;
(b) attaching a document profile to each searchable document;
(c) obtaining search parameters from said user;
(d) attaching said user profile to said search;
(e) initiating said search with said search parameters;
(f) searching said searchable documents to identify candidate documents;
(g) comparing said user profile to said document profile corresponding to said candidate document to determine a successful or unsuccessful match;
(h) returning said candidate document to said user if successful match.
2. The method of Claim 1, wherein said search parameters includes one or more key words.
3. The method of Claim 1, wherein said document profile contains user profiles of users who have expressed an interest in said document.
4. The method of Claim 1, wherein said user profiles and said documents are managed by agents.
5. The method of Claim 1, wherein said successful or unsuccessful matches are determined by social filtering between collaborating agents.
6. A method for rating an electronic search comprising:
a) creating a user profile based on typical interests;
b) delivering a document to a user for viewing; and
c) attaching said user's profile to said document where said user is interested in said document.
Amendments to the claims have been filed as follows
CLAIMS
1. A method for searching and identifying relevant documents from an electronic search comprising the steps of:
(a) defining a user profile for a user desiring to conduct a search;
(b) attaching a document profile to each searchable document;
(c) obtaining search parameters from said user;
(d) attaching said user profile to said search;
(e) initiating said search with said search parameters;
(f) searching said searchable documents to identify candidate documents;
(g) comparing by social filtering said user profile to said document profile corresponding to said candidate document to determine a successful or unsuccessful match;
(h) returning said candidate document to said user if successful match.
2. The method of Claim 1, wherein said search parameters includes one or more key words.
3. The method of Claim 1, wherein said document profile contains user profiles of users who have expressed an interest in said document.
4. The method of Claim 1, wherein said user profiles and said documents are managed by agents.
5. The method of Claim 1, wherein said comparing by social filtering is performed by collaborating agents.
6. A method for rating documents for identifying relevant documents on an electronic search comprising the steps of:
(a) generating a search command from a user entity, said search command having a user id, user profile and search keyword;
(b) searching an index containing document keywords and corresponding linked documents, said linked documents having document profiles;
(c) locating one or more matches between said search keyword and said document keywords in said index;
(d) asking for recommendations on said linked documents corresponding to said matches of said document keywords for said user id based on said user profile;
(e) calculating a rating of level of interest for said recommendations for each said match by social filtering using said user profile and said document profile;
(f) returning relevant documents of said linked documents from said matches to said user entity whose said rating of level of interest is above a threshold level.
7. The method of claim 6 wherein said rating of level of interest is calculated by the correlation formula:
$$\frac{1}{M} \sum_{k=1}^{M} \mathrm{sim}(u, a_k)$$
where:
u is said user profile;
a_k is the k-th column of said document profile;
M is the number of said user profiles in said document profile; and
sim(u, a_k) is the similarity between said user profile and the k-th column of said document profile.
GB9806392A 1998-03-25 1998-03-25 Agent-based web search engine Expired - Fee Related GB2335761B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB9806392A GB2335761B (en) 1998-03-25 1998-03-25 Agent-based web search engine
CA002265292A CA2265292C (en) 1998-03-25 1999-03-15 Agent-based web search engine
DE19913509A DE19913509A1 (en) 1998-03-25 1999-03-25 Web search engine using user background information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB9806392A GB2335761B (en) 1998-03-25 1998-03-25 Agent-based web search engine

Publications (3)

Publication Number Publication Date
GB9806392D0 GB9806392D0 (en) 1998-05-20
GB2335761A true GB2335761A (en) 1999-09-29
GB2335761B GB2335761B (en) 2003-05-14

Family

ID=10829235

Family Applications (1)

Application Number Title Priority Date Filing Date
GB9806392A Expired - Fee Related GB2335761B (en) 1998-03-25 1998-03-25 Agent-based web search engine

Country Status (3)

Country Link
CA (1) CA2265292C (en)
DE (1) DE19913509A1 (en)
GB (1) GB2335761B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2341700A (en) * 1998-05-21 2000-03-22 Ibm Parallel profile matching in a large scale webcasting system
GB2352542A (en) * 1999-03-31 2001-01-31 Ibm Information retrieval augmented by the use of communication relationship data
WO2001040964A1 (en) * 1999-12-01 2001-06-07 Amicus Software Pty Ltd Method and apparatus for network access
WO2001086494A1 (en) * 2000-05-08 2001-11-15 Sonera Oyj User profile management in a communications network
GB2366033A (en) * 2000-02-29 2002-02-27 Ibm Processing acquired data and contextual information and associating the same with available multimedia resources
US6615209B1 (en) * 2000-02-22 2003-09-02 Google, Inc. Detecting query-specific duplicate documents
US6701362B1 (en) * 2000-02-23 2004-03-02 Purpleyogi.Com Inc. Method for creating user profiles
EP1461725A1 (en) * 2001-11-27 2004-09-29 Web-Track Media Pty Ltd Method and apparatus for information retrieval
EP1524611A2 (en) * 2003-10-06 2005-04-20 Leiki Oy System and method for providing information to a user
US7177904B1 (en) 2000-05-18 2007-02-13 Stratify, Inc. Techniques for sharing content information with members of a virtual user group in a network environment without compromising user privacy
US7694227B2 (en) 1999-11-03 2010-04-06 Sublinks Aps Method, system, and computer readable medium for managing resource links
WO2010046840A1 (en) * 2008-10-22 2010-04-29 Telefonaktiebolaget Lm Ericsson (Publ) Method and node for selecting content for use in a mobile user device
US8244766B2 (en) 2010-04-13 2012-08-14 Microsoft Corporation Applying a model of a persona to search results
EP2740054A4 (en) * 2011-08-04 2015-10-28 Krasimir Popov Searching for and creating an adaptive content
US9785987B2 (en) 2010-04-22 2017-10-10 Microsoft Technology Licensing, Llc User interface for information presentation system
US10628504B2 (en) 2010-07-30 2020-04-21 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19935792A1 (en) * 1999-07-29 2001-02-01 Intershop Software Entwicklung Networked computer system for information exchange by address data field or uniform resource locator (URL), has server computer provided with an URL having address information for clear identification of server computer
NL1013193C2 (en) * 1999-10-01 2001-04-03 Resense V O F Interactive search machine for finding solutions to defined problems, selects key words associated with problem and searches database for problems associated with these words
DE19963123B4 (en) * 1999-12-24 2004-09-16 Deutsche Telekom Ag Analytical information system
DE10024368A1 (en) * 2000-05-17 2001-11-22 Michael Fahrmair Locating selection of information products involves accessing information product database containing data about information products with at least location, category information per product
CA2924940A1 (en) 2000-07-05 2002-01-10 Paid Search Engine Tools, L.L.C. Paid search engine bid management
JP2003085081A (en) * 2000-07-25 2003-03-20 Nosu:Kk Information delivery service system
AU2002259125A1 (en) * 2001-05-03 2002-11-18 Quinn, Gaellen And Michael Offline-to-online traffic generation and demographic identification process and method
DE10143940B4 (en) * 2001-09-07 2012-07-26 Peter Krug Method and device for determining relevant objects
DE10357562A1 (en) * 2003-12-10 2005-07-28 Deutsche Telekom Ag Method for guiding user to required element by use of communication system with internet, requires input of individual user profile into first data bank
US7444328B2 (en) 2005-06-06 2008-10-28 Microsoft Corporation Keyword-driven assistance
US7974880B2 (en) * 2007-01-31 2011-07-05 Yahoo! Inc. System for updating advertisement bids

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995012173A2 (en) * 1993-10-28 1995-05-04 Teltech Resource Network Corporation Database search summary with user determined characteristics

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2341700B (en) * 1998-05-21 2002-11-06 Ibm Method and apparatus for parallel profile matching in a large scale webcasting system
GB2341700A (en) * 1998-05-21 2000-03-22 Ibm Parallel profile matching in a large scale webcasting system
GB2352542A (en) * 1999-03-31 2001-01-31 Ibm Information retrieval augmented by the use of communication relationship data
GB2352542B (en) * 1999-03-31 2003-11-19 Ibm Optimisation of system performance based on communication relationship
US10120942B2 (en) 1999-11-03 2018-11-06 Apple Inc. Method, system, and computer readable medium for managing resource links
US8555172B2 (en) 1999-11-03 2013-10-08 Apple Inc. Method, system, and computer readable medium for managing resource links
US7694227B2 (en) 1999-11-03 2010-04-06 Sublinks Aps Method, system, and computer readable medium for managing resource links
GB2372859B (en) * 1999-12-01 2004-07-21 Amicus Software Pty Ltd Method and apparatus for network access
WO2001040964A1 (en) * 1999-12-01 2001-06-07 Amicus Software Pty Ltd Method and apparatus for network access
GB2372859A (en) * 1999-12-01 2002-09-04 Amicus Software Pty Ltd Method and apparatus for network access
US8214359B1 (en) 2000-02-22 2012-07-03 Google Inc. Detecting query-specific duplicate documents
US8452766B1 (en) 2000-02-22 2013-05-28 Google Inc. Detecting query-specific duplicate documents
US6615209B1 (en) * 2000-02-22 2003-09-02 Google, Inc. Detecting query-specific duplicate documents
US7779002B1 (en) 2000-02-22 2010-08-17 Google Inc. Detecting query-specific duplicate documents
US6701362B1 (en) * 2000-02-23 2004-03-02 Purpleyogi.Com Inc. Method for creating user profiles
GB2366033B (en) * 2000-02-29 2004-08-04 Ibm Method and apparatus for processing acquired data and contextual information and associating the same with available multimedia resources
GB2366033A (en) * 2000-02-29 2002-02-27 Ibm Processing acquired data and contextual information and associating the same with available multimedia resources
WO2001086494A1 (en) * 2000-05-08 2001-11-15 Sonera Oyj User profile management in a communications network
US7822812B2 (en) 2000-05-18 2010-10-26 Stratify, Inc. Techniques for sharing content information with members of a virtual user group in a network environment without compromising user privacy
US7177904B1 (en) 2000-05-18 2007-02-13 Stratify, Inc. Techniques for sharing content information with members of a virtual user group in a network environment without compromising user privacy
EP1461725A4 (en) * 2001-11-27 2005-06-22 Web Track Media Pty Ltd Method and apparatus for information retrieval
EP1461725A1 (en) * 2001-11-27 2004-09-29 Web-Track Media Pty Ltd Method and apparatus for information retrieval
EP1524611A2 (en) * 2003-10-06 2005-04-20 Leiki Oy System and method for providing information to a user
EP1524611A3 (en) * 2003-10-06 2005-04-27 Leiki Oy System and method for providing information to a user
WO2010046840A1 (en) * 2008-10-22 2010-04-29 Telefonaktiebolaget Lm Ericsson (Publ) Method and node for selecting content for use in a mobile user device
US8244766B2 (en) 2010-04-13 2012-08-14 Microsoft Corporation Applying a model of a persona to search results
US9785987B2 (en) 2010-04-22 2017-10-10 Microsoft Technology Licensing, Llc User interface for information presentation system
US10628504B2 (en) 2010-07-30 2020-04-21 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information
EP2740054A4 (en) * 2011-08-04 2015-10-28 Krasimir Popov Searching for and creating an adaptive content

Also Published As

Publication number Publication date
CA2265292C (en) 2009-09-29
GB2335761B (en) 2003-05-14
CA2265292A1 (en) 1999-09-25
GB9806392D0 (en) 1998-05-20
DE19913509A1 (en) 1999-09-30

Similar Documents

Publication Publication Date Title
CA2265292C (en) Agent-based web search engine
US11547853B2 (en) Personalized network searching
US6546388B1 (en) Metadata search results ranking system
Pierrakos et al. Web usage mining as a tool for personalization: A survey
He et al. Combining evidence for automatic web session identification
US6917972B1 (en) Parsing navigation information to identify occurrences corresponding to defined categories
US6006217A (en) Technique for providing enhanced relevance information for documents retrieved in a multi database search
Hong et al. WebQuilt: A proxy-based approach to remote web usability testing
CN100428234C (en) Method and system for assessing quality of search engines
US7107338B1 (en) Parsing navigation information to identify interactions based on the times of their occurrences
US20010044795A1 (en) Method and system for summarizing topics of documents browsed by a user
Senkul et al. Improving pattern quality in web usage mining by using semantic information
US7596533B2 (en) Personalized multi-service computer environment
US6714933B2 (en) Content aggregation method and apparatus for on-line purchasing system
US8572100B2 (en) Method and system for recording search trails across one or more search engines in a communications network
Hijikata Implicit user profiling for on demand relevance feedback
US20070094268A1 (en) Broadband centralized transportation communication vehicle for extracting transportation topics of information and monitoring terrorist data
KR20020003915A (en) A method for providing search result including recommendation of search condition, and a server thereof
Menczer et al. Adaptive assistants for customized e-shopping
US20060074843A1 (en) World wide web directory for providing live links
US8166027B2 (en) System, method and program to test a web site
EP2608064A1 (en) Information provision device, information provision method, programme, and information recording medium
CA2805872C (en) Information provisioning device, information provisioning method, program, and information recording medium
Gates et al. Toward an adaptive WWW: a case study in customized hypermedia
Koutri et al. Adaptive interaction with web sites: an overview of methods and techniques

Legal Events

Date Code Title Description
732E Amendments to the register in respect of changes of name or changes affecting rights (sect. 32/1977)
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20030814