CA2265292C - Agent-based web search engine - Google Patents

Info

Publication number
CA2265292C
Authority
CA
Canada
Prior art keywords
user
profile
search
agent
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA002265292A
Other languages
French (fr)
Other versions
CA2265292A1 (en)
Inventor
Michael Weiss
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitel Networks Corp
Original Assignee
Mitel Networks Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitel Networks Corp
Publication of CA2265292A1
Application granted
Publication of CA2265292C
Anticipated expiration
Expired - Fee Related

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • G06F16/9536Search customisation based on social or collaborative filtering

Abstract

According to one aspect of the present invention, there is provided a method for searching that uses social filtering between collaborating agents to track user behavior and to guide the search for documents and information stored in electronic form. The present invention records background information about each user in a user profile. User profiles are learned from a training set of pages or by observing user behavior. The method of the present invention increases the number of search results which are likely to be perceived by the user as highly relevant because of a "peer effect". The present invention allows the use of best-first search of the Web which performs better than the depth-first or breadth-first search used in past approaches.

Description

AGENT-BASED WEB SEARCH ENGINE
FIELD OF THE INVENTION

The present invention relates generally to the field of information search and retrieval, and more particularly, to an agent-based Web search engine that improves keyword-based searches by utilizing a user profile to further restrict the search to documents of interest to the user. The present invention utilizes a network of collaborating agents that track and use information from the browsing behavior of other similar users.

BACKGROUND OF THE INVENTION

Previous approaches to finding information on the World Wide Web (WWW) have included automated searching programs (search engines) such as WAIS or Web crawler to locate web sites and information of interest to the user.

These automated search engines suffer from the problem of returning too many search results, frequently by including documents of marginal or low relevance, reducing the usefulness of the search to the user. These typical approaches fail to consider any measure of the user interest in conducting the search.
These prior art approaches use a measure of interest based only on the search keywords entered by the user. As a consequence, these search approaches return all documents which contain the search terms, including documents which are in subject areas unrelated to the user's area of interest. However, frequently background information is available which can be applied to the search. These prior art approaches do not use this valuable background information to eliminate documents which are not relevant to the user. User satisfaction with these prior art search engines is therefore low.
Lieberman in "Letizia: An Agent that Assists in Web Browsing", International Joint Conference on Artificial Intelligence, 1995, describes an approach that uses a single agent to assist the user browsing the World Wide Web. In the Lieberman approach, the agent tracks user behavior and attempts to anticipate documents of interest by autonomously exploring links from the user's current position.
This approach infers search goals from the user's browsing behavior and makes unsolicited recommendations of "interesting" documents. One of the drawbacks of this prior art approach is that it focuses on the behaviour of the individual user without considering other information which can be gleaned from community interest in the documents and information.
It is a reasonable assumption that documents and information of interest to one person in a community would likely be of interest to another. Valuable information obtained from the browsing behaviour of other users can be used to focus the search.
Therefore, an approach which tracks and uses information from the browsing behavior of other users "similar" to the user can be used to facilitate the search process to ascertain relevant documents.
Furthermore, a search mechanism which is augmented by utilizing a network of collaborating agents to track browsing behavior and guide searches would improve the effectiveness of the search.

SUMMARY OF THE INVENTION

According to one aspect of the present invention, there is provided a method for searching that uses social filtering between collaborating agents to track user behavior and to guide the search for documents and information stored in electronic form.
The present invention records background information about each user in a user profile. User profiles are learned from a training set of pages or by observing user behavior.
The method of the present invention increases the number of search results which are likely to be perceived by the user as highly relevant because of a "peer effect". The present invention allows the use of best-first search of the Web which performs better than the depth-first or breadth-first search used in past approaches. The approach according to the present invention is also highly scalable, since each agent manages only a small subset of users and documents, as compared to other known social filtering approaches (e.g. Maes, P. "Agents that Reduce Work and Information Overload", Communications of the ACM, July, 1994).
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram depicting an overview of a networked system implementing the search mechanism of the present invention.
Figure 2 is a block diagram of a search tree for searching web pages illustrating the use of the present invention.
Figure 3 is a block diagram depicting the present invention implemented in Java.
Figure 4 is a flowchart diagram illustrating the execution of the search commands of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

According to the present invention, there is provided an agent based web search engine where agents are used to assist users and communicate information to facilitate efficient searches. The data that is communicated among agents pertains to user profiles about individual users on the web as well as web page profiles regarding the particulars of various web pages.
The present invention tracks and uses information gathered on the browsing behaviour of users that are "similar" to the user searching for information, utilizing a process known as social filtering. The concept of social filtering has been described by Maes (referred to herein above), and by Lashkari, Y. et al. "Collaborative Interface Agents", Conference of the American Association of Artificial Intelligence, 1994.
Social filtering, in its basic unmodified form, does not base its correlations on the content of information but rather on correlations solely among the users or viewers of such information. Social filtering uses information about a user's social environment as a guide to locating relevant documents and information.
In the implementation of social filtering, information about the interests of individual users is gathered. Then, during the search phase information is filtered for relevance by exchanging data about the users who have expressed an interest in the information. Social filtering has, for example, been successfully applied to make recommendations about music records.
Turning to Figure 1, a networked system implementing the present invention is shown. Web server 110 is connected to local area network 104. Local area network 104 is in turn connected to the World Wide Web (WWW) 102. Likewise, web server 112 is connected to local area network 106, which is connected to World Wide Web 102. Web servers 110 and 112 are standard Internet or Intranet computing machines, as are well known in the art, that are capable of displaying web pages of hypertext markup language (HTML) format. HTML is a well known markup system used to create hypertext documents that are portable from platform to platform. To accomplish the communication tasks required of the present invention, web servers 110 and 112 use standard network communication schemes and the Internet Hypertext Transfer Protocol (HTTP) which allows the transfer of information from client to server. While the invention hereinafter will be described with respect to HTML web pages using HTTP, it is within the scope of this invention that other document formats and protocols could be used.
HTML web pages are stored in computer memory 114 of web server 110 and are made accessible to local users 118 and 120 as well as remote users 122 and 124 through the World Wide Web 102. Likewise, other web pages are stored in computer memory 116 of web server 112 which are available to local users 118 and 120 as well as remote users 122 and 124 through World Wide Web 102. Local users 118 and 120 and remote users 122 and 124 use a standard web browser, such as NetScape™ from NetScape Communication Corporation or Microsoft Internet Explorer™ from Microsoft Corporation, which can read the HTML coded web pages. Each of users 118, 120, 122 and 124, as well as each web page in memory 114 and 116, has an accessible address or universal resource locator (URL) which can be used by users, agents or other network devices for locating and accessing user or web page information.
In the preferred embodiment, agents are used to perform tasks on the user's behalf, train or teach the user, help different users collaborate, and monitor events and procedures. In particular, various user and web page profiles are managed by agents. In addition, searches are facilitated by the communication and collaboration of agents. In the preferred embodiment, agents employ social filtering relying upon correlations drawn between different users in identifying relevant documents and information.
As shown in Figure 1, agent 126 manages a portfolio of user profiles for local users 118 and 120 on web server 110. Agent 126 also manages a portfolio of web page profiles for each web page stored in memory 114 of web server 110. Agent 128 manages a portfolio of web page profiles for each web page stored in memory 116 of web server 112. In addition, agent 130 manages the user profile for user 122, while agent 132 manages the user profile for user 124. Particulars of the profiles and the information managed and communicated by agents 126, 128, 130 and 132 are described in further detail below.
A user profile, such as managed by agents 126, 128, 130 and 132, is a description of background information about a particular user. It is created by identifying and listing specific features or areas of interest to the user. One method of representing this profile is as a vector where each element represents the user's interest in a particular feature. Therefore, in the general case of the Boolean feature vector {u_1, ..., u_n}, each Boolean value u_k indicates whether a particular feature k is of interest to a user. For example, features for comparison could be defined as: (a) cars; (b) sports; (c) cooking. If a user is interested in cars and sports, but not interested in cooking, their profile would be represented as the feature vector {1, 1, 0}, where 1 represents true and 0 represents false. This profile could then be used to correlate with profiles of users with similar interests.
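The Boolean feature vector described above can be sketched in a few lines of Python. This is only an illustration, not the Java implementation of the preferred embodiment; the names FEATURES, make_profile and encode_profile are hypothetical, and the feature order is the one assumed in the cars/sports/cooking example.

```python
# Hypothetical feature order, taken from the cars/sports/cooking example.
FEATURES = ["cars", "sports", "cooking"]


def make_profile(interests, features=FEATURES):
    """Return a Boolean feature vector (0/1 ints), one element per feature."""
    return [1 if feature in interests else 0 for feature in features]


def encode_profile(profile):
    """Encode a feature vector as a compact bit string, e.g. [1, 1, 0] -> '110'."""
    return "".join(str(bit) for bit in profile)


profile = make_profile({"cars", "sports"})
print(profile, encode_profile(profile))
```

The bit-string form is the same straightforward encoding used later for the profile carried in cookies and commands.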
A web page profile is merely a list of the profiles of users who have visited that web page document. It may be implemented as a list of the actual user profiles, or merely a pointer to the agent managing each user's profile. For example, in one embodiment, a web profile can be set up by an agent as a table of {feature, user id, interest} tuples.
An example is set out in Table 1 below:

Table 1
          User id
Feature   a1  a2  a3
cars      1   0   0
sports    1   1   1
cooking   0   1   1
Java      0   0   0
Internet  1   0   1

The web page profile is updated as a new user visits that web page, in a similar manner to which a web page hit counter is updated.
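One possible in-memory form of such a web page profile is sketched below in Python, using the values of Table 1. The dictionary layout and the helper names as_tuples and record_visit are assumptions made for illustration; the patent only requires a table of {feature, user id, interest} tuples or an equivalent.

```python
FEATURES = ["cars", "sports", "cooking", "Java", "Internet"]

# Page profile from Table 1: one Boolean vector (one column) per visiting user.
page_profile = {
    "a1": [1, 1, 0, 0, 1],
    "a2": [0, 1, 1, 0, 0],
    "a3": [0, 1, 1, 0, 1],
}


def as_tuples(profile, features=FEATURES):
    """Flatten the page profile into (feature, user id, interest) tuples."""
    return [
        (feature, uid, vector[i])
        for uid, vector in profile.items()
        for i, feature in enumerate(features)
    ]


def record_visit(profile, uid, user_profile):
    """Add (or overwrite) the column for a visiting user, like a hit counter update."""
    profile[uid] = list(user_profile)


record_visit(page_profile, "a4", [0, 0, 1, 1, 0])
print(len(as_tuples(page_profile)))
```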
Before a search can be conducted, it is necessary to create user profiles.
While user profiles could be manually generated, the preferred approach is to generate user profiles during a learning operation. During the learning operation, users will be presented with a set of web pages or questions and asked to indicate their interest by responding with either yes or no. Training pages are prepared such that they are associated with a set of features that will be included in the user profile.
Therefore, given the set of positive examples of concepts (or web pages) that the user is interested in, and the set of negative examples (or web pages) the user is not interested in, the set of features that distinguish pages rated as interesting from those that are not can be learned through an information theoretic approach. The use of an information theoretic approach is known in the art.
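The text does not say which information theoretic approach is used. As one possibility, the Python sketch below scores each feature by its information gain over the yes/no answers and marks a feature as of interest when it is both informative and mostly associated with pages the user liked. The function names, the min_gain cut-off and the training data are all assumptions made for illustration.

```python
import math


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(pages, labels, feature):
    """Reduction in label entropy obtained by splitting the pages on one feature."""
    with_f = [y for page, y in zip(pages, labels) if feature in page]
    without_f = [y for page, y in zip(pages, labels) if feature not in page]
    n = len(labels)
    remainder = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - remainder


def learn_profile(pages, labels, features, min_gain=0.5):
    """Return a Boolean profile: 1 for informative features seen mostly on liked pages."""
    profile = []
    for feature in features:
        liked = [y for page, y in zip(pages, labels) if feature in page]
        mostly_liked = bool(liked) and sum(liked) / len(liked) > 0.5
        informative = information_gain(pages, labels, feature) >= min_gain
        profile.append(1 if informative and mostly_liked else 0)
    return profile


# Hypothetical training pages (sets of features) and the user's yes/no answers.
pages = [{"cars", "sports"}, {"cars"}, {"cooking"}, {"cooking", "sports"}]
labels = [1, 1, 0, 0]
print(learn_profile(pages, labels, ["cars", "sports", "cooking"]))
```

In this toy data "sports" appears equally on liked and disliked pages, so it carries no information and is left out of the profile.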

Training pages that are highly matched with a user's interest may subsequently be suggested to the user as starting points for exploration of the web. This provides the additional advantage of directing the user to a web server that supports the agent based web server engine.
The present invention is better illustrated by way of example. Turning to Figure 2, the search tree depicting the typical search actions of users of Figure 1 is shown in greater detail.
The search is represented by numerous nodes 202, 204, 206, 208, 210, 212, 214, 216 and 218. Each of the nodes in the search tree corresponds to a Web page which may be on a different web server. The initial node 202 is a web page currently received by a user such as described in Figure 1. Branches of the tree represent hypertext links between the web pages.
Each web page is associated with a web page profile that is used to track which users have visited that particular web page. When a user visits a web page 204 this is a good indication that it is of interest to that user.
A further, better measure of interest could be employed wherein the actual amount of time spent by a user reading a web page is recorded. This may be implemented by a profiler downloaded when a user first accesses the instrumented pages, as an invisible applet written in a suitable language, such as Java, that collects information on a user's web page traversals and captures usage access patterns including time spent on a particular page.
In this manner, the time spent viewing a web page can be reported back to the agent at the web server by the applet. If the viewing time is in a certain range, the page is considered interesting to the user.
The range is bounded by a lower threshold, set and adjustable by the user agent, below which the page is considered not interesting, and optionally a corresponding upper threshold, beyond which the user is considered to have abandoned the web page rather than to have found it of high interest. The particular search illustrated is started at the user's initial node 202. The user formulates a search by providing a keyword 222. The user profile 224 describing the user's background is implicitly added to the search specification. The user also has the option to leave the keyword unspecified. If this is done, the Web is searched for every document that matches the user's profile 224. Each node 202 - 208 has a page profile; however, only page profile 220 is illustrated in Figure 2.
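The viewing-time test with a lower threshold and an optional upper threshold can be sketched as follows. The function name is_interesting and the threshold values of 10 and 600 seconds are assumptions for illustration only; the text leaves the actual values to the user agent.

```python
def is_interesting(seconds, lower, upper=None):
    """Return True if the viewing time falls inside the interest range.

    Below `lower` the page is treated as not interesting; above the optional
    `upper` the user is treated as having abandoned the page.
    """
    if seconds < lower:
        return False
    if upper is not None and seconds > upper:
        return False
    return True


# Hypothetical thresholds: at least 10 seconds, at most 10 minutes.
for viewed in (3, 45, 3600):
    print(viewed, is_interesting(viewed, lower=10, upper=600))
```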

As the search evolves, each of nodes 204 - 208 is tested on whether it includes the specified keyword. If it does, the correlation between the user profile 224 and the page profile for the node is computed and compared against a threshold. A high-level formal description of the process by which a node is tested for a match is given below.

With the following definitions:

$u$: user profile, $u = \{u_1, \ldots, u_N\}$
$u_i$: indicates whether feature $i$ is of interest to the user
$m$: matching vector, $m = \{m_1, \ldots, m_M\}$
$A$: page profile, $A = \{a_1, \ldots, a_M\}$
$\theta$: threshold
$N$: number of features
$M$: number of agents or user profiles in the page profile

The correlation between $u$ and $A$ computes to:

$$u A = m$$

A match occurs if:

$$\lVert m \rVert \geq \theta$$

Each column of $A$ in this description corresponds to the profile of a user who has visited that page. A column is added to $A$ as each user's visit is tracked by the agent in charge of that page. Alternatively, a link to the user's web page is provided.
Any optimizations as to how these profiles are actually represented which are obvious to one skilled in the art apply. There are several standard ways to compute this correlation. One standard way to measure the "similarity" between two feature vectors is to compute their normalized vector product, that is, to use a cosine similarity measure.
For Boolean feature vectors, this measure obtains values between 0 and 1, where values close to 0 indicate low, and values close to 1 high, degrees of similarity.
The similarity function is:

$$\mathrm{sim}(u, v) = \mathrm{cosine}(u, v)$$

where $u$ and $v$ are feature vectors, and $\mathrm{cosine}(u, v)$ is their normalized vector product:

$$\mathrm{cosine}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$$

Using this similarity function to calculate the similarity between vectors {0, 1, 1, 0, 1} and {1, 1, 0, 0, 1}, the result is:

$$\mathrm{sim}(\{0, 1, 1, 0, 1\}, \{1, 1, 0, 0, 1\}) = \frac{2}{\sqrt{3}\,\sqrt{3}} = .67$$

Therefore, using the definitions above where $u$ is the user profile of the new visitor, and $a_k$ is the k-th column in the web page profile $A$, the average similarity of the user profile $u$ and the page profile gives a measure of the correlation between the user profile $u$ and the page profile $A$.
The correlation can be expressed by the formula:

$$\mathrm{correlation}(u, A) = \frac{1}{M} \sum_{k=1}^{M} \mathrm{sim}(u, a_k)$$
This is further illustrated using the web page profile in Table 2 below, in which a new user with user profile {0, 1, 1, 0, 1} visits the page:

Table 2
          User id
Feature   a1  a2  a3  u
cars      1   0   0   0
sports    1   1   1   1
cooking   0   1   1   1
Java      0   0   0   0
Internet  1   0   1   1

The correlation is calculated by comparing the new user profile with each column of the web page profile using the similarity function:

$\mathrm{sim}(u, a_1) = .67$
$\mathrm{sim}(u, a_2) = .82$
$\mathrm{sim}(u, a_3) = 1.00$
$\mathrm{correlation}(u, A) = (.67 + .82 + 1.00)/3 = .83$

A match between a user and a page is determined by comparing the correlation against a threshold which may be pre-defined or optionally set by the user agent. For example, if the threshold theta = .80, the user profile would be considered a match and included in the page profile, because .83 > .80.
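The worked example of Table 2 can be reproduced with a short Python sketch of the cosine similarity and the averaged correlation. The function names cosine and correlation are chosen here for illustration, and each column of the page profile is represented as a plain list.

```python
import math


def cosine(u, v):
    """Normalized vector product of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def correlation(u, columns):
    """Average similarity between user profile u and every column of the page profile."""
    if not columns:
        return 0.0
    return sum(cosine(u, a) for a in columns) / len(columns)


u = [0, 1, 1, 0, 1]          # new visitor
A = [
    [1, 1, 0, 0, 1],         # a1
    [0, 1, 1, 0, 0],         # a2
    [0, 1, 1, 0, 1],         # a3
]
theta = 0.80
score = correlation(u, A)
print(round(score, 2), score >= theta)
```

With these values the averaged similarity rounds to .83, which is above the example threshold of .80, so the visitor is treated as a match.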

If the web page profile is particularly large, it may be necessary to employ some optimization techniques to speed the calculation. Many optimizations are possible, for example, summing over a random sample of columns of the page profile, rather than all.
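The random-sample optimization mentioned above might look like the following Python sketch. The name sampled_correlation and the sample_size parameter are assumptions; the result is an estimate of the full correlation, not the exact value.

```python
import math
import random


def cosine(u, v):
    """Normalized vector product of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sampled_correlation(u, columns, sample_size, rng=None):
    """Estimate the correlation from at most sample_size randomly chosen columns."""
    if not columns:
        return 0.0
    rng = rng or random.Random()
    if len(columns) > sample_size:
        columns = rng.sample(columns, sample_size)
    return sum(cosine(u, a) for a in columns) / len(columns)


u = [0, 1, 1, 0, 1]
large_profile = [[0, 1, 1, 0, 1]] * 1000   # hypothetical page with many similar visitors
print(sampled_correlation(u, large_profile, sample_size=20, rng=random.Random(0)))
```

When the page profile has no more columns than the sample size, the function falls back to the full average.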
The results of the match can be presented in several known ways, for example, through colour-coding the links or annotating the link with a numerical rating, confidence level, number or percentage in the recommendation (such as described, for example, in Hill et al., "Recommending and Evaluating in a Virtual Community of Users", 1994, Bell Communications Research).

The search continues for each of the pages that meet the search criteria and have not been eliminated by testing for the threshold. The list of relevant pages is refined until the pages no longer match the criteria, contain no further links to other pages, or some pre-defined limit on the number of results is reached. Circular links can be handled by testing against a list of already successfully matched pages before a match is attempted.
The web pages which meet or exceed the threshold are presented to the user. In this manner, web pages, documents and information which both meet the search criteria of the user and correlate with the interests of similar users are delivered.
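The search loop described above can be sketched as a best-first traversal over a small in-memory link graph. This is a simplified, single-process illustration: the dictionaries links, keywords_of and interest_of stand in for pages fetched from web servers and for interest levels returned by profile agents, and all page names and scores are hypothetical. Unlike the text, the sketch records every tested page (not only matched ones) to guard against circular links.

```python
import heapq


def best_first_search(start, links, keywords_of, interest_of, keyword, theta, max_results=10):
    """Expand the most interesting matching page first; prune pages below theta."""
    results = []
    seen = {start}                      # guards against circular links
    frontier = [(0.0, start)]           # min-heap keyed on negated interest
    while frontier and len(results) < max_results:
        neg_score, page = heapq.heappop(frontier)
        if page != start:
            results.append((page, -neg_score))
        for linked in links.get(page, []):
            if linked in seen:
                continue
            seen.add(linked)
            if keyword is not None and keyword not in keywords_of.get(linked, set()):
                continue                # keyword test comes first
            score = interest_of.get(linked, 0.0)
            if score >= theta:          # then the correlation threshold
                heapq.heappush(frontier, (-score, linked))
    return results


links = {"index": ["soccer", "cars", "home"], "soccer": ["worldcup98", "index"]}
keywords_of = {"soccer": {"worldcup"}, "home": {"worldcup"}, "worldcup98": {"worldcup"}}
interest_of = {"soccer": 0.9, "cars": 0.3, "home": 0.85, "worldcup98": 0.95}
print(best_first_search("index", links, keywords_of, interest_of, "worldcup", theta=0.8))
```

Passing keyword=None corresponds to leaving the keyword unspecified, in which case only the profile correlation is tested.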
Turning to Figure 3, a preferred embodiment of the present invention, implemented in Java, is shown. The User Agent 302 operates as an applet stored on WWW Client 304. Profile Agent 1 (306) is connected to the same local area network 300 as the User Agent 302. Profile Agent 1 (306) is implemented as a Java application on WWW Server 308 and manages Page Profile database 310. Profile Agent 2 (312) and Profile Agent 3 (318) reside on remote WWW Servers 314 and 320 (www.soccer.com and www.cars.com), respectively. Both WWW Servers 314 and 320 are implemented supporting Java applications. WWW Client 304, Profile Agent 1 (306), Profile Agent 2 (312) and Profile Agent 3 (318) are provided with connections to the World Wide Web 324. WWW Client 304 is a standard Web browser such as Netscape Communicator or Microsoft Explorer and WWW Servers 308, 314 and 320 are standard Web servers such as Netscape FastTrack or Apache server.

Profile Agents 306, 312, and 318 implement the HTTP protocol and behave to WWW Client 304 just as a typical WWW server would. Each Profile Agent 306, 312, and 318 implements the following three commands: search, load, and ask. The search command is initiated by a user to conduct a search. The load command is used to further refine a search. The ask command is used by Profile Agents 306, 312 and 318 to inquire about interest levels for linked pages. These commands have the following format:

search=uid:profile:keywords
load=uid:profile:keywords
ask=uid:profile

Each user is assigned a unique user id (uid) when the user is set up with User Agent 302. Each command also includes the profile of the user initiating the command.
The keywords which are part of the search and load commands are the words, separated by commas, that the user wishes to search. It is also possible to pass other information with each command, such as a threshold for assessing the relevance of a page as set by the user, as a further embodiment of the invention.
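A parser for the three command formats can be sketched as follows. The Command tuple and the parse_command function are hypothetical names; the sketch assumes the colon-separated layout shown above, comma-separated keywords, and a profile encoded as a string of 0 and 1 characters.

```python
from typing import NamedTuple


class Command(NamedTuple):
    name: str        # "search", "load" or "ask"
    uid: str
    profile: list    # Boolean feature vector as 0/1 ints
    keywords: list   # empty for "ask" or when no keyword is given


def parse_command(text):
    """Parse 'search=uid:profile:keywords', 'load=...' or 'ask=uid:profile'."""
    name, _, args = text.partition("=")
    name = name.strip()
    if name not in ("search", "load", "ask"):
        raise ValueError(f"unknown command: {name!r}")
    parts = [part.strip() for part in args.split(":")]
    expected = 2 if name == "ask" else 3
    if len(parts) != expected:
        raise ValueError(f"{name} expects {expected} fields, got {len(parts)}")
    keywords = [k for k in parts[2].split(",") if k] if expected == 3 else []
    return Command(name, parts[0], [int(bit) for bit in parts[1]], keywords)


print(parse_command("search=1234:01101:worldcup"))
print(parse_command("ask=1234:01101"))
```

An empty keyword field, as in "search=1234:01101:", yields an empty keyword list, which matches the option of leaving the keyword unspecified.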
Turning to Figure 4, the processing of the commands is described in further detail.
The User Agent 402 provides the mechanism for the user to define his user profile, as previously described with respect to Figure 1, or through the use of a separate combination box for selecting features.
User Agent 402 issues search command 404 on the initial screen 403 from which the user starts his search. The User Agent 402 is embedded into the Web page for the initial screen 403 as a Java applet. This applet displays a typical search form in HTML format with a field for entering keywords 406 and button 408 to set off the search. The applet is downloaded when the initial screen 403 web page is retrieved from the WWW server from which the user starts his search. If the applet code is stored as "SearchForm.class" on the WWW Server, the initial screen 403 HTML web page would have the following statement:

<applet code=SearchForm width=200 height=50>
</applet>

The User Agent 402 on the initial screen 403 retrieves the user id and profile 410 from the WWW Client. The preferred embodiment utilizes the presence of a "cookie" mechanism by which the web server connections (such as applets or CGI scripts) can both store and retrieve information on the client side of the connection.
Such a mechanism is implemented by the major browsers, for example, Netscape Communicator. Cookies are stored as name/value pairs in a designated file on the WWW Client.
Each cookie is associated with a path and an optional expiry date. When the Java applet requests a cookie from the WWW Client, the path component of the applet's document base is compared to the path attribute, and if there is a match, the cookie is visible to the applet. Commercial browsers such as Netscape provide a Java class library for accessing cookies from within an applet.
The cookie mechanism is applied as follows. When a first-time user (which can be indicated by passing a special uid with the search command) submits a search through his User Agent 402, the Profile Agent 412 at the server side generates a unique user id and passes it back to the User Agent 402. The User Agent 402 then creates a cookie on the client side of the connection that contains this user id and profile 410.
For example, the following cookie represents a user with user id "1234" and user profile "01101" (which corresponds to the feature vector {0, 1, 1, 0, 1} using a straightforward encoding):

uid=1234; profile=01101; path=/
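Building and reading a cookie of this shape can be sketched in Python as below. The helpers make_cookie and parse_cookie are hypothetical and handle only the simple "name=value; ..." layout shown in the example; they are not a general cookie parser and do not model the browser's path matching.

```python
def make_cookie(uid, profile, path="/"):
    """Build a cookie string carrying the user id and the bit-string profile."""
    bits = "".join(str(bit) for bit in profile)
    return f"uid={uid}; profile={bits}; path={path}"


def parse_cookie(cookie):
    """Split a 'name=value; name=value' cookie string into a dictionary."""
    fields = {}
    for part in cookie.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            fields[name] = value
    return fields


cookie = make_cookie("1234", [0, 1, 1, 0, 1])
fields = parse_cookie(cookie)
profile = [int(bit) for bit in fields["profile"]]
print(cookie, fields["uid"], profile)
```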

When the User Agent 402 subsequently issues a search command 404 to the Profile Agent 412 that created the cookie, it retrieves the user id and profile 410 from the cookie and sends it to the Profile Agent 412. As discussed above, a complete search command contains the user id, the user profile and the list of keywords. For example, the following is a search command to search the Web on behalf of a user with id "1234" and profile "01101" for the keyword "worldcup":

search=1234:01101:worldcup

Each WWW Server is set up having an index page which contains initial links from which to begin a search. This could be a database of keywords and associated web pages containing those key words. This database could be derived from a typical search engine or robot. For example, if the user connects to a WWW Server to start his search, the index page might contain the following HTML initial links:

<a href=http://www.soccer.com/index.html>Soccer</a>
<a href=http://www.cars.com/marketplace.html>Cars for sale</a>
<a href=http://intranet/home.html>ACM</a>

On receiving a search command, the Profile Agent 412 retrieves the index page from its WWW server, and issues an ask command 414 to one or more other Profile Agents 416 for recommendations on each document linked from the index page.
With each ask command the user id and profile are provided. For example:

ask=1234:01101

The Profile Agent 416 for the linked page replies with the level of interest calculated in association with other profile agents 416 using a correlation function such as previously described for the given profile. Pages whose level of interest is above a certain pre-defined threshold, or a threshold optionally set by the User Agent 402, are then downloaded and filtered against the optional list of keywords. In a simple filtering scheme, pages that do not include the keywords would be removed from the list of recommended links. The Profile Agent 412 then modifies the links in the original page by encoding and including how interesting each of them would be to the user and sends the modified link page back to the User Agent 402. Each link is annotated, for example, through color coding or a numerical indication of the degree of confidence that the page is relevant to the user.

For the above example, the Profile Agent 412 might have annotated the links as follows:

<a href=http://www.soccer.com/index.html?load=1234:01101:worldcup>
<font color="#FF0000">Soccer</font></a>
<a href=http://www.cars.com/marketplace.html>
Cars for sale</a>
<a href=http://intranet/home.html?load=1234:01101:worldcup>
<font color="#FF0000">Intranet homepage</font></a>

Here a color coding scheme with one threshold is used. Any link that was recommended with a correlation at or above threshold is encoded in red (corresponding to the color code FF0000 in RGB format). Each color coded link also contains an embedded load command which contains the user id and profile 410 to be passed to the Profile Agent 412 for that page. The load command is separated from the actual link using a "?", which is a CGI (Common Gateway Interface) convention.
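The single-threshold annotation scheme can be sketched as a small Python function that rewrites one link. The name annotate_link and its parameters are hypothetical, and the interest values passed in are made up for the example; the sketch uses the standard six-digit RGB code for red and closes the font element explicitly.

```python
def annotate_link(url, label, interest, theta, uid, profile_bits, keywords):
    """Return the HTML for one link, color coded and carrying a load command if recommended."""
    if interest < theta:
        # Not recommended: leave the link as it was on the index page.
        return f"<a href={url}>{label}</a>"
    load = f"load={uid}:{profile_bits}:{','.join(keywords)}"
    return f'<a href={url}?{load}><font color="#FF0000">{label}</font></a>'


print(annotate_link("http://www.soccer.com/index.html", "Soccer", 0.83, 0.80, "1234", "01101", ["worldcup"]))
print(annotate_link("http://www.cars.com/marketplace.html", "Cars for sale", 0.41, 0.80, "1234", "01101", ["worldcup"]))
```

A scheme with several thresholds, or a numerical confidence shown next to the link, would follow the same pattern with a different rendering step.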
The search command is only invoked once. Subsequent refinements of the search are performed via load commands. For example:

load=1234:01101:worldcup

The processing of the load and search commands is generally the same, except for two aspects, which warrant the separation into two commands.
First, in the case of the load command, the host of the Profile Agent that is to receive the command is indicated by the host part of the URL which contains the load command. For example, if the user selects the link http://intranet/home.html?load=1234:01101:worldcup in the page returned by Profile Agent 412, the following information will be sent (via the WWW client) in the load command 419 to the Profile Agent 420 on the host "intranet":

home.html?load=1234:01101:worldcup

The Profile Agent 420 then extracts from this a local path (home.html) and the actual load command (load=1234:01101:worldcup).
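The extraction of the local path and the load command from the request can be sketched as a split on the first "?" character. The function name split_load_request is hypothetical, and the sketch assumes the request contains no whitespace.

```python
def split_load_request(request):
    """Split 'path?load=...' into the local path and the embedded load command."""
    path, sep, command = request.partition("?")
    if not sep or not command.startswith("load="):
        raise ValueError(f"no load command in request: {request!r}")
    return path, command


path, command = split_load_request("home.html?load=1234:01101:worldcup")
print(path, command)
```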

Second, the page returned by the Profile Agent 420 in reply to a load command 419 contains a Java applet 422 that monitors the time that the user spends reading the page, which the Profile Agent 420 then uses as an indication of interestingness.

Further information on how recommendations are solicited from other Profile Agents and how the level of interest displayed by a user in a particular page is measured is described below.
On receiving a search command 404 or load command 419, a Profile Agent 412 or 420, as the case may be, first retrieves the appropriate page from the WWW Server on the same site. This is either the index page or the page at the path which was passed together with the search or load command. The Profile Agent 412 or 420, as the case may be, then extracts links to other pages from the document. For each link the Profile Agent 412 or 420 establishes a socket connection to the remote Profile Agent 416 or 426 using the URL for that link. This is not necessary if the page is already on the local WWW Server and monitored by the same Profile Agent. The Profile Agent 412 or 420 then sends an ask command 414 or 424 to the Profile Agent or Agents 416 or 426 for the linked page and waits for it to reply with the interest level (correlation) for that page. The Profile Agent 416 or 426 for the linked page computes the correlation between the specified user profile 410 and the profiles stored for that page in the Page Profile database and returns it to the Profile Agent 412 or 420 (as the case may be) as the interest level. The socket is then closed, if necessary.
In the example of Figure 3, the User Agent 302 would first send a search command with uid 1234, profile 01101, and keywords "worldcup" to Profile Agent 306, which would then load the page intranet/index.html from the WWW Server 308.
Profile Agent 1 (306) would then send ask commands with the same uid and profile to Profile Agent 2 (312) on host www.soccer.com and Profile Agent 3 (318) on host www.cars.com. From the replies, Profile Agent 1 (306) would assemble a modified index page with embedded load commands. If the user now selects the link "Soccer", WWW Client 304 would connect to Profile Agent 2 (312), which parses the path part of the URL into the name of a local page (index.html) to be retrieved from WWW Server www.soccer.com and a load command. Profile Agent 2 (312) would then solicit interest levels from the Profile Agents for the pages linked from within the index.html page and send a modified page back to the WWW Client 304.
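The assembly of a modified page with embedded load commands can be sketched as follows. This is a simplified illustration: the function embed_load_commands and the ask callback are hypothetical, the callback stands in for the ask command sent over a socket, and the sketch assumes double-quoted href attributes whose URLs carry no query string of their own. The load command layout follows the example path given earlier (load=uid:profile:keywords).

```python
import re
from typing import Callable, Dict, Tuple

# Matches double-quoted href attributes only (a simplifying assumption).
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def embed_load_commands(
    html: str,
    uid: str,
    profile: str,
    keywords: str,
    ask: Callable[[str, str, str], float],
) -> Tuple[str, Dict[str, float]]:
    """Rewrite every link to carry a load command and collect interest levels.

    ask(url, uid, profile) is a stand-in for the ask command sent to the
    Profile Agent responsible for the linked page.
    """
    levels: Dict[str, float] = {}

    def rewrite(match: "re.Match[str]") -> str:
        url = match.group(1)
        levels[url] = ask(url, uid, profile)
        return f'href="{url}?load={uid}:{profile}:{keywords}"'

    return HREF.sub(rewrite, html), levels
```

How the interest levels are displayed alongside the links is left open here; the sketch simply returns them next to the rewritten page so that the caller can annotate or filter the links.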
To capture the time spent viewing a page, a User Agent applet that measures time is started when the page is loaded into WWW Client 304. When the user changes to a different page (by following a link or by using one of the browser buttons such as "Back" or "Forward"), the User Agent 302 reports the time during which the page was visible back to the Profile Agent 312 from which the page was loaded. The User Agent applet is implemented as an invisible Java applet embedded in the Web page downloaded from the Profile Agent 312. To illustrate, assuming that the code for the User Agent 302 is contained in the file "TimeTracker.class", the page assembled by the Profile Agent 312 must include statements such as the following:
<applet code=TimeTracker width=1 height=1>
<param name=uid value="1234">
<param name=profile value="01101">
</applet>

The Profile Agent 312 at the server side records the information sent by the User Agent 302, which includes the user id, the user profile, the time spent reading the page, and the URL of the page loaded. This information is then used to update the profile for that page in the Page Profile database 316. As described above, a page is considered interesting to the user if the time spent reading it is in a certain range. For such pages, the user profile is added to the page profile in the Page Profile database 316 at the position indicated by the user id. If an entry for the user already exists in the page profile, it is overwritten. The same interaction between the User Agent 302 and the Profile Agent 312 is followed for every other Profile Agent that interacts with the User Agent during a search.
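The update rule described above can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary in place of the Page Profile database; the function name record_reading_time and the bounds MIN_SECONDS and MAX_SECONDS are hypothetical, since this section does not state the limits of the range that indicates interest.

```python
from typing import Dict

# Hypothetical bounds of the reading-time range that indicates interest.
MIN_SECONDS = 10.0
MAX_SECONDS = 600.0

# Maps page URL -> {user id -> user profile}; stands in for the
# Page Profile database.
PageProfiles = Dict[str, Dict[str, str]]


def record_reading_time(page_profiles: PageProfiles, url: str, uid: str,
                        profile: str, seconds: float) -> bool:
    """Store the user's profile for a page if the reading time shows interest.

    Returns True if the page profile was updated, False otherwise.
    """
    if not (MIN_SECONDS <= seconds <= MAX_SECONDS):
        # Reading time outside the range: the page profile is left unchanged.
        return False
    # The entry is keyed by user id, so an earlier entry is overwritten.
    page_profiles.setdefault(url, {})[uid] = profile
    return True
```

Keying the entries by user id gives the overwrite behavior directly: a second report from the same user replaces that user's earlier profile instead of adding another entry.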

Although the invention has been described in terms of the preferred and several alternate embodiments, those skilled in the art will appreciate that other modifications and alterations can be made without departing from the spirit and scope of the teachings of the invention. All such modifications are intended to be included within the scope of the claims appended hereto.

Claims (6)

1. A web search engine, comprising:

a user agent to interface with a user for defining a user profile, receiving at least one keyword input by a user for a search request, and initiating said search request having search parameters including said at least one keyword and said user profile;

a first profiling agent to receive said search request for initiating a search to identify candidate documents matching said search parameters;

a plurality of second profiling agents associated with said candidate documents, each of said candidate documents having an associated document profile for correlating said user profile to document profiles associated with said candidate documents to determine an interest level to each of said candidate documents;

wherein said user agent presents a listing of said candidate documents with interest levels to said user in response to said search; and wherein each of said document profiles comprises user profiles of users who have retrieved and displayed on their viewing devices a candidate document associated with respective said document profile, for a time in a range indicating interest.
2. The web search engine of claim 1, wherein each of said second profiling agents further comprises updating said document profile associated with said candidate document with said user profile in response to receiving a communication from a program that was sent with said candidate document to said user which indicates that said user has retrieved and displayed said candidate document for said time in a range indicating interest.
3. The web search engine of claim 2, wherein said user profiles and said document profiles are managed by agents.
4. The web search engine of claim 2, wherein said program is an applet.
5. The web search engine of claim 1, wherein said listing of each of said candidate documents is presented in response to said interest level being above a threshold.
6. The web search engine of claim 1, wherein said interest level is calculated by correlation formula, as follows:

$$\sum_{k=1}^{M} \mathrm{sim}(u, a_k) / M$$

where:
(a) u is said user profile;
(b) a_k is the k-th column of said document profile;
(c) M is the number of said user profiles in said document profile; and
(d) sim(u, a_k) is the similarity between said user profile and the k-th column of said document profile.
CA002265292A 1998-03-25 1999-03-15 Agent-based web search engine Expired - Fee Related CA2265292C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9806392.8 1998-03-25
GB9806392A GB2335761B (en) 1998-03-25 1998-03-25 Agent-based web search engine

Publications (2)

Publication Number Publication Date
CA2265292A1 CA2265292A1 (en) 1999-09-25
CA2265292C true CA2265292C (en) 2009-09-29

Family

ID=10829235

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002265292A Expired - Fee Related CA2265292C (en) 1998-03-25 1999-03-15 Agent-based web search engine

Country Status (3)

Country Link
CA (1) CA2265292C (en)
DE (1) DE19913509A1 (en)
GB (1) GB2335761B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6169989B1 (en) * 1998-05-21 2001-01-02 International Business Machines Corporation Method and apparatus for parallel profile matching in a large scale webcasting system
US7039639B2 (en) * 1999-03-31 2006-05-02 International Business Machines Corporation Optimization of system performance based on communication relationship
DE19935792A1 (en) * 1999-07-29 2001-02-01 Intershop Software Entwicklung Networked computer system for information exchange by address data field or uniform resource locator (URL), has server computer provided with an URL having address information for clear identification of server computer
NL1013193C2 (en) * 1999-10-01 2001-04-03 Resense V O F Interactive search machine for finding solutions to defined problems, selects key words associated with problem and searches database for problems associated with these words
ATE504880T1 (en) 1999-11-03 2011-04-15 Sublinks Aps METHOD, SYSTEM AND COMPUTER READABLE MEDIUM FOR MANAGING CONNECTIONS BETWEEN RESOURCES
GB2372859B (en) * 1999-12-01 2004-07-21 Amicus Software Pty Ltd Method and apparatus for network access
DE19963123B4 (en) * 1999-12-24 2004-09-16 Deutsche Telekom Ag Analytical information system
US6615209B1 (en) * 2000-02-22 2003-09-02 Google, Inc. Detecting query-specific duplicate documents
US6701362B1 (en) * 2000-02-23 2004-03-02 Purpleyogi.Com Inc. Method for creating user profiles
GB2366033B (en) * 2000-02-29 2004-08-04 Ibm Method and apparatus for processing acquired data and contextual information and associating the same with available multimedia resources
FI111879B (en) * 2000-05-08 2003-09-30 Sonera Oyj Management of user profile information in a telecommunications network
DE10024368A1 (en) * 2000-05-17 2001-11-22 Michael Fahrmair Locating selection of information products involves accessing information product database containing data about information products with at least location, category information per product
US7177904B1 (en) 2000-05-18 2007-02-13 Stratify, Inc. Techniques for sharing content information with members of a virtual user group in a network environment without compromising user privacy
CA2924940A1 (en) 2000-07-05 2002-01-10 Paid Search Engine Tools, L.L.C. Paid search engine bid management
JP2003085081A (en) * 2000-07-25 2003-03-20 Nosu:Kk Information delivery service system
AU2002259125A1 (en) * 2001-05-03 2002-11-18 Quinn, Gaellen And Michael Offline-to-online traffic generation and demographic identification process and method
DE10143940B4 (en) * 2001-09-07 2012-07-26 Peter Krug Method and device for determining relevant objects
AUPR914601A0 (en) * 2001-11-27 2001-12-20 Webtrack Media Pty Ltd Method and apparatus for information retrieval
FI116808B (en) * 2003-10-06 2006-02-28 Leiki Oy An arrangement and method for providing information to a user
DE10357562A1 (en) * 2003-12-10 2005-07-28 Deutsche Telekom Ag Method for guiding user to required element by use of communication system with internet, requires input of individual user profile into first data bank
US7444328B2 (en) 2005-06-06 2008-10-28 Microsoft Corporation Keyword-driven assistance
US7974880B2 (en) * 2007-01-31 2011-07-05 Yahoo! Inc. System for updating advertisement bids
US20100099446A1 (en) * 2008-10-22 2010-04-22 Telefonaktiebolaget L M Ericsson (Publ) Method and node for selecting content for use in a mobile user device
US8244766B2 (en) 2010-04-13 2012-08-14 Microsoft Corporation Applying a model of a persona to search results
US9785987B2 (en) 2010-04-22 2017-10-10 Microsoft Technology Licensing, Llc User interface for information presentation system
US9043296B2 (en) 2010-07-30 2015-05-26 Microsoft Technology Licensing, Llc System of providing suggestions based on accessible and contextual information
US9727893B2 (en) * 2011-08-04 2017-08-08 Krasimir Popov Searching for and creating an adaptive content

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2175187A1 (en) * 1993-10-28 1995-05-04 William K. Thomson Database search summary with user determined characteristics

Also Published As

Publication number Publication date
GB2335761B (en) 2003-05-14
CA2265292A1 (en) 1999-09-25
GB2335761A (en) 1999-09-29
GB9806392D0 (en) 1998-05-20
DE19913509A1 (en) 1999-09-30

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed

Effective date: 20150316