WO2001093068A2 - System to facilitate navigation of internet from a wireless device

Info

Publication number: WO2001093068A2
Application number: PCT/CA2001/000799
Authority: WO
Inventors: William K. Wong; Francois LÉVESQUE; Eldon J. Mellaney; Jiangxin Hu; Nhu Khac Quang Le
Original assignee: Myskyweb.Com Inc.
Priority date: 2000-06-02
Filing date: 2001-05-31
Publication date: 2001-12-06
Also published as: CA2344732A1; AU2001268859A1; WO2001093068A3

Abstract

A system to facilitate navigation of Internet from a wireless device is disclosed. A user requests a resource of interest from an origin server (OS) in the Internet by inputting in a URL request through a user agent which runs on the user's wireless device. A request is sent via an optional proxy server. Preferably, the proxy server interprets the request and formulates a request to the OS that contains the requested resource. An arbitrator agent located at the OS receives the request and sends the requested resource, which exists within a file at the OS in its original form to an intermediate conversion server. The intermediate conversion server then performs any necessary conversion of the file and sends the converted file back to the OS. The OS receives the resultant converted file and sends it to the proxy server. The proxy server returns the converted file to the user's wireless device for suitable browsing.

Description

System to Facilitate Navigation of Internet from a Wireless Device

Field of Invention

This invention is concerned with the presentation of data stored on Internet servers to wireless devices.

Background of the Invention

The world is experiencing an exponential growth in the number of users of the Internet. Consequently, high performance and the availability of appropriate user interfaces are crucial for achieving user satisfaction in such an environment. Most current users of the Internet use a wireline network to access content from various servers in the Internet. However, there is a rapid increase in the popularity of using mobile wireless devices to access the content contained on various Origin Servers (OS) in the Internet. Given the wide availability of wireless networks and the low cost of wireless appliances, which include smart phones, personal digital assistants, and hand-held computers, providing Internet content to wireless users presents one of the largest business opportunities of today.

Using a "user agent," an Internet user enters, edits, and receives data through the interpretation of content that is authored in some mark-up language, e.g., Hypertext Mark-up Language (HTML). A user agent is any computer program that is capable of processing the requested content that resides on a given OS in the internet. The most common user agent for the World Wide Web (WWW) is known as a "web browser", or simply a "browser". Providing a wireless user with access to the content that resides within the Internet presents new challenges to the traditional Internet Service Provider (ISP), the content provider and wireless network operator both in terms of performance as well as data presentation. This invention is concerned with the presentation of such Internet data to mobile wireless users.

Conventional Internet users often use high-speed modems and very high capacity backbone networks, with a Gigabit per second capacity for example, to access the data stored at the OSs. Accessing such Internet data from a mobile wireless device is quite different. The wireless network used to interconnect the user with the OS is much lower in bandwidth and is less robust in comparison to a wireline network. Lack of large secondary storage devices as well as the necessity to conserve power are two other major differences between wireless and conventional wireline systems. Appropriate communication protocols and effective resource management strategies are important for the successful operation of these systems. Both ongoing research and a large body of existing knowledge address these issues. In addition to these challenges, the actual visual presentation of Internet data to mobile users requires special attention from Internet content providers, wireless network operators and ISPs. Comparatively little work has been directed towards this vital issue of data presentation to the wireless user, which is the focus of this invention.

At present, the HTML is used widely for developing the content of web pages available from the WWW, which is a part of the Internet. HTML is an evolving specification. It is difficult to know whether HTML will remain the de facto standard for authoring WWW content or whether it will be replaced and/or complemented by a variant of HTML, e.g., xHTML, or an entirely new mark-up language not yet devised. However, the most important characteristic of all these mark-up languages for WWW content is that they are designed for displaying information on a standard desktop computer equipped with a reasonably large computer display monitor, with a 17" diagonal screen dimension for example, that is connected through a wireline network and an optional proxy server to an OS that resides in the Internet. In comparison to a wireless device these computers are higher in performance, use a higher resolution colour display, and have a set of Input/Output (I/O) devices such as a mouse, a sound card, and a disk system. A typical wireless device is much lower in performance, typically uses a smaller black and white display, and has access to very rudimentary forms of I/O: touch screens, soft keys, and voice commands. The interconnection networks that carry the WWW content to conventional desktop computers are often 100 Mb/s Local Area Networks (LANs), 6 Mb/s Asymmetric Digital Subscriber Line (ADSL) connections, or 64 kb/s Integrated Services Digital Network (ISDN) connections. In comparison, a typical wireless connection is characterized by a much lower bandwidth: 9.6 kb/s, for example, for Global System for Mobile Communications (GSM). As a consequence, the latency perceived by a user on a wireless system (seconds) is orders of magnitude higher than that observed on a wireline system (tens of milliseconds). Additionally, the screens of cellular telephones in current use are often limited to five lines of text each with fourteen characters whereas a number of PDAs provide a viewing area that is 160 pixels by 160 pixels in size.

Since the standard mark-up languages, e.g., HTML, were not designed with wireless devices in mind, a number of problems arise that include the following:

• Lack of image scaling, by which an image in true colour can be toned down to an image with fewer colours and a lower resolution;

• Lack of content transformation, by which a PostScript® or Portable Document Format (PDF) document can be translated into a plain text format that can be easily displayed on a mobile device;

• Lack of provision for semantic compression, by which a document may be distilled into a smaller form that can be effectively transmitted over a low bandwidth medium and displayed on the small display area of a wireless device; and,

• The use of a heavyweight protocol called the Hypertext Transfer Protocol (HTTP) for the transfer of content in the Internet.

To alleviate some of these problems, a new mark-up language called the Wireless Mark-up Language (WML) and a new protocol suite called the Wireless Application Protocol (WAP) have been proposed and adopted by the Internet community. In addition to being particularly suitable for mobile wireless systems, WAP is characterized by a number of desirable attributes that include performance scalability, interoperability among software from different vendors, reliability, and provisioning of data integrity and security. The present invention is concerned particularly with the presentation of data on wireless devices that at present use the WML, although it is envisaged that such devices will support the HTML and/or similar, complementary as well as other mark-up languages not yet defined, and these are therefore not precluded. To this extent, the present invention provides specific aspects related to the presentation of Internet data to a wireless user whose wireless device uses the WML as well as other presentation techniques that are independent of the mark-up languages used in the Internet and by the wireless devices.

Prior art in the area of wireless Internet services can be divided into two groups: patented inventions and products that are not currently protected by issued patents. Examples of the former include the system described in United States Patent No. 5905719 and Canadian Patent Application No. 2272467. The U.S. patent proposes a technique for economically providing Internet access to wireless devices at data rates comparable to that of ISDN lines. This system considers only communication level issues and does not concern the mark-up language conversion and context management that are the focus of the present invention. The Canadian application, on the other hand, discloses a bar code reader that reads a bar code and sends it over the Internet to retrieve information corresponding to the bar code stored on information servers. The system, however, is not concerned with the data presentation issues that form the principal focus of the invention described herein.

An example from the latter group is an HTML to WML converter that only handles direct explicit translation requests made by a user and is, therefore, crude compared to the present invention. The present invention has a number of sophisticated attributes that are not present in any single system described in the prior art. These attributes include adequate support for URLs embedded within JavaScript, the inclusion of URLs associated with images, and the elimination of redundant URLs for context management. The HTML to WML conversion provided by other prior art is quite primitive in comparison to the present invention. For example, the prior art products cannot perform proper conversion of a number of different types of web pages and consequently the user is unable to navigate to the end of said web pages in many cases. The present invention, on the other hand, is free of these problems and effectively and economically performs HTML to WML conversion. Another example of an HTML to WML translation service is software that must be loaded onto hand-held wireless devices such as a Palm VII™. Moreover, such prior art acts like a proxy server that lies between the mobile user and the OS. The user is required to communicate directly with the web site corresponding to the product and none of these products provide the capability for a mobile user to directly obtain content from an OS. Furthermore, the advanced context management features, such as the removal of redundant information, e.g., URLs, and the merging of logically related information, provided by the inventors' server are not incorporated in these other products.

In view of the foregoing, there is a need for a system that provides the context management as well as any necessary mark-up language translation for content stored on an OS in the Internet that is necessary for an effective presentation of said content on a wireless device. The present invention is designed to address these concerns.

Summary of the Invention

The objective of the present invention is to efficiently convert the original content requested from an OS into a more compact form that is more suitable to the well-known characteristics of wireless systems that include low bandwidth of the communication media, low power of energy sources, and smaller visual display areas of the wireless devices. The present invention is characterized by the following highly useful attributes:

• The system may interact with any WAP-compliant wireless device and has two modes of operation. In the first mode of operation, the wireless user specifies a URL to the system that obtains the requested resource identified by said URL from the OS, performs context management, e.g., elimination of any redundant information, does any necessary mark-up language translation on it, and sends the results back to the requesting user. In the second mode of operation, the entire context management and, as required, mark-up language translation, processes are completely transparent to the requesting user who communicates directly with the OS. Based on the URL specified by the user to the OS, an "arbitrator agent" at the OS retrieves the resource identified by said URL from the OS, and forwards said resource, e.g., HTML document, to a network document converter server that does the necessary processing, including mark-up language conversion, as required, as well as context management, e.g., elimination of redundant information, and forwards the resultant document, e.g., WML document, to the arbitrator agent. The arbitrator agent in turn returns the resultant document to the requesting user.

• Mapping of HTML to other mark-up languages: The system may map HTML documents to one or more documents specified in another mark-up language, e.g., WML, that can be displayed conveniently on the screen of a wireless device. This is consistent with the prior art in the domain. The inventors' server, however, may handle a set of complex mappings that are not handled by all the other systems.

Moreover, the invention may utilize mark-up language "metatags" to perform more efficient mappings that include the following:

- Mapping of xHTML to other mark-up languages, including WML and cHTML

- Mapping of HTML to other mark-up languages, including WML and cHTML

- Mapping of HTML to other mark-up languages

- Mapping of xHTML to other mark-up languages

- Mapping of a generic mark-up language to some other mark-up language.

"Metatags" are completely ignored by standard user agents, e.g., HTML web browser, while those interested parties can process them to achieve a desired result. Therefore, the "metatag" provides a transparent solution while fully maintaining backward compatibility with existing systems. In this context, metatags are used to specify specific sections of the network document to process, including "mainText", "Section", and by-lines, e.g., "Author", as well as directives regarding which content to disregard, e.g., "ignoreContent", that will assist in the conversion of the original document to another which is more suitable for viewing on a wireless device.

• Flexibility of Mapping Rules: The mapping and context management performed by the inventors' server is based on a set of rules. A unique feature of this invention is that this set of rules may be stored in a file and thus be tuned to suit the requirements of a specific environment.
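A minimal sketch of such a tunable rule set, assuming a JSON rule format: the rule keys ("map", "drop"), the class name RuleDrivenConverter and the particular tag mappings are all hypothetical and are not the mappings of the Appendix. The rules are held in a string here, whereas in a deployment they would be read from a file.

```python
import html
import json
from html.parser import HTMLParser

# Hypothetical rule set; in practice this text would be read from a file.
RULES_JSON = """
{
  "map":  {"h1": "b", "h2": "b", "strong": "b", "em": "i", "p": "p", "a": "a"},
  "drop": ["script", "style"]
}
"""

class RuleDrivenConverter(HTMLParser):
    """Rewrites source tags according to the rule set (illustrative only)."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.tag_map = rules["map"]      # source tag -> target tag
        self.drop = set(rules["drop"])   # elements whose content is discarded
        self.skip_depth = 0
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.drop:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in self.tag_map:
            target = self.tag_map[tag]
            href = dict(attrs).get("href")
            if target == "a" and href:
                self.out.append(f'<a href="{html.escape(href, quote=True)}">')
            else:
                self.out.append(f"<{target}>")

    def handle_endtag(self, tag):
        if tag in self.drop:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in self.tag_map:
            self.out.append(f"</{self.tag_map[tag]}>")

    def handle_data(self, data):
        # Text of unmapped tags is kept; text inside dropped elements is not.
        if self.skip_depth == 0:
            self.out.append(html.escape(data, quote=False))

def convert(html_text, rules_text=RULES_JSON):
    converter = RuleDrivenConverter(json.loads(rules_text))
    converter.feed(html_text)
    converter.close()
    return "".join(converter.out)
```

Changing the behaviour for a given environment then amounts to editing the rule text, e.g., adding a tag to "drop", without touching the converter itself.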

• "Merging and Look-Ahead": The system parses the network document authored in some mark-up language, e.g., HTML, that was requested by the user. Under appropriate conditions, the system examines the content of any embedded URLs with the goal of determining whether any portions of the current document reappear in a second document referenced by a URL embedded in the current document. If so, the common portions are removed from the current document and are replaced by a single selectable hypertext link that references the appropriate URL for the second document. For example, the text of a news headline and the introductory paragraph of said news article that appear in the current document as well as on another document referenced by one or more URLs embedded in the current document are reduced to a single selectable hypertext link with a meaningful name, e.g., headline text, in the document presented to the user. Hereinafter, this process is called "context merging". One possible method of implementing context merging is to use this type of "look-ahead" algorithm. In a given document, content may be determined to be similar in nature and, consequently, reduced to a much more compact representation consisting of a single selectable hypertext link that references the appropriate document. Should there exist multiple instances of similar content in a given document, each such instance is identified and the set of instances is represented by a single hyperlink, thereby creating an index or table of contents for presentation to the wireless user that facilitates access to the requested content. This feature of the present invention leads to a reduced amount of text to be displayed on the small screen of a wireless device and also gives rise to bandwidth savings. Consequently, this leads to economic savings for the user of systems in which the user is charged on the basis of airtime as well as in packet switched wireless systems (such as the 2.5G and 3G Cellular Systems) where users will be charged on a per packet basis. Moreover, it greatly improves the usability of the content.

• URL rewriting

Given the limited bandwidth available to wireless devices and the desire to reduce the amount of information sent to and from such devices, the system replaces URLs embedded in the requested network document with much more compact representations of the same. These more compact representations appear in the converted document, e.g., cHTML network document, sent to the wireless user.

The immediate benefits of this approach include a very significant reduction in the amount of information that is sent over the wireless media thereby improving perceived system response time, reducing communication costs, both monetary and otherwise, and increasing processing efficiency by the system as it handles far fewer bytes to process any subsequent request from the wireless user.
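The text does not say how the compact representations are formed or resolved. One possible scheme, sketched below purely as an assumption, keeps a server-side table that assigns each distinct URL a short hexadecimal token and expands the token again when the wireless user selects the link. The names UrlRewriter and rewrite_links are hypothetical.

```python
import re

class UrlRewriter:
    """Server-side table mapping long URLs to short tokens (assumed scheme)."""

    def __init__(self):
        self._index_by_url = {}
        self._urls = []

    def shorten(self, url):
        # Reuse the token if this URL has been seen before.
        if url not in self._index_by_url:
            self._index_by_url[url] = len(self._urls)
            self._urls.append(url)
        return format(self._index_by_url[url], "x")

    def expand(self, token):
        # Called when a later request arrives carrying a compact token.
        return self._urls[int(token, 16)]

HREF = re.compile(r'href="([^"]*)"')

def rewrite_links(document, rewriter):
    """Replace every double-quoted href value with its compact token."""
    return HREF.sub(
        lambda m: f'href="{rewriter.shorten(m.group(1))}"', document
    )
```

With this sketch, a document containing three links to two distinct story URLs is sent with the href values "0", "1" and "0", and a subsequent request for token "1" is expanded back to the full URL on the server.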

• URL Pruning and Delta-Display: The pruning algorithm according to the present invention is concerned with the conditional removal of redundant information, e.g., URLs, within a given network document, which is authored in some mark-up language, e.g., HTML, as well as the conditional removal of any information that is considered to be redundant from any subsequent network document requested by a user. Hereinafter, these two processes are called "Intra-page pruning" and "Inter-page pruning", respectively. If the resource selected by a user from an embedded URL in the current network document contains information that was already returned as part of a previous network document(s), this redundant information is removed. In this manner, only the difference, or delta, between the network documents requested by the user is displayed on the user's screen. This Delta-Display feature of the present invention utilizes the display area in the wireless device more effectively, saves in terms of bandwidth and the computations required by the user agent, reduces the latency for subsequent URL requests and allows the user to more quickly identify the object of interest. This also gives rise to an economic benefit to the users.

• Presentation of categorised hyperlinks, i.e., URLs, from requested network document to the requesting user: After the user selects a hyperlink, the referenced resource, e.g., HTML document, is processed to extract its embedded URLs. Having extracted all the URLs from the requested resource, the URLs are categorised according to their inherent hierarchical directory structure, suitably a file system directory structure. Then, based on some metric, which is selectable by the requesting user, the system sorts the URLs at each level in the URL hierarchy. The metric used to sort may be compound. The metric may be any number of different algorithms, e.g., Most Recently Used (MRU). Once complete, the sorted list of hyperlinks is presented to the requesting user. If presented as a numbered list, the user may select any item with a single key press. Alternative means exist to present the same list to the user including a list of hyperlinks, which the user may select by an appropriate input device. By categorizing and sorting, and, where appropriate, numbering, the hyperlinks that appear on the requested network page, the user is presented with a much more efficient interface to the requested resource, thereby facilitating access, and, therefore, navigation of the site at which the resource exists.

• Universal Applicability: The invention is able to interact with any wireless device that is WAP-compliant.
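The categorisation of hyperlinks described above can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of the figures: URLs are grouped by host and directory components, the last path segment is treated as the leaf, and the sort metric is passed in as a key function. The helper names new_node, build_tree and render are hypothetical.

```python
from urllib.parse import urlsplit

def new_node():
    return {"dirs": {}, "links": []}

def build_tree(urls):
    """Group URLs by host and by the directory part of their path."""
    root = new_node()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        node = root
        # The last path segment is treated as the leaf, the rest as directories.
        for name in [parts.netloc] + segments[:-1]:
            node = node["dirs"].setdefault(name, new_node())
        node["links"].append(url)
    return root

def render(node, metric, depth=0, counter=None):
    """Return display lines: links sorted by `metric` and numbered globally."""
    counter = counter if counter is not None else [0]
    indent = "  " * depth
    lines = []
    for url in sorted(node["links"], key=metric):
        counter[0] += 1
        lines.append(f"{indent}{counter[0]}. {url}")
    for name in sorted(node["dirs"]):
        lines.append(f"{indent}[{name}]")
        lines.extend(render(node["dirs"][name], metric, depth + 1, counter))
    return lines
```

A Most Recently Used ordering could be supplied as `metric=lambda u: -last_used.get(u, 0)`, where `last_used` is an assumed table of access timestamps; a compound metric could return a tuple.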

The product runs on the UNIX™ and Windows™ operating systems. It is fully Java™-compliant and is compatible with HTML, compact HTML (cHTML) and WAP. The product is characterized by an additional browsing layer on top of the converted content of the network document for enhancing user friendliness.

In accordance with one aspect of the invention, there is provided a system to facilitate navigation of the Internet from a wireless device comprising: (1) a user inputting a URL from a wireless device to create a request for a specified resource; (2) sending the request to a proxy server; (3) interpreting the request by the proxy server and formulating an appropriate request, e.g., HTTP request, to the OS that contains the requested resource; (4) an arbitrator agent at the OS receiving the request and sending the requested resource, which exists within a file at the OS, e.g., HTML document, in its original form to an intermediate conversion server; (5) the intermediate conversion server performing any necessary mark-up language conversion, e.g., from the HTML format to the WML format, performing context management, e.g., URL pruning, and sending the converted file back to the OS; (6) the arbitrator agent at the OS receiving the resultant formatted file and sending the file to the proxy server; and (7) the proxy server returning the file, or an equivalent compiled byte-code representation, e.g., compiled WML, to the user's wireless device for suitable browsing.

In accordance with another aspect of the invention, there is provided a system to facilitate navigation of the Internet from a wireless device comprising: (1) a user inputting a URL from a wireless device to create a request for a specified resource; (2) sending the request to a proxy server; (3) interpreting the request by the proxy server and formulating an appropriate request, e.g., HTTP, to an intermediate conversion server; (4) the intermediate conversion server receiving the request and sending it to the OS that contains the requested resource; (5) the OS receiving the request, e.g., HTTP request, and retrieving the requested resource, which exists within a file, e.g., HTML file, for the user and returning the file in its original format, e.g., HTML, to the intermediate conversion server; (6) the intermediate conversion server performing any necessary mark-up language conversion, e.g., HTML format to WML format, performing context management, and sending the converted file back to the proxy server; and (7) the proxy server returning said file, or an equivalent compiled byte-code representation, e.g., compiled WML, to the user's wireless device for suitable browsing.

In accordance with yet another aspect of the invention, there is provided a system to facilitate navigation of the Internet from a wireless device comprising: (1) a user inputting a URL from a wireless device to create a request for a specified resource; (2) sending the request to the OS that contains the requested resource; (3) an arbitrator agent at the OS receiving the request and sending the requested resource, which exists within a file at the OS, e.g., HTML document, in its original form to an intermediate conversion server; (4) the intermediate conversion server performing any necessary conversion of the file and sending the converted file back to the OS; (5) the arbitrator agent at the OS receiving the resultant converted file and returning the converted file, or an equivalent compiled byte-code representation, e.g., compiled WML, to the user's wireless device for suitable browsing.

Brief Description of the Drawings

In the accompanying drawings which illustrate embodiments of the invention,

Figure 1 shows a typical scenario of operations of a preferred embodiment.

Figure 2 shows a high level architecture of the present invention.

Figure 3 shows a flowchart of the input and output of a preferred embodiment.

Figure 4 shows a flowchart of the high-level operation of a preferred embodiment.

Figure 5 shows a flowchart of the algorithm "GetFinalPage" used by the algorithm presented in Figure 4.

Figure 6 shows a flowchart of the algorithm "Parser" used by the algorithm presented in Figure 4.

Figure 7 shows a flowchart of the algorithm "GetMainStory" used by the algorithm presented in Figure 6.

Figure 8 shows a flowchart of the algorithm "SaveIntermediateFormat" used by the algorithm presented in Figure 4.

Figure 9 shows a flowchart of the algorithm "Handler" used by the algorithm presented in Figure 4.

Figure 10 shows a flowchart of the algorithm "WriteMainContent" used by the algorithm presented in Figure 9.

Figure 11 shows a flowchart of the algorithm "WriteCategories" used by the algorithm presented in Figure 9.

Figure 12 shows a flowchart of the algorithm "Classify" used by the algorithm presented in Figure 11.

Figure 13 shows a flowchart of the algorithm "InsertLinkIntoTheTree" used by the algorithm presented in Figure 12.

Figure 14 shows a flowchart of the algorithm "CleanTheTree" used by the algorithm presented in Figure 12.

Figure 15 shows a flowchart of the algorithm "RemoveNotOverWrittenMetricLinks" used by the algorithm presented in Figure 14.

Figure 16 shows a flowchart of the algorithm "NormalizeTheTree" used by the algorithm presented in Figure 14.

Figure 17 shows a flowchart of the algorithm "WriteRelatedTopics" used by the algorithm presented in Figure 9.

Figure 18 shows a flowchart of the algorithm "RetrieveRelatedTopics" used by the algorithm presented in Figure 17.

Figure 19 shows an example network document.

Figure 20 shows the example network document presented in Figure 19 in which just the elements of the network document that are associated with URLs are retained.

Figure 21 shows the constituent URLs extracted from the example network document presented in Figure 19.

Figures 22a and 22b show the high level operation for the URL categorisation algorithm in which the constituent URLs extracted from the example network document presented in Figure 19 are categorised for presentation to the user.

Figure 23 shows a flowchart of the algorithm "URLTreePruning."

Figure 24 shows a flowchart of the algorithm "getUniqueNetworkDocumentElements" used by the algorithm presented in Figure 23.

Figure 25 shows an example of pruning redundant URLs.

Detailed Description of the Preferred Embodiment

At present, the HTML is used widely for developing the content available from the WWW, which is a part of the Internet. HTML is an evolving specification. It is difficult to know whether HTML will remain the de facto standard for authoring WWW content or whether it will be replaced and/or complemented by a variant of HTML, e.g., xHTML, or an entirely new mark-up language not yet devised. However, the most important characteristic of all these mark-up languages for WWW content is that they are designed for displaying information on a standard desktop computer equipped with a reasonably large computer monitor, with a 17" diagonal screen dimension for example, that is connected through a wireline network and an optional proxy server to an OS that resides in the Internet. As wireless communication is becoming more and more prevalent, a large body of users is becoming more interested in using wireless devices for accessing the WWW. Using a "user agent," an Internet user enters, edits, and receives data through the interpretation of content that is authored in some mark-up language. A user agent is any computer program that is capable of processing the requested content that resides on a given OS in the Internet. The most common user agent for the WWW is known as a "web browser", or simply a "browser". In comparison to a desktop computer, a wireless device is characterized by a lower performance compute and communication system, a smaller and low-resolution, often black and white, display, and a limited set of I/O devices. The wireless interconnection networks, which carry the content of web pages to these wireless devices, have bandwidths that are orders of magnitude lower than those used with the wireline systems. Since the current standard mark-up languages, e.g., HTML, used to author WWW content were not designed with wireless devices in mind, a number of problems arise including the lack of image scaling, the lack of content transformation, the lack of provisions for semantic compression, and the use of the underlying heavyweight HTTP protocol. The present invention provides an effective solution to some of these problems that concern the access of network documents, which are authored in some mark-up language, by wireless devices.

The present invention has two major components: the arbitrator agent and the intermediate converter server, referred to as "ICS" in the following text. The ICS performs the necessary conversion of the input network document. Suitably, this conversion involves conversion between mark-up languages and context management. Referring to Figure 1, the arbitrator agent 100 resides at the OS. The user's request for a network document is intercepted by the arbitrator agent 100 that sends the network document to the arbitrator agent server 101 residing at the ICS (see Figure 2). Located within the ICS is a network page converter 102 that performs translation of network pages from one mark-up language to another, e.g., HTML to WML, as well as the necessary context management and sends the resulting document back to the arbitrator agent at the OS, which in turn forwards the resultant network document to the user agent on the user's wireless device. The operations performed by the ICS are explained in the following paragraphs.

Figure 2 presents a high level architecture of the ICS. As indicated in Figure 2, the system consists of two components: an arbitrator agent server 101 and the network document converter 102. The ICS performs the necessary conversion of the input network document. Suitably, this conversion involves conversion between mark-up languages and context management. The operation of the preferred embodiment is explained with the help of the following scenario:

1. Using an appropriate user agent, e.g., micro-browser, which runs on the user's wireless device, the user requests the resource of interest from an OS in the Internet by inputting its appropriate URL.

2. A request is sent to the proxy server that resides within an air carrier.

3. The proxy server interprets the request and formulates an appropriate request, e.g., HTTP request, to the OS that contains the requested resource.

4. The OS receives the request, retrieves the requested resource, which exists within a file, e.g., HTML document, and uses the arbitrator agent 100 to send it to the ICS.

5. The arbitrator agent server 101 processes the input arriving at II of ICS (Figure 2). The arbitrator agent server 101 sends the incoming network document to the network document converter 102.

6. The network document converter 102 performs the necessary conversion, including mark-up language translation and context management, and sends the results back to arbitrator agent server 101.

7. The arbitrator agent server 101 sends the resultant document to the OS through OI.

8. The OS uses the arbitrator agent to send the resultant file to the proxy server.

9. The proxy server returns said file, or an equivalent compiled byte-code representation, e.g., compiled WML, to the user agent on the user's wireless device.
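The nine steps above can be traced with a small in-process simulation. The sketch below only illustrates who calls whom: all class and method names are hypothetical, network transport is replaced by direct method calls, and the conversion is a deliberately trivial stand-in (tags stripped, text wrapped in a WML card), not the conversion performed by the network document converter 102.

```python
import re

class NetworkDocumentConverter:
    """Stands in for component 102; the conversion here is a placeholder."""
    def convert(self, document):
        text = re.sub(r"<[^>]+>", "", document).strip()
        return f"<wml><card><p>{text}</p></card></wml>"

class ArbitratorAgentServer:
    """Stands in for component 101, the entry point of the ICS."""
    def __init__(self, converter):
        self.converter = converter
    def handle(self, document):                     # steps 5 to 7
        return self.converter.convert(document)

class ArbitratorAgent:
    """Stands in for component 100, which resides at the OS."""
    def __init__(self, ics):
        self.ics = ics
    def process(self, document):                    # step 4, then step 8
        return self.ics.handle(document)

class OriginServer:
    def __init__(self, files, agent):
        self.files = files
        self.agent = agent
    def get(self, path):
        original = self.files[path]                 # file in its original form
        return self.agent.process(original)

class ProxyServer:
    def __init__(self, origin):
        self.origin = origin
    def request(self, path):                        # steps 2, 3 and 9
        return self.origin.get(path)

ics = ArbitratorAgentServer(NetworkDocumentConverter())
origin = OriginServer({"/index.html": "<html><body>Hello</body></html>"},
                      ArbitratorAgent(ics))
proxy = ProxyServer(origin)
```

Calling `proxy.request("/index.html")` plays the role of step 1. The user agent sees only the converted document, while the OS continues to store the file in its original form.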

The network document converter 102 performs the primary task of the ICS: context management. The major functionalities of the network document converter 102 are as follows:

Mark-up Language Translation:

Conversion from one mark-up language such as HTML to another such as WML is one of the important features of the present invention. A network page with HTML content consists of static components such as text and images, as well as dynamic content that includes animations, video and audio samples, demarcated with text known as tags. An effective translation of HTML to WML is required when the content from an OS is presented to a wireless subscriber who is using a WAP-compliant device. An integral part of this proprietary conversion algorithm is the mapping of HTML tags to WML tags that are of a similar nature. A detailed description of the mappings between HTML and WML tags is presented in the Appendix. The translation algorithm is conscious of the limitations of the wireless device. The inclusion of "metatags" in the source network document aids in the conversion of the document into one or more different mark-up languages, e.g., WML. By definition, "metatags" are completely ignored by standard user agents, e.g., HTML web browser, as they appear within the comment sections of the document; however, those interested parties may process them to achieve a desired result. Therefore, the "metatag" provides a transparent solution while fully maintaining backward compatibility with existing systems and user agents. In this context, metatags may be used to specify or demarcate specific sections of the network document to process, including "mainText", "Section", and by-lines, e.g., "Author", as well as specifying directives regarding the content, e.g., disregard - "ignoreContent," that will assist in the conversion of the original document to another which is more suitable for viewing on a wireless device. These "metatags" may be structured and stored in a separate file similar to that of a Document Type Definition (DTD) file. Like a DTD, the "metatags" stored in the file would essentially define the rules of the document, including the structural relationship between the elements and define what "metatags" can appear within the network document and optionally the corresponding "metatag" attributes and their permissible values. The present invention can effectively utilize the existence of such metatags and handle pages with both static and dynamic content.

Context Management:

(I) Looking Ahead and Context Merging:

Two or more seemingly independent components of a network document may refer to the same resource. One example is a headline and the accompanying introductory paragraph of an article that appears on a web page of a news gathering organization. Different types of objects on the page that include the headline, the text or a "more" button are themselves hypertext links or "hyperlinks". Each of these objects is associated with an URL that points to the article itself. Although such a multiplicity of the same link is visually appealing, the limited display space available on a wireless device is not appropriate for viewing such an object with many redundant components. Moreover, if the user decides not to read the article then this additional information goes to a total waste. This costs the user additional airtime as well as bandwidth.

The present invention determines whether the information that appears around and/or as part of an anchor tag in a network document also appears in the network document that is referenced by the associated embedded URL. If the information does reappear, it is reduced to a single hyperlink with a meaningful label. In this manner, no unnecessary information is sent to the wireless user beyond an unwanted headline.

To achieve this goal, the present invention parses the requested network document to determine the presence of redundant information. The network document converter 102 in Figure 2 examines the resource, which is referenced by the embedded URL, hence "looking ahead." In this context, "looking ahead" means examining the content of the resource referenced by the URL in question to determine the reappearance of the same content. If such a repetition is discovered, the headline and introductory text of the article are "merged" into a single URL, which is presented to the user with a meaningful label.
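
The look-ahead and merge step may be sketched as follows. This is a minimal sketch under stated assumptions rather than the disclosed algorithm: links are given as (label, URL) pairs, "fetch" is a hypothetical callable returning the text of the referenced resource, reappearance is tested by simple case-insensitive substring matching, and the first reappearing label is taken as the "meaningful label."

```python
def normalize(text):
    # Lower-case and collapse whitespace so that comparisons are lenient.
    return " ".join(text.lower().split())

def merge_links(links, fetch):
    # Group the labels of all links that point at the same URL,
    # preserving the order in which the URLs first appear.
    groups = {}
    for label, url in links:
        groups.setdefault(url, []).append(label)

    merged = []
    for url, labels in groups.items():
        target = normalize(fetch(url))  # "looking ahead"
        repeated = [l for l in labels if normalize(l) in target]
        if repeated:
            # Content reappears in the target: keep a single hyperlink.
            merged.append((repeated[0], url))
        else:
            # Nothing reappears: leave the links untouched.
            merged.extend((l, url) for l in labels)
    return merged

# Hypothetical page content used in place of real network access.
pages = {
    "/a": "Green party goes into the Red. The Green party reported a deficit.",
    "/w": "Forecast for today",
}
links = [
    ("Green party goes into the Red", "/a"),
    ("The Green party reported a deficit", "/a"),
    ("more", "/a"),
    ("Weather", "/w"),
]
result = merge_links(links, pages.__getitem__)
print(result)
```

In this example the headline, the introductory text and the "more" button collapse into one selection, while the unrelated link is kept as it was.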

Additionally, to facilitate access to content by improving navigation within the same, when a requested network document contains one or more "frames", a formatting feature as nominally described in the HTML specification, the network documents referenced in each of the constituent frames, as specified by the URLs associated with each of the frames in the set of frames, i.e., "frameset", are "merged" into a single network document which is then processed by the ICS. This process is called "frame-merging." Should one or more referenced network documents in the frameset reference another in which frames are employed, the process is repeated. This "frame-merging" process is depicted in Figure 5. An immediate benefit of this process is granting access to wireless users who employ a compact HTML (cHTML) user agent. By definition, cHTML does not support frames. Therefore, without this "frame-merging" operation, said wireless users of cHTML user agents would not have access to the requested network document should it contain one or more frames. Additionally, owing to the size of the display on typical wireless devices, it is impractical to display content in multiple frames. Therefore, for any user agent employed, there is a benefit of this process should the user agent operate on a device that has a smaller screen or only rudimentary input devices.

In addition to the removal of redundant information, the context merging software groups logically related network pages together. For example, all articles on finance on a network page are merged under a single, selectable hyperlink, which is given a meaningful label, e.g., "Finance," in the content returned to the requesting user. The algorithm looks ahead in the network document to identify logically related information. This is achieved by extracting the URLs embedded in the requested resource, e.g., HTML document. Having extracted all the hyperlinks from the requested resource, the hyperlinks are categorised according to their inherent hierarchical directory structure. Suitably, this is a file system directory structure.

The process of categorizing the URLs is called the "URL categorization" algorithm herein. The description of the operation of the "URL categorization" algorithm is aided via a number of flow diagrams that describe portions of the operation of the preferred embodiment.

Figure 3 shows a flowchart of the input and output of a preferred embodiment. Figure 4 shows a flowchart of the high-level operation of a preferred embodiment in which the input is a network document authored in some mark-up language and the output is another that is more suitable for viewing on a wireless device which has only a small display and rudimentary input devices. The first step is to process the requested network document. The "GetFinalPage" algorithm depicted in Figure 5 does this. In processing the page, the algorithm determines whether the requested network document contains one or more frames. If so, the algorithm will retrieve the network documents that correspond to the various frames within the requested document and combine them into a single document. Should one or more of the retrieved network documents contain one or more frames, the process is repeated. At the completion of the "GetFinalPage" algorithm, the resultant network document is returned to the calling algorithm where the "Parser" in turn parses it. The "Parser" algorithm is depicted in Figure 6.
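
The recursive frame handling attributed to the "GetFinalPage" algorithm can be illustrated as follows. This is a sketch only: the regular expression, the "fetch" callable, the "max_depth" guard against circular framesets and the plain concatenation of the retrieved documents are all assumptions made for illustration and are not taken from Figure 5.

```python
import re
from urllib.parse import urljoin

# Matches <frame ... src="..."> but not <frameset ...>.
FRAME_SRC = re.compile(r'<frame\b[^>]*\bsrc="([^"]+)"', re.IGNORECASE)

def get_final_page(url, fetch, depth=0, max_depth=5):
    html = fetch(url)
    sources = FRAME_SRC.findall(html)
    if not sources or depth >= max_depth:
        # No frames (or guard reached): the document is used as it is.
        return html
    # Retrieve each framed document, repeating the process if it
    # contains frames itself, and combine the results.
    parts = [
        get_final_page(urljoin(url, src), fetch, depth + 1, max_depth)
        for src in sources
    ]
    return "\n".join(parts)

# Hypothetical site used in place of real network access.
site = {
    "http://example.com/index.html":
        '<frameset><frame src="nav.html"><frame src="main.html"></frameset>',
    "http://example.com/nav.html": "<p>Nav</p>",
    "http://example.com/main.html":
        '<frameset><frame src="story.html"></frameset>',
    "http://example.com/story.html": "<p>Story</p>",
}
final_page = get_final_page("http://example.com/index.html", site.__getitem__)
print(final_page)
```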

Depending on the elements found within the network document, the parser will behave differently. When it encounters a URL, or hyperlink, the parser will determine the full URL, i.e., via a base URL or similar, and add it to the list of URLs for the requested network document. (This is the basis of the URL categorisation algorithm.)

If the document is a "Form", the form is processed accordingly and an entry is added to the form list. Should the element be an image map, or similar, the parser extracts all of the constituent URLs. It then resolves the URLs to their full absolute equivalent representations. If the parser can identify what is loosely coined the "main content" or "main story" in the requested network document, the parser calls the "GetMainStory" algorithm, which is described in Figure 7, and adds it to the list of "stories" found in the requested document. In all cases, the parser puts its results into an "intermediate format" that is neither the source nor target document mark-up language, but a demarcated document that permits the generation of the same content in any number of mark-up languages suitable for use with the requesting user agent that resides on the requesting party's wireless device.

The "GetMainStory" algorithm used by the parser is presented in Figure 7. The "GetMainStory" algorithm identifies possible headlines within the requested network document. Should a headline be identified, the corresponding story to which the headline belongs is stored and the algorithm "GetMainStory" terminates.

As described above, the parser stores its results in an intermediate format suitable for efficient processing by a mark-up language specific handler, of which there may exist an arbitrary number. Figure 8 depicts the "SaveIntermediateFormat" algorithm that stores the intermediate representation to secondary storage, e.g., hard disk drive, for later processing. Next, the suitable mark-up language handler, or simply "handler," algorithm is called with the intermediate representation as input.

The handler algorithm, as depicted in Figure 9, processes its input, provided in a suitable intermediate format, to carry out a number of related tasks depending on the input:

a) The algorithm calls the "WriteMainContent" if the input contains content identified as "main content";

b) The algorithm calls the "WriteCategories" algorithm in order to deliver the suitable URLs to the wireless user that facilitates access to the content contained within the requested network document;

c) The algorithm calls the "WriteRelatedTopics" algorithm to identify information related to the requested network document;

d) The algorithm processes and writes the "Forms" contained within the input to the output destined for the wireless user; and

e) The algorithm processes the input and writes the required output destined for the wireless user when no structure can be imparted on the requested network document, e.g., document contains no URLs nor any significant text.

Suitably, that algorithm may carry out more than one of a) through e). For example, should the "Handler" algorithm detect the presence of "main content," the algorithm will immediately return the first unit of information to the wireless user. This process reduces the user's perceived system latency. After returning the first unit of information, the algorithm will continue to process the remaining input until the input is exhausted and the algorithm completes. Should the input not contain any "main content", the next most desirable choice is to return the first unit of categories into which the various constituent URLs have been categorized. Failing this, the "Handler" algorithm will return the first unit of the "full content," which is essentially a linear representation of the originally requested network document in a mark-up language compatible with the user agent employed on the wireless device. To determine how cases (a), (b) and (c) are handled, the "WriteMainContent", "WriteCategories" and "WriteRelatedTopics" algorithms are now described.
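
The order of preference for the first unit of information returned to the user may be sketched as below. The dictionary keys and the function name are assumptions chosen for illustration; the intermediate format itself is not specified here.

```python
def first_unit(intermediate):
    # Preference order: main content, then categories, then the
    # linear "full content" representation.
    for kind in ("main_content", "categories", "full_content"):
        units = intermediate.get(kind)
        if units:
            return kind, units[0]
    # Nothing usable was found; return an empty full-content unit.
    return "full_content", ""

with_story = first_unit({"main_content": ["Story 1"], "categories": ["National"]})
without_story = first_unit({"main_content": [], "categories": ["National", "City"]})
plain = first_unit({"full_content": ["Plain text"]})
print(with_story, without_story, plain)
```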

The "WriteMainContent" algorithm is given in Figure 10. The goal of this algorithm is to present the user with a viable and efficient interface to the content in the requested network document. One manner of achieving this involves presenting the user with a "table of contents" view of the requested network document, rather than the entire document in full. Therefore, the "WriteMainContent" algorithm identifies each "main story" or significant content item within the requested document. As each such item is identified it is added to the table of contents for the requested document. When no more significant content elements are distinguishable, the algorithm makes the "table of contents," e.g., within an output file, available to the arbitrator agent server 101 as depicted in Figure 2, and proceeds to write each of the "main stories" to the output that the user may in turn request. The arbitrator agent server 101 then returns this first amount of information to the requesting user. This two-cycle approach de-serializes the processing of the "table of contents" and the referenced information. This expedites delivery of a suitable interface to the content to the waiting user, thereby reducing perceived system latency and increasing system efficiency: by the time the user makes a request for an item referenced in the "table of contents", the system has completed processing said referenced information and is able to immediately return the requested information. The immediate benefit of this approach is that it makes the most efficient use of the limited bandwidth available in the wireless network and minimizes the latency experienced by the end-user. Combined, the approach yields a significantly more efficient interface to the requested network document.

Should the requested network document not be amenable to providing the requesting user with a "table of contents" view of the document, an alternative approach is to return an "index" into the requested network document to the user. Such an index provides for random access to content referenced within the requested network document, thereby creating a very efficient means of accessing the content contained therein. This process is now described with the aid of several flow diagrams of the general "WriteCategories" algorithm.

Figure 11 depicts the flow diagram for the "WriteCategories" algorithm. Given the appropriate input, which is specified in an appropriate intermediate format, the "WriteCategories" algorithm attempts to classify or organize the various URLs within the requested network document according to their inherent structure, i.e., file directory structure.

Once the user has been presented with either the content of the requested network document, or an efficient interface to the same, e.g., "table of contents" view, a logical next step is to provide the user with immediate access to information related to the network document just requested. In doing so, the user is presented with an efficient means to identify, retrieve and read information related to the requested network document. Effectively, there are two (2) types of information related to the current network document with respect to their logical belonging to a group of information: those within the same top-level domain and those within one or more different top-level domains. The former is now discussed in the context of Figure 11 that depicts the flow diagram of the "WriteCategories" algorithm.

Input to the "WriteCategories" algorithm is provided in a suitable intermediate format, which includes well-demarcated content that, while present in the network document requested by the wireless user, was not so well structured in the original document. The "classifier object" includes the URL tree whose nodes include two main fields: "full URL" and "description." The full URL is the absolute URL of the referenced resource; the "description" is the description assigned the associated URL. In Figure 11, if the classifier object exists, then the URL tree already exists and only minimal processing is required before returning the results to the user, i.e., write the resultant network documents based on the URL tree or "classifier object." If the URL tree does not exist, it must be built. The first step in building the URL tree is to build an empty URL tree. This is completed when the classifier object is created. As depicted in Figure 11, the next step involves calling the "Classify" algorithm. The "Classify" algorithm is presented in Figure 12. The "Classify" algorithm takes the list of absolute, or full, URLs along with their associated descriptions, a unique user identifier and the current absolute URL as input. The first step of the "Classify" algorithm involves determining whether the requested network document, in its final form, e.g., WML, is available from a suitable cache of network documents. If so, less processing of the list of absolute URLs is required: the list of URLs is retrieved from a second cache that holds the list of URLs present in the requested network document; otherwise, the list of absolute URLs provided as input to the algorithm is stored in the URL list cache. The next step involves retrieving the "metric" links from a "metric table" in a database. This database is used to store those URLs that have been previously selected by the requesting user on a per-site basis based on domain name, but not fully qualified domain name. 
Records in said database are maintained on a per-user basis. Discussion of the metric is deferred to a subsequent section of the disclosure. Having retrieved this list of "metric links", or "metric URLs", from the database, they are used to build the URL tree. This is accomplished by calling the "InsertLinkIntoTheTable" algorithm presented in Figure 13.

The goal of calling the "InsertLinkIntoTheTable" algorithm with the list of "metric URLs" as input is to permit the URL tree to be constructed while implicitly sorting it concurrently according to the user's usage behaviour captured by the metric URLs. These metric URLs are indicative of the user's past behaviours with respect to the top-level domain of the metric URLs. Once constructed, the URL tree will be populated with the URLs present in the requested network document by calling the "InsertLinkIntoTheTable" algorithm with the list of the URLs present in the requested network document. This ensures that the URLs are inserted in the tree such that those items in which the user has shown the most interest in the past are returned quickly via a straightforward traversal of the URL tree. There are a number of immediate benefits of this approach, including customization of the Internet content on a per-user basis, facilitating navigation of the Internet content available from the requested network document, providing a more efficient interface to the requested content, and making more efficient use of the limited low bandwidth and high latency wireless link upon which the wireless user relies.

The "InsertLinkIntoTheTable" algorithm is explained as follows. Input to the algorithm consists of an absolute, or full, URL and its associated description, e.g., "www.myskyweb.com/about/mgmt/mgmt.html" and "MySkyWeb Management". The first step in the algorithm determines whether a directory structure exists within the provided absolute URL, e.g., "/about". If such a directory is available, processing of the provided URL continues to determine if the URL is "acceptable"; otherwise, the absence of a directory structure readily identifies the remaining URL component as a leaf node in the URL tree.

If the URL is acceptable, a check is performed to determine whether such a "category" already exists within the URL tree; otherwise, the URL or its description may be replaced via a rule specified in an external file, e.g., database, with an alternative representation, e.g., rewrite "mgmt" as "Management", excluded or ignored. By employing a rule-based system, each URL is processed independently to determine if the current URL component, i.e., directory name, is acceptable. If the URL is deemed acceptable, but the category does not exist within the URL tree, a "category node" is created and inserted into the URL tree at the current level in the tree, i.e., effectively at the end of the current list of categories. If the URL is deemed acceptable and the category exists within the URL tree, the tree is traversed until the corresponding category node is found and processing of the input URL continues. When the URL component being considered has no directory structure, then the URL component is considered for inclusion in the URL tree as a leaf node. First, a check is performed to determine whether the URL component is already present in the tree. If so, it is compared against the existing one. Provided that the description of the URL specified as input to the algorithm is more descriptive than that currently associated with the URL component in the leaf node, the description in the leaf node is updated; otherwise, no changes are made to the leaf node. In both these cases, the algorithm terminates. Should the URL component not exist within the URL tree, a new leaf node is added to the URL tree. If the new node just added is the result of a "metric link" input, it is added to the list of URLs that may be removed from the URL tree.

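The insertion of a single URL into the URL tree, as explained above, may be sketched as follows. This is an illustrative sketch and not the algorithm of Figure 13: the "Node" type, the in-memory "RULES" table (standing in for the external rule file), the use of description length as the measure of "more descriptive", and the "from_metric" flag are assumptions introduced here. The host name is discarded on the assumption that all URLs belong to the same site.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    url: Optional[str] = None
    description: str = ""
    from_metric: bool = False
    children: dict = field(default_factory=dict)  # insertion-ordered

# Hypothetical rules: directory name -> replacement label, None = ignore.
RULES = {"mgmt": "Management", "cgi-bin": None}

def insert_link(root, url, description, is_metric=False, removable=None):
    # Drop any scheme and the host; keep only the path components.
    path = url.split("://")[-1].partition("/")[2]
    parts = [p for p in path.split("/") if p]
    if not parts:
        return  # assumption: the bare site root is not categorised
    node = root
    for part in parts[:-1]:
        label = RULES.get(part, part)
        if label is None:
            continue  # rule says to ignore this directory name
        # Reuse the category node if present, else append a new one.
        node = node.children.setdefault(label, Node(label))
    name = parts[-1]
    leaf = node.children.get(name)
    if leaf is None:
        leaf = Node(name, url, description, from_metric=is_metric)
        node.children[name] = leaf
        if is_metric and removable is not None:
            removable.append(leaf)  # candidate for later removal
    else:
        if len(description) > len(leaf.description):
            leaf.description = description  # keep the longer description
        if not is_metric:
            leaf.from_metric = False  # overwritten by a URL from the page

root = Node("root")
removable = []
insert_link(root, "www.myskyweb.com/about/mgmt/old.html", "Old",
            is_metric=True, removable=removable)
insert_link(root, "www.myskyweb.com/about/mgmt/mgmt.html", "Mgmt")
insert_link(root, "www.myskyweb.com/about/mgmt/mgmt.html", "MySkyWeb Management")
mgmt = root.children["about"].children["Management"]
print(list(mgmt.children), mgmt.children["mgmt.html"].description)
```
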
Once populated with the URLs present in the requested network document, the URL tree is "cleaned" to remove the metric URLs, which were used to construct the initial URL tree, should they not appear in the requested network document, e.g., metric URL is no longer valid. This is accomplished by calling the "CleanTheTree" algorithm presented in Figure 14. The "CleanTheTree" algorithm takes the URL tree just constructed and the list of metric URLs to be removed as input. In turn, the "RemoveNotOverWrittenMetricLinks" algorithm presented in Figure 15 is called.

The "RemoveNotOverWrittenMetricLinks" algorithm processes each metric URL that appears in the input list individually. The first step locates the referenced metric URL in the URL tree. If, when building the tree, a newer URL did not replace, i.e., overwrite, a metric link in the tree, i.e., URL is invalid, the metric URL is removed from the tree. The algorithm terminates when the input list of metric URLs is exhausted.

The second step involved in "cleaning" the URL tree involves normalizing the tree. The "CleanTheTree" algorithm calls the "NormalizeTheTree" algorithm, which is presented in Figure 16, to carry out this task. The "NormalizeTheTree" algorithm takes the current URL tree as input.

Beginning with the root node, the "NormalizeTheTree" algorithm retrieves the next node using a depth-first search. If the operation is successful, the algorithm examines the contents of the current node to determine its characteristics; otherwise, the algorithm terminates. The algorithm determines whether the current node has an associated URL and more than a single child node. If so, a new child node is created and the URL associated with the current node is set to null, thereby making it a category selection, e.g., "Management." The URL field in the new child node is set to the URL formerly associated with its parent, e.g., "www.myskyweb.com/about/management/" and is given a description of "More on this..." (The other child nodes will contain URLs under the category of the parent that reference specific network documents, e.g., www.myskyweb.com/about/management/executives.html.) Should the current node have no associated URL, e.g., category, the algorithm determines whether the current node has only a single child node. If so, it would mean that the user would be presented with a category selection in which only a single resource could be obtained. Therefore, to improve upon the interface provided to the wireless user the category is replaced by the single available selection under that "category." This process is called "path compression." Another benefit of this approach is the reduction in the amount of information that must be sent over the low bandwidth and high latency wireless link to the user, thereby making more efficient use of the wireless link. Additionally, by providing such a "More on this..." topic selection in the choices returned to the wireless user under a specified "topic" or "heading", the user gains ready access to content directly pertinent to that "topic" or "heading." An immediate benefit of this approach is improved navigation of the content within the requested network document, including the quick identification and retrieval via a single selection of content related to the current category or "topic."

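The two normalization rules described above, the "More on this..." child and "path compression", may be sketched as follows. This is a sketch under stated assumptions and not the algorithm of Figure 16: the "Node" type is introduced here for illustration, children are processed before their parent, the root node itself is never compressed, and the second example URL ("board.html") is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    url: Optional[str] = None
    children: list = field(default_factory=list)

def normalize(node):
    # Depth-first: normalize the children, then examine this node.
    node.children = [normalize(child) for child in node.children]
    if node.url is not None and len(node.children) > 1:
        # The node becomes a pure category; its former URL moves into
        # a new last child labelled "More on this...".
        node.children.append(Node("More on this...", url=node.url))
        node.url = None
    elif node.url is None and len(node.children) == 1:
        # Path compression: a category with a single selection is
        # replaced by that selection.
        return node.children[0]
    return node

def normalize_tree(root):
    root.children = [normalize(child) for child in root.children]
    return root

base = "www.myskyweb.com/about/management/"
root = Node("root", children=[
    Node("about", children=[
        Node("management", url=base, children=[
            Node("executives.html", url=base + "executives.html"),
            Node("board.html", url=base + "board.html"),  # hypothetical
        ]),
    ]),
])
normalize_tree(root)
top = root.children[0]
print(top.name, [child.name for child in top.children])
```
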
Having categorised the URLs present in the requested network document by suitably placing them into an appropriate data structure, i.e., the URL tree, the next step is to present the constituent embedded components of the network document to the requesting user in an intelligent fashion that facilitates navigation of the requested network document and, thus, the Internet site at which the network document exists. This process begins with the "Classify" algorithm presented in Figure 12.

The "Classify" algorithm traverses the URL tree using a breadth-first search (BFS) to obtain an ordered list of topic selections and their associated URLs. In this manner, the selections for the highest-level topics, and hyperlinks, i.e., URLs, as appropriate, are placed into the first unit of information to be returned to the requesting user. Each iteration of the BFS algorithm will identify more detailed topic selections from the ordered URL tree and include them as hyperlinks with the corresponding URLs in the resultant network documents. This process continues until the BFS algorithm terminates. Consequently, the resultant network documents returned to the user have a hierarchical structure that facilitates access to the content available from the requested network document.
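
The breadth-first ordering of selections may be illustrated with the sketch below. The "Node" type, the returned (depth, name, URL) tuples and the example story titled "Summit opens" are assumptions made for illustration; how the selections are written into the resultant network documents is not shown.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    url: Optional[str] = None
    children: list = field(default_factory=list)

def bfs_selections(root):
    # Visit the tree level by level so that the highest-level topics
    # come first, followed by progressively more detailed selections.
    ordered = []
    queue = deque((child, 1) for child in root.children)
    while queue:
        node, depth = queue.popleft()
        ordered.append((depth, node.name, node.url))
        queue.extend((child, depth + 1) for child in node.children)
    return ordered

root = Node("root", children=[
    Node("National", children=[
        Node("Green party goes into the Red", url="/national/green.html"),
    ]),
    Node("City", url="/city"),
    Node("World news", children=[
        Node("Summit opens", url="/world/summit.html"),  # hypothetical
    ]),
])
selections = bfs_selections(root)
print([name for _, name, _ in selections])
```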

The "Classify" algorithm writes the topic output to the appropriate number of resultant network documents, e.g., 1500 byte WML decks, which are stored in main memory and possibly secondary storage. A caching system, which aids in the quick retrieval of this output when the user requests it again, may be employed. Finally, the "Classify" algorithm returns the classifier object to the calling algorithm, i.e. "WriteCategories". In turn, the "WriteCategories" algorithm returns the first file name of resultant output, i.e., network documents, to the arbitrator agent server 101 of the intermediate conversion server presented in Figure 2. Using the reference filename returned to it, the arbitrator agent server 101 returns the first resultant network document to the OS. The OS returns the resulting network document to the requesting user.

In this manner, the top-level categories or topics are very quickly returned to the user who requested the network document while the processing of the remaining content by the network document converter 102, as presented in Figure 2 as part of the intermediate conversion server, continues. When the user makes a request of the system based on the selections available in the initial content returned to him, e.g., list of categories, the network document converter 102 will have completed all the necessary processing of the requested network document. By de-serializing this process, the perceived system latency is reduced and the user is quickly presented with a means to navigate easily within the requested content.

Initially, the user is presented with the first set of selections to access the constituent components of the requested network document. The exact ordering of these selections depends on the selected URL sorting metric in use, e.g., Most Recently Used. The explanation of the sorting metric within the context of the "Classify" algorithm is deferred to the next paragraph. Thereafter, the user's selection will determine which one of the resultant network documents produced by the "Classify" algorithm will be returned to the user. The resultant network document will contain an ordered, and, as appropriate or specified, numbered, selections for the requesting user. Given the use of the breadth-first search by the "Classify" algorithm, an efficient hierarchical interface to the content available from the requested network document is provided to the requesting user. According to the "Classify" algorithm described in Figure 12, a leaf node in the URL tree data structure will correspond uniquely to a single network document.

The ordering of the various selections presented to the user is determined by the "Classify" algorithm. As described above, the "Classify" algorithm retrieves a set of "metric links" from a database. To achieve the ordered and, as appropriate or specified, numbered, list of URL components, a database is used to store those URLs that have been previously selected by the requesting user on a per-site basis based on domain name, but not fully qualified domain name. Records in said database are maintained on a per-user basis. When a user requests a resource by selecting a URL, that URL is added to the database for that user along with the time and date of the request, provided that a corresponding record does not already exist for said user and said URL; otherwise, the corresponding record in the database is updated to reflect the current time and date of the request. Once the arbitrator agent server 101 as shown in Figure 2 retrieves the resource requested by the user, the database is queried to locate all the records that contain the same top-level domain of the requested resource that the requesting user has previously selected. These records correspond uniquely to the requesting user.
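
The per-user record keeping described above may be sketched with an in-memory SQLite database. The table layout, the column names, the "hits" counter and the naive derivation of the site domain from the last two labels of the host name are all assumptions made for illustration; the disclosure does not specify a schema.

```python
import sqlite3
from urllib.parse import urlsplit

def site_domain(url):
    # Naive assumption: the site is identified by the last two labels
    # of the host name, e.g. "myskyweb.com" for "www.myskyweb.com".
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def open_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE metric (user_id TEXT, domain TEXT, url TEXT, "
        "last_requested REAL, hits INTEGER, PRIMARY KEY (user_id, url))"
    )
    return db

def record_selection(db, user_id, url, when):
    # Update the existing record for this user and URL, if any ...
    cur = db.execute(
        "UPDATE metric SET last_requested = ?, hits = hits + 1 "
        "WHERE user_id = ? AND url = ?", (when, user_id, url))
    if cur.rowcount == 0:
        # ... otherwise add a new record.
        db.execute("INSERT INTO metric VALUES (?, ?, ?, ?, 1)",
                   (user_id, site_domain(url), url, when))

def metric_links(db, user_id, url, order="last_requested"):
    if order not in ("last_requested", "hits"):
        raise ValueError("unknown sort metric")
    rows = db.execute(
        f"SELECT url FROM metric WHERE user_id = ? AND domain = ? "
        f"ORDER BY {order} DESC", (user_id, site_domain(url)))
    return [row[0] for row in rows]

db = open_db()
a = "http://www.myskyweb.com/news/a.html"
b = "http://www.myskyweb.com/news/b.html"
record_selection(db, "u1", a, 1.0)
record_selection(db, "u1", a, 2.0)
record_selection(db, "u1", b, 3.0)
record_selection(db, "u1", "http://other.example.org/x", 4.0)
record_selection(db, "u2", "http://www.myskyweb.com/news/c.html", 5.0)
recent = metric_links(db, "u1", "http://wap.myskyweb.com/")
frequent = metric_links(db, "u1", "http://wap.myskyweb.com/", order="hits")
print(recent, frequent)
```

Ordering by "last_requested" corresponds to a Most Recently Used metric and ordering by "hits" to a Most Frequently Used metric in this sketch.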

Based on the records returned from the database query and the specified sort metric, e.g., Most Frequently Used, a "metric list" is created, i.e., list of absolute URLs ordered according to some specified sort metric. With the "metric list" and list of absolute URLs extracted from the requested network document, as specified by the original URL selected, the "Classify" algorithm calls the "InsertLinkIntoTheTree" algorithm. As described previously, the "InsertLinkIntoTheTree" algorithm builds an implicitly sorted tree using the sorted "metric list." Once complete, control is returned to the "Classify" algorithm which in turn calls the "InsertLinkIntoTheTree" algorithm with the list of absolute URLs and their corresponding descriptions extracted from the requested network document. Having already structured and ordered the URL tree according to the "metric links", the insertion operation now places the absolute URLs extracted from the requested network document in the correct position based on the sorting metric in use. As required, the final steps of the "Classify" algorithm result in removal of invalid "metric links" from the URL tree and the normalization of the tree, e.g., to prevent single selections under a specified topic, i.e., node with a single child node.

Therefore, as described the sorting metric employed by the "Classify" algorithm may be any suitable sorting metric, e.g., Most Recently Used. By definition, the sorting metric may be compound. Depending on the sorting metric employed, the use of a heuristic for the presentation of the categorised URLs to the requesting user may be useful. For example, when the number of the categorised URLs is large, presenting the entire URL selection list according to a "frequency of use" sorting metric algorithm may reduce the usefulness of the system. Therefore, a heuristic may be employed to limit the number of ordered URLs in the list to some maximum, e.g., five, according to the "frequency of request" field and to present the remaining categorised URLs to the user in one of two fashions:

1. Alphabetically, or;

2. In the order in which the embedded URLs appear in the requested resource.
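
The heuristic described above may be sketched as follows. The function and parameter names are assumptions made for illustration, as is the choice to rank only those URLs that have a non-zero request count.

```python
def order_selections(links, hits, limit=5, rest="alphabetical"):
    # links: (description, url) pairs in document order.
    # hits:  url -> "frequency of request" for the current user.
    used = [link for link in links if hits.get(link[1], 0) > 0]
    top = sorted(used, key=lambda link: hits[link[1]], reverse=True)[:limit]
    remaining = [link for link in links if link not in top]
    if rest == "alphabetical":
        remaining.sort(key=lambda link: link[0].lower())
    # Otherwise the remaining links keep their order of appearance.
    return top + remaining

links = [("World news", "/world"), ("City", "/city"),
         ("National", "/national"), ("Business", "/business")]
hits = {"/national": 7, "/business": 2}
alphabetical = order_selections(links, hits, limit=1)
in_document_order = order_selections(links, hits, limit=1, rest="document")
print([d for d, _ in alphabetical])
print([d for d, _ in in_document_order])
```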

Consider the example network document given in Figure 19. The network document consists of a number of different elements, including news headlines that have an associated URL, or hypertext link, introductory paragraphs to full articles, and hypertext links, i.e., URLs, to full stories as well as more articles on "national", "world", "city" and "business" news, e.g., "More world news". Figure 20 depicts just those elements, e.g., headline, of the network document given in Figure 19 that are retained because they have an associated URL. Figure 21 depicts the entire set of relative URLs associated with one or more of the various elements of the network document given in Figure 19. Note the multiplicity of URLs that reference the same resource, e.g., "/city." Having extracted all the hyperlinks from the requested resource, the hyperlinks, i.e., URLs, are categorised according to their inherent hierarchical structure, i.e., file system directory structure. Figure 22a depicts the entire set of relative URLs associated with one or more of the various elements of the network document given in Figure 19 that have been categorised for presentation to the user as selections entitled "National", "City", "World news", "Business", "Columns", and "Election 2000". Note that when one or more elements of the web page are associated with the same URL, the most descriptive element or attribute of the element, e.g., ALT attribute for an image element, is presented to the user as the selection. One scenario, in which the selections are presented as a numbered list, is shown in Figure 22b. Should the user select "1. National", she/he is presented with the selection of national stories based on the manner in which the hyperlinks, i.e., URLs, have been categorized. Therefore, the headline text, "Green party goes into the Red" is given as a selection as it has been categorised, or grouped, under "National."
Items number two, three and so on would include selections for all the elements on the network document originally requested that have been categorised under "National". In the specific example given in Figure 22b, selection number four is entitled "More on this ..." to present the user the opportunity to see all the items that appear under the "National" category, including those not on the network page originally selected. This "category" selection is always presented to the user as the last selectable item in response to a specified item, e.g., "1. National." For simplicity, these six selections in Figure 22a are presented in no specific order; however, the system sorts all the selections in a given category or subcategory according to some metric, which is selectable by the requesting user, and presents the sorted selections to the user. This process was described together with the "WriteCategories" and "Classify" algorithms.
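The categorisation described above can be illustrated with a minimal Python sketch. It is not the "Classify" algorithm itself: it assumes that the first path segment of a relative URL names the category, and it uses the longest description as a simple stand-in for "most descriptive". The function name `categorise` and the sample URLs are hypothetical.

```python
from collections import OrderedDict
from urllib.parse import urlparse


def categorise(links):
    """Group (description, relative_url) pairs by their first path segment.

    When several elements share a URL, the longest description is kept
    (an assumed proxy for the "most descriptive" element).
    """
    categories = OrderedDict()
    for description, url in links:
        segments = [s for s in urlparse(url).path.split("/") if s]
        if not segments:
            continue  # links to the site root carry no category
        bucket = categories.setdefault(segments[0], OrderedDict())
        if len(description) > len(bucket.get(url, "")):
            bucket[url] = description
    return categories


# Hypothetical links in the style of the Figure 19 example.
links = [
    ("Green party goes into the Red", "/national/story1.html"),
    ("More", "/national/story1.html"),
    ("City", "/city"),
    ("More city news", "/city"),
    ("Markets slide", "/business/markets.html"),
]
result = categorise(links)
```

Here `result` holds one entry per top-level directory, each mapping a URL to the single description chosen for it, so duplicate references to "/city" collapse into one selection.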

An extension of the "Classify" algorithm for the presentation of a hierarchical view of the content available via the requested network document is to permit the requesting user to quickly identify information or content related to the requested network document. One approach to identifying such related information is based on the inherent structure of the current network document, i.e., the one most recently requested by the user. Such related content or information is easily identified using a depth-first search (DFS) of the URL tree data structure. For practicality, the results of the search may be limited to a specified maximum number. The results of such a search shall be presented to the user as an ordered, and, as appropriate or specified, numbered item selection. Therefore, with a single key press, the user may obtain the content most related to the current network document as determined by the author(s) of the content. This approach is described with the aid of the flowchart for the "WriteRelatedTopics" algorithm presented in Figure 17.

The "WriteRelatedTopics" algorithm first determines whether the classifier object exists: if it does not, no URL tree exists; otherwise, the "WriteRelatedTopics" algorithm obtains the related topics by calling the "RetrieveRelatedTopics" algorithm presented in Figure 18. As the "WriteRelatedTopics" algorithm has the network document available in a suitable intermediate format, it may construct the necessary URL tree using the exact same process as described for the "WriteCategories" algorithm, i.e., create the classifier object and call the "Classify" algorithm to build the necessary ordered URL tree.

The "RetrieveRelatedTopics" algorithm takes 'max_links' and a reference absolute URL as input parameters. 'max_links' is used to limit the maximum number of URLs returned by the algorithm, i.e., the maximum number of related topics. The algorithm identifies the location of the reference URL in the URL tree. Once found, it sets 'count' to 0. The algorithm identifies information related to the reference URL by performing a DFS beginning with the node corresponding to the reference URL. Each node returned by the DFS of the URL tree is examined to determine whether it has already been used. If it is not used, the node is marked as "used", the URL and its associated description are added to the list of related topics and 'count' is incremented by 1; otherwise, the DFS is restarted with the direct parent of the current node and 'count' is not incremented. This process continues until either 'count' exceeds 'max_links' or the DFS has traversed the entire URL tree. On completion, the "RetrieveRelatedTopics" algorithm returns to the calling algorithm a list of at most 'max_links' URLs, along with their corresponding descriptions, in the order of their retrieval from the sorted URL tree.
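A minimal Python sketch of this kind of search follows. It is one simplified reading of the description, not the flowchart of Figure 18: the subtree of the reference URL is searched depth-first, and the search is then widened to each parent in turn, skipping nodes already marked as used. The `Node` class, the helper names and the sample tree are assumptions made for illustration.

```python
class Node:
    """One URL in a hypothetical URL tree."""

    def __init__(self, url, description, parent=None):
        self.url = url
        self.description = description
        self.parent = parent
        self.children = []
        self.used = False
        if parent is not None:
            parent.children.append(self)


def find(node, url):
    """Locate the node for `url`, or return None."""
    if node.url == url:
        return node
    for child in node.children:
        hit = find(child, url)
        if hit is not None:
            return hit
    return None


def dfs(node):
    """Yield `node` and its descendants in depth-first (pre-order) order."""
    yield node
    for child in node.children:
        yield from dfs(child)


def retrieve_related_topics(root, reference_url, max_links):
    start = find(root, reference_url)
    related = []
    while start is not None and len(related) < max_links:
        for node in dfs(start):
            if node.used:
                continue  # already returned earlier
            node.used = True
            related.append((node.url, node.description))
            if len(related) == max_links:
                break
        start = start.parent  # widen the search to the enclosing directory
    return related


root = Node("/", "Home")
national = Node("/national", "National", root)
Node("/national/a", "Story A", national)
Node("/national/b", "Story B", national)
city = Node("/city", "City", root)
Node("/city/c", "Story C", city)
topics = retrieve_related_topics(root, "/national", 5)
```

The result lists the reference node and its stories first and only then content from sibling directories, which reflects the idea that nearby nodes in the directory structure are the most closely related.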

When control returns to the "WriteRelatedTopics" algorithm, the algorithm writes the list of related topics in the form of URLs and their associated descriptions into the appropriate number of resultant network documents, e.g., 1500 byte WML decks. As this identical process is used at the final stage of the "WriteCategories" algorithm, it is not described again.

With a simple extension, the "RetrieveRelatedTopics" algorithm may also make use of data contained in an open directory, e.g., Open Directory Project, that could be queried to determine additional information related to the current network document, but which may exist at a different origin server with a different top-level domain. For example, if the URL of the current resource relates to a university in the province of Ontario, e.g., "Carleton University", the user may be presented with hyperlinks, i.e., URLs, to resources that include "Ottawa (the city)", "Ontario", and "universities in Ontario."

By such a context merging operation the present invention produces a simpler interface for the network document to the wireless user. Furthermore, the amount of information that is presented to the wireless subscriber in which she/he holds no interest is significantly reduced. This in turn leads to user satisfaction and a reduced airtime cost.

(II) URL rewriting

Given the limited bandwidth available to wireless devices and the desire to reduce the amount of information sent to and from the devices, the system replaces URLs embedded in the requested network document with much more compact representations of the same. These more compact representations appear in the converted document, e.g., cHTML network document, sent to the wireless user. A software component within the system is responsible for the mapping between the rewritten compact representation of the URL and the original URL. During the conversion process, the ICS converts the original URLs in the requested network document to a much more compact representation. When a subsequent request is made, the ICS maps the compact representation to the original URL in order to interact with the OS where the requested resource exists.
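The mapping component can be sketched in a few lines of Python. This is an illustrative assumption about one possible design, in which each original URL is replaced by a short hexadecimal token that indexes a server-side table; the class name `UrlRewriter`, its methods and the example URLs are hypothetical and are not taken from the specification.

```python
class UrlRewriter:
    """Bidirectional map between original URLs and compact tokens."""

    def __init__(self):
        self._index_by_url = {}
        self._url_by_index = []

    def compact(self, url):
        """Return the short token for `url`, allocating one if needed."""
        if url not in self._index_by_url:
            self._index_by_url[url] = len(self._url_by_index)
            self._url_by_index.append(url)
        return format(self._index_by_url[url], "x")  # hexadecimal index

    def expand(self, token):
        """Map a token received from the device back to the original URL."""
        return self._url_by_index[int(token, 16)]


rewriter = UrlRewriter()
long_url = "http://www.example.com/national/2000/election/story1.html"
token = rewriter.compact(long_url)
```

A token of one or two characters replaces a URL of several dozen bytes in the converted document, and `expand` restores the original URL when the subsequent request arrives.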

The immediate benefits of this approach include a very significant reduction in the amount of information that is sent over the wireless media, thereby improving perceived system response time, reducing communication costs, both monetary and otherwise, and increasing processing efficiency, as the computational system handles far fewer bytes to process any subsequent request from the wireless user. Additionally, when the ICS is implemented as a distributed system, the associated communication costs between software and hardware servers are reduced accordingly, yielding improvements in system performance.

(III) Pruning of Redundant URLs:

This component of the system eliminates duplication of URLs described in the previous section. The URL tree-pruning algorithm described in Figure 23 receives the requested URL as input. The algorithm uses a function called "getUniqueNetworkDocumentElements" that is explained with the help of a flowchart presented in Figure 24. If the current network document is empty (NULL) the algorithm terminates; otherwise, it calls the function "getUniqueNetworkDocumentElements".

The "getUniqueNetworkDocumentElements" algorithm presented in Figure 24 takes a network document as a parameter. It iteratively identifies each embedded URL inside the network document. It discards embedded URLs that have already been displayed and returns a set of URLs that the user has not viewed on the previous screens. Consider the example presented in Figure 25 in which both network document #1 and network document #2 contain URL1, URL2, URL3, URL4 and URL5. The tree-pruning algorithm will remove these URLs from the network document #2.
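The pruning step can be sketched as follows in Python. The sketch assumes that the embedded URLs have already been extracted into a list and that the system keeps a per-user set of URLs already displayed; the function signature and the extra "URL6" entry are hypothetical additions for illustration.

```python
def get_unique_network_document_elements(embedded_urls, already_displayed):
    """Return the URLs not yet shown to the user, in document order.

    `already_displayed` is updated in place so that later documents
    are pruned against everything shown so far.
    """
    unique = []
    for url in embedded_urls:
        if url in already_displayed:
            continue  # discard: the user has seen this link before
        already_displayed.add(url)
        unique.append(url)
    return unique


displayed = set()
document_1 = ["URL1", "URL2", "URL3", "URL4", "URL5"]
document_2 = ["URL1", "URL2", "URL3", "URL4", "URL5", "URL6"]
first = get_unique_network_document_elements(document_1, displayed)
second = get_unique_network_document_elements(document_2, displayed)
```

As in the Figure 25 example, the five URLs shared by both documents are kept for network document #1 and removed from network document #2.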

Another embodiment of the present invention is concerned with the scenario in which the wireless user presents the URL to the ICS instead of the OS. The ICS performs the necessary conversion of the input network document. Suitably, this conversion involves conversion between mark-up languages and context management. A scenario for this embodiment is as follows:

1. Using an appropriate user agent as described earlier, e.g., a micro-browser, which runs on the user's wireless device, the user goes to the ICS site and requests the resource of interest from an OS in the Internet by inputting its appropriate URL.

2. A request is sent to the proxy server.

3. The proxy server interprets the request and formulates an appropriate request, e.g., an HTTP request, to the ICS. This request, arriving at I2, is a call to the arbitrator agent server 101 (Figure 2) with the specified URL as an input parameter.

4. Through O1 (Figure 2), the arbitrator agent server 101 sends an HTTP request for the requested resource, which exists within a file, e.g., HTML document, to the OS that contains the requested resource as identified by the URL.

5. The arbitrator agent server 101 in the ICS receives the network document file at I1, passes it through the network document converter 102 (Figure 2) and returns the new network document, e.g., a WML document, to the proxy server through O2.

6. The proxy server returns said file, or an equivalent compiled byte-code representation, e.g., compiled WML, to the user agent on the user's wireless device.

A further embodiment is concerned with the scenario in which the wireless user presents the URL to the OS without the aid of a proxy server. The ICS performs the necessary conversion of the input network document. Suitably, this conversion involves conversion between mark-up languages and context management. A scenario for this embodiment is as follows:

1. Using an appropriate user agent as described earlier, e.g., micro-browser, which runs on the user's wireless device, the user requests the resource of interest from an OS in the Internet by inputting its appropriate URL.

2. A request is sent to the OS that contains the requested resource.

3. The OS receives the request, retrieves the requested resource, which exists within a file, e.g., HTML document, and uses the arbitrator agent 100 to send it to the ICS.

4. The arbitrator agent server 101 processes the input arriving at I1 of the ICS (Figure 2). The arbitrator agent server 101 sends the incoming network document to the network document converter 102.

5. The network document converter 102 performs the necessary conversion, including mark-up language translation and context management, and sends the results back to the arbitrator agent server 101.

6. The arbitrator agent server 101 sends the resultant document to the OS through O1.

7. The OS uses the arbitrator agent 100 to send the resultant file, or an equivalent compiled byte-code representation, e.g., compiled WML, to the user agent on the user's wireless device.
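The scenarios above differ mainly in who supplies the original document to the ICS. The following Python sketch models that difference under stated assumptions: the class `ArbitratorAgentServer`, its method names, the `toy_converter` function and the in-memory `pages` dictionary are hypothetical stand-ins for arbitrator agent server 101, network document converter 102 and an origin server, and no real network traffic is involved.

```python
class ArbitratorAgentServer:
    """Hypothetical stand-in for arbitrator agent server 101."""

    def __init__(self, converter, fetch_from_os=None):
        self.converter = converter          # plays the role of converter 102
        self.fetch_from_os = fetch_from_os  # needed only when given a URL

    def handle_document(self, document):
        """The OS pushes the original document to the ICS for conversion."""
        return self.converter(document)

    def handle_url(self, url):
        """The ICS is given a URL and fetches the document from the OS itself."""
        if self.fetch_from_os is None:
            raise ValueError("no fetch function configured")
        return self.handle_document(self.fetch_from_os(url))


def toy_converter(document):
    # Placeholder for mark-up translation and context management.
    return document.replace("<B>", "<b>").replace("</B>", "</b>")


# A dictionary lookup stands in for an HTTP request to the origin server.
pages = {"http://www.example.com/": "<B>News</B>"}
server = ArbitratorAgentServer(toy_converter, fetch_from_os=pages.__getitem__)
```

Both entry points end in the same conversion call, which is why the converter can stay unchanged whichever party initiates the exchange.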

Platform: The Intermediate Conversion Server consists of both hardware and software components. The software components run on top of a commercial-off-the-shelf processor, e.g., a CISC or RISC processor such as the AMD Athlon 800 MHz CPU, which communicates with the other hardware components through a 133 MHz system bus that is housed on the A7V motherboard. The system employs 256 MB of PC 133 random access memory and an ATA - IDE Ultra Direct Memory Access (UDMA) or Advanced Technology Attachment (ATA) disk drive. An IEEE 802.11 (CSMA-CD) network card is used for interfacing with the communication medium. The application software explained in detail in the following section employs a JDBC compliant database. The software components use a Java Virtual Machine, for example the JVM provided with Sun Microsystems's JDK version 1.3. Any operating system that supports such a Java platform, for example Red Hat Linux version 6.2, may be used in the construction of the ICS.

It is to be understood that the embodiments and variations shown and described herein are merely illustrative of the principles of this invention and that various modifications may be implemented by those skilled in the art without departing from the scope and spirit of the invention.

APPENDIX

(to Patent Appln. entitled "System to Facilitate Navigation of Internet from a Wireless Device")

HTML to WML mapping

Mapping in a nutshell

HTML elements for the HEAD section

HTML elements for the BODY section

HTML elements for lists

HTML elements for forms

HTML elements for frames

HTML elements for tables

HTML elements for maps

HTML elements for applets

Mapping in detail (HTML vs WML)

General elements

<A HREF = "url"> text for the link </A> becomes

<a href = "url"> text for the link </a> or

Because the only way to define a source anchor in WML is through a card, a new card must be defined as the source of the anchor whenever the NAME attribute is encountered.

<A NAME="AnchorName">Text</A> must become

<p> Text

</p> </card>

This tag only tells the browser to format the elements contained inside it in a special way. Most web browsers display the information inside this tag in italics. Thus, the translation should be as follows.

<ADDRESS>Text</ADDRESS> should become <i>Text</i>

<B> text </B> becomes <b> text </b>

<BIG> text </BIG> becomes <big> text </big>

<BLINK> text </BLINK> might become text

BLOCKQUOTE

<BLOCKQUOTE> text </BLOCKQUOTE> might become >text

All the attributes for the BODY tag must be ignored.

<BODY> content </BODY> becomes

<card> content </card>

The body of an HTML document should be split in multiple cards.

The CLEAR attribute for this tag is ignored. <BR> becomes <br/>

This tag is not supported in WML. The text inside this tag should be displayed in italics, as most HTML browsers do.

<CITE>text</CITE> should become <i>text</i>

<EM> text </EM> becomes <em> text </em>

In WML, there is no support for heading levels. One way to support headings in WML is to count the headings at each level and put these numbers in the corresponding WML heading translation. A heading might be displayed in bold or with another font emphasis. If an ALIGN attribute is present in the heading, it must be ignored.

<H1> Heading Text 1 </H1>

<HR>

<H2> Heading Text 2 </H2>

<H3> Heading Text 3 </H3>

<H4> Heading Text 4 </H4>

<HR>

<H2> Heading Text 5 </H2>

<H3> Heading Text 6 </H3> <H3> Heading Text 7 </H3> might become

<b> 1 Heading Text 1 </b><br/>

<br/>

<b> 1.1 Heading Text 2 </b><br/>

<b> 1.1.1 Heading Text 3 </b><br/>

<b> 1.1.1.1 Heading Text 4 </b><br/>

<br/>

<b> 1.2 Heading Text 5 </b><br/>

<b> 1.2.1 Heading Text 6 </b><br/>

<b> 1.3 Heading Text 7 </b><br/>
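The heading numbering shown above can be produced with one counter per heading level. The Python sketch below is an illustration under assumptions: headings are supplied as already-parsed (level, text) pairs, `<HR>` handling is left out, and the function name `number_headings` is hypothetical.

```python
def number_headings(headings):
    """Turn (level, text) pairs into numbered bold WML lines."""
    counters = []
    lines = []
    for level, text in headings:
        while len(counters) < level:
            counters.append(0)   # open counters for deeper levels
        del counters[level:]     # a shallower heading resets deeper counters
        counters[level - 1] += 1
        number = ".".join(str(n) for n in counters)
        lines.append(f"<b> {number} {text} </b><br/>")
    return lines


example = [
    (1, "Heading Text 1"),
    (2, "Heading Text 2"),
    (3, "Heading Text 3"),
    (4, "Heading Text 4"),
    (2, "Heading Text 5"),
    (3, "Heading Text 6"),
]
numbered = number_headings(example)
```

If a document skips a level, for instance an H3 directly under an H1, this sketch emits a zero for the missing level; a production converter would need a policy for that case.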

<HEAD> content </HEAD> becomes <head> content </head>

There is no equivalent in WML. The blank line tag should replace this tag. <HR> should become <br/>

<I> text </I> becomes <i> text </i>

<IMG ALT="text" SRC="url" WIDTH="100" HEIGHT="100" ALIGN=ABSMIDDLE VSPACE=4 HSPACE=7> becomes

<LINK HREF="page.html" REL="next"/> becomes

<LINK HREF=" bar" REL="NEXT"/> becomes

<META HTTP-EQUIV="name" CONTENT="value"> becomes

This tag must be ignored.

If the STYLE attribute is encountered it must be ignored. <P>text or <P>text</P> becomes <p>text</p>

This tag is not supported in WML. A simple way to support it is to replace each space by the non-breaking space character.

<PRE>

Text 1 Text 2

Text 3 4,56

Text 4 3,54

</PRE> might become

 Text 1                     &nbsp;  Text 2<br/> <br/>

Text 3                      

  4,56<br/>

Text 4                      

  3,54<br/>
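The space-replacement approach for PRE can be sketched in Python as follows. This is a minimal illustration assuming that the preformatted text has already been extracted from the element; the function name `pre_to_wml` is hypothetical, and every line break is mapped to `<br/>`.

```python
from html import escape


def pre_to_wml(text):
    """Preserve the layout of preformatted text in WML.

    Each space becomes a non-breaking space entity and each line
    ends with <br/>; mark-up characters are escaped first.
    """
    converted = []
    for line in text.split("\n"):
        converted.append(escape(line).replace(" ", "&nbsp;") + "<br/>")
    return "".join(converted)


wml = pre_to_wml("Text 1  Text 2\nText 3  4,56")
```

Escaping before the replacement ensures that the ampersands introduced by the `&nbsp;` entities are not themselves escaped.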

<SMALL> text </SMALL> becomes <small> text </small>

<STRONG> text </STRONG> becomes <big> text </big>

This tag is not supported by WML browser.

<U>text</U> becomes <u>text</u>
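The one-to-one mappings listed in this part of the appendix lend themselves to a lookup table. The Python sketch below uses the standard library `html.parser` module to rename the tags given in the text (for example STRONG to big, CITE and ADDRESS to i) and to drop unmapped tags such as BLINK while keeping their text. The names `INLINE_TAG_MAP`, `InlineConverter` and `convert_inline` are hypothetical, and attributes are discarded for simplicity.

```python
from html import escape
from html.parser import HTMLParser

# HTML tag name (lower case) -> WML tag name, as listed in the appendix.
INLINE_TAG_MAP = {
    "b": "b", "big": "big", "em": "em", "i": "i", "small": "small",
    "u": "u", "strong": "big", "cite": "i", "address": "i",
}


class InlineConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAG_MAP:          # unmapped tags are dropped
            self.out.append(f"<{INLINE_TAG_MAP[tag]}>")

    def handle_endtag(self, tag):
        if tag in INLINE_TAG_MAP:
            self.out.append(f"</{INLINE_TAG_MAP[tag]}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def convert_inline(html):
    parser = InlineConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


converted = convert_inline("<STRONG>hot</STRONG> <BLINK>new</BLINK> <CITE>src</CITE>")
```

Keeping the rules in a table means a new mapping can be added without touching the parsing code.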

The following is a possible mapping. If the OL tag contains an attribute for the start of the numbering it must be considered.

<OL> <LH>List header <LI>List item 1 <OL> <LI><A HREF="url1">List item 1 sub item 1</A> <LI>List item 1 sub item 2 </OL> <LI>List item 2 <LI>List Item 3 <OL> <LI>List item 3 sub item 1 <LI>List item 3 sub item 2 <LI>List item 3 sub item 3 </OL> </OL> might become

List header<br/>

1. List item 1<br/>

    1.1 <a title="Go"><go href="url1"/>List item 1 sub item 1</a><br/>

    1.2 List item 1 sub item 2<br/>

2. List item 2<br/>

3. List Item 3<br/>

    3.1 List item 3 sub item 1<br/>

    3.2 List item 3 sub item 2<br/>

 &nbsp;  3.3 List item 3 sub item 3<br/>

Another way the mapping might be performed is like the following.

<card id="main"> List header<br/> <do type="options" label="View">

<option value="card1">List item 1</option>

1 <a title="Go"><go href="url1"/>List item 1 sub item 1</a><br/>

2 List item 1 sub item 2<br/> </p>

</card>

1 List item 3 sub item 1<br/>

2 List item 3 sub item 2<br/>

3 List item 3 sub item 3<br/> </p>

</card>

<UL> <LH>List header <LI>List item 1 <UL> <LI><A HREF="url1">List item 1 sub item 1</A> <LI>List item 1 sub item 2 </UL> <LI>List item 2 <LI>List Item 3 <UL> <LI>List item 3 sub item 1 <LI>List item 3 sub item 2 <LI>List item 3 sub item 3 </UL> </UL> should become

List header<br/>

    <img localsrc="bigcircle1"/><a title="Go"><go href="url1"/>List item 1 sub item 1</a><br/>

Another way is to use multiple cards like in the second example from OL.

The translation is like that one of UL.

The definition lists provide a format like a dictionary entry, with an identifiable term and an indented definition paragraph. A possible mapping is to put the term in a special emphasis, such as bold font, with the definition in normal font below the term. The COMPACT attribute of DL must be ignored.

<DL>

<DT>Term 1

<DD>Definition of term 1

<DT>Term 2

<DD> Definition of term 2 </DL> might become

<b>Term 1</b><br/> Definition of term 1 <b>Term 2</b><br/> Definition of term 2
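The definition-list mapping can be sketched with the Python standard library `html.parser` module, which tolerates the unclosed DT and DD tags of the example. The class and function names are hypothetical, and appending `<br/>` after each definition is an assumption made here so that consecutive entries stay on separate lines.

```python
from html import escape
from html.parser import HTMLParser


class DefinitionListConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self.current = None  # "dt", "dd" or None

    def handle_starttag(self, tag, attrs):
        # Attributes such as COMPACT are ignored, as the text requires.
        if tag in ("dt", "dd"):
            self.current = tag

    def handle_endtag(self, tag):
        if tag == "dl":
            self.current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current is None:
            return
        if self.current == "dt":
            self.parts.append(f"<b>{escape(text)}</b><br/>")   # term in bold
        else:
            self.parts.append(f"{escape(text)}<br/>")          # plain definition


def dl_to_wml(html):
    parser = DefinitionListConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


wml_dl = dl_to_wml(
    "<DL COMPACT><DT>Term 1<DD>Definition of term 1"
    "<DT>Term 2<DD> Definition of term 2 </DL>"
)
```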

<FORM METHOD="POST" ACTION="url"> <INPUT TYPE="text" NAME="variable"> <INPUT TYPE="submit" VALUE="value"> </FORM> becomes

</go> </do> <input name="variable"/>

In HTML, the post method sends the values of all input fields from the form, separately from the page, to the URL specified with ACTION. In WML, the field values to send are specified with postfield. The URL to which the information must be sent is specified with go href. The submit button is replaced by the accept button.

TEXT type

text <INPUT NAME="variable" TYPE="TEXT" VALUE="value" SIZE="15" MAXLENGTH="30"> becomes text <input name="variable" type="text" value="value" size="15" maxlength="30">

PASSWORD type

Like the TEXT type but with "password" instead of "text".

CHECKBOX type

<INPUT TYPE="CHECKBOX" NAME="variable1" VALUE="value1">text1

<INPUT TYPE="CHECKBOX" NAME="variable2" VALUE="value2" CHECKED >text2

<INPUT TYPE="CHECKBOX" NAME="variable3" VALUE="value3">text3 could become

If no checkbox is checked in HTML then there is no value in the select statement in WML.

RADIO type

<INPUT TYPE="RADIO" NAME="variable" VALUE="value1">text1 <INPUT TYPE="RADIO" NAME="variable" VALUE="value2" CHECKED >text2 <INPUT TYPE="RADIO" NAME="variable" VALUE="value3">text3 could become

<select multiple="false" name="variable" value="value2">

<option value="value1">text1</option>

If no radio button is checked in HTML then there is no value in the select statement in WML.

RESET type

<INPUT TYPE="reset"> becomes

<refresh> use <setvar> to set all variables to its original value

</refresh> </do>

<INPUT TYPE="reset" VALUE="value"> becomes

<refresh> use <setvar> to set all variables to its original value

</refresh> </do>

SUBMIT type

<INPUT TYPE="SUBMIT"> becomes <do type="ACCEPT" label="Submit"> or

<INPUT TYPE="SUBMIT" VALUE="Text on the button"> becomes

Single choice

<SELECT NAME="variable"> <OPTION VALUE=value1>Text option 1 <OPTION VALUE=value2>Text option 2 <OPTION VALUE=value3>Text option 3

</SELECT> becomes

<select multiple="false" name="variable"> <option value="value1">Text option 1</option> <option value="value2">Text option 2</option> <option value="value3">Text option 3</option>

</select>
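The SELECT mapping can be sketched with the Python standard library `html.parser` module, which copes with the unquoted attribute values and unclosed OPTION tags of the example. The names `SelectConverter` and `select_to_wml` are hypothetical. Joining several SELECTED values with a semicolon in the `value` attribute is an assumption of this sketch rather than something stated in the text.

```python
from html import escape
from html.parser import HTMLParser


class SelectConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.name = ""
        self.multiple = False
        self.options = []    # [value, text] pairs in document order
        self.selected = []   # values flagged SELECTED
        self.in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.name = attrs.get("name") or ""
            self.multiple = "multiple" in attrs
        elif tag == "option":
            value = attrs.get("value") or ""
            self.options.append([value, ""])
            self.in_option = True
            if "selected" in attrs:
                self.selected.append(value)

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self.in_option = False

    def handle_data(self, data):
        if self.in_option:
            self.options[-1][1] += data

    def to_wml(self):
        multiple = "true" if self.multiple else "false"
        head = f'<select multiple="{multiple}" name="{escape(self.name)}"'
        if self.selected:
            head += f' value="{escape(";".join(self.selected))}"'
        body = "".join(
            f'<option value="{escape(value)}">{escape(text.strip())}</option>'
            for value, text in self.options
        )
        return head + ">" + body + "</select>"


def select_to_wml(html):
    parser = SelectConverter()
    parser.feed(html)
    parser.close()
    return parser.to_wml()


single = select_to_wml(
    '<SELECT NAME="variable"><OPTION VALUE=value1>Text option 1'
    "<OPTION VALUE=value2>Text option 2</SELECT>"
)
```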

Multiple choices

<SELECT MULTIPLE NAME="variable" SIZE="n"> <OPTION SELECTED VALUE=value1>Text option 1 <OPTION VALUE=value2>Text option 2 <OPTION VALUE=value3>Text option 3

</SELECT> becomes

<select multiple="true" name="variable" value="value1"> <option value="value1">Text option 1</option> <option value="value2">Text option 2</option> <option value="value3">Text option 3</option>

</select> or

<select multiple="true" name="variable" ivalue="1"> <option value="value1">Text option 1</option> <option value="value2">Text option 2</option> <option value="value3">Text option 3</option>

</select>

<CAPTION ALIGN=BOTTOM>Title of the table</CAPTION>

<TR>

<TH ALIGN=MIDDLE>Header 1</TH><TH ALIGN=MIDDLE>Header 2</TH>

</TR>

<TR>

</TR> </TABLE> becomes

<td><b>Header 1</b></td> <td><b>Header 2</b></td> </tr> <tr> <td>Data 1</td> <td>Data 2</td> </tr> </table>

Not supported by WML browser.

Applets

Not supported by WML browser.

HTML | WML

<HEAD> content </HEAD> | <head> content </head>

<I> text </I> | <i> text </i>

text <INPUT NAME="variable" TYPE="TEXT" VALUE="value" SIZE="15" MAXLENGTH="30"> | text <input name="variable" type="text" value="value" size="15" maxlength="30">

<INPUT TYPE="reset" VALUE="value"> | <refresh> use <setvar> to set all variables to its original value </refresh> </do>

<IMG ALT="text" SRC="url" WIDTH="100" HEIGHT="100" ALIGN=ABSMIDDLE VSPACE=4 HSPACE=7> | <img alt="text" src="url" width="100" height="100" align=middle vspace=4 hspace=7>

<META HTTP-EQUIV="name" CONTENT="value"> | <meta http-equiv="name" content="value">

<SELECT NAME="variable"> <OPTION VALUE=value1>Text option 1 <OPTION VALUE=value2>Text option 2 <OPTION VALUE=value3>Text option 3 </SELECT> | <select multiple="false" name="variable"> <option value="value1">Text option 1</option> <option value="value2">Text option 2</option> <option value="value3">Text option 3</option> </select>

<SELECT MULTIPLE NAME="variable" SIZE="n"> <OPTION SELECTED VALUE=value1>Text option 1 <OPTION VALUE=value2>Text option 2 <OPTION VALUE=value3>Text option 3 </SELECT> | <select multiple="true" name="variable" value="value1"> or <select multiple="true" name="variable" ivalue="1"> <option value="value1">Text option 1</option> <option value="value2">Text option 2</option> <option value="value3">Text option 3</option> </select>

<SMALL> text </SMALL> | <small> text </small>

<STRONG> text </STRONG> | <big> text </big>

<CAPTION ALIGN=BOTTOM>Title of the table</CAPTION> <TR> <TH ALIGN=MIDDLE>Header 1</TH><TH ALIGN=MIDDLE>Header 2</TH> </TR> </TR> </TABLE> | <tr> <td><b>Header 1</b></td> <td><b>Header 2</b></td> </tr> <tr> <td>Data 1</td>

Claims

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. A system to facilitate navigation of the Internet from a wireless device comprising:

a user requesting a resource of interest from an origin server (OS) in the Internet by inputting a URL request through a user agent which runs on the user's wireless device;

sending the request to a proxy server;

interpreting the request by said proxy server and formulating a request to the OS that contains the requested resource;

an arbitrator agent at said OS receiving the request and sending the requested resource, which exists within a file at the OS, in its original form to an intermediate conversion server;

said intermediate conversion server performing any necessary conversion of said file and sending said converted file back to said OS;

said OS receiving the resultant converted file and sending said file to the proxy server; and,

said proxy server returning said converted file to the user agent on the user's wireless device for suitable browsing.

A system to facilitate navigation of the Internet from a wireless device comprising:

sending the request to a proxy server;

interpreting the request by said proxy server and formulating a request to an intermediate conversion server;

said intermediate conversion server receiving the request and sending it to the OS that contains the requested resource;

said OS receiving the request and retrieving the requested resource, which exists within a file at the OS, as specified by the user's URL and returning said file in its original format to said intermediate conversion server;

said intermediate conversion server performing any necessary conversion of said file and sending said converted file back to the proxy server; and,

said proxy server returning said converted file to the user agent on the user's wireless device for suitable browsing.

sending the request to the OS that contains the requested resource;

said intermediate conversion server performing any necessary conversion of said file and sending said converted file back to said OS; and,

said OS receiving the resultant converted file and returning said converted file to the user agent on the user's wireless device for suitable browsing.

A system as in claim 1, 2 or 3, wherein said user agent is a browser.

A system as in claim 4, wherein said browser is selected from a group of user agents, including HTML browser, cHTML browser, xHTML browser and XML browser.

A system as in claim 1, 2 or 3, wherein said request is a HTTP request.

A system as in claim 1, 2 or 3, wherein said conversion is from a HTML format to a WML format.

A system as in claim 1 or 2, wherein said proxy server returning an equivalent compiled byte-code representation.

A system as in claim 8, wherein said compiled byte-code representation is a compiled WML.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server performing context management function.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server performing mark-up language conversion function.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server performing URL rewriting.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server performing frame merging.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server performing context management, mark-up language conversion functions, URL rewriting and frame merging.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server is implemented as a distributed system.

A system as in claim 1, 2 or 3, wherein a third party system is used to derive related information.

A system as in claim 10, wherein said context management function is achieved by looking ahead in the requested document for information that may be useful to said user in the future.

A system as in claim 10, wherein said context management function is achieved by identifying primary components of a requested document.

A system as in claim 17, wherein said looking ahead comprises extracting logically related information by categorization of embedded hyperlinks in said requested resource based on their directory structure.

A system as in claim 19, wherein said directory structure is a file system directory structure.

A system as in claim 19, wherein said directory structure is a tree structure.

A system as in claim 19, wherein said categorization may exclude a given URL component on the basis of a set of rules.

A system as in claim 19, wherein said categorization may result in the modification of the description associated with the URL component on the basis of a set of rules.

A system as in claim 22 or 23, wherein said set of rules are customizable and are stored in a file.

A system as in claim 19, wherein said hyperlinks in a category are presented as a list to the user.

A system as in claim 25, wherein the order in which the items in said list appear is determined by a sorting metric.

A system as in claim 26, wherein said sorting metric is selectable by said user.

A system as in claim 26, wherein said sorting metric in itself can be a combination of simpler sorting metrics.

A system as in claim 26, wherein said sorting metric is "Most Recently Used" or "Most Frequently Used".

A system as in claim 26, wherein said ordering is achieved with the help of a database of previously selected URL's that is maintained by said system.

A system as in claim 25, wherein the number of said hyperlinks in a category may be limited by the use of a heuristic.

A system as in claim 31, wherein said heuristic determines the first N hyperlinks in accordance with the user provided sorting metric and presents said hyperlinks.

A system as in claim 31, wherein said remaining hyperlinks are presented either alphabetically or in the order in which the embedded URL's appear in said requested resource.

A system as in claim 10, wherein said context management function is achieved by determining information related to the current network document and presenting it to the user.

A system as in claim 34, wherein said related information set is obtained by the system performing a "depth-first" search in the URL tree data structures.

A system as in claim 10, wherein said context management function is achieved by determining the presence of redundant information wherein said system identifies information that appears around and/or as part of an anchor tag and also re-appears in the network document referenced by the associated embedded URL.

A system as in claim 10, wherein said context management function is achieved by context merging.

A system as in claim 37, wherein said context merging comprises combining information that appears around and/or as part of an anchor tag as well as in the network document referenced by the associated embedded URL into a single hyperlink.

A system as in claim 37, wherein said context merging comprises grouping logically related network documents together.

A system as in claim 39, wherein said merging function produces a single selectable hyperlink for the group.

A system as in claim 10, wherein said context management function is achieved by removal of redundant information through combining information that appears around and/or as part of an anchor tag as well as in the network document referenced by the associated embedded URL into a single hyperlink.

A system as in claim 41, wherein said system discards embedded URL's that have already been displayed.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server runs on a commercial-off-the-shelf processor.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server runs on a CISC or RISC processor.

A system as in claim 43, wherein said processor is an AMD Athlon 800 MHz CPU.

A system as in claim 43, wherein said processor communicates with the other system components through a bus.

A system as in claim 43, wherein said processor uses a 133 MHz bus on a A7V motherboard.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server runs on top of any operating system that supports a Java platform or provides run time support for native code produced by a Java compiler.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server uses a Java Virtual Machine or the native code produced by a Java compiler.

A system as in claim 1, 2 or 3, wherein said intermediate conversion server uses a JDBC compliant database.