WO2001050298A2 - Coding and transmission of multiple web pages - Google Patents

Coding and transmission of multiple web pages

Publication number
WO2001050298A2
Authority
WO
WIPO (PCT)
Prior art keywords
pages
file
page
web
combined
Application number
PCT/IL2000/000721
Other languages
French (fr)
Other versions
WO2001050298A3 (en
Inventor
Alex Keselman
Israel Nov
Original Assignee
Weblink, Ltd.
Application filed by Weblink, Ltd.
Priority to AU11739/01A (AU1173901A)
Publication of WO2001050298A2
Publication of WO2001050298A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/567 Integrating service provisioning from a plurality of service providers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/2895 Intermediate processing functionally located close to the data provider application, e.g. reverse proxies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching

Definitions

  • the present invention relates to communication networks and in particular to transmission of Web pages.
  • Web servers are commonly used to provide users with information. Generally, the information is provided in the form of Web pages. Some Web pages are transmitted in the form of a single HTTP (Hypertext transfer protocol) file. Other Web pages, for example Web pages that include images or other embedded elements, are transmitted as an HTTP file which includes an HTML page and a plurality of additional files (referred to also as objects) referenced by the HTML page.
  • When a client requests such Web pages from a server, in a first stage the HTTP file is transmitted to the client.
  • When the client opens the HTML page, it finds the references to the additional objects and sends additional HTTP requests to the server to receive these objects.
  • Many Web sites include a plurality of Web pages that are interconnected using hyper-text links. That is, Web pages usually include areas which when
  • Web pages that include images typically require large amounts of bandwidth. When clients are connected through low-bandwidth links, as most users are, such Web pages require relatively long transmission times, which annoy users.
  • each HTTP request message is transmitted on a separate TCP connection to the server.
  • the server sends the HTTP response message on the TCP connection on which the request message was received and then closes the TCP connection.
  • A newer HTTP version, i.e., HTTPv1.1, optionally uses the same TCP connection for all the HTTP messages transmitted between the client and the server.
  • Such connections are referred to as persistent HTTP connections.
  • a single TCP connection may thus carry a stream of HTTP request messages from the client to the server. The time required for establishing the TCP connections is reduced using this scheme.
  • Many clients and/or Web servers, for example due to load balancing limitations, do not support persistent HTTP connections.
  • the establishment of TCP connections involves transmission of three packets in a hand shake procedure. The establishment of a connection for each retrieved Web page and each embedded element is therefore time consuming.
  • An aspect of some embodiments of the present invention relates to a method of storing and/or providing information, in which a plurality of Web pages and/or web page elements are transmitted as a single combined file, for example as an HTTP file.
  • the HTTP file comprises the HTML description of a master page, in a regular HTML format recognized by existing clients, and one or more additional pages in a compressed HTML format.
  • at least some of the hypertext links in the master HTML page, and optionally in the rest of the pages are replaced, in some embodiments of the invention, by scripts, e.g., Java scripts, which when clicked upon display the respective page from the data stored in the HTTP file.
  • all the transmitted Web pages are stored in the combined file in a compressed format.
  • a dummy master page is stored together with the compressed pages in the file.
  • the dummy master page includes a Java script that initiates the display of one of the pages upon reception of the file by the client.
  • files that are repeated between pages are provided only once in a compressed file that includes the pages.
  • the Web pages to be included in a single combined file are selected responsive to an inter-link map of a compressed Web site (or any other group of Web pages being compressed).
  • the Web pages included in a single combined file are determined responsive to statistics of the usage of the compressed pages and/or of the hypertext links connecting the pages.
  • the Web pages included in a single combined file are determined responsive to a user profile of the user receiving the Web pages and/or of the bandwidth of the connection between the client and the server.
  • the combined files of a Web site are prepared during and/or after the preparation of the Web site and the Web site is posted on a server as the combined file.
  • the compression is performed on the fly responsive to the download requests from a client.
  • a method of providing information comprising: providing a plurality of files including descriptions of a plurality of respective Web pages; selecting at least a sub-group of the Web pages; creating a combined file which includes descriptions of the Web pages in the subgroup; and transmitting the combined file to a client responsive to a request for one of the selected Web pages received from the client.
  • providing the plurality of files comprises providing files describing Web pages included in a single Web site.
  • providing the plurality of files comprises providing at least links to files that are the results of a search.
  • selecting the sub-group comprises selecting responsive to a map of the interconnections of the plurality of Web pages.
  • selecting the sub-group comprises selecting responsive to statistics of the usage of the plurality of Web pages. Alternatively or additionally, selecting the sub-group comprises selecting responsive to a user profile. Alternatively or additionally, selecting the sub-group comprises selecting responsive to a bandwidth of a link on which the file is transmitted. Alternatively or additionally, creating the combined file comprises creating a combined file in which at least some of the descriptions of the Web pages are compressed. Alternatively or additionally, creating the combined file comprises replacing at least one of the hypertext links in one or more of the selected pages with a script which actuates the display of the page referenced by the link.
  • replacing at least one of the hypertext links comprises replacing links that lead to one of the selected pages with a script which actuates the display of the page referenced by the link from within the combined file.
  • creating the combined file comprises replacing links to pages included in one or more other combined files with links which indicate the location of the referenced page within the other combined file.
  • creating the combined file is performed responsive to receiving the request from the client.
  • creating the combined file is performed independently of said request.
  • creating the combined file comprises detecting repeated embedded objects between the plurality of pages.
  • creating the combined file comprises detecting repeated embedded objects between the selected pages.
  • the method comprises providing only one copy of said repeated object in said file.
  • the method comprises providing only one copy of said repeated object as a separate file.
  • said request is an HTTP request.
  • transmission of the combined file instead of a regular file is transparent to a user that generates said request.
  • selecting at least a sub-group comprises selecting fewer than all the plurality of files.
  • the method comprises maintaining a copy of said files on a file server associated with a storage of said combined file.
  • a method of providing information comprising: receiving a request for a Web page, including one or more links to data elements, from a client; and transmitting to the client, in response to said request, a combined file including descriptions of the requested page and one or more of the data elements referenced by the one or more links of the Web page.
  • the method comprises generating the combined file responsive to receiving the request from the client.
  • the combined file is generated before receiving the request from the client.
  • the one or more of the data elements referenced by the links of the Web page comprise at least one additional Web page.
  • the one or more of the data elements referenced by the links of the Web page comprise embedded objects.
  • apparatus for web page serving comprising: a compression unit that provides at least one combined file including the description of a plurality of WWW pages; and a web server that receives requests for WWW pages and responds with at least one of said combined files.
  • said compression unit generates said file responsive to said request.
  • said compression unit generates said file to be personalized for a particular user.
  • said compression unit generates said file responsive to a request by a WWW site manager.
  • said compression unit maintains a copy of uncompressed versions of said WWW pages.
  • said compression unit comprises a grouper that selectively groups pages together based, at least, on their link structure.
  • said compression unit comprises a redundancy detector that detects embedded elements repeated between said pages.
  • said compression unit is integrated with a WWW site construction program.
  • said compression unit is integrated with a WWW site maintaining program.
  • FIG. 1 is a schematic block diagram of a Web site preparation system, in accordance with an exemplary embodiment of the present invention
  • Fig. 2 is a flowchart of the acts of a compression unit in compressing a Web site, in accordance with an exemplary embodiment of the present invention
  • Fig. 3 is a schematic block diagram of a structure of a file including a plurality of web pages stored as a single page, in accordance with an embodiment of the present invention
  • Fig. 4 is a flowchart of the acts performed in downloading pages of a Web site, in accordance with an exemplary embodiment of the present invention.
  • Fig. 5 is a schematic illustration of an exemplary simplified Web site and optional file organizations therefor, in accordance with an embodiment of the present invention.
  • Fig. 1 is a schematic block diagram of a Web site preparation system 20, in accordance with an embodiment of the present invention.
  • a Web site preparation computer 22 optionally a general purpose computer with suitable software, is used to generate a Web site.
  • the Web site is then posted on a Web server 24 that provides the Web pages to clients.
  • Web pages can be prepared directly on Web server 24.
  • compression unit 26 is situated between preparation computer 22 and Web server 24.
  • compression unit 26 compresses the pages as described below and passes the Web site in its compressed form to Web server 24.
  • compression unit 26 decompresses the Web site files and passes the decompressed files to computer 22.
  • compression unit 26 keeps an uncompressed copy of the Web site, which is provided to the Web master when changes are to be made.
  • the Web master can perform changes in the compressed Web site without having knowledge of the compression tools and/or structures of the Web site.
  • the Web master is not aware of the compression performed by compression unit 26.
  • the Web master uses the compressed Web site for performing further changes to the Web site.
  • compression unit 26 stores a template of the compression and/or some of the parameters used in the compression, such that a subsequent compression after changes are performed, uses at least part of the results of previously performed compressions.
  • the template includes a map of the compressed Web site and/or other determined parameters of the Web site, as described below.
  • Alternatively or additionally to having compression unit 26 compress the Web site when it is passed from preparation computer 22 to Web server 24, compression unit 26 periodically and/or upon commands from the Web master compresses the Web site on Web server 24 and/or on preparation computer 22.
  • compression unit 26 uses statistics gathered by Web server 24 regarding the access to the different pages of the Web site, in performing the compression. In some embodiments of the invention, the compression is customized periodically according to the current statistics.
  • compression unit 26 is located between Web server 24 and a client 28. When a user enters the Web site, compression unit 26 compresses the Web site and/or portions thereof on the fly, optionally according to a user profile of the client.
  • compression unit 26 and Web server 24 are located on separate processors.
  • compression unit 26 is software located on Web server 24.
  • compression unit 26 is a plug-in unit, such as an MS Internet Information Server Filter, which cooperates with Web server 24.
  • it may be provided in other manners, for example, as a proxy or as a stand-alone service.
  • client 28 comprises a standard Web browser and does not require any special software in order to carry out the invention.
  • client 28 may be optimized for use with client 28 so as to enhance the advantages of the present invention.
  • client 28 may comprise a browser including software to decompress page files (as described below).
  • Fig. 2 is a flowchart of the acts of compression unit 26 in compressing a Web site, in accordance with an embodiment of the present invention.
  • compression unit 26 creates (50) a map of the Web site.
  • the map includes indication of the Web pages of the Web site and the hypertext links which lead between the Web pages.
  • the map also indicates for each page the embedded elements of the page (e.g., images) which are included in separate files, and the pages which refer to each of the embedded elements.
  • the map is possibly created using any method, such as applying the BFS or DFS graph traversing methods to the "graph" defined by the hyperlinked files.
  • the map and/or portions thereof are imported by compression unit 26 from an external hardware or software unit.
  • the maps are created by the preparation computer 22, for example, by the site preparation software.
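The map creation step can be illustrated with a minimal breadth-first traversal. This is only a sketch under stated assumptions: the function name `build_site_map`, the `links_of` callback and the toy `SITE` data are hypothetical and do not come from the patent text, which leaves the mapping method open.

```python
from collections import deque

def build_site_map(start_page, links_of):
    """Breadth-first traversal of the hyperlink graph of a site.

    links_of is a hypothetical callback returning, for a page,
    a pair (hypertext link targets, embedded element names).
    """
    page_links = {}      # page -> pages it links to
    element_users = {}   # embedded element -> pages that refer to it
    queue = deque([start_page])
    seen = {start_page}
    while queue:
        page = queue.popleft()
        links, elements = links_of(page)
        page_links[page] = list(links)
        for element in elements:
            element_users.setdefault(element, set()).add(page)
        for target in links:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return page_links, element_users

# Toy site used only for illustration.
SITE = {
    "index.html": (["a.html", "b.html"], ["logo.gif"]),
    "a.html": (["index.html"], ["logo.gif", "photo.jpg"]),
    "b.html": ([], []),
}
page_links, element_users = build_site_map("index.html", SITE.__getitem__)
```

The second returned mapping records, for each embedded element, the pages that refer to it, which is the information the later duplicate-detection and grouping steps rely on.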
  • compression unit 26 finds (52) duplicate embedded elements and/or duplicated pages.
  • compression unit 26 reviews the embedded elements of the Web site, while preparing the map or separately, before or after, and prepares a short catalog which lists a few parameters (e.g., length, type and/or leading bits) of each of the embedded elements. Thereafter, elements which have identical parameter values are compared (e.g., bit by bit) to determine whether they are identical.
  • elements which are determined to be similar but not identical are each split into two separate portion elements: one portion element which is identical in all the similar elements, and one portion element, for each of the similar elements, which contains the non-identical portions.
  • a user alert may be generated in offline embodiments, to allow a user (e.g., site manager) to consolidate two files.
  • an after-the-fact alert may be provided to a user, for example, in on-the-fly compression systems.
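The two-stage duplicate search described above (a short catalog of cheap parameters, followed by a full comparison of candidates) might be sketched as follows. The names `catalog_key` and `find_duplicates`, the 16-byte prefix length and the sample data are assumptions made for illustration only.

```python
from collections import defaultdict

def catalog_key(data, kind, prefix_len=16):
    """Cheap catalog entry: length, type and leading bytes of an element."""
    return (len(data), kind, data[:prefix_len])

def find_duplicates(elements):
    """elements maps a name to (type, content bytes).

    Returns lists of names whose contents are byte-for-byte identical.
    """
    buckets = defaultdict(list)
    for name, (kind, data) in elements.items():
        buckets[catalog_key(data, kind)].append(name)

    groups = []
    for names in buckets.values():
        # Only elements with identical catalog entries are compared in full.
        classes = []
        for name in names:
            for cls in classes:
                if elements[cls[0]][1] == elements[name][1]:
                    cls.append(name)
                    break
            else:
                classes.append([name])
        groups.extend(cls for cls in classes if len(cls) > 1)
    return groups

elements = {
    "a.gif": ("gif", b"x" * 100),
    "b.gif": ("gif", b"x" * 100),
    "c.gif": ("gif", b"x" * 99 + b"y"),  # same catalog entry, different content
}
duplicates = find_duplicates(elements)
```

In this toy data `c.gif` shares its length, type and leading bytes with the other two files, so it reaches the full comparison stage but is not reported as a duplicate.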
  • compression unit 26 receives (54) statistical information on the visiting patterns of the Web site.
  • the statistical information includes, for example, the number of visits in each page of the Web site, the frequencies of usage of the hypertext links of each of the pages and/or the frequencies of entrance to each of the pages from external and/or internal links.
  • compression unit 26 groups (56) the Web pages into one or more page groups which are included in a single combined file.
  • the grouping may depend, for example, on one or more of statistical considerations of intra- and inter-group links and/or link following rates, on relative file sizes of each group, on a desire for certain pages to come up faster (e.g., smaller page files or different grouping) and/or on a sharing of embedded elements between pages.
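The text does not prescribe a grouping algorithm. One possible reading, shown here purely as a sketch, is a greedy merge that joins the pages connected by the most frequently followed links while keeping each group under a size limit. The function name `group_pages`, its parameters and the sample numbers are all hypothetical.

```python
def group_pages(page_sizes, link_counts, max_group_size):
    """Greedy grouping sketch.

    page_sizes: page -> size in arbitrary units.
    link_counts: (source page, target page) -> how often the link was followed.
    max_group_size: upper bound on the total size of a group.
    """
    group_of = {page: page for page in page_sizes}   # page -> group representative
    members = {page: [page] for page in page_sizes}  # representative -> pages
    size = dict(page_sizes)                          # representative -> total size

    # Visit links from most to least frequently followed.
    for (src, dst), _count in sorted(link_counts.items(), key=lambda kv: -kv[1]):
        a, b = group_of[src], group_of[dst]
        if a == b or size[a] + size[b] > max_group_size:
            continue
        for page in members[b]:
            group_of[page] = a
        members[a].extend(members.pop(b))
        size[a] += size.pop(b)
    return sorted(sorted(m) for m in members.values())

groups = group_pages(
    page_sizes={"1": 10, "2": 10, "3": 10, "4": 10},
    link_counts={("1", "2"): 50, ("2", "3"): 40, ("3", "4"): 5},
    max_group_size=25,
)
```

A real grouper would also have to weigh the other factors listed above, such as shared embedded elements and pages that should come up faster; this sketch covers only link statistics and file size.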
  • Each page group is converted (58) into a single combined HTML page.
  • the HTML descriptions of the pages in each group are stored together in a single respective combined file.
  • Hypertext links leading from one page of the group to another page of the group are converted into Java scripts which actuate the display of the other page.
  • the display of the page does not change due to the replacement of the hypertext links by Java scripts.
  • hypertext links which lead to other combined pages of the Web site are converted (60) into suitable links in accordance with the combining of the pages.
  • the hypertext link of a page is converted into a link that states the URL of the combined page with a parameter which states the position of the page in the combined page.
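The two link conversions can be sketched with a simple rewrite pass over the HTML. The script name `showPage` and the `?page=` URL parameter are hypothetical placeholders; the patent text does not name the script or the parameter format.

```python
import re

HREF = re.compile(r'href="([^"]+)"')

def rewrite_links(html, own_combined_url, location_of):
    """Rewrite hypertext links of a page that is placed in a combined file.

    location_of maps an original page URL to (combined file URL, position).
    Links to pages outside the site are left unchanged.
    """
    def replace(match):
        url = match.group(1)
        if url not in location_of:
            return match.group(0)
        combined_url, index = location_of[url]
        if combined_url == own_combined_url:
            # Same combined file: a script displays the page locally.
            return f'href="javascript:showPage({index})"'
        # Other combined file: URL plus a parameter giving the position.
        return f'href="{combined_url}?page={index}"'
    return HREF.sub(replace, html)

location_of = {
    "a.html": ("group1.html", 0),
    "b.html": ("group1.html", 1),
    "c.html": ("group2.html", 0),
}
html = '<a href="b.html">B</a> <a href="c.html">C</a> <a href="http://example.com/">X</a>'
rewritten = rewrite_links(html, "group1.html", location_of)
```

A regular expression is used only to keep the sketch short; a production tool would more likely use an HTML parser.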
  • compression unit 26 determines whether the embedded elements are in a compressed form and, if necessary, compresses (62) or re-compresses the elements.
  • different compression ratios and/or compression methods are used for different embedded elements and/or in different compression instances.
  • the compression ratio may be adjusted according to user preferences (i.e., whether higher quality or faster service is desired), the bandwidth of the client's connection and/or the size of a specific combined file.
  • Fig. 3 is a schematic block diagram of the structure of a combined file 70, in accordance with an embodiment of the present invention.
  • combined file 70 comprises a master page record 72 that is automatically displayed by the client when the packet is downloaded.
  • combined file 70 comprises one or more slave page records 74 that describe additional pages, which are not generally displayed immediately when combined file 70 is downloaded, but rather responsive to actuation of a Java script in one of the other pages included in combined file 70.
  • master page record 72 comprises an open HTML description of one of the original pages of the group.
  • master page record 72 describes a pseudo page which comprises an automatically opening script which initiates the display of one of the pages as described in a respective slave page record 74.
  • a different page file may be sent, in which the slave page is a master.
  • the opening script may connect to web server 24 or compression unit 26 to receive an indication of which page to show first.
  • a separate file including such an indication is sent to the client.
  • slave page records 74 comprise compressed HTML page descriptions. Alternatively, at least some of the pages are uncompressed.
  • the slave page records are compressed using any suitable compression method, for example the LZ (Lempel Ziv) method and/or the WLZ (Walsh, Lempel, Ziv) method.
  • slave page records 74 comprise standard non-compressed HTML page descriptions, for example Java script for displaying the pages.
  • combined file 70 comprises embedded element records 76 which contain descriptions of the embedded elements of the pages represented by combined file 70.
  • Hypertext links to the embedded elements are optionally converted to Java scripts that, conditionally or unconditionally, initiate the display of the contents of respective embedded element records 76, upon displaying the page.
  • embedded elements that are included in a plurality of the pages of the group are stored in only a single embedded element record 76, which may be actuated from a plurality of different pages.
  • some of the embedded elements are stored in separate files and are downloaded responsive to a hypertext link, as is known in the art.
  • the decompressed, shared, embedded objects are stored in a local cache of the browser and/or operating system
  • embedded elements which are included in a plurality of pages which are stored in different combined files 70 are repeated in each of the combined files 70.
  • at least some of the embedded elements are stored only in one of the combined files 70 and when they are required for Web pages in other combined files 70, the combined file containing the embedded element is downloaded.
  • embedded elements which appear only in pages of a single file 70 are stored within the file, while embedded elements which are included in pages of a plurality of files 70 are stored in separate files.
  • the contents of combined file 70 are arranged such that the master page 72 may be displayed, partially or in its entirety, by the client, before all of combined file 70 is received.
  • master page 72 with the page record 74 of the page automatically displayed and the embedded element records referenced by the automatically displayed page are located at the top (i.e., the first transmitted area) of combined file 70.
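The layout of combined file 70 (master page record first, then slave page records, then embedded element records) can be illustrated with a small container. This is a sketch under several assumptions: the JSON container, the field names and the function names are hypothetical, and Python's `zlib` (DEFLATE, which is LZ77-based) stands in for the LZ compression mentioned in the text.

```python
import base64
import json
import zlib

def build_combined_file(master_html, slave_pages, embedded_elements):
    """Pack a master page, compressed slave pages and embedded elements.

    The master record is written first so that it appears at the top
    (the first transmitted area) of the combined file.
    """
    record = {
        "master": master_html,  # stored uncompressed, displayed immediately
        "slaves": {
            url: base64.b64encode(zlib.compress(html.encode("utf-8"))).decode("ascii")
            for url, html in slave_pages.items()
        },
        "elements": {
            name: base64.b64encode(data).decode("ascii")
            for name, data in embedded_elements.items()
        },
    }
    return json.dumps(record)

def read_slave_page(combined, url):
    """Decompress one slave page record only when it is needed."""
    record = json.loads(combined)
    return zlib.decompress(base64.b64decode(record["slaves"][url])).decode("utf-8")

combined = build_combined_file(
    "<html>page 1</html>",
    {"p2.html": "<html>page 2</html>"},
    {"logo.gif": b"GIF89a"},
)
page2 = read_slave_page(combined, "p2.html")
```

In the patent the file is itself an HTML page whose scripts perform the decompression in the browser; the JSON container here only makes the record ordering and the on-demand decompression easy to see.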
  • Fig. 4 is a flowchart of the acts performed in downloading pages of a Web site, in accordance with an exemplary embodiment of the present invention.
  • Client 28 transmits (80) to Web server 24 a request to view a page of the Web site.
  • Web server 24 responds by transmitting (82) a combined file 70 that includes the requested file as the automatically displayed page.
  • each of the pages of the Web site is generally included in a single combined file 70.
  • Web server 24 optionally adjusts the combined file (if necessary) before it is transmitted, so that the automatically displayed page of the file is the requested page.
  • Web server 24 carries a few versions of at least some of the combined files 70, which versions differ in the page automatically displayed when the file is downloaded. Possibly, web server 24 determines which version to transmit (82) according to the requested page, for example, by the request address mapping to a suitable stored file version.
  • some pages of the Web site are not allowed direct access, by clients, without passing through previous pages of the Web site.
  • such pages are included in a combined file 70 in which they are not the automatically displayed page.
  • the respective combined file 70 is optionally downloaded and a different page from the file is automatically displayed.
  • Web server 24 responds with an error message to such requests.
  • Web server 24 can simply prevent direct access to a desired page without passing through an introductory page, for example, a log-in page, which the Web master wants all clients to display before reaching the desired page.
  • a log-in page or other pre-personalization page may be sent as a separate file 70, with group pages only being generated once the personalization of the pages is determined.
  • the personalization may be applied to HTML files, which are then compressed. Alternatively, they may be applied directly to the compressed files, for example by record replacement.
  • When client 28 receives combined file 70, it automatically accesses master page record 72 and accordingly displays (84) one of the pages included in the received combined file 70.
  • the operation of client 28 in opening combined file 70 is exactly as if a regular HTML file is received.
  • the user does not know that combined file 70 is not a regular HTML file.
  • the displayed page typically includes one or more controls which actuate Java scripts. These controls operate from the point of view of a user of client 28 in substantially the same way as hypertext links.
  • the page typically includes one or more hypertext links that relate to Web pages not included in the downloaded file.
  • the user of client 28 may actuate one of the controls on the displayed page. Responsive thereto, the respective Java script of the control is actuated.
  • the Java control decompresses (86) the contents of the respective slave page record 74, and displays the page, only when needed. Alternatively, some or all of records 74 in a received file 70 are decompressed upon receipt and are stored in a temporary memory in a decompressed form. In some embodiments of the invention, the decompression (86) is performed seamlessly such that the user of client 28 does not notice the decompression.
  • the Java control is a stand alone script which does not require additional commands for operation.
  • the Java control actuates, with one or more specific parameter values, a separate Java script, which is used by substantially all the Java scripts of pages compressed in accordance with the present invention, for example a decompression program.
  • a separate Java script is stored within a browser of the client.
  • the separate Java script is provided as an embedded element.
  • client 28 sends a request accordingly to Web server 24 (or a different, unrelated web server, for links outside the site) that responds by transmitting the desired page (in a regular HTML page or in another combined file 70).
  • the browser finds the referenced combined file 70 in its cache and a parameter in the link leads to the specific desired page within the combined file (e.g., the link in the additional page may be adapted to match the previous combined file 70).
  • client 28 sends a request with the URL of the requested page to Web server 24.
  • Web server 24 responds with a short HTML file which includes a Java script that accesses the slave page record 74 of the requested page in the combined file 70. Further alternatively, Web server 24 responds by re-transmitting the requested page, either by itself or with a combined file in which the requested page is the automatically displayed page. Further alternatively, each time a combined file 70 is received by client 28, a Java script extracts the pages in slave page records 74 into a cache of client 28, as if the pages were received on their own as regular HTML files. Each page is stored in the cache with its URL, such that when a request for the page is generated while the page is still in the cache, the page is found in the cache and displayed therefrom.
  • Web server 24 hosts, for at least some of the Web pages of the site, a plurality of combined files 70 that include the Web page.
  • the plurality of combined files include the Web page with different other pages and/or with different compression styles and/or ratios.
  • client 28 transmits (80) to Web server 24 a request to view the Web page
  • Web server 24 chooses one of the plurality of files containing the page to be transmitted (82) to client 28, responsive to which combined files the user previously downloaded and/or responsive to a user profile.
  • the user profile may include, for example, a standard user behavior (e.g., whether the user usually actuates hypertext links at the top or the bottom of the page), topics which interest the user and/or user preferences (e.g., voice files, long articles, images).
  • Web server 24 customizes, on the fly, the combined file 70 in which the requested page is the automatically displayed page, based on the user profile.
  • Fig. 5 is a schematic illustration of an exemplary simplified Web site 90 and two optional file organizations 92 and 94, in accordance with an embodiment of the present invention.
  • Web site 90 comprises Web pages indicated by digits 1-6 and links between the pages are indicated by arrows.
  • a first optional file organization 92 includes two combined files (A and B).
  • each page (1, 2, 3, 4 and 5) has a separate file (C, D, E, F and G), respectively, which is transmitted to the client if the user first enters the Web site from that particular page.
  • the respective pages of the file, listed first in Fig. 5, are the pages which are automatically displayed when the files are downloaded.
  • Each file (C, D, E, F and G) is customized for its respective page such that the pages included in its slave page records 74 are the pages to which the user is most likely to move from the displayed page. For example, in file E page 3 is accompanied by pages 1 and 5, the only pages in the Web site to which page 3 has hypertext links.
  • a user enters the Web site from page 1, and therefore receives file C.
  • From page 1 the user moves to page 4 by actuating the respective Java script which displays page 4 from the contents of file C. No transmission is thus required from Web server 24 for displaying page 4.
  • From page 4 the user moves to page 3, again using a Java script and the contents of file C.
  • From page 3 the user moves to page 5, which is not included in file C (e.g., the link is not encoded as a Java script to display part of the same file, but as a regular HTTP link). Therefore, a request for page 5 is sent to Web server 24, which responds by transmitting file G to the client.
  • page 4 is re-transmitted in file G although it was already transmitted in file C.
  • server 24 determines whether one or more of the pages and/or embedded elements in the file were recently transmitted to the client.
  • Such pages and/or embedded elements are optionally replaced in the file by short Java scripts which refer to the pages and/or embedded elements in previously transmitted files.
  • the description of page 4 is replaced in file G by a Java script which if actuated displays page 4 based on the contents in file C, which is typically still in the cache of the client. If when the Java script is actuated, file C was already erased from the cache, a request for page 4 will be retransmitted and accordingly, file F will be forwarded to the client.
  • In file organization 94 there is no file for page 6, as it is assumed that page 6 may not be accessed from outside the Web site. If this is not true, a separate file for page 6 may be included in file organization 94.
  • file organization 94 requires more space on server 24 than file organization
  • file organization 94 may provide a faster response time as each page is transmitted with the pages to which the user is most likely to move.
  • file organization 94, or parts thereof, is not actually stored in its entirety on Web server 24. Rather, the files of file organization 94 are generated on the fly from stored building blocks.
  • the pages are grouped according to the site map and the usage statistics such that the pages transmitted with a requested page are those which are most likely to be accessed from the requested page. Alternatively or additionally, the pages transmitted with a requested page are those which are most likely to be accessed by the client in the near future.
  • the pages are grouped such that when possible all the pages which reference a specific embedded element are included in a single combined file.
  • the pages are grouped responsive to a desired size (or size range) of combined files 70 and/or the bandwidth with which the client connects to the server.
  • the desired size is approximately equal to the average size of the files of the original Web pages of the compressed Web site.
  • the desired size is a predetermined percent greater than the size of the original Web pages, possibly a percent which is substantially unnoticed by the client or the user at the client.
  • compression unit 26 splits one or more pages into a plurality of separate pages.
  • pages that contain both static information, that never or rarely changes, and dynamic information are split into dynamic and static parts, which may be compressed separately.
  • a first part of the page e.g., the static part
  • the client requests the second part as an embedded element is normally ordered.
  • the Java script may initiate for certain pages, the retrieval of another file which includes additional pages which are most likely to be accessed by the user from the current page.
  • the pages are transmitted to the client before the client requested the pages and the latency for waiting for the pages to arrive is shortened.
  • client 28 is customized for use with servers that operate in accordance with the present invention.
  • the client displays in a special color (or other indication) links that lead to pages already downloaded.
  • the client notifies, at the beginning of an HTTP session, which particular parameters compression unit 26 should use, for example, whether embedded elements should be transmitted within the same file as the HTML of the pages or in a separate file.
  • Such notification may be, for example, by the way of cookies.
  • the present invention may be applied to substantially any group of Web pages, referred to herein as a virtual Web site.
  • a page which provides search results may be compressed and transmitted to the client with one or more of the pages found in the search.
  • An exemplary implementation is described in Israel application 133,888, the disclosure of which is incorporated herein by reference.
  • Although the present invention has been described in relation to the TCP/IP protocol suite, some embodiments of the invention may be implemented with relation to other packet based transmission protocols, such as, for example, IPX, DECNET and the ISO protocols.
  • Although the present invention is described with relation to the HTTP protocol, the principles of the present invention may be used with relation to other application protocols, such as WAP (wireless application protocol), WML, and e-mail transmission of pages.
  • In e-mail transmission of pages, instead of transmitting a newsletter or other e-mail with links to one or more sites, the e-mail may include one or more combined files that include some or all of the referenced pages.
  • the present invention may be used for tasks other than transmission, for example, for storage.
  • Although hypertext links were described as being replaced by Java scripts, any other scripts or controls may be used, including but not limited to VB-Scripts, Java applets and ActiveX scripts.
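By way of non-limiting illustration, the navigation example of Fig. 5 (pages 1, 4, 3 and then 5 under file organization 94) can be sketched as a small simulation. Python is used here only for illustration. The sketch assumes the variant in which received pages are kept in a client cache keyed by page; the file contents listed are only those the text states, and the single-page entries for files D and F are placeholders, since the text does not give their contents. The names `FILES`, `ENTRY_FILE` and `simulate` are hypothetical.

```python
# Partial contents of file organization 94, as far as the text states them.
# The first page of each file is the automatically displayed page.
FILES = {
    "C": [1, 4, 3],  # received when the site is entered from page 1
    "D": [2],        # placeholder: contents not given in the text
    "E": [3, 1, 5],  # page 3 with the only pages it links to
    "F": [4],        # placeholder: contents not given in the text
    "G": [5, 4],     # received when page 5 is requested
}

# Map each directly requested page to the file in which it is displayed first.
ENTRY_FILE = {pages[0]: name for name, pages in FILES.items()}


def simulate(path):
    """Return the combined files transmitted while the user follows `path`."""
    cached = set()     # pages already available at the client
    transmitted = []   # names of the files sent by the server, in order
    for page in path:
        if page in cached:
            continue   # displayed locally, no request reaches the server
        name = ENTRY_FILE[page]
        transmitted.append(name)
        cached.update(FILES[name])
    return transmitted
```

Under these assumptions, `simulate([1, 4, 3, 5])` yields two transmissions (file C, then file G), matching the walk-through above.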

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method of providing information, comprising: providing a plurality of files including descriptions of a plurality of respective Web pages; selecting at least a sub-group of the Web pages; creating a combined file which includes descriptions of the Web pages in the sub-group; and transmitting the combined file to a client responsive to a request for one of the selected Web pages received from the client.

Description

CODING AND TRANSMISSION OF MULTIPLE WEB PAGES
FIELD OF THE INVENTION
The present invention relates to communication networks and in particular to transmission of Web pages.
BACKGROUND OF THE INVENTION
Web servers are commonly used to provide users with information. Generally, the information is provided in the form of Web pages. Some Web pages are transmitted in the form of a single HTTP (Hypertext transfer protocol) file. Other Web pages, for example Web pages that include images or other embedded elements, are transmitted as an HTTP file which includes an HTML page and a plurality of additional files (referred to also as objects) referenced by the HTML page. When a client requests such Web pages from a server, in a first stage the HTTP file is transmitted to the client. When the client opens the HTML page it finds the references to the additional objects and sends additional HTTP requests to the server to receive these objects. Many Web sites include a plurality of Web pages that are interconnected using hyper-text links. That is, Web pages usually include areas which when clicked upon initiate the retrieval and display of other Web pages, often from the same site.
The transmission of Web pages that include images typically requires large amounts of bandwidth. When clients are connected through low bandwidth links, as most users are, such Web pages require relatively long transmission times, which annoy users. One of the features required of Web sites in order to attract clients is fast response times.
In an early version of the HTTP protocol, each HTTP request message is transmitted on a separate TCP connection to the server. The server sends the HTTP response message on the TCP connection on which the request message was received and then closes the TCP connection. A newer HTTP version (i.e., HTTP/1.1) optionally uses the same TCP connection for all the HTTP messages transmitted between the client and the server. Such connections are referred to as persistent HTTP connections. A single TCP connection may thus carry a stream of HTTP request messages from the client to the server. The time required for establishing the TCP connections is reduced using this scheme. Still, many clients and/or Web servers, for example due to load balancing limitations, do not support persistent HTTP connections. The establishment of TCP connections involves transmission of three packets in a handshake procedure. The establishment of a connection for each retrieved Web page and each embedded element is therefore time consuming.
SUMMARY OF THE INVENTION
An aspect of some embodiments of the present invention relates to a method of storing and/or providing information, in which a plurality of Web pages and/or web page elements are transmitted as a single combined file, for example as an HTTP file. In some embodiments of the invention, the HTTP file comprises the HTML description of a master page, in a regular HTML format recognized by existing clients, and one or more additional pages in a compressed HTML format. Optionally, at least some of the hypertext links in the master HTML page, and optionally in the rest of the pages, are replaced, in some embodiments of the invention, by scripts, e.g., Java scripts, which when clicked upon display the respective page from the data stored in the HTTP file.
Alternatively or additionally, all the transmitted Web pages are stored in the combined file in a compressed format. A dummy master page is stored together with the compressed pages in the file. The dummy master page includes a Java script that initiates the display of one of the pages upon reception of the file by the client. In an exemplary embodiment of the invention, files that are repeated between pages are provided only once in a compressed file that includes the pages.
In some embodiments of the invention, the Web pages to be included in a single combined file are selected responsive to an inter-link map of a compressed Web site (or any other group of Web pages being compressed). Alternatively or additionally, the Web pages included in a single combined file are determined responsive to statistics of the usage of the compressed pages and/or of the hypertext links connecting the pages. Alternatively or additionally, the Web pages included in a single combined file are determined responsive to a user profile of the user receiving the Web pages and/or of the bandwidth of the connection between the client and the server. In some embodiments of the invention, the combined files of a Web site are prepared during and/or after the preparation of the Web site and the Web site is posted on a server as the combined file. Alternatively or additionally, the compression is performed on the fly responsive to the download requests from a client.
There is thus provided in accordance with an exemplary embodiment of the invention, a method of providing information, comprising: providing a plurality of files including descriptions of a plurality of respective Web pages; selecting at least a sub-group of the Web pages; creating a combined file which includes descriptions of the Web pages in the subgroup; and transmitting the combined file to a client responsive to a request for one of the selected Web pages received from the client. Optionally, providing the plurality of files comprises providing files describing Web pages included in a single Web site. Alternatively or additionally, providing the plurality of files comprises providing at least links to files that are the results of a search. Alternatively or additionally, selecting the sub-group comprises selecting responsive to a map of the interconnections of the plurality of Web pages. Alternatively or additionally, selecting the sub-group comprises selecting responsive to statistics of the usage of the plurality of Web pages. Alternatively or additionally, selecting the sub-group comprises selecting responsive to a user profile. Alternatively or additionally, selecting the sub-group comprises selecting responsive to a bandwidth of a link on which the file is transmitted. Alternatively or additionally, creating the combined file comprises creating a combined file in which at least some of the descriptions of the Web pages are compressed. Alternatively or additionally, creating the combined file comprises replacing at least one of the hypertext links in one or more of the selected pages with a script which actuates the display of the page referenced by the link. Optionally, replacing at least one of the hypertext links comprises replacing links that lead to one of the selected pages with a script which actuates the display of the page referenced by the link from within the combined file. 
In an exemplary embodiment of the invention, creating the combined file comprises replacing links to pages included in one or more other combined files with links which indicate the location of the referenced page within the other combined file. Alternatively or additionally, creating the combined file is performed responsive to receiving the request from the client. In an exemplary embodiment of the invention, creating the combined file is performed independently of said request. Alternatively or additionally, creating the combined file comprises detecting repeated embedded objects between the plurality of pages. Optionally, creating the combined file comprises detecting repeated embedded objects between the selected pages. Optionally, the method comprises providing only one copy of said repeated object in said file.
In an exemplary embodiment of the invention, the method comprises providing only one copy of said repeated object as a separate file. In an exemplary embodiment of the invention, said request is an HTTP request. Alternatively or additionally, transmission of the combined file, instead of a regular file, is transparent to a user that generates said request.
In an exemplary embodiment of the invention, selecting at least a sub-group comprises selecting fewer than all the plurality of files. Alternatively or additionally, the method comprises maintaining a copy of said files on a file server associated with a storage of said combined file.
There is also provided in accordance with an exemplary embodiment of the invention, a method of providing information, comprising: receiving a request for a Web page, including one or more links to data elements, from a client; and transmitting to the client, in response to said request, a combined file including descriptions of the requested page and one or more of the data elements referenced by the one or more links of the Web page. Optionally, the method comprises generating the combined file responsive to receiving the request from the client. Alternatively or additionally, the combined file is generated before receiving the request from the client. Alternatively or additionally, the one or more of the data elements referenced by the links of the Web page comprise at least one additional Web page. Alternatively, the one or more of the data elements referenced by the links of the Web page comprise embedded objects. There is also provided in accordance with an exemplary embodiment of the invention, apparatus for web page serving, comprising: a compression unit that provides at least one combined file including the description of a plurality of WWW pages; and a web server that receives requests for WWW pages and responds with at least one of said combined files. Optionally, said compression unit generates said file responsive to said request. Alternatively or additionally, said compression unit generates said file to be personalized for a particular user.
In an exemplary embodiment of the invention, said compression unit generates said file responsive to a request by a WWW site manager. In an exemplary embodiment of the invention, said compression unit maintains a copy of uncompressed versions of said WWW pages. Alternatively or additionally, said compression unit comprises a grouper that selectively groups pages together based, at least, on their link structure. Alternatively or additionally, said compression unit comprises a redundancy detector that detects embedded elements repeated between said pages. Alternatively or additionally, said compression unit is integrated with a WWW site construction program. Alternatively or additionally, said compression unit is integrated with a WWW site maintaining program.
BRIEF DESCRIPTION OF FIGURES
Particular non-limiting embodiments of the invention will be described with reference to the following description of embodiments in conjunction with the figures. Identical structures, elements or parts which appear in more than one figure are preferably labeled with a same or similar number in all the figures in which they appear, in which:
Fig. 1 is a schematic block diagram of a Web site preparation system, in accordance with an exemplary embodiment of the present invention;
Fig. 2 is a flowchart of the acts of a compression unit in compressing a Web site, in accordance with an exemplary embodiment of the present invention;
Fig. 3 is a schematic block diagram of a structure of a file including a plurality of web pages stored as a single page, in accordance with an embodiment of the present invention;
Fig. 4 is a flowchart of the acts performed in downloading pages of a Web site, in accordance with an exemplary embodiment of the present invention; and
Fig. 5 is a schematic illustration of an exemplary simplified Web site and optional file organizations therefor, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
Fig. 1 is a schematic block diagram of a Web site preparation system 20, in accordance with an embodiment of the present invention. A Web site preparation computer 22, optionally a general purpose computer with suitable software, is used to generate a Web site. The Web site is then posted on a Web server 24 that provides the Web pages to clients. Alternatively or additionally, Web pages can be prepared directly on Web server 24.
In some embodiments of the invention, compression unit 26 is situated between preparation computer 22 and Web server 24. When a generated Web site is transferred from preparation computer 22 to Web server 24, compression unit 26 compresses the pages as described below and passes the Web site in its compressed form to Web server 24. Optionally, when a Web master retrieves the Web site from Web server 24 to preparation computer 22 in order to perform changes and/or add pages to the Web site, compression unit 26 decompresses the Web site files and passes the decompressed files to computer 22. Alternatively or additionally, compression unit 26 keeps an uncompressed copy of the Web site, which is provided to the Web master when changes are to be made. Thus, the Web master can perform changes in the compressed Web site without having knowledge of the compression tools and/or structures of the Web site. Optionally, the Web master is not aware of the compression performed by compression unit 26. Alternatively, the Web master uses the compressed Web site for performing further changes to the Web site.
In some embodiments of the invention, compression unit 26 stores a template of the compression and/or some of the parameters used in the compression, such that a subsequent compression after changes are performed, uses at least part of the results of previously performed compressions. Optionally, the template includes a map of the compressed Web site and/or other determined parameters of the Web site, as described below.
Alternatively or additionally to having compression unit 26 compress the Web site when it is passed from preparation computer 22 to Web server 24, compression unit 26 periodically and/or upon commands from the Web master compresses the Web site on Web server 24 and/or on preparation computer 22. Optionally, compression unit 26 uses statistics gathered by Web server 24 regarding the access to the different pages of the Web site, in performing the compression. In some embodiments of the invention, the compression is customized periodically according to the current statistics.
Alternatively or additionally, compression unit 26 is located between Web server 24 and a client 28. When a user enters the Web site, compression unit 26 compresses the Web site and/or portions thereof on the fly, optionally according to a user profile of the client.
In some embodiments of the invention, compression unit 26 and Web server 24 are located on separate processors. Alternatively, compression unit 26 is software located on Web server 24. Optionally, compression unit 26 is a plug-in unit, such as an MS Internet Information Server Filter, which cooperates with Web server 24. Alternatively or additionally to providing compression unit 26 in association with Web server 24, it may be provided in other manners, for example, as a proxy or as a stand-alone service.
In some embodiments of the invention, client 28 comprises a standard Web browser and does not require any special software in order to carry out the invention. Alternatively, some clients may be optimized for use with servers that operate in accordance with the present invention, so as to enhance its advantages. In a particular example, client 28 may comprise a browser including software to decompress page files (as described below).
Fig. 2 is a flowchart of the acts of compression unit 26 in compressing a Web site, in accordance with an embodiment of the present invention. Optionally, compression unit 26 creates (50) a map of the Web site. In some embodiments of the invention, the map includes indication of the Web pages of the Web site and the hypertext links which lead between the Web pages. Optionally, the map also indicates for each page the embedded elements of the page (e.g., images) which are included in separate files, and the pages which refer to each of the embedded elements. The map may be created using any suitable method, such as applying the BFS or DFS graph traversing methods to the "graph" defined by the hyperlinked files. Alternatively or additionally, the map and/or portions thereof are imported by compression unit 26 from an external hardware or software unit. Alternatively or additionally, the maps are created by the preparation computer 22, for example, by the site preparation software.
In some embodiments of the invention, compression unit 26 finds (52) duplicate embedded elements and/or duplicated pages. In an exemplary embodiment of the invention, compression unit 26 reviews the embedded elements of the Web site, while preparing the map or separately, before or after, and prepares a short catalog which lists a few parameters (e.g., length, type and/or leading bits) of each of the embedded elements. Thereafter, elements which have identical parameter values are compared (e.g., bit by bit) to determine whether they are identical. In some embodiments of the invention, elements which are determined to be similar but not identical are each split into two separate portion elements, one portion element which is identical in all the similar elements and one portion element which contains the non-identical portions, for each of the similar elements. A user alert may be generated in offline embodiments, to allow a user (e.g., site manager) to consolidate two files.
Alternatively or additionally, an after-the-fact alert may be provided to a user, for example, in on-the-fly compression systems.
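By way of non-limiting illustration, the catalog-based search for duplicate embedded elements (52) may be sketched as follows, in Python. The sketch assumes the elements are available in memory as a mapping from a name to a (type, bytes) pair; the function name `find_duplicates` and the 16-byte prefix length are hypothetical choices, and leading bytes are used in place of leading bits.

```python
from collections import defaultdict


def find_duplicates(elements, prefix_len=16):
    """Group identical embedded elements.

    `elements` maps a name (e.g., a URL) to a (type, data) pair, where
    `data` is a bytes object. Returns a list of name lists, each list
    holding elements whose contents are identical.
    """
    # Short catalog: only elements that agree on length, type and leading
    # bytes can be identical, so only those are compared in full.
    catalog = defaultdict(list)
    for name, (kind, data) in elements.items():
        catalog[(len(data), kind, data[:prefix_len])].append(name)

    groups = []
    for names in catalog.values():
        remaining = names
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            # Full comparison of the contents (bytes equality).
            same = [first] + [n for n in rest
                              if elements[n][1] == elements[first][1]]
            if len(same) > 1:
                groups.append(same)
            remaining = [n for n in rest if n not in same]
    return groups
```

The catalog keeps the number of full comparisons small, since most distinct elements already differ in length or in their first bytes.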
In some embodiments of the invention, compression unit 26 receives (54) statistical information on the visiting patterns of the Web site. The statistical information includes, for example, the number of visits in each page of the Web site, the frequencies of usage of the hypertext links of each of the pages and/or the frequencies of entrance to each of the pages from external and/or internal links.
In some embodiments of the invention, compression unit 26 groups (56) the Web pages into one or more page groups, each of which is included in a single combined file. The grouping may depend, for example, on one or more of statistical considerations of intra- and inter-group links and/or link following rates, on relative file sizes of each group, on a desire for certain pages to come up faster (e.g., smaller page files or different grouping) and/or on a sharing of embedded elements between pages. Each page group is converted (58) into a single combined HTML page. The HTML descriptions of the pages in each group are stored together in a single respective combined file. Hypertext links leading from one page of the group to another page of the group are converted into Java scripts which actuate the display of the other page. In some embodiments of the invention, the display of the page does not change due to the replacement of the hypertext links by Java scripts.
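By way of non-limiting illustration, one simple way of grouping (56) pages around an entry page is a greedy selection by link-following rate under a size limit. The text does not prescribe a particular grouping algorithm; the greedy rule, the function name `build_group` and the input layout below are assumptions made for this sketch.

```python
def build_group(entry, links, sizes, max_size):
    """Choose the pages to combine with `entry` in one combined file.

    links:    {page: {target: rate at which users follow the link}}
    sizes:    {page: size in bytes}
    max_size: desired upper bound on the total size of the group
    """
    group = [entry]
    total = sizes[entry]
    # Consider the pages linked from the entry page, most followed first.
    candidates = sorted(links.get(entry, {}).items(),
                        key=lambda item: item[1], reverse=True)
    for target, _rate in candidates:
        if target in group:
            continue
        if total + sizes[target] > max_size:
            continue  # too large for this file; try the next candidate
        group.append(target)
        total += sizes[target]
    return group
```

A fuller grouping step could also weigh shared embedded elements and entrance statistics, which this sketch leaves out.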
In some embodiments of the invention, hypertext links which lead to other combined pages of the Web site are converted (60) into suitable links in accordance with the combining of the pages. Optionally, the hypertext link of a page is converted into a link that states the URL of the combined page with a parameter which states the position of the page in the combined page.
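By way of non-limiting illustration, the conversion (60) of a link into the URL of a combined page plus a position parameter may be sketched as follows. The query parameter name `page`, the function name `rewrite_link` and the lookup table layout are hypothetical; the text only states that the link carries the URL of the combined page and a parameter giving the position.

```python
from urllib.parse import urlencode


def rewrite_link(href, location):
    """Rewrite a link to a page that lives inside another combined file.

    location: {original page URL: (combined file URL, position in file)}
    Links to pages that are not in any combined file are left unchanged.
    """
    if href not in location:
        return href
    combined_url, position = location[href]
    return combined_url + "?" + urlencode({"page": position})
```

For example, with `location = {"/products.html": ("/combined_b.html", 2)}`, the link `/products.html` becomes `/combined_b.html?page=2`.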
Optionally, compression unit 26 determines whether the embedded elements are in a compressed form and, if necessary, compresses (62) or re-compresses the elements. In some embodiments of the invention, different compression ratios and/or compression methods are used for different embedded elements and/or in different compression instances. For example, the compression ratio may be adjusted according to user preferences (i.e., whether higher quality or faster service is desired), the bandwidth of the client's connection and/or the size of a specific combined file.
Fig. 3 is a schematic block diagram of the structure of a combined file 70, in accordance with an embodiment of the present invention. In some embodiments of the invention, combined file 70 comprises a master page record 72 that is automatically displayed by the client when the packet is downloaded. In addition, combined file 70 comprises one or more slave page records 74 that describe additional pages, which are not generally displayed immediately when combined file 70 is downloaded, but rather responsive to actuation of a Java script in one of the other pages included in combined file 70.
Optionally, master page record 72 comprises an open HTML description of one of the original pages of the group. Alternatively, master page record 72 describes a pseudo page which comprises an automatically opening script which initiates the display of one of the pages as described in a respective slave page record 74. When one of the slave pages is requested directly, a different page file may be sent, in which the slave page is a master. Alternatively, only an indication of which page to show first, is changed. Alternatively, the opening script may connect to web server 24 or compression unit 26 to receive an indication of which page to show first. Alternatively, a separate file including such an indication is sent to the client.
Optionally, some or all of slave page records 74 comprise compressed HTML page descriptions. Alternatively, at least some of the pages are uncompressed. The slave page records are compressed using any suitable compression method, for example the LZ (Lempel Ziv) method and/or the WLZ (Walsh, Lempel, Ziv) method. Alternatively or additionally, slave page records 74 comprise standard non-compressed HTML page descriptions, for example Java script for displaying the pages.
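By way of non-limiting illustration, compressing and restoring the HTML of a slave page record 74 may be sketched with Python's standard `zlib` module. DEFLATE, as implemented by `zlib`, is used here only as a readily available Lempel-Ziv based stand-in; it is not stated in the text to be the method used, and the function names are hypothetical.

```python
import zlib


def compress_page(html: str) -> bytes:
    """Return a compressed record for one page description."""
    return zlib.compress(html.encode("utf-8"), 9)


def decompress_page(record: bytes) -> str:
    """Restore the HTML text from a compressed record."""
    return zlib.decompress(record).decode("utf-8")
```

HTML tends to compress well because tags and attribute names repeat many times within a page.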
In some embodiments of the invention, combined file 70 comprises embedded element records 76 which contain descriptions of the embedded elements of the pages represented by combined file 70. Hypertext links to the embedded elements are optionally converted to Java scripts that, conditionally or unconditionally, initiate the display of the contents of respective embedded element records 76, upon displaying the page. In some embodiments of the invention, embedded elements that are included in a plurality of the pages of the group are stored in only a single embedded element record 76, which may be actuated from a plurality of different pages. Alternatively or additionally, some of the embedded elements are stored in separate files and are downloaded responsive to a hypertext link, as is known in the art. Possibly, the decompressed, shared, embedded objects are stored in a local cache of the browser and/or operating system.
In some embodiments of the invention, embedded elements which are included in a plurality of pages which are stored in different combined files 70 are repeated in each of the combined files 70. Alternatively or additionally, at least some of the embedded elements are stored only in one of the combined files 70 and when they are required for Web pages in other combined files 70, the combined file containing the embedded element is downloaded. Further alternatively or additionally, embedded elements which appear only in pages of a single file 70 are stored within the file, while embedded elements which are included in pages of a plurality of files 70 are stored in separate files.
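By way of non-limiting illustration, the structure of combined file 70 (master page record 72, slave page records 74 and embedded element records 76) may be modeled in memory as follows. The class and field names are hypothetical, the on-the-wire layout is not modeled, and `zlib` is an assumed stand-in for the compression method.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class CombinedFile:
    master_html: str                              # master page record 72
    slaves: dict = field(default_factory=dict)    # page URL -> compressed HTML (records 74)
    elements: dict = field(default_factory=dict)  # element name -> bytes (records 76)

    def add_slave(self, url, html, embedded):
        """Add a slave page together with the embedded elements it uses.

        `embedded` maps element names to their contents. An element that
        an earlier page already contributed is not stored a second time.
        """
        self.slaves[url] = zlib.compress(html.encode("utf-8"))
        for name, data in embedded.items():
            self.elements.setdefault(name, data)
```

Adding two pages that both use the same logo image leaves a single entry for that image in `elements`.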
In some embodiments of the invention, the contents of combined file 70 are arranged such that the master page 72 may be displayed, partially or in its entirety, by the client, before all of combined file 70 is received. Optionally, master page 72 with the page record 74 of the page automatically displayed and the embedded element records referenced by the automatically displayed page are located at the top (i.e., the first transmitted area) of combined file 70.
Fig. 4 is a flowchart of the acts performed in downloading pages of a Web site, in accordance with an exemplary embodiment of the present invention. Client 28 transmits (80) to Web server 24 a request to view a page of the Web site. Web server 24 responds by transmitting (82) a combined file 70 that includes the requested page as the automatically displayed page. In some embodiments of the invention, each of the pages of the Web site is generally included in a single combined file 70. When more than one of the pages included in a single combined file 70 may be accessed directly, Web server 24 optionally adjusts the combined file (if necessary) before it is transmitted, so that the automatically displayed page of the file is the requested page. Alternatively or additionally, Web server 24 carries a few versions of at least some of the combined files 70, which versions differ in the page automatically displayed when the file is downloaded. Possibly, Web server 24 determines which version to transmit (82) according to the requested page, for example, by the request address mapping to a suitable stored file version.
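By way of non-limiting illustration, adjusting a combined file so that the requested page and the embedded elements it references are transmitted first may be sketched as an ordering of record names. The function name `transmission_order` and the input layout are hypothetical, and real records would carry data rather than names.

```python
def transmission_order(pages, elements, refs, requested):
    """Return record names in the order they would be transmitted.

    pages:     page URLs included in the combined file
    elements:  names of the embedded element records in the file
    refs:      {page URL: [names of the elements the page references]}
    requested: the page to be displayed automatically
    """
    if requested not in pages:
        raise KeyError(requested)
    needed = [e for e in elements if e in refs.get(requested, [])]
    order = [requested] + needed                        # top of the file
    order += [p for p in pages if p != requested]       # remaining pages
    order += [e for e in elements if e not in needed]   # remaining elements
    return order
```

With this ordering the client can start displaying the requested page while the rest of the file is still arriving.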
Optionally, some pages of the Web site are not allowed direct access by clients without passing through previous pages of the Web site. In some embodiments of the invention, such pages are included in a combined file 70 in which they are not the automatically displayed page. When a request for such a page is received, the respective combined file 70 is optionally downloaded and a different page from the file is automatically displayed. Alternatively, Web server 24 responds with an error message to such requests. Thus, Web server 24 can simply prevent direct access to a desired page without passing through an introductory page, for example, a log-in page, which the Web master wants all clients to display before reaching the desired page. Alternatively or additionally, a log-in page or other pre-personalization page may be sent as a separate file 70, with group pages only being generated once the personalization of the pages is determined. The personalization may be applied to HTML files, which are then compressed. Alternatively, it may be applied directly to the compressed files, for example by record replacement.
When client 28 receives combined file 70, it automatically accesses master page record 72 and accordingly displays (84) one of the pages included in the received combined file 70. In some embodiments of the invention, the operation of client 28 in opening combined file 70 is exactly as if a regular HTML file is received. Optionally, the user does not know that combined file 70 is not a regular HTML file. As described above, the displayed page typically includes one or more controls which actuate Java scripts. These controls operate from the point of view of a user of client 28 in substantially the same way as hypertext links. In addition, in an exemplary embodiment of the invention, the page typically includes one or more hypertext links that relate to Web pages not included in the downloaded file.
The user of client 28 may actuate one of the controls on the displayed page. Responsive thereto, the respective Java script of the control is actuated. Optionally, the Java control decompresses (86) the contents of the respective slave page record 74, and displays the page, only when needed. Alternatively, some or all of records 74 in a received file 70 are decompressed upon receipt and are stored in a temporary memory in a decompressed form. In some embodiments of the invention, the decompression (86) is performed seamlessly such that the user of client 28 does not notice the decompression. In some embodiments of the invention, the Java control is a stand-alone script which does not require additional commands for operation. Alternatively, the Java control actuates, with one or more specific parameter values, a separate Java script, which is used by substantially all the Java scripts of pages compressed in accordance with the present invention, for example being a decompression program. Optionally, the separate Java script is stored within a browser of the client. Alternatively or additionally, the separate Java script is provided as an embedded element.
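The option of decompressing a slave page record only when needed may be illustrated by the following minimal sketch. It is written in Python rather than as a browser script, and it assumes zlib compression and UTF-8 encoded pages; the description above does not specify a compression method, and the class and method names are hypothetical.

```python
import zlib

class SlavePageRecords:
    """Holds compressed page records and decompresses each on first use."""

    def __init__(self, compressed_records):
        self._compressed = dict(compressed_records)
        self._decompressed = {}  # temporary memory of decompressed pages

    def get_page(self, page_id):
        # Decompress only when the page is first requested, then reuse.
        if page_id not in self._decompressed:
            raw = zlib.decompress(self._compressed[page_id])
            self._decompressed[page_id] = raw.decode("utf-8")
        return self._decompressed[page_id]

    def decompressed_ids(self):
        return set(self._decompressed)

# Hypothetical received file with two slave page records.
records = SlavePageRecords({
    "page2": zlib.compress("<html>page 2</html>".encode("utf-8")),
    "page3": zlib.compress("<html>page 3</html>".encode("utf-8")),
})
first_view = records.get_page("page2")
```

After the call above only the record of the viewed page has been decompressed; the alternative of decompressing all records upon receipt would correspond to calling `get_page` for every record when the file arrives.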
If the user actuates a regular hypertext link, client 28 sends a request accordingly to Web server 24 (or a different, unrelated Web server, for links outside the site) that responds by transmitting the desired page (in a regular HTML page or in another combined file 70). In some embodiments of the invention, if the user actuates, in this additional page, a hypertext link which leads to one of the slave pages of the previous combined file 70, the browser finds the referenced combined file 70 in its cache and a parameter in the link leads to the specific desired page within the combined file (e.g., the link in the additional page may be adapted to match the previous combined file 70). Alternatively, client 28 sends a request with the URL of the requested page to Web server 24. In some embodiments of the invention, Web server 24 responds with a short HTML file which includes a Java script that accesses the slave page record 74 of the requested page in the combined file 70. Further alternatively, Web server 24 responds by re-transmitting the requested page, either by itself or with a combined file in which the requested page is the automatically displayed page. Further alternatively, each time a combined file 70 is received by client 28, a Java script extracts the pages in slave page records 74 into a cache of client 28, as if the pages were received on their own as regular HTML files. Each page is stored in the cache with its URL, such that when a request for the page is generated while the page is still in the cache, the page is found in the cache and displayed therefrom.
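The last alternative above, in which the pages of the slave page records are extracted into the client cache under their own URLs, may be sketched as follows. The sketch is in Python; the dictionary used as a cache, the `slave_pages` key and the `request_from_server` callback are assumptions made for the example and do not describe any particular browser.

```python
def extract_to_cache(combined_file, cache):
    """Store each slave page in the cache under its own URL, as if it had
    been received on its own as a regular HTML file."""
    for url, html in combined_file["slave_pages"].items():
        cache[url] = html

def fetch(url, cache, request_from_server):
    """Return (page, source): served from the cache when present,
    otherwise requested from the server and then cached."""
    if url in cache:
        return cache[url], "cache"
    page = request_from_server(url)
    cache[url] = page
    return page, "server"

# Hypothetical server stub that records which URLs were requested.
server_calls = []
def request_from_server(url):
    server_calls.append(url)
    return "<html>from server " + url + "</html>"

cache = {}
extract_to_cache({"slave_pages": {"/p2.html": "<html>2</html>"}}, cache)
cached_page, cached_source = fetch("/p2.html", cache, request_from_server)
other_page, other_source = fetch("/p9.html", cache, request_from_server)
```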
In some embodiments of the invention, Web server 24 hosts, for at least some of the Web pages of the site, a plurality of combined files 70 that include the Web page. The plurality of combined files include the Web page with different other pages and/or with different compression styles and/or ratios. When client 28 transmits (80) to Web server 24 a request to view the Web page, Web server 24 chooses one of the plurality of files containing the page to be transmitted (82) to client 28, responsive to which combined files the user previously downloaded and/or responsive to a user profile. The user profile may include, for example, a standard user behavior (e.g., whether the user usually actuates hypertext links at the top or the bottom of the page), topics which interest the user and/or user preferences (e.g., voice files, long articles, images). Alternatively or additionally, Web server 24 customizes, on the fly, the combined file 70 in which the requested page is the automatically displayed page, based on the user profile.
Fig. 5 is a schematic illustration of an exemplary simplified Web site 90 and two optional file organizations 92 and 94, in accordance with an embodiment of the present invention. Web site 90 comprises Web pages indicated by digits 1-6 and links between the pages are indicated by arrows. A first optional file organization 92 includes two combined files (A and B). When a page of Web site 90 is requested from Web server 24, the file containing the requested page is transferred to the client, with the requested page being set as the automatically displayed page of the file. This organization not only provides faster transmission of the Web pages to the user, but also reduces the space required to store the Web site on Web server 24.
In optional file organization 94, substantially each page (1, 2, 3, 4 and 5) has a separate file (C, D, E, F and G), respectively, which is transmitted to the client if the user first enters the Web site from that particular page. The respective pages of the file, listed first in Fig. 5, are the pages which are automatically displayed when the files are downloaded. Each file (C, D, E, F and G) is customized for its respective page such that the pages included in its slave page records 74 are the pages to which the user is most likely to move from the displayed page. For example, in file E page 3 is accompanied by pages 1 and 5, the only pages in the Web site to which page 3 has hypertext links.
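The construction of a file customized for a particular entry page may be sketched as follows, for the simple case in which the accompanying pages are those directly linked from the entry page. The sketch is in Python; the function name and the `max_pages` parameter are assumptions, and only the links of page 3 stated above are used, since the remaining links of Fig. 5 are not reproduced here.

```python
def build_entry_file(entry_page, links, max_pages=None):
    """Return the page list of a file for entry_page: the entry page first
    (the automatically displayed page), followed by the pages it links to."""
    targets = []
    for target in links.get(entry_page, []):
        if target != entry_page and target not in targets:
            targets.append(target)
    if max_pages is not None:
        targets = targets[:max(0, max_pages - 1)]
    return [entry_page] + targets

# Links of page 3 as stated in the text: page 3 links only to pages 1 and 5.
links = {3: [1, 5]}
file_e = build_entry_file(3, links)
```

With these inputs the resulting page list matches the composition of file E described above. A fuller implementation might also include pages more than one link away, as the scenario involving file C suggests.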
In an exemplary scenario, a user enters the Web site from page 1, and therefore receives file C. From page 1 the user moves to page 4 by actuating the respective Java script which displays page 4 from the contents of file C. No transmission is thus required from Web server 24 for displaying page 4. From page 4 the user moves to page 3, again using a Java script and the contents of file C. From page 3 the user moves to page 5, which is not included in file C (e.g., the link is not encoded as a Java script to display part of the same file, but as a regular HTTP link). Therefore, a request for page 5 is sent to Web server 24 which responds by transmitting file G to the client. It is noted that in some embodiments of the invention, page 4 is re-transmitted in file G although it was already transmitted in file C.
Alternatively to re-transmitting pages transmitted in other files (e.g., page 4), in some embodiments of the invention, before transmitting a file (e.g., file G), server 24 determines whether one or more of the pages and/or embedded elements in the file were recently transmitted to the client. Such pages and/or embedded elements are optionally replaced in the file by short Java scripts which refer to the pages and/or embedded elements in previously transmitted files. In the current example, the description of page 4 is replaced in file G by a Java script which if actuated displays page 4 based on the contents in file C, which is typically still in the cache of the client. If when the Java script is actuated, file C was already erased from the cache, a request for page 4 will be retransmitted and accordingly, file F will be forwarded to the client.
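The replacement of recently transmitted pages by short references, together with the fallback used when the earlier file is no longer cached, may be sketched as follows. This is a Python sketch under stated assumptions: the dictionary layouts, the function names and the `request_page` callback are hypothetical, and the references stand in for the short Java scripts described above.

```python
def build_response(file_pages, recently_sent):
    """file_pages: page id -> content.
    recently_sent: page id -> name of the earlier file that carried it."""
    response = {}
    for page_id, content in file_pages.items():
        if page_id in recently_sent:
            # Send a short reference instead of the page description.
            response[page_id] = {"ref": recently_sent[page_id]}
        else:
            response[page_id] = {"content": content}
    return response

def resolve(page_id, response, client_cache, request_page):
    """Client side: use the referenced cached file if it still exists,
    otherwise request the page again from the server."""
    entry = response[page_id]
    if "content" in entry:
        return entry["content"]
    cached_file = client_cache.get(entry["ref"])
    if cached_file is not None and page_id in cached_file:
        return cached_file[page_id]
    return request_page(page_id)

# File G of the example: page 5 plus page 4, which was recently sent in file C.
file_g = {5: "<html>page 5</html>", 4: "<html>page 4</html>"}
response = build_response(file_g, recently_sent={4: "C"})

cache_with_c = {"C": {1: "<html>page 1</html>", 4: "<html>page 4</html>"}}
requests = []
def request_page(page_id):
    requests.append(page_id)
    return "<html>re-sent page %d</html>" % page_id

from_cache = resolve(4, response, cache_with_c, request_page)
after_eviction = resolve(4, response, {}, request_page)
```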
In file organization 94 there is no file for page 6, as it is assumed that page 6 may not be accessed from outside the Web site. If this is not true, a separate file for page 6 may be included in file organization 94.
Although file organization 94 requires more space on server 24 than file organization 92, it may provide a faster response time as each page is transmitted with the pages to which the user is most likely to move. Alternatively, file organization 94, or parts thereof, is not actually stored in its entirety on Web server 24. Rather, the files of file organization 94 are generated on the fly from stored building blocks.
Referring in detail to grouping (56, Fig. 3) the Web pages, in some embodiments of the invention, the pages are grouped according to the site map and the usage statistics such that the pages transmitted with a requested page are those which are most likely to be accessed from the requested page. Alternatively or additionally, the pages transmitted with a requested page are those which are most likely to be accessed by the client in the near future.
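Grouping according to usage statistics may be sketched as follows, by ranking the pages most often visited immediately after the requested page. The sketch is in Python; the form of the statistics (a list of observed page-to-page transitions), the function name and the sample log are assumptions made for the example and are not data from any actual site.

```python
from collections import Counter

def likely_next_pages(transitions, requested, limit):
    """Return up to `limit` pages most frequently visited directly after
    `requested`, most frequent first.

    transitions: iterable of (from_page, to_page) pairs from usage logs."""
    counts = Counter()
    for src, dst in transitions:
        if src == requested:
            counts[dst] += 1
    return [page for page, _ in counts.most_common(limit)]

# Hypothetical usage log.
log = [(1, 4), (1, 4), (1, 2), (1, 4), (1, 2), (1, 3), (2, 5)]
group_for_page_1 = [1] + likely_next_pages(log, requested=1, limit=2)
```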
Alternatively or additionally, the pages are grouped such that when possible all the pages which reference a specific embedded element are included in a single combined file. In some embodiments of the invention, the pages are grouped responsive to a desired size (or size range) of combined files 70 and/or the bandwidth with which the client connects to the server. Optionally, the desired size is approximately equal to the average size of the files of the original Web pages of the compressed Web site. Alternatively, the desired size is a predetermined percent greater than the size of the original Web pages, possibly a percent which is substantially unnoticed by the client or the user at the client.
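Grouping responsive to a desired size of the combined file may be sketched as a simple greedy selection. The sketch is in Python and assumes that the candidate pages are already ordered by likelihood of access and that the size of each page is known in bytes; the function name and the sample sizes are hypothetical.

```python
def group_by_size(requested, candidates, sizes, max_bytes):
    """Start with the requested page, then add candidate pages in the given
    order whenever they still fit within the size budget.

    Returns (group, total_size)."""
    group = [requested]
    total = sizes[requested]
    for page in candidates:
        if total + sizes[page] <= max_bytes:
            group.append(page)
            total += sizes[page]
    return group, total

# Hypothetical page sizes in bytes.
sizes = {"a": 400, "b": 500, "c": 300, "d": 100}
group, total = group_by_size("a", ["b", "c", "d"], sizes, max_bytes=1000)
```

Here page "c" is skipped because it would exceed the budget, while the smaller page "d" still fits. The budget itself could be chosen per client, for example according to the bandwidth with which the client connects to the server.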
In some embodiments of the invention, compression unit 26 splits one or more pages into a plurality of separate pages. For example, in some embodiments of the invention, pages that contain both static information, which never or rarely changes, and dynamic information (e.g., updated hourly), are split into dynamic and static parts, which may be compressed separately. Optionally, when the page is downloaded by a client, a first part of the page (e.g., the static part) is first downloaded to the client, and when the first part is opened by the client, the client requests the second part in the same manner in which an embedded element is normally requested.
In some embodiments of the invention, when a page transmitted within a slave page record 74 is opened (before any hypertext links of the page are actuated) by client 28 using a Java script, the Java script may initiate, for certain pages, the retrieval of another file which includes additional pages which are most likely to be accessed by the user from the current page. Thus, in some cases, the pages are transmitted to the client before the client requests the pages, and the latency of waiting for the pages to arrive is shortened.
In some embodiments of the invention, client 28 is customized for use with servers that operate in accordance with the present invention. Optionally, the client displays in a special color (or other indication) links that lead to pages already downloaded. Alternatively or additionally, the client notifies, at the beginning of an HTTP session, which particular parameters compression unit 26 should use, for example, whether embedded elements should be transmitted within the same file as the HTML of the pages or in a separate file. Such notification may be, for example, by way of cookies.
It is noted that although the above description relates to reorganizing a Web site, the present invention may be applied to substantially any group of Web pages, referred to herein as a virtual Web site. For example, a page which provides search results may be compressed and transmitted to the client with one or more of the pages found in the search. An exemplary implementation is described in Israel application 133,888, the disclosure of which is incorporated herein by reference. It is further noted that although the present invention has been described in relation to the TCP/IP protocol suite, some embodiments of the invention may be implemented with relation to other packet based transmission protocols, such as, for example IPX, DECNET and the ISO protocols. Furthermore, although the present invention is described with relation to the HTTP protocol, the principles of the present invention may be used with relation to other application protocols, such as WAP (wireless application protocol), WML, and e-mail transmission of pages. For example, instead of transmitting a newsletter or other e-mail with links to one or more sites, the email may include one or more combined files that include some or all of the referenced pages. It is also noted that the present invention may be used for tasks other than transmission, for example, for storage.
In addition, although the hypertext links were described as being replaced by Java scripts, any other scripts or controls may be used, including but not limited to, VB-Scripts, Java applets and ActiveX scripts.
It will be appreciated that the above described methods may be varied in many ways, including changing the order of steps and the exact implementation used. It should also be appreciated that the above description of methods and apparatus is to be interpreted as including apparatus for carrying out the methods and methods of using the apparatus.
The present invention has been described using non-limiting detailed descriptions of embodiments thereof that are provided by way of example and are not intended to limit the scope of the invention. It should be understood that features and/or steps described with respect to one embodiment may be used with other embodiments and that not all embodiments of the invention have all of the features and/or steps shown in a particular figure or described with respect to one of the embodiments. Variations of embodiments described will occur to persons of the art.
It is noted that some of the above described embodiments describe the best mode contemplated by the inventors and therefore include structure, acts or details of structures and acts that may not be essential to the invention and which are described as examples. Structure and acts described herein are replaceable by equivalents which perform the same function, even if the structure or acts are different, as known in the art. Therefore, the scope of the invention is limited only by the elements and limitations as used in the claims. When used in the following claims, the terms "comprise", "include", "have" and their conjugates mean "including but not limited to".

Claims

1. A method of providing information, comprising: providing a plurality of files including descriptions of a plurality of respective Web pages; selecting at least a sub-group of the Web pages; creating a combined file which includes descriptions of the Web pages in the subgroup; and transmitting the combined file to a client responsive to a request for one of the selected Web pages received from the client.
2. A method according to claim 1, wherein providing the plurality of files comprises providing files describing Web pages included in a single Web site.
3. A method according to claim 1 or claim 2, wherein providing the plurality of files comprises providing at least links to files that are the results of a search.
4. A method according to any of claims 1-3, wherein selecting the sub-group comprises selecting responsive to a map of the interconnections of the plurality of Web pages.
5. A method according to any of claims 1-4, wherein selecting the sub-group comprises selecting responsive to statistics of the usage of the plurality of Web pages.
6. A method according to any of claims 1-5, wherein selecting the sub-group comprises selecting responsive to a user profile.
7. A method according to any of claims 1-6, wherein selecting the sub-group comprises selecting responsive to a bandwidth of a link on which the file is transmitted.
8. A method according to any of claims 1-7, wherein creating the combined file comprises creating a combined file in which at least some of the descriptions of the Web pages are compressed.
9. A method according to any of claims 1-8, wherein creating the combined file comprises replacing at least one of the hypertext links in one or more of the selected pages with a script which actuates the display of the page referenced by the link.
10. A method according to claim 9, wherein replacing at least one of the hypertext links comprises replacing links which lead to one of the selected pages with a script which actuates the display of the page referenced by the link from within the combined file.
11. A method according to any of claims 1-10, wherein creating the combined file comprises replacing links to pages included in one or more other combined files with links which indicate the location of the referenced page within the other combined file.
12. A method according to any of claims 1-11, wherein creating the combined file is performed responsive to receiving the request from the client.
13. A method according to any of claims 1-11, wherein creating the combined file is performed independently of said request.
14. A method according to any of claims 1-13, wherein creating the combined file comprises detecting repeated embedded objects between the plurality of pages.
15. A method according to claim 14, wherein creating the combined file comprises detecting repeated embedded objects between the selected pages.
16. A method according to claim 15, comprising providing only one copy of said repeated object in said file.
17. A method according to any of claims 14-15, comprising providing only one copy of said repeated object as a separate file.
18. A method according to any of claims 1-17, wherein said request is an HTTP request.
19. A method according to any of claims 1-18, wherein transmission of the combined file, instead of a regular file, is transparent to a user that generates said request.
20. A method according to any of claims 1-19, wherein selecting at least a sub-group comprises selecting fewer than all the plurality of files.
21. A method according to any of claims 1-20, comprising maintaining a copy of said files on a file server associated with a storage of said combined file.
22. A method of providing information, comprising: receiving a request for a Web page, including one or more links to data elements, from a client; and transmitting to the client, in response to said request, a combined file including descriptions of the requested page and one or more of the data elements referenced by the one or more links of the Web page.
23. A method according to claim 22, comprising generating the combined file responsive to receiving the request from the client.
24. A method according to claim 22 or claim 23, wherein the combined file is generated before receiving the request from the client.
25. A method according to any of claims 22-24, wherein the one or more of the data elements referenced by the links of the Web page comprise at least one additional Web page.
26. A method according to any of claims 22-24, wherein the one or more of the data elements referenced by the links of the Web page comprise embedded objects.
27. Apparatus for web page serving, comprising: a compression unit that provides at least one combined file including the description of a plurality of WWW pages; and a web server that receives requests for WWW pages and responds with at least one of said combined files.
28. Apparatus according to claim 27, wherein said compression unit generates said file responsive to said request.
29. Apparatus according to claim 27 or claim 28, wherein said compression unit generates said file to be personalized for a particular user.
30. Apparatus according to claim 27, wherein said compression unit generates said file responsive to a request by a WWW site manager.
31. Apparatus according to claim 27 or claim 30, wherein said compression unit maintains a copy of uncompressed versions of said WWW pages.
32. Apparatus according to any of claims 27-31, wherein said compression unit comprises a grouper that selectively groups pages together based, at least, on their link structure.
33. Apparatus according to any of claims 27-32, wherein said compression unit comprises a redundancy detector that detects embedded elements repeated between said pages.
34. Apparatus according to any of claims 27-33, wherein said compression unit is integrated with a WWW site construction program.
35. Apparatus according to any of claims 27-33, wherein said compression unit is integrated with a WWW site maintaining program.
PCT/IL2000/000721 2000-01-05 2000-11-05 Coding and transmission of multiple web pages WO2001050298A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU11739/01A AU1173901A (en) 2000-01-05 2000-11-05 Coding and transmission of multiple web pages

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IL13388800A IL133888A0 (en) 2000-01-05 2000-01-05 Method and algorithm for viewing search results in the internet and multi-page system using the same
IL133888 2000-01-05

Publications (2)

Publication Number Publication Date
WO2001050298A2 true WO2001050298A2 (en) 2001-07-12
WO2001050298A3 WO2001050298A3 (en) 2003-05-30

Family

ID=11073681

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2000/000721 WO2001050298A2 (en) 2000-01-05 2000-11-05 Coding and transmission of multiple web pages

Country Status (3)

Country Link
AU (1) AU1173901A (en)
IL (1) IL133888A0 (en)
WO (1) WO2001050298A2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5802520A (en) * 1996-09-16 1998-09-01 Software Builders International, L.L.C. System and method for manipulating compressed files
US5991713A (en) * 1997-11-26 1999-11-23 International Business Machines Corp. Efficient method for compressing, storing, searching and transmitting natural language text

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CARD S K ET AL: "THE WEBBOOK AND THE WEB FORAGER: AN INFORMATION WORKSPACE FOR THE WORLD-WIDE WEB" COMMON GROUND. CHI '96 CONFERENCE PROCEEDINGS. CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS. VANCOUVER, APRIL 13 - 18, 1996, CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, NEW YORK, ACM, US, 13 April 1996 (1996-04-13), pages 111-117, XP000657809 ISBN: 0-201-94687-4 *
PIROLLI P ET AL: "Silk from a Sow's Ear: Extracting Usable Structures from the Web" XEROX RESEARCH CENTER, 11 July 1996 (1996-07-11), XP002128179 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100366072C (en) * 2003-03-27 2008-01-30 国际商业机器公司 Ultra light browser
WO2005022404A1 (en) * 2003-09-03 2005-03-10 International Business Machines Corporation Offline browsing with mobile device
KR100745431B1 (en) * 2003-09-03 2007-08-02 인터내셔널 비지네스 머신즈 코포레이션 Offline browsing with mobile device
CN100461166C (en) * 2003-09-03 2009-02-11 国际商业机器公司 Offline browsing with mobile device
US9811603B2 (en) 2003-09-03 2017-11-07 International Business Machines Corporation Transport and administration model for offline browsing
US10331755B2 (en) 2003-09-03 2019-06-25 International Business Machines Corporation Transport and administration model for offline browsing
CN102982046A (en) * 2011-09-07 2013-03-20 中国移动通信集团公司 Storage method and system for webpage data compression
CN107918638A (en) * 2016-10-11 2018-04-17 佳能株式会社 Information processor, document display method, file display system and medium
EP3309695A1 (en) * 2016-10-11 2018-04-18 Canon Kabushiki Kaisha Information processing apparatus, document display method, document display system, and program
US10572546B2 (en) 2016-10-11 2020-02-25 Canon Kabushiki Kaisha Information processing apparatus, document display method, document display system, and medium

Also Published As

Publication number Publication date
AU1173901A (en) 2001-07-16
WO2001050298A3 (en) 2003-05-30
IL133888A0 (en) 2001-04-30

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: COMMUNICATION PURSUANT TO R. 69 EPC (F. 1205A DATED 14.10.03)

122 Ep: pct application non-entry in european phase