CA2384687A1 - Method and apparatus for compressing scripting language content - Google Patents

Info

Publication number
CA2384687A1
Authority
CA
Canada
Prior art keywords
scripting language
tag
data
codewords
text
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002384687A
Other languages
French (fr)
Inventor
Robert Charles Booth
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Arris Technology Inc
Original Assignee
Individual

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2355 Processing of additional data, e.g. scrambling of additional data or processing content descriptors involving reformatting operations of additional data, e.g. HTML pages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method and apparatus for compressing scripting language content, such as HyperText Markup Language (HTML). Codewords are provided for HTML elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page. The codewords are combined (230) with translated (or coded) text (215) to provide compressed HTML data. The codewords may have reserved bits to distinguish empty tags from container tags, to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing. The amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal, is thereby reduced. Additionally, the invention allows the use of a graphics engine (156) or browser (159), e.g., in a subscriber terminal (150), that processes/renders the compressed HTML data (e.g., codewords) directly without decompressing them. The technique is compatible with other compression techniques to provide even greater compression.

Description

METHOD AND APPARATUS FOR COMPRESSING SCRIPTING LANGUAGE CONTENT
BACKGROUND OF THE INVENTION
The present invention relates to a method and apparatus for compressing scripting language content, such as HyperText Markup Language (HTML).
HTML is a system for marking documents to indicate how the document should be displayed, and how various documents should be linked together. HTML has been used extensively to provide documents (e. g., Web pages) on the Internet. The documents are organized into Web spaces, where a Web space includes a home page and links to other documents which may be in the local Web space or in an external Web space. Such links are known as hyperlinks. Documents may include moving images, text, graphical displays, and sound.
HTML is an application of Standard Generalized Markup Language (SGML), which is defined by the International Organization for Standardization (ISO) in ISO 8879:1986.
HTML specifies the grammar and syntax of markup tags which are inserted into a data file to define how the data will be presented (e.g., rendered) when read by a computer program known as a browser. The computer's browser and/or graphics engine processes the data to format a layout for the page so the page can be viewed by the user on a display terminal or device.
An SGML document includes three parts. The first part describes the character set, or codes, which are used in the language. The second part defines the document type, and which markup tags are recognized.
The third part is known as the document instance and contains the actual text and markup tags. The three parts may be stored in different files. Furthermore, HTML browsers assume that files of different pages contain a common character set and document type, so only the text and markup tags will change for different pages.
HTML elements include tags and character entities.
Character entities are predefined characters from the ISO Latin-1 alphabet that are not defined in ASCII, and characters used to mark the beginning and end of an HTML element. For example, the character entity "&lt;" designates the character "<" ("less than" sign).
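As an illustration only (the patent does not mention any particular software), Python's standard `html` module performs this mapping between characters and character entities, including Latin-1 characters that ASCII lacks:

```python
import html

# Entity reference -> character, and character -> entity reference.
print(html.unescape("&lt;I&gt;"))   # <I>
print(html.escape("<I>"))           # &lt;I&gt;
print(html.unescape("&eacute;"))    # é  (a Latin-1 character not present in ASCII)
```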
HTML tags are enclosed in angle brackets to distinguish them from the page text. The tags may appear alone (as standalone or empty tags), or may appear at the start and end of a field of the page text (as non-empty or container tags). For example, <P> is an empty tag that indicates the start of a new paragraph, while <I> and </I> are container tags that modify the contained text (e.g., <I>Welcome to my home page</I> indicates that the phrase "Welcome to my home page" should be italicized). "<I>" is the starting tag, and "</I>" is the ending tag.
Generally, HTML tags provide text formatting, hypertext links to other pages, and links to sound and picture elements. HTML tags also define input fields for interactive Web pages.
Additionally, some tags have one or more associated attributes that can be specified with the tag. For example, the tags <A> and </A> are anchor tags that define a section of text as a hyperlink, or as the target of another hyperlink. The attributes of the tag include HREF=url, NAME=name, and TITLE=text. Thus, the HTML code "<A HREF="http://www.uspto.gov">U.S. Patent and Trademark Office</A>" will cause the text "U.S. Patent and Trademark Office" to appear on a browser with special highlighting (such as a special color and/or underlining) that designates the text as a hyperlink. When the user clicks on the text, the Web address "www.uspto.gov" is accessed.
Moreover, tags can have secondary attributes, or sub-attributes. For example, the <IMG> tag is an empty tag that designates that an inline image is to be placed in a page. Its attributes include SRC=url, which specifies the URL of the file containing the image to be embedded; ALT=text, which specifies a text string that can be displayed if the image is not available; and ALIGN=[TOP|MIDDLE|BOTTOM], which identifies how the image should be aligned with the adjacent text and other HTML elements. Thus, ALIGN is an attribute, and TOP, MIDDLE and BOTTOM are sub-attributes. An example HTML code is: "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>".
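To make the tag/attribute/sub-attribute split concrete, the sketch below uses Python's standard `html.parser` module (an illustrative choice, not part of the patent) to break the example <IMG> code into its tag name and attribute/value pairs. The class name `TagCollector` is hypothetical. Note that this parser lowercases tag and attribute names and strips the quotes from values.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Hypothetical helper: records each start tag as (name, {attribute: value})."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        self.tags.append((tag, dict(attrs)))

parser = TagCollector()
parser.feed('<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>')
parser.close()
print(parser.tags)
# [('img', {'src': 'filename.GIF', 'alt': 'filename', 'align': 'middle'})]
```

Here "align" is the attribute and "middle" the sub-attribute (value) in the terminology used above.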
HTML tags and attributes are referred to herein generally as HTML "elements". Moreover, the term "attributes" generally encompasses the different levels of sub-attributes.
An HTML application is made available to users on the Web by storing the HTML file in a directory that is accessible to a server. Such a server is typically a Web server which conforms to a Web browser-supported protocol known as Hypertext Transfer Protocol (HTTP).
Alternatively, HTML content may be stored at the headend of a subscriber communication network, such as a cable/satellite television network. There is an increasing trend toward providing HTML content to subscribers via such networks due to their high data rates, the potential commercial benefits of tying in the HTML content with traditional television programming services, the expected convergence of telephone, television and computer networks, and the expected rise of in-home computer networks. The HTML content may be selected and provided directly by the headend, or the headend may merely act as a conduit in a high-speed link between the subscriber and remote Web servers.
Servers that conform to other protocols, such as the File Transfer Protocol (FTP) or Gopher, may also be accessed by an HTTP browser by using a proxy server. A proxy server is a type of gateway that allows a browser using HTTP to communicate with a server that does not understand HTTP but uses, e.g., FTP, Gopher or other protocols. The proxy server accepts HTTP requests from the browser and translates them into a format that is suitable for the origin server, such as an FTP request. Similarly, the proxy server translates FTP replies from the server into HTTP replies so that the browser can understand them.

Generally, the FTP file itself is not translated.
FTP is a high-level protocol for transferring files (as is HTTP). This translation occurs at the protocol level. For example, a client browser may send the HTTP request 'GET http://www.myserver.com/somefile.txt HTTP/1.1'. This would be translated at a proxy into an FTP 'GET' request to be forwarded to the FTP origin server. The FTP response from the origin server back to the proxy (which has the requested file attached) is then translated (at the proxy) into an HTTP response that includes the attached file. The file being transferred is not translated or modified. However, in some cases, the browser may indicate that it can decode certain encoding or compression formats. In that case, the proxy may encode or compress the attached file before it is transmitted to the client.
The proxy server can be a program running on the same machine as the browser, or a free-standing machine somewhere in a network that serves many browsers.
For example, the headend of a subscriber communication network may provide a proxy server function.
HTTP defines a set of rules that servers and browsers follow when communicating with each other.
Typically, the process begins when a user clicks on an icon in an HTML page which is the anchor of a hyperlink, or the user types in a Uniform Resource Locator (URL). The URL contains a host name that is typically resolved into an IP address via a domain name system (DNS) lookup. A connection is then made to the host server using the IP address (and possibly a port number) returned by the DNS lookup. Next, the browser sends a request to retrieve an object from the server, or to post data to an object on the server. The server sends a response to the browser including a status code and the response data. The connection between the browser and server is then closed.
A URL is a unique address that can identify virtually any file or resource on the Internet.
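The request sequence above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the patent: the helper name `build_get_request` is hypothetical, only plain `http://` URLs are handled, and the `Connection: close` header is used to mirror the "connection is then closed" step. It splits the URL into the host name (to be resolved by DNS), the port and the path, and composes the HTTP/1.1 request bytes.

```python
from urllib.parse import urlsplit

def build_get_request(url: str):
    """Hypothetical helper: return (host, port, request bytes) for a plain-HTTP GET."""
    parts = urlsplit(url)
    host = parts.hostname              # name that a DNS lookup resolves to an IP address
    port = parts.port or 80            # default HTTP port when the URL names none
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"        # ask the server to close the connection after replying
        "\r\n"
    ).encode("ascii")
    return host, port, request

host, port, request = build_get_request("http://www.myserver.com/somefile.txt")
print(host, port)
print(request.decode("ascii"))
```

With the standard `socket` module, `socket.getaddrinfo(host, port)` would perform the DNS lookup and `socket.create_connection((host, port))` would open the connection over which `request` is sent; those network steps are left out so the sketch runs offline.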
However, due to the flexibility of HTML, and the variety of tags, attributes and sub-attributes that are supported, the amount of data needed to represent any given Web page can be very large. Accordingly, the processing power available in a user's terminal and browser may not be sufficient to keep up with the flow of data, resulting in undesirable delays in rendering the data on the user's screen, or other problems.
Moreover, an increasing amount of bandwidth for transmitting the HTML data is consumed, thereby reducing the available bandwidth for other uses, or taxing the capacity of the channel.
The HTML data may be transmitted via a Public Switched Telephone Network (PSTN), via a cable or satellite television network, via a local wireless network, or via a combination of the above, for example.
In particular, the base character set for HTML is Latin-1 (ISO 8859-1), which is an eight-bit alphabet with characters for most American and European languages. The 128-character standard ASCII (ISO 646) is a seven-bit subset of Latin-1. For simplicity and compatibility with different browsers, many Web pages include only an ASCII character set.
With eight bits or one byte of data required for each character (including a letter, number, punctuation symbol or blank space), for example, the HTML code:
"<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>"
has 52 characters, or 52 bytes of data.
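The byte count can be checked directly. This short Python sketch (illustrative, not from the patent) encodes the example tag with one byte per character, as assumed in the text:

```python
# One byte per character: Latin-1 (and therefore ASCII) text encodes each character as one byte.
img_tag = '<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>'
size_bytes = len(img_tag.encode("latin-1"))
print(size_bytes)  # 52
```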
Accordingly, it would be desirable to provide a system for compressing scripting language content such as HTML or any similar language.
The system should reduce the amount of bandwidth required to communicate HTML data to a browser or other (graphics) rendering engine.
The system should be suitable for use with existing networks over which HTML data is communicated.
The system should allow a browser that is implemented in a terminal (e.g., a set-top box/decoder) in a subscriber television network to directly process and render the compressed data without decompressing it.
The system should reduce the required processing power of a browser in a user terminal in a subscriber television network.
The system should provide a consistent and deterministic processing time for all HTML elements and attributes within a given page.
The system should be usable on a client/browser side or server side of a network.
The system should be usable on a proxy server that interfaces between a client/browser and a server, or other proxy servers.
The system should be compatible with networks that communicate HTML data using a digital video communication protocol, such as MPEG-2.
The system should be compatible with networks that communicate HTML data using the Transmission Control Protocol/Internet Protocol (TCP/IP).
The system should provide compression for current versions of HTML, as well as derivations thereof and other analogous markup languages.
The system should be compatible with other bit level compression techniques.
The present invention provides a system having the above and other advantages.
SUMMARY OF THE INVENTION
The present invention relates to a method and apparatus for compressing scripting language content, such as HTML.
Codewords are provided for HTML or other scripting language elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page.
Moreover, the codeword may have reserved bits to distinguish empty tags from container tags, and to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing. The technique is compatible with other compression techniques to provide even greater compression.
The invention provides a significant reduction in the amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal. Moreover, the invention enables the use of a graphics engine or browser at the subscriber terminal that processes/renders the compressed HTML data directly, without decompressing it, thereby resulting in significant savings in processing time and complexity.
A particular method for processing scripting language data includes the step of parsing the HTML data to separate text thereof from scripting language elements thereof. The scripting language elements include tags and their attributes, if any. Respective codewords, such as two-byte codewords, are provided for each different tag. The text is coded, such as with ASCII codes. The codewords are then combined with the coded text in the appropriate sequence to provide compressed scripting language data.
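A minimal sketch of the parsing step, written in Python (our choice; the patent names no language). The regular expression and the token format are assumptions, and the tokenizer is deliberately naive: it ignores comments and would be confused by a ">" inside a quoted attribute value, so a real implementation would use a full HTML tokenizer. It separates the data into text runs and tag tokens while preserving their order, which is what the later combining step needs.

```python
import re

# Assumed pattern: "<", optional "/", a tag name, then anything up to ">" (the attributes).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>")

def split_html(data):
    """Split HTML into ("text", str) and ("tag", NAME, is_end, attributes) tokens, in order."""
    tokens = []
    pos = 0
    for m in TAG_RE.finditer(data):
        if m.start() > pos:                       # text between the previous tag and this one
            tokens.append(("text", data[pos:m.start()]))
        tokens.append(("tag", m.group(2).upper(), m.group(1) == "/", m.group(3).strip()))
        pos = m.end()
    if pos < len(data):                           # trailing text after the last tag
        tokens.append(("text", data[pos:]))
    return tokens

print(split_html("<I>Welcome to my home page</I>"))
```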
The codewords may have reserved bits to designate specific information, such as whether the associated tag is an empty tag or a container tag.
For container tags, the codeword may designate whether the container tag is a starting tag or an ending tag.
The codewords may designate whether a tag is a style markup tag or a structural markup tag.
For structural markup tags, the codeword may designate whether the structural markup tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup tag.
A respective codeword may be provided for each different attribute of a tag, including sub-attributes.
The codewords may also indicate the number of attributes that are associated with a tag.
In a particularly advantageous implementation, the compressed scripting language data is communicated from a scripting language content server or headend to a subscriber terminal in a communication network.
For decompression of the compressed scripting language data, e.g., at a subscriber terminal, the compressed scripting language data is parsed to separate the coded text thereof from the codewords thereof.
The respective scripting language elements are provided for each corresponding different codeword, and the coded text is decoded to provide decoded text. Lastly, the scripting language elements are combined with the decoded text to provide the uncompressed scripting language data.
Optionally, the compressed scripting language data is communicated to a subscriber terminal in a communication network, and processed without recovering the scripting language elements to provide data suitable for display. Thus, the codewords are processed directly.
In addition, the compressed data for content that is accessed frequently by subscriber terminals may advantageously be cached (e.g., temporarily stored) in a proxy server.
A corresponding apparatus is also disclosed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.
FIG. 2 illustrates HTML compression in accordance with the present invention.
FIG. 3 illustrates HTML decompression in accordance with the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a method and apparatus for compressing scripting language content, such as HTML.
FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.
Although the invention may be implemented in a variety of networks, it is particularly suitable for use in subscriber television networks that allow users (subscribers) to access HTML data, such as on the Internet. Typically, the user can access HTML content, such as Web pages, that is delivered via a downstream channel on the network. For example, a variety of techniques can be used to deliver HTML data via cable and satellite television networks. The user is typically provided with an upstream link via a conventional telephone network to enter commands, such as a URL address to request to view a particular Web page. Some cable television networks have an upstream user data channel that can be used for this purpose.
The request is received at a headend or other central location, and forwarded to the content server that is designated by the URL. The content returned by the server to the headend is then prepared for transport to the user. For example, the HTML data may be encapsulated in digital MPEG-2 packets that are in-band or out-of-band with programming service data (e.g., television programs, audio, etc.).
Or, the HTML data may be carried in the vertical blanking interval (VBI) of a digital or analog television signal.
The invention is compatible with essentially any communication technique for providing the HTML data to the end user.
The HTML content is subsequently recovered at the user's terminal and rendered by a browser application or graphics processing engine for viewing on a video monitor, such as a television or computer monitor.
The headend may act as a proxy server when interacting with the content server, e.g., when the URL request from the user is in a format that is not compatible with the content server. In this case, the proxy server converts the URL request into the necessary format, and converts the content returned by the server into a format that the user's terminal can understand.
FIG. 1 shows an example embodiment wherein a network 100 includes a content server 110, a headend 130, and a user terminal 150. The content server 110 is representative of any number of available origin or proxy servers that store HTML data in a computer network such as the Internet.
Similarly, the user terminal 150 is representative of a population of terminals that can receive broadcast signals from a common service provider, such as the headend 130 in a cable/optical fiber or satellite television network.
An optional upstream channel 160, such as a conventional telephone link and modems, allows the terminal 150 to communicate directly with content servers.
A channel 162 is used by the headend 130, e.g., to broadcast programming services from function 136 (such as television programs, weather and stock data, shop-at-home data and the like) to a subscriber terminal population, including the example terminal 150. HTML content is also communicated to the terminal 150 via the one-way or bi-directional channel 162. The channel 162 may physically be implemented as coaxial cable, a satellite link, optical fiber, a local wireless channel (such as a multichannel multipoint distribution service, MMDS), or a telephone link, for example, or a combination thereof.
A channel 164 allows the headend 130 and the example content server 110 to communicate with each other. This channel typically is implemented as a telephone link or Ethernet network. The server 110 is generally remote from the headend 130, although it is possible for the headend to store HTML content on local storage media, such as a digital video disc or magnetic tape, or on a hard drive of a file server.
Known networking architectures are used to provide the channel 164.
When the headend 130 provides the content locally, a limited amount of content is provided. The content may be selected to correspond to the programming services. In this case, a graphic may be overlaid with a television program to inform the user that related HTML content is available. For example, during a televised baseball game, the user can be directed to a Web site for baseball scores.
In some cases, the entire local content may be continually or periodically broadcast, e.g., on the same channel (or multiplex) as the programming service, or on a separate channel (or multiplex). This may occur on one-way only networks where the user has no upstream link to the headend. The selection of the desired HTML content then occurs at the user terminal 150.
Known conditional access techniques may be used to provide access to the HTML content on a fee basis.
The present invention is suitable with any of the above scenarios.
In the example FIG. 1, it is assumed that the user has some upstream channel (either 160 or 162) to cause selected HTML content to be recovered from the content server 110 and provided to the terminal 150 via the headend 130.
The content server 110, headend 130, and terminal 150 are shown with HTML compression functions 112, 132 and 152, respectively, and HTML decompression functions 114, 134 and 154, respectively. Not all of these functions are required, however.
Generally, compression of the HTML data provided to the terminal is most important. The HTML data output from the terminal, if any, is generally small.
This can vary, however, for example, if the user is sending HTML content to another user, or is authorized to send HTML content to modify the remote server 110.
The compression function 152 is used to compress HTML data transmitted from the terminal 150 to the headend 130 or the content server 110. The decompression function 154 is used to decompress compressed HTML data received from the headend 130 or content server 110.
The compression function 132 is used to compress HTML data transmitted from the headend 130 to the content server 110 or the terminal 150. The decompression function 134 is used to decompress compressed HTML data received from the content server 110 or the terminal 150.
The compression function 112 is used to compress HTML data transmitted from the content server 110 to the headend 130 or the terminal 150. The decompression function 114 is used to decompress compressed HTML data received from the headend 130 or the terminal 150.
The terminal 150 includes a user interface 158 for receiving user commands, e.g., via a keyboard or infra-red remote control. For example, the user may click on a graphic on the display 170 that is associated with a URL, to initiate the downloading of the corresponding HTML content to the terminal 150.
A browser 159 may be a full-featured browser application such as used on a personal computer, or a minimal browser that has only some basic functionality, such as text rendering or limited graphics rendering capabilities. The browser 159 is used in conjunction with the graphics engine 156 for rendering text and images for the display 170 from the HTML content received at the terminal 150.
A video decoder 157 may be used for rendering video, associated with the compressed (or uncompressed) scripting language content, for the display 170.
The display 170 may be a television screen or a video monitor for a PC, for example.
The processing power of the terminal 150 will dictate the level of features that can be supported by the browser 159 and the graphics engine 156.
The compression functions 112, 132 and 152 can implement an HTML compression scheme as shown in FIG. 2, while the decompression functions 114, 134 and 154 can implement an HTML decompression scheme as shown in FIG. 3.
FIG. 2 illustrates HTML compression in accordance with the present invention. The compression function 200 corresponds to the compression functions 112, 132, 152 of FIG. 1. A buffer/parser 210 receives uncompressed HTML data. Note that the HTML data may reference locations where audio, video or graphics data can be found.
The text is parsed and provided to a conventional text coding function 215 to provide coded text, e.g., as ASCII data.
The HTML elements, such as tags, including their attributes, sub-attributes, sub-sub-attributes, if any, and so forth, are parsed and provided to a compression function 220, which optionally has a look-up table 225 that can be implemented using known techniques. The look-up table 225 associates a codeword with each HTML element (tag and attribute). The length of the codeword should be selected based on the number of different tags and attributes that are to be coded. A sixteen-bit (two-byte) codeword is believed to be appropriate to handle the existing tags while also allowing for future growth.
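A sketch of a look-up table like table 225, and of sizing the codeword from the number of elements to be coded. The element list below is hypothetical and far from complete, and assigning codes by list position is an assumption; the patent does not specify how codes are allocated.

```python
# Hypothetical element list: a few tags followed by a few attributes.
ELEMENTS = ["A", "B", "BLOCKQUOTE", "BR", "I", "IMG", "P",      # tags
            "ALIGN", "ALT", "HREF", "NAME", "SRC", "TITLE"]     # attributes

CODEWORD = {name: index for index, name in enumerate(ELEMENTS)}  # element -> codeword
ELEMENT = {code: name for name, code in CODEWORD.items()}        # codeword -> element

def bits_needed(count: int) -> int:
    """Minimum codeword width, in bits, that gives `count` elements distinct codes."""
    return max(1, (count - 1).bit_length())

print(bits_needed(len(ELEMENTS)))  # 4: thirteen elements fit in sixteen codes
print(2 ** 16)                     # 65536 distinct sixteen-bit codewords
```

The gap between the bits strictly needed and the sixteen bits available is what leaves room for future elements and for reserved flag bits.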
Moreover, it is possible to reserve one or more of the sixteen bits to designate whether the tag is an empty tag or a container tag. For example, the most significant bit can be selected. For container tags, one or more other reserved bits can also designate whether the tag is a starting tag or ending tag.
Other information that can be designated is whether the tag is a style markup tag or a structural markup tag. Generally, style markup tags (designating bold style, font, quoted text, and so forth) can be used within structural markup tags (designating lists, tables, anchors, and so forth), while the opposite is not recommended.
For structural markup tags, the codeword can designate whether the tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup, for example.
A codeword can also indicate the number of attributes that are associated with each tag. The number of bits reserved for this purpose should correspond to the maximum expected number of attributes. For example, three bits can represent eight distinct attribute counts, e.g., from zero to seven.
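The reserved bits can be handled with ordinary masks and shifts. The field layout below is a hypothetical example, not the patent's specification, and it includes only some of the fields the text lists (no style/structural or category bits). Bit 15 is used here as a codeword marker instead of the empty/container flag, on the assumption that text is 7-bit ASCII, so that codewords can later be told apart from text bytes.

```python
# Hypothetical 16-bit codeword layout (illustrative only):
#   bit 15      always 1: marks a codeword (text is assumed to be 7-bit ASCII)
#   bit 14      1 = container tag, 0 = empty tag
#   bit 13      1 = ending tag, 0 = starting tag (container tags only)
#   bits 10-12  number of attributes, 0..7
#   bits 0-9    element ID, 0..1023
MARKER, CONTAINER, ENDING = 1 << 15, 1 << 14, 1 << 13
ATTR_SHIFT, ATTR_MASK, ID_MASK = 10, 0b111, 0x3FF

def pack(element_id, container, ending=False, n_attrs=0):
    """Build a codeword from its fields, rejecting values that do not fit."""
    if not 0 <= element_id <= ID_MASK:
        raise ValueError("element ID needs more than 10 bits")
    if not 0 <= n_attrs <= ATTR_MASK:
        raise ValueError("at most 7 attributes fit in 3 bits")
    if ending and not container:
        raise ValueError("only container tags have ending tags")
    word = MARKER | (n_attrs << ATTR_SHIFT) | element_id
    if container:
        word |= CONTAINER
    if ending:
        word |= ENDING
    return word

def unpack(word):
    """Read the fields back out of a codeword."""
    return {
        "element_id": word & ID_MASK,
        "container": bool(word & CONTAINER),
        "ending": bool(word & ENDING),
        "n_attrs": (word >> ATTR_SHIFT) & ATTR_MASK,
    }

print(hex(pack(3, container=False, n_attrs=3)))  # 0x8c03, e.g. an empty IMG tag with 3 attributes
print(unpack(0xE004))
```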
Generally, bits should be reserved in the codeword to designate characteristics of the tag to the extent that this aids in rendering of the HTML data. For example, the designation of starting and ending container tags is useful because it signals to a processor the bounds of the text to be modified.
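A sketch of how a renderer might use these bits directly, without rebuilding HTML text. The stream format is an assumption: two-byte big-endian codewords whose top bit is set (bit 15 marks a codeword, bit 13 marks an ending tag, bits 0-9 hold the element ID), interleaved with 7-bit ASCII text. The IDs 3 (italic) and 4 (bold) come from a hypothetical look-up table. The function tracks nesting depth per style and emits styled text runs.

```python
# Hypothetical layout: bit 15 = codeword marker, bit 13 = ending tag, bits 0-9 = element ID.
ENDING, ID_MASK = 0x2000, 0x3FF
ITALIC_ID, BOLD_ID = 3, 4          # hypothetical IDs for I and B

def render_runs(stream: bytes):
    """Turn a codeword stream into (text, italic, bold) runs by reading flag bits."""
    depth = {ITALIC_ID: 0, BOLD_ID: 0}
    runs = []
    i = 0
    while i < len(stream):
        if stream[i] & 0x80:                       # top bit set: two-byte codeword
            word = int.from_bytes(stream[i:i + 2], "big")
            element = word & ID_MASK
            if element in depth:                   # starting tag opens, ending tag closes
                depth[element] += -1 if word & ENDING else 1
            i += 2
        else:                                      # ASCII text up to the next codeword
            j = i
            while j < len(stream) and not stream[j] & 0x80:
                j += 1
            runs.append((stream[i:j].decode("ascii"), depth[ITALIC_ID] > 0, depth[BOLD_ID] > 0))
            i = j
    return runs

print(render_runs(b"Hello \xc0\x03world\xe0\x03"))
```

The renderer tests two bits and an integer ID instead of matching tag-name strings, which is the kind of processing saving the text describes.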
For example, with eight bits of data required for each character (including a letter, number, punctuation symbol or blank space), the HTML code "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>" has 52 characters, or 52 bytes of data. By substituting a two-byte codeword for each of the HTML elements "IMG", "SRC", "ALT" and "ALIGN", the fourteen bytes needed to code these elements are reduced to eight bytes, a savings of six bytes. In a given HTML page, the savings provided by the present invention increase with the length of the elements (e.g., compare "<BLOCKQUOTE>", which is reduced from twelve bytes to two, with "<A>", which is reduced from three bytes to two) and with the number of elements in the page.
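The arithmetic in the previous paragraph, as a short Python check. The helper name `element_savings` is hypothetical; it assumes one byte per character and a two-byte codeword per element, as the text does.

```python
CODEWORD_BYTES = 2

def element_savings(names):
    """Return (original bytes, coded bytes, bytes saved) for a list of element strings."""
    original = sum(len(name) for name in names)   # one byte per character
    coded = CODEWORD_BYTES * len(names)           # one two-byte codeword per element
    return original, coded, original - coded

print(element_savings(["IMG", "SRC", "ALT", "ALIGN"]))  # (14, 8, 6)
print(element_savings(["<BLOCKQUOTE>"]))                # (12, 2, 10)
print(element_savings(["<A>"]))                         # (3, 2, 1)
```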
For each element, a codeword is output from the compression function 220 and provided to a combiner 230 to be combined with the coded text in the appropriate sequence to provide compressed HTML data in accordance with the present invention. This data comprises text codes for the text, and codewords from the compression function 220 for the HTML elements.
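A minimal sketch of the compression path (parser, text coding, look-up table and combiner) in Python, under several assumptions that the patent does not specify. Text is 7-bit ASCII; each tag becomes a two-byte big-endian codeword with bit 15 set as a marker, bit 14 for container tags and bit 13 for ending tags; element IDs come from a small hypothetical table. Attributes are not handled, to keep the sketch short, and an unknown tag raises `KeyError`.

```python
import re

# Hypothetical codeword fields: bit 15 = codeword marker, bit 14 = container, bit 13 = ending tag.
MARKER, CONTAINER, ENDING = 0x8000, 0x4000, 0x2000
TAG_IDS = {"P": 1, "BR": 2, "I": 3, "B": 4, "BLOCKQUOTE": 5}   # hypothetical look-up table
CONTAINER_TAGS = {"I", "B", "BLOCKQUOTE"}
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")           # attribute-free tags only

def compress_html(data: str) -> bytes:
    """Interleave ASCII-coded text with two-byte codewords, preserving order."""
    out = bytearray()
    pos = 0
    for m in TAG_RE.finditer(data):
        out += data[pos:m.start()].encode("ascii")  # coded text before this tag
        name = m.group(2).upper()
        word = MARKER | TAG_IDS[name]
        if name in CONTAINER_TAGS:
            word |= CONTAINER
            if m.group(1):                          # "</...>" is an ending tag
                word |= ENDING
        out += word.to_bytes(2, "big")              # two-byte codeword for the element
        pos = m.end()
    out += data[pos:].encode("ascii")               # trailing text
    return bytes(out)

packed = compress_html("<BLOCKQUOTE>Quoted</BLOCKQUOTE>")
print(packed, len(packed))
```

The marker bit is one way, assumed here, to let a decoder tell codewords from text in a single stream; it relies on the text never using bytes above 0x7F.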
Note that additional, known compression techniques, such as the Lempel-Ziv algorithm and Huffman coding, can be applied to the compressed HTML data output from the combiner 230, to the coded text alone, or to the codewords alone. Moreover, associated video/audio data may be compressed using known techniques.
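As one concrete possibility, Python's zlib module implements DEFLATE, which combines LZ77 (a Lempel-Ziv variant) with Huffman coding, and can serve as such a second stage. The sample stream below is a hypothetical stand-in for combiner output: a table row of 0x01-prefixed two-byte codewords around a short text run, repeated as it might be in a long HTML table.

```python
import zlib


def second_stage_compress(stream: bytes, level: int = 9) -> bytes:
    """Apply DEFLATE (LZ77 plus Huffman coding) to already-compressed HTML."""
    return zlib.compress(stream, level)


def second_stage_decompress(data: bytes) -> bytes:
    """Undo the DEFLATE stage, returning the codeword/text stream unchanged."""
    return zlib.decompress(data)


# Hypothetical combiner output: one table row (start/end codewords around the
# text "cell"), repeated 100 times.
row = b"\x01\x00\x05\x01\x00\x06cell\x01\x80\x06\x01\x80\x05"
stream = row * 100

packed = second_stage_compress(stream)
assert second_stage_decompress(packed) == stream
print(len(stream), len(packed))
```

The same two functions could instead be applied to the coded text or to the codewords separately; which arrangement compresses better depends on the page content.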
FIG. 3 illustrates HTML decompression in accordance with the present invention. The decompression function 300 corresponds to the decompression functions 114, 134, 154 of FIG. 1.
Here, the compressed HTML is received at a buffer/parser 310. The coded text comprising text data is provided to a text decoding function 315 to recover the text, which is then provided to a combiner 330, while the HTML codewords are provided to a decompression function 320. A look-up table 325 at the decompression function 320 associates an HTML element with each received codeword. The corresponding elements are output to the combiner 330 to form the uncompressed HTML data.
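The decompression path can be sketched in Python as follows, assuming a hypothetical byte format: 0x01 followed by a big-endian two-byte codeword marks an HTML element, with bit 15 set for an ending tag; 0x02 followed by a one-byte length and UTF-8 bytes carries an attribute value; every other byte is UTF-8 text. The two dictionaries stand in for look-up table 325, and their contents are illustrative only.

```python
import struct

# Hypothetical contents of look-up table 325: codeword -> HTML element.
TAGS = {0x0001: "p", 0x0002: "b", 0x0003: "img", 0x0004: "a"}
ATTRS = {0x0101: "src", 0x0102: "alt", 0x0103: "align", 0x0104: "href"}
END_BIT = 0x8000   # hypothetical ending-tag flag
CW_MARK, VAL_MARK = 0x01, 0x02


def decompress_html(data: bytes) -> str:
    """Separate codewords from coded text and rebuild the HTML."""
    out = []
    text = bytearray()
    in_start_tag = False  # attributes may still follow the last start tag

    def flush_text() -> None:
        if text:
            out.append(text.decode("utf-8"))
            text.clear()

    def finish_start_tag() -> None:
        nonlocal in_start_tag
        if in_start_tag:
            out.append(">")
            in_start_tag = False

    i = 0
    while i < len(data):
        byte = data[i]
        if byte == CW_MARK:
            (word,) = struct.unpack_from(">H", data, i + 1)
            i += 3
            if word in ATTRS:
                out.append(" " + ATTRS[word])
                continue
            flush_text()
            finish_start_tag()
            if word & END_BIT:
                out.append(f"</{TAGS[word & 0x7FFF]}>")
            else:
                out.append(f"<{TAGS[word]}")
                in_start_tag = True
        elif byte == VAL_MARK:
            length = data[i + 1]
            value = data[i + 2:i + 2 + length].decode("utf-8")
            out.append(f'="{value}"')
            i += 2 + length
        else:
            finish_start_tag()
            text.append(byte)  # coded text byte
            i += 1
    flush_text()
    finish_start_tag()
    return "".join(out)


print(decompress_html(b"\x01\x00\x01Hello \x01\x00\x02world\x01\x80\x02\x01\x80\x01"))
# <p>Hello <b>world</b></p>
```

Tag and attribute names come back in lower case and attribute values are always quoted, so the recovered HTML is equivalent to, though not byte-identical with, the original; for example, ALIGN=middle is rebuilt as align="middle".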
Accordingly, it can be seen that the present invention provides a method and apparatus for compressing scripting language content, such as HTML.
Codewords are provided for HTML elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page. Moreover, the codeword may have reserved bits to distinguish empty tags from container tags, and to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing. The technique is compatible with other compression techniques to provide even greater compression.
The invention provides a significant reduction in the amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal. Additionally, the invention allows a graphics engine or browser, e.g., in a subscriber terminal, to process and render the compressed HTML data (e.g., the codewords) directly without decompressing it. This can provide significant savings in processing time and complexity.
Additionally, each codeword has the same length and therefore generally takes the same amount of time to process, so the processing time becomes more deterministic.
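The following Python sketch illustrates how a terminal might act on codewords directly, without rebuilding tag names. It tracks which text styles are active from a stream containing only fixed-length two-byte codewords, as could be separated from the text by a parser; the codeword values and the ending-tag bit are hypothetical. Because every codeword has the same length, the loop advances in equal two-byte steps and handles each codeword with a single dictionary look-up.

```python
import struct

# Hypothetical codewords for two style markup tags; bit 15 marks an ending tag.
BOLD, ITALIC = 0x0002, 0x0005
END_BIT = 0x8000
STYLE_HANDLERS = {BOLD: "bold", ITALIC: "italic"}


def track_styles(codewords: bytes):
    """Return the set of active styles after each handled codeword.

    Codewords are consumed in fixed two-byte steps and dispatched by a single
    dictionary look-up; no tag names are reconstructed or compared.
    """
    active = set()
    states = []
    for (word,) in struct.iter_unpack(">H", codewords):
        style = STYLE_HANDLERS.get(word & 0x7FFF)
        if style is None:
            continue  # codeword not relevant to this renderer
        if word & END_BIT:
            active.discard(style)
        else:
            active.add(style)
        states.append(set(active))
    return states


# <B><I> ... </I></B> expressed as four codewords.
print(track_styles(b"\x00\x02\x00\x05\x80\x05\x80\x02"))
```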
The techniques of the present invention may be implemented using any known hardware, software and/or firmware.
Although the invention has been described in connection with various specific embodiments, those skilled in the art will appreciate that numerous adaptations and modifications may be made thereto without departing from the spirit and scope of the invention as set forth in the claims.
For example, while the invention was discussed in connection with cable or satellite television broadband communication networks, it will be appreciated that other networks such as local area networks (LANs), metropolitan area networks (MANs), wide area networks (WANs), internets, intranets, and the Internet, or combinations thereof, may be used.
Moreover, the invention is suitable for use in compressing any scripting language content, including HTML or any similar language (e.g., Extensible Markup Language (XML) or Synchronized Multimedia Integration Language (SMIL)).

Claims (28)

What is claimed is:
1. A method for processing scripting language data, comprising the steps of:
(a) parsing the scripting language data to separate text thereof from scripting language elements thereof, said scripting language elements including tags;
(b) providing a respective codeword for each different tag;
(c) coding the text to provide coded text; and
(d) combining the codewords with the coded text to provide compressed scripting language data.
2. The method of claim 1, wherein:
at least one of the codewords designates whether the associated tag is an empty tag or a container tag.
3. The method of claim 1, wherein, for container tags, at least one of the corresponding codewords designates whether the container tag is a starting tag or an ending tag.
4. The method of claim 1, wherein:
at least one of the codewords designates whether the corresponding tag is a style markup tag or a structural markup tag.
5. The method of claim 1, wherein, for structural markup tags, at least one of the corresponding codewords designates whether the structural markup tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup tag.
6. The method of claim 1, wherein said scripting language elements include attributes of the tags, comprising the further step of:
providing a respective codeword for each different attribute.
7. The method of claim 1, wherein:
said scripting language elements include attributes of the tags; and the respective codewords indicate a number of attributes that are associated with each tag.
8. The method of claim 1, comprising the further step of:
communicating the compressed scripting language data from a scripting language content server to a subscriber terminal in a communication network.
9. The method of claim 1, comprising the further step of:
communicating the compressed scripting language data from a headend to a subscriber terminal in a communication network.
10. The method of claim 1, comprising the further steps of:
communicating the compressed scripting language data to a subscriber terminal in a communication network; and
processing the compressed scripting language data without recovering the scripting language elements to provide data suitable for display.
11. The method of claim 1, comprising the further steps of:
(d) parsing the compressed scripting language data to separate the coded text from the codewords thereof;
(e) decoding the coded text to provide decoded text;
(f) providing the respective scripting language elements for each corresponding different codeword obtained in said step (d); and
(g) combining the scripting language elements provided in said step (f) with the decoded text to provide uncompressed scripting language data.
12. The method of claim 11, wherein:
the uncompressed scripting language data is processed by said steps (d)-(g) at a subscriber terminal in a communication network.
13. The method of claim 1, wherein:
the scripting language data comprises Hyper Text Markup Language (HTML) data.
14. The method of claim 1, comprising the further step of:
temporarily storing the compressed scripting language data in a proxy server for scripting language content that is accessed frequently by subscriber terminals.
15. An apparatus for processing scripting language data, comprising:
a first parser for parsing the scripting language data to separate text thereof from scripting language elements thereof, said scripting language elements including tags;
first means for providing a respective codeword for each different tag;
means for coding the text to provide coded text; and
a first combiner for combining the codewords with the coded text to provide compressed scripting language data.
16. The apparatus of claim 15, wherein:
at least one of the codewords designates whether the associated tag is an empty tag or a container tag.
17. The apparatus of claim 15, wherein, for container tags, at least one of the corresponding codewords designates whether the container tag is a starting tag or an ending tag.
18. The apparatus of claim 15, wherein:
at least one of the codewords designates whether the corresponding tag is a style markup tag or a structural markup tag.
19. The apparatus of claim 15, wherein, for structural markup tags, at least one of the corresponding codewords designates whether the structural markup tag is a block element, list element, table element, form element, hypertext link, inline image, or page markup tag.
20. The apparatus of claim 15, wherein said scripting language elements include attributes of the tags, further comprising:
means for providing a respective codeword for each different attribute.
21. The apparatus of claim 15, wherein:
said scripting language elements include attributes of the tags; and the respective codewords indicate a number of attributes that are associated with each tag.
22. The apparatus of claim 15, wherein:
the compressed scripting language data is communicated from a scripting language content server to a subscriber terminal in a communication network.
23. The apparatus of claim 15, wherein:
the compressed scripting language data is communicated from a headend to a subscriber terminal in a communication network.
24. The apparatus of claim 15, wherein the compressed scripting language data is communicated to a subscriber terminal in a communication network, further comprising:
at least one processor for processing the compressed scripting language data without recovering the scripting language elements to provide data suitable for display.
25. The apparatus of claim 15, further comprising:
a second parser for parsing the compressed scripting language data to separate the coded text thereof from the codewords thereof;
second means for providing the respective scripting language elements for each corresponding different codeword obtained from said second parser;
means for decoding the coded text to provide decoded text; and
a second combiner for combining the scripting language elements provided by said second means with the decoded text to provide uncompressed scripting language data.
26. The apparatus of claim 25, wherein:
the uncompressed scripting language data is processed by said second parser, second means, means for decoding, and second combiner at a subscriber terminal in a communication network.
27. The apparatus of claim 15, wherein:
the scripting language data comprises Hyper Text Markup Language (HTML) data.
28. The apparatus of claim 15, further comprising:
means for temporarily storing the compressed scripting language data in a proxy server for scripting language content that is accessed frequently by subscriber terminals.
CA002384687A 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content Abandoned CA2384687A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US39383599A 1999-09-10 1999-09-10
US09/393,835 1999-09-10
PCT/US2000/040754 WO2001019052A2 (en) 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content

Publications (1)

Publication Number Publication Date
CA2384687A1 true CA2384687A1 (en) 2001-03-15

Family

ID=23556435

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002384687A Abandoned CA2384687A1 (en) 1999-09-10 2000-08-25 Method and apparatus for compressing scripting language content

Country Status (5)

Country Link
EP (1) EP1279267A2 (en)
AU (1) AU8035100A (en)
CA (1) CA2384687A1 (en)
TW (1) TW473673B (en)
WO (1) WO2001019052A2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1358751A2 (en) * 2001-01-26 2003-11-05 Pogo Mobile Solutions Limited Improvements in or relating to wireless communication systems
AU2002237460A1 (en) * 2002-02-28 2003-09-09 Nokia Corporation Http message compression
CN1751492B (en) 2003-02-14 2011-10-26 捷讯研究有限公司 System and method of compact messaging in network communications
JPWO2004079586A1 (en) * 2003-03-07 2006-06-08 シャープ株式会社 Data conversion method capable of optimal processing of markup language
NZ566291A (en) 2008-02-27 2008-12-24 Actionthis Ltd Methods and devices for post processing rendered web pages and handling requests of post processed web pages
US20130297728A1 (en) * 2012-05-01 2013-11-07 Qualcomm Iskoot, Inc. Selectively exchanging metadata in a wireless communications system
FR2988497A1 (en) * 2012-05-04 2013-09-27 Sagemcom Energy & Telecom Sas XML type message server for use in communication system, has compressing unit compressing XML type message, and decompressing unit that is utilized for decompressing message compressed in format of XML type

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5668548A (en) * 1995-12-28 1997-09-16 Philips Electronics North America Corp. High performance variable length decoder with enhanced throughput due to tagging of the input bit stream and parallel processing of contiguous code words
WO1997034240A1 (en) * 1996-03-15 1997-09-18 University Of Massachusetts Compact tree for storage and retrieval of structured hypermedia documents
US6018764A (en) * 1996-12-10 2000-01-25 General Instrument Corporation Mapping uniform resource locators to broadcast addresses in a television signal
JP3859313B2 (en) * 1997-08-05 2006-12-20 富士通株式会社 Tag document compression apparatus and restoration apparatus, compression method and restoration method, compression / decompression apparatus and compression / decompression method, and computer-readable recording medium recording a compression, decompression or compression / decompression program
EP0928070A3 (en) * 1997-12-29 2000-11-08 Phone.Com Inc. Compression of documents with markup language that preserves syntactical structure
JP4003854B2 (en) * 1998-09-28 2007-11-07 富士通株式会社 Data compression apparatus, decompression apparatus and method thereof
GB9911099D0 (en) * 1999-05-13 1999-07-14 Euronet Uk Ltd Compression/decompression method

Also Published As

Publication number Publication date
EP1279267A2 (en) 2003-01-29
AU8035100A (en) 2001-04-10
WO2001019052A3 (en) 2002-11-14
WO2001019052A2 (en) 2001-03-15
TW473673B (en) 2002-01-21

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued