TW473673B - Method and apparatus for compressing scripting language content - Google Patents


Info

Publication number
TW473673B
Authority
TW
Taiwan
Prior art keywords
label
patent application
scope
item
written language
Prior art date
Application number
TW089117782A
Other languages
Chinese (zh)
Inventor
Robert Charles Booth
Original Assignee
Gen Instrument Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gen Instrument Corp
Application granted
Publication of TW473673B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2355 Processing of additional data, e.g. scrambling of additional data or processing content descriptors involving reformatting operations of additional data, e.g. HTML pages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Abstract

A method and apparatus for compressing scripting language content, such as hypertext markup language (HTML). Codewords are provided for HTML elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page. The codewords are combined with translated (or coded) text to provide compressed HTML data. The codewords may have reserved bits to distinguish empty tags from container tags, to indicate whether a container tag is a starting or closing tag, or to provide other information about the tag to aid in processing. The amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal, is thereby reduced. Additionally, the invention allows the use of a graphics engine or browser, e.g., in a subscriber terminal, that processes and renders the compressed HTML data (e.g., codewords) directly without decompressing it. The technique is compatible with other compression techniques to provide even greater compression.

Description

Background of the Invention

The present invention relates to a method and apparatus for compressing scripting language content, such as HyperText Markup Language (HTML).

HTML is a system for marking up documents to indicate how a document should be displayed and how different documents should be linked together. HTML is widely used to provide documents, such as Web pages, on the Internet. Such documents are organized into web spaces, where a web space contains a Web page and links to other documents in the local web space or in an external web space. Such links are known as hyperlinks. Documents can contain moving images, text, graphical displays, and sound.

HTML is a form of Standard Generalized Markup Language (SGML), which is defined by the International Organization for Standardization (ISO), reference number ISO 8879:1986. HTML specifies the grammar and syntax of markup tags that are inserted into a data file to define how the data is to be presented (e.g., graphically rendered) when read by a computer program known as a browser. The computer's browser and/or graphics engine processes the data to format the layout of the Web page so that the page can be viewed by a user on a display or other device.

An SGML document contains three parts.
The first part describes the character set, or codes, used in the language. The second part defines the document type and identifies the markup tags. The third part is known as the document instance and contains the actual text and markup tags. The three parts can be stored in different files. In addition, HTML browsers assume that the files for different Web pages share a common character set and document type, so only the text and markup tags change from one Web page to another.

HTML elements include tags and character entities. Character entities are characters predefined by the ISO Latin-1 alphabet but not defined in ASCII, as well as characters used to mark the start and end of an HTML element. For example, the character entity "&lt;" represents the character "<" (the "less than" symbol).

HTML tags are enclosed in angle brackets to distinguish them from the text of the Web page. A tag can appear alone (as a standalone or empty tag), or can appear at the start and end of a span of Web page text (as a non-empty or container tag). For example, <P> is an empty tag that indicates the start of a new paragraph, while <I> and </I> are container tags that modify the enclosed text (e.g., <I>Welcome to my home page.</I> indicates that the phrase "Welcome to my home page." should be italicized). "<I>" is the start tag, and "</I>" is the end tag.

In general, HTML tags provide text formatting, hypertext links to other Web pages, and links to sound and image elements. HTML tags also define input fields for interactive Web pages.

In addition, some tags have one or more associated attributes that can be set for the tag.
For example, the tags <A> and </A> are anchor tags, which define a span of text as a hyperlink or as the target of another hyperlink. The attributes of this tag include HREF=url, NAME=name, and TITLE=text. Thus, the HTML code "<A HREF="http://www.uspto.gov">U.S. Patent and Trademark Office</A>" causes the text "U.S. Patent and Trademark Office" to appear in the browser with special emphasis (e.g., a distinctive color and/or underlining), which marks the text as a hyperlink. When the user clicks on this text, the browser goes to the Web page at the address "www.uspto.gov".

Furthermore, tags may have sub-attributes. For example, the tag <IMG> is an empty tag that indicates that an inline image is located in the page. Its attributes include SRC=url, which specifies the URL of the file containing the image; ALT=text, which specifies a text string that is displayed when the image is unavailable; and ALIGN=[TOP|MIDDLE|BOTTOM], which specifies how the image is aligned with adjacent text and other HTML elements. Here, ALIGN is an attribute, and TOP, MIDDLE and BOTTOM are sub-attributes. An example of HTML code is: "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>".

HTML tags and attributes are generally referred to herein as HTML "elements". The term "attribute" generally includes the different levels of sub-attributes.
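
As an illustration of the tags and attributes described above, the following minimal sketch uses Python's standard `html.parser` module to list the elements of the two HTML examples in document order. The class and function names (`ElementLister`, `list_elements`) are hypothetical and chosen only for this illustration; the patent does not prescribe any particular parser.

```python
from html.parser import HTMLParser


class ElementLister(HTMLParser):
    """Collect start tags (with attributes), end tags, and text in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names are reported in lowercase by html.parser.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def list_elements(html):
    """Return the parsed events for an HTML fragment."""
    parser = ElementLister()
    parser.feed(html)
    parser.close()
    return parser.events


anchor = list_elements(
    '<A HREF="http://www.uspto.gov">U.S. Patent and Trademark Office</A>'
)
image = list_elements('<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>')
print(anchor)
print(image)
```

The anchor example yields a start tag with one attribute, a text run, and an end tag (a container tag), while the image example yields a single start tag with three attributes and no end tag (an empty tag).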

HTML content is made available to users on the Web by storing HTML files in a directory that is accessible to a server. Such a server is typically a Web server that conforms to a communication protocol supported by Web browsers, known as the Hypertext Transfer Protocol (HTTP).

In addition, HTML content can be stored at the headend of a subscriber communication network, such as a cable/satellite television network. Because of the high data rates of such networks, there is a growing trend toward providing HTML content over them. Given the potential commercial benefits of combining HTML content with conventional television program services, and the integration of telephone, television and computer networks, the prospects for home computer networks can be expected to grow. Selected HTML content can be provided directly from the headend, or the headend can serve only as a conduit for a high-speed link between the subscriber and a remote Web server.

Servers conforming to other protocols, such as File Transfer Protocol (FTP) or Gopher, can also be accessed by an HTTP browser through a proxy server. A proxy server is a form of gateway that allows a browser using HTTP to communicate with a server that does not understand HTTP but instead uses FTP, Gopher or another protocol. The proxy server accepts HTTP requests from the browser and translates them into a format suitable for the origin server, such as an FTP request. Likewise, the proxy server translates FTP replies from the server into HTTP replies so that the browser can understand them.

In general, the FTP file itself is not translated. FTP, like HTTP, is a high-level protocol for transferring files. The translation takes place at the protocol level. For example, a client browser may send the HTTP request "GET http://www.myserver.com/somefile.txt HTTP/1.1". At the proxy server, this is translated into an FTP "GET" request that is sent to the FTP origin server. The FTP reply from the origin server (with the requested file attached) is returned to the proxy server, where it is translated into an HTTP reply that contains the attached file. The file being transferred is not translated or modified. In some cases, however, a browser may indicate that it can decode certain encoded or compressed formats. The proxy server can then translate (encode or compress) the attached file before sending it to the client.
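
The protocol-level translation can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `http_request_to_ftp_commands` is hypothetical, login and data-connection handling are omitted, and the text's FTP "GET" is mapped to `RETR`, the file-retrieval command of the FTP control protocol, preceded by `TYPE I` to select binary transfer.

```python
from urllib.parse import urlsplit


def http_request_to_ftp_commands(request_line):
    """Map a proxy-style HTTP GET request line to FTP control commands.

    Hypothetical helper for illustration; a real proxy would also log in,
    open a data connection, and wrap the reply in an HTTP response.
    """
    method, url, _version = request_line.split()
    if method != "GET":
        raise ValueError("only GET is handled in this sketch")
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        # TYPE I selects binary mode so the file is transferred unmodified.
        "commands": ["TYPE I", "RETR " + parts.path],
    }


plan = http_request_to_ftp_commands(
    "GET http://www.myserver.com/somefile.txt HTTP/1.1"
)
print(plan)
```

Only the request is rewritten; the file named by the path is passed through untouched, which matches the statement that the transferred file is not translated or modified.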

The proxy server can be a program running on the same machine as the browser, or a separate machine somewhere in the network that serves many browsers. For example, the headend of a subscriber communication network can provide a proxy server function.

HTTP defines a set of rules that servers and browsers must follow when communicating with each other. Typically, the process begins when a user clicks on an icon in an HTML page that is the anchor of a hyperlink, or when the user enters a Uniform Resource Locator (URL). The URL contains a host name, which is typically converted to an IP address through a Domain Name System (DNS) lookup. A connection is then made to the host server using the IP address returned by the DNS lookup (and possibly a port number). Next, the browser sends a request to retrieve an object from the server, or to send data to an object on the server. The server sends the browser a response that contains a status code and the response data. The connection between the browser and the server is then closed.

A URL is a unique address that identifies virtually all files and resources on the Internet.

However, because of the flexibility of HTML and the variety of tags, attributes and sub-attributes it supports, the amount of data needed to represent any given Web page can be very large.
As a result, the processing capability of a user's terminal and browser may not be sufficient to keep up with the data stream, causing delays in displaying data on the user's screen, or other problems.

Furthermore, the bandwidth consumed in transmitting HTML data increases, which reduces the bandwidth available to other users or adds to the burden on channel capacity. HTML data can be transmitted over the public switched telephone network (PSTN), over a cable or satellite television network, over a regional wireless network, or over a combination of these.

Specifically, the basic alphabet for HTML is Latin-1 (ISO 8859/1), an 8-bit alphabet with characters for most American and European languages. The 128-character standard ASCII (ISO 646) is a 7-bit subset of Latin-1. For simplicity, and for easier compatibility with other browsers, many Web pages contain only ASCII characters.

Each character (a letter, number, punctuation mark or space) requires 8 bits, or one byte, of data. For example, the HTML code "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>" has 52 characters, or 52 bytes of data.

Accordingly, there is a need for a system for compressing scripting language content, such as HTML or any similar language.

The system must reduce the bandwidth needed to deliver HTML data to a browser or other (graphics) display engine.

The system must be suitable for use with existing networks that deliver HTML data.
The system must allow a browser at a terminal (such as a set-top box/decoder) in a subscriber television network to process and display the compressed data directly, without decompression.

The system must reduce the processing power required by a browser at a user terminal in a subscriber television network.

The system must provide a stable and fixed processing time for all HTML elements and attributes of a given Web page.

The system must be usable at both the client/browser side and the server side of a network.

The system must be usable at a proxy server that forms an interface between a client/browser and a server, or another proxy server.

The system must be compatible with networks that use a digital video protocol, such as MPEG-2, to deliver HTML data.

The system must be compatible with networks that use Transmission Control Protocol/Internet Protocol (TCP/IP) to deliver HTML data.

The system must provide compression for the current version of HTML, along with its derivative versions and other similar markup languages.

The system must be compatible with other, bit-level compression techniques.

The present invention provides a system having the above and other advantages.

Summary of the Invention

The present invention relates to a method and apparatus for compressing scripting language content, such as HTML.

Codewords are provided for HTML or other scripting language elements, such as tags and their attributes, to reduce the amount of data, e.g., in a Web page. Furthermore, the codewords may have reserved bits to distinguish empty tags from container tags, to indicate whether a container tag is a start tag or an end tag, or to provide other information about the tag to aid in processing.
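
To make the idea of reserved bits concrete, here is a minimal sketch of a two-byte codeword. The bit positions (`MARK`, `CONTAINER_BIT`, `END_BIT`) and the tag numbering in `TAG_IDS` are assumptions made for this illustration only; the text above does not fix a particular layout.

```python
# Hypothetical layout (an assumption): the first byte carries the flags,
# the second byte carries a tag identifier.
MARK = 0x80           # high bit marks a codeword; 7-bit ASCII text stays below 0x80
CONTAINER_BIT = 0x40  # 1 = container tag, 0 = empty tag
END_BIT = 0x20        # 1 = end tag, 0 = start tag

# Hypothetical tag numbering, for illustration only.
TAG_IDS = {"p": 1, "i": 2, "a": 3, "img": 4}
TAG_NAMES = {number: name for name, number in TAG_IDS.items()}


def make_codeword(tag, container, end=False):
    """Build the two-byte codeword for a tag."""
    first = MARK
    if container:
        first |= CONTAINER_BIT
    if end:
        first |= END_BIT
    return bytes([first, TAG_IDS[tag]])


def describe(codeword):
    """Read the reserved bits back out of a two-byte codeword."""
    first, tag_id = codeword
    return {
        "tag": TAG_NAMES[tag_id],
        "container": bool(first & CONTAINER_BIT),
        "end": bool(first & END_BIT),
    }


start_i = make_codeword("i", container=True)
end_i = make_codeword("i", container=True, end=True)
img = make_codeword("img", container=False)
print(start_i.hex(), end_i.hex(), img.hex())
```

With a layout like this, a renderer can tell whether a codeword opens or closes a container by testing one bit, without looking up or reconstructing the tag text.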
The technique is compatible with other compression techniques to provide even greater compression.

The invention effectively reduces the amount of data that must be communicated, e.g., to represent a Web page that is transmitted to a subscriber terminal. Furthermore, the invention allows the use of a graphics engine or browser at the subscriber terminal that processes and displays the compressed HTML data directly, without decompressing it, thereby greatly reducing processing time and complexity.

A particular method for processing scripting language data includes the step of parsing the HTML data to separate its text from its scripting language elements. The scripting language elements include tags and their attributes. A distinct codeword, such as a two-byte codeword, is provided for each different tag. The text is coded, e.g., in ASCII. The codewords are then combined with the coded text in the proper sequence to provide compressed scripting language data.

A codeword may contain reserved bits to indicate specific information, such as whether the associated tag is an empty tag or a container tag.

For a container tag, the codeword can indicate whether the container tag is a start tag or an end tag.

A codeword can indicate whether a tag is a style markup tag or a structural markup tag.

For a structural markup tag, the codeword can indicate whether the tag is a block element, list element, table element, format element, hypertext link, inline image, or page markup tag.
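
The compression steps above (parse, replace elements with codewords, code the text, recombine in order) can be sketched as follows. This is a hedged illustration, not the patented encoder: the tables `TAG_IDS`, `ATTR_IDS` and `EMPTY_TAGS`, the flag values, and the NUL-terminated attribute values are all assumptions, and tags or attributes missing from the tables simply raise `KeyError`.

```python
from html.parser import HTMLParser

# Hypothetical tables and bit layout, assumed for illustration only.
TAG_IDS = {"p": 1, "i": 2, "a": 3, "img": 4}
ATTR_IDS = {"href": 1, "name": 2, "title": 3, "src": 4, "alt": 5, "align": 6}
EMPTY_TAGS = {"p", "img"}  # treated as empty tags, as in the text

TAG_MARK = 0x80       # high bit set, so codewords never collide with ASCII text
CONTAINER_BIT = 0x40  # 1 = container tag
END_BIT = 0x20        # 1 = end tag
ATTR_MARK = 0x90      # first byte of an attribute codeword


class Compressor(HTMLParser):
    """Emit two-byte codewords for elements and ASCII bytes for text."""

    def __init__(self):
        super().__init__()
        self.out = bytearray()

    def _emit_tag(self, tag, end):
        first = TAG_MARK
        if tag not in EMPTY_TAGS:
            first |= CONTAINER_BIT
        if end:
            first |= END_BIT
        self.out += bytes([first, TAG_IDS[tag]])

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, end=False)
        for name, value in attrs:
            # Attribute codeword, then the value as ASCII, then a NUL byte.
            self.out += bytes([ATTR_MARK, ATTR_IDS[name]])
            self.out += (value or "").encode("ascii") + b"\x00"

    def handle_endtag(self, tag):
        self._emit_tag(tag, end=True)

    def handle_data(self, data):
        self.out += data.encode("ascii")


def compress(html):
    """Return the compressed byte string for an ASCII HTML fragment."""
    compressor = Compressor()
    compressor.feed(html)
    compressor.close()
    return bytes(compressor.out)


original = '<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>'
compressed = compress(original)
italic = compress("<I>Welcome to my home page.</I>")
print(len(original), len(compressed))
```

Under these assumptions, the 52-byte `<IMG>` example from the background shrinks to 37 bytes. Sub-attribute values such as `middle` could also be given their own codewords, which this sketch does not attempt.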
A distinct codeword can be provided for each different tag attribute, including sub-attributes. A codeword can also indicate the number of attributes associated with a tag.

In a particularly advantageous implementation, the compressed scripting language data is transmitted from a scripting language content server or headend to a subscriber terminal in a communication network.

To decompress the compressed scripting language data, e.g., at a subscriber terminal, the compressed data is parsed to separate its codewords from its coded text. A scripting language element is provided for each corresponding different codeword, and the coded text is decoded to provide decoded text. Finally, the scripting language elements are combined with the decoded text to provide uncompressed scripting language data.

The compressed scripting language data can, if desired, be transmitted within a communication network.
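
The decompression steps can be sketched in the same spirit. The byte format assumed here is hypothetical (it is not taken from the patent): a tag codeword is `[0x80 | flags, tag id]`, an attribute codeword is `[0x90, attribute id]` followed by an ASCII value and a NUL byte, and every other byte is ASCII text.

```python
# Hypothetical tables and format, assumed for illustration only.
TAG_NAMES = {1: "P", 2: "I", 3: "A", 4: "IMG"}
ATTR_NAMES = {1: "HREF", 2: "NAME", 3: "TITLE", 4: "SRC", 5: "ALT", 6: "ALIGN"}
END_BIT = 0x20
ATTR_MARK = 0x90


def decompress(data):
    """Rebuild HTML text from codewords and coded text (a sketch)."""
    out = []
    pending = None  # parts of a start tag that may still receive attributes
    i = 0

    def flush():
        # Close the start tag currently being assembled, if any.
        nonlocal pending
        if pending is not None:
            out.append("<" + " ".join(pending) + ">")
            pending = None

    while i < len(data):
        first = data[i]
        if first == ATTR_MARK:
            # Attribute codeword: id byte, ASCII value, NUL terminator.
            end = data.index(0, i + 2)
            value = data[i + 2:end].decode("ascii")
            pending.append('%s="%s"' % (ATTR_NAMES[data[i + 1]], value))
            i = end + 1
        elif first & 0x80:
            # Tag codeword: flags in the first byte, tag id in the second.
            flush()
            name = TAG_NAMES[data[i + 1]]
            if first & END_BIT:
                out.append("</" + name + ">")
            else:
                pending = [name]
            i += 2
        else:
            # Run of coded text: copy bytes until the next codeword.
            flush()
            j = i
            while j < len(data) and data[j] < 0x80:
                j += 1
            out.append(data[i:j].decode("ascii"))
            i = j
    flush()
    return "".join(out)


italic = decompress(b"\xc0\x02Welcome to my home page.\xe0\x02")
image = decompress(
    b"\x80\x04\x90\x04filename.GIF\x00\x90\x05filename\x00\x90\x06middle\x00"
)
print(italic)
print(image)
```

The rebuilt markup is equivalent to the original but not necessarily byte-identical; for example, this sketch always quotes attribute values.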

A corresponding apparatus is also disclosed.

Brief Description of the Drawings

FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.

FIG. 2 illustrates HTML compression in accordance with the present invention.

FIG. 3 illustrates HTML decompression in accordance with the present invention.

Detailed Description of the Invention

The present invention relates to a method and apparatus for compressing scripting language content, such as HTML.

FIG. 1 illustrates a subscriber television network that uses HTML compression in accordance with the present invention.

While the invention can be implemented in a variety of networks, it is particularly suitable for use in subscriber television networks to allow users (subscribers) to access HTML data, e.g., on the Internet. Typically, users access HTML content, such as Web pages, that is delivered over a downstream channel of the network. For example, a number of techniques are available for delivering HTML data over cable and satellite television networks. A user typically uses an upstream link, such as a conventional telephone network, to enter commands, such as a URL address requesting that a particular Web page be displayed. Some cable television networks have an upstream data channel that can be used for this purpose.

The request is received at the headend or other central location and forwarded to the content server specified by the URL. The content returned to the headend by the server is then prepared for delivery to the user. For example, the HTML data can be carried in digital MPEG-2 packets, in the same band as, or in a different band from, the program service data (e.g., television programs, music, etc.).

Alternatively, the HTML data can be carried in the vertical blanking interval (VBI) of a digital or analog television signal.

The invention is compatible with essentially any communication technique for delivering HTML data to end users.

The HTML content is then recovered at the user terminal and displayed by a browser application or graphics processing engine for viewing on a video monitor, such as a television or computer monitor.

The headend can act as a proxy server when interfacing with the content server, e.g., when the URL request received from the user is in a format that is not compatible with the content server.
In this case, the proxy server converts the URL request to the required format, and converts the content returned by the server into a format that the user terminal can understand.

FIG. 1 shows an example embodiment in which a network 100 includes a content server 110, a headend 130, and a user terminal 150. The content server 110 represents any number of servers, such as proxy servers, that store HTML data in the network. The terminal 150 represents a population of terminals that can receive broadcast signals from a cable, fiber or satellite provider, e.g., from the headend 130 in a cable or satellite television network.

An optional channel 160, such as a conventional telephone link and modem, allows the terminal 150 to communicate directly with the content server.

Program services, such as television programs, weather and stock data, and home shopping, are provided from the headend to the terminal population, including the example terminal 150, via a one-way or two-way channel. HTML content is also delivered to the terminal 150 via the one-way or two-way channel 162.
The channel 162 can be, for example, a multipoint microwave distribution system (MMDS) link or a telephone link.

A channel 164 allows the headend 130 and the example content server 110 to communicate. This channel is typically implemented as a telephone link or an Ethernet network. The content server 110 is generally remote from the headend 130, although the headend may store HTML content locally on a storage medium, such as a digital video disc or magnetic tape, or on the hard disk of a local file server. Known network architectures can be used to provide the channel 164.

When the headend 130 provides local content, a limited amount of content is provided. This content can also be selected to correspond to the program services. In this case, a graphic can be overlaid on a television program to notify the user of related HTML content that is available. For example, during a televised baseball game, the user can be directed to a Web site to obtain baseball scores.

In some cases, the entire local content can be broadcast continuously or periodically, e.g., in the same channel (or multiplex) as the program services, or in a separate channel (or multiplex). This can occur in a one-way-only network, i.e., where the user has no upstream link to the headend. Selection of the desired HTML content then occurs at the user terminal 150.

Known conditional access techniques can be used to provide access to the HTML content on a fee basis.

The invention is suitable for use with any of the arrangements described above.

In the example of FIG. 1, it is assumed that the user has some upstream channel (160 or 162) to enable selected HTML content to be retrieved from the content server 110 and provided to the terminal 150 via the headend 130.

The content server 110, headend 130 and terminal 150 are associated with HTML compression functions 112, 132 and 152, respectively, and HTML decompression functions 114, 134 and 154, respectively. However, not all of these functions are required.
In general, compression of the HTML data delivered to the terminal is the most important concern. HTML data output from the terminal, if any, is generally small in volume. This may vary, however, for example when a user sends HTML content to other users, or is authorized to send HTML content that modifies the remote server 110.

Compression function 152 compresses HTML data sent from the terminal 150 to the headend 130 or the content server 110. Decompression function 154 decompresses compressed data received from the headend 130 or the content server 110.

Compression function 132 compresses HTML data sent from the headend 130 to the content server 110 or the terminal 150. Decompression function 134 decompresses compressed HTML data received from the content server 110 or the terminal 150.

Compression function 112 compresses HTML data sent from the content server 110 to the headend 130 or the terminal 150. Decompression function 114 decompresses compressed HTML data received from the headend 130 or the terminal 150.

The terminal 150 includes a user interface 158 for receiving user commands, for example via a keyboard or an infrared remote control. For example, the user may select a graphic associated with a URL on the display 170 to start downloading the related HTML content to the terminal 150.

The browser 159 may be a full-featured browser application, such as one used on a personal computer, or a small browser with only basic functions, such as text display or limited graphics display capability. The browser 159 works with the graphics engine 156 to display text and images on the display 170 from the HTML content received at the terminal 150.

A video decoder 157 may be used to display video on the display 170 together with the compressed (or uncompressed) scripting language content.

The display 170 may be a television screen or a video monitor such as that of a PC.

The processing capability of the terminal 150 affects the level of functionality supported by the browser 159 and the graphics engine 156.

Compression functions 112, 132 and 152 may be implemented using the HTML compression architecture shown in Fig. 2, and decompression functions 114, 134 and 154 may be implemented using the HTML decompression architecture shown in Fig. 3.

Fig. 2 shows HTML compression according to the present invention. Compression function 200 corresponds to compression functions 112, 132 and 152 of Fig. 1. A buffer/parser 210 receives uncompressed HTML data. Note that this HTML data may refer to locations where audio, video or graphics data can be found.

The text is parsed out and provided to a conventional text encoding function 215 to produce encoded text, for example ASCII data.

HTML elements, such as tags, including their attributes, sub-attributes, sub-sub-attributes and so on, are parsed out and provided to a compression function 220, which may optionally use a lookup table 225 and can be implemented with known techniques. The lookup table 225 associates a code word with each HTML element (tag or attribute). The length of the code word is chosen according to the number of different tags and attributes to be encoded. A 16-bit code word (two bytes) is considered sufficient for handling existing tags while leaving room for future expansion.

Furthermore, one or more of the 16 bits may be reserved to designate the tag as an empty tag or a container tag. For example, the most significant bit may be chosen. For container tags, one or more other reserved bits may also indicate whether the tag is a start tag or an end tag.

Other information that may be indicated is whether the tag is a style markup tag or a structural markup tag. In general, style markup tags (which specify bold, font, quoted text and so on) may be used within structural markup tags (which specify lists, tables, anchors and so on), but the reverse is not recommended.

For structural markup tags, the code word may specify whether the tag is a block element, list element, table element, format element, hypertext link, inline graphic, or web page markup tag.

A code word may also specify the number of attributes associated with each tag. The number of bits reserved for this purpose must correspond to the maximum expected number of attributes. For example, three bits can be used to indicate that up to eight attributes are associated with the tag.

In general, bits should be reserved in the code word to indicate tag characteristics that assist in displaying the HTML data. For example, designating start and end container tags is useful because it tells a processor the range of text to be modified.
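As a rough illustration of the reserved bits described above, the following Python sketch packs tag properties into a 16-bit code word. The field positions (bit 15 for empty versus container, bit 14 for start versus end, bits 13 to 11 for the attribute count, the low 11 bits for an element identifier) and all function names are assumptions made for this example; the description only suggests that the most significant bit could carry the empty/container flag. Because the count is stored directly, this sketch caps it at seven; representing eight would require a different convention, such as storing the count minus one.

```python
# Hypothetical 16-bit code word layout (field positions are assumptions):
#   bit 15      : 1 = container tag, 0 = empty tag
#   bit 14      : 1 = end tag, 0 = start tag (meaningful for containers only)
#   bits 13..11 : attribute count, stored directly (0..7 in this sketch)
#   bits 10..0  : element identifier taken from a lookup table
CONTAINER_BIT = 1 << 15
END_TAG_BIT = 1 << 14
ATTR_SHIFT = 11
ATTR_MASK = 0b111
ID_MASK = (1 << ATTR_SHIFT) - 1


def pack_code_word(element_id, container=False, end_tag=False, attr_count=0):
    """Pack tag properties into one 16-bit integer."""
    if not 0 <= element_id <= ID_MASK:
        raise ValueError("element_id does not fit in 11 bits")
    if not 0 <= attr_count <= ATTR_MASK:
        raise ValueError("attr_count does not fit in 3 bits")
    if end_tag and not container:
        raise ValueError("only container tags have end tags")
    word = element_id | (attr_count << ATTR_SHIFT)
    if container:
        word |= CONTAINER_BIT
    if end_tag:
        word |= END_TAG_BIT
    return word


def unpack_code_word(word):
    """Recover the fields from the bits alone, without a table lookup."""
    return {
        "element_id": word & ID_MASK,
        "container": bool(word & CONTAINER_BIT),
        "end_tag": bool(word & END_TAG_BIT),
        "attr_count": (word >> ATTR_SHIFT) & ATTR_MASK,
    }


# Example: an empty tag with identifier 5 carrying three attributes.
img = pack_code_word(5, container=False, attr_count=3)
print(f"{img:016b}", unpack_code_word(img))
print(img.to_bytes(2, "big"))  # the two bytes that would be transmitted
```

With a layout of this kind, a renderer can tell from the code word alone whether a tag opens or closes a range of text, which is the benefit the description attributes to the reserved bits.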

For example, with eight bits of data per character (where a character is a letter, digit, punctuation mark or space), the HTML code "<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>" has 52 characters, or 52 bytes of data. By substituting a two-byte code word for each HTML element (e.g., "IMG", "SRC", "ALT" and "ALIGN"), the 14 bytes needed to encode these elements are reduced to 8 bytes, a saving of 6 bytes. The number of bytes saved in a given HTML page grows with the length of the elements (compare "<BLOCKQUOTE>", which is reduced from 12 bytes to 2 bytes, with "<A>", which is reduced from 3 bytes to 2 bytes) and with the number of elements in the page.

For each element, a code word is output by the compression function 220 and supplied to a combiner 230, which assembles the code words in the proper sequence to provide compressed HTML data according to the present invention. This data comprises character codes for the text and code words from the compression function 220 for the HTML elements.

Note that other known compression techniques, such as the Lempel-Ziv algorithm and Huffman coding, may be applied to the compressed HTML data output from the combiner 230, to the encoded text alone, or to the code words alone. Related video and audio data may likewise be compressed using known techniques.

Fig. 3 shows HTML decompression according to the present invention. Decompression function 300 corresponds to decompression functions 114, 134 and 154 of Fig. 1.

Here, the compressed HTML is received by a buffer/parser 310. The encoded text, which contains the text data, is provided to a text decoding function 315 to recover the text, which is then supplied to a combiner 330, while the HTML code words are provided to a decompression function 320. A lookup table 325 at the decompression function 320 associates each received code word with its HTML element. The corresponding elements are output to the combiner 330 to form the uncompressed HTML data.

Accordingly, it can be seen that the present invention provides a method and apparatus for compressing scripting language content such as HTML. Code words are used for HTML elements, such as tags and their attributes, to reduce the amount of data, for example in a web page. Furthermore, the code words may have reserved bits that distinguish empty tags from container tags, indicate whether a container tag is a start tag or an end tag, or provide other information about the tag to assist processing. The technique is compatible with other compression techniques, which can provide even greater compression.

The invention effectively reduces the amount of data that must be transmitted, for example to represent a web page sent to a user terminal. In addition, the invention allows a graphics engine or browser, for example in a user terminal, to process and display the compressed HTML data (e.g., the code words) directly, without decompression. This can considerably reduce processing time and complexity.

Moreover, each code word has the same length and therefore generally takes the same processing time, so the processing time is deterministic.

The techniques of the invention may be implemented using any known hardware, software and/or firmware.

Although the invention has been described in connection with various specific embodiments, those skilled in the art will appreciate that various adaptations and modifications may be made without departing from the spirit and scope of the claims.

For example, while the invention was discussed in connection with cable or satellite television broadband communication networks, it will be appreciated that other networks, such as local area networks (LANs), metropolitan area networks (MANs), wide area networks (WANs), the Internet, intranets, or combinations of these, may be used.

Furthermore, the invention is suitable for compressing any scripting language content, including HTML or similar languages (e.g., Extensible Markup Language (XML) or Synchronized Multimedia Integration Language (SMIL)).
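The compression path of Fig. 2 and the decompression path of Fig. 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the table contents, the function names, and the framing rule (every code word has its top bit set so that it can be told apart from 7-bit ASCII text) are all hypothetical, since the description does not say how code words and text are distinguished in the combined stream. The sketch keeps angle brackets and attribute values as text, which matches the 14-to-8-byte accounting of the IMG example.

```python
import re

# Hypothetical lookup table (cf. tables 225 and 325). The top bit of each
# code word is set so a decoder can tell code words from 7-bit ASCII text;
# this framing is an assumption of the sketch, not taken from the source.
ELEMENTS = ["IMG", "SRC", "ALT", "ALIGN", "BLOCKQUOTE", "A"]
CODE_OF = {name: 0x8000 | i for i, name in enumerate(ELEMENTS)}
NAME_OF = {code: name for name, code in CODE_OF.items()}

TAG = re.compile(r"<[^>]*>")
# Inside a tag: a quoted value (left untouched) or a bare word (candidate).
PIECE = re.compile(r'"[^"]*"|[A-Za-z]+')


def _compress_tag(tag: str) -> bytes:
    """Replace known element names in one tag with 2-byte code words."""
    out, pos = bytearray(), 0
    for m in PIECE.finditer(tag):
        if m.group() in CODE_OF:  # quoted values never match a table key
            out += tag[pos:m.start()].encode("ascii")
            out += CODE_OF[m.group()].to_bytes(2, "big")
            pos = m.end()
    out += tag[pos:].encode("ascii")
    return bytes(out)


def compress(html: str) -> bytes:
    """Parser plus combiner: text stays ASCII, elements become code words."""
    out, pos = bytearray(), 0
    for m in TAG.finditer(html):
        out += html[pos:m.start()].encode("ascii")  # text between tags
        out += _compress_tag(m.group())
        pos = m.end()
    out += html[pos:].encode("ascii")
    return bytes(out)


def decompress(data: bytes) -> str:
    """Inverse path: look up each code word, pass text bytes through."""
    out, i = [], 0
    while i < len(data):
        if data[i] & 0x80:  # top bit set: start of a 2-byte code word
            out.append(NAME_OF[int.from_bytes(data[i:i + 2], "big")])
            i += 2
        else:
            out.append(chr(data[i]))
            i += 1
    return "".join(out)


sample = '<IMG SRC="filename.GIF" ALT="filename" ALIGN=middle>'
packed = compress(sample)
print(len(sample), len(packed))  # 52 and 46: four elements, 14 -> 8 bytes
assert decompress(packed) == sample
```

Matching is case-sensitive here so that the round trip reproduces the input exactly; a real encoder would also need rules for non-ASCII text and for elements missing from the table, neither of which the source specifies.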

Claims (1)

1. A method for processing scripting language data, comprising the steps of:
(a) parsing the scripting language data to separate its text from its scripting language elements, the scripting language elements including tags;
(b) providing a respective code word for each different tag;
(c) encoding the text to produce encoded text; and
(d) combining the code words with the encoded text to produce compressed scripting language data.

2. The method of claim 1, wherein:
at least one code word indicates whether the associated tag is an empty tag or a container tag.

3. The method of claim 1, wherein, for container tags, at least one corresponding code word indicates whether the container tag is a start tag or an end tag.

4. The method of claim 1, wherein:
at least one code word indicates whether the corresponding tag is a style markup tag or a structural markup tag.

5. The method of claim 1, wherein, for structural markup tags, at least one corresponding code word indicates whether the structural markup tag is a block element, list element, table element, format element, hypertext link, inline graphic, or web page markup tag.

6. The method of claim 1, wherein the scripting language elements include attributes of the tags, further comprising the step of:
providing a respective code word for each different attribute.

7. The method of claim 1, wherein:
the scripting language elements include attributes of the tags; and
the respective code words indicate the number of attributes associated with each tag.

8. The method of claim 1, further comprising the step of:
transmitting the compressed scripting language data from a scripting language content server to a user terminal in a communication network.

9. The method of claim 1, further comprising the step of:
transmitting the compressed scripting language data from a headend to a user terminal in a communication network.

10. The method of claim 1, further comprising the steps of:
transmitting the compressed scripting language data to a user terminal in a communication network; and
processing the compressed scripting language data, without recovering the scripting language elements, to provide data suitable for display.
11. The method of claim 1, further comprising the steps of:
(d) parsing the compressed scripting language data to separate its scripting language elements from its text;
(e) decoding the encoded text to provide decoded text;
(f) providing a respective scripting language element for each corresponding different code word obtained in step (d); and
(g) combining the scripting language elements of step (f) with the decoded text to provide uncompressed scripting language data.

12. The method of claim 11, wherein:
the uncompressed scripting language data is processed by steps (d)-(g) at a user terminal in a communication network.

13. The method of claim 1, wherein:
the scripting language data comprises hypertext markup language (HTML) data.

14. The method of claim 1, further comprising the step of:
temporarily storing the compressed scripting language data at a proxy server for scripting language content that is frequently accessed by user terminals.

15. An apparatus for processing scripting language data, comprising:
a first parser for parsing the scripting language data to separate its text from its scripting language elements, the scripting language elements including tags;
first means for providing a respective code word for each different tag;
means for encoding the text to provide encoded text; and
a first combiner for combining the code words with the encoded text to provide compressed scripting language data.

16. The apparatus of claim 15, wherein:
at least one code word indicates whether the associated tag is an empty tag or a container tag.

17. The apparatus of claim 15, wherein, for container tags, at least one corresponding code word indicates whether the container tag is a start tag or an end tag.

18. The apparatus of claim 15, wherein:
at least one code word indicates whether the corresponding tag is a style markup tag or a structural markup tag.

19. The apparatus of claim 15, wherein, for structural markup tags, at least one corresponding code word indicates whether the structural markup tag is a block element, list element, table element, format element, hypertext link, inline graphic, or web page markup tag.
20. The apparatus of claim 15, wherein the scripting language elements include attributes of the tags, further comprising:
means for providing a respective code word for each different attribute.

21. The apparatus of claim 15, wherein:
the scripting language elements include attributes of the tags; and
the respective code words indicate the number of attributes associated with each tag.

22. The apparatus of claim 15, wherein:
the compressed scripting language data is transmitted from a scripting language content server to a user terminal in a communication network.

23. The apparatus of claim 15, wherein:
the compressed scripting language data is transmitted from a headend to a user terminal in a communication network.

24. The apparatus of claim 15, wherein the compressed scripting language data is transmitted to a user terminal in a communication network, further comprising:
at least one processor for processing the compressed scripting language data, without recovering the scripting language elements, to provide data suitable for display.

25. The apparatus of claim 15, further comprising:
a second parser for parsing the compressed scripting language data to separate its code words from its encoded text;
second means for providing a respective scripting language element for each corresponding different tag from the second parser;
means for decoding the encoded text to provide decoded text; and
a second combiner for combining the scripting language elements provided by the second means with the decoded text to provide uncompressed scripting language data.

26. The apparatus of claim 25, wherein:
the uncompressed scripting language data is processed at a user terminal in a communication network by the second parser, the second means, the means for decoding, and the second combiner.

27. The apparatus of claim 15, wherein:
the compressed scripting language data comprises hypertext markup language (HTML) data.

28. The apparatus of claim 15, further comprising:
means for temporarily storing the compressed scripting language data at a proxy server for scripting language content that is frequently accessed by user terminals.
TW089117782A 1999-09-10 2000-08-31 Method and apparatus for compressing scripting language content TW473673B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US39383599A 1999-09-10 1999-09-10

Publications (1)

Publication Number Publication Date
TW473673B true TW473673B (en) 2002-01-21

Family

ID=23556435

Family Applications (1)

Application Number Title Priority Date Filing Date
TW089117782A TW473673B (en) 1999-09-10 2000-08-31 Method and apparatus for compressing scripting language content

Country Status (5)

Country Link
EP (1) EP1279267A2 (en)
AU (1) AU8035100A (en)
CA (1) CA2384687A1 (en)
TW (1) TW473673B (en)
WO (1) WO2001019052A2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1358751A2 (en) * 2001-01-26 2003-11-05 Pogo Mobile Solutions Limited Improvements in or relating to wireless communication systems
AU2002237460A1 (en) * 2002-02-28 2003-09-09 Nokia Corporation Http message compression
CN1751492B (en) 2003-02-14 2011-10-26 捷讯研究有限公司 System and method of compact messaging in network communications
JPWO2004079586A1 (en) * 2003-03-07 2006-06-08 シャープ株式会社 Data conversion method capable of optimal processing of markup language
NZ566291A (en) 2008-02-27 2008-12-24 Actionthis Ltd Methods and devices for post processing rendered web pages and handling requests of post processed web pages
US20130297728A1 (en) * 2012-05-01 2013-11-07 Qualcomm Iskoot, Inc. Selectively exchanging metadata in a wireless communications system
FR2988497A1 (en) * 2012-05-04 2013-09-27 Sagemcom Energy & Telecom Sas XML type message server for use in communication system, has compressing unit compressing XML type message, and decompressing unit that is utilized for decompressing message compressed in format of XML type

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5668548A (en) * 1995-12-28 1997-09-16 Philips Electronics North America Corp. High performance variable length decoder with enhanced throughput due to tagging of the input bit stream and parallel processing of contiguous code words
WO1997034240A1 (en) * 1996-03-15 1997-09-18 University Of Massachusetts Compact tree for storage and retrieval of structured hypermedia documents
US6018764A (en) * 1996-12-10 2000-01-25 General Instrument Corporation Mapping uniform resource locators to broadcast addresses in a television signal
JP3859313B2 (en) * 1997-08-05 2006-12-20 富士通株式会社 Tag document compression apparatus and restoration apparatus, compression method and restoration method, compression / decompression apparatus and compression / decompression method, and computer-readable recording medium recording a compression, decompression or compression / decompression program
EP0928070A3 (en) * 1997-12-29 2000-11-08 Phone.Com Inc. Compression of documents with markup language that preserves syntactical structure
JP4003854B2 (en) * 1998-09-28 2007-11-07 富士通株式会社 Data compression apparatus, decompression apparatus and method thereof
GB9911099D0 (en) * 1999-05-13 1999-07-14 Euronet Uk Ltd Compression/decompression method

Also Published As

Publication number Publication date
EP1279267A2 (en) 2003-01-29
AU8035100A (en) 2001-04-10
CA2384687A1 (en) 2001-03-15
WO2001019052A3 (en) 2002-11-14
WO2001019052A2 (en) 2001-03-15

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees