WO2000077677A2

WO2000077677A2 - Invisible encoding of attribute data in character based documents and files

Info

Publication number: WO2000077677A2
Application number: PCT/EP2000/005239
Authority: WO
Inventors: Keith T. Ahern
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 1999-06-15
Filing date: 2000-06-07
Publication date: 2000-12-21
Also published as: EP1145140A2; CN1335966A; EP1145140A3; JP2003502735A; WO2000077677A3

Abstract

Messages that contain text elements and attributes that affect the display of the text elements are encoded as a plain-text message followed by a list of the changes to the plain-text message to effect the enhanced display of the plain-text message. By segregating the plain-text from the attributes associated with the text elements, all text applications are able to display an undisturbed copy of the text. The control and formatting attributes are appended to the plain-text, so that the direct display of the initial portion of the message is an immediately readable version of the text. Additionally, the control and formatting information may be encoded using 'invisible' sequences of characters, such as space, backspace, tab, etc., or as a sequence of visible characters and corresponding invisible characters that have the effect of erasing the visible characters from view. By invisibly encoding the tag elements, the direct display of the message will appear as a plain-text message, because the tag elements will either be self-erasing, or appended to the plain-text message as 'invisible' white space.

Description

Invisible encoding of attribute data in character based documents and files

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to the field of information processing, and in particular to the encoding of information in electronic versions of documents and files.

2. Description of Related Art

As methods of encoding information change to allow for greater capabilities and efficiencies, the likelihood of incompatibility with prior art systems increases. Techniques and standards have been adopted to minimize such incompatibilities, but there remains a class of legacy products that were created before such techniques and standards were adopted.

One standard that has achieved a fairly high degree of compatibility is the MIME (Multipurpose Internet Mail Extensions) format. Using the MIME format, compatibility is provided by encoding the message twice: the first encoding is in "plain-text", and the second encoding is in "rich-text". As their names imply, the plain-text encoding is an encoding of all of the printable text characters in the message, without any control codes, or tags, that affect the display of these characters, whereas the rich-text encoding includes control codes that indicate the attributes associated with the printable text characters, such as bold, italics, underscore, color, font-size, font-type, and other attributes. A MIME-format file includes both encodings of the message. When a MIME-format file is opened for viewing by an application, the application determines which encoding to use, depending upon its capabilities or the capabilities of the system upon which it is operating. If the application supports, for example, bold or italicized fonts, the rich-text encoding will be used to accurately reflect the presence of bold or italicized characters in the original message. Conversely, if the application or system is incapable of displaying bold or italicized letters, the plain-text encoding will be displayed.

To provide for compatibility among rich-text-enabled devices as well as plain-text-enabled devices, the MIME-format file consists exclusively of printable character codes. The tags, or control codes, in the original message are ignored in the plain-text encoding of the message, and are encoded in the rich-text format as sets of unique character strings. FIG. 1 illustrates the encodings of a message 100 into plain-text format 110 and rich-text format 120. FIG. 2 illustrates the composite MIME-format file 200 that includes both the plain-text format 110' and rich-text format 120', as well as MIME-specific control information that describes the contents of the file, the document type, and so on. The rich-text format 120' includes control information 121, 122 that determines how the text elements are to appear when displayed; in this example, when to turn a "bold" rendering on 121 and off 122. For ease of reference, the information that is in addition to the plain-text information is collectively termed "attributes". When an application that supports bold, italics, and underlining processes the MIME-format file 200, it will process the rich-text format 120', and the displayed or printed message will appear similar to the original message 100 of FIG. 1. If the application or system does not support bold, italics, and underlining, the application will process the plain-text format, and the displayed or printed message will appear similar to the plain-text format 110 of FIG. 1.

The above-described proper display or printing of a MIME-format file 200, however, presupposes that the application is MIME-compatible. That is, it presupposes that the application recognizes the MIME-specific information 201, 202, 203 and selects the appropriate encoding 110', 120' for processing and display. An application that is not MIME-compatible, however, will not recognize that the initial portion 201 of the file is a MIME-header, that the center portion 202 is a MIME-separator between the plain-text encoding 110' and the rich-text encoding 120', or that the ending portion 203 is a MIME-footer. To such an application, the MIME-format file 200 merely appears as a conventional text file. When displayed or printed by such an application, the MIME-format file 200 will appear similar to the image of the MIME-format file 200 of FIG. 2. That is, all of the MIME-specific information 201, 202, 203 will appear as part of the displayed document, along with the plain-text format 110' and rich-text format 120' information. Such a direct display of the MIME-format file 200 is visually unappealing, and is often undecipherable to a user who is unfamiliar with the raw form of formatted computer files.

BRIEF SUMMARY OF THE INVENTION

It is an object of this invention to provide a method of encoding a message with text information and attributes that allows for an easy-to-read display of the message regardless of the capabilities of the application used to display the message. It is a further object of this invention to obviate the need to encode the same text information into two different formats. It is a further object of this invention to provide a segregation of the text information from the attributes that affect how the text information is to be displayed or printed.

These objects and others are achieved in two ways. In the first method, messages that contain text elements and tag elements that may affect the display of the text elements are encoded as a plain-text message followed by a list of the changes to the plain-text message to effect the enhanced display of the plain-text message. By segregating the plain-text from the attributes associated with the text elements, all text applications are able to display an undisturbed copy of the text. In a preferred embodiment, the control and formatting attributes are appended to the plain-text, so that the direct display of the initial portion of the message is an immediately readable version of the text. In the second method, which can be used independently of or in combination with the first method, the control and formatting information is encoded using "invisible" sequences of characters. In one example embodiment, unique sequences of invisible characters, such as space, backspace, tab, etc., are used to encode each unique tag. In another example embodiment, the tag elements are encoded as a sequence of visible characters and corresponding invisible characters, such as backspace characters, that have the effect of erasing the visible characters from view. By invisibly encoding the tag elements, the direct display of the message will appear as a plain-text message, because the tag elements will either be self-erasing, or appended to the plain-text message as "invisible" white space.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:

FIG. 1 illustrates an example of a prior art plain-text and rich-text encoding of a document containing text elements and tag elements.

FIG. 2 illustrates an example of a prior art MIME-encoding of a document containing text elements and tag elements.

FIG. 3 illustrates an example of an encoding of a document containing text elements and tag elements that clusters the text elements in accordance with one aspect of this invention.

FIGs. 4A-4C illustrate an example of an invisible encoding of the tag elements in accordance with another aspect of this invention.

FIG. 5 illustrates an example of an encoding of a document containing text elements and tag elements that includes an example clustering of the text elements and an alternative example of an invisible encoding of the tag elements in accordance with this invention.

FIG. 6 illustrates an example of an in-line invisible encoding of the tag elements of a document in accordance with this invention.

FIG. 7 illustrates an example block diagram of an encoder for encoding a document in accordance with this invention.

FIG. 8 illustrates an example flow diagram for encoding a document in accordance with this invention.

FIG. 9 illustrates an example block diagram of a decoder for decoding a document that is encoded in accordance with this invention.

Throughout the drawings, same reference numerals indicate similar or corresponding features or functions.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 3 illustrates an example of an encoding of the document 100 in accordance with one aspect of this invention. As illustrated, the encoded document 300 includes a plain-text section 310 and a tag section 320. The plain-text section 310 is an extraction, or clustering, of the textual content of the document 100, without the attributes associated with the text that affect the appearance of the text. That is, all the letters, numbers, symbols, punctuation, and the like are encoded directly from the input document 100; in most cases, the document 100 will be in an electronic form, and the encoding of the text items will be a mere transfer of the text from the document 100 to the encoded document 300, using the same character codes, such as ASCII, that are contained in the electronic form of the document 100. The tag section 320 is an extraction of each of the tag elements in the document 100, together with an offset, or location, associated with each tag element. The offset is used to recreate the document 100 by applying each tag to the text 310 at the associated offset location in the encoded document 300. For example, the word "bold" 101, which occupies the 33rd through 36th character locations in the input document 100, appears in bold type. To effect this bold appearance, a "bold-start" attribute would be applied just prior to the 33rd character location, and a "bold-end" attribute would be applied just after the 36th character location. The tag section 320 is illustrated as containing the number "32" 340 and the letter "B" 345, representing that a bold-start ("B") is to be applied immediately after the 32nd ("32") character location in the plain-text 310. In like manner, the tag section 320 contains the number "36" 350 and the letter sequence "/B" 355, representing that a bold-end ("/B") is to be applied after the 36th ("36") character location in the plain-text 310.
Thus, the "32 B 36 /B" entry in the tag section 320 provides sufficient information for effecting a bold rendering of the word "bold" in the plain-text 310. Each subsequent attribute tag ("I" for italics, "UL" for underline) is similarly encoded with a reference to its intended location in the plain-text 310. Other types of tag elements, such as HTML references to hypertext links, are similarly encoded. If the contents of a tag element are conventionally displayed, such as an explicitly referenced file name or Internet address, these contents remain in the plain-text 310. If the contents of a tag element are not conventionally displayed, such as internally generated references similar to segment 202 in the example document of FIG. 2, they are encoded in the tag section 320 and do not appear in the plain-text 310.
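The clustering of FIG. 3 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes a hypothetical input in which each tag element is written in angle brackets (e.g. "<B>" and "</B>"), reuses the illustrative "{changes:" delimiter, and writes each tag as an offset followed by the tag name, where the offset counts the plain-text characters that precede the tag. The function name cluster_encode is hypothetical.

```python
import re

# Hypothetical input convention: tag elements appear as <NAME> or </NAME>.
TAG_RE = re.compile(r"<(/?[A-Za-z]+)>")
DELIMITER = "{changes:"  # illustrative tag section delimiter, as in FIG. 3


def cluster_encode(marked_up: str) -> str:
    """Return the plain text followed by a tag section of offset/tag pairs."""
    plain_parts: list[str] = []
    changes: list[str] = []
    plain_len = 0  # number of plain-text characters emitted so far
    pos = 0
    for match in TAG_RE.finditer(marked_up):
        text = marked_up[pos:match.start()]
        plain_parts.append(text)
        plain_len += len(text)
        # The tag applies immediately after the plain_len-th character.
        changes.append(f"{plain_len} {match.group(1)}")
        pos = match.end()
    plain_parts.append(marked_up[pos:])
    return "".join(plain_parts) + DELIMITER + " ".join(changes)


print(cluster_encode("Some <B>bold</B> text"))
```

For the input above the sketch prints `Some bold text{changes:5 B 9 /B`: the text reads normally up to the delimiter, and "5 B 9 /B" plays the same role as the "32 B 36 /B" entry of FIG. 3.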

Note that by referencing each tag to a location in the plain-text 310, there is no need to repeat the text. Assuming that a typical document contains significantly more text than tags, eliminating the repeated text substantially reduces the size of a formatted file using this invention, as compared to the size of an equivalent MIME-format file. Also note that, in a preferred embodiment, the plain-text 310 appears first in the formatted file 300. In this manner, a legacy application that displays the contents of the formatted file 300 directly will provide a meaningful and easy-to-read rendition 310 of the text of the message 100. All of the tag information appears at the end of the text, and can be ignored by the user.

The decoding of the formatted file 300 is facilitated by a tag section delimiter "{changes:" 321 that identifies the end of the plain-text 310 and the beginning of the tag section 320. An application that is compatible with the format of the formatted file 300 recognizes this predefined delimiter, and thereafter interprets the subsequent information as tag-location/tag-information pairs. The particular choice of characters for the tag section delimiter 321, "{changes:", is presented here for illustrative purposes only. In a preferred embodiment, the delimiter is a sequence of characters that is highly likely to be unique, that is, unlikely to appear coincidentally within the plain-text 310, for example "qx74gh#$6^Λ2". Alternatively, the tag section delimiter can be deduced from the content of the formatted file 300. For example, the application may process the formatted file 300 from the end toward the beginning, noting the occurrences of identifiable-tag-element/numeric-location pairs. The beginning of the tag section 320 is identified by the first absence of an identifiable-tag-element/numeric-location pair. These and other techniques for delineating distinguishable segments of information, or clusters, are familiar to one of ordinary skill in the art.
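On the decoding side, a compatible application only needs to find the delimiter and read the pairs that follow it. The sketch below is an illustration under the same assumptions as the FIG. 3 format (whitespace-separated offset/tag pairs after "{changes:"); the function name split_clustered is hypothetical. It uses Python's `str.rpartition`, so if the delimiter happens to occur inside the plain-text, only the last occurrence is treated as the boundary, assuming the tag section itself never contains the delimiter.

```python
DELIMITER = "{changes:"  # illustrative delimiter from FIG. 3


def split_clustered(encoded: str) -> tuple[str, list[tuple[int, str]]]:
    """Separate the plain text from the (offset, tag) pairs of the tag section."""
    plain, sep, tail = encoded.rpartition(DELIMITER)
    if not sep:
        # No delimiter found: treat the whole file as plain text.
        return encoded, []
    tokens = tail.split()
    if len(tokens) % 2:
        raise ValueError("tag section must contain offset/tag pairs")
    pairs = [(int(tokens[i]), tokens[i + 1]) for i in range(0, len(tokens), 2)]
    return plain, pairs


plain, pairs = split_clustered("Some bold text{changes:5 B 9 /B")
print(plain, pairs)
```

Searching from the end mirrors the end-to-beginning processing mentioned above, although this sketch still relies on an explicit delimiter rather than deducing the boundary from the pairs alone.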

As noted above, the tag elements will appear at the end of the plain-text 310.

In accordance with another aspect of this invention, the tag elements in a preferred embodiment are encoded using "invisible" character codes. That is, the tag elements and their locations are encoded such that a direct display of the encoded file will not produce a visible effect. For the purposes of this invention, a blank space is considered an "invisible" character, even though it produces a "white" space upon display. In like manner, blank lines are included in the definition of "invisible". FIGs. 4A-4C illustrate examples of the creation of invisible sequences corresponding to tag elements. As illustrated in FIG. 4A, each type of tag element 410 is uniquely defined by a tag-type identifier 420. The definition of each tag-type identifier 420 can be predefined, or the mapping of unique identifiers to tag-types can be defined for each encoded document. For ease of understanding, the mapping of tag-types to tag-type identifiers is assumed herein to be predefined, alternative data mapping techniques being common in the art. As illustrated in FIG. 4A, a "begin italics" tag-type has an identifier of "100" 421, an "end italics" tag-type has an identifier of "101" 422, and so on. As is common in the art, some tag-types have associated parameters. For example, a "begin color" tag-type has an identifier of "106" 425, and this identifier is followed by parameters that define the magnitude of the red 426, green 427, and blue 428 components of the defined color.
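The mapping of FIG. 4A can be represented as a lookup table. The sketch below includes only the three identifiers named above (100, 101, and 106); it assumes, as the 8-bit value 421B suggests, that identifiers fit in 8 bits, and it additionally assumes (hypothetically) that each color component is an 8-bit value appended after the identifier. The names TAG_TYPE_IDS and tag_to_bits are illustrative.

```python
# Tag-type identifiers named in the FIG. 4A discussion; other entries are omitted.
TAG_TYPE_IDS = {"begin italics": 100, "end italics": 101, "begin color": 106}


def tag_to_bits(tag_type: str, params: tuple[int, ...] = ()) -> str:
    """Concatenate the 8-bit binary forms of the identifier and its parameters.

    The 8-bit width for parameters (e.g. red, green, blue) is an assumption.
    """
    values = (TAG_TYPE_IDS[tag_type], *params)
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError(f"{value} does not fit in 8 bits")
    return "".join(format(value, "08b") for value in values)


print(tag_to_bits("begin italics"))               # identifier 100
print(tag_to_bits("begin color", (255, 0, 128)))  # identifier 106 plus R, G, B
```

The first call yields 01100100, the binary representation 421B used in the following paragraph.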

The binary representation 420B of the value of each tag-type identifier 420 is illustrated in FIG. 4A. In accordance with one example embodiment of this invention, an invisible sequence is created for each tag element by encoding the sequence of binary (0-1) values in the binary representation 420B as a sequence of invisible characters. As illustrated in FIG. 4B, for example, a "space" (Sp) is used to represent a logic "0", while a "carriage return" (CR) is used to represent a logic "1". Using this representation, the example binary encoding 421B of a "begin italics" tag element, 01100100, is encoded as the sequence Sp-CR-CR-Sp-Sp-CR-Sp-Sp 431. In like manner, the binary representation of the offset associated with each tag element, and the binary representation of any parameters associated with each tag element, are similarly encoded. By using "invisible" characters to encode the tag elements, their offsets, and any other parameters associated with the tag elements, a direct display of the encoded tag elements will merely produce blank spaces and blank lines at the end of the plain-text 310. Alternative encoding means for producing "invisible" sequences corresponding to the tag information will be evident to one of ordinary skill in the art. FIG. 4C, for example, illustrates an encoding that uses four "invisible" characters to represent pairs of binary digits: a "space" (Sp) represents a 00 pair, a "line feed" (LF) represents a 01 pair, a "tab" (Tb) represents 10, and a "carriage return" (CR) represents 11. Using this representation, the 01100100 representation 421B of a "begin italics" tag element is encoded as the sequence LF-Tb-LF-Sp 441.
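Both invisible alphabets can be expressed as small lookup tables. The Python sketch below follows the character assignments of FIGs. 4B and 4C; the function names are hypothetical, and the bit strings are assumed to come from a step like the binary representation 420B.

```python
# FIG. 4B: one invisible character per bit.
BIT_TO_CHAR = {"0": " ", "1": "\r"}  # Sp = 0, CR = 1
# FIG. 4C: one invisible character per pair of bits (Sp, LF, Tb, CR).
PAIR_TO_CHAR = {"00": " ", "01": "\n", "10": "\t", "11": "\r"}


def encode_per_bit(bits: str) -> str:
    return "".join(BIT_TO_CHAR[b] for b in bits)


def encode_per_pair(bits: str) -> str:
    if len(bits) % 2:
        raise ValueError("the pair encoding needs an even number of bits")
    return "".join(PAIR_TO_CHAR[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_per_pair(chars: str) -> str:
    char_to_pair = {c: p for p, c in PAIR_TO_CHAR.items()}
    return "".join(char_to_pair[c] for c in chars)


begin_italics = format(100, "08b")           # "01100100" (421B)
print(repr(encode_per_bit(begin_italics)))   # Sp-CR-CR-Sp-Sp-CR-Sp-Sp (431)
print(repr(encode_per_pair(begin_italics)))  # LF-Tb-LF-Sp (441)
```

The pair encoding halves the number of characters per tag (four instead of eight for an 8-bit identifier) at the cost of using two more invisible characters.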

FIG. 5 illustrates an alternative encoding method that provides an "invisible sequence" using potentially visible characters in combination with character codes that "erase" the potentially visible characters. As in the prior examples, the encoded file 500 of FIG. 5 comprises plain-text 510 followed by a tag section 520. The end 519 of the plain-text 510 and the start of the tag section 520 are delineated by a delineation sequence 521. In this example, the delineation sequence 521 comprises a thrice-repeated sequence of a "blank" character followed by a "backspace" character. Note that the direct display of a blank followed by a backspace will not be "visible", and will not produce "white space" on the display. That is, the conventional "cursor placement" pointer will be advanced after producing the blank space, then decremented after producing the backspace, leaving the cursor placement pointer effectively stationary. In a printing device, the print head will advance to produce the blank space, then move back to effect the backspace. Following the tag section delimiter sequence 521 is an encoding of the first tag element offset and tag-type. As discussed above, the first tag in the message 100, the "begin-bold" tag, has an offset location of 32. In accordance with this example encoding method, the invisible sequence encoding 560 of this tag offset value comprises the number "32" 561 followed by two backspace characters 562. The tag element encoding 570 comprises the text string "<B>" 571 followed by three backspace characters 572. That is, the encoding of each tag-offset and tag-identifier is the text presented in FIG. 3 that identifies the changes to the plain-text 310, accompanied by a suitable number of backspace characters to erase each item.
When the encoded file 500 is directly displayed on a conventional display device, the characters "32" 561 will appear briefly at the end of the plain-text 510, and the cursor placement pointer will be returned to the end of the plain-text 510 by the two backspace characters 562. The "<B>" characters 571 will then appear briefly at the end of the plain-text 510, and the cursor placement pointer will again be returned to the end of the plain-text 510 by the three backspace characters 572. In like manner, each item in the tag section 520 will be displayed briefly at the end of the plain-text 510, then immediately overwritten by the next item. At the end of the tag section 520, a final sequence 590 of five spaces and five backspaces is appended to erase any residual visible text. The number of spaces and backspaces in the final sequence 590 should equal the length of the longest visible tag sequence. On a display device, the characters that finally remain visible are those written last, in this case a sequence of blank characters. On a printing device, depending upon the degree of buffering and processing in the printing device, the visible characters may be typed, then struck over again and again as the print head is returned to the end of the plain-text by the backspace characters for each item in the tag section 520. In some applications, the printing and overstriking of a few characters at the end of a plain-text message may be preferable to the printing of blank spaces and lines at the end of the plain-text message. Otherwise, the encoding presented with respect to FIG. 4, which uses only invisible characters, would be preferred. In like manner, some legacy devices do not "process" backspace characters, and instead display a symbol representing the backspace character. Therefore, if maximum compatibility with legacy devices is desired, the encoding presented with respect to FIG. 4 is also preferred.
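The backspace-erasure tag section of FIG. 5 can be sketched as below, together with a toy single-line "display" that moves the cursor left on a backspace and overwrites characters in place. Assumptions: end tags are written as "</B>" by analogy with "<B>", the final erasing sequence is sized to the longest item actually present rather than fixed at five, and the compatible-side parser simply discards the backspaces and reads digit/tag pairs. All function names are hypothetical.

```python
import re

BS = "\b"  # backspace


def erased(item: str) -> str:
    """Write an item, then one backspace per character to return the cursor."""
    return item + BS * len(item)


def backspace_tag_section(pairs: list[tuple[int, str]]) -> str:
    items: list[str] = []
    for offset, tag in pairs:
        items += [str(offset), f"<{tag}>"]
    width = max((len(item) for item in items), default=0)
    return ((" " + BS) * 3                          # delineation sequence (521)
            + "".join(erased(item) for item in items)
            + " " * width + BS * width)            # final erasing sequence (590)


def parse_backspace_section(section: str) -> list[tuple[int, str]]:
    """Compatible-side parsing: drop backspaces, then read offset/tag pairs."""
    visible = section.replace(BS, "")
    return [(int(n), t) for n, t in re.findall(r"(\d+)<(/?[A-Za-z]+)>", visible)]


def render_line(stream: str) -> str:
    """Toy display: characters overwrite in place, backspace moves left."""
    screen: list[str] = []
    cursor = 0
    for ch in stream:
        if ch == BS:
            cursor = max(cursor - 1, 0)
            continue
        if cursor < len(screen):
            screen[cursor] = ch
        else:
            screen.append(ch)
        cursor += 1
    return "".join(screen)


encoded = "Some bold text" + backspace_tag_section([(5, "B"), (9, "/B")])
print(repr(render_line(encoded)))  # plain text followed only by blanks
```

In this simulation the items "5", "<B>", "9", and "</B>" are each written over the same few columns, and the final four spaces leave only blanks after the plain-text.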

A display application that is compatible with this format will process the data in the formatted file as plain-text until it encounters the tag section delimiter 521. Thereafter it will process each tag-offset/tag-type pair, ignoring the backspace characters, and appropriately enhancing the display of the plain-text in response to each tag element.

FIG. 6 illustrates an alternative encoding scheme that encodes the tag element information "in-line", obviating the need to encode an offset parameter for each tag. The same backspace erasure method presented with regard to FIG. 5 is used for "invisibly" encoding each tag. That is, the encoded file 600 appears similar to a conventional rich-text format, except that, in accordance with this aspect of the invention, each tag element 650, 660 is immediately followed by an appropriate number of backspace characters 651, 661 that erase the tag element from view when the encoded file 600 is directly displayed. As in FIG. 5, a compatible application will process the encoded file 600 by applying the attributes indicated by each tag element, while ignoring the backspace characters associated with each tag element. As noted above, this alternative may not be suitable for a device that displays a symbol for the backspace character, or for a printer that does not buffer and preprocess backspaces, because the overstriking of characters will produce visually disturbing artifacts. In these cases, for maximum compatibility with legacy devices, the alternatives of FIG. 4 are preferred.

FIG. 7 illustrates an example block diagram of an encoder 700 that processes an input document 100 to produce an encoded file 780. The encoder 700 includes a parser 710, a tag encoder 720, and a file organizer and writer 730. The parser 710 distinguishes text elements in the input document 100 from tag elements. Text elements 712 are communicated to the file organizer and writer 730, and tag elements 714 are communicated to the tag encoder 720. The tag encoder 720 encodes each tag into a tag-type identifier, if it is not already so encoded. If the invisible-sequence feature of this invention is being employed, the tag encoder 720 also encodes the tag-type identifier into an invisible sequence, using, for example, one of the encodings presented above with regard to FIGs. 4-6. The encoded tag sequence 721 is communicated to the file organizer and writer 730. If in-line encoding is not being employed, the location of each tag element 711 in relation to the plain-text elements 712 is also communicated as an encoded offset, also using the techniques discussed above.

The file organizer and writer 730 prepares the text 712 and tag 721 information for storage in an encoded file 780. The term file is used in a general sense herein, meaning a composite sequence of data. It includes, for example, a file on a computer system, a sequence of bytes in memory, a sequence of packets that are communicated over a network, and so on. If an in-line encoding of invisible sequences is being employed, the file organizer and writer 730 merely writes the text elements 712 and encoded tag sequences 721 to the encoded file 780 in the order in which they appear in the input document 100, as discussed with regard to FIG. 6. If in-line encoding is not being employed, the text elements 712 are each written directly to the encoded file 780, followed by each of the encoded tag sequences and its offset, as discussed with regard to FIGs. 3-5.
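The in-line mode, in which the file organizer writes each tag in document order immediately followed by erasing backspaces as in FIG. 6, can be sketched as follows. As before, the angle-bracket input markup and the function names are hypothetical; the decoder returns each plain-text character paired with the set of attributes active at that point, ignoring the backspaces.

```python
import re

BS = "\b"
TAG_RE = re.compile(r"<(/?[A-Za-z]+)>")  # hypothetical input markup
# In a regex, \b means a word boundary, so the backspace is written as \x08.
ERASED_TAG_RE = re.compile(r"<(/?[A-Za-z]+)>\x08+")


def encode_inline(marked_up: str) -> str:
    """Keep each tag in place, followed by one backspace per tag character."""
    return TAG_RE.sub(lambda m: m.group(0) + BS * len(m.group(0)), marked_up)


def decode_inline(encoded: str) -> list[tuple[str, frozenset[str]]]:
    """Pair each text character with the attributes active when it appears."""
    active: set[str] = set()
    out: list[tuple[str, frozenset[str]]] = []
    pos = 0
    for m in ERASED_TAG_RE.finditer(encoded):
        out += [(ch, frozenset(active)) for ch in encoded[pos:m.start()]]
        name = m.group(1)
        if name.startswith("/"):
            active.discard(name[1:])  # end tag switches the attribute off
        else:
            active.add(name)          # start tag switches the attribute on
        pos = m.end()
    out += [(ch, frozenset(active)) for ch in encoded[pos:]]
    return out


encoded = encode_inline("Some <B>bold</B> text")
print(repr(encoded))
```

Because no offsets are stored, the decoder needs no second pass over the file; the trade-off, as noted for FIG. 6, is that a legacy display briefly shows each tag in the middle of the text.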

FIG. 8 illustrates an example flow diagram for encoding an input document in accordance with the various aspects of this invention. At 810 the input message is opened for processing. As with the output file of FIG. 7, the input message may be of various forms: a computer file, an image from a display screen, a web page, an input from a keyboard, and so on. The block 820 parses the input message for text elements and tag elements. The block 820 may also include means for creating tag elements based upon the form of the input message. For example, if the input message is a scanned image, the block 820 may be a text recognition system that identifies the text content as well as its attributes such as bold, italic, and so on.

If, at 830, the next element in the input message is a tag element, the corresponding tag sequence is determined, at 836. If in-line encoding is not being employed, the block 836 includes the offset location for this tag element in the corresponding tag sequence. If the invisible encoding aspect of this invention is being utilized, the block 836 converts the tag element into an invisible sequence. If, at 840, in-line encoding is not being employed, the encoded tag sequence is temporarily stored, at 842, for subsequent appending to the end of the plain-text section of the output file. If, at 840, in-line encoding is being employed, the invisible sequence corresponding to the tag element is communicated to the block 850 for writing to the output file in the order in which it appears in the input message.

If, at 830, the next element in the input message is not a tag element, the corresponding text sequence is determined, at 832, and communicated to the block 850 for writing to the output file. Typically, block 832 merely communicates the text elements directly to block 850 for writing to the output file, but if any reformatting of the text of the input message is required, such as a conversion into ASCII character codes, it can be performed at this block 832.

After the sequence corresponding to the element in the input message is written to the output file, at 850, or stored for subsequent use, at 842, the system loops back, via 860 to 820, to parse the next element, and this process is continued until the end of the input message.

If, at 870, the in-line formatting has not been used, a delimiter marking the start of the tag section is written to the output file, at 875, and each of the stored tag sequences and its offset location is written to the output file, at 878. As noted above, because these sequences and offsets are placed in the output file after all of the text elements, the direct display of the output file will result in a rendering of the textual content of the input message in an easy to read format. That is, if the output file is rendered for display by an application that is not "compatible" with the encoded format discussed herein, the initial section of the output file will still be rendered as a plain-text document, with no intervening visually-disturbing tag elements.
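The non-in-line branch of the FIG. 8 flow can be traced in code, with comments keyed to the block numbers. This is a sketch under several assumptions not stated in the text: the input uses the hypothetical angle-bracket markup, only the "begin italics"/"end italics" identifiers of FIG. 4A are known (any other tag raises a KeyError), offsets are written as 16-bit values, every number is made invisible with the FIG. 4B space/carriage-return alphabet, and the delimiter written at block 875 is a hypothetical invisible sequence.

```python
import re

TAG_RE = re.compile(r"<(/?[A-Za-z]+)>")  # hypothetical input markup
TAG_IDS = {"I": 100, "/I": 101}          # begin/end italics (FIG. 4A)
DELIMITER = "\n\t\t\n"                   # hypothetical invisible delimiter
OFFSET_BITS = 16                         # assumed offset width


def invisible(value: int, width: int) -> str:
    """FIG. 4B style: space for a 0 bit, carriage return for a 1 bit."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return "".join(" " if b == "0" else "\r" for b in format(value, f"0{width}b"))


def encode_message(marked_up: str) -> str:
    output: list[str] = []  # what block 850 writes to the output file
    stored: list[str] = []  # tag sequences held back at block 842
    offset = 0              # plain-text characters written so far
    # Block 820: split into text elements and tag elements, keeping both.
    for element in re.split(r"(<[^>]*>)", marked_up):
        tag = TAG_RE.fullmatch(element)
        if tag:                              # block 830: a tag element
            # Block 836: offset followed by the tag-type identifier.
            stored.append(invisible(offset, OFFSET_BITS)
                          + invisible(TAG_IDS[tag.group(1)], 8))
        elif element:                        # block 832: a text element
            output.append(element)
            offset += len(element)
    if stored:                               # block 870: not in-line
        output.append(DELIMITER)             # block 875
        output.extend(stored)                # block 878
    return "".join(output)


print(repr(encode_message("Some <I>italic</I> text")))
```

Everything after the plain-text consists only of spaces, carriage returns, and the delimiter's line feeds and tabs, so a legacy viewer shows the text followed by blank space.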

FIG. 9 illustrates an example block diagram of a compatible decoder 900 that operates in accordance with the various aspects of this invention. The decoder 900 processes an encoded file 901 to produce a rendered output 980 that includes the attributes associated with each of the text elements corresponding to the input document that was used to produce the encoded file 901. The decoder 900 includes a parser 910, a tag decoder 920, and a display driver 930. As previously noted, the encoded file 901 may be a computer file, a sequence of bytes in a computer memory, a sequence of packets on a communications medium, and so on. In like manner, the terms display 980 and display driver 930 are used in a general sense to include conventional computer displays and printers, and will be recognized by one of ordinary skill in the art as including intermediate display means such as files, web pages, applets, wavelets, cookies, and so on that contain information for producing a rendering via rendering applications such as web browsers and other viewing means.

The parser 910 delineates the text elements from the encoded tags. If an in-line encoding of tags is employed, the parser 910 includes a tag recognition system that recognizes each encoded tag 911 as it occurs in the encoded file 901; otherwise, the parser 910 includes a tag section delimiter recognizer that identifies the end of the text elements 912 and the start of the tag elements 911. As noted above, techniques for distinguishing sections of files, or types of information data, are common in the art. The text elements 912 are provided directly to the display driver 930. The encoded tag sequences 911 are decoded by the tag decoder 920, and the decoded tag elements 921 are provided to the display driver 930.

If an in-line encoding of tag elements has been used, the display driver 930 produces each text element in its appropriate rendered form immediately after receiving the tag element and text element. If an in-line encoding has not been used, each text element 912 is displayed after any tag element 921 that may affect that text element is processed. For example, the decoder 900 may use the encoded file 901 as the "buffered" location of the text elements, and extract the text elements 912 as it proceeds through the list of changes to be effected by the tag elements. In the example of FIG. 3, the parser 910 may be designed with two ports to the encoded file 901, one port accessing the beginning of the plain-text 310, and the other accessing the beginning of the tag section 320. When the first tag offset "32" 340 is received via the second port and decoded, the display driver 930 instructs the parser 910 to provide characters from the first port, up to the 32nd character, and renders them unmodified to the output 980. The display driver 930 then effects a "bold" effect, based on the "B" tag 345 from the second port, and instructs the parser 910 to present subsequent characters from the first port, up to the 36th character, as indicated by the "36" offset 350 from the second port. Each of these characters, from the 33rd through the 36th, is rendered using a bold effect. In response to the "/B" tag 355 from the second port, the display driver 930 disables the bold effect for subsequent characters from the first port. This dual-access process, common in the art, continues until the end of the encoded file 901, producing an output 980 that includes the text and associated attributes representative of the input file that was used to create the encoded file 901.
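The two-port walk described above can be sketched as a loop with one cursor in the plain-text and one in the list of changes. The sketch assumes the changes have already been decoded into (offset, tag) pairs in ascending offset order, as in the FIG. 3 example, and it returns runs of text with their active attributes rather than driving a real display; the function name render_runs is hypothetical.

```python
def render_runs(plain: str,
                changes: list[tuple[int, str]]) -> list[tuple[str, frozenset[str]]]:
    """Split plain text into runs, each with the attributes active over it."""
    runs: list[tuple[str, frozenset[str]]] = []
    active: set[str] = set()
    text_pos = 0                      # first port: position in the plain text
    for offset, tag in changes:       # second port: next decoded change
        if offset > text_pos:
            # Emit characters up to the offset with the current attributes.
            runs.append((plain[text_pos:offset], frozenset(active)))
            text_pos = offset
        if tag.startswith("/"):
            active.discard(tag[1:])   # e.g. "/B" ends bold
        else:
            active.add(tag)           # e.g. "B" starts bold
    if text_pos < len(plain):
        runs.append((plain[text_pos:], frozenset(active)))
    return runs


for text, attrs in render_runs("Some bold text", [(5, "B"), (9, "/B")]):
    print(repr(text), sorted(attrs))
```

Only the run between offsets 5 and 9 ("bold") carries the "B" attribute, mirroring how characters 33 through 36 are rendered bold in the FIG. 3 example.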

The foregoing merely illustrates the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are thus within its spirit and scope. For example, the encoded tag sequences have been presented as being placed at the end of the plain-text section of the encoded output file, similar to "end-notes" in a document. Alternatively, the encoded tag sequences may be placed at the end of each plain-text page or section, similar to "foot-notes" or "chapter-notes" in a document. The particular structure and sequences provided in this disclosure are intended for illustrative purposes. For example, the encoder 700 and decoder 900 are presented herein as stand-alone devices, for completeness. As would be evident to one of ordinary skill in the art, the principles of this invention can be embodied via pre- and post-processors that transform messages that have been encoded using conventional encoders, such as those used to create a MIME-format file. That is, the encoder 700 can be configured to accept a MIME-format file as an input, and transform only the rich-text segment of the MIME-format file using the principles of this invention. A corresponding decoder 900 will accept this encoding of the rich-text segment and recreate a full (plain-text plus rich-text) MIME-format file for rendering via a conventional MIME-compatible display device. In like manner, the display of the text via the decoder 900 has been presented as a rendering of the text with its attributes. Alternatively, for a rapid display of information, the decoder 900 can be configured to render the plain-text section immediately to a display, then subsequently add the attributes corresponding to the tag elements.
In this manner, for example, an encoded document 901 downloaded from an Internet site can be presented immediately as plain-text and then enhanced as time and bandwidth allow, much as progressively transmitted images first appear without detail and are then refined. These and other system optimization techniques will be evident to one of ordinary skill in the art in light of this disclosure and are within the intended scope of the following claims.
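The "foot-note" alternative mentioned above changes only where each encoded tag sequence is written, not what it contains. A minimal Python sketch follows, assuming the document has already been split into sections and the tags are hypothetical `(offset, tag)` pairs with offsets counted from the start of the whole document. Each section's tags are rebased to offsets relative to that section, so that a file writer could emit every section followed by its own tag sequence. The function names `footnote_layout` and `endnote_layout` are hypothetical, and a tag falling exactly on a section boundary is assigned, by assumption, to the later section at offset 0.

```python
def footnote_layout(
    sections: list[str], tags: list[tuple[int, str]]
) -> list[tuple[str, list[tuple[int, str]]]]:
    """Attach each document-wide (offset, tag) pair to the section it falls in."""
    pending = sorted(tags, key=lambda pair: pair[0])
    laid_out = []
    start, i = 0, 0
    for n, text in enumerate(sections):
        end = start + len(text)
        is_last = n == len(sections) - 1
        local = []
        # A boundary offset (== end) goes to the next section, except in the last one.
        while i < len(pending) and (pending[i][0] < end or is_last):
            offset, tag = pending[i]
            local.append((offset - start, tag))  # rebase to the section start
            i += 1
        laid_out.append((text, local))
        start = end
    return laid_out


def endnote_layout(
    laid_out: list[tuple[str, list[tuple[int, str]]]]
) -> tuple[str, list[tuple[int, str]]]:
    """Inverse: rejoin the sections and restore document-wide offsets."""
    text_parts, tags, start = [], [], 0
    for text, local in laid_out:
        tags.extend((start + offset, tag) for offset, tag in local)
        text_parts.append(text)
        start += len(text)
    return "".join(text_parts), tags
```

Reading the sections back in order and adding each section's starting offset recovers the end-note form, so the two placements carry the same information.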

Claims

1. A method of encoding a message (100), wherein the message (100) comprises multiple text elements (110) and at least one tag element (711) specifying an appearance of the message (100) upon display, the method comprising: encoding (710) the multiple text elements (110), encoding (720) the at least one tag element (711) and associating a tag-type (410) to a text element (101) of the multiple text elements (110) to form an encoded tag sequence (721); and clustering (730) the encoded multiple text elements (110) separate from the encoded tag sequence (721).

2. The method of claim 1, wherein the encoded tag sequence (721) is encoded as an invisible sequence of character codes.

3. The method of claim 2, wherein the invisible sequence of character codes corresponds to a sequence of invisible character codes (431) that correspond to a binary representation (421B) of the at least one tag element (711).

4. A method of encoding a message (100) that contains a plurality of text elements (110) and at least one tag element (711) for controlling a display of at least one text element (101) of the plurality of text elements (110), the method comprising: enabling an encoding (720) of the at least one tag element (711) of the message (100) into a corresponding invisible tag sequence of character codes (721), and enabling an encoding of each text element of the plurality of text elements (110) into a visible sequence of character codes (712).

5. The method of claim 4, further comprising: enabling a determination of an offset (340) of the at least one tag element (711) corresponding to a location in the message (100) at which the at least one tag element (711) is located, enabling an encoding of the offset (561) of the at least one tag element (711) as an invisible offset sequence (560), enabling a formation of a cluster (510) of the encoding of each text element of the plurality of text elements (110), and enabling an appending of the invisible offset sequence and the invisible tag sequence (520) to the cluster (510) of the encoding of each text element of the plurality of text elements (110).

6. An encoder (700) for encoding an input message (100) comprising a plurality of text elements (110) and at least one tag element (711) for controlling a display of at least one text element (101) of the plurality of text elements (110), the encoder comprising: a tag encoder (720) that encodes the at least one tag element (711) into an invisible tag sequence of character codes (721), and a text encoder (710) that encodes each text element of the plurality of text elements (110) into a visible sequence of character codes (712).

7. The encoder (700) of claim 6, further including: a tag extractor (830) that determines an offset (842) of the at least one tag element (711) corresponding to a location of the at least one tag element (711) in the input message (100), an offset encoder that encodes the offset into an invisible offset sequence of character codes, and a file writer (730) that writes the visible sequence of character codes (712) corresponding to each text element of the plurality of text elements (110) to an output file (780) as a contiguous cluster of character codes, and writes the invisible tag sequence of character codes and the invisible offset sequence of character codes (721) to the output file (780).

8. An encoder (700) for encoding an input message (100) comprising a plurality of text elements (110) and at least one tag element (711) for controlling a display of at least one text element (101) of the plurality of text elements (110), the encoder comprising: a tag extractor (710) that determines an offset of the at least one tag element (711) corresponding to a location of the at least one tag element (711) in the input message (100), a tag encoder (720) that encodes the offset and the at least one tag element (711) into an encoded tag sequence (721) of character codes, and a file organizer (730) that clusters each text element of the plurality of text elements (110) as a contiguous cluster of plain-text character codes (510), and appends the encoded tag sequence (721) of character codes (520) to the contiguous cluster of plain-text character codes (510) to form an encoding (500) of the input message (100).

9. A decoder (900) for decoding an input message (901) comprising at least one invisible tag sequence of character codes (911) and at least one visible sequence of character codes (912), the decoder comprising: a tag decoder (920) that decodes the at least one invisible tag sequence of character codes (911) into a tag element (921), and a display driver (930) that renders the at least one visible sequence of character codes (912) in dependence upon the tag element (921).

10. The decoder (900) of claim 9, wherein the tag decoder (920) further decodes the invisible tag sequence of character codes (911) into an offset (340) corresponding to the tag element (921), and the display driver (930) renders the at least one visible sequence of character codes (912) in further dependence upon the offset (340).

11. A decoder (900) for decoding an input message (300) comprising a contiguous plain-text segment (310) and at least one tag sequence (320), the decoder comprising: a tag decoder (920) that decodes the at least one tag sequence (320) into a tag element (345) and a tag offset (340), and a display driver (930) that renders the contiguous plain-text segment (310) with an appearance that depends upon the tag element (345) and the tag offset (340).

12. The decoder (900) of claim 11, wherein the at least one tag sequence (320) is an invisible tag sequence of character codes (431), and the tag decoder (920) decodes the invisible tag sequence (431) into the tag element (345) and the tag offset (340).

13. An encoded message (500) corresponding to an original message (100) having multiple text elements (110) and at least one tag element (121) specifying an appearance of the original message (100) upon display, the encoded message (500) comprising: a contiguous plain-text segment (510) corresponding to the multiple text elements (110), and at least one encoded tag sequence (570) corresponding to the at least one tag element (121).

14. The encoded message (500) of claim 13, wherein the encoded tag sequence (570) is encoded as an invisible sequence of character codes.

15. An encoded message (500) corresponding to an original message (100) having multiple text elements (110) and at least one tag element (121) specifying an appearance of the original message upon display, the encoded message (500) comprising: a plurality of visible sequences of character codes (510) that correspond to the multiple text elements (110), and at least one invisible tag sequence of character codes (570) that corresponds to the at least one tag element (121).
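By way of illustration only, the encoded message of claims 13 and 14, with invisible tag sequences as recited in claims 2 and 3 (a binary representation of each tag element written with invisible character codes and appended to a contiguous plain-text segment), might be sketched in Python as follows. Everything specific here is an assumption rather than part of the disclosure: space stands for a 0 bit and tab for a 1 bit, each entry is a 16-bit offset followed by an 8-bit tag code taken from the hypothetical `TAG_CODES` table, and the tag sequences form one trailing line containing only spaces and tabs, so that the last line feed in the file separates them from the plain text.

```python
# Hypothetical tag codes; the disclosure does not fix these values.
TAG_CODES = {"B": 1, "/B": 2, "I": 3, "/I": 4}
CODE_TAGS = {code: tag for tag, code in TAG_CODES.items()}

ZERO, ONE = " ", "\t"          # assumed invisible characters for bits 0 and 1
OFFSET_BITS, TAG_BITS = 16, 8  # assumed fixed field widths
ENTRY_BITS = OFFSET_BITS + TAG_BITS


def to_invisible(value: int, width: int) -> str:
    """Write value as `width` bits, most significant first, using ZERO/ONE."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return "".join(ONE if bit == "1" else ZERO for bit in format(value, f"0{width}b"))


def from_invisible(chars: str) -> int:
    """Read back a value written by to_invisible."""
    return int("".join("1" if c == ONE else "0" for c in chars), 2)


def encode_message(plain_text: str, tags: list[tuple[int, str]]) -> str:
    """Plain text first, then one trailing line of invisible tag sequences."""
    tag_line = "".join(
        to_invisible(offset, OFFSET_BITS) + to_invisible(TAG_CODES[tag], TAG_BITS)
        for offset, tag in tags
    )
    return plain_text + "\n" + tag_line


def decode_message(encoded: str) -> tuple[str, list[tuple[int, str]]]:
    """Split off the last line and decode it back into (offset, tag) pairs."""
    plain_text, tag_line = encoded.rsplit("\n", 1)
    tags = []
    for i in range(0, len(tag_line), ENTRY_BITS):
        entry = tag_line[i:i + ENTRY_BITS]
        offset = from_invisible(entry[:OFFSET_BITS])
        tags.append((offset, CODE_TAGS[from_invisible(entry[OFFSET_BITS:])]))
    return plain_text, tags
```

Displayed directly, such a file shows the plain text followed by a line that appears blank. Because the tag line itself contains no line feeds, the last line feed always marks the boundary, even when the plain text contains line breaks or trailing white space.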