WO1998021890A1 - System and method for receiving and rendering multi-lingual text on a set top box - Google Patents

Publication number
WO1998021890A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
recited
glyph
hash
top box
Prior art date
Application number
PCT/US1997/020858
Other languages
French (fr)
Inventor
Rajesh Kanungo
Richard K. Motofuji
Original Assignee
Thomson Consumer Electronics, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/747,204 external-priority patent/US6141002A/en
Priority claimed from US08/747,207 external-priority patent/US5870084A/en
Priority claimed from US08/745,508 external-priority patent/US5966637A/en
Application filed by Thomson Consumer Electronics, Inc. filed Critical Thomson Consumer Electronics, Inc.
Priority to AU52587/98A priority Critical patent/AU5258798A/en
Publication of WO1998021890A1 publication Critical patent/WO1998021890A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles

Definitions

  • TITLE: SYSTEM AND METHOD FOR RECEIVING AND RENDERING MULTI-LINGUAL TEXT ON A SET TOP BOX
  • This invention relates to digital television systems and more particularly to the receiving and rendering of multi-lingual text on set top boxes of digital television systems
  • Digital television systems are capable of displaying text and graphic images in addition to typical video program streams
  • An example of digital television services which make use of text and graphic image display is interactive television
  • Proposed features of interactive television accommodate a variety of marketing, entertainment and educational capabilities such as allowing a user to order an advertised product or service, compete against contestants in a game show, or request specialized information regarding a televised program
  • The interactive functionality is controlled by a set top box connected to a television set
  • The set top box executes an interactive program written for a television broadcast
  • The interactive functionality is displayed upon the television set screen and may include icons or menus to allow a user to make selections via the television's remote control
  • Program guides are advertising tools used by program providers, who may desire to include descriptions of television programs in multiple languages, such as English mixed with Japanese
  • end users may receive data from non-native regions, such as a
  • It is desirable for a user to be able to use the same set top box to receive textual information in more than one language. That is, it is desirable that the user not have to buy a different set top box to receive textual information in each different language
  • A language, in this context, may be defined as a written system of representing thoughts, ideas, actions, etc.
  • A language includes, inter alia, a grammar, characters, and words
  • The characters and symbols used in writing a language are commonly referred to as a "writing system", or "script"
  • Many languages, such as Western European languages, are written with alphabetic and numeric characters
  • Japanese, for example, is written with phonetic Hiragana and Katakana characters as well as alphabetic and numeric characters from Western languages and the ideographic Kanji characters which are largely taken from the Chinese language
  • The scripts of many languages may share common characters, as in the Western European languages
  • The textual information received by a set top box includes strings of characters
  • A "character" is an atomic symbol in a writing system. In alphabetic languages, this symbol consists of a single letter of the alphabet
  • ideographic languages such as Chinese and Japanese
  • a character could be alphabetic
  • A "character set" is a group of characters used to represent a particular language or group of languages
  • a "character encoding” is a system for
  • Latin character encoding which is used to represent many of the alphabetic languages in the world. ISO Latin includes a Basic Latin portion (values 0-127) and an Extended Latin portion (values 128-255)
  • JIS Japanese Industrial Standard
  • A character encoding which enables the representation of characters from many different languages and character sets using a single encoding scheme is referred to as a "multi-lingual" character encoding
  • An example of a multi-lingual character encoding is the EUC (Extended UNIX Code) character encoding standard. EUC is typically used to represent ideographic Asian languages in the UNIX environment. EUC combines single byte ASCII characters with multi-byte ideographic character encodings. However, EUC allows only a few languages to be encoded at a time
  • Some scripts combine characters to form composed characters whose shape is determined by the relative positions of the characters, i.e., the context of the characters. Examples of these "contextual scripts" are scripts for the Arabic, Hebrew, Thai, and all Indic languages. In contrast, "non-contextual scripts", such as the Roman alphabet used in Western languages, represent each character as a separate object of fixed shape, independent of the position in a word and of the neighboring characters
  • Each character of a character set has a unique shape which distinguishes it from other characters in the character set, that is, which allows a reader to distinguish the character from other characters and thus unambiguously convey information
  • The shape assigned to a particular character is referred to as the "glyph" of the character
  • The English letter 'A', for example, has a unique glyph which makes it distinguishable from other characters
  • Glyphs may have a particular style associated with them. That is, an English 'A' may be written in many different styles, such as in a block style or a calligraphic style. However, the style maintains the basic shape of the character such that the glyph is still recognizable as an 'A'. A collection of glyphs sharing a common style is referred to as a "font". Examples of common fonts are Courier, Times Roman, and Helvetica
  • A variety of glyph representation schemes exist. A common scheme is a bitmap glyph, or font
  • The glyph of a given character includes a sequence of bits corresponding to an array of pixels on a display screen. The value of each bit indicates whether the corresponding pixel is to be illuminated
  • The pixel array has a characteristic width and height
  • For example, a glyph may be 24 pixels wide and 24 pixels high
  • In this example, 576 bits, or 72 bytes, of storage are required to store the glyph
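  • By way of a worked example of the storage arithmetic above, the following Python sketch computes the size of a one-bit-per-pixel bitmap glyph. The function names `bitmap_glyph_bytes` and `font_bytes` are illustrative assumptions and do not appear in the source; rounding up to whole bytes is likewise an assumption.

```python
def bitmap_glyph_bytes(width_px: int, height_px: int) -> int:
    """Bytes needed for a 1-bit-per-pixel glyph, rounded up to whole bytes."""
    bits = width_px * height_px
    return (bits + 7) // 8


def font_bytes(glyph_count: int, width_px: int, height_px: int) -> int:
    """Total bytes for a non-proportional bitmap font with glyph_count glyphs."""
    return glyph_count * bitmap_glyph_bytes(width_px, height_px)


# A 24 x 24 glyph is 576 bits, i.e. 72 bytes.
print(bitmap_glyph_bytes(24, 24))
# Storage for a hypothetical set of 120,000 such glyphs.
print(font_bytes(120_000, 24, 24))
```

  • Under these assumptions, a set of 120,000 glyphs of 24 by 24 pixels would occupy 8,640,000 bytes, which illustrates why memory is a significant cost factor for languages with large character sets.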
  • If all the glyphs in a font are the same number of pixels in width, the font is said to be a non-proportional font
  • Otherwise, the font is said to be a proportional font
  • Another common glyph representation scheme is an outline font. A property of outline fonts is that they typically facilitate scaling and rotating
  • A set top box receives text encoded according to a character encoding and displays the text on a television
  • The act of processing the image of a character, i.e., the glyph associated with the character, and displaying the character is referred to as "rendering"
  • A rendering program must use font type information, size information, and potentially contextual information in order to properly render a given character in a given script
  • Korean pose particular problems in the context of text processing and rendering in digital television systems
  • One problem is the long time required to search through such a large set of characters to find a glyph associated with a given code point
  • The combined Chinese, Japanese, and Korean character sets constitute over 120,000 characters
  • The amount of memory required to store fonts and/or transmission bandwidth required to transmit fonts may be costly
  • Set top boxes are a commodity item
  • A multi-lingual capable set top box which costs significantly more than a uni-lingual set top box may not be accepted readily in the marketplace
  • The set top box must deliver performance which is acceptable at a given cost
  • The factor of cost versus performance figures into the design of a set top box. Two components of a typical set top box which have a large bearing on its cost are its memory and processor. If multiple languages are supported, particularly if the languages have a large number of characters, such as Chinese, Japanese, or Korean, a large amount of memory may be required to store the fonts for the languages
  • More powerful processors provide higher performance of functions such as character lookup and rendering, but at a greater cost
  • SUMMARY OF THE INVENTION
  • The system comprises a set top box which is configured to receive text, the characters of which are encoded according to a multi-lingual character encoding standard, "Unicode"
  • The set top box is further configured to process the Unicode text, and render the text for display on a television coupled to the set top box
  • The set top box is configured with an operating environment which accepts language-specific glyph sets to be modularly "plugged in" to the set top box
  • One or more glyph sets can be plugged into the set top box to support one or more languages as desired. Glyphs or glyph sets may be downloaded into the set top box along with the application program in the event that a given glyph is not present in the set top box
  • The set top box may also employ an improved hashing method for efficiently storing and quickly retrieving characters of a language with a large number of characters, such as Japanese
  • An application developer develops an application program, such as an interactive TV program, using development tools and libraries such that the textual data in the application program are Unicode characters
  • The textual data is included in a resource file, which is separate from the instructions of the application program
  • A broadcast center mixes the application program, including the resource file, with a digital audio/video data stream
  • The audio/video stream includes the data for playing the television program or commercial to be shown on the user's television
  • The audio/video stream is compressed using a compression algorithm such as one of the Motion Picture Expert Group (MPEG) compression standards
  • The broadcast center transmits the data stream to the set top box
  • The data stream is transmitted by a suitable transmission mechanism, such as via satellite or coaxial cable
  • The set top box receives the stream of digital data from the broadcast center
  • The set top box demultiplexes the audio/video stream portion from the application program and stores the application program in local memory of the set top box
  • The set top box decompresses the audio/video data stream for display on the television
  • A processor in the set top box executes the application program
  • The operating environment running on the set top box is configured to manage the different tasks, such as the application program, which are executed by the set top box
  • The operating environment includes an interpreter which interprets code instructions which are processor independent
  • The application program is interpreted by the interpreter
  • The interpreter includes a Unicode encoding engine which includes library functions for manipulating and printing Unicode character strings
  • The application program calls the Unicode character string functions to perform string manipulations such as determining Unicode string lengths, copying Unicode strings and concatenating Unicode strings
  • The application program also calls string display functions of the Unicode engine
  • The interpreter further comprises a language detector
  • The Unicode engine invokes the language detector to determine a language associated with a given character of the Unicode string
  • The Unicode engine uses the language and the font set by the application program to determine which of the one or more glyph sets of the set top box includes the glyph for the character
  • The interpreter further includes one or more rendering engines for rendering glyphs of a given language and font
  • Glyphs or glyph sets may also be downloaded to the set top box as needed. If the application is configured to display a glyph which is not present in the set top box, i.e., not plugged in to the set top box, the glyph may be downloaded along with the application to the set top box
  • The Unicode engine detects a condition where a glyph referenced by the application is not burned in to the set top box, and searches a list of downloaded glyphs to detect the presence of the referenced glyph. If the Unicode engine detects the presence of the downloaded glyph, the Unicode engine invokes the appropriate rendering engine to render the downloaded glyph
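  • The fallback from resident glyph sets to downloaded glyphs described above may be sketched as follows. This is a minimal illustration only: the function name `find_glyph`, the use of dictionaries for glyph sets, and the byte strings standing in for bitmaps are assumptions, not details taken from the source.

```python
def find_glyph(code_point, plugged_in_sets, downloaded_glyphs):
    """Return (glyph, origin) for a code point, or (None, "missing")."""
    # First consult the glyph sets already present in the set top box.
    for glyph_set in plugged_in_sets:
        if code_point in glyph_set:
            return glyph_set[code_point], "plugged-in"
    # Otherwise scan the list of glyphs that arrived with the application.
    for downloaded_code_point, glyph in downloaded_glyphs:
        if downloaded_code_point == code_point:
            return glyph, "downloaded"
    return None, "missing"


# Hypothetical data: one resident glyph set and one downloaded glyph.
latin = {0x41: b"A-bitmap"}
downloaded = [(0x263A, b"smiley-bitmap")]

print(find_glyph(0x41, [latin], downloaded))
print(find_glyph(0x263A, [latin], downloaded))
print(find_glyph(0x3042, [latin], downloaded))
```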
  • Each rendering engine is configured to render strings of characters according to the rendering rules for its particular language and font. For example, a rendering engine for a contextual language knows how to render characters in a string based on the context of each character. Furthermore, a rendering engine may have specific knowledge about the standards of a given region, such as regarding time, date, and currency symbols. Furthermore,
  • The glyph sets are preferably arranged in a manner conducive to efficient storage and retrieval of the glyphs in the glyph set, according to the characteristics of the language associated with the glyph set
  • Glyph sets for languages with a large number of characters may be stored and retrieved using a hash table according to a hashing method
  • The hash method may yield a relatively small maximum number of collisions with a large percentage of the code points hashing to elements with approximately half the maximum number of collisions or less
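  • To illustrate storing and retrieving glyphs by code point in a hash table, the following sketch uses separate chaining with a simple modulo hash. The source does not specify its hash function at this point, so the modulo hash is a stand-in and not the method of the invention; the class name `GlyphHashTable` and its methods are hypothetical.

```python
from collections import Counter


class GlyphHashTable:
    """Chained hash table mapping 16-bit code points to glyph data."""

    def __init__(self, bucket_count: int):
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, code_point: int) -> int:
        # Stand-in hash function (assumption): code point modulo table size.
        return code_point % len(self.buckets)

    def put(self, code_point: int, glyph: bytes) -> None:
        bucket = self.buckets[self._index(code_point)]
        for i, (stored, _) in enumerate(bucket):
            if stored == code_point:
                bucket[i] = (code_point, glyph)  # replace existing entry
                return
        bucket.append((code_point, glyph))

    def get(self, code_point: int):
        for stored, glyph in self.buckets[self._index(code_point)]:
            if stored == code_point:
                return glyph
        return None

    def chain_length_histogram(self) -> Counter:
        """Map chain length -> number of buckets with that length."""
        return Counter(len(bucket) for bucket in self.buckets)


table = GlyphHashTable(257)
for cp in range(0x3041, 0x3097):  # Hiragana letters U+3041..U+3096
    table.put(cp, bytes(72))      # placeholder 24 x 24 bitmap
histogram = table.chain_length_histogram()
print(max(histogram))
```

  • The histogram gives a way to inspect the collision behaviour that the passage above describes: the longest chain bounds the worst-case number of comparisons for a lookup.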
  • Each glyph set has an associated rendering engine
  • The Unicode engine invokes the appropriate rendering engine to process and render each Unicode string of the text
  • A rendering engine renders a character by receiving a glyph associated with a Unicode character and populating a pixel map according to the glyph information
  • A pixel map is a string of bits indicating the state of each pixel in an array of pixels
  • Rendering the glyph includes copying the glyph bit map to the appropriate location in memory
  • The pixel map may further include other property information, such as color
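  • Copying a glyph bitmap into a pixel map may be sketched as follows. The bit layout (row-major, most significant bit first) and the representation of the pixel map as a list of rows are assumptions made for the sake of a runnable example; the names `glyph_pixel` and `blit_glyph` are hypothetical.

```python
def glyph_pixel(glyph: bytes, width: int, x: int, y: int) -> int:
    """Return the bit for pixel (x, y) of a row-major, MSB-first bitmap."""
    bit_index = y * width + x
    return (glyph[bit_index // 8] >> (7 - bit_index % 8)) & 1


def blit_glyph(pixel_map, glyph, width, height, left, top):
    """Set the pixels of pixel_map that the glyph illuminates."""
    for y in range(height):
        for x in range(width):
            if glyph_pixel(glyph, width, x, y):
                pixel_map[top + y][left + x] = 1


# A tiny 4 x 2 glyph: row 0 is 1001, row 1 is 0110 (one byte, 0x96).
glyph = bytes([0b10010110])
pixel_map = [[0] * 6 for _ in range(4)]
blit_glyph(pixel_map, glyph, width=4, height=2, left=1, top=1)
for row in pixel_map:
    print("".join("#" if bit else "." for bit in row))
```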
  • The rendering engine processes and renders characters of the string until the rendering engine encounters a character which does not belong to its language. If the rendering engine did not process the entire string, the Unicode engine updates the string pointer to point to the next character in the string which was not processed by the rendering engine, invokes the language detector to determine the language associated with that character, and invoke
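  • The hand-off between rendering engines described above amounts to splitting a string into runs of characters that share a language. A minimal sketch, assuming a caller-supplied `detect_language` function; the toy detector and the function name `split_into_runs` are hypothetical.

```python
def split_into_runs(text, detect_language):
    """Split text into (language, substring) runs of consecutive characters."""
    runs = []
    pos = 0
    while pos < len(text):
        # Determine the language of the first unprocessed character.
        language = detect_language(text[pos])
        end = pos + 1
        # Advance until a character of another language is encountered.
        while end < len(text) and detect_language(text[end]) == language:
            end += 1
        runs.append((language, text[pos:end]))
        pos = end  # analogous to updating the string pointer
    return runs


def toy_detector(ch):
    # Toy rule (assumption): Hiragana/Katakana block versus everything else.
    return "japanese" if 0x3040 <= ord(ch) <= 0x30FF else "latin"


runs = split_into_runs("TV\u3042\u3044 guide", toy_detector)
print(runs)
```

  • Each run would then be passed to the rendering engine associated with its language.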
  • The rendering engines pass the pixel maps to a graphics driver which controls the video hardware of the set top box
  • The graphics driver provides the pixel maps to the video hardware of the set top box such that the text is displayed at the appropriate coordinates on the television display screen
  • The set top box multiplexes the decompressed audio/video stream with the rendered text and displays the audio/video information and rendered text on the television
  • The television system and method may advantageously provide a means for receiving and rendering text in multiple languages, and do so in a manner which maximizes code reusability, thus minimizing development and maintenance time and cost, by providing the ability to process text including characters in a universal character encoding
  • The system and method may further minimize the broadcast bandwidth required to receive and render multiple languages by providing pluggable language-specific modules
  • Fig. 1 is a block diagram of a television system according to the present invention;
  • Fig. 2 is a block diagram of the set top box of the system of Fig. 1;
  • Fig. 3 is a block diagram illustrating the flow of data in the system of Fig. 1;
  • Fig. 4 is a flowchart illustrating steps taken in developing and transmitting an application program in the system of Fig. 1;
  • Fig. 5 is a block diagram of the software modules of the set top box of Fig. 2;
  • Fig. 6 is a block diagram illustrating in more detail portions of the interpreter of Fig. 4;
  • Fig. 7 is a flowchart illustrating steps taken in receiving and rendering multi-lingual text in the system of Fig. 1;
  • Fig. 8 is a flowchart illustrating in more detail the step of processing text in Fig. 7;
  • Fig. 9 is a block diagram illustrating the inputs and outputs of a rendering engine of Fig. 5;
  • Fig. 10 is a block diagram illustrating data structures used in the hashing method of the present invention according to the preferred embodiment;
  • Fig. 11 is a block diagram illustrating data structures used in the hashing method of the present invention according to an alternate embodiment;
  • Fig. 12 is a block diagram illustrating data structures used in the hashing method of the present invention according to an alternate embodiment;
  • Fig. 13 is a flowchart illustrating steps taken to efficiently store a character set in the set top box of Fig. 2;
  • Fig. 14 is a flowchart illustrating steps taken to quickly retrieve a character from the set top box stored according to the method of Fig. 13.
  • A block diagram of a television system 10 is shown in Fig. 1. The system 10 comprises a broadcast center 12 which transmits a stream of digital data to a set top box 18, also referred to as a digital interactive decoder (DID)
  • The broadcast center 12 transmits the digital data stream to a satellite 14 which transmits the digital data stream to an antenna 16 coupled to the set top box 18
  • The broadcast center 12 transmits the digital data stream to the set top box 18 via a cable, such as a coaxial or fiber optic cable
  • The set top box 18 receives the digital data stream from the antenna 16 or cable and displays a portion of the information from the digital data stream on a television 20 coupled to the set top box 18
  • The set top box 18 receives user input from a remote control 28. Preferably, the set top box 18 provides a portion of the user input to a transaction server 26. For example, the set top box 18 may display a menu for ordering a product, such as a hammer. The user may provide input indicating the desire to purchase the hammer. The set top box 18 provides the purchase information to the transaction server 26 which forwards the purchase information to the hammer manufacturer or distributor so that the product may be delivered and billed to the user
  • The digital data stream comprises an audio/video portion and an application program portion
  • The audio/video portion comprises the digital audio and video information for a television program or television commercial to be displayed on the television 20
  • The audio/video stream is compressed using a common compression algorithm such as MPEG 2
  • The application program portion of the digital data stream comprises instructions and data to be executed on the set top box 18
  • The application program is configured to display text on the television 20 which is coordinated with the television program or television commercial of the audio/video data stream displayed on the television 20
  • The application program may execute instructions to display a menu for ordering the hammer
  • The audio/video data stream portion and the application program portion are mixed together, preferably in the broadcast center 12, to produce the digital data stream transmitted to the set top box 18
  • The application program includes textual information such as a menu, including strings of characters in multiple languages
  • The set top box 18 is configured to receive the application program and process the strings of characters of the application program and render the characters for display on the television 20
  • A video cassette recorder (VCR) 24 may also be coupled to the set top box 18
  • The set top box 18 may control the VCR 24 to perform actions according to application programs downloaded to the set top box 18
  • An example is the set top box 18 controlling the VCR 24 to perform automated recording
  • A computer 22 may also be coupled for communication with the set top box 18
  • The computer 22 may also download application programs to the set top box 18. Further, the set top box 18 may use resources of the computer 22, such as a hard disk, as permanent storage
  • The computer 22 may be locally connected, such as through a serial connection, or remotely connected via a telephone line
  • Set top box 18 comprises CPU 40 coupled to a read-only memory (ROM) 30
  • The ROM 30 includes instructions and data for executing on the CPU 40
  • A random access memory (RAM) 32 is coupled to the CPU 40
  • The RAM 32 is used for storing program variables for the program instructions contained in the ROM 30
  • The RAM 32 is also configured to store the application program received from the broadcast center 12 (of Fig. 1)
  • FLASH memory 34 is also coupled to the CPU 40 and contains program instructions for execution on the CPU 40 and/or character glyphs used in rendering text characters on the television 20 (of Fig. 1)
  • The CPU 40 comprises a microprocessor, micro-controller, digital signal processor (DSP), or other type of software instruction processing device
  • The CPU 40 fetches instructions from the ROM 30, RAM 32, and/or FLASH 34 and executes the instructions
  • The ROM 30 comprises read only memory storage elements as are well known in the art of solid state memory circuits
  • The ROM 30 comprises read only memory storage which is programmed and plugged in to the set top box 18
  • The RAM 32 comprises dynamic random access memory (DRAM) or static random access memory (SRAM) storage elements as are well known in the art of solid state memory circuits
  • The FLASH memory 34 comprises writable permanent memory storage elements as are well known in the art of solid state memory circuits
  • The FLASH 34 comprises memory storage which may be programmed, i.e., written, during operation of the set top box 18
  • A security device 36 is also coupled to the CPU 40 for providing authentication and signature functionality
  • The security device allows the enabling or disabling of application program downloading to the set top box 18
  • A communications port 42 is coupled to the CPU 40 and is configured to provide communication with other devices such as the computer 22 (of Fig. 1), the VCR 24 (of Fig. 1), or other devices such as a keyboard
  • A remote control port 44 is coupled to the CPU 40 and is configured to receive remote input such as from the remote control 28 (of Fig. 1) or from a front panel on the set top box 18. Preferably, the remote port 44 comprises an infra-red receiver for receiving infra-red signals from the remote control 28
  • A modem 46 is coupled to the CPU 40 and is configured to provide communication between the set top box 18 and the transaction server 26 (of Fig. 1)
  • A demultiplexer 38 is coupled to the RAM 32 and is configured to receive the digital data stream from a receiver 50 coupled to the demultiplexer 38. The receiver 50 receives the digital data stream from the broadcast center and communicates the digital data stream to the demultiplexer 38
  • The demultiplexer 38 is configured to demultiplex the application program from the audio/video data stream of the digital data stream received from
  • The broadcast center 12 receives an audio/video stream 66 and an application program 64, multiplexes the audio/video stream 66 and the application program 64, and transmits the multiplexed data stream to the set top box 18
  • The language-specific textual information of the application program 64 is included in a resource file 62 which is transmitted as part of the application 64 to the set top box 18
  • The application 64 advantageously provides a means whereby changing the textual information from one language to another requires only modification to the resource file 62 rather than to the application 64
  • The set top box 18 receives the digital data stream and the demultiplexer 38 of the set top box 18 demultiplexes the application 64 from the audio/video stream 66a
  • The audio/video stream 66 may have been compressed according to a lossy compression algorithm, hence the decompressed audio/video stream 66a may be in some manner different from the initially transmitted audio/video stream 66
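  • The multiplexing at the broadcast center and the demultiplexing in the set top box may be pictured with the following sketch. The source does not describe the packet format, so the tagged tuples and the tag names "AV" and "APP" are hypothetical, as are the function names; compression is left out.

```python
from itertools import zip_longest


def multiplex(av_packets, app_packets):
    """Interleave audio/video and application packets into one tagged stream."""
    stream = []
    for av, app in zip_longest(av_packets, app_packets):
        if av is not None:
            stream.append(("AV", av))
        if app is not None:
            stream.append(("APP", app))
    return stream


def demultiplex(stream):
    """Separate a tagged stream into A/V packets and the application bytes."""
    av_packets, app_chunks = [], []
    for tag, payload in stream:
        if tag == "AV":
            av_packets.append(payload)
        else:
            app_chunks.append(payload)
    # The application chunks are reassembled for storage in local memory.
    return av_packets, b"".join(app_chunks)


av = [b"v0", b"v1", b"v2"]
app = [b"ap", b"p"]
stream = multiplex(av, app)
print(demultiplex(stream))
```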
  • The application program 64 is executed by an operating environment 70 of the set top box 18
  • The application program 64 executes within the operating environment 70 to display rendered textual information which is received by the video multiplexer 48 (of Fig. 2) along with the audio/video stream 66a
  • Video multiplexer 48 multiplexes the rendered text information with the audio/video stream 66a and provides the multiplexed information to the television 20 for display on the television 20
  • Referring now to Fig. 4, a flowchart illustrating steps taken in developing and transmitting an application program in the system of Fig. 1 is shown.
  • An application program developer develops an application such as the application 64 of Fig. 3 in step 102
  • The application program developer does not include the textual information to be displayed on the television 20 (of Fig. 1) in the application program, but rather places the textual information in a resource file such as the resource file 62 of Fig. 3, and includes in the application program 64 references to the textual information
  • The application programmer creates a resource file 62 in step 104
  • The resource file 62 includes formatted chunks of data which may be attached to the application program 64 to avoid embedding the data directly into the application program 64
  • The resource file 62 advantageously simplifies maintenance and modification of the application program 64 since the data may be changed in the resource file 62 without modification to the tested and debugged application program 64
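  • The separation of text from program logic may be illustrated with a small sketch in which the application refers to strings by identifier and a resource table supplies the text. The table layout, the identifier `ORDER_PROMPT`, the sample strings, and the function `load_string` are hypothetical and are not taken from the source.

```python
# Hypothetical resource tables, one per language; only these change
# when the text is translated, not the application logic below.
RESOURCES = {
    "en": {"ORDER_PROMPT": "Order now"},
    "fr": {"ORDER_PROMPT": "Commander"},
}


def load_string(resources, language, string_id, fallback="en"):
    """Look up a string by identifier, falling back to a default language."""
    table = resources.get(language, {})
    if string_id in table:
        return table[string_id]
    return resources[fallback][string_id]


def build_menu(language):
    # Application logic references text only through its identifier.
    return [load_string(RESOURCES, language, "ORDER_PROMPT")]


print(build_menu("en"))
print(build_menu("fr"))
```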
  • The textual information may be contained in the application program 64 itself
  • The textual information comprises strings of characters wherein the characters are from the Unicode character set
  • The application developer creates the application 64 and/or resource file 62 using a Unicode-capable text editor, or some other suitable Unicode-capable tool
  • Unicode is a multi-lingual character encoding which attempts to include all current written scripts for all current languages
  • Each character in the Unicode character set is represented by a 16-bit value or code point, thus allowing a character set of 65,536 characters
  • Unicode is part of the ISO 10646 standard. ISO/IEC 10646-1:1993(E) defines the Unicode standard and is hereby incorporated by reference
  • Creating the resource file in step 104 comprises optionally including one or more glyphs for particular characters referenced by the application program to be displayed on the television 20. If the glyph of a particular character is not already part of a glyph set of the set top box 18, the glyph may be downloaded to the set top box 18 with the application program 64
  • The resource file 62 of the system advantageously provides a means for providing glyphs to the set top box 18
  • The means of downloading a glyph to the set top box 18 is advantageous for rendering characters which are infrequently used or special characters, thus allowing the saving of memory storage within the set top box and potentially reducing the cost of the set top box 18
  • Referring now to Fig. 5, a block diagram of the software modules of the set top box 18 of Fig. 2 is shown.
  • The application program 64 (of Fig. 3) communicates with the operating environment 70 (of Fig. 3) which in turn communicates with the set top box hardware 18 to display textual information of the application program 64 on the television 20
  • The operating environment 70 comprises device drivers 76 for communicating with and controlling the set top box hardware 18
  • A microkernel 72 provides system services to the various components of the operating environment 70 such as memory management, task management, and communication between tasks, such as the application program 64, and the device drivers 76
  • The application program 64 comprises instructions which may be interpreted by an interpreter 74
  • The interpreted instructions in the application program are referred to as o-code and the interpreter 74 is an o-code interpreter. O-code comprises a stack based instruction set
  • The interpreter 74, the microkernel 72, and the device drivers 76 of the operating environment 70 reside in the ROM 30 (of Fig. 2) of the set top box 18
  • The interpreter 74 interpreting the o-code of the application 64 provides a means of developing applications 64 which are independent of the underlying CPU 40 (of Fig. 2) hardware
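  • To illustrate what a stack based instruction set means in practice, the following sketch interprets a toy program. The source does not list the o-code instructions, so the opcodes `PUSH`, `ADD` and `MUL` and the function `run` are purely hypothetical and are not o-code.

```python
def run(program):
    """Interpret a list of (opcode, *operands) tuples on an operand stack."""
    stack = []
    for opcode, *operands in program:
        if opcode == "PUSH":
            stack.append(operands[0])
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack


# (2 + 3) * 4 expressed without naming any CPU register.
program = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]
print(run(program))
```

  • Because operands live on the interpreter's stack rather than in processor registers, the same program can run unchanged on any CPU for which an interpreter exists.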
  • The interpreter 74 includes function libraries which are accessible by the application program 64 for performing functions such as allocating memory, manipulating memory, and providing user interface management
  • Referring now to Fig. 6, a block diagram illustrating in more detail portions of the interpreter 74 of Fig. 4 is shown
  • An application program 64 executing on the set top box 18 communicates with the interpreter 74 which in turn communicates with a graphics driver 76a of the device drivers 76 (of Fig. 5) to process and render Unicode text for display on television 20 (of Fig. 1)
  • The interpreter 74 comprises a Unicode encoding engine 84
  • The Unicode encoding engine 84 provides functions which the application program 64 invokes to perform numerous string manipulation functions such as determining the length of a Unicode string, copying a Unicode string from one location to another, concatenating two Unicode strings together, comparing two Unicode strings to determine if the Unicode strings are identical, and searching a Unicode string for an occurrence of a particular Unicode character within the Unicode string
  • A Unicode string is defined as one or more Unicode characters terminated by a null Unicode character
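  • The string functions listed above may be sketched over null-terminated arrays of 16-bit code units. The function names (`to_unicode_string`, `ustrlen`, `ustrcat`, `ustrcmp`, `ustrchr`) are hypothetical, as the source does not name the library functions; characters above U+FFFF are not handled, in keeping with the 16-bit code points described in the text.

```python
from array import array


def to_unicode_string(text):
    """Encode text as 16-bit code units followed by a null terminator."""
    units = array("H", (ord(ch) for ch in text))  # 16-bit values only
    units.append(0)
    return units


def ustrlen(s):
    """Count code units before the null terminator."""
    n = 0
    while s[n] != 0:
        n += 1
    return n


def ustrcat(a, b):
    """Return a new null-terminated string holding a followed by b."""
    return a[:ustrlen(a)] + b[:ustrlen(b) + 1]


def ustrcmp(a, b):
    """Return True if both strings hold identical characters."""
    return a[:ustrlen(a)] == b[:ustrlen(b)]


def ustrchr(s, code_point):
    """Return the index of the first occurrence of code_point, or -1."""
    for i in range(ustrlen(s)):
        if s[i] == code_point:
            return i
    return -1


tv = to_unicode_string("TV")
joined = ustrcat(tv, to_unicode_string("\u3042"))
print(ustrlen(joined), list(joined))
```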
  • The Unicode encoding engine 84 further comprises functions which the application program 64 invokes to set the current font of the Unicode text to be rendered
  • The Unicode encoding engine 84 further comprises functions which the application 64 invokes for displaying Unicode text on the television 20
  • The Unicode encoding engine 84 invokes a language detector 82 in order to determine a language associated with characters of the Unicode text received from the application program 64
  • The language detector 82 informs the Unicode encoding engine 84 of the language associated with the Unicode character passed to the language detector 82 by the Unicode encoding engine 84
  • The interpreter 74 comprises one or more glyph sets 94a-94f referred to collectively as 94
  • The Unicode encoding engine 84 uses the language information returned by the language detector 82 along with font information set by the application program 64 to determine which one of the glyph sets 94 includes a glyph for describing the particular Unicode character to be rendered
  • the interpreter 74 further compnses one or more rendenng engines 92a-92n, refened to collectively as 92
  • Each of the rendenng engines 92 is configured to render Umcode characters conespondmg to a particular language and/or font
  • the rendenng engines 92 receive glyph information from the glyph sets 94 in order to render Umcode characters
  • the Umcode encoding engme 84 mvokes the appropnate rendenng engine from the rendenng engines 92 configured to render the particular glyph from one of the glyph sets 94 corresponding to the given Umcode character to be rendered
  • a given rendenng engine 92 may be configured to render glyphs from a plurality of glyph sets 94
  • a rendenng engine 92 which renders bitmap glyphs of a fixed pixel height and pixel width for non-contextual languages which render characters from left to nght may render characters for most of the Western European languages
  • The Unicode encoding engine 84 is further configured to determine the absence of a glyph in the glyph sets 94 for a given character in a language and to detect the presence of a downloaded glyph 97 corresponding to a given Unicode character to be rendered.
  • The downloaded glyphs 97 are downloaded to the set top box 18, preferably in the resource file 62 (of Fig. 3) along with the application program 64.
  • The downloaded glyphs 97 are placed in the RAM 32 (of Fig. 2) in a list for access by the Unicode encoding engine 84 and rendering engines 92 in rendering Unicode characters not present in the glyph sets 94.
  • The interpreter 74 may contain as few as one glyph set 94 and one rendering engine 92.
  • Glyph sets 94 may be modularly added to the interpreter 74 as required to accommodate various languages and fonts.
  • The set top box 18 (of Fig. 2) may be configured with three glyph sets: a 16 point "Courier" English font glyph set, a 24 point Courier English font glyph set, and a 24 point Japanese font glyph set.
  • The set top box 18 is further configured with a rendering engine for rendering each of the three glyph sets.
  • The set top box 18 may be tailored specifically to support the desired languages in a given geographic locale. This localization advantageously enables the same set top box hardware 18 and large portions of the operating environment 70 to be reused without modification. Thus, development time and resources are decreased and the cost of the set top box hardware is reclaimed.
  • The rendering engines 92 invoke the graphics driver 76a to display the rendered text on the television 20.
  • The graphics driver 76a interacts with the video hardware of the set top box 18 to display the rendered text along with the audio/video data of the television program or commercial on the television 20.
  • The interpreter 74 further comprises an 8-bit encoding engine 86 for handling 8-bit character encoding strings such as ASCII text.
  • The application program 64 invokes string functions of the 8-bit encoding engine 86 in order to manipulate and display 8-bit character encoding characters on the television 20.
  • The 8-bit encoding engine 86 performs functions similar to the Unicode encoding engine 84 but for 8-bit encoded character strings rather than for Unicode encoded character strings.
  • Each of the rendering engines 92 is configured to render Unicode text according to rendering rules for the particular language associated with each of the rendering engines 92.
  • A rendering engine associated with a language which is a contextual language has knowledge about how to render characters of a string based on the context of the given character.
  • An Arabic rendering engine contains knowledge about particular kerns or ligatures used in connecting Arabic characters based on neighboring characters.
  • A rendering engine has specific knowledge regarding the direction in which characters are rendered. For example, a Hebrew rendering engine renders characters from right to left, whereas a French rendering engine renders characters from left to right.
  • Rendering engines have specific knowledge about standards of a given locale, such as standards for displaying times, dates, and currency symbols.
  • A glyph set from one of the glyph sets 94 comprises a plurality of glyphs organized in a manner optimized for the particular language or set of glyphs in the glyph set.
  • The glyph sets are organized to optimize the time required for look-up of a given Unicode character, as well as to optimize the amount of storage required in order to store the glyph set.
  • Two different glyph sets may be organized in two different manners.
  • A glyph set comprising a relatively small number of glyphs may be arranged as a simple indexed array.
  • A glyph set comprising a relatively large number of characters, such as a Japanese, Chinese, or Korean glyph set, may be arranged in a more sophisticated manner, such as by using a hash table.
  • Glyph sets for contextual languages may include multiple tables according to context.
  • The glyph sets may also be arranged according to glyph representation, such as bit-mapped glyphs, outline glyphs, stroke glyphs, etc.
  • The set top box 18 receives a digital data stream including an application program and audio/video information from the broadcast center 12 (of Fig. 1) in step 110.
  • The demultiplexer 38 demultiplexes the application program 64 (of Fig. 3) from the audio/video stream 66 (of Fig. 3) and stores the application program 64 in the RAM memory 32 (of Fig. 2) in step 112.
  • The operating environment 70 determines if a resource file 62 (of Fig. 3) is present with the application 64, and if so, determines if the resource file 62 includes glyphs for rendering Unicode characters in step 113. If so, the operating environment 70 places the downloaded glyphs into a list of downloaded glyphs 97 (of Fig. 6) for future use by one of the rendering engines 92n (of Fig. 6) in step 115.
  • The operating environment 70 executes the application program 64 on the CPU 40 (of Fig. 2) in step 114.
  • The application program 64 calls functions of the operating environment 70 to manipulate and display text on the television 20 (of Fig. 3).
  • The text is encoded according to the Unicode character encoding.
  • The Unicode text is contained within the resource file 62 (of Fig. 3) of the application program 64.
  • The application program 64 comprises references to the Unicode text contained in the resource file 62.
  • The Unicode encoding engine 84 receives the Unicode text from the application program 64 in step 116.
  • The Unicode text received by the Unicode encoding engine 84 comprises one or more Unicode text strings.
  • The Unicode encoding engine 84 receives the Unicode text from the resource file 62.
  • The Unicode encoding engine 84 determines whether more Unicode strings exist in the Unicode text received in step 120. If no more Unicode text strings exist, the text has been processed. If more strings exist, the Unicode encoding engine 84 sets a current character variable to reference the first character in the current string to be processed in step 122. The Unicode encoding engine 84 then determines whether or not more characters exist in the current Unicode string in step 124. If no more Unicode characters exist in the current string, the Unicode encoding engine 84 returns to step 120 to determine if any more strings exist in the text. If more characters exist in the current string, as determined in step 124, the Unicode encoding engine 84 invokes the
  • The Unicode encoding engine 84 invokes the rendering engine by passing a reference to the current string to the rendering engine.
  • The rendering engine renders characters in the string as long as each character encountered is a character in the language associated with the rendering engine in step 130.
  • The rendering engine stops rendering characters of the Unicode string and returns to the Unicode encoding engine 84 information regarding which portion of the Unicode string was rendered by the rendering engine.
  • The Unicode encoding engine 84 uses the information returned by the rendering engine concerning which characters of the string were rendered to assign the current character variable to reference the character after the last character rendered by the rendering engine.
  • The Unicode encoding engine then returns to step 124 to determine if more characters exist in the current string to be rendered.
  • Each character in the Unicode text received is rendered for display on the television 20 according to the steps of the flowchart of Fig. 8.
  • The steps advantageously enable the set top box 18 to process and render Unicode text comprising characters of different languages.
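The loop of steps 120 through 130 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: strings are plain lists of code points, `detect_language` is a hypothetical stand-in for the language detector 82 that classifies by a few code point ranges, and each toy engine merely records what it was asked to render.

```python
def detect_language(code_point):
    """Hypothetical language detector: classifies by code point range."""
    if 0x3040 <= code_point <= 0x30FF or 0x4E00 <= code_point <= 0x9FFF:
        return "japanese"
    return "latin"


def make_engine(language, rendered):
    """Build a toy rendering engine for one language.

    The engine renders characters starting at `start` for as long as each
    belongs to its language, then returns the index just past the last
    character it rendered.
    """
    def engine(string, start):
        i = start
        while i < len(string) and detect_language(string[i]) == language:
            rendered.append((language, string[i]))
            i += 1
        return i
    return engine


def render_text(strings, engines):
    """Walk every string, handing runs of characters to matching engines."""
    for string in strings:                      # step 120: more strings?
        current = 0                             # step 122: first character
        while current < len(string):            # step 124: more characters?
            language = detect_language(string[current])
            current = engines[language](string, current)  # step 130
```

Each engine call consumes at least one character, because the engine is chosen from the language of the current character, so the loop always makes progress through a mixed-language string.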
  • Referring now to Fig. 9, a block diagram illustrating the operation of a rendering engine such as the rendering engines 92 (of Fig. 6) is shown.
  • A rendering engine 92a representative of the rendering engines 92 receives a Unicode character 140 and glyph set information 94a representative of the glyph sets 94 (of Fig. 6) and generates a pixel map of the rendered character 95.
  • The rendering engine 92a receives the code point of the Unicode character 140 and uses the code point to access a corresponding glyph in the glyph set 94a which describes the Unicode character 140.
  • The pixel map 95 comprises a string of bits indicating the state of each pixel in an array of pixels, such as the pixels of a television screen.
  • The state of a pixel is either on or off.
  • A reference to the pixel map 95 is passed to the graphics driver 76a (of Fig. 6), and the graphics driver 76a uses the pixel map to display the rendered character on the television 20.
  • The rendering engine 92a takes the description of the Unicode character 140 from the glyph of the glyph set 94a representing the Unicode character 140 and generates pixels in the pixel map 95 for display of the Unicode character 140.
  • Rendering the glyph typically comprises copying the string of bits comprising the glyph to the pixel map.
  • This is true in the case of a non-contextual language.
  • The rendering engine 92a modifies the bitmap contained in the glyph to modify or create ligatures or kerns of the glyph based on the context, i.e., the neighboring characters in the string, to produce a modified glyph in the form of the pixel map 95.
  • The rendering engine 92a uses the outline information to generate the pixel map 95.
  • The rendering engine 92a uses the outline information from the glyph along with orientation and sizing information to render the character and produce the pixel map 95.
  • The pixel map 95 may be further modified by other portions of the operating environment 70 to include other properties such as color information in the pixel map 95.
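For the simplest case described above, a non-contextual bitmap glyph copied bit for bit into the pixel map, a minimal sketch might look like this. The row-per-integer glyph layout, the `blit_glyph` name, and the 5x5 sample glyph are assumptions made for illustration only.

```python
def blit_glyph(pixel_map, glyph_rows, glyph_width, x, y):
    """Copy a 1-bit-per-pixel glyph into pixel_map at column x, row y.

    pixel_map is a list of rows, each a list of 0/1 pixel states.
    glyph_rows holds one integer per glyph row; the most significant of
    the glyph_width low bits is the leftmost pixel.
    """
    for row, bits in enumerate(glyph_rows):
        for col in range(glyph_width):
            on = (bits >> (glyph_width - 1 - col)) & 1
            pixel_map[y + row][x + col] = on


# A hypothetical 5x5 glyph for the letter "T".
GLYPH_T = [0b11111, 0b00100, 0b00100, 0b00100, 0b00100]
```

A contextual or outline-font engine would transform the glyph before this copy; the sketch covers only the direct bit copy.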
  • Font object 150 is a representative font object.
  • The font object 150 is an object according to the notion of objects in object-oriented programming.
  • The font object 150 comprises methods and data associated with the object.
  • The font object 150 comprises a reference to a rendering engine 92a of the rendering engines 92 (of Fig. 6), which is a method of the font object 150.
  • The Unicode encoding engine 84 uses the language information from the language detector 82 (of Fig. 6) to determine which font object is associated with the language and current font of the application 64 (of Fig. 3) in order to invoke the rendering engine 92a of the font object 150.
  • The font object 150 further comprises a reference to an arrangement of a glyph set, such as glyph set 94a of Fig. 6, which is data of the font object 150.
  • Fig. 10 illustrates the arrangement of a glyph set using a hash table 160, and the font object 150 includes a reference to the hash table 160.
  • The glyph set arrangement illustrated in Fig. 10 is particularly useful for efficiently storing and quickly retrieving Unicode characters for languages with a large number of characters, such as Japanese, Chinese, or Korean.
  • The rendering engine 92a uses the glyphs 180a-180n of the glyph set 94a to render Unicode characters whose glyphs are present in the glyph set 94a.
  • The hash table 160 includes an array of hash table elements 162a-162n, referred to collectively as 162. Hash table entry 162a will be referred to as a representative hash table entry.
  • The hash table 160 is indexed according to indexes calculated by a hashing method upon Unicode code points, described infra.
  • The rendering engine 92a calculates a hash table index and uses the index to calculate the appropriate hash table entry.
  • Each hash table entry 162a comprises a hash bin list reference 164a-164n, referred to collectively as 164, and a hash bin count 166a-166n, referred to collectively as 166. Hash bin list reference 164a will be referred to as a representative hash bin list reference. Hash bin count 166a will be referred to as a representative hash bin count.
  • Each hash bin list reference 164a references a list of hash bins from the array of hash bins 172a-172n, referred to collectively as 172. Hash bin 172a will be referred to as a representative hash bin.
  • The hash bin count 166a indicates the number of hash bins in the list of hash bins referenced by its associated hash bin list reference 164a.
  • Fig. 10 shows example hash bin counts, e.g., hash bin count 166a is 4 and associated hash bin list reference 164a references a hash bin list comprising hash bins 172a-172d, the first of which is hash bin 172a.
  • The hash bin count 166a enables the rendering engine 92a to search a hash bin list and determine when the end of the hash bin list has been reached.
  • Each hash bin 172a comprises an encoding value 174a, which is representative of encoding values 174a-174n, and a glyph reference 176a, which is representative of glyph references 176a-176n.
  • The encoding value 174a is the code point for a Unicode character.
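The layout of Fig. 10 can be pictured with a small sketch: table elements that hold a bin list reference and a count, a flat array of bins, and a separate glyph array. The field names, the use of array indexes as "references", and the sample data are illustrative assumptions; the hash index is taken as given here.

```python
from collections import namedtuple

# Hash table element: index of its first bin in the bin array, plus a count.
TableEntry = namedtuple("TableEntry", ["first_bin", "bin_count"])
# Hash bin: a code point and an index into a separate glyph array.
HashBin = namedtuple("HashBin", ["encoding_value", "glyph_ref"])

glyphs = ["glyph-A", "glyph-B", "glyph-C"]          # stand-ins for glyph data
bins = [HashBin(0x3042, 0), HashBin(0x4E00, 1),     # bin list for entry 0
        HashBin(0x65E5, 2)]                         # bin list for entry 1
table = [TableEntry(0, 2), TableEntry(2, 1), TableEntry(0, 0)]


def find_glyph(index, code_point):
    """Scan the bin list of table[index]; the count marks where it ends."""
    entry = table[index]
    for i in range(entry.first_bin, entry.first_bin + entry.bin_count):
        if bins[i].encoding_value == code_point:
            return glyphs[bins[i].glyph_ref]
    return None
```

Storing each bin list contiguously is what lets a single count, rather than a per-bin terminator, delimit the list.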
  • Referring now to Fig. 11, a block diagram illustrating data structures used in the hashing method according to an alternate embodiment is shown.
  • The embodiment of Fig. 11 is similar to that of Fig. 10, and corresponding elements are numbered identically for simplicity and clarity.
  • The structure of the embodiment of Fig. 11 is similar to that of Fig. 10 except that the glyph 180a for each Unicode character is included in the hash bin 172a associated with the Unicode character, rather than the hash bin 172a having a reference to the glyph 180a. That is, the glyph set 94a is distributed among the hash bins 172.
  • The embodiment of Fig. 11 has the advantage of using less memory storage space due to the absence of the glyph reference 176a of Fig. 10, but the potential disadvantage of having the glyph set distributed among the hash bins 172.
  • Referring now to Fig. 12, a block diagram illustrating data structures used in the hashing method according to an alternate embodiment is shown.
  • The embodiment of Fig. 12 is similar to that of Fig. 11, and corresponding elements are numbered identically for simplicity and clarity.
  • The structure of the embodiment of Fig. 12 is similar to that of Fig. 11 except that the hash bin lists comprise linked lists of hash bins, rather than sequentially arranged lists in an array of hash bins according to Fig. 11.
  • Each hash bin 172a further comprises a next bin 190a field used to create the linked list of hash bins.
  • The next bin 190a field refers to the next hash bin in the hash bin list or contains a NULL value indicating the end of the hash bin list.
  • The hash bin count 166a field (of Fig. 11) is absent from the hash table elements of Fig. 12, since the end of a hash bin list may be determined by the presence of a NULL value in the next bin 190a field of a hash bin 172a.
  • The embodiment of Fig. 12 has the advantage of being created using a simpler creation method, but the disadvantage of using more memory storage space due to the presence of the next bin reference.
  • The hashing method solves the problem of mapping a relatively large set of potential input values, i.e., the entire Unicode code set, to a relatively smaller subset of values, i.e., the code set associated with the subset of Unicode characters used in the Japanese language, or another language with a relatively large number of characters, such as Chinese or Korean.
  • One solution contemplated is to provide an array of glyphs indexed by the code point of the Unicode character, wherein the size of the array is the size of the Unicode code set, i.e., 65,536 array elements.
  • This solution is very costly in terms of memory storage space.
  • Another solution contemplated is to provide an array of encoding value/glyph pairs, the size of which is the size of the language-specific character subset, which is linearly searched for a matching encoding value.
  • This solution is costly in terms of time.
  • The code points for the characters of a given language are not allocated sequentially in the Unicode code set.
  • The characters which are used in the Japanese language do not occupy a single range of code points in the Unicode code set.
  • Were the code points sequential, a simple array of glyphs, wherein the size of the array is the size of the language-specific character subset, indexed by subtracting the smallest code point in the subset from the code point of the character sought, would suffice.
  • Because the code points are not sequential, this solution is not realizable.
  • Another solution contemplated is to provide a binary tree, or other tree configuration, for arranging the glyph set. This solution is potentially superior to the encoding value/glyph pair solution in terms of time, and potentially superior to the array of glyphs indexed by Unicode code points solution in terms of memory storage space.
  • The hash table provides an improved method over the previously contemplated methods for efficiently storing and quickly retrieving Unicode character glyphs, as will be discussed with reference to
  • The number of hash bins 172 and the number of glyphs 180a in the glyph set 94a are each equal to the number of Unicode characters whose glyphs are present in the glyph set 94a.
  • The Japanese characters of the Unicode character set are stored in the hash table 160.
  • The number of hash table elements is 2048.
  • Referring now to Fig. 13, a flowchart illustrating steps taken to efficiently store a character subset of the Unicode character set in the hash tables of Figs. 10-12 of the set top box 18 (of Fig. 2) is shown.
  • The steps of the method of Fig. 13 are performed by a computer program, referred to as the table generator, which generates a source code file, such as a C language source code file, which includes the data structures described in Figs. 10-12.
  • The source code files are used to compile and link the operating environment 70 for programming the ROM 30 (of Fig. 2).
  • The steps may be performed by a human to generate the source code files, i.e., the table generator may be a human and/or modification of the computer program-generated source code file.
  • The characters in a given hash bin list may be reordered such that more frequently accessed characters are placed nearer the front of the list to reduce lookup time.
  • The table generator receives a subset of Unicode characters in step 200.
  • The character subset comprises an encoding value and glyph for each of the characters in the subset.
  • The subset is the Unicode characters used in the Japanese language.
  • The table generator allocates storage for the hash table 160 (of Figs. 10-12) in step 202.
  • Allocating storage comprises generating source code data structures for programming into the ROM 30 (of Fig. 2).
  • Allocating storage for the hash table comprises allocating storage for the array of hash table elements 162 (of Figs. 10-12).
  • Allocating storage for a hash table element comprises allocating storage for a hash bin list reference, such as hash bin list reference 164a (of Fig. 10).
  • Allocating storage for a hash table element further comprises allocating storage for a hash bin count, such as hash bin count 166a (of Fig. 10).
  • The table generator determines if more characters of the subset need to be stored in step 204. If not, all the characters of the subset have been stored in the hash table. If a new character is to be stored, the table generator allocates storage for a new hash bin associated with the new character in step 206. Allocating storage for a hash bin comprises allocating storage for an encoding value 174a and a glyph 180a. In the case of Fig. 10, allocating storage for a new hash bin further comprises allocating storage for a glyph reference 176a. In the case of Fig. 12, allocating storage for a new hash bin further comprises allocating storage for a next bin reference 190a. The table generator stores the encoding value and glyph of the new character in the newly allocated hash bin in step 208. In the case of Fig. 10, the table generator also populates the glyph reference, such as glyph reference 176a (of Fig. 10), with a reference to the glyph.
  • The constant MASK1 has a value of 0xff.
  • The constant MASK2 has a value of 0x7ff.
  • The constant SHIFTVAL has a value of 8.
  • The encoding value is the encoding value of the new character to be stored in the hash table.
  • The "&" operation is a bitwise logical AND operation.
  • The "^" operation is a bitwise logical EXCLUSIVE OR operation.
  • The ">>" operation is a bitwise logical SHIFT RIGHT operation by the number of bits specified by the SHIFTVAL constant.
  • The number of hash table elements is 2048. Using the preferred hash table size and the hashing index equation with the preferred constants to store the Unicode characters used in the Japanese language advantageously yields a hash table in which only 44 of the 2048 hash table entries are empty, the average hash bin list length, i.e., the average number of "collisions", is 3, the maximum hash bin list length is 8, and 80% of the characters hash to a hash bin list of length 5 or less.
  • The hashing method provides an efficient method for storing and a quick method for retrieving Japanese Unicode characters.
  • The present method yields a distribution of hash bin list lengths as shown in Table I.
  • In step 212, the table generator calculates a reference to a current hash bin list by indexing into the hash table using the index which was calculated in step 210.
  • The current hash bin list reference is the hash bin list reference of the indexed hash table element.
  • The table generator adds the new hash bin which was allocated in step 206 to the current hash bin list in step 214.
  • In the case of Figs. 10 and 11, adding the new hash bin to the hash bin list includes incrementing the hash bin list count and determining if the hash bin list is empty. If the hash bin list is empty, the hash bin list reference is assigned a reference to the new hash bin.
  • In the case of Fig. 12, adding the new hash bin to the hash bin list includes assigning the "next" hash bin reference of the new hash bin a terminating value, preferably NULL, and determining if the hash bin list is empty. If the hash bin list is empty, the hash bin list reference is assigned a reference to the new hash bin. If the hash bin list is not empty, the tail of the hash bin list is found and the "next" hash bin reference of the tail hash bin is assigned a reference to the new hash bin.
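The linked-list variant of Fig. 12 and its tail-append step can be sketched as below. Class and function names are hypothetical, Python's `None` stands in for the NULL terminating value, and the table index is assumed to have been computed already.

```python
class HashBin:
    """A bin that stores its glyph inline and links to the next bin."""

    def __init__(self, encoding_value, glyph):
        self.encoding_value = encoding_value
        self.glyph = glyph
        self.next_bin = None   # None plays the role of the NULL terminator


def append_bin(table, index, new_bin):
    """Add new_bin at the tail of the bin list headed at table[index]."""
    new_bin.next_bin = None
    if table[index] is None:            # empty list: new bin becomes the head
        table[index] = new_bin
        return
    tail = table[index]
    while tail.next_bin is not None:    # walk to the current tail
        tail = tail.next_bin
    tail.next_bin = new_bin


def find_glyph(table, index, code_point):
    """Follow next_bin links until a match or the terminator is reached."""
    current = table[index]
    while current is not None:
        if current.encoding_value == code_point:
            return current.glyph
        current = current.next_bin
    return None
```

No per-entry count is needed in this form, at the cost of one extra reference per bin, which matches the trade-off stated for Fig. 12.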
  • Referring now to Fig. 14, a flowchart illustrating steps taken to quickly retrieve a Unicode character glyph from the hash tables of Figs. 10-12 of the set top box of Fig. 2 is shown.
  • The steps to retrieve a Unicode character glyph from the hash tables are performed by a rendering engine, such as rendering engine 92a (of Fig. 6).
  • The rendering engine 92a retrieves the Unicode character glyph to use in rendering the specified Unicode character in response to a request from the Unicode encoding engine 84 (of Fig. 6) to render a Unicode string.
  • The rendering engine 92a receives an encoding value, or code point, corresponding to a Unicode character to be rendered in step 220.
  • The rendering engine 92a calculates an index into the hash table 160 in step 222 according to the same equation used in step 210 (of Fig. 13) to calculate the index.
  • The description of Fig. 13 provides a detailed description of the hashing equation.
  • The description of Fig. 13 also discusses the various list length results, shown in Table I, of the hashing method, which are pertinent to the retrieval time associated with the method.
  • In step 224, the rendering engine 92a calculates a reference to a hash bin list by indexing into the hash table using the index which was calculated in step 222.
  • The hash bin list reference is the hash bin list reference of the indexed hash table element.
  • The rendering engine 92a assigns a current hash bin reference to the first hash bin in the hash bin list in step 226.
  • The rendering engine 92a determines if the end of the hash bin list has been reached in step 228. In the case of Figs. 10 and 11, determining if the end of the hash bin list has been reached includes determining if the number of hash bins visited in the hash bin list is greater than the hash bin list count value. "Visiting" a hash bin means assigning the current hash bin reference to a new value, as is performed in steps 226 and 236. In the case of Fig. 12, determining if the end of the hash bin list has been reached includes determining if the current hash bin reference has a NULL value.
  • If the end of the hash bin list has been reached, the rendering engine 92a searches the list of downloaded glyphs 97 (of Fig. 6) to find the glyph for the Unicode character corresponding to the encoding value which was received in step 220. If the glyph is not present in the list of downloaded glyphs 97, then an error has occurred.
  • Otherwise, the rendering engine 92a determines if the encoding value of the current hash bin is equal to the encoding value which was received in step 220. If not, the current hash bin reference is assigned a reference to the next hash bin in the hash bin list in step 236 and the rendering engine 92a returns to step 228. However, if the encoding values are equal, the rendering engine 92a returns a reference to the glyph of the current hash bin in step 234.
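The retrieval flow of Fig. 14, including the fallback to downloaded glyphs, can be sketched as follows. The hashing equation itself does not appear in the text above, so `placeholder_hash` is an explicitly hypothetical stand-in (a simple modulus) and is not the source's equation; the per-index Python lists and the function names are likewise assumptions.

```python
TABLE_SIZE = 2048  # number of hash table elements given in the text


def placeholder_hash(code_point):
    """Stand-in index function; NOT the hashing equation of the source."""
    return code_point % TABLE_SIZE


def build_table(glyph_set):
    """Group (code point, glyph) pairs into per-index bin lists."""
    table = [[] for _ in range(TABLE_SIZE)]
    for code_point, glyph in glyph_set.items():
        table[placeholder_hash(code_point)].append((code_point, glyph))
    return table


def retrieve_glyph(table, downloaded_glyphs, code_point):
    """Steps 220-236: search the bin list, then the downloaded glyphs."""
    bin_list = table[placeholder_hash(code_point)]      # steps 222-224
    for encoding_value, glyph in bin_list:              # steps 226-236
        if encoding_value == code_point:
            return glyph                                # step 234
    for encoding_value, glyph in downloaded_glyphs:     # end of list reached
        if encoding_value == code_point:
            return glyph
    raise LookupError("no glyph for code point 0x%04X" % code_point)
```

Whatever index function is used, storing and retrieving must share it, which is why step 222 reuses the equation of step 210.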

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A system and method for receiving and rendering Unicode text in multiple languages on a set top box is disclosed. The system includes a set top box which receives an application program from a broadcast station. The set top box executes the application program. The application program includes Unicode character encoding text for display on a television coupled to the set top box. An operating environment running on the set top box includes a Unicode encoding engine which the application program invokes to display Unicode text. The encoding engine determines the language of characters in the text and invokes a rendering engine corresponding to the language of each character, thus enabling characters from different languages to be mixed in the same text string. The rendering engine has specific knowledge of the language, such as rendering direction and context. One or more glyph sets may be plugged in.

Description

TITLE: SYSTEM AND METHOD FOR RECEIVING AND RENDERING MULTI-LINGUAL TEXT ON A SET TOP BOX
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to digital television systems and more particularly to the receiving and rendering of multi-lingual text on set top boxes of digital television systems.
2. Description of the Related Art
The emerging technology of digital television systems holds a promise of allowing a television set to provide a vast array of new services. Digital television systems are capable of displaying text and graphic images in addition to typical video program streams. An example of digital television services which make use of text and graphic image display is interactive television. Proposed features of interactive television accommodate a variety of marketing, entertainment, and educational capabilities, such as allowing a user to order an advertised product or service, compete against contestants in a game show, or request specialized information regarding a televised program. Typically, the interactive functionality is controlled by a set top box connected to a television set.
The set top box executes an interactive program written for a television broadcast. The interactive functionality is displayed upon the television set screen and may include icons or menus to allow a user to make selections via the television's remote control.
Interactive television, and other broadcast communication systems in general which deliver textual data, often require support for multiple languages. For example, program guides are advertising tools used by program providers, who may desire to include descriptions of television programs in multiple languages, such as English mixed with Japanese. In addition, end users may receive data from non-native regions, such as a Chinese broadcast being received by a television viewer in India.
It is highly desirable that a set top box owner be able to use the same set top box to receive textual information in more than one language. That is, it is desirable that the user not have to buy a different set top box to receive textual information in each different language. A language, in this context, may be defined as a written system of representing thoughts, ideas, actions, etc. A language includes, inter alia, a grammar, characters, and words.
The characters and symbols used in writing a language are commonly referred to as a "writing system", or "script." Many languages, such as Western European languages, are written with alphabetic and numeric characters. However, Japanese, for example, is written with phonetic Hiragana and Katakana characters as well as alphabetic and numeric characters from Western languages and the ideographic Kanji characters which are largely taken from the Chinese language. The scripts of many languages may share common characters, as in the Western European languages.
The textual information received by a set top box includes strings of characters. A "character" is an atomic symbol in a writing system. In alphabetic languages, this symbol consists of a single letter of the alphabet. In ideographic languages such as Chinese and Japanese, a character could be alphabetic, phonetic or ideographic. A "character set" is a group of characters used to represent a particular language or group of languages.
A "character encoding" is a system for numerically representing the characters of a character set. A well-known example of a character encoding is the ASCII character encoding. The numeric value associated with a given character in a character set is referred to as a "code point", or "encoding value." The set of numeric values associated with a character set is referred to as a "code set."
The ASCII character encoding provides an encoding for a character set of the alphabet, numbers, and other characters used in the English language. The ASCII code set includes the values 0 - 127. Thus, each ASCII character has a unique assigned value which may be contained in 7 bits of a byte of data. For example, the character 'A' has a value 0x41 associated with it in ASCII. Many software library routines have been developed to manipulate, read and write strings of ASCII characters.
Other character encoding sets exist which provide support for multiple languages, such as the ISO Latin character encoding, which is used to represent many of the alphabetic languages in the world. ISO Latin includes a Basic Latin portion (values 0 - 127) and an Extended Latin portion (values 128 - 255).
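The code point ranges just described can be checked with a few lines of code. This is only an illustration: `classify_code_point` is a hypothetical helper, and Python's built-in `ord` is used because its result for 'A' coincides with the ASCII value.

```python
def classify_code_point(value):
    """Place an 8-bit encoding value in the ranges named above."""
    if 0 <= value <= 127:
        return "ASCII / Basic Latin"      # fits in 7 bits
    if 128 <= value <= 255:
        return "Extended Latin"           # needs the eighth bit
    raise ValueError("not an 8-bit encoding value")


code_point_a = ord("A")                   # code point of the character 'A'
```

A value such as 0xE9 falls in the Extended Latin range and therefore cannot be carried in a 7-bit ASCII byte.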
Another example of a character encoding is the Japanese Industrial Standard (JIS) character encoding JIS uses a 7-bit multi-byte encoding mechanism to represent Japanese text
A character encoding which enables the representation of characters from many different languages and character sets using a single encoding scheme is referred to as a multi-hngual" character encoding An example of a multi-lingual character encoding is the EUC (Extended UNIX Code) character encoding standard EUC is typically used to represent ideographic Asian languages in the UNIX environment EUC combines single byte ASCII characters with multi-byte ideographic character encodings However, EUC allows only a few languages to be encoded at a time
Developing new software library routines to deal with strings in multiple character encodings and/or multiple languages may be prohibitive in terms of cost and time. Furthermore, it may be prohibitive in terms of storage space and/or code maintenance to support libraries to handle characters in multiple character encodings and languages.
Some scripts combine characters to form composed characters whose shape is determined by the relative positions of the characters, i.e., the context of the characters. Examples of these "contextual scripts" are scripts for the Arabic, Hebrew, Thai, and all Indic languages. In contrast, "non-contextual scripts," such as the Roman alphabet used in Western languages, represent each character as a separate object of fixed shape, independent of the position in a word and of the neighboring characters.
Each character of a character set has a unique shape which distinguishes it from other characters in the character set, that is, which allows a reader to distinguish the character from other characters and thus unambiguously convey information. The shape assigned to a particular character is referred to as the "glyph" of the character. The English letter 'A', for example, has a unique glyph which makes it recognizable from other characters.
Glyphs may have a particular style associated with them. That is, an English 'A' may be written in many different styles, such as in a block style or a calligraphic style. However, the style maintains the basic shape of the character such that the glyph is still recognizable as an 'A'. A collection of glyphs sharing a common style is referred to as a "font." Examples of common fonts are Courier, Times Roman, and Helvetica.
A variety of glyph representation schemes exist. A common scheme is a bitmap glyph, or font. In a bitmap font, the glyph of a given character includes a sequence of bits corresponding to an array of pixels on a display screen. Each bit indicates, by its value, whether the corresponding pixel is to be illuminated or not. The pixel array has a characteristic width and height. For example, a glyph may be 24 pixels wide and 24 pixels high. In this example, 576 bits, or 72 bytes, of storage are required to store the glyph. If the glyphs in a font are the same number of pixels in width, the font is said to be a non-proportional font. If the width is variable, the font is said to be a proportional font. Another common glyph representation scheme is an outline font. A property of outline fonts is that they typically facilitate scaling and rotating.
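The storage arithmetic for a bitmap glyph can be checked with a minimal sketch. The bit layout assumed here (row-major order, most significant bit first within each byte) is an assumption made for illustration and is not specified in the text; the function names are likewise hypothetical.

```python
def glyph_storage_bytes(width: int, height: int) -> int:
    """Bytes needed for a width x height bitmap glyph at one bit per pixel."""
    bits = width * height
    return (bits + 7) // 8  # round up to a whole number of bytes


def pixel_is_lit(glyph: bytes, width: int, x: int, y: int) -> bool:
    """Test one pixel of a bitmap glyph.

    Assumes row-major order with the most significant bit of each byte first.
    """
    index = y * width + x            # position of the pixel in the bit sequence
    byte = glyph[index // 8]         # byte holding that bit
    return bool((byte >> (7 - index % 8)) & 1)
```

With these assumptions, a 24 x 24 glyph occupies 24 * 24 = 576 bits, which glyph_storage_bytes(24, 24) reports as 72 bytes, matching the figure given above.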
A set top box receives text encoded according to a character encoding and displays the text on a television. The act of processing the image of a character, i.e., the glyph associated with the character, and displaying the character is referred to as "rendering." A rendering program must use font type information, size information, and potentially contextual information in order to properly render a given character in a given script.
Transmission bandwidth in digital broadcast systems is a precious commodity. Hence, there is a motivation to minimize the number of bytes transmitted to the set top box with regard to the displaying of text. Languages which have a relatively large number of characters, such as Chinese, Japanese, and Korean, pose particular problems in the context of text processing and rendering in digital television systems. One problem is the long time needed to search through such a large set of characters to find the glyph associated with a given code point. The combined Chinese, Japanese, and Korean character sets constitute over 120,000 characters. Secondly, the amount of memory required to store fonts and/or the transmission bandwidth required to transmit fonts may be costly.
In many circumstances, set top boxes are a commodity item. Hence, a multi-lingual capable set top box which costs significantly more than a uni-lingual set top box may not be accepted readily in the marketplace. On the other hand, the set top box must deliver performance which is acceptable at a given cost. Thus, the factor of cost versus performance figures into the design of a set top box. Two components of a typical set top box which have a large bearing on its cost are its memory and processor. If multiple languages are supported, particularly if the languages have a large number of characters, such as Chinese, Japanese, or Korean, a large amount of memory may be required to store the fonts for the languages. More powerful processors provide higher performance of functions such as character lookup and rendering, but at a greater cost.
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a system and method of the present invention for receiving and rendering multi-lingual text on a set top box of a digital television system. In one embodiment, the system comprises a set top box which is configured to receive text, the characters of which are encoded according to a multi-lingual character encoding standard, "Unicode." The set top box is further configured to process the Unicode text and render the text for display on a television coupled to the set top box. The set top box is configured with an operating environment which accepts language-specific glyph sets to be modularly "plugged in" to the set top box. One or more glyph sets can be plugged into the set top box to support one or more languages as desired. Glyphs or glyph sets may be downloaded into the set top box along with the application program in the event that a given glyph is not present in the set top box. The set top box may also employ an improved hashing method for efficiently storing and quickly retrieving characters of a language with a large number of characters, such as Japanese.
An application developer develops an application program, such as an interactive TV program, using development tools and libraries such that the textual data in the application program are Unicode characters. Preferably, the textual data is included in a resource file, which is separate from the instructions of the application program. A broadcast center mixes the application program, including the resource file, with a digital audio/video data stream. The audio/video stream includes the data for playing the television program or commercial to be shown on the user's television. Typically, the audio/video stream is compressed using a compression algorithm such as one of the Motion Picture Expert Group (MPEG) compression standards. The broadcast center transmits the data stream to the set top box. The data stream is transmitted by a suitable transmission mechanism, such as via satellite or coaxial cable.
The set top box receives the stream of digital data from the broadcast center. The set top box demultiplexes the audio/video stream portion from the application program and stores the application program in local memory of the set top box. The set top box decompresses the audio/video data stream for display on the television. A processor in the set top box executes the application program.
The operating environment running on the set top box is configured to manage the different tasks, such as the application program, which are executed by the set top box. Preferably, the operating environment includes an interpreter which interprets code instructions which are processor independent. Preferably, the application program is interpreted by the interpreter.
The interpreter includes a Unicode encoding engine which includes library functions for manipulating and printing Unicode character strings. The application program calls the Unicode character string functions to perform string manipulations such as determining Unicode string lengths, copying Unicode strings and concatenating Unicode strings. The application program also calls string display functions of the Unicode engine.
The interpreter further comprises a language detector. The Unicode engine invokes the language detector to determine a language associated with a given character of the Unicode string. The Unicode engine uses the language and the font set by the application program to determine which of the one or more glyph sets of the set top box includes the glyph for the character. The interpreter further includes one or more rendering engines for rendering glyphs of a given language and font.
Glyphs or glyph sets may also be downloaded to the set top box as needed. If the application is configured to display a glyph which is not present in the set top box, i.e., not plugged in to the set top box, the glyph may be downloaded along with the application to the set top box. The Unicode engine detects a condition where a glyph referenced by the application is not burned into the set top box, and searches a list of downloaded glyphs to detect the presence of the referenced glyph. If the Unicode engine detects the presence of the downloaded glyph, the Unicode engine invokes the appropriate rendering engine to render the downloaded glyph.
Each rendering engine is configured to render strings of characters according to the rendering rules for its particular language and font. For example, a rendering engine for a contextual language knows how to render characters in a string based on the context of each character. Furthermore, a rendering engine may have specific knowledge about the standards of a given region, such as regarding time, date, and currency symbols. Furthermore, a rendering engine must know the direction in which the characters are to be rendered. For example, an Arabic rendering engine would render the characters from right to left, whereas a French rendering engine would render the characters from left to right.
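The effect of rendering direction on glyph placement can be sketched as follows. This is a simplified illustration which assumes fixed-width glyphs and ignores contextual shaping; the function name and its parameters are hypothetical.

```python
from typing import List


def glyph_positions(count: int, glyph_width: int, line_width: int,
                    right_to_left: bool) -> List[int]:
    """Return the left-edge x coordinate of each glyph, in string order.

    Left-to-right text starts at x = 0 and advances rightward.
    Right-to-left text starts at the right edge and advances leftward.
    """
    if right_to_left:
        return [line_width - glyph_width * (i + 1) for i in range(count)]
    return [glyph_width * i for i in range(count)]
```

Under these assumptions, three 24-pixel glyphs on a 240-pixel line are placed at x = 0, 24, 48 when rendered left to right, and at x = 216, 192, 168 when rendered right to left.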
The glyph sets are preferably arranged in a manner conducive to efficient storage and retrieval of the glyphs in the glyph set, according to the characteristics of the language associated with the glyph set. Glyph sets for languages with a large number of characters may be stored and retrieved using a hash table according to a hashing method. The hash method may yield a relatively small maximum number of collisions, with a large percentage of the code points hashing to elements with approximately half the maximum number of collisions or less.
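The general idea of storing glyphs in a hash table keyed by code point can be sketched as below. This is not the hashing method of the invention, which is not detailed in this passage; it is a generic chained hash table with a simple modulo hash, given only to show what "collisions" means in this setting. The class and method names are hypothetical.

```python
from typing import List, Optional, Tuple


class GlyphHashTable:
    """Generic chained hash table mapping code points to glyph data."""

    def __init__(self, bucket_count: int) -> None:
        self.buckets: List[List[Tuple[int, bytes]]] = [
            [] for _ in range(bucket_count)
        ]

    def _index(self, code_point: int) -> int:
        # Assumed hash function: code point modulo the number of buckets.
        return code_point % len(self.buckets)

    def store(self, code_point: int, glyph: bytes) -> None:
        bucket = self.buckets[self._index(code_point)]
        for i, (cp, _) in enumerate(bucket):
            if cp == code_point:          # replace an existing entry
                bucket[i] = (code_point, glyph)
                return
        bucket.append((code_point, glyph))

    def lookup(self, code_point: int) -> Optional[bytes]:
        for cp, glyph in self.buckets[self._index(code_point)]:
            if cp == code_point:
                return glyph
        return None                       # glyph not present in this set

    def longest_chain(self) -> int:
        """Largest number of code points sharing one bucket."""
        return max(len(bucket) for bucket in self.buckets)
```

A lookup inspects only the entries in one bucket, so the worst-case search time is bounded by the longest chain rather than by the size of the character set, which is why a small maximum number of collisions is desirable.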
Each glyph set has an associated rendering engine. The Unicode engine invokes the appropriate rendering engine to process and render each Unicode string of the text. A rendering engine renders a character by receiving a glyph associated with a Unicode character and populating a pixel map according to the glyph information. A pixel map is a string of bits indicating the state of each pixel in an array of pixels. For example, in the case of a bitmap glyph of a non-contextual language, rendering the glyph includes copying the glyph bit map to the appropriate location in memory. The pixel map may further include other property information, such as color. The rendering engine processes and renders characters of the string until the rendering engine encounters a character which does not belong to its language. If the rendering engine did not process the entire string, the Unicode engine updates the string pointer to point to the next character in the string which was not processed by the rendering engine, invokes the language detector to determine the language associated with that character, and invokes the appropriate rendering engine. The process continues until all the text has been rendered.
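The dispatch loop described above can be sketched as follows. The language detector here is a stand-in based on a few Unicode code point ranges (Hiragana, Katakana and CJK ideographs are all labeled "japanese" for simplicity), and the rendering engine merely consumes characters instead of populating a pixel map; both are assumptions made for illustration, as are all function names.

```python
from typing import List, Tuple


def detect_language(ch: str) -> str:
    """Hypothetical language detector keyed on Unicode code point ranges."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF:
        return "japanese"
    if 0x0590 <= cp <= 0x05FF:
        return "hebrew"
    return "latin"


def render_run(text: str, start: int, language: str) -> int:
    """Stand-in rendering engine.

    Consumes characters from `start` while they belong to `language` and
    returns the index of the first character it did not process.
    """
    i = start
    while i < len(text) and detect_language(text[i]) == language:
        i += 1
    return i


def render_text(text: str) -> List[Tuple[str, str]]:
    """Split text into (language, run) pairs, one per rendering engine call."""
    runs = []
    pos = 0
    while pos < len(text):
        language = detect_language(text[pos])   # invoke the language detector
        end = render_run(text, pos, language)   # invoke the matching engine
        runs.append((language, text[pos:end]))
        pos = end                               # advance the string pointer
    return runs
```

Because each call to render_run starts on a character already known to belong to the selected language, every iteration consumes at least one character and the loop always terminates.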
The rendering engines pass the pixel maps to a graphics driver which controls the video hardware of the set top box. The graphics driver provides the pixel maps to the video hardware of the set top box such that the text is displayed at the appropriate coordinates on the television display screen. The set top box multiplexes the decompressed audio/video stream with the rendered text and displays the audio/video information and rendered text on the television.
Thus, the television system and method may advantageously provide a means for receiving and rendering text in multiple languages. By providing the ability to process text including characters in a universal character encoding, the system and method maximize code reusability and thus minimize development and maintenance time and cost. The system and method may further minimize the broadcast bandwidth required to receive and render multiple languages by providing pluggable language-specific modules.
BRIEF DESCRIPTION OF THE DRAWINGS
Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a television system according to the present invention;
Fig. 2 is a block diagram of the set top box of the system of Fig. 1;
Fig. 3 is a block diagram illustrating the flow of data in the system of Fig. 1;
Fig. 4 is a flowchart illustrating steps taken in developing and transmitting an application program in the system of Fig. 1;
Fig. 5 is a block diagram of the software modules of the set top box of Fig. 2;
Fig. 6 is a block diagram illustrating in more detail portions of the interpreter of Fig. 4;
Fig. 7 is a flowchart illustrating steps taken in receiving and rendering multi-lingual text in the system of Fig. 1;
Fig. 8 is a flowchart illustrating in more detail the step of processing text in Fig. 7;
Fig. 9 is a block diagram illustrating the inputs and outputs of a rendering engine of Fig. 5;
Fig. 10 is a block diagram illustrating data structures used in the hashing method of the present invention according to the preferred embodiment;
Fig. 11 is a block diagram illustrating data structures used in the hashing method of the present invention according to an alternate embodiment;
Fig. 12 is a block diagram illustrating data structures used in the hashing method of the present invention according to an alternate embodiment;
Fig. 13 is a flowchart illustrating steps taken to efficiently store a character set in the set top box of Fig. 2;
Fig. 14 is a flowchart illustrating steps taken to quickly retrieve a character from the set top box stored according to the method of Fig. 13.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to Fig. 1, a block diagram of a television system 10 according to one embodiment is shown. The system 10 comprises a broadcast center 12 which transmits a stream of digital data to a set top box 18, also referred to as a digital interactive decoder (DID). Preferably, the broadcast center 12 transmits the digital data stream to a satellite 14 which transmits the digital data stream to an antenna 16 coupled to the set top box 18. In one embodiment, the broadcast center 12 transmits the digital data stream to the set top box 18 via a cable, such as a coaxial or fiber optic cable. The set top box 18 receives the digital data stream from the antenna 16 or cable and displays a portion of the information from the digital data stream on a television 20 coupled to the set top box 18.
The set top box 18 receives user input from a remote control 28. Preferably, the set top box 18 provides a portion of the user input to a transaction server 26. For example, the set top box 18 may display a menu for ordering a product, such as a hammer. The user may provide input indicating the desire to purchase the hammer. The set top box 18 provides the purchase information to the transaction server 26, which forwards the purchase information to the hammer manufacturer or distributor so that the product may be distributed and billed to the user.
The digital data stream comprises an audio/video portion and an application program portion. The audio/video portion comprises the digital audio and video information for a television program or television commercial to be displayed on the television 20. Preferably, the audio/video stream is compressed using a common compression algorithm such as MPEG 2.
The application program portion of the digital data stream comprises instructions and data to be executed on the set top box 18. Preferably, the application program is configured to display text on the television 20 which is coordinated with the television program or television commercial of the audio/video data stream displayed on the television 20. For example, the application program may execute instructions to display a menu for ordering the hammer. The audio/video data stream portion and the application program portion are mixed together, preferably in the broadcast center 12, to produce the digital data stream transmitted to the set top box 18. Preferably, the application program includes textual information such as a menu, including strings of characters in multiple languages. The set top box 18 is configured to receive the application program, process the strings of characters of the application program, and render the characters for display on the television 20.
A video cassette recorder (VCR) 24 may also be coupled to the set top box 18. The set top box 18 may control the VCR 24 to perform actions according to application programs downloaded to the set top box 18. An example is the set top box 18 controlling the VCR 24 to perform automated recording.
A computer 22 may also be coupled for communication with the set top box 18. The computer 22 may also download application programs to the set top box 18. Further, the set top box 18 may use resources of the computer 22, such as a hard disk, as permanent storage. The computer 22 may be locally connected, such as through a serial connection, or remotely connected via a telephone line.
Referring now to Fig. 2, a block diagram of the set top box 18 of Fig. 1 is shown. Set top box 18 comprises a CPU 40 coupled to a read-only memory (ROM) 30. The ROM 30 includes instructions and data for executing on the CPU 40. A random access memory (RAM) 32 is coupled to the CPU 40. The RAM 32 is used for storing program variables for the program instructions contained in the ROM 30. The RAM 32 is also configured to store the application program received from the broadcast center 12 (of Fig. 1). A FLASH memory 34 is also coupled to the CPU 40 and contains program instructions for execution on the CPU 40 and/or character glyphs used in rendering text characters on the television 20 (of Fig. 1).
The CPU 40 comprises a microprocessor, micro-controller, digital signal processor (DSP), or other type of software instruction processing device. The CPU 40 fetches instructions from the ROM 30, RAM 32, and/or FLASH 34 and executes the instructions.
The ROM 30 comprises read only memory storage elements as are well known in the art of solid state memory circuits. Preferably, the ROM 30 comprises read only memory storage which is programmed and plugged in to the set top box 18.
The RAM 32 comprises dynamic random access memory (DRAM) or static random access memory (SRAM) storage elements as are well known in the art of solid state memory circuits.
The FLASH memory 34 comprises writable permanent memory storage elements as are well known in the art of solid state memory circuits. Preferably, the FLASH 34 comprises memory storage which may be programmed, i.e., written, during operation of the set top box 18.
A security device 36 is also coupled to the CPU 40 for providing authentication and signature functionality. For example, the security device allows the enabling or disabling of application program downloading to the set top box 18.
A communications port 42 is coupled to the CPU 40 and is configured to provide communication with other devices such as the computer 22 (of Fig. 1), the VCR 24 (of Fig. 1), or other devices such as a keyboard. A remote control port 44 is coupled to the CPU 40 and is configured to receive remote input such as from the remote control 28 (of Fig. 1) or from a front panel on the set top box 18. Preferably, the remote port 44 comprises an infra-red receiver for receiving infra-red signals from the remote control 28. A modem 46 is coupled to the CPU 40 and is configured to provide communication between the set top box 18 and the transaction server 26 (of Fig. 1).
A demultiplexer 38 is coupled to the RAM 32 and is configured to receive the digital data stream from a receiver 50 coupled to the demultiplexer 38. The receiver 50 receives the digital data stream from the broadcast center and communicates the digital data stream to the demultiplexer 38. The demultiplexer 38 is configured to demultiplex the application program from the audio/video data stream of the digital data stream received from the broadcast center 12 and store the application program in the RAM 32. The CPU 40 executes the application program stored in the RAM 32 and provides rendered textual data from the application program to a video encoder/multiplexer 48. The video encoder/multiplexer 48 multiplexes the rendered text with the audio/video stream and provides the multiplexed rendered text and audio/video stream to the television 20 for display on the television.
Referring now to Fig. 3, a block diagram illustrating the flow of data in the system 10 of Fig. 1 is shown. The broadcast center 12 (of Fig. 1) receives an audio/video stream 66 and an application program 64, multiplexes the audio/video stream 66 and the application program 64, and transmits the multiplexed data stream to the set top box 18. Preferably, the language-specific textual information of the application program 64 is included in a resource file 62 which is transmitted as part of the application 64 to the set top box 18. By separating the language-specific portion of the application 64 into the resource file 62, the application 64 advantageously provides a means whereby changing the textual information from one language to another requires only modification to the resource file 62 rather than to the application 64.
The set top box 18 receives the digital data stream, and the demultiplexer 38 of the set top box 18 demultiplexes the application 64 from the audio/video stream 66a. The audio/video stream 66 may have been compressed according to a lossy compression algorithm; hence the decompressed audio/video stream 66a may be in some manner different from the initially transmitted audio/video stream 66.
The application program 64 is executed by an operating environment 70 of the set top box 18. The application program 64 executes within the operating environment 70 to display rendered textual information which is received by the video multiplexer 48 (of Fig. 2) along with the audio/video stream 66a. Video multiplexer 48 multiplexes the rendered text information with the audio/video stream 66a and provides the multiplexed information to the television 20 for display on the television 20.
Referring now to Fig. 4, a flowchart illustrating steps taken in developing and transmitting an application program in the system of Fig. 1 is shown. An application program developer develops an application such as the application 64 of Fig. 3 in step 102. Preferably, the application program developer does not include the textual information to be displayed on the television 20 (of Fig. 1) in the application program, but rather places the textual information in a resource file such as the resource file 62 of Fig. 3, and includes in the application program 64 references to the textual information.
The application programmer creates a resource file 62 in step 104. The resource file 62 includes formatted chunks of data which may be attached to the application program 64 to avoid embedding the data directly into the application program 64. The resource file 62 advantageously simplifies maintenance and modification of the application program 64 since the data may be changed in the resource file 62 without modification to the tested and debugged application program 64.
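The separation of program logic from language-specific text can be sketched as follows. The resource format, the string identifiers and the menu wording below are all hypothetical; the actual format of the resource file 62 is not specified in this passage.

```python
from typing import Dict, List

# Hypothetical resource tables: the application refers to text only by key.
RESOURCE_EN: Dict[str, str] = {
    "MENU_TITLE": "Order a hammer",
    "MENU_CONFIRM": "Confirm purchase",
}

RESOURCE_FR: Dict[str, str] = {
    "MENU_TITLE": "Commander un marteau",
    "MENU_CONFIRM": "Confirmer l'achat",
}


def menu_lines(resource: Dict[str, str]) -> List[str]:
    """Application logic: builds the menu from references into the resource."""
    return [resource["MENU_TITLE"], resource["MENU_CONFIRM"]]
```

Switching the displayed language then amounts to passing a different resource table; the function menu_lines, standing in for the tested and debugged application program, is unchanged.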
Alternatively, the textual information may be contained in the application program 64 itself. Preferably, the textual information comprises strings of characters wherein the characters are from the Unicode character set. Preferably, the application developer creates the application 64 and/or resource file 62 using a Unicode-capable text editor, or some other suitable Unicode-capable tool.
Unicode is a multi-lingual character encoding which attempts to include all current written scripts for all current languages. Each character in the Unicode character set is represented by a 16-bit value or code point, thus allowing a character set of 65,536 characters. Unicode is part of the ISO 10646 standard. ISO/IEC 10646-1:1993(E) defines the Unicode standard and is hereby incorporated by reference.
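The capacity of a 16-bit code point and a possible in-memory layout of such text can be sketched as below. The big-endian byte order is an assumption made for illustration, as are the function names; the sketch handles only characters whose code point fits in 16 bits, as described above.

```python
CODE_POINT_BITS = 16


def code_point_capacity(bits: int) -> int:
    """Number of distinct values representable in the given number of bits."""
    return 1 << bits


def encode_16bit(text: str) -> bytes:
    """Encode text as a sequence of 16-bit code points (big-endian assumed).

    int.to_bytes raises OverflowError for a code point above 0xFFFF.
    """
    return b"".join(ord(ch).to_bytes(2, "big") for ch in text)
```

With 16 bits, code_point_capacity gives 2^16 = 65,536 values, and every character, whether 'A' (0x0041) or a Kanji character, occupies exactly two bytes.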
Preferably, creating the resource file in step 104 comprises optionally including one or more glyphs for particular characters referenced by the application program to be displayed on the television 20. If the glyph of a particular character is not already part of a glyph set of the set top box 18, the glyph may be downloaded to the set top box 18 with the application program 64. Thus, the resource file 62 of the system advantageously provides a means for providing glyphs to the set top box 18. In particular, the means of downloading a glyph to the set top box 18 is advantageous for rendering characters which are infrequently used or special characters, thus allowing the saving of memory storage within the set top box and potentially reducing the cost of the set top box 18. Furthermore, it may be desired to render a character of a language which is not present in the set top box 18. In that situation, the glyph for the Unicode character from the language not present in the set top box 18 may be downloaded with the application program to the set top box 18.
Once the application program and resource file are developed, they are provided to the broadcast center 12, which transmits the application to the set top box in step 106.
Referring now to Fig. 5, a block diagram of the software modules of the set top box 18 of Fig. 2 is shown. The application program 64 (of Fig. 3) communicates with the operating environment 70 (of Fig. 3), which in turn communicates with the set top box hardware 18 to display textual information of the application program 64 on the television 20. The operating environment 70 comprises device drivers 76 for communicating with and controlling the set top box hardware 18. A microkernel 72 provides system services to the various components of the operating environment 70, such as memory management, task management, and communication between tasks, such as the application program 64, and the device drivers 76.
Preferably, the application program 64 comprises instructions which may be interpreted by an interpreter 74. In one embodiment, the interpreted instructions in the application program are referred to as o-code and the interpreter 74 is an o-code interpreter. The o-code comprises a stack-based instruction set. Preferably, the interpreter 74, the microkernel 72, and the device drivers 76 of the operating environment 70 reside in the ROM 30 (of Fig. 2) of the set top box 18. Advantageously, the interpreter 74 interpreting the o-code of the application 64 provides a means of developing applications 64 which are independent of the underlying CPU 40 (of Fig. 2) hardware. The interpreter 74 includes function libraries which are accessible by the application program 64 for performing functions such as allocating memory, manipulating memory, and providing user interface management.
Referring now to Fig. 6, a block diagram illustrating in more detail portions of the interpreter 74 of Fig. 4 is shown. An application program 64 executing on the set top box 18 communicates with the interpreter 74, which in turn communicates with a graphics driver 76a of the device drivers 76 (of Fig. 5) to process and render Unicode text for display on television 20 (of Fig. 1). The interpreter 74 comprises a Unicode encoding engine 84.
The Unicode encoding engine 84 provides functions which the application program 64 invokes to perform numerous string manipulation functions such as determining the length of a Unicode string, copying a Unicode string from one location to another, concatenating two Unicode strings together, comparing two Unicode strings to determine if the Unicode strings are identical, and searching a Unicode string for an occurrence of a particular Unicode character within the Unicode string. Preferably, a Unicode string is defined as one or more Unicode characters terminated by a null Unicode character. The Unicode encoding engine 84 further comprises functions which the application program 64 invokes to set the current font of the Unicode text to be rendered. The Unicode encoding engine 84 further comprises functions which the application 64 invokes for displaying Unicode text on the television 20.
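The string functions listed above can be sketched over null-terminated sequences of 16-bit code points. The function names (ustrlen, ustrcat and so on) are hypothetical and merely echo the familiar C string routines; the actual interface of the Unicode encoding engine 84 is not given in this passage.

```python
from typing import List

NUL = 0x0000  # null Unicode character terminating every string


def to_ustring(text: str) -> List[int]:
    """Build a null-terminated list of 16-bit code points from Python text."""
    units = [ord(ch) for ch in text]
    if any(u > 0xFFFF or u == NUL for u in units):
        raise ValueError("only non-null 16-bit code points are supported")
    return units + [NUL]


def ustrlen(s: List[int]) -> int:
    """Count the characters before the terminating null."""
    n = 0
    while s[n] != NUL:
        n += 1
    return n


def ustrcpy(s: List[int]) -> List[int]:
    """Return a copy of the string, including its terminator."""
    return s[:ustrlen(s)] + [NUL]


def ustrcat(a: List[int], b: List[int]) -> List[int]:
    """Concatenate two strings into a new null-terminated string."""
    return a[:ustrlen(a)] + b[:ustrlen(b)] + [NUL]


def ustrequal(a: List[int], b: List[int]) -> bool:
    """True if both strings hold identical characters up to the null."""
    return a[:ustrlen(a)] == b[:ustrlen(b)]


def ustrchr(s: List[int], code_point: int) -> int:
    """Index of the first occurrence of code_point, or -1 if absent."""
    for i in range(ustrlen(s)):
        if s[i] == code_point:
            return i
    return -1
```

Because every character occupies one 16-bit element regardless of language, the same routines serve Latin and Japanese text alike, which is the code-reuse benefit the text attributes to a single universal encoding.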
When the application program 64 invokes a function of the Unicode encoding engine 84 to display Unicode text, the Unicode encoding engine 84 invokes a language detector 82 in order to determine a language associated with characters of the Unicode text received from the application program 64. The language detector 82 informs the Unicode encoding engine 84 of the language associated with the Unicode character passed to the language detector 82 by the Unicode encoding engine 84.
The interpreter 74 comprises one or more glyph sets 94a-94f, referred to collectively as 94. The Unicode encoding engine 84 uses the language information returned by the language detector 82 along with font information set by the application program 64 to determine which one of the glyph sets 94 includes a glyph for describing the particular Unicode character to be rendered.
The interpreter 74 further comprises one or more rendering engines 92a-92n, referred to collectively as 92. Each of the rendering engines 92 is configured to render Unicode characters corresponding to a particular language and/or font. As shown, the rendering engines 92 receive glyph information from the glyph sets 94 in order to render Unicode characters. The Unicode encoding engine 84 invokes the appropriate rendering engine from the rendering engines 92 configured to render the particular glyph from one of the glyph sets 94 corresponding to the given Unicode character to be rendered.
As shown, a given rendering engine 92 may be configured to render glyphs from a plurality of glyph sets 94. For example, a rendering engine 92 which renders bitmap glyphs of a fixed pixel height and pixel width for non-contextual languages which render characters from left to right may render characters for most of the Western European languages.
The Unicode encoding engine 84 is further configured to determine the absence of a glyph in the glyph sets 94 for a given character in a language and detect the presence of a downloaded glyph 97 corresponding to a given Unicode character to be rendered. The downloaded glyphs 97 are downloaded to the set top box 18, preferably in the resource file 62 (of Fig. 3) along with the application program 64. The downloaded glyphs 97 are placed in the RAM 32 (of Fig. 2) in a list for access by the Unicode encoding engine 84 and rendering engines 92 in rendering Unicode characters not present in the glyph sets 94. The interpreter 74 may contain as few as one glyph set 94 and one rendering engine 92. Advantageously, glyph sets 94 may be modularly added to the interpreter 74 as required to accommodate various languages and fonts.
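The lookup order described above, built-in glyph sets first and the list of downloaded glyphs second, can be sketched as follows. The data layout (a dictionary of glyph sets keyed by language and font, and a list of code point/glyph pairs for downloads) and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

GlyphSetKey = Tuple[str, str]        # (language, font), assumed key layout
GlyphSet = Dict[int, bytes]          # code point -> glyph data


def find_glyph(code_point: int, language: str, font: str,
               glyph_sets: Dict[GlyphSetKey, GlyphSet],
               downloaded: List[Tuple[int, bytes]]) -> Optional[bytes]:
    """Locate a glyph, preferring plugged-in glyph sets over downloads."""
    glyph_set = glyph_sets.get((language, font))
    if glyph_set is not None and code_point in glyph_set:
        return glyph_set[code_point]           # found in a built-in glyph set
    for cp, glyph in downloaded:               # search the downloaded list
        if cp == code_point:
            return glyph
    return None                                # glyph unavailable
```

A return value of None corresponds to the case where the character can be rendered neither from the plugged-in glyph sets nor from the glyphs that accompanied the application.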
For example, the set top box 18 (of Fig. 2) may be configured with three glyph sets: a glyph set for a 16 point "Courier" English font, a 24 point Courier English font glyph set, and a 24 point Japanese font glyph set. The set top box 18 is further configured with a rendering engine for rendering each of the three glyph sets.
Preferably, configuring the set top box 18 in the manner described with modular glyph set modules comprises compiling and linking together the various portions of the operating environment 70 along with the desired glyph sets and programming the operating environment into the ROM 30 (of Fig. 2). Thus, the set top box 18 may be tailored specifically to support the desired languages in a given geographic locale. This localization advantageously enables the same set top box hardware 18 and large portions of the operating environment 70 to be reused without modification. Thus, development time and resources are decreased and the cost of the set top box hardware is reclaimed.
Once the rendering engines 92 have rendered the Unicode characters and produced strings of bits in the RAM memory 32 of the set top box 18 (of Fig. 2), the rendering engines 92 invoke the graphics driver 76a to display the rendered text on the television 20. The graphics driver 76a interacts with the video hardware of the set top box 18 to display the rendered text along with the audio/video data of the television program or commercial on the television 20.
The interpreter 74 further comprises an 8-bit encoding engine 86 for handling 8-bit character encoding strings such as ASCII text. The application program 64 invokes string functions of the 8-bit encoding engine 86 in order to manipulate and display 8-bit character encoding characters on the television 20. Thus, the 8-bit encoding engine 86 performs functions similar to the Unicode encoding engine 84 but for 8-bit encoded character strings rather than for Unicode encoded character strings.
Each of the rendering engines 92 is configured to render Unicode text according to rendering rules for the particular language associated with each of the rendering engines 92. For example, a rendering engine associated with a language which is a contextual language has knowledge about how to render characters of a string based on the context of the given character. For example, an Arabic rendering engine contains knowledge about particular kerns or ligatures used in connecting Arabic characters based on neighboring characters.
Furthermore, a rendering engine has specific knowledge regarding the direction in which characters are rendered. For example, a Hebrew rendering engine renders characters from right to left, whereas a French rendering engine renders characters from left to right. Furthermore, rendering engines have specific knowledge about standards of a given locale such as standards for displaying times, dates and currency symbols, for example.
A glyph set from one of the glyph sets 94 comprises a plurality of glyphs organized in a manner optimized for the particular language or set of glyphs in the glyph set. The glyph sets are organized to optimize the time required for look-up of a given Unicode character, as well as to optimize the amount of storage required in order to store the glyph set. Thus, two different glyph sets may be organized in two different manners.
For example, a glyph set comprising a relatively small number of glyphs may be arranged as a simple indexed array. A glyph set comprising a relatively large number of characters such as a Japanese, Chinese, or Korean glyph set may be arranged in a more sophisticated manner such as by using a hash table. Furthermore, glyph sets for contextual languages may include multiple tables according to context. The glyph sets may also be arranged according to glyph representation such as bit-mapped glyphs, outline glyphs, stroke glyphs, etc.
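As an illustration of the simple indexed array arrangement, the following is a minimal sketch in C, assuming the glyph set covers one contiguous range of code points and that each glyph is an 8x8 one-bit bitmap. The type and function names (Glyph, ArrayGlyphSet, array_glyph_lookup) are hypothetical and do not appear in the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size bitmap glyph: 8 rows of 8 pixels, one bit per pixel. */
typedef struct {
    uint8_t rows[8];
} Glyph;

/* Hypothetical small glyph set covering one contiguous range of code points. */
typedef struct {
    uint16_t first_code_point;  /* lowest code point in the set */
    uint16_t glyph_count;       /* number of glyphs in the array */
    const Glyph *glyphs;        /* glyphs[i] describes first_code_point + i */
} ArrayGlyphSet;

/* Return the glyph for code_point, or NULL if the set does not contain it. */
const Glyph *array_glyph_lookup(const ArrayGlyphSet *set, uint16_t code_point)
{
    assert(set != NULL);
    if (code_point < set->first_code_point)
        return NULL;
    uint32_t offset = (uint32_t)code_point - set->first_code_point;
    if (offset >= set->glyph_count)
        return NULL;
    return &set->glyphs[offset];
}
```

A lookup of this kind takes constant time, but it only works when the code points of the set form a single sequential range.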
Referring now to Fig. 7, a flowchart illustrating steps taken in receiving and rendering multi-lingual text according to the system of Fig. 1 is shown. The set top box 18 (of Fig. 1) receives a digital data stream including an application program and audio/video information from the broadcast center 12 (of Fig. 1) in step 110. The demultiplexer 38 (of Fig. 2) demultiplexes the application program 64 (of Fig. 3) from the audio/video stream 66 (of Fig. 3) and stores the application program 64 in the RAM memory 32 (of Fig. 2) in step 112.
The operating environment 70 (of Fig. 3) determines if a resource file 62 (of Fig. 3) is present with the application 64, and if so, determines if the resource file 62 includes glyphs for rendering Unicode characters in step 113. If so, the operating environment 70 places the downloaded glyphs into a list of downloaded glyphs 97 (of Fig. 6) for future use by one of the rendering engines 92n (of Fig. 6) in step 115. The operating environment 70 executes the application program 64 on the CPU 40 (of Fig. 2) in step 114. The application program 64 calls functions of the operating environment 70 to manipulate and display text on the television 20 (of Fig. 3). Preferably, the text is encoded according to the Unicode character encoding. Preferably, the Unicode text is contained within the resource file 62 (of Fig. 3) of the application program 64. Preferably, the application program 64 comprises references to the Unicode text contained in the resource file 62.
The Unicode encoding engine 84 (of Fig. 6) receives the Unicode text from the application program 64 in step 116. The Unicode text received by the Unicode encoding engine 84 comprises one or more Unicode text strings. Preferably, the Unicode encoding engine 84 receives the Unicode text from the resource file 62. The Unicode encoding engine 84, in conjunction with other portions of the interpreter 74, processes the Unicode text for displaying the Unicode text in step 118.
Referring now to Fig. 8, a flowchart illustrating in more detail step 118 (of Fig. 7) of processing the Unicode text is shown. In processing the Unicode text received from the application program 64, the Unicode encoding engine 84 determines whether more Unicode strings exist in the Unicode text received in step 120. If no more Unicode text strings exist, the text has been processed. If more strings exist, the Unicode encoding engine 84 sets a current character variable to reference the first character in the current string to be processed in step 122. The Unicode encoding engine 84 then determines whether or not more characters exist in the current Unicode string in step 124. If no more Unicode characters exist in the current string, the Unicode encoding engine 84 returns to step 120 to determine if any more strings exist in the text. If more characters exist in the current string as determined in step 124, the Unicode encoding engine 84 invokes the language detector 82 (of Fig. 6) to determine the language of the current character in step 126. The Unicode encoding engine 84 invokes the appropriate one of the rendering engines 92 (of Fig. 6) associated with the language of the current character in step 128.
The Unicode encoding engine 84 invokes the rendering engine by passing a reference to the current string to the rendering engine. The rendering engine renders characters in the string as long as each character encountered is a character in the language associated with the rendering engine in step 130.
Once the rendering engine detects a character not in the language associated with the rendering engine, the rendering engine stops rendering characters of the Unicode string and returns to the Unicode encoding engine 84 information regarding which portion of the Unicode string was rendered by the rendering engine. In step 132 the Unicode encoding engine 84 uses the information returned by the rendering engine concerning which characters of the string were rendered by the rendering engine to assign the current character variable to reference the character after the last character rendered by the rendering engine. The Unicode encoding engine then returns to step 124 to determine if more characters exist in the current string to be rendered.
Thus, each character in the Unicode text received is rendered for display on the television 20 according to the steps of the flowchart of Fig. 8. The steps advantageously enable the set top box 18 to process and render Unicode text comprising characters of different languages.
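The per-string loop of Fig. 8 (steps 122 through 132) can be sketched in C as follows. This is only an illustration under stated assumptions: the language detector is a toy that classifies by code point range, the rendering engine is a stand-in that merely counts the characters it would render, and all names (Language, detect_language, render_run, process_string) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical language tags; the values double as indexes into calls[]. */
typedef enum { LANG_LATIN = 0, LANG_JAPANESE = 1 } Language;

/* Toy language detector (assumption): kana code points count as Japanese,
 * everything else as Latin. A real detector would be far more complete. */
Language detect_language(uint16_t cp)
{
    return (cp >= 0x3040 && cp <= 0x30FF) ? LANG_JAPANESE : LANG_LATIN;
}

/* Stand-in rendering engine: consumes characters while they belong to its
 * language (step 130) and reports how many it rendered. */
size_t render_run(Language lang, const uint16_t *s, size_t len, size_t calls[2])
{
    size_t n = 0;
    while (n < len && detect_language(s[n]) == lang)
        n++;
    calls[lang]++;  /* record one invocation of this engine */
    return n;
}

/* Steps 122-132 for one string; returns the number of characters rendered. */
size_t process_string(const uint16_t *s, size_t len, size_t calls[2])
{
    assert(s != NULL || len == 0);
    size_t current = 0;                                 /* step 122 */
    while (current < len) {                             /* step 124 */
        Language lang = detect_language(s[current]);    /* step 126 */
        size_t rendered = render_run(lang, s + current, /* steps 128-130 */
                                     len - current, calls);
        current += rendered;                            /* step 132 */
    }
    return current;
}
```

Because the engine is chosen from the language of the current character, each invocation renders at least one character, so the loop always advances.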
Referring now to Fig. 9, a block diagram illustrating the operation of a rendering engine such as the rendering engines 92 (of Fig. 6) is shown. A rendering engine 92a representative of the rendering engines 92 receives a Unicode character 140 and glyph set information 94a representative of the glyph sets 94 (of Fig. 6) and generates a pixel map of the rendered character 95. The rendering engine 92a receives the code point of the Unicode character 140 and uses the code point of the Unicode character 140 to access a corresponding glyph in the glyph set 94a which describes the Unicode character 140.
The pixel map 95 comprises a string of bits indicating the state of each pixel in an array of pixels such as the pixels of a television screen. The state of a pixel is either on or off. A reference to the pixel map 95 is passed to the graphics driver 76a (of Fig. 6) and the graphics driver 76a uses the pixel map to display the rendered character on the television 20.
The rendering engine 92a takes the description of the Unicode character 140 from the glyph of the glyph set 94a representing the Unicode character 140 and generates pixels in the pixel map 95 for display of the Unicode character 140. In the case of a bitmap glyph, rendering the glyph typically comprises copying the string of bits comprising the glyph to the pixel map. In particular, this is true in the case of a non-contextual language.
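For a bitmap glyph in a non-contextual language, the copy into the pixel map can be sketched in C as below. The sizes (8x8 glyphs, a 32x8 one-bit pixel map), the byte-aligned character cells, and the names (BitmapGlyph, PixelMap, blit_glyph, pixel_on) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: 8x8 glyphs drawn into a 32x8 pixel map. */
enum { GLYPH_W = 8, GLYPH_H = 8, MAP_W = 32, MAP_H = 8 };

/* One byte per glyph row; the most significant bit is the leftmost pixel. */
typedef struct { uint8_t rows[GLYPH_H]; } BitmapGlyph;

/* One bit per pixel: a set bit means the pixel is on. */
typedef struct { uint8_t bits[MAP_H][MAP_W / 8]; } PixelMap;

/* Copy the glyph's bits into the map at character cell `cell`
 * (cells are byte-aligned, so cell 1 covers pixel columns 8..15). */
void blit_glyph(PixelMap *map, const BitmapGlyph *glyph, int cell)
{
    assert(map != NULL && glyph != NULL);
    assert(cell >= 0 && cell < MAP_W / GLYPH_W);
    for (int y = 0; y < GLYPH_H; y++)
        map->bits[y][cell] = glyph->rows[y];
}

/* Return 1 if pixel (x, y) is on, 0 if it is off. */
int pixel_on(const PixelMap *map, int x, int y)
{
    assert(map != NULL && x >= 0 && x < MAP_W && y >= 0 && y < MAP_H);
    return (map->bits[y][x / 8] >> (7 - x % 8)) & 1;
}
```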
In the case of a contextual language, the rendering engine 92a modifies the bitmap contained in the glyph to modify or create ligatures or kerns of the glyph based on the context, i.e., the neighboring characters in the string, to produce a modified glyph in the form of the pixel map 95. In the case of outline glyphs, the rendering engine 92a uses the outline information to generate the pixel map 95. The rendering engine 92a uses the outline information from the glyph along with orientation and sizing information to render the character and produce the pixel map 95. The pixel map 95 may be further modified by other portions of the operating environment 70 to include other properties such as color information in the pixel map 95.
Referring now to Fig. 10, a block diagram illustrating data structures used in the hashing method according to the preferred embodiment is shown. The Unicode encoding engine 84 (of Fig. 6) comprises one or more font objects. Font object 150 is a representative font object. The font object 150 is an object according to the notion of objects in object-oriented programming. The font object 150 comprises methods and data associated with the object.
The font object 150 comprises a reference to a rendering engine 92a of the rendering engines 92 (of Fig. 6), which is a method of the font object 150. The Unicode encoding engine 84 uses the language information from the language detector 82 (of Fig. 6) to determine which font object is associated with the language and current font of the application 64 (of Fig. 3) in order to invoke the rendering engine 92a of the font object 150.
The font object 150 further comprises a reference to an arrangement of a glyph set, such as glyph set 94a of Fig. 6, which is data of the font object 150. Fig. 10 illustrates the arrangement of a glyph set using a hash table 160, and the font object 150 includes a reference to the hash table 160. The glyph set arrangement illustrated in Fig. 10 is particularly useful for efficiently storing and quickly retrieving Unicode characters for languages with a large number of characters, such as Japanese, Chinese, or Korean.
The rendering engine 92a uses the glyphs 180a-180n of the glyph set 94a to render Unicode characters whose glyphs are present in the glyph set 94a.
The hash table 160 includes an array of hash table elements 162a-162n, referred to collectively as 162. Hash table entry 162a will be referred to as a representative hash table entry. The hash table 160 is indexed according to indexes calculated by a hashing method upon Unicode code points described infra. The rendering engine 92a calculates a hash table index and uses the index to calculate the appropriate hash table entry.
Each hash table entry 162a comprises a hash bin list reference 164a-164n, referred to collectively as 164, and a hash bin count 166a-166n, referred to collectively as 166. Hash bin list reference 164a will be referred to as a representative hash bin list reference. Hash bin count 166a will be referred to as a representative hash bin count.
Each hash bin list reference 164a references a list of hash bins from the array of hash bins 172a-172n, referred to collectively as 172. Hash bin 172a will be referred to as a representative hash bin. The hash bin count 166a indicates the number of hash bins in the list of hash bins referenced by its associated hash bin list reference 164a. Fig. 10 shows example hash bin counts, e.g., hash bin count 166a is 4 and associated hash bin list reference 164a references a hash bin list comprising hash bins 172a-172d, the first of which is hash bin 172a. The hash bin count 166a enables the rendering engine 92a to search a hash bin list and determine when the end of the hash bin list has been reached.
Each hash bin 172a comprises an encoding value 174a, which is representative of encoding values 174a-174n, and a glyph reference 176a, which is representative of glyph references 176a-176n. The encoding value 174a is the code point for a Unicode character. The glyph reference 176a refers to a representative glyph 180a, in the glyph set 94a, describing the Unicode character whose code point is in the encoding value 174a field. The rendering engine 92a uses the glyph 180a referenced by the glyph reference 176a to render the Unicode character whose code point is in the encoding value 174a field.
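The data structures of Fig. 10 might be declared in C roughly as follows. This is a sketch only: the field widths, the 8x8 bitmap glyph, and all type and function names are assumptions, with the reference numerals of Fig. 10 noted in comments for orientation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t rows[8]; } Glyph;  /* glyph 180 (assumed 8x8 bitmap) */

typedef struct {                  /* hash bin 172 */
    uint16_t encoding_value;      /* encoding value 174: Unicode code point */
    const Glyph *glyph;           /* glyph reference 176 */
} HashBin;

typedef struct {                  /* hash table element 162 */
    const HashBin *bin_list;      /* hash bin list reference 164: first bin */
    uint16_t bin_count;           /* hash bin count 166 */
} HashTableElement;

typedef struct {                  /* data part of font object 150 */
    const HashTableElement *table;  /* hash table 160 */
    size_t table_size;              /* number of hash table elements */
} FontObject;

/* Sum the hash bin counts of every element, i.e. the number of characters
 * stored in the table. */
size_t total_bins(const FontObject *font)
{
    assert(font != NULL && (font->table != NULL || font->table_size == 0));
    size_t total = 0;
    for (size_t i = 0; i < font->table_size; i++)
        total += font->table[i].bin_count;
    return total;
}
```

Each hash bin list is a consecutive slice of one shared array of bins, which is why an element needs only a start reference and a count.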
The operation of the font object 150, i.e., the use of the hash table 160 by the rendering engine 92a, will be described in more detail in the discussion of Fig. 14, and the creation of the hash table 160 will be described in more detail in the discussion of Fig. 13.
Referring now to Fig. 11, a block diagram illustrating data structures used in the hashing method according to an alternate embodiment is shown. The embodiment of Fig. 11 is similar to that of Fig. 10 and corresponding elements are numbered identically for simplicity and clarity. The structure of the embodiment of Fig. 11 is similar to that of Fig. 10 except that the glyph 180a for each Unicode character is included in the hash bin 172a associated with the Unicode character rather than the hash bin 172a having a reference to the glyph 180a. That is, the glyph set 94a is distributed among the hash bins 172.
The embodiment of Fig. 11 has the advantage of using less memory storage space due to the absence of the glyph reference 176a of Fig. 10, but the potential disadvantage of having the glyph set distributed among the hash bins 172.
Referring now to Fig. 12, a block diagram illustrating data structures used in the hashing method according to an alternate embodiment is shown. The embodiment of Fig. 12 is similar to that of Fig. 11 and corresponding elements are numbered identically for simplicity and clarity. The structure of the embodiment of Fig. 12 is similar to that of Fig. 11 except that the hash bin lists comprise linked lists of hash bins, rather than sequentially arranged lists in an array of hash bins according to Fig. 11.
Each hash bin 172a further comprises a next bin 190a field used to create the linked list of hash bins. The next bin 190a field refers to the next hash bin in the hash bin list or contains a NULL value indicating the end of the hash bin list. The hash bin count 166a field (of Fig. 11) is absent from the hash table elements of Fig. 12 since the end of a hash bin list may be determined by the presence of a NULL value in the next bin 190a field of a hash bin 172a.
The embodiment of Fig. 12 has the advantage of being created using a simpler creation method, but the disadvantage of using more memory storage space due to the presence of the next bin reference.
The hashing method solves the problem of mapping a relatively large set of potential input values, i.e., the entire Unicode code set, to a relatively smaller subset of values, i.e., the code set associated with the subset of Unicode characters used in the Japanese language, or other language with a relatively large number of characters, such as Chinese or Korean.
One solution contemplated is to provide an array of glyphs indexed by the code point of the Unicode character, wherein the size of the array is the size of the Unicode code set, i.e., 65,536 array elements. However, this solution is very costly in terms of memory storage space. Another solution contemplated is to provide an array of encoding value/glyph pairs, the size of which is the size of the language-specific character subset, which is linearly searched for a matching encoding value. However, this solution is costly in terms of time.
The code points for the characters of a given language are not allocated sequentially in the Unicode code set. For example, the characters which are used in the Japanese language do not occupy a single range of code points in the Unicode code set. If the code points were arranged in a sequential range, a simple array of glyphs, wherein the size of the array is the size of the language-specific character subset, indexed by subtracting from the code point of the character sought the smallest code point in the subset would suffice. However, since the code points are not sequential, this solution is not realizable. Another solution contemplated is to provide a binary tree, or other tree configuration, for arranging the glyph set. This solution is potentially superior to the encoding value/glyph pair solution in terms of time, and potentially superior to the array of glyphs indexed by Unicode code points solution in terms of memory storage space.
The hash table, however, provides an improved method over the previously contemplated methods for efficiently storing and quickly retrieving Unicode character glyphs, as will be discussed with reference to
Figure imgf000019_0001
The number of hash bins 172 and the number of glyphs 180a in the glyph set 94a is equal to the number of Unicode characters whose glyphs are present in the glyph set 94a. Preferably, the Japanese characters of the Unicode character set are stored in the hash table 160. Preferably, the number of hash table elements is 2048.
Referring now to Fig. 13, a flowchart illustrating steps taken to efficiently store a character subset of the Unicode character set in the hash tables of Figs. 10-12 of the set top box 18 (of Fig. 2) is shown. Preferably, the steps of the method of Fig. 13 are performed by a computer program, referred to as the table generator, which generates a source code file, such as a C language source code file, which includes the data structures described in Figs. 10-12. The source code files are used to compile and link the operating environment 70 for programming the ROM 30 (of Fig. 2). Alternatively, the steps may be performed by a human to generate the source code files, i.e., the table generator may be a human and/or modification of the computer program-generated source code file. For example, the characters in a given hash bin list may be reordered such that more frequently accessed characters are placed nearer the front of the list to reduce lookup time.
The table generator receives a subset of Unicode characters in step 200. The character subset comprises an encoding value and glyph for each of the characters in the subset. Preferably, the subset is the Unicode characters used in the Japanese language.
The table generator allocates storage for the hash table 160 (of Figs. 10-12) in step 202. Preferably, allocating storage comprises generating source code data structures for programming into the ROM 30 (of Fig. 2). Allocating storage for the hash table comprises allocating storage for the array of hash table elements 162 (of Figs. 10-12). Allocating storage for a hash table element comprises allocating storage for a hash bin list reference, such as hash bin list reference 164a (of Fig. 10). In the case of Figs. 10 and 11, allocating storage for a hash table element further comprises allocating storage for a hash bin count, such as hash bin count 166a (of Fig. 10).
The table generator determines if more characters of the subset need to be stored in step 204. If not, all the characters of the subset have been stored in the hash table. If a new character is to be stored, the table generator allocates storage for a new hash bin associated with the new character in step 206. Allocating storage for a hash bin comprises allocating storage for an encoding value 174a, and a glyph 180a. In the case of Fig. 10, allocating storage for a new hash bin further comprises allocating storage for a glyph reference 176a. In the case of Fig. 12, allocating storage for a new hash bin further comprises allocating storage for a next bin reference 190a. The table generator stores the encoding value and glyph of the new character in the newly allocated hash bin in step 208. In the case of Fig. 10, the table generator also populates the glyph reference, such as glyph reference 176a (of Fig. 10), with a reference to the glyph.
The table generator calculates an index into the hash table 160 in step 210 according to the following equation:

index = ((((encoding_value & MASK1) >> SHIFTVAL) ^ encoding_value) & MASK2)

Preferably the constant MASK1 has a value of 0xff, the constant MASK2 has a value of 0x7ff, and the constant SHIFTVAL has a value of 8. The encoding value is the encoding value of the new character to be stored in the hash table. The "&" operation is a bitwise logical AND operation, the "^" operation is a bitwise logical EXCLUSIVE OR operation, and the ">>" operation is a bitwise logical SHIFT RIGHT operation by the number of bits specified by the SHIFTVAL constant.
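The index equation of step 210 translates directly into C. The sketch below uses the constants exactly as stated in the text; the function name hash_index and the choice of a 16-bit encoding value are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as stated in the description of step 210. */
#define MASK1    0xffu
#define MASK2    0x7ffu
#define SHIFTVAL 8

/* Compute the hash table index for a 16-bit Unicode encoding value. */
uint16_t hash_index(uint16_t encoding_value)
{
    uint32_t v = encoding_value;
    uint32_t index = (((v & MASK1) >> SHIFTVAL) ^ v) & MASK2;
    assert(index < 2048u);  /* MASK2 keeps the index within 2048 elements */
    return (uint16_t)index;
}
```

For example, hash_index(0x3042) evaluates to 0x042. Note that with the constants as stated, the term (encoding_value & MASK1) >> SHIFTVAL is zero for every input, so the index equals the low 11 bits of the encoding value.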
Preferably, the number of hash table elements is 2048. Using the preferred hash table size and hashing index equation with the preferred constants to store the Unicode characters used in the Japanese language advantageously yields a hash table in which only 44 of the 2048 hash table entries are empty, the average hash bin list length, i.e., the average number of "collisions", is 3, the maximum hash bin list length is 8, and 80% of the characters hash to a hash bin list of length 5 or less. Thus, the hashing method provides an efficient method for storing and a quick method for retrieving Japanese Unicode characters. The present method yields a distribution of hash bin list lengths as shown in Table I.
Table I

Hash bin list length    Number of lists of this length
0                       44
1                       171
2                       404
3                       517
4                       456
5                       264
6                       148
7                       35
8                       9

In step 212 the table generator calculates a reference to a current hash bin list by indexing into the hash table using the index which was calculated in step 210. The current hash bin list reference is the hash bin list reference of the indexed hash table element. The table generator adds the new hash bin which was allocated in step 206 to the current hash bin list in step 214. With reference to Figs. 10 and 11, adding the new hash bin to the hash bin list includes incrementing the hash bin list count and determining if the hash bin list is empty. If the hash bin list is empty, the hash bin list reference is assigned a reference to the new hash bin.
With reference to Fig. 12, adding the new hash bin to the hash bin list includes assigning the "next" hash bin reference of the new hash bin a terminating value, preferably NULL, and determining if the hash bin list is empty. If the hash bin list is empty, the hash bin list reference is assigned a reference to the new hash bin. If the hash bin list is not empty, the tail of the hash bin list is found and the "next" hash bin reference of the tail hash bin is assigned a reference to the new hash bin.
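The linked-list insertion described for Fig. 12 can be sketched in C as follows. The struct layouts, the 8-byte in-bin glyph, and the name append_bin are assumptions; only the steps themselves (terminate the new bin, check for an empty list, otherwise walk to the tail) come from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct HashBin {
    uint16_t encoding_value;  /* Unicode code point of the character */
    uint8_t glyph[8];         /* glyph stored in the bin (assumed 8x8 bitmap) */
    struct HashBin *next;     /* next bin 190, or NULL at the end of the list */
} HashBin;

/* Hash table element of Fig. 12: a list reference only, no bin count. */
typedef struct {
    HashBin *bin_list;
} HashTableElement;

/* Add new_bin at the tail of the element's hash bin list (step 214). */
void append_bin(HashTableElement *element, HashBin *new_bin)
{
    assert(element != NULL && new_bin != NULL);
    new_bin->next = NULL;               /* terminating value */
    if (element->bin_list == NULL) {    /* empty list: new bin becomes head */
        element->bin_list = new_bin;
        return;
    }
    HashBin *tail = element->bin_list;  /* otherwise find the tail */
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = new_bin;
}
```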
After adding the new hash bin to the hash bin list, the table generator returns to step 204 to determine if more characters of the subset need to be stored.
Referring now to Fig. 14, a flowchart illustrating steps taken to quickly retrieve a Unicode character glyph from the hash tables of Figs. 10-12 of the set top box of Fig. 2 is shown. The steps to retrieve a Unicode character glyph from the hash tables are performed by a rendering engine, such as rendering engine 92a (of Fig. 6). The rendering engine 92a retrieves the Unicode character glyph to use in rendering the specified Unicode character in response to a request from the Unicode encoding engine 84 (of Fig. 6) to render a Unicode string. The rendering engine 92a receives an encoding value, or code point, corresponding to a Unicode character to be rendered in step 220.
The rendering engine 92a calculates an index into the hash table 160 in step 222 according to the same equation used in step 210 (of Fig. 13) to calculate the index. The description of Fig. 13 provides a detailed description of the hashing equation. The description of Fig. 13 also discusses the various list length results, shown in Table I, of the hashing method, which are pertinent to the retrieval time associated with the method.
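The list length figures quoted from Table I can be cross-checked with a few lines of C. The sketch below simply totals the table; the names TableStats and table_stats are hypothetical, and the data array is copied from Table I.

```c
#include <assert.h>
#include <stddef.h>

/* Table I: lists_of_length[k] is the number of hash bin lists with k bins. */
static const unsigned lists_of_length[9] = {44, 171, 404, 517, 456, 264, 148, 35, 9};

typedef struct {
    unsigned total_lists;           /* number of hash table elements */
    unsigned total_chars;           /* number of characters (hash bins) stored */
    unsigned chars_in_short_lists;  /* characters in lists of length <= limit */
} TableStats;

/* Total the distribution; counts[len] is the number of lists of length len. */
TableStats table_stats(const unsigned *counts, size_t n, size_t limit)
{
    assert(counts != NULL);
    TableStats s = {0, 0, 0};
    for (size_t len = 0; len < n; len++) {
        unsigned chars = counts[len] * (unsigned)len;
        s.total_lists += counts[len];
        s.total_chars += chars;
        if (len <= limit)
            s.chars_in_short_lists += chars;
    }
    return s;
}
```

With a limit of 5, the totals are 2048 lists, 6879 characters, and 5674 characters in lists of length 5 or less. That is about 3.4 characters per list and about 82% of characters in short lists, consistent with the stated average of 3 and the stated 80%.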
In step 224 the rendering engine 92a calculates a reference to a hash bin list by indexing into the hash table using the index which was calculated in step 222. The hash bin list reference is the hash bin list reference of the indexed hash table element. The rendering engine 92a assigns a current hash bin reference to the first hash bin in the hash bin list in step 226.
The rendering engine 92a determines if the end of the hash bin list has been reached in step 228. Determining if the end of the hash bin list has been reached includes determining if the number of hash bins visited in the hash bin list is greater than the hash bin list count value in the case of Figs. 10 and 11. "Visiting" a hash bin means assigning the current hash bin reference to a new value, as is performed in steps 226 and 236. Determining if the end of the hash bin list has been reached includes determining if the current hash bin reference has a NULL value in the case of Fig. 12.
If the end of the hash bin list has been reached, then the glyph for the Unicode character corresponding to the encoding value which was received in step 220 is not present in the hash table 160. In this case, in step 230 the rendering engine 92a searches the list of downloaded glyphs 97 (of Fig. 6) to find the glyph for the Unicode character corresponding to the encoding value which was received in step 220. If the glyph is not present in the list of downloaded glyphs 97 then an error has occurred.
If the end of the hash bin list has not been reached, in step 232, the rendering engine 92a determines if the encoding value of the current hash bin is equal to the encoding value which was received in step 220. If not, the current hash bin reference is assigned a reference to the next hash bin in the hash bin list in step 236 and the rendering engine 92a returns to step 228. However, if the encoding values are equal, the rendering engine 92a returns a reference to the glyph of the current hash bin in step 234.
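The retrieval steps of Fig. 14 can be sketched in C for the array layout of Fig. 10. The struct layouts, the representation of the downloaded glyph list as a plain array of bins, and the names find_glyph and hash_index are assumptions; returning NULL stands in for the error case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t rows[8]; } Glyph;  /* assumed 8x8 bitmap glyph */
typedef struct { uint16_t encoding_value; const Glyph *glyph; } HashBin;
typedef struct { const HashBin *bin_list; uint16_t bin_count; } HashTableElement;

enum { TABLE_SIZE = 2048 };

/* Index equation of step 210 with the stated constants 0xff, 8 and 0x7ff. */
static uint16_t hash_index(uint16_t encoding_value)
{
    uint32_t v = encoding_value;
    return (uint16_t)((((v & 0xffu) >> 8) ^ v) & 0x7ffu);
}

/* Return the glyph for encoding_value, or NULL if it is in neither the
 * hash table nor the downloaded glyph list. */
const Glyph *find_glyph(const HashTableElement table[TABLE_SIZE],
                        const HashBin *downloaded, size_t downloaded_count,
                        uint16_t encoding_value)
{
    assert(table != NULL);
    assert(downloaded != NULL || downloaded_count == 0);
    /* Steps 222-224: hash the code point and select the hash bin list. */
    const HashTableElement *element = &table[hash_index(encoding_value)];
    /* Steps 226, 228, 236: visit bins until the count is exhausted. */
    for (uint16_t i = 0; i < element->bin_count; i++) {
        if (element->bin_list[i].encoding_value == encoding_value) /* step 232 */
            return element->bin_list[i].glyph;                     /* step 234 */
    }
    /* Step 230: fall back to the list of downloaded glyphs. */
    for (size_t i = 0; i < downloaded_count; i++) {
        if (downloaded[i].encoding_value == encoding_value)
            return downloaded[i].glyph;
    }
    return NULL;  /* glyph not found anywhere: the error case */
}
```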
Although the system and method of the present invention has been described in connection with the preferred embodiment, it is not intended to be limited to the specific form set forth herein, but on the contrary, it is intended to cover such alternatives, modifications, and equivalents, as can be reasonably included within the spirit and scope of the invention as defined by the appended claims.

Claims

WHAT IS CLAIMED IS:
1. A television system comprising:
a set top box comprising:
a receiver configured to receive an application program from a broadcast center, wherein said application program includes a first character for display on a television, wherein said first character has an associated encoding value from a set of encoding values, wherein said set of encoding values comprises encoding values for characters from a plurality of languages,
a processor configured to operably receive said application program from said receiver and execute said application program,
a language detector configured to determine a first language associated with said first character from said plurality of languages, and
a first rendering engine associated with said first language for rendering said first character for display on the television.
2. The system as recited in claim 1, wherein said set top box further comprises a glyph describing a graphic representation of said first character, wherein said glyph corresponds to said encoding value of said first character, wherein said first rendering engine uses said glyph to render said first character.
3. The system as recited in claim 1, wherein said application program further comprises a glyph describing a graphic representation of said first character, wherein said glyph corresponds to said encoding value of said first character, wherein said first rendering engine uses said glyph to render said first character.
4. The system as recited in claim 1, wherein said application program includes a second character for display on the television, wherein said second character has an associated encoding value from said set of encoding values.
5. The system as recited in claim 4, wherein said language detector is configured to determine a second language associated with said second character from said plurality of languages.
6. The system as recited in claim 5, wherein said set top box further comprises a second rendering engine associated with said second language for rendering said second character for display on the television.
7. The system as recited in claim 6, wherein said set top box further comprises a glyph describing a graphic representation of said second character, wherein said glyph corresponds to said encoding value of said second character, wherein said second rendering engine uses said glyph to render said second character.
8. The system as recited in claim 5, wherein said first rendering engine is also associated with said second language for rendering said second character for display on the television.
9. The system as recited in claim 1, wherein said set top box further comprises a de-multiplexer coupled to said receiver.
10. The system as recited in claim 9, wherein said receiver is configured to receive an audio/video data stream for display on the television concurrently with said first character and multiplexed with said application program, wherein said de-multiplexer is configured to de-multiplex said audio/video data stream from said application program.
11. The system as recited in claim 1, wherein said set top box further comprises a port for receiving user input from a user input device.
12. The system as recited in claim 11, wherein said set top box is configured to communicate said user input to a remote computer system.
13. The system as recited in claim 1, wherein said set top box further comprises a memory coupled to said receiver configured to store said application program.
14. The system as recited in claim 1, wherein said set top box further comprises an encoding engine configured to receive said first character from said application program, and to invoke said language detector and said first rendering engine.
15. The system as recited in claim 14, wherein said encoding engine comprises functions callable by said application program for manipulating strings of characters, wherein each of said characters has an encoding value from said set of encoding values.
16. The system as recited in claim 1, wherein said set of encoding values comprises the Unicode set of character encoding values.
17. The system as recited in claim 1, wherein said set top box further comprises a memory coupled to said receiver configured to store a plurality of glyphs.
18. A method for displaying a string of characters in a television system, wherein the television system comprises a set top box comprising a receiver, a processor, a language detector, and a first rendering engine, the method comprising:
receiving into said set top box an application program from a broadcast center, wherein said application program includes a first character for display on a television, wherein said first character has an associated encoding value from a set of encoding values, wherein said set of encoding values comprises encoding values for characters from a plurality of languages,
executing said application program on said processor,
said language detector determining a first language associated with said first character from said plurality of languages, and
said first rendering engine rendering said first character for display on the television.
19. The method as recited in claim 18, wherein said set top box further comprises a glyph describing a graphic representation of said first character, wherein said glyph corresponds to said encoding value of said first character, wherein said rendering comprises using said glyph to render said first character.
20. The method as recited in claim 18, wherein said application program further comprises a glyph describing a graphic representation of said first character, wherein said glyph corresponds to said encoding value of said first character, wherein said rendering comprises using said glyph to render said first character.
21. The method as recited in claim 18, wherein said application program includes a second character for display on the television, wherein said second character has an associated encoding value from said set of encoding values.
22. The method as recited in claim 21, further comprising said language detector determining a second language associated with said second character from said plurality of languages.
23. The method as recited in claim 22, wherein said set top box further comprises a second rendering engine associated with said second language, wherein said method further comprises said second rendering engine rendering said second character for display on the television.
24. The method as recited in claim 23, wherein said set top box further comprises a glyph describing a graphic representation of said second character, wherein said glyph corresponds to said encoding value of said second character, wherein said second rendering engine rendering said second character comprises using said glyph.
25. The method as recited in claim 21, further comprising said first rendering engine rendering said second character for display on the television.
26 The method as recited in claim 18, wherein said receiver is configured to receive an audio/video data stream multiplexed with said application program, wherein said method further compnses demultiplexing said audio/video data stream from said application program
27 The method as recited in claim 18, further compnsing receiving an audio/video data stream from the broadcast center, and displaymg the audio/video data stream on the television concurrently with said first character
28 The method as recited m claim 18, further compnsing receiving user input from a user input device
29 The method as recited in claim 28, further compnsing communicating said user input to a remote computer system
30 The method as recited in claim 18, further compnsing stonng said application program in a memory of said set top box
31 The method as recited in claim 18, wherem said set of encoding values compnses the Umcode set of character encoding values
32 The method as recited m claim 18, further compnsing stonng a plurality of glyphs m a memory of
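By way of illustration only, the language detection and engine selection recited in claims 18 through 25 might be sketched as follows in Python. The code point ranges, the language labels, and the two engine functions are assumptions introduced for this sketch and do not appear in the claims; a practical detector would cover many more Unicode blocks.

```python
from typing import Callable, Dict, List, Tuple

# Assumed, illustrative code point ranges mapped to a language label.
LANGUAGE_RANGES: List[Tuple[int, int, str]] = [
    (0x0000, 0x024F, "european"),  # Basic Latin through Latin Extended-B
    (0x0600, 0x06FF, "arabic"),    # Arabic block
    (0x0E00, 0x0E7F, "thai"),      # Thai block
    (0x3040, 0x30FF, "japanese"),  # Hiragana and Katakana blocks
]


def detect_language(encoding_value: int) -> str:
    """Hypothetical language detector: map an encoding value to a language."""
    for low, high, language in LANGUAGE_RANGES:
        if low <= encoding_value <= high:
            return language
    return "unknown"


def render_left_to_right(character: str) -> str:
    """Stand-in for a first rendering engine (returns a text tag, draws nothing)."""
    return f"LTR[{character}]"


def render_right_to_left(character: str) -> str:
    """Stand-in for a second rendering engine associated with another language."""
    return f"RTL[{character}]"


# Languages that have a dedicated engine; all others use the first engine.
RENDERING_ENGINES: Dict[str, Callable[[str], str]] = {
    "arabic": render_right_to_left,
}


def render_character(character: str) -> str:
    """Detect the language of a character, then invoke the matching engine."""
    language = detect_language(ord(character))
    engine = RENDERING_ENGINES.get(language, render_left_to_right)
    return engine(character)
```

In this sketch a character whose language has no dedicated engine is handled by the first engine, which loosely corresponds to claim 25; a character whose language has its own engine is dispatched to it, as in claim 23.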
33 A television system comprising
a broadcast center configured to transmit an application program and a glyph, wherein said application program includes a character for display on a television, wherein said glyph describes a graphic representation of said character, and
a set top box, comprising
a receiver configured to receive said application program and said glyph from said broadcast center,
a processor configured to operably receive said application program from said receiver and execute said application program, and
a rendering engine for rendering said character for display on the television, wherein said rendering engine uses said glyph to render said character for display on the television
34 The system as recited in claim 33, wherein said glyph is included in a resource file received with said application program from said broadcast center
35 The system as recited in claim 33, wherein said set top box further comprises an operating environment configured to add said glyph to a list of downloaded glyphs
36 The system as recited in claim 33, wherein said set top box further comprises a glyph set, wherein said rendering engine uses said glyph set to render characters
37 The system as recited in claim 36, wherein said rendering engine is configured to determine if a glyph for said character is not present in said glyph set
38 The system as recited in claim 37, wherein said rendering engine is configured to search said list of downloaded glyphs if said glyph for said character is not present in said glyph set
39 The system as recited in claim 33, wherein said set top box further comprises a memory operably coupled to said receiver configured to store said glyph
40 The system as recited in claim 39, wherein said memory is a random access memory
41 The system as recited in claim 39, wherein said memory is a flash memory
42 The system as recited in claim 33, wherein said set top box further comprises an encoding engine configured to receive said character from said application program, and to invoke said rendering engine
43 The system as recited in claim 33, wherein said character has an encoding value according to the Unicode set of character encoding values
44 A method for displaying a string of characters in a television system, wherein the television system comprises a set top box comprising a processor and a memory, the method comprising
transmitting an application program and a glyph, wherein said application program includes a character for display on a television, wherein said glyph describes a graphic representation of said character,
receiving said application program and said glyph,
executing said application program, and
rendering said character for display on the television using said glyph to render said character for display on the television
45 The method as recited in claim 44, wherein said transmitting said application program and said glyph comprises transmitting a resource file including said glyph
46 The method as recited in claim 44, wherein said receiving said application program and said glyph comprises receiving said resource file including said glyph
47 The method as recited in claim 44, further comprising adding said glyph to a list of downloaded glyphs
48 The method as recited in claim 47, further comprising determining if a glyph for said character is not present in a glyph set of said set top box
49 The method as recited in claim 48, further comprising searching said list of downloaded glyphs if said glyph for said character is not present in said glyph set
50 The method as recited in claim 44, further comprising storing said glyph in the memory of said set top box
51 The method as recited in claim 44, wherein said character has an encoding value according to the Unicode set of character encoding values
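As a non-limiting sketch of the fallback described in claims 47 through 49, the following Python fragment consults a resident glyph set first and searches the list of downloaded glyphs only when the glyph is absent. The class name GlyphStore, its method names, and the use of bytes as a glyph representation are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Glyph = bytes  # Assumption: a glyph is treated as an opaque bitmap.


class GlyphStore:
    """Hypothetical holder for a resident glyph set and a downloaded glyph list."""

    def __init__(self, glyph_set: Dict[int, Glyph]) -> None:
        self.glyph_set: Dict[int, Glyph] = dict(glyph_set)
        self.downloaded: List[Tuple[int, Glyph]] = []

    def add_downloaded(self, encoding_value: int, glyph: Glyph) -> None:
        """Add a glyph received with an application to the downloaded list."""
        self.downloaded.append((encoding_value, glyph))

    def find(self, encoding_value: int) -> Optional[Glyph]:
        """Return the glyph for an encoding value, or None if it is unknown."""
        glyph = self.glyph_set.get(encoding_value)
        if glyph is not None:
            return glyph
        # Not present in the glyph set: search the list of downloaded glyphs.
        for value, downloaded_glyph in self.downloaded:
            if value == encoding_value:
                return downloaded_glyph
        return None
```

Returning None for a character found in neither place is a choice made for this sketch; the claims do not state what happens in that case.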
52 A set top box for receiving from a broadcast center an application program and a glyph, wherein the application program includes a character for display on a television, wherein the glyph describes a graphic representation of the character, the set top box comprising
a receiver configured to receive said application program and said glyph from the broadcast center,
a processor configured to operably receive said application program from said receiver and execute said application program, and
a rendering engine for rendering said character for display on the television, wherein said rendering engine uses said glyph to render said character for display on the television
53 The set top box as recited in claim 52, wherein said glyph is included in a resource file received with said application program from the broadcast center
54 The set top box as recited in claim 52, wherein said set top box further comprises an operating environment configured to add said glyph to a list of downloaded glyphs
55 The set top box as recited in claim 52, further comprising a glyph set, wherein said rendering engine uses said glyph set to render characters
56 The set top box as recited in claim 55, wherein said rendering engine is configured to determine if a glyph for said character is not present in said glyph set
57 The set top box as recited in claim 56, wherein said rendering engine is configured to search said list of downloaded glyphs if said glyph for said character is not present in said glyph set
58 The set top box as recited in claim 52, further comprising a memory operably coupled to said receiver configured to store said glyph
59 The set top box as recited in claim 58, wherein said memory is a random access memory
60 The set top box as recited in claim 58, wherein said memory is a flash memory
61 The set top box as recited in claim 52, further comprising an encoding engine configured to receive said character from said application program, and to invoke said rendering engine
62 The set top box as recited in claim 52, wherein said character has an encoding value according to the Unicode set of character encoding values
63 A method for efficiently storing a subset of characters from a set of characters, wherein the set of characters are encoded according to a set of encoding values, wherein each of the characters has a unique associated encoding value, the method comprising
receiving a character of said subset of characters, wherein said character comprises an encoding value and a glyph describing a graphic representation of said character,
allocating storage for a hash bin comprising storage for an encoding value and a glyph,
storing said encoding value and said glyph in said hash bin,
calculating an index according to an equation, index = ((((encoding_value & MASK1) >> SHIFTVAL) ^ encoding_value) & MASK2),
wherein said MASK1, MASK2, and SHIFTVAL are predefined constants,
calculating a reference to a hash bin list by indexing into a hash table of references to hash bin lists using said index,
adding said hash bin to said hash bin list,
performing said receiving, said allocating, said storing, said calculating an index, said calculating a reference, and said adding for each character in said subset of characters
64 The method as recited in claim 63, wherein said set of encoding values comprises the Unicode set of encoding values
65 The method as recited in claim 63, wherein said subset of characters comprises characters from at least one language
66 The method as recited in claim 65, wherein said at least one language includes at least one from the group comprising Japanese, Chinese, Korean, Thai, Arabic, Indic, and European languages
67 The method as recited in claim 63, wherein said MASK1 constant is 0x0ff, said MASK2 constant is 0x07ff, and said SHIFTVAL constant is 8
68 The method as recited in claim 63, wherein said hash table of references comprises 2048 references to hash bin lists
69 The method as recited in claim 63, wherein said hash bin further comprises a reference to said glyph, wherein said storage for said bin comprises a first portion for said glyph and a second portion for said encoding value and said reference
70 The method as recited in claim 63, wherein said hash bin list comprises a linked list of hash bins
71 The method as recited in claim 63, wherein said hash bin list comprises an array of hash bins
72 The method as recited in claim 70, wherein said reference to said hash bin list comprises a count of hash bins in said hash bin list
73 The method as recited in claim 63, further comprising allocating storage for said hash table prior to said allocating storage for a hash bin
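The storing steps of claim 63 can be illustrated, under stated assumptions, by the Python sketch below. The constants take the values recited in claim 67 as they read in this text, and the table has the 2048 entries of claim 68. The names HashBin, hash_index, new_table, and store are hypothetical, and adding each new bin at the head of a linked list is an assumption, since the claim only requires that the bin be added to the list.

```python
from dataclasses import dataclass
from typing import List, Optional

MASK1 = 0x0FF     # value as recited in claim 67
MASK2 = 0x07FF    # value as recited in claim 67
SHIFTVAL = 8      # value as recited in claim 67
TABLE_SIZE = MASK2 + 1  # 2048 references, matching claim 68


@dataclass
class HashBin:
    """Storage for one character: its encoding value, its glyph, and a link."""
    encoding_value: int
    glyph: bytes
    next: Optional["HashBin"] = None


def hash_index(encoding_value: int, mask1: int = MASK1,
               mask2: int = MASK2, shiftval: int = SHIFTVAL) -> int:
    """Evaluate the index equation of claim 63 for one encoding value."""
    return (((encoding_value & mask1) >> shiftval) ^ encoding_value) & mask2


def new_table() -> List[Optional[HashBin]]:
    """Allocate the hash table of references before any bin is allocated."""
    return [None] * TABLE_SIZE


def store(table: List[Optional[HashBin]], encoding_value: int,
          glyph: bytes) -> int:
    """Allocate a bin, fill it, and add it to the bin list at its index."""
    hash_bin = HashBin(encoding_value, glyph)
    index = hash_index(encoding_value)
    hash_bin.next = table[index]  # Assumption: insert at the head of the list.
    table[index] = hash_bin
    return index
```

For example, encoding values 0x3042 and 0x3842 both yield index 66 with these constants, so their bins share one list. With the constants as read here, (encoding_value & MASK1) >> SHIFTVAL evaluates to zero, so the index reduces to the low eleven bits of the encoding value; other constants can be passed to hash_index to compare how values spread across the table.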
74 A method for quickly retrieving a character of a subset of characters from a set of characters, wherein the set of characters are encoded according to a set of encoding values, wherein each of the characters has a unique associated encoding value, the method comprising
receiving an encoding value of a character in said subset of characters,
calculating an index according to an equation, index = ((((encoding_value & MASK1) >> SHIFTVAL) ^ encoding_value) & MASK2),
wherein said MASK1, MASK2, and SHIFTVAL are predefined constants,
calculating a reference to a hash bin list by indexing into a hash table of references to hash bin lists using said index, wherein a hash bin includes an encoding value and a glyph describing a graphic representation of a character,
searching said hash bin list for a hash bin having an encoding value equal to said encoding value of said character, and
returning a glyph of said hash bin having an encoding value equal to said encoding value of said character
75 The method as recited in claim 74, wherein said searching comprises
referencing a first hash bin of said hash bin list as a current hash bin,
determining if an encoding value of said current hash bin is equal to said encoding value of said character,
assigning said current hash bin to be a next hash bin in said list of hash bins if said encoding value of said current hash bin is not equal to said encoding value of said character, and
performing said determining and said assigning until said encoding value of said current hash bin is equal to said encoding value of said character
76 The method as recited in claim 74, wherein said set of encoding values comprises the Unicode set of encoding values
77 The method as recited in claim 74, wherein said subset of characters comprises characters from at least one language
78 The method as recited in claim 77, wherein said at least one language includes at least one from the group comprising Japanese, Chinese, Korean, Thai, Arabic, Indic, and European languages
79 The method as recited in claim 74, wherein said MASK1 constant is 0x0ff, said MASK2 constant is 0x07ff, and said SHIFTVAL constant is 8
80 The method as recited in claim 74, wherein said hash table of references comprises 2048 references to hash bin lists
81 The method as recited in claim 74, wherein said hash bin further comprises a reference to said glyph
82 The method as recited in claim 74, wherein said hash bin list comprises a linked list of hash bins
83 The method as recited in claim 74, wherein said hash bin list comprises an array of hash bins
84 The method as recited in claim 83, wherein said reference to said hash bin list comprises a count of hash bins in said hash bin list
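The retrieval of claim 74, in the array form of claims 83 and 84, might look like the following Python sketch. Each table entry holds a count and an array of bins, and the search compares encoding values until a match is found. The names BinArray, build_table, and retrieve_glyph are hypothetical, the constants follow claim 79 as read in this text, and returning None for a missing character is an assumption, since the claims presume the character is present.

```python
from typing import Iterable, List, Optional, Tuple

MASK1, MASK2, SHIFTVAL = 0x0FF, 0x07FF, 8  # values as recited in claim 79

Bin = Tuple[int, bytes]  # (encoding value, glyph); bytes is an assumed glyph type


class BinArray:
    """Reference to a hash bin list: a count plus an array of hash bins."""

    def __init__(self) -> None:
        self.count = 0
        self.bins: List[Bin] = []

    def append(self, encoding_value: int, glyph: bytes) -> None:
        self.bins.append((encoding_value, glyph))
        self.count += 1


def hash_index(encoding_value: int) -> int:
    """Evaluate the index equation of claim 74."""
    return (((encoding_value & MASK1) >> SHIFTVAL) ^ encoding_value) & MASK2


def build_table(characters: Iterable[Bin]) -> List[BinArray]:
    """Helper for this sketch: populate a 2048-entry table of bin arrays."""
    table = [BinArray() for _ in range(MASK2 + 1)]
    for encoding_value, glyph in characters:
        table[hash_index(encoding_value)].append(encoding_value, glyph)
    return table


def retrieve_glyph(table: List[BinArray],
                   encoding_value: int) -> Optional[bytes]:
    """Index into the table, then scan the bin array for a matching value."""
    bin_array = table[hash_index(encoding_value)]
    for position in range(bin_array.count):
        value, glyph = bin_array.bins[position]
        if value == encoding_value:
            return glyph
    return None  # Assumption: report a miss rather than loop indefinitely.
```

Only the bins that share an index are examined, so a lookup touches a short array rather than the whole stored subset.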
85 A set top box for displaying a character on a television, wherein the character is of a subset of characters from a set of characters, wherein the set of characters are encoded according to a set of encoding values, wherein each of the characters has a unique associated encoding value, comprising
a processor configured to execute program instructions, and
a memory coupled to said processor configured to store said program instructions, wherein said program instructions comprise
instructions for receiving an encoding value of a character in said subset of characters,
instructions for calculating an index according to an equation, index = ((((encoding_value & MASK1) >> SHIFTVAL) ^ encoding_value) & MASK2),
wherein said MASK1, MASK2, and SHIFTVAL are predefined constants,
instructions for calculating a reference to a hash bin list by indexing into a hash table of references to hash bin lists using said index, wherein a hash bin comprises storage for an encoding value and a glyph describing a graphic representation of said character,
instructions for searching said hash bin list for a hash bin having an encoding value equal to said encoding value of said character,
instructions for returning a glyph of said hash bin having an encoding value equal to said encoding value of said character, and
instructions for rendering said glyph for display on the television.
86 The set top box as recited in claim 85, wherein said set of encoding values comprises the Unicode set of encoding values
87 The set top box as recited in claim 85, wherein said subset of characters comprises characters from at least one language
88 The set top box as recited in claim 87, wherein said at least one language includes at least one from the group comprising Japanese, Chinese, Korean
89 The set top box as recited in claim 85, wherein said MASK1 constant is 0x0ff, said MASK2 constant is 0x07ff, and said SHIFTVAL constant is 8
90 The set top box as recited in claim 85, wherein said hash table of references comprises 2048 references to hash bin lists
91 The set top box as recited in claim 85, further comprising instructions for allocating storage for said hash table prior to said allocating storage for a hash bin
92 A set top box for displaying a character on a television, wherein the character is of a subset of characters from a set of characters, wherein the set of characters are encoded according to a set of encoding values, wherein each of the characters has a unique associated encoding value, comprising
a processor configured to execute program instructions, and
a memory coupled to said processor configured to store said program instructions, wherein said program instructions comprise
instructions for receiving an encoding value of a character in said subset of characters,
instructions for calculating an index according to an equation, index = ((((encoding_value & MASK1) >> SHIFTVAL) ^ encoding_value) & MASK2),
wherein said MASK1, MASK2, and SHIFTVAL are predefined constants,
instructions for calculating a reference to a hash bin list by indexing into a hash table of references to hash bin lists using said index, wherein a hash bin includes an encoding value and a glyph describing a graphic representation of said character,
instructions for searching said hash bin list for a hash bin having an encoding value equal to said encoding value of said character, and
instructions for returning a glyph of said hash bin having an encoding value equal to said encoding value of said character
93 The set top box as recited in claim 92, wherein said instructions for searching comprise
instructions for referencing a first hash bin of said hash bin list as a current hash bin,
instructions for determining if an encoding value of said current hash bin is equal to said encoding value of said character,
instructions for assigning said current hash bin to be a next hash bin in said list of hash bins if said encoding value of said current hash bin is not equal to said encoding value of said character, and
instructions for performing said determining and said assigning until said encoding value of said current hash bin is equal to said encoding value of said character
94 The set top box as recited in claim 92, wherein said set of encoding values comprises the Unicode set of encoding values
95 The set top box as recited in claim 92, wherein said subset of characters comprises characters from at least one language
96 The set top box as recited in claim 95, wherein said at least one language includes at least one from the group comprising Japanese, Chinese, Korean
97 The set top box as recited in claim 92, wherein said MASK1 constant is 0x0ff, said MASK2 constant is 0x07ff, and said SHIFTVAL constant is 8
98 The set top box as recited in claim 92, wherein said hash table of references comprises 2048 references to hash bin lists
PCT/US1997/020858 1996-11-12 1997-11-12 System and method for receiving and rendering multi-lingual text on a set top box WO1998021890A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU52587/98A AU5258798A (en) 1996-11-12 1997-11-12 System and method for receiving and rendering multi-lingual text on a set top box

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US08/747,204 1996-11-12
US08/747,204 US6141002A (en) 1996-11-12 1996-11-12 System and method for downloading and rendering glyphs in a set top box
US08/747,207 1996-11-12
US08/745,508 1996-11-12
US08/747,207 US5870084A (en) 1996-11-12 1996-11-12 System and method for efficiently storing and quickly retrieving glyphs for large character set languages in a set top box
US08/745,508 US5966637A (en) 1996-11-12 1996-11-12 System and method for receiving and rendering multi-lingual text on a set top box

Publications (1)

Publication Number Publication Date
WO1998021890A1 (en) 1998-05-22

Family

ID=27419325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/020858 WO1998021890A1 (en) 1996-11-12 1997-11-12 System and method for receiving and rendering multi-lingual text on a set top box

Country Status (2)

Country Link
AU (1) AU5258798A (en)
WO (1) WO1998021890A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5229768A (en) * 1992-01-29 1993-07-20 Traveling Software, Inc. Adaptive data compression system
WO1994029840A1 (en) * 1993-06-07 1994-12-22 Scientific-Atlanta, Inc. Display system with programmable display parameters
EP0661670A2 (en) * 1994-01-04 1995-07-05 Digital Equipment Corporation System and method for generating glyphs for unknown characters

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HASKIN R L ET AL: "A SYSTEM FOR THE DELIVERY OF INTERACTIVE TELEVISION PROGRAMMING", DIGEST OF PAPERS OF THE COMPUTER SOCIETY COMPUTER CONFERENCE (SPRING) COMPCON, TECHNOLOGIES FOR THE INFORMATION SUPERHIGHWAY SAN FRANCISCO, MAR. 5 - 9, 1995, no. CONF. 40, 5 March 1995 (1995-03-05), INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, pages 209 - 215, XP000545431 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0989499A2 (en) * 1998-09-25 2000-03-29 Apple Computer, Inc. Unicode conversion into multiple encodings
WO2000038170A2 (en) * 1998-12-18 2000-06-29 Powertv, Inc. Font substitution system
WO2000038170A3 (en) * 1998-12-18 2000-09-14 Powertv Inc Font substitution system
EP1420580A1 (en) * 2002-11-18 2004-05-19 Deutsche Thomson-Brandt GmbH Method and apparatus for coding/decoding items of subtitling data
WO2004047431A1 (en) * 2002-11-18 2004-06-03 Thomson Licensing S.A. Method and apparatus for coding/decoding items of subtitling data
GB2473724A (en) * 2009-09-17 2011-03-23 Ad Fuse Technology Ltd Enhancing video data by integrated controlling program that operates upon video playback
GB2473724B (en) * 2009-09-17 2012-01-04 Ad Fuse Technology Ltd System and method for enhancing video data

Also Published As

Publication number Publication date
AU5258798A (en) 1998-06-03

Similar Documents

Publication Publication Date Title
US6141002A (en) System and method for downloading and rendering glyphs in a set top box
US5870084A (en) System and method for efficiently storing and quickly retrieving glyphs for large character set languages in a set top box
US5966637A (en) System and method for receiving and rendering multi-lingual text on a set top box
US5682158A (en) Code converter with truncation processing
AU781596B2 (en) Data entry in a GUI
US7653752B2 (en) Distribution contents forming method, contents distributing method and apparatus, and code converting method
US5784069A (en) Bidirectional code converter
US5784071A (en) Context-based code convertor
US5717922A (en) Method and system for management of logical links between document elements during document interchange
USRE40361E1 (en) Data conversion apparatus for data communication system
EP0989499A2 (en) Unicode conversion into multiple encodings
US20010047373A1 (en) Publication file conversion and display
JP3884102B2 (en) Device for supplying formatted data to a central processing unit
US20020032699A1 (en) User interface for network browser including pre processor for links embedded in hypermedia documents
EP0898404A2 (en) Information providing system
US20050005302A1 (en) Document data structure and method for integrating broadcast television with Web pages
WO1997043723A1 (en) Structured document browser
JP2012018489A (en) Image processor, image processing method, and program
CA2559198C (en) Systems and methods for identifying complex text in a presentation data stream
US5465322A (en) Apparatus and method for parsing a stream of data including a bitmap and creating a table of break entries corresponding with the bitmap
US20020091737A1 (en) System and method for rules based media enhancement
WO1998021890A1 (en) System and method for receiving and rendering multi-lingual text on a set top box
WO1997010556A1 (en) Unicode converter
US20040143816A1 (en) Information processing apparatus, information processing method, storage medium, and program
WO1997010556A9 (en) Unicode converter

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH KE LS MW SD SZ UG ZW AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase