Preformatted allographic arabic text for html documents

Info

Publication number
WO1999028831A1
Authority
WO
Grant status
Application
Patent type
Prior art keywords
false
text
call
arabic
true
Prior art date
Application number
PCT/US1998/025201
Other languages
French (fr)
Other versions
WO1999028831A9 (en)
Inventor
Nizar Yahya Habash
Original Assignee
University Of Maryland
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1997-12-03
Filing date
1998-12-03
Publication date
1999-06-10

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language
    • G06F17/2863 Processing of non-latin text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/211 Formatting, i.e. changing of presentation of document
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G06F17/2217 Character encodings
    • G06F17/2223 Handling non-latin characters, e.g. kana-to-kanji conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/28 Processing or translating of natural language

Abstract

A method and apparatus for displaying text from a computer readable medium onto a computer display stores text data in a graphemic format (21), then converts (23) the graphemic text data, being made up of individual characters, to allographic text data (24) which is also made up of individual characters, using at least 8 bit encoding. The allographic text data (24) is then stored. The allographic text data (24) is then rendered as text on a display using a font.

Description

TITLE OF THE INVENTION:

PREFORMATTED ALLOGRAPHIC ARABIC TEXT FOR HTML DOCUMENTS

BACKGROUND OF THE INVENTION:

Field of the Invention:

The present invention relates to a method and apparatus for rendering allographic Arabic text on internet or HTML (hyper-text markup language) documents such as home pages and web sites.

Description of the Related Art:

Typical home pages and web sites which are currently in use are designed for Roman or Latin script languages such as English. Arabic script, however, is very different from Roman script in a number of ways. In addition to the significant differences in the shape of the characters, Arabic is written from right to left instead of from left to right, which causes significant differences with respect to justification and line wrapping. Additionally, the form of an Arabic letter depends upon the position of the letter within the word. The same Arabic letter may have a different form depending upon whether it is the first character, a middle character, or the end character of a word, or stands alone. The forms are typically referred to as initial, medial, final, and stand-alone. These multiple forms are referred to as allographs. The rules for mapping a letter or grapheme into its allographs are called graphotactics.

Conventional solutions for representing Arabic characters include treating them either as graphics or as text. The most common current solution is to treat Arabic text documents as graphical images. Such a solution is platform independent, due to the fact that pixel-handling is a platform independent phenomenon. However, since such graphical representation of characters requires a significant amount of information and calculation, a significant amount of memory is necessary, and a significant amount of time is necessary in order to render the characters. Additionally, graphical representation of text makes it difficult or impossible for the text to be appropriately searched by a search engine, or linked to other sites.

The treatment of Arabic characters as text would seem to be the most desirable solution. A text based system is efficient with respect to memory space and rendering time, and makes it possible to search and link as with other text documents. Existing encodings of Arabic include ISO (International Standards Organization), ASMO (Arab Standards and Metrology Organization), CP-1256 (Microsoft (tm) encoding for Arabic Windows (tm)), and Unicode. However, the current state of the art is such that a specific platform is needed in order to either create or view documents in Arabic. Each of the known encodings discussed above requires a special hardware or software configuration which must be prepared before any Arabic reading or writing can take place. Every letter is represented as one character regardless of its position, and a specialized operating system or program is needed to deal with appropriate right-left directional representation and graphotactics. The platform-dependent nature of this solution, however, creates significant limitations on its viability. While home pages or documents can be created using, for example, Arabic Windows (tm), such documents can only be viewed by computers having Arabic Windows (tm), or having a localized version of a browser capable of dealing with the directional representation and graphotactics.

SUMMARY OF THE INVENTION:

The present invention, therefore, is a method of treating Arabic text as text, in a manner which is platform independent, efficient from a memory and CPU speed perspective, is compatible with any other Arabic representation system, and enables benefits to be realized from advances in Roman script web typography. These advantages are created by a method wherein Arabic script is considered and handled in an allographic manner; every form of a particular Arabic letter is considered to be a different character to solve the graphotactics problem, and the text is preformatted in order to solve the problems which are typically associated with the right-left direction problem. The present invention, therefore, allows Arabic to be treated by browsers as if it were English. The invention encodes Arabic text in such a way that viewing would require a font only. The invention has been implemented in a special text editor which essentially converts a conventional computer into an Arabic text editor. HTML pages which are created using the invention are viewable by any computer that has access to the necessary font. The font can either be downloaded to the machine or embedded in the HTML document as an object, otherwise known as a dynamic font. The present invention, therefore, includes a novel configuration for encoding Arabic text.
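As a rough illustration of the preformatting idea, the following Python sketch reverses the characters of each line so that a renderer which only lays text out from left to right still shows each line in right-to-left visual order. The function name `preformat_rtl` and the sample strings are illustrative assumptions and are not taken from the disclosed implementation; the sketch also assumes one stored character per displayed form and line breaks that are fixed before conversion.

```python
def preformat_rtl(text: str) -> str:
    """Reverse the characters of every line while keeping the line order.

    Assumption for this sketch: each character already stands for exactly
    one displayed form, so reversing cannot split a multi-part character.
    """
    return "\n".join(line[::-1] for line in text.split("\n"))


# Latin placeholders stand in for allographic codes so the effect is visible.
logical = "abc\nde"
visual = preformat_rtl(logical)
```

Applying the function twice returns the original text, which is one reason a simple per-line reversal is convenient for a stored, preformatted document.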

The invention is embodied, therefore, as a method of displaying text from a computer readable medium, and an apparatus for displaying text from a computer readable medium, with the text being displayed onto a computer display. The method comprises the steps of storing text data in a graphemic format, and converting the graphemic text data comprising individual characters to allographic text data comprising individual characters using at least 8 bit encoding. The allographic text data is stored on a computer readable medium, and the allographic text data is then rendered on a display using a font. In a preferred embodiment, the text data is stored as Arabic script characters. Furthermore, the rendering step can comprise one-to-one mapping of the allographic text data to glyphs. An apparatus according to the present invention includes a storing device for storing text data in a graphemic format, converting means for converting the graphemic text data to allographic text data using at least 8 bit encoding, and a second storing device for storing the allographic text data. Rendering means are provided for rendering the allographic text data as text on a display using a font.

BRIEF DESCRIPTION OF THE DRAWINGS:

Figure 1 illustrates a text rendering process according to the prior art;

Figure 2 illustrates a text rendering process according to the present invention;

Figure 3 illustrates Arabic text as rendered according to the present invention;

Figure 4 illustrates the text of Figure 3, but without having the necessary font applied;

Figure 5 is a conventional ASCII table; and

Figure 6 is a table showing 8 bit representations of Arabic characters according to the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS:

In order to more clearly understand the present invention, a brief discussion of textual representation and terminology is appropriate. Writing systems, in any language, are broken down into units; graphemes are known as the smallest units in any writing system which are capable of causing a contrast in meaning. In the English alphabet, therefore, switching from "hook" to "book" introduces a meaning change. Therefore, "h" and "b" are each graphemes. There is no prescribed form for a grapheme. The grapheme for a particular letter or sound may appear as a capital letter, a small letter, a parenthetical notation, or other form depending upon the particular handwriting style or type face which is chosen. Each of the possible forms for a grapheme is known as a graph. When graphs are considered to be variations of a particular grapheme, they are considered to be "allographs."

Arabic script, typically used for writing the Arabic language, can also be used for writing other languages such as Persian, Urdu, Pashto, Sindhi, and Kurdish. Arabic script appears as cursive writing, whether handwritten or printed. The resulting handwritten traditions are such that the same letter may be written in different forms, depending upon how the character joins with neighboring characters. Typically, the encoding is such that a total of 54-58 graphemes are encoded as approximately 109 allographs according to the present invention. These graphemes include 28 basic characters, 8 to 12 additional characters, 10 numerals, and 8 vowel characters. According to prior art such as Unicode, each letter receives only one Unicode character value, independent of the form. The Unicode standard encodes characters, and the characters are resident as memory strings in internal memory or on disk storage. When a character is rendered on a screen, it is called a glyph. A character can render as one glyph, or part of one glyph. The specific glyph corresponding to the appropriate letter form (initial, medial, final, etc.) is determined by the character context. A repertoire of glyphs in Unicode comprises a font. Referring to Figure 1, the Unicode standard, as an example, contains information wherein characters are encoded, and stored on a disk as characters in a graphemic format. When text is sought to be rendered on the screen, the characters are taken from memory and subjected to a text conversion/rendering process wherein a set of conversion rules and a font are applied to the characters on the disk, to render a document on the screen. The document on the screen is essentially displayed in real time as a series of glyphs.

Referring to Figure 2, however, it can be seen that the present invention treats documents in a significantly different manner than the prior art text handling system such as that which is discussed above. According to the invention, a document which is stored on disk or in memory as a plurality of characters in a graphemic format is subjected to a conversion process wherein the graphemes are converted by a series of conversion rules into allographs, according to predetermined graphotactics. The resulting document is then stored on the disk as a series of allographs. When a user accesses the document for the purposes of reading it, a text rendering process is performed, wherein graphotactics are not used. Only one-to-one mapping is performed on the document to the screen, using a font. According to this process, the conversion occurs only once per document, while the rendering occurs every time a user accesses the document. According to the prior art of Figure 1, however, the conversion/rendering process wherein a grapheme is converted to a glyph occurs every time the document is accessed. This results in slower system operation and rendering time.
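The split between a one-time conversion and a repeated, context-free rendering step can be sketched in Python as follows. Everything in the sketch is hypothetical: the two toy letters `A` and `B`, their base codes, the glyph names, and the position-only rule in `choose_form` are stand-ins chosen for illustration, not the conversion rules of the invention. The counter `rule_calls` only exists to show how often the context-sensitive rules run.

```python
from typing import Dict, List

FORMS = ("init", "medi", "fina", "alone")
BASE = {"A": 10, "B": 20}  # made-up base codes for two toy letters

rule_calls = 0  # how many times the context-sensitive rules were applied


def choose_form(word: List[str], i: int) -> int:
    """Toy conversion rule: pick a code from the position in the word only."""
    global rule_calls
    rule_calls += 1
    if len(word) == 1:
        offset = 3  # stand-alone
    elif i == 0:
        offset = 0  # initial
    elif i == len(word) - 1:
        offset = 2  # final
    else:
        offset = 1  # medial
    return BASE[word[i]] + offset


def convert_once(word: List[str]) -> bytes:
    """Graphemes -> allograph codes; done a single time per document."""
    return bytes(choose_form(word, i) for i in range(len(word)))


def render(stored: bytes, font: Dict[int, str]) -> List[str]:
    """One-to-one lookup: every stored code selects exactly one glyph."""
    return [font[code] for code in stored]


# A toy font: one glyph name per allograph code.
FONT = {base + k: f"{name}.{form}"
        for name, base in BASE.items()
        for k, form in enumerate(FORMS)}

stored = convert_once(["B", "A", "B"])  # conversion happens here, once
for _ in range(3):                      # three separate viewings
    glyphs = render(stored, FONT)       # no rules consulted while rendering
```

In this arrangement the rules run once per character at conversion time, and each later viewing is a plain table lookup.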

Attempts have been made to utilize a process similar to the process of Figure 2, however using only 7 bits of a conventional 8 bit code page such as an extended ASCII table. A conventional extended ASCII table is shown in Figure 5. Specifically, the allographic characters in this previous attempt were assigned locations in the second half of the extended ASCII table, keeping the first half of the table identical to the ASCII character set. Since appropriate Arabic text requires a minimum of 109 allographs, however, this 7 bit attempt was too limited. It ignored specific characters such as vowel marks, and required combining of certain letter forms. This resulted in a system with limited Arabic text representation ability. For example, Koranic text requires more ability than this 7 bit system provided. The present invention solves this problem by using the first as well as the second halves of the extended ASCII table, which consists of 8 bits, thereby using space typically reserved for the ASCII character set. This provides the invention with the number of locations needed to represent at least the 109 allographs needed for good quality Arabic text publishing. This representation results in 8 bit encoding of the allographs. A character table according to the present invention, therefore, is shown in Figure 6.

The invention, therefore, is directed to a method of and apparatus for displaying text from a computer memory onto a computer display, with the method comprising the steps of storing the text data in a graphemic format, then converting the graphemic text data, which is made up of individual characters, to allographic text data made up of individual characters, using at least 8-bit encoding. The allographic text data is then stored in a suitable memory. The final step is rendering the text on a display using a font. It should be noted that the present invention utilizes a rendering step wherein the allographic text data is mapped to glyphs in a one-to-one mapping configuration.

Additionally, in the case of Arabic text, the step of converting from graphemes to allographs includes a step of appropriately preformatting the text for right-to-left display. Documents which are prepared in this fashion can be made available for accessing by any computer on the internet through normal web server technology. A client or computer wishing to view the Arabic document would access the particular page by entering the address or "URL" for the page. When the page begins loading, the specific Arabic character information or font information can be loaded as a dynamic font with the page information, so that the client computer receives the necessary encoding information to properly view the Arabic characters. In the alternative, the specific Arabic character information can be downloaded separately. Any computer capable of viewing documents with dynamic fonts, therefore, will be capable of viewing the Arabic documents, and will not require a specific hardware/software platform or any plug-ins.

Figure 3 is an illustration of Arabic text rendered as glyphs on a computer screen through an internet browser, with the Arabic text having been encoded according to the present invention. The text is, as noted previously, treated in a textual format, and is capable of being searched, edited, and linked. Figure 4 illustrates the text as stored on disk as allographs. When subjected to the text rendering process and font information illustrated in Figure 2, the glyphs of Figure 3 are rendered. If the text rendering process were not performed using the appropriate font, the allographs would appear on the screen as illustrated in Figure 4. The downloading of the font or the supply of the dynamic font to the computer upon which the Arabic text is rendered ensures appropriate viewing.

An example of the conversion rules for converting the graphemes to allographs is shown in the attached code page. The code page, written in Visual Basic (tm), includes data regarding features of the characters that are used by the conversion rules to determine the allographs. For example, the letter "gayn" is connectable to other letters, does not act as a space, is not a vowel, and has a different form, and therefore a different code, for when the letter appears initially, medially, finally, or alone. Because the present invention is capable of treating the different forms of each letter as a different character, complex determining software, such as Arabic Windows (tm) which attempts to determine an appropriate form based upon a character's position, is not necessary.
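The feature-driven choice of form described for "gayn" can be sketched in Python as below. The field names and the codes for "gayn" (105 to 108), "ayn" (101 to 104), and the space (32) are taken from the attached listing; the Python rendering, the `Letter` tuple, the name-keyed table, and the helper names `choose_form` and `convert` are assumptions made for illustration and only loosely follow the listing's form-selection routine.

```python
from typing import Dict, List, NamedTuple, Optional


class Letter(NamedTuple):
    connects: bool  # can join to the letter that follows it
    isspace: bool   # ends a word
    isvowel: bool
    init: int
    mid: int
    final: int
    alone: int


# Values for these three entries follow the attached listing.
LETTERS: Dict[str, Letter] = {
    "ayn": Letter(True, False, False, 101, 102, 103, 104),
    "gayn": Letter(True, False, False, 105, 106, 107, 108),
    "space": Letter(False, True, False, 32, 32, 32, 32),
}


def choose_form(prev: Optional[str], cur: str, nxt: Optional[str]) -> int:
    """Pick the allograph code for `cur` from its two neighbours."""
    joins_before = prev is not None and LETTERS[prev].connects
    ends_after = nxt is None or LETTERS[nxt].isspace
    letter = LETTERS[cur]
    if not joins_before:
        return letter.alone if ends_after else letter.init
    return letter.final if ends_after else letter.mid


def convert(names: List[str]) -> List[int]:
    """Convert a sequence of letter names to allograph codes."""
    codes = []
    for i, cur in enumerate(names):
        prev = names[i - 1] if i > 0 else None
        nxt = names[i + 1] if i + 1 < len(names) else None
        codes.append(choose_form(prev, cur, nxt))
    return codes


codes = convert(["gayn", "gayn", "gayn", "space", "gayn"])
```

The sample input exercises all four forms of "gayn": initial, medial, and final inside the three-letter run, and stand-alone after the space.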

As noted previously, the invention is directed to a method and a computer system which displays text from a computer readable medium onto a computer display. In view of the recent popularity of internet publishing and HTML documents, the invention provides a new and unique way of preparing and viewing HTML documents which have non-Roman characters, such as Arabic. The above discussion of the embodiments of the invention is illustrative in nature, and it would be evident to a person of ordinary skill in the art that a number of modifications could be made to the invention while still remaining within the spirit and scope of the invention. For example, although Arabic text and Arabic characters are discussed, the invention could be used on other languages or other character sets which have similarities to Arabic text and Arabic characters.

For a more clear understanding of the metes and bounds of the present invention, reference should be made to the appended claims.

Global Const kaaf% = 223
Global Const laam% = 225
Global Const miim% = 227
Global Const nuun% = 228
Global Const haa% = 229
Global Const waw% = 230
Global Const alifmaqsura% = 236
Global Const yaa% = 237
Global Const fatha% = 243
Global Const kasra% = 246
Global Const damma% = 245
Global Const fathatanwiin% = 240
Global Const kasratanwiin% = 242
Global Const dammatanwiin% = 241
Global Const shadda% = 248
Global Const sukuun% = 250

Global Const comma% = 151
Global Const semicolon% = 186
Global Const period% = 220
Global Const questmark% = 191
Global Const exclamark% = 33

Global Const laam_alif% = 1
Global Const laam_alifmadda% = 2
Global Const laam_alifhamzaup% = 3
Global Const laam_alifhamzadn% = 4

Global Const qaaf_hamza% = 5
Global Const giim% = 5
Global Const vii% = 7
Global Const pii% = 8
Global Const oo% = 9
Global Const ee% = 11
Global Const alif_taa% = 12
Global Const alif_saad% = 14

Global Const ttaamark% = 15
Global Const revshadda% = 16

'Copyright (c) 1997 Nizar Habash
'University of Maryland at College Park
'All Rights Reserved

Sub defineletter (x, xconnects, xisspace, xisvowel, xinit, xmid, xfinal, xalone)
    letter(x).connects = xconnects
    letter(x).isspace = xisspace
    letter(x).isvowel = xisvowel
    letter(x).init = xinit
    letter(x).mid = xmid
    letter(x).final = xfinal
    letter(x).alone = xalone
End Sub

Sub defineletters ()
    'Copyright (c) 1997 Nizar Habash
    'University of Maryland at College Park
    'All Rights Reserved

    ' initiate
    For x = 1 To 255
        Call defineletter(x, False, True, False, x, x, x, x)
    Next

    ' defineletter (hamza, connects, isspace, isvowel, init, mid, final, alone)
    Call defineletter(13, False, True, False, 13, 13, 13, 13)
    Call defineletter(hamza, False, True, False, 136, 136, 136, 136)
    Call defineletter(alifmadda, False, False, False, 68, 69, 69, 68)
    Call defineletter(alifhamzaup, False, False, False, 66, 67, 67, 66)
    Call defineletter(wawhamza, False, False, False, 140, 140, 140, 140)
    Call defineletter(alifhamzadn, False, False, False, 70, 71, 71, 70)
    Call defineletter(yaahamza, True, False, False, 137, 137, 138, 139)
    Call defineletter(alif, False, False, False, 64, 65, 65, 64)
    Call defineletter(baa, True, False, False, 72, 72, 73, 73)
    Call defineletter(taamarbuta, False, False, False, 132, 133, 133, 132)
    Call defineletter(taa, True, False, False, 74, 74, 75, 75)
    Call defineletter(thaa, True, False, False, 76, 76, 77, 77)
    Call defineletter(jiim, True, False, False, 78, 78, 79, 80)

    Call defineletter(hhaa, True, False, False, 81, 81, 82, 83)
    Call defineletter(khaa, True, False, False, 84, 84, 85, 86)
    Call defineletter(daal, False, False, False, 87, 87, 87, 87)
    Call defineletter(thaal, False, False, False, 88, 88, 88, 88)
    Call defineletter(raa, False, False, False, 89, 89, 89, 89)
    Call defineletter(zaay, False, False, False, 90, 90, 90, 90)
    Call defineletter(siin, True, False, False, 91, 91, 92, 92)
    Call defineletter(shiin, True, False, False, 93, 93, 94, 94)
    Call defineletter(saad, True, False, False, 95, 95, 95, 95)
    Call defineletter(daad, True, False, False, 97, 97, 98, 98)
    Call defineletter(ttaa, True, False, False, 99, 99, 99, 99)
    Call defineletter(zzaa, True, False, False, 100, 100, 100, 100)

    Call defineletter(ayn, True, False, False, 101, 102, 103, 104)
    Call defineletter(gayn, True, False, False, 105, 106, 107, 108)
    Call defineletter(faa, True, False, False, 109, 110, 111, 111)
    Call defineletter(qaaf, True, False, False, 112, 113, 114, 114)
    Call defineletter(kaaf, True, False, False, 115, 115, 115, 116)
    Call defineletter(laam, True, False, False, 117, 117, 118, 118)
    Call defineletter(miim, True, False, False, 119, 119, 120, 120)
    Call defineletter(nuun, True, False, False, 121, 121, 122, 122)
    Call defineletter(haa, True, False, False, 123, 124, 125, 126)
    Call defineletter(waw, False, False, False, 128, 128, 128, 128)
    Call defineletter(alifmaqsura, False, False, False, 135, 134, 134, 135)
    Call defineletter(yaa, True, False, False, 129, 129, 130, 131)

    Call defineletter(fatha, False, False, True, 170, 170, 170, 170)
    Call defineletter(kasra, False, False, True, 172, 172, 172, 172)
    Call defineletter(damma, False, False, True, 171, 171, 171, 171)

    Call defineletter(shadda, False, False, False, 62, 63, 63, 63)
    Call defineletter(sukuun, False, False, False, 63, 63, 63, 63)

    Call defineletter(comma, False, False, False, 63, 63, 63, 63)
    Call defineletter(semicolon, False, False, False, 62, 63, 63, 63)
    Call defineletter(period, False, False, False, 63, 63, 63, 63)
    Call defineletter(questmark, False, False, False, 63, 63, 63, 63)

    Call defineletter(spacebar, False, True, False, 32, 32, 32, 32)

    Call defineletter(laam_alif, False, False, False, 141, 142, 142, 141)
    Call defineletter(laam_alifmadda, False, False, False, 145, 146, 146, 145)
    Call defineletter(laam_alifhamzaup, False, False, False, 143, 144, 144, 143)
    Call defineletter(laam_alifhamzadn, False, False, False, 147, 148, 148, 147)

    Call defineletter(qaaf_hamza, True, False, False, 149, 150, 151, 151)
    Call defineletter(giim, True, False, False, 152, 152, 153, 154)
    Call defineletter(vii, True, False, False, 155, 155, 156, 159)
    Call defineletter(pii, True, False, False, 162, 162, 163, 163)
    Call defineletter(oo, False, False, False, 164, 164, 164, 164)
    Call defineletter(ee, True, False, False, 165, 165, 165, 167)
    Call defineletter(alif_taa, False, True, False, 168, 168, 168, 168)
    Call defineletter(alif_saad, False, False, True, 169, 169, 169, 169)

    Call defineletter(ttaamark, False, False, True, 182, 182, 182, 182)
    Call defineletter(revshadda, False, False, True, 181, 181, 181, 181)
End Sub

Function flip (x As String)
    ' Reverse the characters of x
    For i = Len(x) To 1 Step -1
        y = y + Mid(x, i, 1)
    Next
    flip = y
End Function

Function flip2 (x As String)
    ' Reverse the whole text, then restore the original order of the lines
    x = flip(x)
    done = False

    While Not done
        pos = InStr(1, x, Chr(13))
        If pos = 0 Then
            y = x + Chr(13) + Chr(10) + y
            done = True
        Else
            If pos = Len(x) Then
                y = Mid(x, 1, pos - 1) + Chr(13) + Chr(10) + y
                done = True
            Else
                y = Mid(x, 1, pos - 1) + Chr(13) + Chr(10) + y
                x = Mid(x, pos + 1, Len(x) - pos)
            End If
        End If
    Wend

    flip2 = y
End Function

Function substitute (a, x, b)
    ' a = preceding character (-1 if none), x = current character,
    ' b = following character (-1 if none)
    If (a = -1) Then
        before_connect = False
    Else
        before_connect = letter(a).connects
    End If

    If (b = -1) Then
        after_space = True
    Else
        after_space = letter(b).isspace
    End If

    If (before_connect = False) Then
        If (after_space = False) Then
            substitute = letter(x).init
            Exit Function
        Else
            substitute = letter(x).alone
            Exit Function
        End If
    Else
        If (after_space = False) Then
            substitute = letter(x).mid
            Exit Function
        Else
            substitute = letter(x).final
            Exit Function
        End If
    End If
End Function

Function catchpattern (a, b)
    c = -1
    If a = 13 And b = 10 Then c = 13

    If a = laam Then
        Select Case b
            Case alif: c = laam_alif
            Case alifhamzaup: c = laam_alifhamzaup
            Case alifhamzadn: c = laam_alifhamzadn
            Case alifmadda: c = laam_alifmadda
        End Select
    End If

    If a = qaaf And b = exclamark Then c = qaaf_hamza
    If a = jiim And b = exclamark Then c = giim
    If a = faa And b = exclamark Then c = vii
    If a = baa And b = exclamark Then c = pii
    If a = waw And b = exclamark Then c = oo
    If a = yaa And b = exclamark Then c = ee
    If a = saad And b = exclamark Then c = alif_saad
    If a = taa And b = exclamark Then c = alif_taa
    If a = shiin And b = exclamark Then c = revshadda
    If a = ttaa And b = exclamark Then c = ttaamark

    catchpattern = c
End Function

Sub catchpatterns ()
    textone = ""
    a = Asc(Mid(arabic.Text, 1, 1))

    For i = 2 To Len(arabic.Text)
        found = False
        a = Asc(Mid(arabic.Text, i - 1, 1))
        b = Asc(Mid(arabic.Text, i, 1))
        c = -1
        c = catchpattern(a, b)

        If c > -1 Then found = True

        If found Then
            textone = textone + Chr(c)
            i = i + 1
            If i = Len(arabic.Text) Then
                textone = textone + Mid(arabic.Text, i, 1)
            End If
        Else
            textone = textone + Chr(a)
        End If
    Next

    If found = False Then
        textone = textone + Chr(b)
    End If
End Sub
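The pair-matching step of the two routines above can be sketched in Python as follows: a two-character pattern is replaced by a single code before the positional forms are chosen. The values for `LAAM`, `WAW`, `YAA`, `EXCLAMARK`, `LAAM_ALIF`, `OO`, and `EE` are those printed in the constant list of the attached listing. The value 199 for `ALIF` does not appear in the listing as reproduced here; it is the CP-1256 position of alif and is an assumption of this sketch, as are the names `PAIRS` and `collapse_pairs` and the left-to-right scanning strategy.

```python
from typing import Dict, List, Tuple

LAAM, WAW, YAA = 225, 230, 237       # printed in the listing's constants
EXCLAMARK = 33                       # printed in the listing's constants
LAAM_ALIF, OO, EE = 1, 9, 11         # printed in the listing's constants
ALIF = 199                           # assumption: CP-1256 position of alif

# Two-character patterns and the single code that replaces each of them.
PAIRS: Dict[Tuple[int, int], int] = {
    (LAAM, ALIF): LAAM_ALIF,
    (WAW, EXCLAMARK): OO,
    (YAA, EXCLAMARK): EE,
}


def collapse_pairs(codes: List[int]) -> List[int]:
    """Scan left to right, replacing each known pair with its single code."""
    out: List[int] = []
    i = 0
    while i < len(codes):
        pair = tuple(codes[i:i + 2])
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2  # both characters of the pair are consumed
        else:
            out.append(codes[i])
            i += 1
    return out


collapsed = collapse_pairs([LAAM, ALIF, WAW, EXCLAMARK, YAA])
```

A dictionary keyed by the pair keeps the pattern table in one place, so adding a further combination only needs one more entry.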

Sub Command1_Click ()
    convert
    text1.Text = texttwo
    text2.Text = newtext
End Sub

Sub convert ()
    'textone = arabic.text
    texttwo = ""

    If Len(arabic.Text) > 0 Then
        Call catchpatterns

        a = -1

        For i = 1 To Len(textone)
            x = Asc(Mid(textone, i, 1))

            If i < Len(textone) Then
                b = Asc(Mid(textone, i + 1, 1))
                ' Skip over a vowel mark when looking at the following letter
                If (letter(b).isvowel And i + 2 <= Len(textone)) Then
                    b = Asc(Mid(textone, i + 2, 1))
                End If
            Else
                b = -1
            End If

            z = substitute(a, x, b)
            texttwo = texttwo + Chr(z)
            a = x
        Next

        newtext = flip2(texttwo)
        clipboard.SetText newtext
    End If
End Sub

Sub Form_Load ()
    Call defineletters
    TITLE.Visible = True
    TITLE.Enabled = True
End Sub

Function nums (x As String)
    ' Build a string of the character codes in x
    z = ""
    For i = 1 To Len(x)
        z = z + Str(Asc(Mid(x, i, 1)))
    Next
    nums = z
End Function

Sub oldconvert ()
    'textone = arabic.text
    'latin.Text = ""
    texttwo = ""

    Call catchpatterns

    a = -1

    For i = 1 To Len(textone)
        x = Asc(Mid(textone, i, 1))

        If i < Len(textone) Then
            b = Asc(Mid(textone, i + 1, 1))
            If (letter(b).isvowel And i + 2 <= Len(textone)) Then
                b = Asc(Mid(textone, i + 2, 1))
            End If
        Else
            b = -1
        End If

        z = substitute(a, x, b)
        texttwo = texttwo + Chr(z)
    Next

    'latin.Text = flip(texttwo)

    'Copyright (c) 1997 Nizar Habash
    'University of Maryland at College Park
    'All Rights Reserved
End Sub

Claims

CLAIMS:
1. A method of displaying text from a computer readable medium onto a computer display, said method comprising the steps of: storing text data in a graphemic format; converting the graphemic text data comprising individual characters to allographic text data comprising individual characters using at least 8 bit encoding; storing the allographic text data on a computer readable medium; rendering the allographic text data on a display using a font.
2. A method as recited in claim 1, wherein said text data is stored as Arabic script characters.
3. A method as recited in claim 2, wherein the rendering step comprises one-to-one mapping of the allographic text data to glyphs.
4. An apparatus for displaying text from a computer readable medium onto a computer display, said apparatus comprising: a storing device for storing text data in a graphemic format; converting means for converting the graphemic text data comprising individual characters to allographic text data comprising individual characters, using at least 8 bit encoding; a second storing device for storing the allographic text data; rendering means for rendering the allographic text data as text on a display using a font.
5. An apparatus as recited in claim 4, wherein said first storing device stores the text data in a graphemic format as Arabic script characters.
6. An apparatus as recited in claim 5, wherein the rendering means comprises mapping means for one-to-one mapping of the allographic text data to glyphs.

Priority Applications (2)

Application Number Priority Date Filing Date Title
US6744297 1997-12-03 1997-12-03
US60/067,442 1997-12-03

Publications (2)

Publication Number Publication Date
WO1999028831A1 (en) 1999-06-10
WO1999028831A9 (en) 1999-09-16

Family

ID=22076010

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/025201 WO1999028831A9 (en) 1997-12-03 1998-12-03 Preformatted allographic arabic text for html documents

Country Status (1)

Country Link
WO (1) WO1999028831A9 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4176974A (en) * 1978-03-13 1979-12-04 Middle East Software Corporation Interactive video display and editing of text in the Arabic script
US5412771A (en) * 1992-02-07 1995-05-02 Signature Software, Inc. Generation of interdependent font characters based on ligature and glyph categorizations
US5556282A (en) * 1994-01-18 1996-09-17 Middlebrook; R. David Method for the geographical processing of graphic language texts

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1298629A2 (en) * 2001-09-26 2003-04-02 MAN Nutzfahrzeuge Aktiengesellschaft Method of displaying arabic characters in a display of a vehicle
DE10147541A1 (en) * 2001-09-26 2003-04-10 Man Nutzfahrzeuge Ag Display of Greek and Hebrew characters in a display of a motor vehicle
DE10147540A1 (en) * 2001-09-26 2003-04-10 Man Nutzfahrzeuge Ag Representation of Arabic script in a display of a motor vehicle
EP1298629A3 (en) * 2001-09-26 2005-06-29 MAN Nutzfahrzeuge Aktiengesellschaft Method of displaying arabic characters in a display of a vehicle
WO2006021973A2 (en) * 2004-08-23 2006-03-02 Geneva Software Technologies Limited A system and a method for a sim card based multi-lingual messaging application
WO2006021973A3 (en) * 2004-08-23 2006-10-05 Geneva Software Technologies L A system and a method for a sim card based multi-lingual messaging application
WO2008042845A1 (en) * 2006-10-02 2008-04-10 Google Inc. Displaying original text in a user interface with translated text
US7801721B2 (en) 2006-10-02 2010-09-21 Google Inc. Displaying original text in a user interface with translated text
US8095355B2 (en) 2006-10-02 2012-01-10 Google Inc. Displaying original text in a user interface with translated text
US8577668B2 (en) 2006-10-02 2013-11-05 Google Inc. Displaying original text in a user interface with translated text
US9547643B2 (en) 2006-10-02 2017-01-17 Google Inc. Displaying original text in a user interface with translated text

Also Published As

Publication number Publication date Type
WO1999028831A9 (en) 1999-09-16 application

Similar Documents

Publication Title
Ferraiolo et al. Scalable vector graphics (SVG) 1.0 specification
US6347323B1 (en) Robust modification of persistent objects while preserving formatting and other attributes
US5682158A (en) Code converter with truncation processing
US5963205A (en) Automatic index creation for a word processor
US6976059B1 (en) System and method to provide applets using a server based virtual machine
US6321243B1 (en) Laying out a paragraph by defining all the characters as a single text run by substituting, and then positioning the glyphs
US5625773A (en) Method of encoding and line breaking text
US6613098B1 (en) Storage of application specific data in HTML
US6288726B1 (en) Method for rendering glyphs using a layout services library
US5224038A (en) Token editor architecture
US6332148B1 (en) Appearance and positioning annotation text string and base text string specifying a rule that relates the formatting annotation, base text characters
US7111011B2 (en) Document processing apparatus, document processing method, document processing program and recording medium
US5737599A (en) Method and apparatus for downloading multi-page electronic documents with hint information
US6819336B1 (en) Tooltips on webpages
US5845075A (en) Method and apparatus for dynamically adding functionality to a set of instructions for processing a Web document based on information contained in the Web document
US7155672B1 (en) Method and system for dynamic font subsetting
US5819301A (en) Method and apparatus for reading multi-page electronic documents
US20030229857A1 (en) Apparatus, method, and computer program product for document manipulation which embeds information in document data
US20060195784A1 (en) Presentation of large objects on small displays
US7340673B2 (en) System and method for browser document editing
Bradley The XML companion
US5526477A (en) System and method for generating glyphs of unknown characters
US20040177327A1 (en) System and process for delivering and rendering scalable web pages
US7284199B2 (en) Process of localizing objects in markup language documents
US20020087702A1 (en) Remote contents displaying method with adaptive remote font

Legal Events

Date Code Title Description
AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
COP Corrected version of pamphlet

Free format text: PAGES 9-11 AND 14-16, SEQUENCE LISTING, REPLACED BY NEW PAGES 9-11 AND 14-16; PAGES 4/6-6/6, DRAWINGS, REPLACED BY NEW PAGES 4/6-6/6; DUE TO LATE TRANSMITTAL BY THE RECEIVING OFFICE

AK Designated states

Kind code of ref document: C2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: C2

Designated state(s): GH GM KE LS MW SD SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase in:

Ref country code: KR

122 Ep: pct application non-entry in european phase