WO2010094121A1 - Keyboard for languages based on the Arabic script - Google Patents

Keyboard for languages based on the Arabic script

Info

Publication number
WO2010094121A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyboard
characters
arabic
alphabet
language
Application number
PCT/CA2010/000219
Other languages
French (fr)
Inventor
Mohamed Madi Mohsen
Original Assignee
Mohamed Madi Mohsen
Application filed by Mohamed Madi Mohsen

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04886Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus
    • G06F3/018Input/output arrangements for oriental characters
    • G06F3/02Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/0202Constructional details or processes of manufacture of the input device
    • G06F3/0219Special purpose keyboards
    • G06F3/023Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233Character input methods

Definitions

  • the present invention relates to keyboards and methods of entering data into computerized systems for languages based on the Arabic script. More specifically, a keyboard layout design provides for each Latin alphabet key to be mapped onto one or more Arabic script-based alphabet and/or diacritical characters.
  • keyboarding is a tedious process and it may take years to become comfortable with inputting text.
  • users have to familiarize themselves with two particulars about the keyboard layout of interest: i) the location of every character-editing key on the keyboard, and ii) the proper hand and finger to use to signal or press a given key.
  • typists can comfortably type at rates of 50 words per minute or greater.
  • FIG. 1 shows the typical character-printing keys available on most standard keyboard layouts, together with the ideal mapping of such keys to the eight character-typing fingers of a trained computer typist for maximizing touch-typing efficiency and typing throughput, together with the number of rows.
  • a standard Latin alphabet keyboard layout most typists encounter when learning to type is the English keyboard layout known as QWERTY, shown in FIG. 2, which is named after the arrangement of the first six left letter keys of the top alphabet row on this keyboard.
  • the hunt-and-peck typing style, which is typically slower than touch-typing, is the style many typists adopt unless learning measures are taken at an early stage of using a keyboard.
  • typists typically become comfortable with whatever layout they use to input text as time progresses, and a sense of where a particular key is located on a keyboard layout tends to become readily available through what is known as neuromuscular facilitation or muscle memory.
  • QKL QWERTY keyboard layout
  • most multilingual computer typists who are trained on the QKL can capitalize on their knowledge when typing in another Latin alphabet language, given the similarity in keyboard layouts.
  • the learning curve remains relatively flat while the user reaches personal typing speeds on the new keyboard layouts.
  • a typical QKL typist can readily start typing in other languages such as Danish, Dutch, Finnish, Swedish, Icelandic, Italian, Norwegian, Portuguese, and Spanish to name a few.
  • the same typist can also type with little difficulty using a similar keyboard layout known as QWERTZ, where only the positions of the letter keys Y and Z are interchanged.
  • QWERTZ is widely used in Germany and much of Central Europe.
  • the same knowledge transfer is applicable to other close variants of QWERTY such as AZERTY, used in France and Belgium, and QZERTY, common in Italy.
  • Another common but less competitive keyboard layout is Dvorak, in which the keys are arranged to make faster typing easier to achieve.
  • AWKL Arabic Windows™ Keyboard Layout
  • Table 2 many Arabic letters have a phonetic counterpart in the Latin alphabet. Yet, there is little correlation between the location of Arabic letters on the AWKL and their phonetic counterparts on the QKL.
  • the letter T has the same sound as that of the letter ت in Arabic (Character 3 in Table 1); yet the letter T is located at key 18 of the QKL in FIG. 1, while the letter ت is found at key 32 of the AWKL as seen in FIG. 3.
  • س in the AWKL maps onto its English equivalent in QKL, namely the letter S at key 28.
  • the AWKL user must activate the fingers' muscle-memory sequence {(4, 3), (4, 4), (5, 3), (7, 4), (3, 3) and (4, 3)}, compared with the QKL sequence {(4, 4), (4, 2), (1, 3), (1, 4), (6, 2), and (7, 3)}.
  • the user is forced to relearn key locations that share the same vocal sound, and as a result, retrain the fingers' muscle-memory to access different keys on the keyboard to produce the same word.
  • This simple example illustrates problems encountered with prior art keyboards when typing text that contains terms or phrases to be transliterated or transcribed into Arabic. Further examples include country and city names, person and/or character names, industrial and commercial product or company names, amongst others.
  • the AWKL suffers another drawback in the location of frequently-used Arabic characters. For example, the letters د (pronounced like 'd' as in “dog”) and ذ (pronounced like 'th' as in “then”) are similar in both sound and shape. Yet, on the AWKL, they are assigned to keys that lie at opposite ends of the keyboard (keys 1 and 25), and are far from the home row. The same is true of the frequently-used diacritic symbol ّ, which is assigned to key 1. Once again, the user is required to leave the keyboard home row position in order to access frequently occurring letters or symbols.
  • a further drawback of the AWKL keyboard is the re-arrangement of non-alphanumeric symbols such as parentheses ( ), brackets [ ], braces { }, inequality symbols < > and the forward-slash key / .
  • These symbols are often used in software development and mathematics where users have become accustomed to the location of each symbol, along with the ordering of elements within pairs.
  • the aforementioned symbols are located at keys 10 and 11, 24 and 25, Shift 24 and Shift 25, 45 and 46, and 47, respectively.
  • these symbols are assigned to key locations 11 and 10, 30 and 29, 41 and 40, 24 and 25, and 35, respectively, requiring the QKL user to relearn the position of these keys on the AWKL.
  • U.S. Patent No. 4,670,842 to Metwaly discloses an Arabic keyboard, wherein the arrangement of Arabic letters is based on the lexicographical order of the Arabic alphabet, rather than any phonetic connection between letters of the Latin alphabet and those of the Arabic alphabet.
  • the layout disclosed in Metwaly does not correspond to the QKL. As such, a typist on the Metwaly keyboard who is familiar with the QKL must learn an entirely unrelated Arabic script layout.
  • Another drawback of Opstad's keyboard is that a user must memorize the location of 48 Arabic characters over three modes (Normal, Shift and Option). Similar to the AWKL layout, Opstad's keyboard also suffers from the difficulty of access to modified forms and common ligatures.
  • ISLAM-91 in Java Script, a free online Arabic keyboard accessible at http://wwwl. tour.tu- darmstadt.de/islam/ara/. While allowing for some phonetic mapping of Latin characters onto the Arabic alphabet, this layout includes additional non-intuitive mapping of Arabic characters onto Latin keyboard characters. The user must spend additional time to learn and master the new keyboard. For example, the top row of the ISLAM-91 keyboard includes six Arabic letters, which requires the user to leave the home row in order to type such characters. This mapping is counterintuitive in that Arabic characters are mapped onto numbers.
  • U.S. Patent No. 6,874,960 to Daoud discloses a keyboard that comprises a plurality of touch areas that represent symbols such as letters of the Roman script, Greek script, Hebrew script, Arabic script, or Cyrillic script.
  • the touch areas are arranged in groups that include at least two of the touch areas.
  • Each of the groups includes touch areas that are arranged in a distinctive shape that incorporate one or more of the symbols or parts of the symbols. The user enters a symbol by touching a part of the distinctive shape that is recognized with the symbol.
  • U.S. Patent No. 6,799,914 to Eo discloses a layout of the Arabic alphabet for a 12-key keyboard found on such devices as mobile telephones and PDAs.
  • the keyboard described by Eo is not suited to standard keyboard typing, and is not comparable to the QWERTY or similar keyboards.
  • the present invention addresses the drawbacks of the prior art by providing an intuitive keyboard layout for a language based on the Arabic script, based on a mapping of Latin alphabet characters to characters of said language.
  • the Latin alphabet characters form part of a Latin alphabet keyboard layout, including, but not limited to, QWERTY, QWERTZ, AZERTY and Dvorak.
  • the keyboard layout of the present invention can also be used by those learning to type in an Arabic-based language for the first time.
  • users of the present invention who, at first, are not familiar with Latin-based keyboards, can quickly learn one or more prior art Latin alphabet keyboards, based on knowledge and experience with the present invention.
  • the invention reduces the user's learning curve by leveraging (i) the user's previous knowledge of Latin alphabet keyboard layouts, (ii) the lexicographic ordering of letters of alphabets based on the Arabic script; and (iii) the manner in which the letters and words are hand-stroked in languages based on the Arabic script.
  • the keyboard layout of the present invention further remedies the problems present in prior art layout designs as it exploits shape and sound similarities that are inherent in Arabic-based scripts when hand-stroking in the Arabic-based language.
  • a typical user of the invention who is familiar with one or more prior art Latin keyboard layouts may be able to (i) type at a rate similar to that achieved on a Latin alphabet layout, (ii) intuitively remember the location of all letters, and (iii) extend his or her typing abilities to intuitively type any of the diacritics commonly used in languages based on the Arabic script.
  • keyboard refers to both physical and virtual keyboards.
  • the keyboard layout of the present invention assigns characters to keys based on shape, sound, or frequency analysis.
  • the keyboard layout of the present invention uses a multicharacter key, and assigns multiple Arabic-based script letters and diacritics to keys based on shape, sound, and letter frequency distribution analysis.
  • a keyboard for inputting characters of an Arabic script-based language.
  • the keyboard is based on a mapping of a Latin alphabet keyboard onto alphabet characters of that language, wherein the mapping comprises phonetic similarity between Latin alphabet members and alphabet members of the language, shape similarity within alphabet members of the language, and lexicographic ordering of alphabet members within the language.
  • the mapping further comprises frequency analysis of alphabet members of the language.
  • the language may be chosen from Arabic,
  • the language is preferably a widely-spoken language such as Arabic, Farsi (Persian) or Urdu.
  • Latin alphabet keyboard from which the invention is mapped is selected from the group consisting of QWERTY, AZERTY, QZERTY and Dvorak.
  • the keyboard of the present invention can be used as part of or in association with devices such as, but not limited to, laptop computers, desktop computers, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
  • a keyboard for inputting characters of Arabic, Farsi, and/or Urdu.
  • the keyboard is based on a mapping of a Latin alphabet keyboard onto one or more alphabet characters of each language, wherein the mapping comprises phonetic similarity between Latin alphabet members and alphabet members of each language, shape similarity within alphabet members of a given language, and lexicographic ordering of alphabet members within a given language.
  • the keyboard of claim 12 wherein for a Latin alphabet key associated with two or more Farsi characters, said Farsi characters are accessed sequentially by multiple presses of said Latin alphabet key, or by use of Function keys in association with said Latin alphabet key.
  • the mapping further comprises frequency analysis of alphabet members of the language.
  • where a Latin alphabet key is associated with two or more Arabic, Farsi or Urdu characters, the characters may be accessed sequentially by multiple presses of the Latin alphabet key, or by use of Function keys in association with the Latin alphabet key.
  • the Latin alphabet keyboard from which the invention is mapped is selected from the group consisting of QWERTY, AZERTY, QZERTY and Dvorak.
  • the keyboard of the present invention can be used as part of or in association with electronic devices such as, but not limited to, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
  • FIG. 1 shows the 47 character-producing keys found on prior art Latin keyboard layouts, with labelled row numbers and various shading and texture to illustrate how such keys are mapped to the fingers of a touch-typist.
  • FIG. 2 shows a typical arrangement of character keys in a prior art QWERTY keyboard layout.
  • FIG. 3 shows a known keyboard layout for typing in Arabic, called the Arabic Windows™ Keyboard Layout (AWKL).
  • the illustrated keys show characters in both modes: the normal mode showing the bottom characters, and the shift mode showing the upper characters.
  • FIGS. 4a - 4e illustrate an Arabic keyboard layout of the present invention correlated to a QWERTY keyboard.
  • FIGS. 5a - 5e illustrate an Arabic keyboard layout of the present invention correlated to an AZERTY keyboard.
  • FIG. 6 shows a flowchart for text input using a keyboard of the present invention.
  • Arabic script languages include, but are not limited to: Arabic, Farsi (also known as Persian), Urdu, Pashto, Baloch, Malay, Balti, Brahui, Panjabi (in Pakistan), Kashmiri, Sindhi (in India and Pakistan), Uyghur (in China), Kazakh (in China), Kyrgyz (in China), Azerbaijani (in Iran) and Kurdish (in Iraq and Iran). While the detailed description focuses on a keyboard for Arabic, Urdu and Farsi, the present invention is applicable to the Arabic script languages listed above.
  • the keyboard layout of the present invention leverages this muscle memory of key locations of the QKL user by establishing a correlation between each letter in the Latin alphabet and one or more intuitively suitable counterparts from the Arabic letters or diacritic symbols. Such mapping reduces the learning curve of a QKL-familiar user.
  • the invention relies on factors such as: (i) direct sound pronunciation similarity, (ii) approximate sound pronunciation similarity, (iii) direct shape similarity, (iv) approximate shape similarity; (v) key location proximity, and (vi) frequency distribution analysis of Arabic letters.
  • the invention relies on a letter frequency distribution analysis of Arabic letters performed on more than 1,375,058 words (giving a total of 5,452,865 letters). This analysis is documented at http://www.intellaren.com/web/software/alfa/alfa.html, the results of which are shown in Table 3.
  • the number column in Table 3 corresponds to the number column of Table 1.
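A letter frequency analysis of this kind can be sketched in a few lines of Python. This is only an illustration and not the procedure behind Table 3: the function name `arabic_letter_frequencies` is hypothetical, and the choice to count only alphabetic characters from the basic Arabic Unicode block (excluding diacritics and the stretching character) is an assumption.

```python
from collections import Counter

TATWEEL = "\u0640"  # letter-stretching character; excluded from the letter count


def arabic_letter_frequencies(text: str) -> dict[str, float]:
    """Return each Arabic letter's share (in percent) of all Arabic letters in text."""
    # Keep alphabetic characters in the basic Arabic block (U+0600 to U+06FF).
    # Diacritics are combining marks, so str.isalpha() is False for them.
    letters = [
        ch for ch in text
        if "\u0600" <= ch <= "\u06ff" and ch.isalpha() and ch != TATWEEL
    ]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    # most_common() yields letters from most to least frequent.
    return {ch: 100.0 * n / total for ch, n in counts.most_common()}
```

Run over a large corpus, the resulting percentages could then be used to rank the letters that share a single key.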
  • the present invention also introduces two letters imported from the English alphabet: P, transcribed in Arabic as پ, and V, transcribed in Arabic as ڤ. These two letters are imported for purposes of accurate transcription and transliteration from Latin to Arabic.
  • the Arabic alphabet available on any keyboard layout designed to produce Arabic script is typically composed of 48 characters (38 letters and 10 non-letter symbols). These are grouped as follows: (i) characters 1 to 28 are the 28 primary Arabic letters consisting of 25 consonants and three vowels; (ii) characters 29 to 36 are eight modified letters; (iii) characters 37 and 38 are two imported English letters for transcription purposes; (iv) characters 39 to 47 are the nine diacritic symbols used as an aid for correct pronunciation; and (v) character 48 is a letter-stretching symbol that enhances readability of the script. Similarities in the shape and sound of Arabic letters form a significant aspect of the lexicographical ordering within the Arabic alphabet. Students of the Arabic language often exploit these similarities in order to learn and memorize the Arabic alphabet.
  • successive letters resemble each other in shape and/or sound, and vary only in the number of dots.
  • successive letters that have shape similarity include the pair ب and ت (B and T) and the pair ر and ز (R and Z).
  • successive letters that share both sound and shape similarity include the pair ت and ث (T and TH as in "thank"), and the pair د and ذ (D and TH as in "then").
  • the succeeding letter has one more dot than its predecessor.
  • the succeeding letter is usually written with one or more additional strokes compared with its predecessor, where a stroke is a single but continuous hand sketch or a mouse drag without any spatial discontinuity in the form of lifting the hand or releasing the mouse, respectively.
  • the additional strokes are usually in the form of one or more dot(s) stroked above or below the letter. Therefore, the user who writes in Arabic knows that whereas a letter takes one stroke to draw, a similar succeeding letter would take one or more additional strokes to draw.
  • Examples from Table 1 include: (i) letter 14, ص, is written in one stroke, whereas the succeeding letter, ض, requires an additional stroke in the form of a dot; and (ii) letter 16, ط, can be written in one or two strokes, whereas the succeeding letter, ظ, requires an additional stroke in the form of a dot.
  • each Latin key is associated with one or more alphabetic or diacritic character from the Arabic set of letters and diacritics.
  • the Latin key is associated with two or more Arabic characters
  • the second Arabic character is produced by pressing the corresponding Latin key twice within a given time interval (for example, between 150 - 750 milliseconds).
  • the third Arabic character is produced by pressing the corresponding Latin key a third time within the same or a similar allowed tolerance after the second press, and so on.
  • the function keys e.g. F2, F3, F4 and F5 can be used to simulate the number of times a Latin key should be pressed in order to produce the required Arabic character.
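The timing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `group_presses`, the event format (key, timestamp in milliseconds) and the default tolerance of 250 ms are all choices made for the example.

```python
def group_presses(events, tolerance_ms=250):
    """Collapse (key, time_ms) events into (key, press_count) pairs.

    Consecutive presses of the same key are counted as one multi-press
    when each press follows the previous one within tolerance_ms.
    """
    groups = []
    last_key, last_time = None, None
    for key, t in events:
        if key == last_key and t - last_time <= tolerance_ms:
            # Same key, pressed again in time: extend the current group.
            k, n = groups[-1]
            groups[-1] = (k, n + 1)
        else:
            # Different key, or the tolerance has elapsed: start a new group.
            groups.append((key, 1))
        last_key, last_time = key, t
    return groups
```

For instance, three quick presses of 't' collapse into a single group with a count of three, whereas two presses of 's' that are 500 ms apart stay as two separate single presses.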
  • 73% of the Latin alphabet corresponds to 87% (33 of 38) of the Arabic letters.
  • the three primary Arabic vowels (ا, و and ي) are grouped with their respective several modified forms, based on similarity in both sound and shape.
  • the letter پ (pronounced as 'paa') is phonetically identical to the letter P.
  • the letter پ is used in Arabic scripts today for the purposes of accurately transcribing Latin nouns containing the letter P.
  • ب and پ are both shape- and sound-similar; shape-wise, the latter is rendered with two extra dots below. This justifies mapping پ to the second press of the letter key B.
  • the letter ض succeeds the letter ص in the Arabic alphabet. Since it requires only one more stroke than ص, in the form of a dot, and is less frequent than the letter ص, it is assigned to the second press of letter key C.
  • ذ is written with one more stroke than د, in the form of a dot, and it occurs 2.2 times less frequently than د. It is therefore intuitive to assign it to the second press of letter key D.
  • the letter خ (pronounced 'khaa') has no equivalent letter sound in the English language. However, it is most similar in sound to the letter ك. Since the letter خ is written with two strokes, and is less frequent than ك, it is intuitive to assign it to the second press of letter key K.
  • the letter ش is phonetically identical to the sound produced by 'SH'. It also succeeds the letter س in the Arabic alphabet. The shape- and sound-similarity between س and ش is apparent. Furthermore, ش has three dots on top whereas the letter س is dot-less, and ش occurs 2.7 times less frequently than س. It is therefore intuitive to assign ش to the second press of letter key S.
  • the character ة has the same vocal sound as that of the Latin letter T, but its use is restricted to finishing off feminine nouns only.
  • the direct sound pronunciation similarity warrants rendering ة accessible through the second press of letter key T.
  • the letter ث is pronounced like 'th' (as in 'three'), and is similar in shape to the letter ت. Since the letter ث occurs less frequently than the letter ة (according to Table 3), it is made accessible through three presses of the letter key T.
  • the letter key 'A' is assigned five Arabic characters ( ⁇ , ' , ! , ' and ⁇ ); the letter key 'B' is assigned two Arabic characters (ب and پ), and so on.
  • a typist either i) presses S twice within a specified time tolerance, ii) presses SHIFT S, where the SHIFT key causes reverse accessibility, or simply iii) presses F2 and then the letter S to obtain the character ش.
  • the typist either i) presses A four times while respecting the time frame, ii) presses SHIFT 'A' twice while respecting the allowed time frame, or iii) presses F4 and then presses the letter key 'A'.
  • the letter frequency analysis described above provides a basis for the order of assignment of Arabic letters to a given Latin letter key.
  • the letters ت, ة and ث occur at respective frequencies of 2.64%, 1.38% and 0.85%. All three letters are accessible through the key 'T'. Since ة occurs more frequently than ث, it can be obtained by pressing the key 'T' twice within a certain time interval, while ث can be obtained by pressing 'T' three times.
  • the SHIFT key or the function keys F2 and F3 can be used in conjunction with the key 'T' to produce ة or ث.
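The frequency-based ordering for key 'T' can be reproduced with a short sort. The sketch below assumes that the three letters reached through 'T' are taa (ت), taa marbuta (ة) and thaa (ث), paired with the frequencies 2.64%, 1.38% and 0.85% quoted in the text; the names `T_KEY_FREQUENCIES` and `press_order` are hypothetical.

```python
# Frequencies (percent) quoted in the text for the letters on key 'T'.
# The pairing of glyphs to values is an assumption made for this example.
T_KEY_FREQUENCIES = {
    "\u062a": 2.64,  # taa
    "\u0629": 1.38,  # taa marbuta
    "\u062b": 0.85,  # thaa
}


def press_order(freqs):
    """Order characters so that the most frequent one needs the fewest presses."""
    return sorted(freqs, key=freqs.get, reverse=True)


# Index 0 is produced by one press, index 1 by two presses, and so on.
order = press_order(T_KEY_FREQUENCIES)
```

Under these assumptions the most frequent letter lands on the first press and the least frequent on the third, which matches the assignment described for key 'T'.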
  • the letter ⁇ is very close in vocal sound to that generated by -*, which corresponds to the letter H in English.
  • it is typically transcribed and transliterated as "H".
  • the letter G has yet to be assigned and is to the immediate left of H on the QWERTY keyboard layout, it is assigned to the letter £.
  • the letter key 'G' may not be to the immediate left of the letter key 'H'. Nonetheless, due to the shape similarity between 'G' and ⁇ , such keyboards can also maintain the same assignment.
  • can be accessed by pressing the letter 'H' key twice.
  • the letter £ is accessible through the letter ⁇ (key 29 above) as well as through the letter t-1 .
  • Arabic diacritics are best approximated on the QWERTY keyboard by corresponding English vowels.
  • the letter O is assigned the dammah and diacritics similar to the vowel 'O'; namely, the dammah tanween ٌ and the sukoon ْ.
  • the diacritic dammah tanween is basically two dammahs and can therefore be written as " instead of * , but the latter is preferred because it is quicker when written by hand.
  • the diacritic sukoon is seated on top of any alphabet letter to indicate that there is no vowel sound associated with that letter.
  • the shape-similarity between the letter O and the sukoon ْ is obvious; it is therefore intuitive to assign the sukoon to the next available press of O.
  • the letter I is assigned the kasrah and the related diacritic kasrah tanween, which is basically two kasrahs.
  • the letter key 'U' is designated to render the fat-ha and the related diacritics fat-ha tanween and the superscripted alif.
  • the fat-ha tanween is basically two fat-has, while the superscripted alif, also known as alif mamdoodah, is placed above letters and plays the role of a long alif vowel; it is almost exclusively used in sacred Koranic script.
  • the three diacritics can in turn be used as pointers to several other diacritics that are shape- and/or sound-dependent, namely:
  • the remaining characters to be assigned are the shaddah, which is usually used in combination with most of the above diacritics, and the extending character which is added to horizontally stretch the shape of certain letters for better readability. These two characters are assigned to the one remaining and unassigned vowel: E.
  • This assignment is suitable for rendering the shaddah since the shape of the shaddah resembles the shape of an E that is rotated by 90 degrees; and it is suitable for the shape-extending character since E can be regarded as an acronym for the English word extension, which is suggestive of the character. Therefore, the assignment is as follows:
  • In Figs. 4a - 4e, each plane is a function of the number of key presses, provided they are performed within a certain time tolerance.
  • Fig. 4a illustrates the Arabic characters obtained after pressing a given key once
  • Fig. 4b refers to Arabic characters obtained after pressing a given key twice
  • Fig. 4c refers to Arabic characters obtained after pressing a given key three times
  • Fig. 4d refers to Arabic characters obtained after pressing a given key four times
  • Fig. 4e refers to Arabic characters obtained after pressing a given key five times.
  • Blank keys in subsequent planes indicate that no further characters are mapped to those keys. Therefore, pressing a key more times than the number of characters assigned to it simply causes it to start from the beginning in a rotary fashion. Note that when the number of presses is more than one, the presses may also be simulated by pressing one of the function keys, such as F2 to F5, prior to the character key. Lastly, if the modifier keys SHIFT or CAPS-LOCK are active, the order of access of Arabic characters is reversed. For example, if successive presses within tolerance of Key "a” produce I i ) U , then Key "A" produces * 1 1 i I .
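The rotary wrap-around, the SHIFT reversal and the function-key shortcut can be sketched together. This is an illustrative reading of the description rather than the patented implementation; `select_character` and `function_key_presses` are hypothetical names, and the placeholder strings stand in for the Arabic characters assigned to a key.

```python
def select_character(chars, presses, shift=False):
    """Pick the character produced by `presses` presses of a multi-character key."""
    if presses < 1:
        raise ValueError("presses must be at least 1")
    # SHIFT (or CAPS-LOCK) reverses the order in which characters are reached.
    ordered = chars[::-1] if shift else chars
    # Pressing more times than there are characters wraps around to the start.
    return ordered[(presses - 1) % len(ordered)]


def function_key_presses(name):
    """Translate 'F2'..'F5' into the number of presses it stands in for."""
    count = int(name[1:])
    if not 2 <= count <= 5:
        raise ValueError("only F2 to F5 are handled in this sketch")
    return count


# Placeholder characters for a key that carries five characters.
five = ["c1", "c2", "c3", "c4", "c5"]
fourth = select_character(five, 4)
```

With five characters on a key, four plain presses, two presses with SHIFT held, and F4 followed by a single press all select the same character, which is consistent with the three alternatives described for key 'A'.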
  • the two punctuation characters "?” and ".” correspond to their Arabic equivalents ؟ and ⁇ respectively. All other keys on the keyboard not shown in Table 5 produce the same characters shown on the typical QWERTY keyboard.
  • the numbers 0, 1, 2, ... are sometimes used in Arabic as well, although these can be modified to generate the Arabic numerals ٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, ٩ (which are respectively 0, 1, 2, 3, 4, 5, 6, 7, 8, 9). The modification can occur through the use of the SHIFT or ALT modifier keys.
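The digit substitution can be illustrated with a translation table. The sketch assumes the target digits are the Unicode Arabic-Indic digits U+0660 to U+0669; the names `WESTERN_TO_ARABIC_INDIC` and `to_arabic_indic` are hypothetical, and how the modifier key triggers the conversion is left out.

```python
# Build the ten Arabic-Indic digits from their code points (U+0660..U+0669).
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + d) for d in range(10))
WESTERN_TO_ARABIC_INDIC = str.maketrans("0123456789", ARABIC_INDIC_DIGITS)


def to_arabic_indic(text):
    """Replace Western digits with Arabic-Indic digits; other characters pass through."""
    return text.translate(WESTERN_TO_ARABIC_INDIC)
```

A keyboard driver could apply such a table only while the chosen modifier key is held, leaving the Western digits as the default.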
  • the following example illustrates how to type an Arabic sentence using the keyboard layout of the present invention. Repeated characters shown in brackets are accessed by pressing the associated key within a certain time interval. For example, '(dd)' means that the letter key 'd' is pressed twice, within a maximum time interval, to produce ذ; '(aaa)' means the key 'a' is pressed three times within a maximum time interval to produce ! , and so on.
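The bracket notation can be turned into (key, press count) pairs with a small parser. This is a sketch only: the function name `parse_keystrokes` is hypothetical, and treating every unbracketed character (including a space) as a single press is an assumption.

```python
def parse_keystrokes(notation):
    """Convert notation such as 'b(dd)(aaa)' into (key, press_count) pairs."""
    presses = []
    i = 0
    while i < len(notation):
        ch = notation[i]
        if ch == "(":
            end = notation.index(")", i)  # raises ValueError if the bracket is unclosed
            run = notation[i + 1:end]
            # A bracketed run must repeat exactly one key, e.g. 'dd' or 'aaa'.
            if not run or len(set(run)) != 1:
                raise ValueError(f"bracketed run must repeat one key: {run!r}")
            presses.append((run[0], len(run)))
            i = end + 1
        else:
            # Any character outside brackets is a single press of that key.
            presses.append((ch, 1))
            i += 1
    return presses
```

The resulting pairs can be fed to whatever routine selects the character for a given key and press count.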
  • FIG. 5a illustrates the Arabic characters obtained after pressing a given key once
  • Fig. 5b refers to Arabic characters obtained after pressing a given key twice
  • Fig. 5c refers to Arabic characters obtained after pressing a given key three times
  • Fig. 5d refers to Arabic characters obtained after pressing a given key four times
  • Fig. 5e refers to Arabic characters obtained after pressing a given key five times.
  • Blank keys in subsequent planes indicate that no further characters are mapped to those keys. Therefore, pressing a key more times than the number of characters assigned to it simply causes it to start from the beginning in a rotary fashion.
  • An AZERTY layout has the following differences from a QWERTY layout: i) the positions of keys A and Q are swapped, ii) the positions of Z and W are swapped, and iii) Key M is located at the right end of the home-row (or, Row 3).
  • the methodology used to map Arabic characters onto a Latin alphabet keyboard may also be used to separately map Farsi (or Persian) and Urdu characters onto the keyboard, as illustrated respectively in Tables 6 and 7.
  • the Farsi alphabet has four additional characters: b, produced by two presses of the Latin character 'h'; g ,produced by two presses of 'j'; ⁇ , produced by three presses of 'k'; and j , produced by two presses of 'z'.
  • the two punctuation characters "?" and "." correspond to their Farsi equivalent. All other keys on the keyboard not shown in Table 6 produce the same characters shown on the typical QWERTY keyboard.
  • the Urdu script has eight additional characters (compared to the Arabic script), four of which are identical to the aforementioned Farsi characters (i.e., b, ⁇ , ⁇ - ⁇ , and j) and can thus be mapped like the four Farsi characters.
  • the Urdu script has four additional characters: -> , produced from two presses of 'd'; j , produced from two presses of 'r'; ⁇ , produced from three presses of 't'; and ⁇ , produced from four presses of 'y'.
  • the two punctuation characters "?" and "." correspond to their Urdu equivalent. All other keys on the keyboard not shown in Table 7 produce the same characters shown on the typical QWERTY keyboard.
  • A flowchart for text input of the present invention is shown in FIG. 6. Conventions within the flowchart are as follows: italic words indicate program variables and constants; words suffixed with () denote program functions; and underlined words indicate programming keywords.
  • TOL -> time tolerance; the program starts with a default of 250 milliseconds, but this can be reset by the typist; dispPosition -> the position at which the next character is displayed. If it refers to the position of a displayed character, that character is over-written by the new character; currTime -> current system time; ale -> Arabic letter counter, used to determine which column to access the
  • [1] Perform one-time preliminary initializations when the editor application loads up as follows:
      o Set previous key (prevKey) to null, meaning no keys have so far been processed.
      o Set previous time (prevTime) to be the current time of the system. By the time the user actually enters a key in the future, the time of such a press is recorded; as a result, the time obtained at this initialization stage is set to become "previous" time.
      o Set tolerance (TOL) to a default time frame (e.g., 300 milliseconds). The user is allowed to modify such a value. Consecutive key presses within this TOL carry the logic that the user indeed wants to access the next character within the adjacency list (or row) shown in Table 5.
      o Set the display position (dispPosition) to 0 at this time, which is the location where the next letter to be rendered is displayed. In most programming languages (Java, C++, etc.), counting of memory addresses and indexed elements starts from 0, rather than 1.
  • currKey is examined: If it is a function key between F2 and F5, proceed to step 9; otherwise, proceed to step 4.
  • [6] Set prevTime to be currTime; this way, the next time a key is pressed, that time overwrites the current value stored in currTime, while the old time is stored in prevTime.
  • This step runs two tests: i) checks whether the currKey and the previous key are the same, and ii) checks whether the time difference is less than or equal to TOL of step 1. If either test fails, proceed to step 8. It should be noted that for the first pass through the flowchart, the test must fail since prevKey is null and no key on the keyboard can generate it. This means that the Arabic letter column (ale) to choose a letter from will be reset to be the first column as shown in step 8. This column is in the EACT shown in Table 5; its column heading is 0. In total, there are five columns to choose from, labeled 0 to 4 under the ale, wherein "0" means first. If both tests pass, proceed to step 11.
  • [8] Reset the Arabic Letter Column to the value 0. This means that the character will be chosen from the column with header value 0 in Table 5 (column headers run from 0 to 4 and are highlighted in light gray).
  • This step is reached only if both tests in step 7 succeed. Reaching this step means a different character in the adjacency list is about to be retrieved instead of the one currently picked and displayed. Therefore, ale is incremented by one.
  • step 12 is simplified as follows (because counting starts from 0, 1 means two presses):
  • the display position (dispPosition) is updated. Basically, the last printed character is deleted in preparation for the new character to be printed in its place. This happens typically within a fraction of a second.
  • prevKey is updated to hold this current key, so that when a new key is pressed in the next iteration, both old and new keys are correctly compared in step 7.
  • This step fetches the next character to be rendered from the adjacency table of Table 5. For example, EACT[t][l] -> » . Since the value of ale is always correct and within range, the correct corresponding character is retrieved from EACT.
  • [17] display position (dispPosition) is updated in preparation for the next user action of inserting more letters from EACT.
  • the present invention provides a keyboard for typing alphabets that are based on the Arabic script.
  • the invention fully exploits the following intuitive relationships between Latin characters and characters of the Arabic-based script: (i) direct phonetic similarity; (ii) approximate phonetic similarity; (iii) direct shape similarity; (iv) approximate shape similarity; and (v) key location proximity.
  • the present invention makes use of only 26 keys of the Latin alphabet for all of the characters of the Arabic-based script, thereby facilitating the user's task.
  • the minimal use of Latin keys is based on assigning a subset of Arabic-based script characters to a given Latin key. Members of a given subset are accessed in an order, preferably determined by a frequency analysis of the individual members of the Arabic-based alphabet.
  • the keyboard of the present invention can be used as part of or in association with electronic devices such as, but not limited to, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
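The flowchart steps described above (same-key test, time tolerance, rotary column counter, overwrite of the last displayed character, and SHIFT reversal) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the class name `MultiPressEditor`, the method `press`, and the placeholder table contents are hypothetical, timestamps are passed in explicitly in seconds, and the function-key shortcut (F2 to F5) is omitted. The counter keeps the name `ale` used in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the adjacency table (EACT). Placeholder tokens are
# used because Table 5 is not reproduced here. Row lengths follow the
# description: 'a' has five characters, 's' has two, 'q' has one.
EACT: Dict[str, List[str]] = {
    "a": ["a0", "a1", "a2", "a3", "a4"],
    "s": ["s0", "s1"],
    "q": ["q0"],
}


@dataclass
class MultiPressEditor:
    tol: float = 0.25  # time tolerance in seconds (250 ms default, user-adjustable)
    prev_key: Optional[Tuple[str, bool]] = None  # null until a key is processed
    prev_time: float = 0.0
    ale: int = 0  # column counter into the current key's row
    text: List[str] = field(default_factory=list)  # rendered characters

    def press(self, key: str, curr_time: float, shift: bool = False) -> str:
        row = EACT[key]
        curr_key = (key, shift)
        # Both tests must pass: same key as before, and pressed within tolerance.
        repeat = (curr_key == self.prev_key
                  and (curr_time - self.prev_time) <= self.tol)
        if repeat:
            # Advance to the next column, wrapping around in a rotary fashion,
            # and delete the last printed character so it can be replaced.
            self.ale = (self.ale + 1) % len(row)
            self.text.pop()
        else:
            self.ale = 0  # start again from the first column
        # SHIFT reverses the order in which the row is traversed.
        index = len(row) - 1 - self.ale if shift else self.ale
        char = row[index]
        self.text.append(char)
        self.prev_key = curr_key
        self.prev_time = curr_time
        return char


editor = MultiPressEditor()
for t in (0.0, 0.1, 0.2):   # three quick presses of 'a'
    editor.press("a", t)
editor.press("s", 0.3)      # a different key starts a new character
editor.press("s", 1.0)      # same key, but outside the tolerance
print(editor.text)          # ['a2', 's0', 's0']
```

Passing the timestamp as an argument, rather than reading the system clock inside `press`, is a design choice made here so that the behaviour can be checked deterministically.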

Abstract

A keyboard layout for typing in any of the languages whose alphabet is based on Arabic script is designed for typists with any level of familiarity with a Latin alphabet keyboard layout. The keyboard layout reduces the user's learning curve by leveraging the user's familiarity with the Latin alphabet keyboard layout, and the user's knowledge of how alphabet characters of the language are hand-stroked and lexicographically ordered in the Arabic script-based language. For each Latin alphabet letter key, the layout assigns one or more script letters or diacritics based on the phonetic similarity between Latin alphabet characters and alphabet characters of the language, sound and shape relationships among the characters, and the frequency distribution analysis of the Arabic script-based alphabet. The language is preferably Arabic, Farsi or Urdu.

Description

KEYBOARD FOR LANGUAGES BASED ON THE ARABIC SCRIPT
FIELD OF THE INVENTION
The present invention relates to keyboards and methods of entering data into computerized systems for languages based on the Arabic script. More specifically, a keyboard layout design provides for each Latin alphabet key to be mapped onto one or more Arabic script-based alphabet and/or diacritical characters.
BACKGROUND OF THE INVENTION
For many computer users, keyboarding is a tedious process and it may take years to become comfortable with inputting text. When learning to type efficiently, users have to familiarize themselves with two particulars about the keyboard layout of interest: i) the location of every character-editing key on the keyboard, and ii) the proper hand and finger to use to signal or press a given key. With adequate keyboard typing tutoring, typists can comfortably type at rates of 50 words per minute or greater.
FIG. 1 shows the typical character printing keys available on most standard keyboard layouts, together with the row numbers and the ideal mapping of such keys to the eight character-typing fingers of a trained computer typist for maximizing touch-typing efficiency and typing throughput.
A standard Latin alphabet keyboard layout most typists encounter when learning to type is the English keyboard layout known as QWERTY, shown in FIG. 2, which is named after the arrangement of the first six left letter keys of the top alphabet row on this keyboard. Although many users strive to learn touch-typing, the hunt-and-peck typing style, which typically is slower than touch-typing, is the style many typists adopt unless learning measures are taken at an early stage of using a keyboard. Regardless of the style, typists typically become comfortable with whatever layout they use to input text as time progresses, and a sense of where a particular key is located on a keyboard layout tends to become readily available through what is known as neuromuscular facilitation or muscle memory. Due to the wide availability of the QWERTY keyboard layout (QKL), most multilingual computer typists who are trained on the QKL can capitalize on their knowledge when typing in another Latin alphabet language, given the similarity in keyboard layouts. As such, the learning curve remains relatively flat while the user reaches personal typing speeds on the new keyboard layouts. For example, a typical QKL typist can readily start typing in other languages such as Danish, Dutch, Finnish, Swedish, Icelandic, Italian, Norwegian, Portuguese, and Spanish, to name a few.
Furthermore, with minimal effort, the same typist can also type with little difficulty using a similar keyboard layout known as QWERTZ, where only the positions of the letter keys Y and Z are interchanged. QWERTZ is widely used in Germany and much of Central Europe. The same knowledge transfer is applicable to other close variants of QWERTY such as AZERTY, used in France and Belgium, and QZERTY, common in Italy. Another common but less competitive keyboard layout is Dvorak, whose keyboard layout is arranged such that faster typing is made easier to achieve.
For QWERTY users who need to type in non-Roman scripted alphabet languages such as Arabic, Armenian, Greek, Hebrew, Farsi (Persian), Urdu or Russian, for example, most keyboard layouts are quite different from what a QKL or a Dvorak keyboard layout user is accustomed to. Consequently, the user must expend a significant amount of time to learn and memorize a different keyboard layout. In addition, the learning process is fraught with typing mistakes and frustration, as the user strives to learn an unfamiliar layout.
For example, although a QKL user typically memorizes most, or all, of the character key locations, the user is faced with the challenge of having to learn a fresh set of new key locations to type in Arabic-based script languages, such as Arabic, Farsi (Persian), Urdu, Pashto, etc. This occurs even though the majority of the letters of the Arabic-based script are phonetically similar to a corresponding letter in the Latin alphabet.
For example, consider the case of Arabic where 48 characters, as shown in Table 1, are used to write Arabic script. Whereas a user of the QKL or a Dvorak keyboard layout needs to be familiar with only 26 Latin alphabet keys, a user of prior art Arabic keyboard layouts is required to memorize at least 48 different key locations over two or more modes (i.e. Normal, Shift, and Option).
TABLE 1 - Characters of the Arabic Script
Figure imgf000004_0001
One of the most widely-used Arabic keyboard layouts, shown in FIG. 3, is the Arabic Windows™ Keyboard Layout (AWKL). As shown in Table 2, many Arabic letters have a phonetic counterpart in the Latin alphabet. Yet, there is little correlation between the location of Arabic letters on the AWKL and their phonetic counterparts on the QKL. For example, the letter T has the same sound as that of the letter ت in Arabic (Character 3 in Table 1); yet the letter T is located at key 18 of the QKL in FIG. 1, while the letter ت is found at key 32 of the AWKL as seen in FIG. 3. In fact, only the letter س in the AWKL maps onto its English equivalent in QKL, namely the letter S at key 28. Given such low phonetic correlation between keys of the QKL and those of AWKL, there is a steep learning curve for those familiar with the QKL. This leads to continuous typing errors, which in turn lead to many BACK-SPACE key hits, confusion and frustration as the typist tries to locate and type character keys on a fresh layout. As such, a considerable amount of time is required for the QKL user to become familiar with the AWKL. This is further illustrated by the following example. To type the six-letter word BRAZIL using the QKL, the typist presses the keys 42, 17, 27, 38, 21, and 35 on the keyboard layout shown in Figures 1 and 2. This implies that a touch-typist would rely on their fingers' muscle-memory to have their fingers activated in this order: (4, 4), (4, 2), (1, 3), (1, 4), (6, 2), and (7, 3), where the tuple (f, r) indicates that finger f = 1, 2, ..., 8, presses a key at row r = 1, 2, 3, or 4. Now, to type the same word in Arabic based on the phonetic resemblance, the typist needs to type the following Unicode string: برازيل (Arabic scripting is strictly cursive, right to left, top to bottom). The distinct letters that make up this word are as follows: ب , ر , ا , ز , ي and ل.
According to Table 2, all of the letters in this example have a one-to-one mapping to six English letter counterparts. However, a typist using the AWKL must press keys 30, 41, 32, 46, 29, and 31, as opposed to the above-mentioned sequence of the QKL. There is no match between the keys pressed on AWKL and those of QKL, even though each Arabic character has a direct phonetic match with a Latin character.
In addition, the AWKL user must activate the fingers' muscle-memory sequence {(4, 3), (4, 4), (5, 3), (7, 4), (3, 3) and (4, 3)}, compared with the QKL sequence {(4, 4), (4, 2), (1, 3), (1, 4), (6, 2), and (7, 3)}. As shown in this example, there is no match between sequence elements of the AWKL and those of the QKL. The user is forced to relearn key locations that share the same vocal sound, and as a result, retrain the fingers' muscle-memory to access different keys on the keyboard to produce the same word. This simple example illustrates problems encountered with prior art keyboards when typing text that contains terms or phrases to be transliterated or transcribed into Arabic. Further examples include country and city names, person and/or character names, industrial and commercial product or company names, amongst others.
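The BRAZIL comparison above can be checked mechanically. The following Python sketch, offered only as an illustration, counts position-by-position matches between the QKL and AWKL sequences quoted in the text; the helper name `positional_matches` is hypothetical, and the key numbers and (finger, row) tuples are copied from the example rather than derived from any layout data.

```python
from typing import Sequence

# Key numbers (FIG. 1 numbering) for typing BRAZIL, as quoted in the text.
qkl_keys = [42, 17, 27, 38, 21, 35]
awkl_keys = [30, 41, 32, 46, 29, 31]

# (finger, row) tuples for the same six key presses, as quoted in the text.
qkl_fingers = [(4, 4), (4, 2), (1, 3), (1, 4), (6, 2), (7, 3)]
awkl_fingers = [(4, 3), (4, 4), (5, 3), (7, 4), (3, 3), (4, 3)]


def positional_matches(a: Sequence, b: Sequence) -> int:
    """Count positions at which the two sequences hold the same element."""
    return sum(1 for x, y in zip(a, b) if x == y)


key_matches = positional_matches(qkl_keys, awkl_keys)
finger_matches = positional_matches(qkl_fingers, awkl_fingers)
print(key_matches, finger_matches)  # 0 0
```

Both counts are zero, which is the sense in which no element of the AWKL sequence matches its QKL counterpart.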
TABLE 2 - Phonetic pronunciation of letters of the Arabic alphabet
Figure imgf000005_0001
Another drawback of prior art keyboards is the complexity of keys required to produce the modified forms of certain Arabic characters. For example, an AWKL user presses key 32 to type the Arabic letter ا . However, the letter ا has four modified forms: أ , إ , آ and ء , which are generated by AWKL keys SHIFT 32, SHIFT 19, SHIFT 43, and 39, respectively. While there is significant similarity between the letter ا and its four modified forms for the skilled worker, the AWKL requires the use of five dispersed key locations or access methods (i.e., through pressing the SHIFT modifier key first) for typing a similar Arabic letter. The same disadvantages of the AWKL are found for the letter j and its modified form j, the letter <_s and its modified forms ls and Lf, and each diacritic and its modified forms.
The most frequently occurring ligature in Arabic, لا , is a two-letter string made from the letters ل and ا . This ligature, located at key 42 on the AWKL, frequently occurs in the three other modified forms لأ , لإ and لآ . However, these modified ligatures are located at keys SHIFT 31, SHIFT 18, and SHIFT 42 on the AWKL, which requires the user to remember four different key locations or access methods to produce essentially the same ligature. These examples illustrate the inefficiency of the AWKL for typing modified forms. Such an approach not only over-clutters the keyboard layout, but also introduces a burden upon the typist who needs to understand and memorize key locations to some degree to achieve satisfactory typing performance.
The AWKL suffers another drawback in the location of frequently-used Arabic characters. For example, the letters د (pronounced like 'd' as in "dog") and ذ (pronounced like 'th' as in "then") have both similar sound and shape. Yet, on the AWKL, they are assigned to keys that lie at opposite ends of the keyboard (keys 1 and 25), and are far from the home row. The same is true of the frequently-used diacritic symbol ّ which is assigned to key 1. Once again, the user is required to leave the keyboard home row position in order to access frequently occurring letters or symbols.
A further drawback of the AWKL keyboard is the re-arrangement of non-alphanumeric symbols such as parentheses ( ), brackets [ ], braces { }, inequality symbols < > and the forward-slash key / . These symbols are often used in software development and mathematics where users have become accustomed to the location of each symbol, along with the ordering of elements within pairs. On the QKL, the aforementioned symbols are located at keys 10 and 11, 24 and 25, Shift 24 and Shift 25, 45 and 46, and 47, respectively. On the AWKL, these symbols are assigned to key locations 11 and 10, 30 and 29, 41 and 40, 24 and 25, and 35, respectively, requiring the QKL user to relearn the position of these keys on the AWKL. A similar problem exists for punctuation keys.
Many of the drawbacks of the AWKL are also found in the keyboard disclosed in U.S. Patent No. 4,298,773 to Diab wherein the keyboard map is not based on any similarity with QKL or any existing Latin keyboard layout.
Similarly, U.S. Patent No. 4,670,842 to Metwaly discloses an Arabic keyboard, wherein the arrangement of Arabic letters is based on the lexicographical order of the Arabic alphabet, rather than any phonetic connection between letters of the Latin alphabet and those of the Arabic alphabet. The layout disclosed in Metwaly does not correspond to the QKL. As such, a typist on the Metwaly keyboard who is familiar with the QKL must learn an entirely unrelated Arabic script layout.
There have been prior attempts made to adapt the Arabic keyboard to QKL typists. U.S. Patent No. 5,416,898 to Opstad correlates many keys when compared to the simple AWKL layout. For example, the following Arabic letters:
ύ , SJ , oβ , J , J , ^ , ζ , t-i , J , t>' , l , ώ , j , ιi and f
on the keyboard disclosed in Opstad are mapped onto the Latin keys N, B, C, Z, L, K, J, F, D, S, A, T, R, Q and M, respectively, of QKL. According to Table 2, this mapping maintains the phonetic correspondence between many Arabic characters and their Latin counterparts. The Arabic symbols !, *, :, ', ", s * and ? are also arranged on this layout to match the location of their Latin counterparts. The symbol keys are also left to match their Latin counterparts with one exception: elements of paired symbols (i.e., <>, [], () and {}) are position-swapped (i.e., ><, ][, )( and }{), leading to confusion and mistyping. Another drawback of Opstad's keyboard is that a user must memorize the location of 48 Arabic characters over three modes (Normal, Shift and Option). Similar to the AWKL layout, Opstad's keyboard also suffers from the difficulty of access to modified forms and common ligatures.
Another attempt to adapt the Arabic letters to keys on the QKL is ISLAM-91 in Java Script, a free online Arabic keyboard accessible at http://wwwl.architektur.tu-darmstadt.de/islam/ara/. While allowing for some phonetic mapping of Latin characters onto the Arabic alphabet, this layout includes additional non-intuitive mapping of Arabic characters onto Latin keyboard characters. The user must spend additional time to learn and master the new keyboard. For example, the top row of the ISLAM-91 keyboard includes six Arabic letters, which requires the user to leave the home row in order to type such characters. This mapping is counterintuitive in that Arabic characters are mapped onto numbers. Furthermore, the assignment of the letters ^ ^ ι_£ -^ to the bottom row is counter-intuitive, as is the assignment of bracket characters [ and ] to the keys 'Q' and 'W', respectively. In addition, Arabic letters with modified forms are somewhat randomly scattered across the keyboard. Finally, the ISLAM-91 keyboard does not make use of the Latin keys 'P' and 'V'.
Another attempt to fashion an Arabic script data input device is described in U.S. Patent No. 6,874,960 to Daoud which discloses a keyboard that comprises a plurality of touch areas that represent symbols such as letters of the Roman script, Greek script, Hebrew script, Arabic script, or Cyrillic script. The touch areas are arranged in groups that include at least two of the touch areas. Each of the groups includes touch areas that are arranged in a distinctive shape that incorporate one or more of the symbols or parts of the symbols. The user enters a symbol by touching a part of the distinctive shape that is recognized with the symbol. There is little or no correspondence between the touch areas and the QWERTY keyboard, nor does Daoud disclose a method of inputting diacritics necessary for the Arabic language.
U.S. Patent No. 6,799,914 to Eo discloses a layout of the Arabic alphabet for a 12-key keyboard found on such devices as mobile telephones and PDAs. The keyboard described by Eo is not suited to standard keyboard typing, and is not comparable to the QWERTY or similar keyboards.
There is a need for a standard keyboard layout for typing in a language whose alphabet is based on Arabic script, which is designed for typists who have little or some knowledge of, or experience with, a Latin alphabet keyboard layout. Such an Arabic alphabet-based keyboard layout should take advantage of the user's familiarity with the Latin alphabet keyboard layout, and the knowledge of how Arabic alphabet characters are hand-stroked and lexicographically ordered in order to reduce the time it takes the user to learn the Arabic alphabet-based layout. In addition, such a layout should be simple to use and intuitive, without requiring use of multiple modes.
SUMMARY OF THE INVENTION
The present invention addresses the drawbacks of the prior art by providing an intuitive keyboard layout for a language based on the Arabic script, based on a mapping of Latin alphabet characters to characters of said language. The Latin alphabet characters form part of a Latin alphabet keyboard layout, including, but not limited to, QWERTY, QWERTZ, AZERTY and Dvorak. The keyboard layout of the present invention can also be used by those learning to type in an Arabic-based language for the first time. In addition, users of the present invention who, at first, are not familiar with Latin-based keyboards, can quickly learn one or more prior art Latin alphabet keyboards, based on knowledge and experience with the present invention. The invention reduces the user's learning curve by leveraging (i) the user's previous knowledge of Latin alphabet keyboard layouts; (ii) the lexicographic ordering of letters of alphabets based on the Arabic script; and (iii) the manner in which the letters and words are hand-stroked in languages based on the Arabic script.
The keyboard layout of the present invention further remedies the problems present in prior art layout designs as it exploits shape and sound similarities that are inherent in Arabic-based scripts when hand-stroking in the Arabic-based language. In one single mode (the Normal mode), a typical user of the invention who has familiarity with one or more prior art Latin keyboard layouts may be able to (i) type at a rate similar to that achieved on a Latin alphabet layout, (ii) intuitively remember the location of all letters, and (iii) extend his or her typing abilities to intuitively type any of the diacritics commonly used in languages based on the Arabic script.
The term 'keyboard' used herein refers to both physical and virtual keyboards.
The keyboard layout of the present invention assigns characters to keys based on shape, sound, or frequency analysis. In contrast to prior art keyboard layouts that assign one letter per key in any one single mode, or, assign multi-characters uniformly to a single key with little or no regard to the shape-similarity, sound-similarity, or the letter frequency analysis, the keyboard layout of the present invention uses a multicharacter key, and assigns multiple Arabic-based script letters and diacritics to keys based on shape, sound, and letter frequency distribution analysis.
In one aspect of the present invention, there is provided a keyboard for inputting characters of an Arabic script-based language. The keyboard is based on a mapping of a Latin alphabet keyboard onto alphabet characters of that language, wherein the mapping comprises phonetic similarity between Latin alphabet members and alphabet members of the language, shape similarity within alphabet members of the language, and lexicographic ordering of alphabet members within the language. The mapping further comprises frequency analysis of alphabet members of the language.
Each alphabet character of the Latin alphabet keyboard is mapped onto 'n' characters of the language, with n = 1 to 5. The language may be chosen from Arabic, Farsi (Persian), Urdu, Pashto, Baloch, Malay, Balti, Brahui, Panjabi, Kashmiri, Sindhi, Uyghur, Kazakh, Kyrgyz, Azerbaijani and Kurdish. The language is preferably a widely-spoken language such as Arabic, Farsi (Persian) or Urdu. The Latin alphabet keyboard from which the invention is mapped is selected from the group consisting of QWERTY, AZERTY, QZERTY and Dvorak. Furthermore, the keyboard of the present invention can be used as part of or in association with devices such as, but not limited to, laptop computers, desktop computers, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
In another aspect of the present invention, there is provided a keyboard for inputting characters of Arabic, Farsi, and/or Urdu. The keyboard is based on a mapping of a Latin alphabet keyboard onto one or more alphabet characters of each language, wherein the mapping comprises phonetic similarity between Latin alphabet members and alphabet members of each language, shape similarity within alphabet members of a given language, and lexicographic ordering of alphabet members within a given language. The mapping further comprises frequency analysis of alphabet members of the language. Where a Latin alphabet key is associated with two or more Arabic, Farsi or Urdu characters, the characters may be accessed sequentially by multiple presses of the Latin alphabet key, or by use of Function keys in association with the Latin alphabet key. The Latin alphabet keyboard from which the invention is mapped is selected from the group consisting of QWERTY, AZERTY, QZERTY and Dvorak. Furthermore, the keyboard of the present invention can be used as part of or in association with electronic devices such as, but not limited to, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
Various objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention, along with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
FIG. 1 shows the 47 character-producing keys found on prior art Latin keyboard layouts, with labelled row numbers and various shading and texture to illustrate how such keys are mapped to the fingers of a touch-typist.
FIG. 2 shows a typical arrangement of character keys in a prior art QWERTY keyboard layout.
FIG. 3 shows a known keyboard layout for typing in Arabic, called the Arabic Windows™ Keyboard Layout (AWKL). The illustrated keys show characters in both modes: the normal mode showing the bottom characters, and the shift mode showing the upper characters.
FIGS. 4a - 4e illustrate an Arabic keyboard layout of the present invention correlated to a QWERTY keyboard.
FIGS. 5a - 5e illustrate an Arabic keyboard layout of the present invention correlated to an AZERTY keyboard.
FIG. 6 shows a flowchart for text input using a keyboard of the present invention.
DETAILED DESCRIPTION
Before the present invention is described in further detail, it is to be understood that the invention is not limited to the particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention pertains. Although any methods and materials similar or equivalent to those described herein can also be used in practice or test of the present invention, a limited number of exemplary methods and materials are described herein.
Wherever ranges of values are referenced within this specification, sub-ranges therein are intended to be included within the scope of the invention unless otherwise indicated. Where characteristics are attributed to one or another variant of the invention, unless otherwise indicated, such characteristics are intended to apply to all other variants of the invention where such characteristics are appropriate or compatible with such other variants.
The following is given by way of illustration only and is not to be considered limitative of this invention. Many apparent variations are possible without departing from the spirit and scope thereof.
The present invention will be described, without loss of generality, with reference to the prior art Latin keyboard known as QKL. It should be obvious to those skilled in the art, however, that the invention is also applicable to any keyboard layout that hosts Latin alphabet letters. These layouts include, but are not limited to: QWERTY-like layouts (e.g. QWERTZ, AZERTY, etc.), and Dvorak or Dvorak-like layouts.
For the purposes of this invention, the discussion is limited to the character printing keys common to most keyboards regardless of the layout type. The Arabic script languages include, but are not limited to: Arabic, Farsi (also known as Persian), Urdu, Pashto, Baloch, Malay, Balti, Brahui, Panjabi (in Pakistan), Kashmiri, Sindhi (in India and Pakistan), Uyghur (in China), Kazakh (in China), Kyrgyz (in China), Azerbaijani (in Iran) and Kurdish (in Iraq and Iran). While the detailed description focuses on a keyboard for Arabic, Urdu and Farsi, the present invention is applicable to the Arabic script languages listed above.
Embodiments of the present invention include, but are not limited to, a complete 1-to-n mapping from the keys on a QKL to all the Arabic letter characters shown in Table 1, where n = 1, 2, 3, 4 or 5. That is, each key from the QKL is mapped to 1, 2, 3, 4 or 5 characters of the Arabic alphabet. For example, 'A' is mapped to five characters of the Arabic alphabet, 'Q' is mapped to one character, 'S' is mapped to two characters, and so on. It will be apparent to those skilled in the art in consideration of the invention, however, that similar mappings may be drawn from any keyboard layout to the Arabic characters.
Starting from the knowledge base of a QKL user, it may be assumed that the locations of the 26 standard Latin alphabet keys have been, to some degree, memorized even by the average typist and that a corresponding muscle memory exists. The keyboard layout of the present invention leverages this muscle memory of key locations of the QKL user by establishing a correlation between each letter in the Latin alphabet and one or more intuitively suitable counterparts from the Arabic letters or diacritic symbols. Such mapping reduces the learning curve of a QKL-familiar user. In establishing such correlations, the invention relies on factors such as: (i) direct sound pronunciation similarity, (ii) approximate sound pronunciation similarity, (iii) direct shape similarity, (iv) approximate shape similarity, (v) key location proximity, and (vi) frequency distribution analysis of Arabic letters. In cases where more than one candidate Arabic character can be assigned to a QWERTY key, the invention relies on a letter frequency distribution analysis of Arabic letters performed on more than 1,375,058 words (giving a total of 5,452,865 letters). This analysis is documented at http://www.intellaren.com/web/software/alfa/alfa.html , and its results are shown in Table 3. The number column in Table 3 corresponds to the number column of Table 1. The present invention also introduces two letters imported from the English alphabet: P, transcribed in Arabic as پ, and V, transcribed in Arabic as ڤ. These two letters are imported for purposes of accurate transcription and transliteration from Latin to Arabic.
TABLE 3 - Frequency of Arabic letters in descending order
Figure imgf000015_0001
As shown in Table 1, the Arabic alphabet available on any keyboard layout designed to produce Arabic script is typically composed of 48 characters (38 letters and 10 non-letter symbols). These are grouped as follows: (i) characters 1 to 28 are the 28 primary Arabic letters consisting of 25 consonants and three vowels; (ii) characters 29 to 36 are eight modified letters; (iii) characters 37 and 38 are two imported English letters for transcription purposes; (iv) characters 39 to 47 are the nine diacritic symbols used as an aid for correct pronunciation; and (v) character 48 is a letter-stretching symbol that enhances readability of the script. Similarities in the shape and sound of Arabic letters form a significant aspect of the lexicographical ordering within the Arabic alphabet. Students of the Arabic language often exploit these similarities in order to learn and memorize the Arabic alphabet. In some instances, successive letters resemble each other in shape and/or sound, and vary only in the number of dots. Examples of successive letters that have shape similarity include the pair ب and ت (B and T) and the pair ر and ز (R and Z). Examples where successive letters share both sound and shape similarity include the pair ت and ث (T and TH as in "thank"), and the pair د and ذ (D and TH as in "then"). In each of these examples, the succeeding letter has one more dot than its predecessor. Such an observation is important in the development of the keyboard layout of the present invention.
Further, when hand-writing the aforementioned Arabic letters, the succeeding letter is usually written with one or more additional strokes compared with its predecessor, where a stroke is a single continuous hand sketch or mouse drag without any spatial discontinuity in the form of lifting the hand or releasing the mouse, respectively. The additional strokes are usually in the form of one or more dots stroked above or below the letter. Therefore, the user who writes in Arabic knows that whereas a letter takes one stroke to draw, a similar succeeding letter takes one or more additional strokes to draw. Examples from Table 1 include: (i) letter 14, ص, is written in one stroke, whereas the succeeding letter, ض, requires an additional stroke in the form of a dot; and (ii) letter 16, ط, can be written in one or two strokes, whereas the succeeding letter, ظ, requires an additional stroke in the form of a dot.
In the keyboard layout of the present invention, each Latin key is associated with one or more alphabetic or diacritic characters from the Arabic set of letters and diacritics. Where the Latin key is associated with two or more Arabic characters, the second Arabic character is produced by pressing the corresponding Latin key twice within a given time interval (for example, between 150 and 750 milliseconds). Similarly, the third Arabic character is produced by pressing the corresponding Latin key a third time within the same or a similar allowed tolerance after the second press, and so on. Performing the same operations while depressing the SHIFT key produces the results in reverse order. Alternatively, the function keys (e.g. F2, F3, F4 and F5) can be used to simulate the number of times a Latin key should be pressed in order to produce the required Arabic character.
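The selection rule in the preceding paragraph can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `select_character` and `select_with_function_key` are hypothetical, the three characters listed for key T are the ones this description assigns to that key, and the wrap-around for extra presses is borrowed from the rotary behaviour described for Figs. 4a - 4e.

```python
def select_character(candidates, presses=1, shift=False):
    """Pick the character reached after `presses` quick presses of one key.

    candidates: characters assigned to the key, in first-press order.
    shift:      when True, the candidates are walked in reverse order.
    """
    if not candidates:
        raise ValueError("key has no assigned characters")
    if presses < 1:
        raise ValueError("presses must be at least 1")
    order = list(reversed(candidates)) if shift else list(candidates)
    # Assumption: extra presses wrap around to the start of the list.
    return order[(presses - 1) % len(order)]


def select_with_function_key(candidates, function_key_number):
    """F2..F5 followed by a letter key simulate 2..5 presses of that key."""
    if not 2 <= function_key_number <= 5:
        raise ValueError("only F2 to F5 are used as press-count keys")
    return select_character(candidates, presses=function_key_number)


# Characters assigned to key T in this description (first, second, third press).
T_KEY = ["ت", "ة", "ث"]

if __name__ == "__main__":
    print(select_character(T_KEY, presses=2))              # second press of T
    print(select_character(T_KEY, presses=1, shift=True))  # SHIFT + T, reverse order
    print(select_with_function_key(T_KEY, 3))              # F3 then T
```

Keeping the rule free of timing state makes it easy to check in isolation; the time tolerance only decides how many presses count as one group.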
According to the phonetic pronunciation of the 28 Arabic letters listed in Table 2, there exists a direct one-to-one phonetic correspondence between 19 Latin and Arabic letters as follows:
1. A -> ا
2. B -> ب
3. C -> ص
4. D -> د
5. F -> ف
6. H -> ه
7. J -> ج
8. K -> ك
9. L -> ل
10. M -> م
11. N -> ن
12. Q -> ق
13. R -> ر
14. S -> س
15. T -> ت
16. V -> ڤ
17. W -> و
18. Y -> ي
19. Z -> ز
It should be noted that while the letter V has no equivalent in the Arabic alphabet, the letter ڤ (which is a member of the Farsi (Persian) alphabet) is used for the purposes of accurately transcribing or transliterating Latin nouns containing the letter V.
Nineteen of the Latin letters can directly reference nineteen Arabic letters solely based on direct vocal sound similarity. Statistically, this means that 73% (19 of 26) of the Latin letters can already be used to directly reference 50% (19 of 38) of the Arabic characters listed in Table 1 (i.e. character numbers 1 to 38 of Table 1).
Next, nine of the above Arabic letters can be grouped with subsets of Arabic letters based on direct sound- and/or shape-similarity:
20. A -> ا -> أ, إ, آ and ء
21. B -> ب -> پ
22. C -> ص -> ض
23. D -> د -> ذ
24. K -> ك -> خ
25. S -> س -> ش
26. T -> ت -> ة and ث
27. W -> و -> ؤ
28. Y -> ي -> ى and ئ
As such, fourteen other Arabic letters can be indirectly referenced from the above 19 Latin letters already known to the QKL user based on intuitive relationships amongst the Arabic letters themselves. This accounts for another 37% (14 of 38) of Arabic letters.
Therefore, in accordance with the present invention, 73% of the Latin alphabet corresponds to 87% (33 of 38) of the Arabic letters.
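The percentages quoted above follow directly from the letter counts. The short check below is only an arithmetic illustration; the constant names are hypothetical, and the figures are rounded to whole percent, as in the text.

```python
LATIN_LETTERS = 26      # letter keys of the Latin alphabet
ARABIC_LETTERS = 38     # characters 1 to 38 of Table 1
DIRECT_MATCHES = 19     # one-to-one phonetic correspondences
INDIRECT_MATCHES = 14   # letters reached through sound or shape grouping


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percentage."""
    return round(100 * part / whole)


latin_share = percent(DIRECT_MATCHES, LATIN_LETTERS)        # Latin letters in use
direct_share = percent(DIRECT_MATCHES, ARABIC_LETTERS)      # Arabic letters reached directly
indirect_share = percent(INDIRECT_MATCHES, ARABIC_LETTERS)  # reached through grouping
total_share = percent(DIRECT_MATCHES + INDIRECT_MATCHES, ARABIC_LETTERS)

if __name__ == "__main__":
    print(latin_share, direct_share, indirect_share, total_share)
```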
The three primary Arabic vowels (ا, و and ي) are grouped with their respective modified forms, based on similarity in both sound and shape.
The letter پ (pronounced 'paa') is phonetically identical to the letter P. Although not part of the Arabic alphabet per se, it is a member of the Urdu and Farsi (Persian) alphabets, which are derived from the Arabic alphabet. However, the letter پ is used in Arabic scripts today for the purposes of accurately transcribing Latin nouns containing the letter P. It should be noted that ب and پ are similar in both shape and sound; shape-wise, the latter is rendered with two extra dots below. This justifies mapping پ to the second press of the letter key B.
The letter ض succeeds the letter ص in the Arabic alphabet. Since it requires only one more stroke than ص, in the form of a dot, and is less frequent than the letter ص, it is assigned to the second press of letter key C.
The letter ذ is pronounced like 'th' (as in the word 'than'). The shape- and sound-similarity between ذ and the letter د is apparent. Furthermore, ذ is written with one more stroke than د, in the form of a dot, and it occurs 2.2 times less frequently than د. It is therefore intuitive to assign it to the second press of letter key D.
The letter خ (pronounced 'khaa') has no equivalent letter sound in the English language. However, it is most similar in sound to the letter ك. Since the letter خ is written with two strokes, and is less frequent than ك, it is intuitive to assign it to the second press of letter key K.
The letter ش is phonetically identical to the sound produced by 'SH'. It also succeeds the letter س in the Arabic alphabet. The shape- and sound-similarity between س and ش is apparent. Furthermore, ش has three dots on top whereas the letter س is dot-less, and ش occurs 2.7 times less frequently than س. It is therefore intuitive to assign ش to the second press of letter key S.
The character ة has the same vocal sound as that of the Latin letter T, but its use is restricted to the ending of feminine nouns. The direct sound pronunciation similarity warrants rendering ة accessible through the second press of letter key T. Meanwhile, the letter ث is pronounced like 'th' (as in 'three'), and is similar in shape to the letter ت. Since the letter ث occurs less frequently than the letter ة (according to Table 3), it is made accessible through three presses of the letter key T.
From the above groupings, it follows that the letter key 'A' is assigned five Arabic characters (ا, أ, إ, آ and ء); the letter key 'B' is assigned two Arabic characters (ب and پ), and so on. For example, to type the letter ش, a typist either i) presses S twice within a specified time tolerance, ii) presses SHIFT S, where the SHIFT key causes reverse accessibility, or iii) presses F2 and then the letter S. Similarly, to generate the letter آ, the typist either i) presses A four times while respecting the time frame, ii) presses SHIFT 'A' twice while respecting the allowed time frame, or iii) presses F4 and then presses the letter key 'A'.
Where three or more Arabic characters are assigned to one Latin letter key (as in the case of A, T and Y), the letter frequency analysis described above provides a basis for the order of assignment of Arabic letters to a given Latin letter key. As an example, according to the frequency analysis shown in Table 3, the letters ت, ة and ث occur at respective frequencies of 2.64%, 1.38% and 0.85%. All three letters are accessible through the key 'T'. Since ة occurs more frequently than ث, it can be obtained by pressing the key 'T' twice within a certain time interval, while ث can be obtained by pressing 'T' three times. Alternatively, the SHIFT key or the function keys F2 and F3 can be used in conjunction with the key 'T' to produce ة or ث, respectively.
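The frequency-based ordering described above can be illustrated with a short sketch. It assumes that the directly corresponding letter always keeps the first press and that only the related letters are ranked by frequency; the function name `order_by_frequency` is hypothetical, and the three percentages are the ones quoted for key 'T'.

```python
def order_by_frequency(primary, related, frequency):
    """Return the press order for one key.

    primary:   letter with the direct correspondence (always first press).
    related:   other letters grouped with the key, in any order.
    frequency: mapping from letter to its frequency in percent.
    """
    ranked = sorted(related, key=lambda letter: frequency[letter], reverse=True)
    return [primary] + ranked


# Frequencies quoted in the description for the letters on key T.
FREQUENCY_PERCENT = {"ت": 2.64, "ة": 1.38, "ث": 0.85}

# The related letters are deliberately passed in the "wrong" order.
T_KEY_ORDER = order_by_frequency("ت", ["ث", "ة"], FREQUENCY_PERCENT)

if __name__ == "__main__":
    print(T_KEY_ORDER)
```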
There remain only five out of thirty-eight Arabic letters from Table 1 to be assigned, namely: ح, the pair ط and ظ, and the pair ع and غ.
The letter ح is very close in vocal sound to that generated by ه, which corresponds to the letter H in English. When transcribing nouns containing the letter ح from Arabic to English, for example, it is typically transcribed and transliterated as "H". Since the letter G has yet to be assigned and is to the immediate left of H on the QWERTY keyboard layout, it is assigned to the letter ح. It should be noted that for Latin alphabet keyboard layouts other than QWERTY, the letter key 'G' may not be to the immediate left of the letter key 'H'. Nonetheless, due to the shape similarity between 'G' and ح, such keyboards can also maintain the same assignment. Alternatively, given the sound similarity between the letters ح and ه, and the fact that ح is several times less frequent than ه (see Table 3), it follows that ح can be accessed by pressing the letter 'H' key twice.
Next, we consider the lexicographical ordering of the letters ص, ض, ط and ظ, which are listed as having positions 14, 15, 16 and 17, respectively, in Table 1. Since the letter key 'C' is assigned to both ص and ض, it follows that the unassigned letter key 'X', which is to the immediate left of the letter key 'C' on the QWERTY keyboard, is assigned to ط. Furthermore, ظ requires one more stroke (in the form of a dot) than ط, its lexicographical predecessor. It follows naturally to assign ظ to the second press of letter key X. This assignment, like that of letter key 'G' to ح, uses key location proximity in conjunction with the ordering of the letters in the Arabic alphabet. It should be noted that in most other Latin-based keyboard layouts (a notable exception being Dvorak), the letter key 'X' is also to the left of the letter key 'C'.
Attention is now directed to the last pair: ع and غ. When an Arabic word containing the letter ع is transcribed into English or another Latin-based language, it is typically represented by vowels such as A, O, I, U. Since the letter P is unused in the Arabic alphabet and is to the immediate right of a subset of the vowel letters (O, I and U) on a QWERTY keyboard, it is assigned to the Arabic letter pair ع and غ.
As discussed below, the vowel letters O, I and U are assigned to Arabic diacritic marks based on sound similarity between the diacritics and English vowels. Therefore, the remaining letters are assigned as follows:
29. G -> ح
30. P -> ع
31. X -> ط
And in turn, the last two Arabic letters in the above list can reference the two dotted letters that resemble them in shape:
32. P -> ع -> غ
33. X -> ط -> ظ
For convenience and consistency reasons, optionally, the letter £ is accessible through the letter ζ (key 29 above) as well as through the letter t-1 .
In addition to the three long vowels mentioned above, there are also short vowels represented by simple-shaped symbols that are written above or under an Arabic letter. These symbols, or diacritics (in Arabic, harakaat, حركات, or tashkeel, تشكيل), serve as aids to accurate pronunciation of words. Table 4 provides a list of diacritics, along with examples of their usage when compared to English, as an example of a Latin-based language.
As seen from Table 4, Arabic diacritics are best approximated on the QWERTY keyboard by corresponding English vowels. In the present embodiment, the letter O is assigned the dammah and diacritics similar to the vowel 'O'; namely, the dammah tanween ◌ٌ and the sukoon ◌ْ. The diacritic dammah tanween is basically two dammahs and can therefore be written as two adjacent dammahs instead of ◌ٌ, but the latter is preferred because it is quicker when written by hand. The diacritic sukoon is seated on top of any alphabet letter to indicate that there is no vowel sound associated with that letter. The shape-similarity between the letter O and the sukoon ◌ْ is obvious; it is therefore intuitive to assign the character ◌ْ to the next available press of O.
Next, the letter I is assigned the kasrah and the related diacritic kasrah tanween, which is basically two kasrahs. In order to maintain key location proximity amongst the diacritics, the letter key 'U' is designated to render the fat-ha and the related diacritics fat-ha tanween and the superscripted alif. The fat-ha tanween is basically two fat-has, while the superscripted alif, also known as alif mamdoodah, is placed above letters and plays the role of a long alif vowel; it is almost exclusively used in sacred Koranic script. It should be noted that whereas the ideal letter for modeling fat-ha and related diacritics is the letter A, this key is already used for modeling the long vowel and/or consonant forms of A, namely the letter ا and its four associated modified forms. Therefore, the diacritics are generated as follows:
34. O -> ◌ُ (dammah)
35. I -> ◌ِ (kasrah)
36. U -> ◌َ (fat-ha)
The three diacritics can in turn be used as pointers to several other diacritics that are shape- and/or sound-dependent, namely:
37. O -> ◌ُ -> ◌ٌ (dammah tanween) and ◌ْ (sukoon)
38. I -> ◌ِ -> ◌ٍ (kasrah tanween)
39. U -> ◌َ -> ◌ً (fat-ha tanween) and ◌ٰ (superscripted alif)
The remaining characters to be assigned are the shaddah, which is usually used in combination with most of the above diacritics, and the extending character which is added to horizontally stretch the shape of certain letters for better readability. These two characters are assigned to the one remaining and unassigned vowel: E. This assignment is suitable for rendering the shaddah since the shape of the shaddah resembles the shape of an E that is rotated by 90 degrees; and it is suitable for the shape-extending character since E can be regarded as an acronym for the English word extension, which is suggestive of the character. Therefore, the assignment is as follows:
40. E -> ◌ّ (shaddah) and ـ (extending character)
TABLE 4 - Diacritics of the Arabic language
Figure imgf000023_0001
* The hamzah and maddah interact with short vowels to modify pronunciation.
The above assignments are summarized in Table 5, which shows an English-Arabic Character Table (EACT) of the present invention. The keyboard layout corresponding to Table 5 is shown in the five planes illustrated in Figs. 4a - 4e. Each plane is a function of the number of key presses, provided they are performed within a certain time tolerance. In particular, Fig. 4a illustrates the Arabic characters obtained after pressing a given key once; Fig. 4b refers to Arabic characters obtained after pressing a given key twice; Fig. 4c refers to Arabic characters obtained after pressing a given key three times; Fig. 4d refers to Arabic characters obtained after pressing a given key four times; and Fig. 4e refers to Arabic characters obtained after pressing a given key five times. Blank keys in subsequent planes indicate that no further characters are mapped to those keys. Therefore, pressing a key more times than the number of characters assigned to it simply causes it to start from the beginning in a rotary fashion.
Note that when the number of presses is more than one, the number of presses may also be simulated by pressing one of the function keys F2 to F5 prior to the character key. Last, if the modifier keys SHIFT or CAPS-LOCK are active, the order of access of the Arabic characters is reversed. For example, if successive presses within tolerance of Key "a" produce ا, أ, إ, آ, ء in that order, then Key "A" produces ء, آ, إ, أ, ا.
The two punctuation characters "?" and "." correspond to their respective Arabic equivalents. All other keyboard keys not shown in Table 5 produce the same characters shown on the typical QWERTY keyboard. The numbers 0, 1, 2, ... are sometimes used in Arabic as well, although these can be modified to generate the Arabic numerals ٠, ١, ٢, ٣, ٤, ٥, ٦, ٧, ٨, ٩ (which are respectively 0, 1, 2, 3, 4, 5, 6, 7, 8, 9). The modification can occur through the use of the SHIFT or ALT modifier keys.
The following example illustrates how to type an Arabic sentence using the keyboard layout of the present invention. Repeated characters shown in brackets are accessed by pressing the associated key repeatedly within a certain time interval. For example, '(dd)' means that the letter key 'd' is pressed twice, within a maximum time interval, to produce ذ; '(aaa)' means the key 'a' is pressed three times within a maximum time interval to produce إ, and so on.
Arabic sentence:
هذا الإختراع له علاقة بلوحات المفاتيح وطرق إدخال الحروف بالعربي.
Keys used: h(dd)a al(aaa)(kk)trap lh plaq(tt) blwgat almfatyg wxrq (aaa)d(kk)al algrwf balprby.
English Translation: The invention relates to keyboards and methods of entering letters in Arabic.
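The bracket notation of the typing example can be decoded mechanically, which gives a quick way to check a key sequence against the character assignments. The sketch below is an illustration only: `KEYMAP` is a hand-copied subset of the assignments described in the text (Table 5 itself is an image), the order of the modified alif forms on key "a" beyond the third press is an assumption, the second word of the key sequence is read as "lh", and `decode` is a hypothetical helper name.

```python
# Subset of the key-to-character assignments described in the text,
# listed in first-press order for each Latin key.
KEYMAP = {
    "a": ["ا", "أ", "إ", "آ", "ء"],  # order after the third press is assumed
    "b": ["ب", "پ"],
    "d": ["د", "ذ"],
    "f": ["ف"],
    "g": ["ح"],
    "h": ["ه"],
    "k": ["ك", "خ"],
    "l": ["ل"],
    "m": ["م"],
    "p": ["ع", "غ"],
    "q": ["ق"],
    "r": ["ر"],
    "t": ["ت", "ة", "ث"],
    "w": ["و", "ؤ"],
    "x": ["ط", "ظ"],
    "y": ["ي"],
}


def decode(keys):
    """Turn notation such as 'h(dd)a' into the characters it produces."""
    out = []
    i = 0
    while i < len(keys):
        if keys[i] == "(":
            # A bracketed group is one key pressed len(group) times.
            close = keys.index(")", i)
            group = keys[i + 1:close]
            key, presses = group[0], len(group)
            i = close + 1
        else:
            key, presses = keys[i], 1
            i += 1
        candidates = KEYMAP.get(key)
        if candidates is None:
            out.append(key)  # spaces and punctuation pass through unchanged
        else:
            out.append(candidates[(presses - 1) % len(candidates)])
    return "".join(out)


if __name__ == "__main__":
    print(decode("h(dd)a al(aaa)(kk)trap lh plaq(tt) blwgat almfatyg "
                 "wxrq (aaa)d(kk)al algrwf balprby."))
```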
Another embodiment of the present invention is illustrated in the five planes of Figs. 5a - 5e, where the 48 Arabic characters of Table 1 are mapped onto an AZERTY keyboard, based on the relationships between Arabic and Latin characters discussed above, and the frequency analysis summarized in Table 3. Each plane is a function of the number of key presses provided they are performed within a certain time tolerance. In particular, Fig 5a illustrates the Arabic characters obtained after pressing a given key once; Fig. 5b refers to Arabic characters obtained after pressing a given key twice; Fig. 5c refers to Arabic characters obtained after pressing a given key three times; Fig. 5d refers to Arabic characters obtained after pressing a given key four times; and Fig. 5e refers to Arabic characters obtained after pressing a given key five times. Blank keys in subsequent planes indicate that no further characters are mapped to those keys. Therefore, pressing a key more times than the number of characters assigned to it simply causes it to start from the beginning in a rotary fashion. An AZERTY layout has the following differences from a QWERTY layout: i) the positions of keys A and Q are swapped, ii) the positions of Z and W are swapped, and iii) Key M is located at the right end of the home-row (or, Row 3).
TABLE 5 - Mapping of Arabic characters to a QWERTY keyboard
Figure imgf000026_0001
The methodology used to map Arabic characters onto a Latin alphabet keyboard may also be used to separately map Farsi (or Persian) and Urdu characters onto the keyboard, as illustrated respectively in Tables 6 and 7.
As seen in Table 6, the Farsi alphabet has four additional characters: b, produced by two presses of the Latin character 'h'; g ,produced by two presses of 'j'; ^ , produced by three presses of 'k'; and j , produced by two presses of 'z'. The two punctuation characters "?" and "." correspond to their Farsi equivalent. All other keys on the keyboard keys not shown in Table 6 produce the same characters shown on the typical QWERTY keyboard.
As shown in Table 7, the Urdu script has eight additional characters (compared to the Arabic script), four of which are identical to the aforementioned Farsi characters (i.e., b, ζ , <-≤ , and j) and can thus be mapped like the four Farsi characters. In addition, the Urdu script has four additional characters:: -> ,produced from two presses of 'd'; j produced from two presses of 'r'; ώ produced from three presses of 't'; and ^ produced from four presses of 'y'. The two punctuation characters "?" and "." correspond to their Urdu equivalent. All other keys on the keyboard keys not shown in Table 7 produce the same characters shown on the typical QWERTY keyboard.
In each of the Farsi and Urdu keyboards, the numbers 0, 1, 2, ... of the Latin character keyboard can be modified to generate the respective Farsi and Urdu numerals. The modification can occur through the use of the SHIFT or ALT modifier keys.
TABLE 6 - Mapping of Farsi (Persian) characters to a QWERTY keyboard
Figure imgf000028_0001
TABLE 7 - Mapping of Urdu characters to a QWERTY keyboard
Figure imgf000029_0001
A flowchart for text input of the present invention is shown in Fig. 6. Conventions within the flowchart are as follows: italic words indicate program variables & constants; words suffixed with () denote program functions; and underlined words indicate programming keywords.
The descriptions within the flowchart are as follows:
- prevKey -> previously pressed key;
- prevTime -> previously recorded system time;
- TOL -> time tolerance; the program starts with a default of 250 milliseconds, but this can be reset by the typist;
- dispPosition -> the position at which the next character is displayed. If it refers to the position of a displayed character, that character is over-written by the new character;
- currTime -> current system time;
- alc -> Arabic letter counter, used to determine which column to access the Arabic character from;
- EACT[i][j] -> access the i-th row, j-th column in the English-Arabic Character Table (EACT, shown in Table 5);
- tChar -> character retrieved from Table 5; and
- display(k, p) -> displays key k at position p.
The rendering of characters in an editor, based on user typing, proceeds according to the following algorithm (each box is numbered with its corresponding step below):
[1] Perform one-time preliminary initializations when the editor application loads up as follows:
o Set previous key (prevKey) to null, meaning no keys have so far been processed.
o Set previous time (prevTime) to be the current time of the system. By the time the user actually enters a key in the future, the time of such a press is recorded; as a result, the time obtained at this initialization stage is set to become "previous" time.
o Set tolerance (TOL) to a default time frame (e.g., 300 milliseconds). The user is allowed to modify such a value. Consecutive key presses within this TOL carry the logic that the user indeed wants to access the next character within the adjacency list (or row) shown in Table 5.
o Set the display position (dispPosition) to 0 at this time, which is the location where the next letter to be rendered is displayed. In most programming languages (Java, C++, etc.), counting of memory addresses and indexed elements starts from 0, rather than 1.
[2] Get the key the user just pressed and store it in a variable named current key (currKey).
[3] Examine currKey: if it is a function key between F2 and F5, proceed to step 9; otherwise, proceed to step 4.
[4] Get the current time of the system and store it in the variable currTime.
[5] Calculate the time difference (timeDiff) between the two recorded times, currTime and prevTime, as shown in the equation in the flowchart.
[6] Set prevTime to be currTime. This way, the next time a key is pressed, its time overwrites the value stored in currTime, while the old time is kept in prevTime.
[7] This step runs two tests: i) it checks whether currKey and the previous key (prevKey) are the same, and ii) it checks whether the time difference is less than or equal to the TOL of step 1. If either test fails, proceed to step 8. It should be noted that for the first pass through the flowchart, the test must fail since prevKey is null and no key on the keyboard can generate it. This means that the Arabic letter counter (alc), which selects the column to choose a letter from, is reset to the first column as shown in step 8. This column is in the EACT shown in Table 5, under the column heading 0. In total, there are five columns to choose from, labeled 0 to 4, wherein "0" means first. If both tests pass, proceed to step 11.
[8] Reset the Arabic letter counter (alc) to the value 0. This means that the character will be chosen from the column with header value 0 in Table 5 (column headers run from 0 to 4 and are highlighted in light gray).
[9] Being in this step means that a function key between F2 and F5, inclusive, has been pressed already, so the actual QWERTY letter key must be obtained and stored in currKey.
[10] The value of alc is updated to reflect the column that must be accessed in Table 5 to retrieve the corresponding Arabic character. Here, we have arrived from step 3 followed by step 9. If the function key pressed is F2, this means that two presses are simulated, so the second column of the Arabic letter columns in Table 5 is accessed (the column with heading 1). Similarly, if F4 is pressed, alc is set to 3 to access the fourth column.
[11] This step is reached only if both tests in step 7 succeed. Reaching this step means a different character in the adjacency list is about to be retrieved instead of the one currently picked and displayed. Therefore, alc is incremented by one.
[12] This step precisely identifies the column from which to select the Arabic letter. Given the complexity of this step, a number of examples are provided. Note that the symbol "%" means modulus, also referred to as the mod operation in computing. Modulus is simply another word for remainder when a division operation is carried out. Therefore, the equation (a % b) reads a mod b. So, 7 mod 5 is 2; 12 % 5 is 2; and 3 mod 5 is 3.
To type ة, the typist presses letter key t twice. The letter t has a numerical value which maps it to the correct row in the EACT of Table 5. The number of Arabic letters accessible through t is 3 (see column NOAC). Therefore, the equation in step 12 is simplified as follows (because counting starts from 0, 1 means two presses):
alc = 1 % EACT[numerical value of t][number of accessible letters]
alc = 1 % 3
alc = 1, which is used to access the second column (remember, counting starts from 0).
Now, suppose the user presses t four times, but there are only three corresponding Arabic characters. The equation still produces the correct Arabic letter after cycling all the way through. In other words, pressing t four times for the present invention is the same as pressing it once. This is illustrated as follows:
alc = 3 % EACT[numerical value of t][number of accessible letters]
alc = 3 % 3
alc = 0, which is used to access the first column (the column with heading 0).
[13] The display position (dispPosition) is updated. Basically, the last printed character is deleted in preparation for the new character to be printed in its place. This happens typically within a fraction of a second.
[14] Using the same approach as in step 6, prevKey is updated to hold the current key, so that when a new key is pressed in the next iteration, the old and new keys are correctly compared in step 7.
[15] This step fetches the next character to be rendered from the adjacency table of Table 5. For example, EACT[t][1] -> ة. Since the value of alc is always correct and within range, the correct corresponding character is retrieved from EACT.
[16] The fetched character, tChar, is displayed at the designated position, dispPosition.
[17] The display position (dispPosition) is updated in preparation for the next user action of inserting more letters from EACT.
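Steps 1 to 17 can be condensed into a small state machine. The following is a minimal sketch under stated assumptions rather than the flowchart of Fig. 6 itself: the class name `MultiTapEditor` and its method names are hypothetical, timestamps are passed in explicitly (in milliseconds) instead of being read from the system clock, the displayed text is kept in a list instead of a display position, the previous character is overwritten only when the same key is repeated within the tolerance, and keys without an entry in the table are passed through unchanged.

```python
class MultiTapEditor:
    """Minimal model of the multi-press text input described in steps 1-17."""

    def __init__(self, eact, tol_ms=300):
        # Step 1: one-time initializations.
        self.eact = eact          # maps a Latin key to its list of characters
        self.tol_ms = tol_ms      # time tolerance (TOL)
        self.prev_key = None      # prevKey
        self.prev_time = 0        # prevTime
        self.alc = 0              # Arabic letter counter (column index)
        self.text = []            # characters displayed so far

    def press(self, key, now_ms):
        """Handle an ordinary key press (steps 2, 4-8 and 11-17)."""
        time_diff = now_ms - self.prev_time          # step 5
        self.prev_time = now_ms                      # step 6
        repeated = key == self.prev_key and time_diff <= self.tol_ms  # step 7
        if repeated:
            self.alc += 1                            # step 11
        else:
            self.alc = 0                             # step 8
        self._render(key, overwrite=repeated)

    def press_with_function_key(self, function_key_number, key):
        """Handle F2..F5 followed by a letter key (steps 3, 9 and 10)."""
        self.alc = function_key_number - 1           # F2 selects column 1, etc.
        self._render(key, overwrite=False)

    def _render(self, key, overwrite):
        characters = self.eact.get(key)
        if characters is None:
            # Assumption: unmapped keys (space, punctuation) are shown as typed.
            self.text.append(key)
            self.prev_key = None
            return
        self.alc %= len(characters)                  # step 12: rotary wrap
        if overwrite and self.text:
            self.text.pop()                          # step 13: replace last char
        self.prev_key = key                          # step 14
        self.text.append(characters[self.alc])       # steps 15 to 17

    def contents(self):
        return "".join(self.text)


if __name__ == "__main__":
    editor = MultiTapEditor({"t": ["ت", "ة", "ث"], "s": ["س", "ش"]})
    editor.press("t", 1000)
    editor.press("t", 1200)   # within tolerance: second character of key t
    editor.press("s", 2000)   # different key: first character of key s
    print(editor.contents())
```

Passing the time in as an argument keeps the sketch deterministic, so the tolerance logic can be exercised without real delays.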
In summary, the present invention provides a keyboard for typing alphabets that are based on the Arabic script. The invention fully exploits the following intuitive relationships between Latin characters and characters of the Arabic-based script: (i) direct phonetic similarity; (ii) approximate phonetic similarity; (iii) direct shape similarity; (iv) approximate shape similarity; and (v) key location proximity. In addition, the present invention makes use of only the 26 keys of the Latin alphabet for all of the characters of the Arabic-based script, thereby facilitating the user's task. The minimal use of Latin keys is based on assigning a subset of Arabic-based script characters to a given Latin key. Members of a given subset are accessed in an order, preferably determined by a frequency analysis of the individual members of the Arabic-based alphabet.
The keyboard of the present invention can be used as part of or in association with electronic devices such as, but not limited to, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
CONCLUSION
The foregoing has constituted a description of specific embodiments showing how the invention may be applied and put into use. These embodiments are only exemplary. The invention in its broadest, and more specific aspects, is further described and defined in the claims which now follow.
These claims, and the language used therein, are to be understood in terms of the variants of the invention which have been described. They are not to be restricted to such variants, but are to be read as covering the full scope of the invention as is implicit within the invention and the disclosure that has been provided herein.

Claims

What is claimed is:
1. A keyboard for inputting characters of an Arabic script-based language, said keyboard based on a mapping of a Latin alphabet keyboard onto characters of said language, wherein said mapping comprises:
phonetic similarity between Latin alphabet characters and alphabet characters of said language;
shape similarity within alphabet characters of said language; and
lexicographic ordering of alphabet characters of said language.
2. The keyboard of claim 1 wherein said mapping further comprises frequency analysis of alphabet characters of said language.
3. The keyboard of claim 1 or 2, wherein said language is selected from the group consisting of Arabic, Farsi, Urdu, Pashto, Baloch, Malay, Balti, Brahui, Panjabi, Kashmiri, Sindhi, Uyghur, Kazakh, Kyrgyz, Azerbaijani and Kurdish.
4. The keyboard of claim 3, wherein said language is selected from the group consisting of Arabic, Farsi and Urdu.
5. The keyboard of any one of claims 1 to 4, wherein each alphabet character of the Latin alphabet keyboard is mapped onto 'n' characters of said language, with n = 1 to 5.
6. The keyboard of any one of claims 1 to 5, wherein said Latin alphabet keyboard is selected from the group consisting of QWERTY, AZERTY, QZERTY and Dvorak.
7. The keyboard of any one of claims 1 to 6, wherein said keyboard is adapted to activate a device selected from the group consisting of laptop computers, desktop computers, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
8. The keyboard of any one of claims 1 to 7, wherein said keyboard is virtual.
9. A keyboard for inputting characters of a language selected from Arabic, Farsi and Urdu, said keyboard based on a mapping of a Latin alphabet keyboard onto one or more of said characters, wherein said mapping is based on:
phonetic similarity between Latin alphabet characters and alphabet characters of said language; shape similarity within alphabet characters of said language; and lexicographic ordering of alphabet characters within said language.
10. The keyboard of claim 9, wherein said mapping further comprises a frequency analysis of alphabet characters of said language.
11. The keyboard of claim 9 or 10, wherein said language is Arabic and the mapping comprises:
Figure imgf000037_0001
12. The keyboard of claim 11, wherein for a Latin alphabet key associated with two or more Arabic characters, said Arabic characters are accessed sequentially by multiple presses of said Latin alphabet key, or by use of Function keys in association with said Latin alphabet key.
13. The keyboard of claim 9 or 10, wherein said language is Farsi and the mapping comprises:
Figure imgf000038_0001
14. The keyboard of claim 13, wherein for a Latin alphabet key associated with two or more Farsi characters, said Farsi characters are accessed sequentially by multiple presses of said Latin alphabet key, or by use of Function keys in association with said Latin alphabet key.
15. The keyboard of claim 9 or 10, wherein said language is Urdu and the mapping comprises:
Figure imgf000039_0001
16. The keyboard of claim 15, wherein for a Latin alphabet key associated with two or more Urdu characters, said Urdu characters are accessed sequentially by multiple presses of said Latin alphabet key, or by use of Function keys in association with said Latin alphabet key.
17. The keyboard of any one of claims 9 to 16, wherein said Latin alphabet keyboard is selected from the group consisting of QWERTY, AZERTY, QZERTY and Dvorak.
18. The keyboard of any one of claims 9 to 17, wherein said keyboard is adapted to activate a device selected from the group consisting of laptop computers, personal computers, wireless phones, handheld computers, MP3 playing devices, interactive remote controls, two-way pagers, automobile PCs, navigational computers, data loggers, assistance technology devices, electronic games, and graphic pads.
19. The keyboard of any one of claims 9 to 18, wherein said keyboard is virtual.
Figure imgf000041_0001
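The sequential access by multiple presses recited in claims 12, 14 and 16 can be sketched in Python as follows. This is only an illustration under stated assumptions: the key map is a hypothetical placeholder rather than the mapping of the figures, the `|` pause marker used to separate two consecutive characters on the same key is an assumption, and wrapping around after the last member of a subset is a design choice not stated in the claims.

```python
from itertools import groupby

# Hypothetical key map (NOT the mapping of the figures), as Unicode escapes:
# s -> SEEN, SHEEN, SAD; t -> TEH, THEH, TAH; b -> BEH.
KEYMAP = {
    "s": ["\u0633", "\u0634", "\u0635"],
    "t": ["\u062A", "\u062B", "\u0637"],
    "b": ["\u0628"],
}


def decode(presses, keymap=KEYMAP, pause="|"):
    """Turn a string of Latin key presses into Arabic-script characters.

    A run of n consecutive presses of one key selects the n-th member of
    that key's subset, wrapping around past the end (an assumption).
    The `pause` marker (hypothetical) ends a run so the same key can be
    used for two characters in a row. Unmapped keys pass through unchanged.
    """
    out = []
    for key, run in groupby(presses):
        n = len(list(run))
        if key == pause:
            continue  # a pause only separates runs; it emits nothing
        chars = keymap.get(key)
        if chars is None:
            out.append(key * n)
        else:
            out.append(chars[(n - 1) % len(chars)])
    return "".join(out)


text = decode("ssb")  # two presses of "s", then one press of "b"
```

Here `decode("ssb")` selects the second member of the hypothetical `s` subset followed by the single member of the `b` subset, while `decode("s|s")` yields the first member twice.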
PCT/CA2010/000219 2009-02-20 2010-02-18 Keyboard for languages based on the arabic script WO2010094121A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15415009P 2009-02-20 2009-02-20
US61/154,150 2009-02-20

Publications (1)

Publication Number Publication Date
WO2010094121A1 (en) 2010-08-26

Family

ID=42633389

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2010/000219 WO2010094121A1 (en) 2009-02-20 2010-02-18 Keyboard for languages based on the arabic script

Country Status (1)

Country Link
WO (1) WO2010094121A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2007413A (en) * 1977-10-31 1979-05-16 Diab K M Processing Arabic-Farsi languages
US4507734A (en) * 1980-09-17 1985-03-26 Texas Instruments Incorporated Display system for data in different forms of writing, such as the arabic and latin alphabets
US4527919A (en) * 1978-02-07 1985-07-09 Lettera Arabica S.A.R.L. Method for the composition of texts in Arabic letters and composition device
US5416898A (en) * 1992-05-12 1995-05-16 Apple Computer, Inc. Apparatus and method for generating textual lines layouts
US20070222644A1 (en) * 2006-03-08 2007-09-27 Young-Jae Jung Keypad array of portable terminal for input of alphabetic letters
US20080077393A1 (en) * 2006-09-01 2008-03-27 Yuqing Gao Virtual keyboard adaptation for multilingual input

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110216012A1 (en) * 2008-11-20 2011-09-08 Elias Khoury Designated keyboard for chatting in arabic
US8531405B2 (en) * 2008-11-20 2013-09-10 Elias Khoury Designated keyboard for chatting in arabic
WO2017003029A1 (en) * 2015-07-01 2017-01-05 조돈우 Arabic alphabet input device
US9298277B1 (en) 2015-12-02 2016-03-29 Sheikha Sheikha Salem Alsabah Method for typing Arabic letters and associated diacritics
CN106959764A (en) * 2016-07-19 2017-07-18 敬永权 It is a kind of to contribute to the code input method of correct writing Chinese characters
CN106959764B (en) * 2016-07-19 2019-10-22 敬永权 A kind of code input method facilitating correct writing Chinese characters
CN112507734A (en) * 2020-11-19 2021-03-16 南京大学 Roman Uygur language-based neural machine translation system
CN112507734B (en) * 2020-11-19 2024-03-19 南京大学 Neural machine translation system based on romanized Uygur language

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10743366

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112 (1) EPC (EPO FORM 1205A DATED 16/01/2012)

122 Ep: pct application non-entry in european phase

Ref document number: 10743366

Country of ref document: EP

Kind code of ref document: A1