CA1258135A - Data stream shaping of arabic characters - Google Patents

Data stream shaping of arabic characters

Info

Publication number
CA1258135A
CA1258135A CA000507434A CA507434A CA1258135A CA 1258135 A CA1258135 A CA 1258135A CA 000507434 A CA000507434 A CA 000507434A CA 507434 A CA507434 A CA 507434A CA 1258135 A CA1258135 A CA 1258135A
Authority
CA
Canada
Prior art keywords
characters
arabic
script
data
basic
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000507434A
Other languages
French (fr)
Inventor
Derek K.W. Smith
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IBM Canada Ltd
Original Assignee
IBM Canada Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/111Mathematical or scientific formatting; Subscripts; Superscripts

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

An apparatus for modifying encoded Arabic characters transmitted as part of a data stream comprises a ripple-through buffer, where the characters stored temporarily therein are analysed to determine if they are Arabic alphabet characters and are modified to convert them to Arabic script characters by means of logic processing, preferably by a microprocessor. The inverse, simpler, operation is also possible, where script characters are modified into basic shape characters.

Description

DATA STREAM SHAPING OF ARABIC CHARACTERS
FIELD OF THE INVENTION

The present invention relates to data-stream processing of Arabic alphabet characters. It provides apparatus for converting an input data stream containing basic, unconcatenated, Arabic words into an output data stream wherein the Arabic letter shapes required for proper concatenation in words have been substituted for the basic shapes originally transmitted.

BACKGROUND AND PRIOR ART OF THE INVENTION

The problem of converting the basic shape of an Arabic letter into its context sensitive proper shape within an Arabic word is not trivial. A review of the background to this problem is included in the commonly assigned Canadian Patent No. 1,207,905, of F. Metwaly, issued July 15, 1986, and entitled "Method and System for the Generation of Arabic Script".

The prior art patents, mentioned in Canadian Patent 1,207,905, although not germane to the particular problem addressed by the present invention, demonstrate the degree of complexity that was necessary to produce Arabic script from basic letter shapes. At the very least, as Canadian Patent No. 1,044,806 issued December 19, 1978 to S.S. Hyder demonstrates, it was necessary to examine an Arabic character in the context of the character preceding it and that succeeding it before deciding on its script or display shape.

A disadvantage of the prior art solutions is that they are too complex, at least for application within a communication data stream.

Because Arabic characters are entered into a computer system, stored, manipulated and transmitted in their basic shape format, it is necessary to process them every time before display or printing to produce readable Arabic script. A host computer may be communicating with subordinate devices. It is desirable to interpose a simple device to modify the data stream so that no further processing is necessary before display or printing at the subordinate device.
SUMMARY OF THE INVENTION
The present invention provides a simple apparatus which interrupts a data stream, processes two sequential characters at a time, and outputs a modified data stream delayed by the duration of two characters. Neither the transmitter nor the receiver of the data stream is interfered with adversely.
The basic Arabic alphabet has twenty-eight letters. For simple Arabic script suitable for business and similar environments, seventeen letters may each assume one of two shapes, two letters may each assume one of three shapes, and two letters may each assume one of four shapes. The remaining letters have one shape only. An expanded keyboard alphabet may contain, as distinct characters, some of the basic letters but uniquely shaped.
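The letter counts above can be checked with a little arithmetic. The following is only an illustrative sketch: the variable names are hypothetical, and the total covers the twenty-eight letters of the basic alphabet, not the expanded keyboard characters.

```python
# Letter counts as stated in the text: number of shapes -> number of letters.
letters_by_shape_count = {2: 17, 3: 2, 4: 2}
ALPHABET_SIZE = 28

# Letters that can take more than one shape, and those with a single shape.
multi_shape_letters = sum(letters_by_shape_count.values())
single_shape_letters = ALPHABET_SIZE - multi_shape_letters

# Total number of distinct letter shapes implied by those counts.
total_shapes = (
    sum(shapes * count for shapes, count in letters_by_shape_count.items())
    + single_shape_letters
)

print(multi_shape_letters, single_shape_letters, total_shapes)
```

Under these counts, twenty-one letters change shape, seven do not, and the alphabet needs fifty-five letter shapes in total.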
The apparatus performs a mapping operation, mapping the basic alphabet, or in practice the basic shape code page, into the Arabic font page. Both the basic code page and the font page, of course, contain numerals, in Arabic and Latin scripts, and many other non-Arabic characters such as the Latin alphabet, all of which do not change shape and are treated as stand-alone characters.
In the preferred embodiment, the basic code and font pages are matrices wherein each character is identified by a unique ASCII code point at the intersection of a row and a column in the FxF matrices (in hexadecimal notation).
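As a minimal sketch of this addressing, an eight-bit code can be split into two hexadecimal digits, each in the range 0 to F. The text does not say which digit selects the row and which the column, so the function below (a hypothetical name) simply returns the high and low digits.

```python
def split_code_point(code: int) -> tuple[int, int]:
    """Split an 8-bit code into its (high, low) hexadecimal digits."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in eight bits")
    return code >> 4, code & 0x0F


# F1 is the code the text later gives for the character called "shadda".
high, low = split_code_point(0xF1)
print(f"{high:X} {low:X}")
```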
The present invention provides apparatus for modifying a data stream, which includes data words encoding basic Arabic characters, to generate a delayed data stream wherein basic Arabic characters are modified into Arabic script characters, comprising: a data buffer having a serial input and a serial output for receiving and outputting data, respectively; means for assigning in a predetermined manner one of two logic states to one of two consecutive characters stored temporarily in said data buffer; and means responsive to said two consecutive characters and to said one of two logic states for modifying some characters while temporarily stored in said data buffer in a predetermined manner, whereby basic Arabic characters are received and Arabic script characters are output.
BRIEF DESCRIPTION OF THE DRAWINGS
The preferred embodiment of the invention will now be described in conjunction with the annexed drawings in which:
Figure 1 shows the basic Arabic ASCII code page from which all numerals and other characters have been omitted for clarity;
Figure 2 shows the Arabic ASCII font page from which all numerals and other characters have been omitted for clarity; and
Figure 3 shows a block diagram of an apparatus according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring now to Figure 1 of the drawings, the basic Arabic alphabet is represented in the ASCII code page (matrix) by thirty-six code points. Henceforth each character will be referred to by its ASCII code in hexadecimal notation; for example, the rightmost character (called "shadda") is referred to as F1. Some of these basic characters will change shape when incorporated in a word, depending on where in the word they are located. In Figure 2 of the drawings, the permissible variations of that particular font are represented as code points in the ASCII font page. The thirty-six basic characters of Figure 1 retain their code positions in the matrix of Figure 2. Both code pages are industry standards, and contain numerals, Latin alphabet characters and other characters, which are not of particular concern to this invention. As will be seen later, all non-Arabic characters, including numerals, are treated as stand-alone characters and their codes remain unaltered.
The purpose of the apparatus, shown in Figure 3, is to map input data-stream characters, representing the basic Arabic characters of Figure 1, onto the font characters of Figure 2, which then form the output data stream.
The apparatus in Figure 3 comprises a ripple-through buffer or register 10, which has a serial input 11 and a serial output 12. The buffer 10 is capable of holding two characters of eight bits each, eight bits being the necessary number of bits to specify a code point in the 16x16 matrices of Figures 1 and 2. The buffer 10 also has parallel inputs 13 and 13' and parallel outputs 14 and 14' giving parallel access to the bit positions of a current character (CC), having just been fully entered from the input 11, and giving parallel access to bit positions of a preceding character (PC), having just been fully transferred into the last eight bit positions of the buffer 10, respectively. The parallel output 14 is input to a state analysis logic 15, which determines whether the current character in the buffer 10 connects (concatenates) or not. If a character connects, it is assigned a state of logic 0; if it does not connect, it is assigned a state of logic 1. The state of CC is entered into a state register 16. A rule application logic 17 computes from the character codes in the buffer 10 and the states in the register 16 whether the characters in the buffer 10 should be altered, and if so into what characters of the font page one, the other, or both of CC and PC are converted. This updating of the characters stored momentarily in the buffer 10 is accomplished via data update bus 18 and the parallel inputs 13, 13'.
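The data path just described can be modelled in software. The following is a minimal sketch, not the patented hardware: it captures only the order of operations (state analysis on CC, rule application on CC and PC, then PC clocked out), not bit-level timing. All names are hypothetical, the end-of-stream flush is an assumption the text does not cover, and the toy rule set contains only the single worked example the patent gives, (CE) becoming (AF) before a numeral.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical signature: (pc, ps, cc, cs) -> (new_pc, new_cc)
Rules = Callable[[int, int, int, int], Tuple[int, int]]


def shape_stream(data: Iterable[int],
                 state_of: Callable[[int], int],
                 apply_rules: Rules) -> Iterator[int]:
    """Pass codes through a two-slot buffer; each code leaves after its successor is seen."""
    pc: Optional[int] = None   # PC slot, empty after initialisation
    ps = 1                     # state register starts with 1 states
    for cc in data:
        cs = state_of(cc)                         # state analysis on the CC slot
        if pc is not None:
            pc, cc = apply_rules(pc, ps, cc, cs)  # update CC and/or PC in place
            yield pc                              # PC is clocked out
        pc, ps = cc, cs                           # CC ripples into the PC slot
    if pc is not None:
        # Assumption: at end of stream, flush PC as if a non-connecting
        # space (0x20, state 1) had followed it.
        pc, _ = apply_rules(pc, ps, 0x20, 1)
        yield pc


def toy_state(code: int) -> int:
    """Toy state analysis: only (CE) is treated as connecting."""
    return 0 if code == 0xCE else 1


def toy_rules(pc: int, ps: int, cc: int, cs: int) -> Tuple[int, int]:
    """Toy rule set: (CE) followed by a non-connecting character becomes (AF)."""
    if pc == 0xCE and cs == 1:
        pc = 0xAF
    return pc, cc


print([f"{b:02X}" for b in shape_stream([0xCE, 0x31], toy_state, toy_rules)])
```

Passing the state analysis and the rule set in as functions mirrors the separation between logic 15 and logic 17, so either could be swapped for a table lookup.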
The state analysis logic 15 and the rule application logic 17 operate to implement the following logic/arithmetic equations, which map the code page of Figure 1 onto the font page of Figure 2 following the concatenation rules of Arabic script. It should be understood that these equations are specific to the particular code pages or matrices as shown in Figures 1 and 2, and, of course, to the rules of script of Arabic.
DEFINITIONS
(Note: In the following logic/arithmetic equations it is not necessary to distinguish between character codes of Figures 1 and 2, because those in Figure 1 occupy the same code points in Figure 2.)
CC means current character
PC means preceding character
CS means state of CC
PS means state of PC
State 0 means character connects.
State 1 means character does not connect.
All bracketed numbers denote hexadecimal ASCII codes.
STATE DETERMINATION EQUATIONS
CS = 0
If CC < (C2), then CS = 1
If (C?) > CC > (C3), then CS = 1
If (D3) > CC > (CE), then CS = 1
If (E1) > CC > (DA), then CS = 1
If CC = (E8), then CS = 1
If CC = (C9), then CS = 1
If CC = (E?), then CS = 1
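The state determination step can be sketched as a small function. This is a partial illustration only: it assumes the first rule reads CC < (C2), it implements just those rules whose codes are fully legible in the text, and it omits the two rules with an unreadable digit, so it may return 0 for some characters that the full equations would mark 1. The function name is hypothetical.

```python
def current_state(cc: int) -> int:
    """Return 0 if cc connects to its successor, 1 otherwise (legible rules only)."""
    if cc < 0xC2:              # assumed reading of the first rule
        return 1
    if 0xCE < cc < 0xD3:       # (D3) > CC > (CE)
        return 1
    if 0xDA < cc < 0xE1:       # (E1) > CC > (DA)
        return 1
    if cc in (0xE8, 0xC9):     # CC = (E8) or CC = (C9)
        return 1
    return 0                   # default: CS = 0, the character connects


for code in (0x31, 0xC8, 0xD1, 0xE4, 0xE8):
    print(f"{code:02X} -> {current_state(code)}")
```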
STATE CHANGE EQUATIONS
If PC = (E9), then PS = 1
If PC = (C7), then PS = 1
If PC = (C2), then PS = 1
If PC = (C3), then PS = 1
CURRENT CHARACTER EQUATIONS
State of CS = 0
If CC = (E7) and PS = 0, then CC = (F4)
If CC = (D9) and PS = 0, then CC = (EC)
If CC = (DA) and PS = 0, then CC = (F7)
If CC = (C7) and PS = 0, then CC = (?3)
If CC = (C2) and PS = 0, then CC = (?2)
If CC = (C3) and PS = 0, then CC = (??)
State of CS = 1
If CC = (C4), then CC = (C4)
If CC = (C6), then CC = (C6)
If CC = (C9), then CC = (C9)
If CC = (CF), then CC = (CF)
If CC = (D0), then CC = (D0)
If CC = (D1), then CC = (D1)
If CC = (D2), then CC = (D2)
If CC = (E8), then CC = (E8)
PRECEDING CHARACTER EQUATIONS
State of CS = 0
If CC = (C7) and PC = (E4) and PS = 0, then PC = (9E)
If CC = (C2) and PC = (E4) and PS = 0, then PC = (F?)
If CC = (C3) and PC = (E4) and PS = 0, then PC = (9?)
If CC = (C7) and PC = (E4) and PS = 1, then PC = (9?)
If CC = (C2) and PC = (E4) and PS = 1, then PC = (F9)
If CC = (C3) and PC = (E4) and PS = 1, then PC = (?9)
State of CS = 1
If PC = (C8), then PC = (A9)
If PC = (CA), then PC = (AA)
If PC = (CB), then PC = (AB)
If PC = (CC), then PC = (AD)
If PC = (CD), then PC = (AE)
If PC = (CE), then PC = (AF)
If PC = (D3), then PC = (BC)
If PC = (D4), then PC = (BD)
If PC = (D5), then PC = (BE)
If PC = (D6), then PC = (EB)
If PC = (E1), then PC = (BA)
If PC = (E2), then PC = (F8)
If PC = (E3), then PC = (FC)
If PC = (E4), then PC = (FB)
If PC = (E5), then PC = (EF)
If PC = (E6), then PC = (F2)
If PC = (F4), then PC = (F3)
If PC = (E7), then PC = (F3)
If PC = (EC), then PC = (C5)
If PC = (D9), then PC = (DF)
If PC = (F7), then PC = (ED)
If PC = (DA), then PC = (EE)
If PC = (C7) and PS = 0, then PC = (A8)
If PC = (C2) and PS = 0, then PC = (A2)
If PC = (C3) and PS = 0, then PC = (A5)
If PC = (E9) and PS = 0, then PC = (F5)
If PC = (EA) and PS = 0, then PC = (F?)
If PC = (EA) and PS = 1, then PC = (FD)
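The unconditional preceding-character rules for CS = 1 amount to a table lookup, which might be sketched as follows. The table holds only the entries that are legible in the text and that do not depend on PS; the PS-dependent rules and the CS = 0 rules are deliberately left out. The names `TERMINAL_SHAPE` and `terminal_shape` are hypothetical.

```python
# Legible, PS-independent "State of CS = 1" rules: basic or medial code -> font code.
TERMINAL_SHAPE = {
    0xC8: 0xA9, 0xCA: 0xAA, 0xCB: 0xAB, 0xCC: 0xAD, 0xCD: 0xAE, 0xCE: 0xAF,
    0xD3: 0xBC, 0xD4: 0xBD, 0xD5: 0xBE, 0xD6: 0xEB,
    0xE1: 0xBA, 0xE2: 0xF8, 0xE3: 0xFC, 0xE4: 0xFB, 0xE5: 0xEF, 0xE6: 0xF2,
    0xF4: 0xF3, 0xE7: 0xF3, 0xEC: 0xC5, 0xD9: 0xDF, 0xF7: 0xED, 0xDA: 0xEE,
}


def terminal_shape(pc: int, cs: int) -> int:
    """Return the code PC is rewritten to, given the state of the current character."""
    if cs == 1:
        # The successor does not connect, so PC takes its table entry if it has one.
        return TERMINAL_SHAPE.get(pc, pc)
    return pc  # CS = 0 rules are not modelled in this sketch


print(f"{terminal_shape(0xC8, 1):02X}")
```

A dictionary keeps the sketch close to the look-up table alternative the description mentions; characters without an entry pass through unchanged.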
OPERATION
Any character that is not one of the basic thirty-six characters shown in Figure 1 is automatically assigned a state of 1 (i.e. it does not connect) and passes through the buffer 10 without alteration. Each of the remaining (Arabic) characters, as it is fully entered in the CC position in the buffer 10, is assigned either a state of 0 or 1, depending on whether the character is capable of connection to the character succeeding it, i.e. the character to the left of it (remember that Arabic is written from right to left). These assignments of a state may be accomplished by means of a look-up table stored in a ROM, or by a logic circuit implementing the state determination equations mentioned above.
A connectable character that has rippled through into the PC position in the buffer 10 is altered into its terminal shape if followed in the CC position by any non-connecting character, which, of course, includes word delimiters. For example, the character (CE), if followed by a numeral, will be clocked out of the buffer 10 after having been updated via bus 18 into the character (AF).
The device is initialized by clearing the buffer 10 and assigning 1 states. As the first CC is clocked in, its state is determined. As CC becomes PC its state moves into second position in the state register 16. If CC has been assigned a state of 1, it passes unaltered into the PC position. If, however, CC has been assigned a state of 0 and PS (the state of PC) is 0, then CC will be updated while still in the CC position, as is determined by the current character equations.
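The update of CC while it is still in the CC position might be sketched as below. Only the three current-character rules whose output codes are fully legible in the text are included, so this is an incomplete illustration; the names are hypothetical.

```python
# Legible "State of CS = 0" current-character rules: basic code -> font code.
CURRENT_SHAPE = {0xE7: 0xF4, 0xD9: 0xEC, 0xDA: 0xF7}


def update_current(cc: int, cs: int, ps: int) -> int:
    """Return the code CC holds after the current character equations are applied."""
    if cs == 1:
        return cc                          # state 1: passes unaltered into PC
    if ps == 0:
        return CURRENT_SHAPE.get(cc, cc)   # both states 0: CC may be reshaped
    return cc                              # PC does not connect: CC is left as is


print(f"{update_current(0xE7, 0, 0):02X}")
```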
The logic/arithmetic equations given above are most efficiently implemented by means of a microprocessor. But it is equally possible to implement the equations by means of look-up tables stored in read-only memories.
As shown in the preferred embodiment, an input character maps into exactly one output character. It is sometimes desirable to have better script resolution by having some script characters occupy two character slots; for example, when mapping the input (D?) into the output (B?) plus its "tail" (9F). In such a case, it would be necessary to have two character registers for each of CC and PC, that is, to double the size of the ripple-through buffer 10. However, this would necessitate the speeding up of the bit rate of the output data stream.
The reverse mapping operation is also possible and sometimes necessary, wherein script characters are mapped back into basic (keyboard) characters. As will be appreciated, such reverse operation is much simpler to implement and may be carried out with the same or simpler apparatus with simple mapping equations.
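One way to see why the reverse operation is simpler: each script code corresponds to a single basic code, so no neighbouring character or state needs to be examined. The sketch below inverts a handful of the forward rules given earlier; it is an assumption about how such a reverse map could look, not the patent's own equations, and the names are hypothetical.

```python
# Script code -> basic code, obtained by inverting a few forward rules:
# (C8)->(A9), (CE)->(AF), (E7)->(F4)->(F3), (D9)->(EC)->(C5), (D9)->(DF).
SCRIPT_TO_BASIC = {
    0xA9: 0xC8,
    0xAF: 0xCE,
    0xF4: 0xE7, 0xF3: 0xE7,
    0xEC: 0xD9, 0xC5: 0xD9, 0xDF: 0xD9,
}


def unshape(data: bytes) -> bytes:
    """Map script codes back to basic codes; all other codes pass through unchanged."""
    return bytes(SCRIPT_TO_BASIC.get(b, b) for b in data)


print(unshape(bytes([0xAF, 0x31])).hex())
```

Several script codes collapse onto one basic code, which is why a context-free table is enough in this direction.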


Claims (6)

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:
1. Apparatus for modifying an input data stream, which includes data words encoding Arabic alphabet characters, to generate an output stream wherein said Arabic alphabet characters are modified in a predetermined manner into differently shaped corresponding alphabet characters, comprising: a ripple-through buffer for storing two characters of said input data stream at a time; logic processing means for identifying said two characters; and data update means for modifying said two characters in a predetermined manner.
2. Apparatus as defined in claim 1, wherein said data update means modifies said two characters such that basic Arabic alphabet characters are modified into Arabic alphabet script characters according to Arabic rules of script.
3. Apparatus as defined in claim 1, wherein said data update means modifies said two characters such that Arabic alphabet script characters are modified into basic Arabic alphabet script characters in a predetermined manner.
4. Apparatus for modifying a data stream, which includes data words encoding basic Arabic characters, to generate a delayed data stream wherein basic Arabic characters are modified into Arabic script characters, comprising: a data buffer having a serial input and a serial output for receiving and outputting data, respectively; logic analyses means for assigning in predetermined manner one of two logic states to one of two consecutive characters stored temporarily in said data buffer, and logic processing means responsive to said two consecutive characters and to said one of two logic states for modifying some characters while temporarily stored in said data buffer in a predetermined manner whereby basic Arabic characters are received and Arabic script characters are output.
5. Apparatus as defined in claims 1, 2 or 4, said logic processing means being a microprocessor.
6. Apparatus as defined in claim 4, said logic analysis means being a look-up table stored in a memory, and said logic processing means being a microprocessor.
CA000507434A 1986-04-24 1986-04-24 Data stream shaping of arabic characters Expired CA1258135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA000507434A CA1258135A (en) 1986-04-24 1986-04-24 Data stream shaping of arabic characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA000507434A CA1258135A (en) 1986-04-24 1986-04-24 Data stream shaping of arabic characters

Publications (1)

Publication Number Publication Date
CA1258135A true CA1258135A (en) 1989-08-01

Family

ID=4132955

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000507434A Expired CA1258135A (en) 1986-04-24 1986-04-24 Data stream shaping of arabic characters

Country Status (1)

Country Link
CA (1) CA1258135A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2691819A1 (en) * 1992-05-12 1993-12-03 Apple Computer Computer based typesetting system - uses processor to operate on indexed fonts in memory taking account of context



Legal Events

Date Code Title Description
MKEX Expiry