WO2005001675A2 - Algorithmic generation of Arabic, Farsi or Urdu calligraphy - Google Patents

Algorithmic generation of Arabic, Farsi or Urdu calligraphy

Info

Publication number
WO2005001675A2
Authority
WO
WIPO (PCT)
Prior art keywords
characters
glyphs
data
glyph
arabic
Prior art date
Application number
PCT/CA2004/000969
Other languages
English (en)
Other versions
WO2005001675A3 (fr)
Inventor
Syed S. Hyder
Original Assignee
Hyder Syed S
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyder Syed S
Publication of WO2005001675A2
Publication of WO2005001675A3

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B41PRINTING; LINING MACHINES; TYPEWRITERS; STAMPS
    • B41JTYPEWRITERS; SELECTIVE PRINTING MECHANISMS, i.e. MECHANISMS PRINTING OTHERWISE THAN FROM A FORME; CORRECTION OF TYPOGRAPHICAL ERRORS
    • B41J3/00Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed
    • B41J3/01Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed for special character, e.g. for Chinese characters or barcodes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/109Font handling; Temporal or kinetic typography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • G06F40/129Handling non-Latin characters, e.g. kana-to-kanji conversion
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/22Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of characters or indicia using display control signals derived from coded signals representing the characters or indicia, e.g. with a character-code memory
    • G09G5/24Generation of individual character patterns
    • G09G5/246Generation of individual character patterns of ideographic or arabic-like characters

Definitions

  • the invention relates to the field of algorithmic generation of Arabic-Farsi-Urdu (AFU) script or calligraphy, starting essentially from an alphabet character representation of a text.
  • AFU Arabic-Farsi-Urdu
  • the AFU languages use a cursive script with an alphabet of 28-36 characters.
  • a character of a Middle East language like Farsi and/or Arabic will have four possible forms, namely initial, medial, final and isolated. The form of a character depends on its context, that is, on the characters that precede and succeed it.
  • a method for generating the form of a character based on context sensitivity is described in "Electronic Digital System and Method for Reproducing Languages using the Arabic-Farsi Script", United States Patent No. 3,938,099 issued to Syed S. Hyder, dated February 10, 1976.
  • a method for processing a data string of Arabic text characters into Arabic calligraphic script representation data comprising: identifying words in the string; identifying a form of the characters in the words, the form comprising initial, medial, final and isolated; for the characters that are not of the isolated form, identifying a type of the characters as a function of compatibility with a type of a neighboring character; selecting, for each one of the characters in the data string, a glyph from a set of predetermined glyphs corresponding to the characters, the forms and the type; and determining a vertical offset for each glyph to match neighboring glyphs, the script representation data comprising glyph identification data and offset data for each character in the data string.
  • an apparatus for processing a data string of Arabic text characters output from an Arabic text source into Arabic calligraphic script representation data comprising: a word identification module receiving the data string and outputting a word; a form identification module receiving the word and outputting a form of the characters in the word, the form being one of initial, medial, final, and isolated; a type identification module receiving the form and the characters and outputting type data of the characters as a function of compatibility with a type of a neighboring character; a glyph identification module receiving the type data and the characters and selecting, for each one of the characters, a glyph from a set of predetermined glyphs corresponding to the characters, the form, and the type; and an offset determining module receiving the glyph and the characters and determining a vertical offset for the glyph to match neighboring glyphs and outputting the calligraphic script representation data.
  • an accented character can be represented by the accent character followed by the character to be accented.
  • In AFU, more than one diacritic may be applied to a character, with the additional diacritic being added to the first diacritic.
  • An example is the combination of the shadda diacritic and one of the tashkeel diacritics in which the tashkeel is placed over or under the shadda itself and not the letter.
  • FIG. 1 illustrates the AFU letters of the alphabet in their four forms (initial, medial, final, and detached) and their associated names
  • FIG. 2A to 2C are examples of glyph characters with varying forms and types for each glyph and their placement with respect to a point of origin
  • FIG. 3A and 3B are examples of diacritics and their placement with respect to a point of origin
  • FIG. 4 is a flow chart illustrating in greater detail the determination of diacritic positioning
  • FIG. 5 is a flowchart illustrating the process of attribute matching
  • FIG. 6 is a flow chart illustrating the process of generating script data from character data according to the preferred embodiment
  • FIG. 7 is an illustration of what happens on screen as characters forming a single word are typed together
  • FIG. 8 is a block diagram of the apparatus according to one embodiment of the present invention.
  • FIG. 9 is a block diagram illustrating the apparatus as part of a system including a user device and a printer.
  • Figure 1 is a table illustrating the four forms of the Arabic script for each letter: initial, medial, final, and detached.
  • the following six letters, {Alif, Dal, Thal, Ra', Zay, Waw}, have the same medial and final form. This means that these letters cannot be joined with the letter that comes after them when they come in the middle or at the beginning of a word.
  • the second one is written in detached form.
  • Figures 2A to 2C are examples of glyph characters with various forms and types.
  • Figure 2A is a Ha' character in its initial form and of type 1. The O indicates the point of origin with respect to which the character is placed, as well as the point at which it will be joined with a succeeding character. Since the character is in initial form, there is no point at which it is joined to a preceding character.
  • Figure 2B is the Ra' character in its final form and in type 5. There is no joining point for a succeeding character since it is in final form. A preceding character is joined at point J.
  • Figure 2C is the Mim character in its medial form and in type 1. This character has a joining point for a preceding and a succeeding character, identified by P and S respectively.
  • Figures 3A and 3B are examples of diacritics, which are also considered to be characters by the system of the preferred embodiment.
  • Figure 3A is a sukun.
  • this diacritic is considered to be a glyph by itself, without link to any other glyph with which it could be used.
  • the sukun is placed above another glyph when used in a word.
  • FIG. 5 is a flowchart illustrating the process of attribute matching for the method of the preferred embodiment. It corresponds to the following example. Let {A} be the alphabet set of AFU and $c_i$ be a character so that $c_i \in A$,
  • $L_{i,j,k}$ as the left attribute of the character $c_i$ of form $j$ and type $k$.
  • a form of the characters in the words is identified, namely as initial, medial, final and isolated or detached.
  • a type of the characters is identified as a function of compatibility with a type of a neighboring character.
  • a glyph is selected from a set of predetermined glyphs corresponding to the characters, forms and type.
  • the glyphs are preferably designed to have a connection point (for characters to be calligraphically joined to neighboring characters) at a predetermined position within the glyph definition. Whether in a set position or not, the vertical offset for each glyph to match neighboring glyphs is determined.
  • the script representation data thus comprises glyph identification data and, if necessary, explicit offset data, for each character in the character data string.
  • Accents or diacritics preferably involve a specification of an offset parameter for the diacritics with respect to the letter to be accented.
  • a glyph is a member of a set of types and ligatures, and a font is a combination of glyphs used for printing.
  • a synthesizer software selects the appropriate glyphs to generate the words of the AFU language as originally written by the calligrapher.
  • the algorithm for type definition is: 1. Perform a backward scan starting from a word separator, 2. the form of $c_1$ is final, else repeat 3. the form of $c_2$ is medial or initial, 4.
  • combinations of diacritics are generated by the synthesizer and do not require additional font space.
  • a diacritic compiler disallows unacceptable combinations of diacritics.
  • Glyphs can be modified as required without concern to ligatures that are not used.
  • New fonts can be easily developed as only the required glyphs are defined for a font.
  • the character placement must allow for context-dependent height and width positioning, so that, for instance, if $c_1, \ldots, c_n$ are characters that link in sequence, they are stacked vertically, up or down, with respect to the successor (or predecessor), so that the height of the end of a character $c_i$ is the beginning of the next character $c_{i+1}$.
  • FIG. 8 is a block diagram illustrating the processing apparatus 21.
  • An Arabic text source 20 feeds a data string into the apparatus 21, received by a word identification module 22.
  • the word identification module 22 identifies the word in a string and feeds the word to the form identification module 24.
  • the form identification module 24 determines if each character in the word is of initial, medial, final, or isolated form and feeds the form associated with each character to the type identification module 26.
  • the type data and its associated character is then fed to the glyph identification module 28, where a glyph is selected from a set of predetermined glyphs that corresponds to the type and form for each character.
  • the glyph and all its relevant information is input into the offset determining module 30, where the glyph and offset are combined to produce and output the calligraphic script representation data.
  • This data can be sent to a database 32 for storage, to a display module for display on a screen (not shown), or directly to a printing device for printing onto paper.
  • the type identification module 26 identifies a best match of attributes between glyphs available in a set of glyphs for a form of a character, the best match corresponding to a visualization of a calligrapher.
  • the word identification module 22 identifies diacritics as separate characters in the string and associates the diacritics to separate glyphs in the set of predetermined glyphs.
  • the offset determining module 30 determines an offset position of each diacritic to be associated with a glyph representing a letter. The word identification module 22 checks for unacceptable combinations of diacritics and disallows them.
  • Figure 9 illustrates the different emplacements for the processing apparatus 21.
  • the image/text translator module 44 is an equivalent to a page definition language compiler.
  • the present invention can be an extension to PostScript™ as it can output to the printing controller 46 the same type of information as if a page definition language were used. All the information required for the printer to place the glyphs on the page is present, namely glyphs (including form and type) and offsets.
  • the processing apparatus 21 can be a plug-in to an internet browser. It can be a web browser comprising a translator that takes standard HTML text and converts it onscreen to calligraphic script representation data. It should be noted that the present invention can be carried out as a method, can be embodied in a system, a computer readable medium or an electrical or electro-magnetic signal.
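The form rules described in the definitions above (four forms per character, and six letters that never join to the letter after them) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name identify_forms, the NON_JOINING set, and the assumption that the input word contains only letters (no diacritics, spaces or digits) are hypothetical choices made for the example.

```python
# Hypothetical sketch of form identification for a single word.
# Assumption: `word` contains letters only; diacritics are handled elsewhere.

# The six letters that cannot be joined to the letter that follows them:
# Alif, Dal, Thal, Ra', Zay, Waw.
NON_JOINING = {
    "\u0627",  # Alif
    "\u062f",  # Dal
    "\u0630",  # Thal
    "\u0631",  # Ra'
    "\u0632",  # Zay
    "\u0648",  # Waw
}


def identify_forms(word):
    """Return one of 'initial', 'medial', 'final', 'isolated' per letter."""
    forms = []
    for i, ch in enumerate(word):
        # A letter joins to its predecessor only if that predecessor
        # exists and is able to join forward.
        joins_prev = i > 0 and word[i - 1] not in NON_JOINING
        # A letter joins to its successor only if a successor exists
        # and the letter itself is able to join forward.
        joins_next = i < len(word) - 1 and ch not in NON_JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_next:
            forms.append("initial")
        elif joins_prev:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms


# Example: Kaf, Ta', Alif, Ba'. The Alif breaks the joining, so the
# letter after it is written in isolated (detached) form.
example = "\u0643\u062a\u0627\u0628"
print(identify_forms(example))
```

A non-joining letter that follows a joining letter is reported here as final, which is consistent with the statement that those six letters share one shape for their medial and final forms.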

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a method for processing a data string of Arabic text characters into Arabic calligraphic script representation data. The method comprises: identifying words in the data string; identifying the form of the characters forming the words, namely initial, medial, final or isolated; for the characters that are not of the isolated form, identifying a type of the characters as a function of their compatibility with a type of neighboring characters; selecting, for each one of the characters in the data string, a glyph from a set of predetermined glyphs corresponding to the characters, the forms and the type; and determining a vertical offset for each glyph so that it matches neighboring glyphs, the script representation data comprising glyph identification data and offset data for each character in the data string.
PCT/CA2004/000969 2003-06-30 2004-06-30 Algorithmic generation of Arabic, Farsi or Urdu calligraphy WO2005001675A2 (fr)
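The last step of the abstract, determining a vertical offset for each glyph so that it matches its neighbors and emitting glyph identification data plus offset data, can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the Glyph record, its entry_y and exit_y fields (heights of the joining points relative to the glyph's point of origin), the glyph identifiers, and the convention that the last glyph of a word sits at offset 0. The sketch also assumes that all glyphs passed in belong to one joined run of characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    glyph_id: str  # hypothetical identifier, e.g. "mim.medial.type1"
    entry_y: int   # height of the joint with the preceding glyph
    exit_y: int    # height of the joint with the succeeding glyph


def vertical_offsets(glyphs):
    """Scan backward from the end of the word and choose each offset so
    that a glyph's exit point meets the entry point of its successor."""
    offsets = [0] * len(glyphs)  # the last glyph stays at offset 0
    for i in range(len(glyphs) - 2, -1, -1):
        successor = glyphs[i + 1]
        # Require: offsets[i] + exit_y[i] == offsets[i+1] + entry_y[i+1]
        offsets[i] = offsets[i + 1] + successor.entry_y - glyphs[i].exit_y
    return offsets


def script_representation(glyphs):
    """Pair each glyph identifier with its vertical offset."""
    return list(zip((g.glyph_id for g in glyphs), vertical_offsets(glyphs)))


# Illustrative data only; real joining heights would come from the font.
word = [
    Glyph("ha.initial", entry_y=0, exit_y=0),
    Glyph("mim.medial", entry_y=2, exit_y=0),
    Glyph("ra.final", entry_y=3, exit_y=0),
]
print(script_representation(word))
```

With this made-up data each earlier glyph ends up higher than the one after it, so the word descends toward its final letter. Scanning backward from the word end mirrors the backward scan mentioned in the description, but the exact offset rule used in the patent is not recoverable from this page.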

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US48318403P 2003-06-30 2003-06-30
US60/483,184 2003-06-30

Publications (2)

Publication Number Publication Date
WO2005001675A2 (fr) 2005-01-06
WO2005001675A3 (fr) 2005-10-20

Family

ID=33552038

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2004/000969 WO2005001675A2 (fr) 2003-06-30 2004-06-30 Algorithmic generation of Arabic, Farsi or Urdu calligraphy

Country Status (1)

Country Link
WO (1) WO2005001675A2 (fr)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2449516A (en) * 2007-05-21 2008-11-26 Sherikat Link Letatweer Elbarm Transliteration of roman text to Arabic
EP2047381A2 (fr) * 2006-07-25 2009-04-15 Monotype Imaging Inc. Method and apparatus for creating a font subset
US8615709B2 (en) 2010-04-29 2013-12-24 Monotype Imaging Inc. Initiating font subsets
EP2804112A4 (fr) * 2012-01-09 2015-11-25 Jungha Ryu Method for editing character image in character image editing apparatus and recording medium having program recorded thereon for executing the method
US9626337B2 (en) 2013-01-09 2017-04-18 Monotype Imaging Inc. Advanced text editor
US9691169B2 (en) 2014-05-29 2017-06-27 Monotype Imaging Inc. Compact font hinting
US9805288B2 (en) 2013-10-04 2017-10-31 Monotype Imaging Inc. Analyzing font similarity for presentation
US9817615B2 (en) 2012-12-03 2017-11-14 Monotype Imaging Inc. Network based font management for imaging devices
US10115215B2 (en) 2015-04-17 2018-10-30 Monotype Imaging Inc. Pairing fonts for presentation
US10878271B2 (en) 2019-03-19 2020-12-29 Capital One Services, Llc Systems and methods for separating ligature characters in digitized document images
US10909429B2 (en) 2017-09-27 2021-02-02 Monotype Imaging Inc. Using attributes for identifying imagery for selection
US11334750B2 (en) 2017-09-07 2022-05-17 Monotype Imaging Inc. Using attributes for predicting imagery performance
US11537262B1 (en) 2015-07-21 2022-12-27 Monotype Imaging Inc. Using attributes for font recommendations
US11657602B2 (en) 2017-10-30 2023-05-23 Monotype Imaging Inc. Font identification from imagery

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9319444B2 (en) 2009-06-22 2016-04-19 Monotype Imaging Inc. Font data streaming
US9569865B2 (en) 2012-12-21 2017-02-14 Monotype Imaging Inc. Supporting color fonts

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4680710A (en) * 1984-11-19 1987-07-14 Kizilbash Akeel H Computer composition of nastaliq script of the urdu group of languages
GB2208556A (en) * 1987-08-12 1989-04-05 Linotype Limited Printing
US5416898A (en) * 1992-05-12 1995-05-16 Apple Computer, Inc. Apparatus and method for generating textual lines layouts
US6288726B1 (en) * 1997-06-27 2001-09-11 Microsoft Corporation Method for rendering glyphs using a layout services library

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2047381A2 (fr) * 2006-07-25 2009-04-15 Monotype Imaging Inc. Method and apparatus for creating a font subset
EP2047381A4 (fr) * 2006-07-25 2011-01-26 Monotype Imaging Inc Method and apparatus for creating a font subset
US8201088B2 (en) 2006-07-25 2012-06-12 Monotype Imaging Inc. Method and apparatus for associating with an electronic document a font subset containing select character forms which are different depending on location
GB2449516A (en) * 2007-05-21 2008-11-26 Sherikat Link Letatweer Elbarm Transliteration of roman text to Arabic
US10572574B2 (en) 2010-04-29 2020-02-25 Monotype Imaging Inc. Dynamic font subsetting using a file size threshold for an electronic document
US8615709B2 (en) 2010-04-29 2013-12-24 Monotype Imaging Inc. Initiating font subsets
US10510168B2 (en) 2012-01-09 2019-12-17 Jungha Ryu Method for editing character image in character image editing apparatus and recording medium having program recorded thereon for executing the method
EP2804112A4 (fr) * 2012-01-09 2015-11-25 Jungha Ryu Method for editing character image in character image editing apparatus and recording medium having program recorded thereon for executing the method
US9817615B2 (en) 2012-12-03 2017-11-14 Monotype Imaging Inc. Network based font management for imaging devices
US9626337B2 (en) 2013-01-09 2017-04-18 Monotype Imaging Inc. Advanced text editor
US9805288B2 (en) 2013-10-04 2017-10-31 Monotype Imaging Inc. Analyzing font similarity for presentation
US9691169B2 (en) 2014-05-29 2017-06-27 Monotype Imaging Inc. Compact font hinting
US10115215B2 (en) 2015-04-17 2018-10-30 Monotype Imaging Inc. Pairing fonts for presentation
US11537262B1 (en) 2015-07-21 2022-12-27 Monotype Imaging Inc. Using attributes for font recommendations
US11334750B2 (en) 2017-09-07 2022-05-17 Monotype Imaging Inc. Using attributes for predicting imagery performance
US10909429B2 (en) 2017-09-27 2021-02-02 Monotype Imaging Inc. Using attributes for identifying imagery for selection
US11657602B2 (en) 2017-10-30 2023-05-23 Monotype Imaging Inc. Font identification from imagery
US10878271B2 (en) 2019-03-19 2020-12-29 Capital One Services, Llc Systems and methods for separating ligature characters in digitized document images
US11710331B2 (en) 2019-03-19 2023-07-25 Capital One Services, Llc Systems and methods for separating ligature characters in digitized document images

Also Published As

Publication number Publication date
WO2005001675A3 (fr) 2005-10-20

Similar Documents

Publication Publication Date Title
WO2005001675A2 (fr) Algorithmic generation of Arabic, Farsi or Urdu calligraphy
US5404436A (en) Computer method and apparatus for converting compressed characters for display in full size
JP4311365B2 (ja) Document processing apparatus and program
JPH0798765A (ja) Direction detection method and image analysis apparatus
CN104133809B (zh) Glyph bolding method
JP2011141749A (ja) Document image generation apparatus, document image generation method, and computer program
EP1093078B1 (fr) Reducing the difference in appearance between coded and non-coded units of text
JP3242511B2 (ja) Character generation apparatus and character generation method
JPH0725068A (ja) Character generation method and apparatus therefor
JP2006276905A (ja) Translation apparatus, image processing apparatus, image forming apparatus, translation method, and program
KR20220159065A (ko) Service providing apparatus and method for providing typefaces
JPH05265429A (ja) Character font creation processing system
CN117391045B (zh) Method for outputting portable document format files with copyable Mongolian text
WO2022145343A1 (fr) Architecture for digitizing documents using multi-model deep learning, and document image processing program
JP4919245B2 (ja) Line typesetting apparatus, line typesetting program, and recording medium on which it is recorded
Sherif et al. Parameterized Arabic font development for AlQalam
JP2009187168A (ja) Information processing apparatus and information processing program
Abudena et al. Toward a novel module for computerizing Quran’s full-script writing
TW575840B (en) Invisible decoding computer character pattern of Chinese character or non-syllabic character
Ross Digital typeface design and font development for twenty-first century bangla language processing
JP2002072999A (ja) Font creation apparatus and method
JPH03199061A (ja) Method for slanting a font
JPH08153092A (ja) Document processing apparatus
KR20010091682A (ko) Method for displaying tones in Chinese documents for internet learning sites
JP2691871B2 (ja) Comp (comprehensive layout) creation apparatus using dummy characters

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DPEN Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed from 20040101)
122 EP: PCT application non-entry in European phase