WO2005001675A2 - Algorithmic generation of Arabic, Farsi or Urdu calligraphy - Google Patents
Algorithmic generation of Arabic, Farsi or Urdu calligraphy
- Publication number
- WO2005001675A2 (application PCT/CA2004/000969)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- characters
- glyphs
- data
- glyph
- arabic
- Prior art date
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B41—PRINTING; LINING MACHINES; TYPEWRITERS; STAMPS
- B41J—TYPEWRITERS; SELECTIVE PRINTING MECHANISMS, i.e. MECHANISMS PRINTING OTHERWISE THAN FROM A FORME; CORRECTION OF TYPOGRAPHICAL ERRORS
- B41J3/00—Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed
- B41J3/01—Typewriters or selective printing or marking mechanisms characterised by the purpose for which they are constructed for special character, e.g. for Chinese characters or barcodes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/109—Font handling; Temporal or kinetic typography
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
- G06F40/129—Handling non-Latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09G—ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
- G09G5/00—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
- G09G5/22—Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of characters or indicia using display control signals derived from coded signals representing the characters or indicia, e.g. with a character-code memory
- G09G5/24—Generation of individual character patterns
- G09G5/246—Generation of individual character patterns of ideographic or arabic-like characters
Definitions
- the invention relates to the field of algorithmic generation of Arabic-Farsi-Urdu (AFU) script or calligraphy, starting essentially from an alphabet character representation of a text.
- AFU: Arabic-Farsi-Urdu
- the AFU languages use a cursive script with an alphabet of 28-36 characters.
- a character of a Middle Eastern language such as Farsi or Arabic has four possible forms, namely initial, medial, final and isolated. The form of a character depends on its context, that is, on the characters that precede and succeed it.
- a method for generating the form of a character based on context sensitivity is described in "Electronic Digital System and Method for Reproducing Languages using the Arabic-Farsi Script", United States Patent No. 3,938,099 issued to Syed S. Hyder, dated February 10, 1976.
- a method for processing a data string of Arabic text characters into Arabic calligraphic script representation data comprising: identifying words in the string; identifying a form of the characters in the words, the form comprising initial, medial, final and isolated; for the characters that are not of the isolated form, identifying a type of the characters as a function of compatibility with a type of a neighboring character; selecting, for each one of the characters in the data string, a glyph from a set of predetermined glyphs corresponding to the characters, the forms and the type; and determining a vertical offset for each glyph to match neighboring glyphs, the script representation data comprising glyph identification data and offset data for each character in the data string.
- an apparatus for processing a data string of Arabic text characters output from an Arabic text source into Arabic calligraphic script representation data comprising: a word identification module receiving the data string and outputting a word; a form identification module receiving the word and outputting a form of the characters in the word, the form being one of initial, medial, final, and isolated; a type identification module receiving the form and the characters and outputting type data of the characters as a function of compatibility with a type of a neighboring character; a glyph identification module receiving the type data and the characters and selecting, for each one of the characters, a glyph from a set of predetermined glyphs corresponding to the characters, the form, and the type; and an offset determining module receiving the glyph and the characters and determining a vertical offset for the glyph to match neighboring glyphs and outputting the calligraphic script representation data.
- an accented character can be represented by the accent character followed by the character to be accented.
- in AFU, more than one diacritic may be applied to a character, with the additional diacritic being added to the first diacritic.
- An example is the combination of the shadda diacritic and one of the tashkeel diacritics in which the tashkeel is placed over or under the shadda itself and not the letter.
- FIG. 1 illustrates the AFU letters of the alphabet in their four forms (initial, medial, final, and detached) and their associated names
- FIG. 2A to 2C are examples of glyph characters with varying forms and types for each glyph and their placement with respect to a point of origin
- FIG. 3A and 3B are examples of diacritics and their placement with respect to a point of origin
- FIG. 4 is a flow chart illustrating in greater detail the determination of diacritic positioning
- FIG. 5 is a flowchart illustrating the process of attribute matching
- FIG. 6 is a flow chart illustrating the process of generating script data from character data according to the preferred embodiment
- FIG. 7 is an illustration of what happens on screen as characters forming a single word are typed together
- FIG. 8 is a block diagram of the apparatus according to one embodiment of the present invention.
- FIG. 9 is a block diagram illustrating the apparatus as part of a system including a user device and a printer.
- Figure 1 is a table illustrating the four forms of the Arabic script for each letter: initial, medial, final, and detached.
- the following six letters, {Alif, Dal, Thal, Ra', Zay, Waw}, have the same medial and final form. This means that these letters cannot be joined to the letter that comes after them when they occur at the beginning or in the middle of a word.
- the second one is written in detached form.
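The joining rule described above can be sketched in a few lines. This is only an illustration, assuming Unicode Arabic code points and assuming that the six letters listed in the text are the only ones that do not join to their successor; the function name `contextual_forms` is hypothetical and does not come from the patent.

```python
# Sketch: derive the form (initial, medial, final, isolated) of each letter
# of a word from its neighbours. Assumption: only the six letters named in
# the text (Alif, Dal, Thal, Ra', Zay, Waw) refuse to join to the next letter.
NON_JOINING = {"\u0627", "\u062F", "\u0630", "\u0631", "\u0632", "\u0648"}


def contextual_forms(word):
    """Return one form name per letter of `word` (letters in reading order)."""
    forms = []
    for i, ch in enumerate(word):
        # A letter joins to its predecessor only if that predecessor can join forward.
        joins_prev = i > 0 and word[i - 1] not in NON_JOINING
        # A letter joins to its successor only if it is not itself non-joining.
        joins_next = i < len(word) - 1 and ch not in NON_JOINING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# Kaf, Ta', Alif, Ba': the Alif cannot join forward, so the Ba' stands alone.
print(contextual_forms("\u0643\u062A\u0627\u0628"))
# ['initial', 'medial', 'final', 'isolated']
```

In this word the letter after the Alif is written in its isolated (detached) form even though it ends the word, because the Alif does not join to its successor.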
- Figures 2A to 2C are examples of glyph characters with various forms and types.
- Figure 2A is a Ha' character in its initial form and of type 1. The O indicates the point of origin with respect to which the character is placed, as well as the point at which it will be joined with a succeeding character. Since the character is in initial form, there is no point at which it is joined to a preceding character.
- Figure 2B is the Ra' character in its final form and in type 5. There is no joining point for a succeeding character since it is in final form. A preceding character is joined at point J.
- Figure 2C is the Mim character in its medial form and in type 1. This character has a joining point for a preceding and a succeeding character, identified by P and S respectively.
- Figures 3A and 3B are examples of diacritics, which are also considered to be characters by the system of the preferred embodiment.
- Figure 3A is a sukun.
- this diacritic is considered to be a glyph by itself, without link to any other glyph with which it could be used.
- the sukun is placed above another glyph when used in a word.
- FIG. 5 is a flowchart illustrating the process of attribute matching for the method of the preferred embodiment. It corresponds to the following example. Let $A$ be the alphabet set of AFU and $c_i$ be a character so that $c_i \in A$.
- define $L_{i,j,k}$ as the left attribute of the character $c_i$ of form $j$ and type $k$.
- a form of the characters in the words is identified, namely as initial, medial, final and isolated or detached.
- a type of the characters is identified as a function of compatibility with a type of a neighboring character.
- a glyph is selected from a set of predetermined glyphs corresponding to the characters, forms and type.
- the glyphs are preferably designed to have a connection point (for characters to be calligraphically joined to neighboring characters) at a predetermined position within the glyph definition. Whether in a set position or not, the vertical offset for each glyph to match neighboring glyphs is determined.
- the script representation data thus comprises glyph identification data and, if necessary, explicit offset data, for each character in the character data string.
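As a rough illustration of what the script representation data might look like, the sketch below pairs a glyph identifier with an explicit vertical offset for each character. The `GlyphRecord` class, the `GLYPH_SET` table, its glyph identifiers and the integer offset unit are all hypothetical; only the (letter, form, type) combinations are taken from the figure descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphRecord:
    glyph_id: str      # identifies the glyph selected for one character
    y_offset: int = 0  # explicit vertical offset (hypothetical font units)


# Hypothetical set of predetermined glyphs, keyed by (letter, form, type).
GLYPH_SET = {
    ("ha", "initial", 1): "ha.initial.1",
    ("mim", "medial", 1): "mim.medial.1",
    ("ra", "final", 5): "ra.final.5",
}


def script_representation(chars, offsets=None):
    """Map (letter, form, type) triples to glyph records with offsets."""
    if offsets is None:
        offsets = [0] * len(chars)
    return [GlyphRecord(GLYPH_SET[key], dy) for key, dy in zip(chars, offsets)]


records = script_representation(
    [("ha", "initial", 1), ("mim", "medial", 1), ("ra", "final", 5)],
    offsets=[0, 4, 9],
)
print(records[2])  # GlyphRecord(glyph_id='ra.final.5', y_offset=9)
```

Keeping the offset next to the glyph identifier means a display or printing stage needs no further knowledge of the neighbouring characters.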
- Accents or diacritics preferably involve a specification of an offset parameter for the diacritics with respect to the letter to be accented.
- a glyph is a member of a set of types and ligatures, and a font is a combination of glyphs used for printing.
- a synthesizer software selects the appropriate glyphs to generate the words of the AFU language as originally written by the calligrapher.
- the algorithm for type definition is: 1. Perform a backward scan starting from a word separator, 2. the form of $c_1$ is final, else repeat, 3. the form of $c_2$ is medial or initial, 4.
- combinations of diacritics are generated by the synthesizer and do not require additional font space.
- a diacritic compiler disallows unacceptable combinations of diacritics.
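A minimal sketch of such a diacritic check follows. The rule set is an assumption made for illustration: a single diacritic attaches to the letter, and the only permitted pair is a shadda plus one vowel mark, with the vowel mark anchored on the shadda rather than on the letter. The names `stack_diacritics` and `VOWEL_MARKS` are hypothetical, and the real set of acceptable combinations is not given in this excerpt.

```python
# Assumed vowel marks that may sit over or under a shadda.
VOWEL_MARKS = {"fatha", "damma", "kasra"}


def stack_diacritics(marks):
    """Return (mark, anchor) pairs, where anchor is 'letter' or 'shadda'.

    Raises ValueError for combinations this sketch treats as unacceptable.
    """
    if not marks:
        return []
    if len(marks) == 1:
        return [(marks[0], "letter")]
    others = [m for m in marks if m != "shadda"]
    if len(marks) == 2 and len(others) == 1 and others[0] in VOWEL_MARKS:
        # The shadda attaches to the letter; the vowel mark attaches to the shadda.
        return [("shadda", "letter"), (others[0], "shadda")]
    raise ValueError(f"unacceptable diacritic combination: {marks}")


print(stack_diacritics(["fatha", "shadda"]))
# [('shadda', 'letter'), ('fatha', 'shadda')]
```

Because the combination is assembled from two separate glyphs at run time, no extra precomposed glyph is needed in the font for the pair.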
- Glyphs can be modified as required without concern to ligatures that are not used.
- New fonts can be easily developed as only the required glyphs are defined for a font.
- the character placement must allow for context-dependent height and width positioning. For instance, if $c_1, \ldots, c_n$ are characters that link in sequence, they are stacked vertically, up or down, with respect to the successor (or predecessor), so that the height of the end of a character $c_i$ is the height of the beginning of the next character $c_{i+1}$.
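The height-chaining condition can be illustrated with a short sketch. It assumes each glyph exposes two heights relative to its own origin, one where the preceding character joins (`entry_y`) and one where the succeeding character joins (`exit_y`), and that the first glyph of the chain is anchored at offset 0. These names and the anchoring choice are assumptions, not taken from the patent.

```python
def vertical_offsets(joins):
    """Compute one vertical offset per glyph so that joining points meet.

    `joins` lists (entry_y, exit_y) per glyph in reading order. The exit
    height of each glyph must coincide with the entry height of the next.
    """
    if not joins:
        return []
    offsets = [0]  # assumption: the first glyph of the chain is the anchor
    for (_, exit_y), (entry_y, _) in zip(joins, joins[1:]):
        # offset[i+1] + entry_y[i+1] must equal offset[i] + exit_y[i]
        offsets.append(offsets[-1] + exit_y - entry_y)
    return offsets


print(vertical_offsets([(0, 30), (10, 25), (5, 0)]))  # [0, 20, 40]
```

A renderer that anchors the last glyph of a word on the baseline instead could subtract the final offset from every entry; the relative offsets stay the same.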
- FIG. 8 is a block diagram illustrating the processing apparatus 21.
- An Arabic text source 20 feeds a data string into the apparatus 21, received by a word identification module 22.
- the word identification module 22 identifies the word in a string and feeds the word to the form identification module 24.
- the form identification module 24 determines if each character in the word is of initial, medial, final, or isolated form and feeds the form associated with each character to the type identification module 26.
- the type data and its associated character are then fed to the glyph identification module 28, where a glyph is selected from a set of predetermined glyphs that corresponds to the type and form for each character.
- the glyph and all its relevant information is input into the offset determining module 30, where the glyph and offset are combined to produce and output the calligraphic script representation data.
- This data can be sent to a database 32 for storage, to a display module for display on a screen (not shown), or directly to a printing device for printing onto paper.
- the type identification module 26 identifies a best match of attributes between glyphs available in a set of glyphs for a form of a character, the best match corresponding to a visualization of a calligrapher.
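One way to picture attribute matching is a backward scan that picks, for each character, the type whose left attribute is closest to the attribute required by the character already chosen after it. The sketch below rests on several assumptions: attributes are modelled as plain numbers, each glyph also has a right attribute (this excerpt only defines the left one), the first listed type is the default when nothing constrains the choice, and the `CANDIDATES` values are invented for illustration.

```python
# Hypothetical candidates: (letter, form) -> {type: (right_attr, left_attr)}.
# right_attr is where the preceding letter joins, left_attr where the next
# letter joins; None means the glyph has no joining point on that side.
CANDIDATES = {
    ("ha", "initial"): {1: (None, 2), 2: (None, 0)},
    ("mim", "medial"): {1: (2, 1), 2: (0, 0)},
    ("ra", "final"): {5: (1, None), 1: (0, None)},
}


def assign_types(word):
    """word: list of (letter, form) in reading order; returns one type each."""
    types = [None] * len(word)
    needed = None  # right attribute of the successor that was just typed
    for i in range(len(word) - 1, -1, -1):  # backward scan from the word end
        options = CANDIDATES[word[i]]
        if needed is None:
            chosen = next(iter(options))  # assumption: first type is the default
        else:
            # Best match: left attribute closest to what the successor needs.
            chosen = min(options, key=lambda t: abs(options[t][1] - needed))
        types[i] = chosen
        needed = options[chosen][0]
    return types


print(assign_types([("ha", "initial"), ("mim", "medial"), ("ra", "final")]))
# [1, 1, 5]
```

This greedy scan is only one possible reading of "best match"; a fuller implementation could score whole words rather than committing to one character at a time.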
- the word identification module 22 identifies diacritics as separate characters in the string and associates the diacritics to separate glyphs in the set of predetermined glyphs.
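A word identification step of this kind could be sketched as follows, using Python's standard `unicodedata` module to tell diacritics from letters. The sketch assumes Unicode input in which a combining mark follows the letter it modifies, which may differ from the character ordering used by the system described here; the function name `split_words` is hypothetical.

```python
import unicodedata


def split_words(text):
    """Split text into words; each word is a list of (letter, [diacritics])."""
    words = []
    for token in text.split():  # whitespace acts as the word separator
        units = []
        for ch in token:
            # unicodedata.combining() is non-zero for combining marks
            # such as fatha (U+064E) or shadda (U+0651).
            if unicodedata.combining(ch) and units:
                units[-1][1].append(ch)
            else:
                units.append((ch, []))
        words.append(units)
    return words


sample = "\u0643\u064E\u062A\u0628 \u0628\u0651\u064E"
for word in split_words(sample):
    print([(letter, len(marks)) for letter, marks in word])
```

Each diacritic stays a separate character in the result, so a later stage can map it to its own glyph and compute its offset relative to the letter.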
- the offset determining module 30 determines an offset position of each diacritic to be associated with a glyph representing a letter. The word identification module 22 checks for unacceptable combinations of diacritics and disallows them.
- Figure 9 illustrates the different emplacements for the processing apparatus 21.
- the image/text translator module 44 is an equivalent to a page definition language compiler.
- the present invention can be an extension to PostScript™, as it can output to the printing controller 46 the same type of information as if a page definition language were used. All the information required for the printer to place the glyphs on the page is present, namely glyphs (including form and type) and offsets.
- the processing apparatus 21 can be a plug-in to an internet browser. It can be a web browser comprising a translator that takes standard HTML text and converts it onscreen to calligraphic script representation data. It should be noted that the present invention can be carried out as a method, can be embodied in a system, a computer readable medium or an electrical or electro-magnetic signal.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Document Processing Apparatus (AREA)
- Character Discrimination (AREA)
Abstract
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US48318403P | 2003-06-30 | 2003-06-30 | |
US60/483,184 | 2003-06-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2005001675A2 (fr) | 2005-01-06 |
WO2005001675A3 (fr) | 2005-10-20 |
Family
ID=33552038
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CA2004/000969 WO2005001675A2 (fr) | 2003-06-30 | 2004-06-30 | Algorithmic generation of Arabic, Farsi or Urdu calligraphy |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2005001675A2 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9319444B2 (en) | 2009-06-22 | 2016-04-19 | Monotype Imaging Inc. | Font data streaming |
US9569865B2 (en) | 2012-12-21 | 2017-02-14 | Monotype Imaging Inc. | Supporting color fonts |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4680710A (en) * | 1984-11-19 | 1987-07-14 | Kizilbash Akeel H | Computer composition of nastaliq script of the urdu group of languages |
GB2208556A (en) * | 1987-08-12 | 1989-04-05 | Linotype Limited | Printing |
US5416898A (en) * | 1992-05-12 | 1995-05-16 | Apple Computer, Inc. | Apparatus and method for generating textual lines layouts |
US6288726B1 (en) * | 1997-06-27 | 2001-09-11 | Microsoft Corporation | Method for rendering glyphs using a layout services library |
-
2004
- 2004-06-30 WO PCT/CA2004/000969 patent/WO2005001675A2/fr active Application Filing
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2047381A2 (fr) * | 2006-07-25 | 2009-04-15 | Monotype Imaging Inc. | Method and apparatus for creating a font subset |
EP2047381A4 (fr) * | 2006-07-25 | 2011-01-26 | Monotype Imaging Inc | Method and apparatus for creating a font subset |
US8201088B2 (en) | 2006-07-25 | 2012-06-12 | Monotype Imaging Inc. | Method and apparatus for associating with an electronic document a font subset containing select character forms which are different depending on location |
GB2449516A (en) * | 2007-05-21 | 2008-11-26 | Sherikat Link Letatweer Elbarm | Transliteration of roman text to Arabic |
US10572574B2 (en) | 2010-04-29 | 2020-02-25 | Monotype Imaging Inc. | Dynamic font subsetting using a file size threshold for an electronic document |
US8615709B2 (en) | 2010-04-29 | 2013-12-24 | Monotype Imaging Inc. | Initiating font subsets |
US10510168B2 (en) | 2012-01-09 | 2019-12-17 | Jungha Ryu | Method for editing character image in character image editing apparatus and recording medium having program recorded thereon for executing the method |
EP2804112A4 (fr) * | 2012-01-09 | 2015-11-25 | Jungha Ryu | Method for editing character image in character image editing apparatus and recording medium having program recorded thereon for executing the method |
US9817615B2 (en) | 2012-12-03 | 2017-11-14 | Monotype Imaging Inc. | Network based font management for imaging devices |
US9626337B2 (en) | 2013-01-09 | 2017-04-18 | Monotype Imaging Inc. | Advanced text editor |
US9805288B2 (en) | 2013-10-04 | 2017-10-31 | Monotype Imaging Inc. | Analyzing font similarity for presentation |
US9691169B2 (en) | 2014-05-29 | 2017-06-27 | Monotype Imaging Inc. | Compact font hinting |
US10115215B2 (en) | 2015-04-17 | 2018-10-30 | Monotype Imaging Inc. | Pairing fonts for presentation |
US11537262B1 (en) | 2015-07-21 | 2022-12-27 | Monotype Imaging Inc. | Using attributes for font recommendations |
US11334750B2 (en) | 2017-09-07 | 2022-05-17 | Monotype Imaging Inc. | Using attributes for predicting imagery performance |
US10909429B2 (en) | 2017-09-27 | 2021-02-02 | Monotype Imaging Inc. | Using attributes for identifying imagery for selection |
US11657602B2 (en) | 2017-10-30 | 2023-05-23 | Monotype Imaging Inc. | Font identification from imagery |
US10878271B2 (en) | 2019-03-19 | 2020-12-29 | Capital One Services, Llc | Systems and methods for separating ligature characters in digitized document images |
US11710331B2 (en) | 2019-03-19 | 2023-07-25 | Capital One Services, Llc | Systems and methods for separating ligature characters in digitized document images |
Also Published As
Publication number | Publication date |
---|---|
WO2005001675A3 (fr) | 2005-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
122 | Ep: pct application non-entry in european phase |