WO2000060560A1 - Text processing and display methods and systems - Google Patents

Text processing and display methods and systems

Publication number
WO2000060560A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
display
word
writing
standard
Prior art date
Application number
PCT/AU2000/000286
Other languages
French (fr)
Inventor
Mark Kevin O'Connor
Original Assignee
Connor Mark Kevin O
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Connor Mark Kevin O
Priority to AU35442/00A (patent AU780472B2)
Priority to GB0124973A (patent GB2364160A)
Publication of WO2000060560A1

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B17/00 Teaching reading
    • G09B17/003 Teaching reading electrically operated apparatus or devices
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B19/00 Teaching not covered by other main groups of this subclass
    • G09B19/06 Foreign languages

Definitions

  • This invention relates to computer-based text processors, text-processing methods, text display means and/or to means for assisting users improve their knowledge of or facility with a written language.
  • The invention may be applied, for example, to the teaching of writing and/or pronunciation to people learning new languages, to improve the writing skills and comprehension of native speakers and to familiarising native speakers and others with the spelling and pronunciation of technical or unusual terms. It is applicable to languages with alphabetic writing as well as to languages with ideographic writing, such as Chinese, and to languages with phonetic writing systems, such as Japanese Katakana and Hiragana.
  • the invention may also be used to teach sign languages, where they can be reproduced in written form.
  • Pictograph a picture or stylized picture standing for a word or word-element (morpheme), or possibly a phrase.
  • Pictographs are a sub-category of logographs.
  • logograph a single symbol representing an entire morpheme, word or phrase.
  • Synonym logogram. (Note: Most Chinese and Japanese characters contain phonetic elements. Hence they are better described as logographs than as ideographs or ideograms).
  • ideograph a symbol standing for a concept and not necessarily for a particular word or word-element in a given natural language (e.g. most mathematical symbols, or the ampersand symbol "&").
  • morpheme or word-element the smallest meaningful element of a word or word-shape. For instance, the word “unbutton” contains two morphemes: “un” and “button”.
  • word-shape the visual form that a word or a word-element or a short closely-connected phrase takes in a given writing-system. Note that in logographic systems it is fairly common for a single character or word-shape to stand for a short phrase like "all right" or "house fly”.
  • writing-system a coding system or a set of rules for representing in visual form the words and/or text of a language. Such a set of rules need not be logically consistent nor consistently applied nor free from exceptions.
  • a major component of the writing-system is the spelling-system, which is a set of rules for using the letters.
  • the writing system for representing the standard text of a language is thus referred to as the conventional or standard writing-system.
  • phonetic information which either goes beyond or corrects or clarifies that which is supplied or suggested in the conventional writing-system or spelling-system of a given language.
  • syllabary a system of phonetic symbols each of which represents an entire syllable. This is contrasted with an alphabet where the symbols (letters) normally represent phonemes.
  • phoneme one of the set of speech sounds in any given language that serve to distinguish one word from another.
  • /p/ and /b/ are separate phonemes in English because they distinguish such words as "pet" and "bet".
  • the light and dark /l/ sounds in "little" are allophones, not separate phonemes, since they may be transposed without changing meaning.
  • allophone a major variation within a phoneme. See under phoneme above.
  • Display-option any of a set of various writing-systems in which a text, and usually some information about its pronunciation, may be displayed, especially on an electric screen.
  • Display-options may be differentiated from each other by the writing-systems they employ to identify the words, as well as by the amount or type of additional phonetic information they make visible and the means they use to display or distinguish it. They are not differentiated by merely aesthetic choices, as for instance by font selection, or letter size, or page layout or paragraph style, though they may have specific default settings for these. Display-options are so called because the decision to use one or more of them will normally be at the discretion of the user.
  • converter algorithm an algorithm or program, associated with a specific display-option, that does the following: 1. carries out that display-option's characteristic selections from among the additional phonetic information and/or the graphic information that is available for a given text; 2. usually incorporates and assembles this information into a new version of the text, called processed text which contains all necessary information for a display-system to represent that text in a specific display-option; 3. upon request, engages the display-system and provides it with the relevant processed text or with the necessary graphic information.
  • color-coding of word shapes the use of color to indicate those parts or regions of a word-shape which are to be processed in a particular way, or to be considered in a particular way, or which correspond to particular parts of another word-shape.
  • line-coding of word shapes as for color-coding, but with the use of different types of line or curve (e.g. thinner/thicker, heavier/lighter, dotted/unbroken, pulsing/non-pulsing) instead of differences of color.
  • pulsing line or pulsing image a line or curve, or an image or a part of an image which is rendered more conspicuous by its doing any of the following:
  • cartouche a line (which may also be a "pulsing line”) that encloses or semi-encloses or brackets or otherwise indicates a word or a phrase or an image, or a section of a word or a section of an image. It thus indicates a portion of the text or of a word-shape which is to be processed differently or which is intended to receive different attention from other portions.
  • (A "cartouche" is a shape which surrounds and marks a royal name. Here it is given a wider definition.)
  • homonym a word-shape or spelling which has distinct and essentially unrelated meanings. These may or may not be pronounced differently. For instance: “wind”, “wound”, “read”, “lead”, “bat”, and “cleave”.
  • relevant homonym a homonym whose ambiguity needs to be resolved before the correct equivalent word-shape for it can be determined in a given alternative writing-system or display-option, or in a range of writing-systems or display-options. (Homonyms can often retain their multiple meanings under an alternative writing-system, in which case they are not "relevant homonyms" for conversion to that writing-system).
  • reconversion homonyms pairs or groups of words or word-shapes which are distinguished from each other in standard text, but which become identical in one or more other writing-systems or display-options. For instance the English words "beer" and "bier" may become indistinguishable in a fully phonetic writing-system. Such words, before being converted from standard text, require to be identified and marked by a reconversion homonym resolver (being an appropriate algorithm or program) in such a way that they can be reliably re-converted to their correct word-shapes.
  • relevant homonym filter or homonym filter an algorithm or computer program that can check the words of a standard text in a given language against a check-list of relevant homonyms either for that language or for a given set of display-options in that language.
  • homonym parser an algorithm or computer program that uses the context and possibly the grammar to determine which meaning a homonym has in a given text context.
  • non-homophonous homonym or phonemic homonym a spelling or word-shape which changes its phonemic pronunciation according to its meaning. For instance: “wind”, “wound”, “invalid”, “live”, “lives”.
  • Non-homophonous/phonemic homonyms commonly need to be resolved when transferring to a display-option or writing-system that provides additional phonetic information.
  • homophonous homonym a homonym in which all meanings are pronounced and spelt alike. For instance: “tender,” “bat,” “cleave”.
  • non-homophonous homonym parser or phonetic homonym parser an algorithm or computer program that uses the context and possibly the grammar to determine which meaning, and hence which pronunciation, a non-homophonous/phonemic homonym has in a given text context.
  • alternating display displaying alternately, or in repetitive sequence, the word-shapes either of the same word or of the same portion of text, in two or more writing-systems or display-options. Each version in turn is displayed, but not both at once. The alternation may appear as a fairly rapid flickering and be machine-generated. Otherwise, the alternate display of the word or text portion may be requested by the user, and either replaces or is superimposed upon the initial display-option. This may continue until either the user revokes the command or a fixed period has elapsed. In practice, this may mean the user highlighting a given word or phrase then clicking with a pointing device to see either how the word is pronounced or how it looks in a different writing system.
  • inter-leaving setting closely together (so that they may be compared) two or more word-shapes or versions of the same piece of text in different writing-systems or in different display-options. Interleaving may be done on a line-by-line basis, with lines in different writing-systems presented one above the other. The term also covers cases where the two versions appear side-by-side, or otherwise close together.
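  • As a minimal sketch of line-by-line interleaving, assuming each version of the text has already been split into the same number of lines: the function name `interleave` and the list-of-lines representation are hypothetical and are not taken from the specification.

```python
def interleave(*versions):
    """Interleave line-by-line versions of the same text.

    Each argument is a list of lines in one writing-system or display-option.
    The result places line 1 of every version first, then line 2, and so on.
    """
    if len({len(v) for v in versions}) > 1:
        raise ValueError("all versions must have the same number of lines")
    return [line for group in zip(*versions) for line in group]


if __name__ == "__main__":
    standard = ["first line", "second line"]
    other = ["FIRST LINE", "SECOND LINE"]  # stand-in for another writing-system
    print("\n".join(interleave(standard, other)))
```

  • Side-by-side presentation would need a different layout step; this sketch only covers the one-above-the-other case.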
  • non-standard letters any other phonemic symbols, including those of other languages or of the International Phonetic Alphabet (IPA). Note: In a case like that of those Slavonic languages, which are sometimes written in Cyrillic and sometimes in Roman alphabets, "non-standard letters" can mean the non-dominant alphabet of a given language.
  • set of phonetic letter-variants a set of variants deliberately made in the shapes or appearances of the standard letters of a given language, so that each of the resulting variant shapes of a given standard letter can be allocated to one of the different phonemes or allophones which that letter commonly represents in the conventional spelling of that language. (This can be a device to retain conventional spelling, yet achieve phonetic accuracy).
  • dialect a variety of a language whose pronunciation differs frequently at allophonic level and sometimes at phonemic level from other varieties of the same language. The term is not pejorative. Dialects are normally regional in origin.
  • pronouncing-dictionary data-base A list of words or word-shapes matched with full phonemic information, as commonly found in the pronouncing guides of dictionaries. Additional allophonic information may also be provided.
  • schwa vowel the neutral vowel or "murmur vowel” or “obscure vowel", heard in the last syllables of the English words: "fatal”, “jewel”, “civil”, “gallop”, “consul”.
  • standard text text in the form that is normal or conventional for a given language; e.g. alphabetic letters with conventional spelling and punctuation.
  • 'British' 'American' spellings and numerous intermediate 'house spelling styles' of various publishers
  • the standard writing-system of a language by definition employs standard text and, for English, standard letters.
  • alphabetic standard text the common or conventional way of representing a given language in letters (even if it is more commonly written in logographs).
  • enriched text a version of resolved text to which additional phonetic or semantic information has been added in order to facilitate its conversion into a variety of display-options.
  • the additional information may be in the form of a number of types of markers added to the text.
  • different converter-programs or converter-algorithms may make different selections from among these markers as part of the process of creating processed texts for different display-options.
  • processed text a version of a text containing all necessary phonetic and graphic information, as required by a given display-system, for displaying that text in a specific display-option.
  • graphic information the specifically graphic (i.e. visual-display) information about a text that the text-displaying elements of a given system require in order to show that particular text in a given display-option.
  • the required graphic information may involve detailed descriptions of logographic or other non-standard symbols and their relative positions; or it may merely require the use of a relatively simple code to identify the symbols and groupings required.
  • A major function of any writing-system is to connect its visual code to the spoken word or the underlying ideas in a systematic manner. While the dominant writing-systems of all major modern languages are at least partly phonetic, few provide sufficient phonemic detail to systematically connect speaking and writing. Similarly, while over 80% of modern Chinese logographs contain phonetic elements, most of the phonetic clues are now too cryptic, out of date or inappropriate for most Chinese speakers. The lack of phonetic clues in English has produced great annoyance and difficulty for very many years. The lexicographer H. W. Fowler in his influential Modern English Usage summed up more than a century of debate with the remark that a suitable phonetically-accurate spelling system for English "would be of incalculable value".
  • the present invention comprises a text processor for facilitating user familiarisation with the word-shapes of the standard text of a language by enabling a user to select between a plurality of non-standard texts that differ from one another according to the degree to which each incorporates clues to the identity of spoken words that correspond to the word-shapes of the non-standard text.
  • the standard text is alphabetical
  • the non-standard texts may differ from one another according to the degree to which their word-shapes incorporate phonetic clues.
  • the standard text is logographic
  • the non-standard texts may differ from one another according to the degree to which they incorporate pictographic clues.
  • the text processor will normally be computer-based so that the standard and non-standard texts can be displayed on a computer screen or monitor by suitable inputs from the user, operating a computer keyboard, mouse or other input device.
  • the invention employs computer-based text display means to facilitate familiarisation with the standard written form of a natural language by allowing a user to selectively display text in any one of a plurality of display modes that vary incrementally in the degree to which they depart from standard text according to the degree to which phonetic and/or pictographic clues are incorporated therein.
  • the display modes thus form a hierarchy according to the level of phonetic/pictographic explicitness, there being preferably at least three steps in the hierarchy:
  • the text display means may contain a separate and complete version of the text for each display mode so that the user may switch between them or display them side-by-side.
  • the user may select the number of lower levels required and cause the computer system to generate the desired lower levels from all, or only selected portions, of a passage of standard text.
  • Finding out how one word or phrase is pronounced will provide the learner with clues to certain more or less regular phonetic patterns in the writing-system and, thus, to the pronunciation of other words and phrases.
  • these display-options or levels are preferably such that readers can move between them with relatively little investment in learning-time.
  • the invention also comprises methods whereby the user can be greatly helped in learning to read text in alternative writing-systems by the computer-graphic process (designated as text-morphing) of gradual progressive mutation on a display-screen of the visual image (word-shape) of a given word.
  • the invention includes methods and systems for converting an alphabetic standard text (which means standard letters and conventional spellings) firstly into a resolved text and secondly into a phonetically or pictorially enriched text.
  • text may be supplied to users already in resolved or enriched form.
  • the enriched text is then, thirdly, converted into the appropriate processed texts for a series of display-options.
  • These display-options offer various amounts of additional phonetic information, and may display or distinguish it with various degrees of conspicuousness. This can be done for the user's choice of dialect or style of pronunciation.
  • Users or readers who receive a text can rapidly display it (and can also print it) with the degree of phonetic annotation they prefer. They can carry out this process on the entire text or on the portions they select or on single words. They can also switch the screen display rapidly between two or more writing-systems. Or they may inter-leave the same text in two or more writing-systems (or display-options) in any way that makes it easier for them to transfer their reading skills from one writing-system to another. Thus literate adults may become multi-literate in several writing-systems.
  • Methods are also disclosed for maintaining standard letters and conventional spelling while offering full phonemic or allophonic information.
  • the invention is best presented in a format that makes practical and psychological sense to users, and fits with existing habits.
  • One likely format is similar to that of some existing options in modern word-processing packages.
  • users are at present invited to make their own selection of fonts and font sizes, so they might be invited to select among a range of display-options (ranged under one or more icons) which use differing writing-systems or display varying amounts of phonetic information.
  • FIGURE 1 is a flow-chart illustrating the process steps of the first example, where the user is relied upon to resolve phonemic homonyms and enriched text is generated.
  • FIGURE 2 is a flow chart illustrating the process steps of the first example, where a plurality of different, user-selectable, phonetically-modulated display modes are generated.
  • FIGURE 3 is a flow chart illustrating the processing steps of the second example, where phonemic homonyms are resolved automatically and where a plurality of phonetically-modulated display modes is generated.
  • FIGURE 4 is a code chart, providing an example of a set of phonetic letter-variants suitable for use in the resolution of homonyms in English.
  • FIGURE 5 is a tabulation illustrating how the same piece of enriched text can be represented in a gradated set of seven display-options, each successive option (a) - (g) providing an increased amount of phonetic information using the phonetic letter-variants of Figure 4.
  • FIGURE 6 is a table illustrating one example of the text-morphing of the Chinese pronouns "WO", "NI" and "TA" ['me', 'you' and 'him'/'her'/'it'] in successive steps (a) - (g) from their alphabetic word-shapes into their logographic shapes.
  • FIGURE 7 is a table illustrating a second method of text-morphing of the Chinese pronouns of Figure 6 in which the text-morphing (which may move in either direction) extends to the corresponding English word-shapes. In this case, nine stages [(a)-(i)] are employed.
  • FIGURE 8 is a table illustrating a third method of text-morphing of one of the Chinese pronouns of Figure 6 employing cartouches.
  • FIGURE 9 illustrates a second method of text-morphing of one of the Chinese pronouns of Figure 6 employing similar elements.
  • FIGURE 10 illustrates a three-step text-morphing process for the Chinese character for 'ant'.
  • FIGURE 11 illustrates a three-step text-morphing process for the Chinese character for 'mother'.
  • FIGURE 12 illustrates a four-step text-morphing process for the Chinese character for 'horse'.
  • FIGURE 13 is a tabulation of the same English text reproduced in each of three alternative writing/spelling systems.
  • FIGURE 14 is a tabulation showing at (a) a passage of English in standard text, the same passage in the Shaw Alphabet at (b) and the letters of the Shaw Alphabet at (c).
  • FIGURE 15 is a tabulation of three display-options of part of Lincoln's speech beginning, "Fourscore and seven years ago". Each option combines additional phonetic information with legibility and conventional spelling and uses the "allographs" of Figure 4.
  • FIGS 1 to 3 are flow-charts which collectively describe algorithms or computing programs which (1) convert standard text into resolved text; (2) convert resolved text into enriched text; and (3) prepare a specific version of the enriched text (now called processed text) which contains all the relevant phonetic information for a given display-option.
  • FIGURE 1 shows a set of steps for the production of phonetically enriched text from standard text, using user input to resolve any phonemic homonyms. The first two steps of FIGURE 1 deal with the resolution of homonyms. Step 1 is the marking of reconversion homonyms by a reconversion homonym resolver. This device, though not commercially available, may be readily constructed by those skilled in the art.
  • Standard text is considered as a string of words, each of which is classified, according to the information found in a dictionary-style database, as creating or not creating ambiguities upon reconversion.
  • Those words like lee/lea or its/it's whose reconversion from a more phonetic writing system creates problems of ambiguity are identified, and the ambiguity is resolved by adding electronic markers (which need not be visible upon the screen or in print-outs) which will tell the reconversion program which of two or more phonetically identical conventional spellings to use.
  • The reconversion homonym resolver examines the words of a piece of standard text. It detects reconversion homonyms like "lea"/"lee", "its"/"it's", "their"/"they're", and "beer"/"bier" which may become identical in an alternative display-option. It then adds markers to ensure they can be reconverted automatically and unambiguously to standard text.
  • the reconversion homonym resolver may be a free-standing algorithm or program which uses a checklist of reconversion homonyms that are relevant to a given language and a given set of display-options. Or the resolver may be collated into the pronouncing-dictionary data-base in such a way that those words in the data-base that are liable to become reconversion homonyms carry markers which can be detected and directly incorporated into enriched text by the algorithm or program which (in the second stage) creates enriched text.
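  • The free-standing, checklist-based form of the resolver described above might be sketched as follows. This is only an illustration under stated assumptions: the checklist contents, the invented phonetic respellings used as keys, the (phonetic form, index) marker format and all function names are hypothetical, and capitalization is ignored for brevity.

```python
# Hypothetical checklist: shared phonetic form -> standard spellings that
# collapse onto it. The phonetic keys are invented for illustration only.
RECONVERSION_GROUPS = {
    "lee": ["lea", "lee"],
    "its": ["its", "it's"],
    "thair": ["their", "they're"],
    "beer": ["beer", "bier"],
}

# Reverse index: standard spelling -> (phonetic form, position in its group).
MARKER_INDEX = {
    spelling: (phonetic, i)
    for phonetic, spellings in RECONVERSION_GROUPS.items()
    for i, spelling in enumerate(spellings)
}


def mark_reconversion_homonyms(words):
    """Return (word, marker) pairs; marker is None for unambiguous words."""
    return [(w, MARKER_INDEX.get(w.lower())) for w in words]


def to_phonetic(marked):
    """Replace each marked word by its shared phonetic form, keeping the marker."""
    return [(m[0] if m else w, m) for w, m in marked]


def reconvert(phonetic_marked):
    """Use the stored marker to recover the original standard spelling."""
    return [RECONVERSION_GROUPS[m[0]][m[1]] if m else w for w, m in phonetic_marked]


if __name__ == "__main__":
    text = ["its", "bier", "is", "by", "the", "lea"]
    phonetic = to_phonetic(mark_reconversion_homonyms(text))
    print([w for w, _ in phonetic])
    print(reconvert(phonetic))
```

  • The markers travel with the words as data rather than as visible characters, which matches the requirement that they need not appear on the screen or in print-outs.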
  • the first step may be entirely omitted if none of the display-options use non-standard letters or alternative spellings.
  • the second step of the process of Figure 1 is the resolution of non-homophonous homonyms (also referred to as "phonemic homonyms", that is, words spelled alike but pronounced differently).
  • a dictionary-style database identifies those words whose pronunciation is ambiguous. Resolution by the user is requested, and accepted if provided, otherwise the "default resolution" is either to show both phonetic values of the word or to use the determination of a parser algorithm or program that has been held in reserve.
  • An alternative flow-path would pass such homonyms directly to the parser algorithm or program (as in FIGURE 2 or FIGURE 3) and then invite human over-ride.
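  • The second step, with its user request and its two possible defaults, could be sketched roughly as below. The small pronunciation table, the IPA-style respellings, and the callback signatures `ask_user` and `fallback_parser` are assumptions made for this example only.

```python
# Hypothetical dictionary-style database: word-shape -> candidate pronunciations.
# The respellings are illustrative and not taken from any particular dictionary.
AMBIGUOUS = {
    "wind": ["wɪnd", "waɪnd"],
    "wound": ["wuːnd", "waʊnd"],
}


def resolve(words, ask_user, fallback_parser=None):
    """Attach pronunciations to non-homophonous homonyms.

    ask_user(index, word, options) returns one of the options, or None.
    fallback_parser(index, words, options) stands in for the parser that is
    "held in reserve". If neither decides, every candidate is kept, so that
    both phonetic values can be shown.
    """
    resolved = []
    for i, word in enumerate(words):
        options = AMBIGUOUS.get(word.lower())
        if options is None:
            resolved.append((word, None))  # pronunciation is not ambiguous
            continue
        choice = ask_user(i, word, options)
        if choice is None and fallback_parser is not None:
            choice = fallback_parser(i, words, options)
        resolved.append((word, [choice] if choice in options else list(options)))
    return resolved


if __name__ == "__main__":
    no_answer = lambda i, word, options: None
    print(resolve(["the", "wind", "blew"], no_answer))
```

  • Swapping the order of the two callbacks gives the alternative flow-path in which the parser decides first and the user is then invited to over-ride.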
  • the text is passed to a phonemic homonym filter.
  • This checklist is specific to the language in question. It might also be made more narrowly specific to a given writing-system or display-option or dialect or style of pronunciation. However, it is simpler to use a fuller check-list that detects all potential homonyms for a given language. Resolving all of these then creates a resolved text from which a single enriched text, and thence all the processed texts for the various alphabetic display-options, can reliably be derived.
  • the relevant homonym filter is normally concerned only with phonemic homonyms (also known as non-homophonous homonyms) that is, with those ambiguous word-shapes (like English "wind” or “wound”) whose pronunciation changes according to their meaning.
  • homophonous homonyms also become relevant.
  • a different and larger check-list and a different homonym parser, defined earlier as a relevant homonym filter are required.
  • non-homophonous homonyms are so rare that it may be satisfactory to simply flag their presence and offer as alternatives the possible phonemic renderings of a given word-shape, while perhaps inviting the user to decide between them.
  • non-homophonous homonyms are sufficiently common that a parser program or algorithm will normally be required to offer a probable resolution of them.
  • the parser program resembles those in existing spelling checkers and grammar- checkers in that it uses clues based on context and/or grammar. Human over-ride of the parser's determination may also be invited, and if provided, incorporated in the resulting resolved text.
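  • A context-based parser of this general kind might, in its simplest form, count clue words near the homonym. The sketch below is an assumption-laden illustration and not the parser of the specification: the clue lists, the window size and the name `parse_homonym` are all hypothetical, and a real parser would also use grammatical information.

```python
# Hypothetical clue lists: neighbouring words that suggest each reading of "wind".
CLUES = {
    "wind": {
        "wɪnd": {"blew", "blows", "cold", "north", "strong"},  # moving air
        "waɪnd": {"clock", "up", "thread", "watch"},           # to turn or coil
    },
}


def parse_homonym(index, words, window=2):
    """Guess a pronunciation by counting clue words near position `index`.

    Returns the best-scoring pronunciation, or None when the word is not in
    CLUES, when no clue is found, or when two readings tie. A None result
    leaves the decision open for the user.
    """
    readings = CLUES.get(words[index].lower())
    if readings is None:
        return None
    lo, hi = max(0, index - window), index + window + 1
    context = {w.lower() for j, w in enumerate(words[lo:hi], start=lo) if j != index}
    scores = {p: len(context & clues) for p, clues in readings.items()}
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(parse_homonym(2, "a strong wind blew".split()))
    print(parse_homonym(1, "please wind the clock".split()))
```

  • Returning None rather than guessing on a tie keeps room for the human over-ride mentioned above.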
  • A parser for non-homophonous homonyms may not be required when the text is input by voice-recognition. Or the author or supplier of electronic text may sort out such homonyms by human agency and supply text in resolved (or even in enriched) form.
  • the resolved text next passes to the last two steps of FIGURE 1.
  • These resemble the computing program (now available on the Internet) that was developed by the American typographer Edmund Rondthaler and by computing engineer Ed Lias for electronic conversion of standard text into the American Literacy Council's (ALC's) phonemically-accurate reformed spelling called "American Fonetic".
  • The program or algorithm for creating enriched text uses a dictionary-style data-base, similar to the pronouncing-guides of major dictionaries. This provides common or most-common modern pronunciations of words.
  • the phonetic markers in enriched text are converted into visual phonetic markers.
  • the fact that the letter “c” in a given word is "soft” (as in "cell") will be represented by a marker in the enriched text.
  • a given display-option may then present that letter with a visual marker, such as a cedilla.
  • Each converter corresponds to a given display-option; and, as it converts enriched text to the processed text required by that display-option, it makes its own characteristic selection from among the electronic markers in enriched text. Those markers which are retained or activated in a given processed text act as signals to the display-system to produce various visual effects.
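  • The way each converter makes its own selection from the markers in enriched text can be illustrated with a short sketch. Everything named here is an assumption for the example: enriched text is modelled as (letter, markers) pairs, the marker names "soft" and "silent" are invented, and the visual effects (a cedilla for soft "c", parentheses for a silent letter) are just one possible rendering.

```python
# Enriched text for the word "ice", modelled as (letter, markers) pairs.
ENRICHED = [("i", set()), ("c", {"soft"}), ("e", {"silent"})]

# Each hypothetical display-option retains a characteristic subset of markers.
DISPLAY_OPTIONS = {
    "standard": set(),            # shows no markers: looks like standard text
    "soft-c": {"soft"},
    "full": {"soft", "silent"},
}


def convert(enriched, option):
    """Produce processed text: keep only the markers this display-option uses."""
    keep = DISPLAY_OPTIONS[option]
    return [(letter, markers & keep) for letter, markers in enriched]


def render(processed):
    """Turn retained markers into visual effects (illustrative choices only)."""
    out = []
    for letter, markers in processed:
        if "soft" in markers and letter == "c":
            letter = "ç"              # cedilla marks the soft "c"
        if "silent" in markers:
            letter = f"({letter})"    # parentheses mark a silent letter
        out.append(letter)
    return "".join(out)


if __name__ == "__main__":
    for name in DISPLAY_OPTIONS:
        print(name, render(convert(ENRICHED, name)))
```

  • Because the "standard" option retains no markers, the same enriched text can always be shown or printed with the appearance of standard text.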
  • Such converters are not commercially available, but can readily be constructed by those skilled in the art, once their usefulness in this larger system is recognised.
  • the first stage of the converter need produce only a single new version of phonetically enriched text. (The resulting text-files might use the extension code PET for "Phonetically Enriched Text"). Texts in this form may be exchanged with other users.
  • At least one display-option displays none of the phonetic markers added to the enriched text, and hence presents exactly the appearance of standard text. Thus even text that has been both enriched and then amended by the user can still be displayed or printed as standard text.
  • At least one display-option shows additional phonetic information to at least full phonemic, and possibly to allophonic level.
  • An alternative, more direct path from resolved to processed text is shown in FIGURE 3. Instead of first creating an enriched text containing all the information that any of the display-options may require, each converter takes directly from a pronouncing-dictionary data-base only the additional phonetic information required for its own display.
  • An ingenious alternative path is also possible from enriched text to a range of alphabetic display-options.
  • This path uses the properties of font-sets to dispense with the intermediate processed text.
  • texts are displayed in a set of fonts.
  • the fonts are so designed that some or all of the phonetic letter-variants (a defined term) which appear as different characters in the fonts used to provide full or maximum additional phonetic information become indistinguishable in other fonts. That is, they appear as the standard letter, without phonetic variations.
  • The various converters can produce the range of phonetically modulated display-options from a single stored version of enriched text. The converters may still be required to set default options for the various display-options and to pre-format the text accordingly.
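  • The font-set idea can be sketched by treating each font as a mapping from character codes to glyphs. This is only an illustration: the use of Private Use Area code points for the letter-variants, the two font names and the chosen glyphs are all assumptions, and real fonts would be font files rather than Python dictionaries.

```python
# Each phonetic letter-variant gets its own character code. Using Private Use
# Area code points is an assumption made for this sketch.
SOFT_C = "\ue000"
SILENT_E = "\ue001"

# A "font" is modelled as a mapping from character code to displayed glyph.
FONTS = {
    # In the plain font the variants are indistinguishable from standard letters.
    "plain": {SOFT_C: "c", SILENT_E: "e"},
    # In the phonetic font each variant has a distinct shape.
    "phonetic": {SOFT_C: "ç", SILENT_E: "ė"},
}


def display(stored_text, font):
    """Render one stored text in the given font; unmapped characters pass through."""
    glyphs = FONTS[font]
    return "".join(glyphs.get(ch, ch) for ch in stored_text)


# A single stored version of the word "ice" serves every display-option.
STORED = "i" + SOFT_C + SILENT_E

if __name__ == "__main__":
    print(display(STORED, "plain"))
    print(display(STORED, "phonetic"))
```

  • Only the font changes between display-options; the stored text is never rewritten, which is what lets this path dispense with a separate processed text.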
  • each display-option has fixed rules for its selection among phonetic markers, and also a default setting for its visual display of them, there can also be scope for user-selected adjustments to the conspicuousness of the visual phonetic markers. It is desirable that these should range from visually obvious to relatively inconspicuous and finally invisible.
  • FIGURE 4 is a code chart that shows how conventional spelling may be retained, yet be combined with a set of phonetic letter-variants that offer a full phonemic description of the common pronunciations of English words. It offers sufficient variants of each letter to allocate one variant for each of the phonemes which that letter represents in standard text. Up to four "anomalous" variants of each vowel letter are used to cover erratic uses, like the i-sound of the letter "o" in "women".
  • the user can be offered a choice of regional pronouncing-dictionary data-bases. In the case of English, the most obvious two would be based on the major North American and on the major British dictionaries; but other possibilities include Indian, Australian, Scottish, and Irish. Within each of these, different styles of pronunciation (ceremonial, formal, colloquial) might also be offered.
  • the claims refer to selecting among several such pronouncing-dictionary data-bases for different dialects. Yet it may not be necessary to maintain them as fully separate. Markers can be added within a single data-base to indicate which among the alternative pronunciations of a word is most favored in a given dialect area or style of pronunciation. For instance, if the user has chosen "Southern British” and the data-base is then queried as to the pronunciation of the words "missile” or "hostile", markers will enable the converter program or algorithm to select the variant in which the second syllable is pronounced long (to rhyme with "style") rather than short. Note that only a minority of words change phonemic pronunciation when the dialect changes. (Otherwise we would be dealing with a change of language, not of dialect).
  • the list of relevant homonyms for reconversion to standard text may need to alter slightly according to the range of dialects or styles of pronunciation which are offered, or which the user selects.
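A single data-base with dialect markers, as suggested above, might be sketched as follows. The record layout, the dialect labels and the rough respellings are hypothetical and serve only to illustrate the idea; the fallback to the first listed alternative is likewise an assumption.

```python
# Hypothetical single data-base: each word lists its alternative
# pronunciations, each tagged with the dialects that favor it.
PRONUNCIATIONS = {
    "missile": [
        {"respelling": "MIS-ile", "favored_in": {"southern_british"}},
        {"respelling": "MIS-ul", "favored_in": {"north_american"}},
    ],
    "table": [
        # No dialect markers needed: one form shared by all dialects.
        {"respelling": "TAY-bul", "favored_in": set()},
    ],
}

def pronounce(word, dialect):
    """Pick the alternative marked for the chosen dialect, else the first listed."""
    alternatives = PRONUNCIATIONS[word]
    for alternative in alternatives:
        if dialect in alternative["favored_in"]:
            return alternative["respelling"]
    return alternatives[0]["respelling"]
```

Because only a minority of words differ between dialects, most entries in such a table would carry no markers at all.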
  • Alternative spelling-systems involve changes to word-shapes, yet the text may remain recognisable.
  • Motivated readers of phonetic spellings of English or French sometimes claim they can achieve satisfactory reading speeds within a day or so, though others take longer.
  • These display-options are, initially at least, most likely to be used by language learners and by beginners in literacy. For them, means of making a smooth transition to standard text will be important, and may include the interleaving of a standard-text display-option.
  • Other adult users may at first be restricted to reformed-spelling enthusiasts. For them it may be essential that texts they create or edit in this form can be readily returned to standard text.
  • the American Literacy Council provides a program which translates, in both directions, between "American Fonetic Spelling" and standard text. It incorporates a simple homonym-parser program to decide between alternative pronunciations of non-homophonous homonyms like wind, wound, bow, row, invalid, live, lives. This is based on noting the surrounding words, and is said to be about 80% successful. Spell checkers may of course be provided for these alternative spelling-systems.
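A homonym parser that works by noting the surrounding words could look roughly like the sketch below. It is not the American Literacy Council's program; the cue-word table, the sense labels and the rule of inspecting only the preceding word are assumptions chosen to keep the illustration short.

```python
# Hypothetical cue table: for each non-homophonous homonym, the preceding
# words that suggest each pronunciation.
CUES = {
    "wind": {
        "noun_short_i": {"the", "strong", "cold", "north"},
        "verb_long_i": {"to", "will", "must", "please"},
    },
}

def resolve(words, index):
    """Guess the sense of words[index] from the word before it.

    Returns a sense label, or None when the word is not a listed homonym
    or the context gives no cue (such cases would be left to the user).
    """
    senses = CUES.get(words[index].lower())
    if senses is None:
        return None
    previous = words[index - 1].lower() if index > 0 else ""
    for sense, cue_words in senses.items():
        if previous in cue_words:
            return sense
    return None
```

A rule set this simple will leave many cases unresolved, which is consistent with the partial success rate reported for context-based parsing.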
  • FIGURES 4, 5, 15 show how the code table of phonetic letter-variants displayed in FIGURE 4 can be used to produce a set of seven display-options that show an incremental range of additional phonetic information.
  • the display-options range from one in which all additional phonetic detail is suppressed (producing the appearance of standard text) to one in which full phonemic and even allophonic detail are provided. In display-option (a) no word-shapes are changed from standard text.
  • Display-option (d) marks with a dieresis those vowels which in everyday speech are liable to regress to, or towards, the schwa vowel.
  • display-option (f) provides full phonemic information; and the final version, (g), provides allophonic information of the sort commonly required by foreign learners, but not by native speakers. It shows, for instance, that the "p” in “examples” is not aspirated, but the "p” in “pie” is.
  • a display-option that uses a set of phonetic letter-variants based purely on color, it should include provision for the user to render the colors less conspicuous, much as when the color setting on a TV is adjusted towards monochrome. Similarly, the colors need not be used when printing out the text.
  • A writing-system that is better suited to adult use is set out in FIGURES 4, 5. It relies on shape rather than color to distinguish phonetic letter-variants. (See FIGURE 4). It may use, as in FIGURE 15, a (much smaller) range of colors which are therefore much easier to contrast and distinguish. (Note that owing to the restriction that prohibits color in Figures, FIGURE 15 is submitted as a black and white photocopy. However the letters that appear to be in various shades of fainter ink are in fact intended by the inventor to be in various colors.) These colors are used not to constitute individual phonetic letter-variants but as common visual markers to indicate broad phonetic categories. For instance, all silent letters can be marked in a given color, all "cardinal-value" letters in another color, and all "wild" letters in another.
  • FIGURE 15 shows the use of color to combine additional phonetic information with legibility and conventional spelling.
  • the phonetic letter-variants, also known as "allographs", are essentially as in Figure 4, but color is also used as a common visual marker for broader categories.
  • the three versions or levels correspond roughly to the second, fourth and seventh versions in Figure 5.
  • colors are used as common visual markers which apportion letters to three categories: cardinal values, mute, and other values. Since "other values" include the schwa vowel which is found in most unstressed syllables, the red color tends to dominate, warning the foreign learner how few letters are to be taken at face value when pronouncing English words.
  • the second and third versions use additional colors and phonetic letter-variants to make more subtle phonemic, and in the third version allophonic, distinctions.
  • a proposed alternative display-option or writing-system or teaching method that has advantages for text in English and possibly other languages, is one that does use colors as phoneme-markers, but only or mainly for the (19 or so) vowel-phonemes, since few of the consonant letters cause major confusion for learners. This is a more practicable number of colors to distinguish.
  • the number of colors, and the amount of color on the page, may be further reduced by not using color for vowels used with their most common value, or by substituting other common visual markers (such as line-quality or italics or diacritics) for these and other general categories, such as letters representing the schwa/regressive vowel.
  • mnemonic pictures may be provided.
  • the pictures may also be associated with pairs of rhyme-words, such as: green scene, black sack, gray clay, red bed, feared beard, gold fold, brown gown, etc.
  • Pairs of phonetic letter-variants need not necessarily serve to retain conventional spelling. They may also be used to eliminate two-letter combinations, like English TH and CH, or to permit a fully logical phonemically-accurate spelling based on one-symbol-per-phoneme.
  • a writing-system is commonly described as phonemically accurate or as "highly phonetic” even if the pronunciations suggested by the spelling are not the only nor necessarily the most common ones in use. This is the case with alphabetic standard text in most languages.
  • an otherwise accurate spelling system may choose to represent fuller or more formal pronunciations than are commonly heard in rapid speech. It may use and, for instance, rather than 'nd or 'n.
  • An advantage of proceeding thus is that written text can be divided into a number of separate modules (the written words), each of which tends to have a single invariant word-shape. This invariant image facilitates both swift reading, sometimes called sight-reading, and automatic conversion into alternative writing-systems.
  • native-speakers are already familiar with the phonemes of their language, they normally need only basic phonemic information in order to pronounce a new word. However, the foreign learner also needs allophonic (sub-phonemic) information; and this level of information is normally provided by language-teaching text-books.
  • a display-option that shows allophonic variants may also be useful for those wishing to write either in, or about, a specific dialect.
  • markers may be common visual markers in that they consistently represent certain broad categories of phonetic change that tend to create allophones. For many languages these categories will include: voicing, de-voicing, aspirating, de-aspirating, lip-rounding, de-rounding, palatalising, etc.
  • the addition of allophonic detail normally provides no obstacle to automatic re-conversion of the text to standard alphabetic form.
  • a possible display-option is one which shows the phonemes (or even the allophones) used in a specific spoken performance of a text.
  • Such phonetic writing-systems for Chinese dialects can be extremely useful, especially when combined with the methods described below for disambiguating Chinese alphabetic standard text and thus allowing it to compete with hanzi logographs as a fully adequate representation of Chinese texts.
  • the phonetic representations of particular Chinese dialects can be produced by essentially the same systems as were previously described for selecting a given dialect or style of pronunciation in any other alphabetic language.
  • a modification is that the appropriate dictionary-style data-base for a given dialect would preferably match not the Pinyin alphabetic word-shapes (which are highly ambiguous) but the traditional hanzi logographs to particular dialectal pronunciations and their (phonetic) alphabetic word-shapes.
  • FIGURE 13 shows the same English text in (a) Standard Text, (b) "Cut Spelling", and (c) "American Fonetic Spelling". These versions were automatically generated from standard text by converter programs. Conversion into Cut Spelling has been made with Alan Mole's BTRSPL program. Note that the two reformed spellings are notably more economical in space and therefore probably (for those once habituated to their word-shapes) in visual saliency. They are also clearly far easier to learn and to spell with confidence, since each of them follows a few clear spelling rules. The introduction of such "improved" systems has been almost impossible so long as it was a matter of persuading the whole of society to switch to a new writing system. However the concept of offering the user a free choice among alternative systems has the potential not merely to aid the learner but to create a swing of public preference towards systems found to be briefer or easier.
  • FIGURE 14 shows a further possibility: the introduction via individual choice, of alternative writing systems that, while not very similar to standard text, are briefer or more visually salient.
  • at (a) it shows a passage of English in standard text, at (b) the same passage in the Shaw Alphabet, and at (c) the letters of the Shaw Alphabet.
  • Two versions of a text are set out for comparison. The first is in standard form. The second, printed in the same number of lines, combines non-standard letters and non-conventional (phonetic) spelling.
  • the script is Kingsley Reade's Shaw Alphabet, which provides one letter per English phoneme. Note that the outlines of its letters are much simpler than those of standard letters.
  • the Reade Alphabet which is public property under the terms of Bernard Shaw's will, predates yet can be readily adapted to electronic display-options. It is set out below, matched to a list of English phonemes (viz. the sounds represented by the letters in bold print in the short English words).
  • display-options may include other optional writing-systems.
  • Some of these display-options may be non-alphabetic or non-phonetic but more visually salient. Or they may be hybrid, part-alphabetic systems.
  • the possibility of users choosing to learn (initially for their own private purposes) these additional writing-systems would depend largely on the provision of teaching methods or self-teaching methods such as alternating display, inter-leaving and various types of text-morphing.
  • a form of inter-leaving that might be useful in such hybrid writing-systems is one in which the new symbol/logograph, for a period selected by the user, appears beside rather than in place of its alphabetic equivalent. Enthusiasts might then wish to progress by stages to the mastery of a display-option in which only relatively uncommon words are represented alphabetically.
  • the use of colors, or of other common visual markers, so that all the symbols for a given part of speech share a common color or other common visual quality, might also make it easier for readers to progress in such hybrid systems. Prose style might also be helped.
  • a graphics program that converted all English words into different stylised picture-symbols or logographs might be cumbersome; but it is certainly possible for the average personal computer to run a program that provides symbols for a few hundred of the most common words in a given language, and that reliably returns them to standard text when the text is to be printed or sent to another user. Note that Japanese newspapers and publications have long used a mixture of logographs and phonetic scripts.
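The round trip between standard text and a hybrid display in which a few common words are replaced by symbols can be sketched as below. The three symbols and the function names are hypothetical, and the sketch assumes lower-case words separated by single spaces; a practical program would also need to handle capitals and punctuation.

```python
# Hypothetical symbol table for a handful of common words.
SYMBOLS = {"the": "\u25B2", "and": "\u271A", "house": "\u2302"}
# Reverse table used to return the text reliably to standard form.
WORDS = {symbol: word for word, symbol in SYMBOLS.items()}

def to_hybrid(text):
    """Replace each listed common word with its symbol; leave other words alone."""
    return " ".join(SYMBOLS.get(word, word) for word in text.split(" "))

def to_standard(text):
    """Restore standard text before printing or sending to another user."""
    return " ".join(WORDS.get(token, token) for token in text.split(" "))
```

The conversion is reversible as long as every symbol maps back to exactly one word and no symbol can occur as an ordinary word in the text.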
  • Text-morphing offers an important bridge from alphabetic display-options to traditional logographs.
  • Computer graphics now make it possible to begin with either a phonetically predictable word-shape or a pictographic image that clearly suggests a given word, and then to mutate that word-shape or image by degrees into the traditional visual representation of that word in any writing-system.
  • Different display-options may be inter-leaved to help the learner/user associate the word-shapes of the same text in different writing-systems.
  • individual word-shapes or small groups of them may be flickered, slow-flickered or text-morphed to the same end.
  • word-shapes can also help greatly in creating either visual associations or a visual memory trail (involving intermediate shapes) between two seemingly unrelated images.
  • the process of text-morphing need not be one-way. It may reverse.
  • a word-shape may also be made to alternate or (more rapidly) to flicker back and forth between two states.
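Mutating one word-shape into another by degrees can be approximated by interpolating between two outlines. The sketch below assumes, purely for illustration, that each word-shape is stored as a list of (x, y) points and that the two outlines have the same number of points; real glyph outlines would need a matching step first. Linear interpolation is a swapped-in technique, not one named in the specification.

```python
def morph(start, end, t):
    """Linearly interpolate between two outlines; t=0 gives start, t=1 gives end."""
    if len(start) != len(end):
        raise ValueError("outlines must have the same number of points")
    return [
        (x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
        for (x0, y0), (x1, y1) in zip(start, end)
    ]

def frames(start, end, steps, reverse=False):
    """Return steps + 1 intermediate shapes, optionally running the morph backwards."""
    ts = [i / steps for i in range(steps + 1)]
    if reverse:
        ts.reverse()
    return [morph(start, end, t) for t in ts]
```

Playing `frames(...)` forwards and then with `reverse=True` gives the two-way morph, and showing only the first and last frames in alternation corresponds to flickering.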
  • Triple and multiple inter-leaving can be a powerful teaching method for language-learning. For instance students of Japanese may wish to read texts while seeing simultaneously the traditional writing (which is largely kanji logographs) plus a phonetic rendering plus a translation. They can thus see form, pronunciation and meaning at once.
  • FIGURE 6 shows the text-morphing of the Chinese pronouns "WO", "NI" and "TA" from their alphabetic word-shapes into their logographic shapes. Each column reads from the top down. The third row gives the Pinyin alphabet word-shapes plus standard intonation marks. If a smooth transition is desired, more intermediate shapes would be required, but the principle is clearly exemplified.
  • FIGURE 7 shows an alternative method of text-morphing of the Chinese pronouns of Figure 6 in which the text-morphing (which may move in either direction) extends to the corresponding English word-shapes (first row). Note how line-coding (thinner lines) draws attention to certain elements in the changing shapes.
  • EXAMPLE 13 International Pictographs and Traditional Logographs
  • One proposed display-option involves the creation, for a given language, of a new set of pictographs each corresponding to one of the traditional logographs, and clearly suggesting its meaning. Such a set of pictographs is far easier to learn and to remember than the traditional logographs. Text-morphing can then create paths from it to the traditional logographs. See FIGURES 10, 11, 12.
  • FIGURE 8 shows a third method of text-morphing one of the Chinese pronouns of Figure 6 employing cartouches.
  • the cartouche draws attention to an element that remains similar during the transition. It thus helps to create a memory trail between the two unrelated word-shapes. Note also how the shaped cartouche draws attention to the rotation of the element inside it.
  • FIGURE 9 illustrates a further method of text-morphing one of the Chinese pronouns shown in the third column of Figure 6. Note how the substitution of a capital T for the lower-case "t" in the word "TA” creates a point of similarity with the corresponding traditional logograph, which is then exploited in the text-morphing process.
  • FIGURES 10-12 offer three examples of how a self-explanatory pictograph can be simultaneously designed to suggest, or to readily text-morph into, a corresponding non-pictorial logograph.
  • (The "horse" example may re-enact the historical process by which the Chinese logograph evolved from an early pictograph.) Note how both the pictograph and the largely arbitrary logograph for "mother" can be simultaneously visible. This is a form of "slow flickering" or alternation as the term is defined above.
  • an invented pictograph can always be so designed as to contain one element that is reminiscent of a given logograph, and the points of resemblance can be color-coded or line-coded to draw attention to them.
  • the learner can know which elements to concentrate upon.
  • For instance, a picture-symbol that is a stylised line-drawing of a bird might need to text-morph into an arbitrary conventional logograph for "bird" that looks quite unbirdlike.
  • FIGURES 10-12 show that there can also be intermediate images in which both pictograph and traditional logograph are clearly visible. Text may be presented in such a display-option.
  • a further learning method which would not be practicable in most other languages, is made possible by the phonetic system of Chinese and Japanese. Especially in Chinese, most words or morphemes are monosyllabic. Further, there is a remarkably limited number of syllables in the Mandarin dialect: less than 500 if one ignores the four tones or intonations, and less than 2,000 if one takes note of them. This contrasts with tens of thousands of syllables in most European languages. It also makes possible the use of a syllabary. Since most syllables correspond to several different logographs with different meanings, it is easy to associate each symbol in the syllabary with some object that suggests a pictographic representation.
  • Chinese schools often use the Pinyin alphabet, written small above the logographs, as the initial learning medium. Similarly the Japanese use the phonetic hiragana syllabary as their introductory script for children. Pinyin was intended by the Chinese government to become China's official writing-system; but text in Pinyin is often ambiguous because of the numerous homophonous homonyms in modern Mandarin. Pinyin has no way to distinguish these. Yet they need to be represented by different traditional logographs. Hence they form awkward reconversion homonyms when Pinyin text is to be converted to logographs.
  • a symbol that helps to resolve the ambiguity can be added to most such ambiguous word-shapes.
  • An alternative method is also proposed that requires only alphabetic letters.
  • a particular synonym or "guide-word” is paired with an ambiguous word or word-element.
  • the guide-word is shown adjacent to it, but in a different color or shading or otherwise distinguished.
  • the guide-word thus serves as a silent guide to meaning but not to pronunciation.
  • the conspicuousness of such annotations may, as in previously described methods, be varied through the user's own adjustments to the default settings of one or more display-options.
  • a modification to this method is to use only the opening letter or letters of the "guide word" where these are enough to resolve ambiguity.
  • a combination of the above two methods is also possible whereby a reduced list of just a few dozen "guide-words" is provided. They are then used not as exact synonyms but as clues to the desired area of meaning.
  • the guide-words themselves may be abbreviated or replaced by symbols or diacritic marks, or by pictographs or logographs, whether new or traditional. What is essential is that they remain reliably associated with the word-shape of each alphabetic homonym that needs to be resolved. They thus create a fixed logograph which permits of rapid reading.
  • the use of such logographs can create a resolved text that permits both of automatic re-conversion and of direct conversion into a range of display-options, including traditional logographs. This method can be extended to Japanese and to other languages with similar homonym problems.
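The modification in which only the opening letter or letters of a guide-word are shown can be sketched as a shortest-distinguishing-prefix calculation. The guide-words below are hypothetical English stand-ins for the alternative meanings of one ambiguous alphabetic word-shape, and the function name is an assumption.

```python
def shortest_guides(guide_words):
    """Map each guide-word to the fewest opening letters that set it apart
    from the other guide-words attached to the same ambiguous word-shape."""
    result = {}
    for word in guide_words:
        others = [other for other in guide_words if other != word]
        prefix = word  # fallback: the whole guide-word
        for length in range(1, len(word) + 1):
            candidate = word[:length]
            if not any(other.startswith(candidate) for other in others):
                prefix = candidate
                break
        result[word] = prefix
    return result

guides = shortest_guides(["mother", "horse", "hemp"])
```

Here one letter suffices for the first guide-word and two letters for each of the others; the abbreviated guide stays fixed for each meaning, which is what lets it act as part of a stable logograph.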
  • the text in either the traditional logographs or in the new (partly alphabetic) logographs produced by the above method is stored as a series of numeric or alpha-numeric codes, each piece of code corresponding to a single word or word-element of the text.
  • Converter programs or algorithms can then produce whatever word-shapes are required for a particular display-option: whether traditional logographs, standard alphabetic word-shapes, pictographs, or semantically enriched alphabetic logographs as just described.
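Storing text as a series of codes and letting a converter choose the word-shape for each display-option might be sketched as follows. The numeric codes, the table layout and the option names are assumptions; the Pinyin forms are given without tone marks for simplicity, and the three entries are the pronouns used in FIGURE 6.

```python
# Hypothetical lexicon: one code per word or word-element, with its
# word-shape in each available display-option.
LEXICON = {
    101: {"pinyin": "wo", "hanzi": "我", "english": "I"},
    102: {"pinyin": "ni", "hanzi": "你", "english": "you"},
    103: {"pinyin": "ta", "hanzi": "他", "english": "he"},
}

def convert(codes, display_option):
    """Produce the word-shapes for a stored code sequence in one display-option."""
    return [LEXICON[code][display_option] for code in codes]

stored_text = [101, 103]
```

Since the stored codes are unambiguous, the same sequence can be rendered in any listed display-option without a further homonym-resolution step.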
  • the later claims cover two different kinds of largely pictographic writing-systems.
  • One of them is intended to be international, at some cost in idiomatic quality and range of vocabulary.
  • the other aims to represent most of the words of a given natural language through a combination of pictographs, ideographs and letters.
  • Modern color display-screens and color printers allow pictographic symbols to be far more visually vivid and recognisable than traditional logographs. Freed of the restriction to a single color of ink, and freed of the need to save the scribe's labor by reducing the clarity and detail of the symbol, one can afford to make symbols so vivid that the reader can recognise their general meaning at a glance.
  • a further advantage is that the resulting writing-system is potentially international, not linked to a given language.
  • a primarily pictographic writing-system is proposed that provides one symbol per semantic area for each of several hundred semantic areas.
  • Such semantic areas (areas of meaning)
  • the upper practical limit to the number of pictographs is set by the problems of keeping them sufficiently compact yet sufficiently distinct and memorable.
  • Simple pictographs can be combined, following logical rules, into more complex pictographs or logographs.
  • non-pictographic symbols can be used, much like prefixes or suffixes, for grammatical distinctions or for such semantic concepts as: the opposite, or the absence of a quality, or for "large example of" or "small example of" or "forbidden to".
  • Such an "algebraic multiplication” can turn a set of pictographs into two or three times as many compound logographs.
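The "algebraic multiplication" of pictographs by modifying signs amounts to a Cartesian product. In the sketch below the pictograph names and the two modifiers are hypothetical examples; with two modifiers, each base pictograph yields two compound logographs.

```python
from itertools import product

# Hypothetical base pictographs and non-pictographic modifying signs.
PICTOGRAPHS = ["angel", "house", "bird"]
MODIFIERS = ["big", "small"]

def compounds(pictographs, modifiers):
    """Pair every pictograph with every modifier to form compound logographs."""
    return [(modifier, pictograph)
            for pictograph, modifier in product(pictographs, modifiers)]

compound_signs = compounds(PICTOGRAPHS, MODIFIERS)
```

Three pictographs and two modifiers give six compounds, that is, twice as many compound logographs as base pictographs; a third modifier would give three times as many.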
  • EXAMPLE 18 Sign Language
  • Such a writing-system is also at least as precise as most kinds of sign-language (except when they adopt the slow process of spelling out the letters of a word in a given natural language) and might well be combined with sign-language systems. A variant of the same writing-system may also be designed to correspond to the vocabulary of a simplified natural language, such as Basic English. It may also be inter-leaved with an alphabetic display-option for that language.
  • Each logograph typically consists of one easily interpretable pictograph and one or two very simple phonetic elements (letters). Line-coding and/or color-coding may also be used on the letter(s) so that they indicate not merely a letter used in the conventional spelling of that word, but the pronunciation the letter carries. Color may also be used as a structural principle in creating and distinguishing the pictographs. The more sources of visual discrimination that are offered, the more compact the writing-system can be without losing clarity and visual salience.
  • the pictograph for devil need not be used for such soubriquets as: “the Tempter”, “the prince of darkness”, “the father of lies”, “the evil angel”, and “the Anti-Christ”.
  • the only relevant synonyms requiring its pictograph might be “seraph” and “cherub”, plus “archangel” (represented presumably by the pictograph for "angel” plus a modifying sign used in algebraic fashion for "big”).
  • Such a writing-system forms a possible initial teaching method for small children learning to read in their native language. Later they can transfer their reading skills to standard text. Especially when combined with voice-production by a computer, this writing-system might also allow them to make some rapid progress in a second language, sufficient for instance to write to (electronic?) penfriends of their own age in it.
  • This writing-system is not, when used purely by itself, suitable for adult learners of foreign languages, since they require more detailed phonetic information, and this is better provided by an alphabetic writing-system or a syllabary. However this writing-system, when it is used as a display-option and inter-leaved with an alphabetic display-option, can be ideal both for young native-speakers and for foreign learners.
  • the great advantage for the language-learner is that since the semi-pictographic writing-system is semi-intuitive, they will usually be able to grasp the meaning and syntactic structure, while an interleaved alphabetic display-option offers the pronunciation. This can mean throwing away that traditional language-learner's crutch, the "vocabulary" or word-list of supposed synonyms in one's own language, and grappling directly with the foreign language.
  • the same writing-system may also prove a superior method of presenting text even for adult literate native-speakers. They may find such semi-pictographic text more scannable and more visually salient than text in purely alphabetic writing-systems.
  • a further merit of such a writing-system is that when it is used on signage, or even in printed text, its meaning is partly intelligible to those who do not speak the language used.
  • a sign bearing the Modern Greek word EPIKINTHINOS (even if it were written, as here, in the Roman alphabet) would fail to warn non Greek-speakers they were in danger.
  • EXAMPLE 22 International Resolved Text.
  • An invented pictographic writing-system offers the opportunity, though (as just stated) not the necessity, to resolve many homonyms and to display many distinctions of meaning that are not made in a given spoken language, nor normally in standard alphabetic text. For instance, the pictographs might clearly distinguish bat (a sporting implement) from bat (a flying mammal).
  • To convert a text into such a display-option after resolving all such homonyms and providing different pictographs or logographs for them, would be in effect to translate the text (though with some loss of idiomatic richness and of precise vocabulary) into a universal semantic code from which it could be automatically re-translated into any other language. Word-order and some idiomatic details might need to be adjusted in the interests of international communication.
  • Automatic conversion is also possible from such an electronic coding system into a simplified version of a given natural language such as Basic English or Globish or Inglingo, provided an appropriate database is provided and also an appropriate parser/arranger program or algorithm for those cases where the conversion to simpler vocabulary requires changes to word-order.
  • Basic English simplifies vocabulary via a table of synonyms.
  • Globish also regularises spelling.
  • Inglingo also shortens some words and simplifies some grammatical expressions.
  • Such translation (as opposed to conversion) requires not only appropriate data-bases of linguistic information but also an appropriate (unspecified) context-sensitive and grammar-sensitive translation program. Such a program needs to be capable of resolving any remaining ambiguities caused by the ambiguous interplay of words in natural languages.
  • the most-likely short-term application is as a free-standing computer program or as a major additional feature or plug-in for word-processing packages or for text-display programs (including electronic books, whether for adults, children or language-learners, and including the pronouncing guides of computerised dictionaries).
  • the various writing-systems proposed above and in the Claims are designed to be suitable for use initially within a series of display-options, but might later find independent use.
  • Other applications of this invention may include: (1) interactive computer packages for teaching children to read, or for teaching foreign languages, including those written in non-alphabetical systems; and (2) as one element in compound word-processing and translating software packages which can understand spoken or written texts in any of several languages and can also translate such text into other languages in their repertoire and can then produce the translated version either as text in that language's conventional writing-system, or as text in a more phonetic or more visually-salient writing-system or display-option, or as spoken language.
  • the invention involves bringing together many existing computational processes to produce a kind of macro system for international text processing.
  • the substantially-new concepts and techniques disclosed include: providing text in multiple user-selected display-options, phonemic modulation of text through a number of display-options, non-homophonous homonym parsers and filters, reconversion homonym resolvers, distribution of enriched and processed text, user-selected pronouncing- and dialect-dictionaries, the use of sets of phonetic letter-variants, automatic conversion between writing systems and display-options, cartouches, shaped-cartouches, color-coding and line-coding, alternating display (flickering), inter-leaving, and text-morphing.
  • the linking of all these powerful processes and their associated data-bases into a single international customised reading and writing system is now practical.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Using a personal computer based system, a standard text is processed graphically so as to display an enriched text that includes visual clues as to the pronunciation and/or meaning of words in the original text. In one embodiment the visual clues indicate the phonetic structure of words in the text. In another embodiment pictograms are added to the text to indicate the meanings of the words in the text. A method of 'morphing' between the graphical display of standard and enriched texts is also disclosed.

Description

TITLE: TEXT PROCESSING AND DISPLAY METHODS AND SYSTEMS
TECHNICAL FIELD
This invention relates to computer-based text processors, text-processing methods, text display means and/or to means for assisting users to improve their knowledge of or facility with a written language.
The invention may be applied, for example, to the teaching of writing and/or pronunciation to people learning new languages, to improve the writing skills and comprehension of native speakers and to familiarising native speakers and others with the spelling and pronunciation of technical or unusual terms. It is applicable to languages with alphabetic writing as well as to languages with ideographic writing, such as Chinese, and to languages with phonetic writing systems, such as Japanese Katakana and Hiragana. The invention may also be used to teach sign languages, where they can be reproduced in written form.
Terminology
The following defined terms are used in this specification. Some of these definitions, especially those marked with an asterisk, introduce and partly explain new concepts or processes whose use is further explained in the Outline of the Invention or the Description of Examples below.
pictograph: a picture or stylized picture standing for a word or word-element (morpheme), or possibly a phrase. Pictographs are a sub-category of logographs.
logograph: a single symbol representing an entire morpheme, word or phrase. Synonym: logogram. (Note: Most Chinese and Japanese characters contain phonetic elements. Hence they are better described as logographs than as ideographs or ideograms).
ideograph: a symbol standing for a concept and not necessarily for a particular word or word-element in a given natural language, (e.g. most mathematical symbols, or the ampersand symbol "&").
morpheme or word-element: the smallest meaningful element of a word or word-shape. For instance, the word "unbutton" contains two morphemes: "un" and "button".
word-shape: the visual form that a word or a word-element or a short closely-connected phrase takes in a given writing-system. Note that in logographic systems it is fairly common for a single character or word-shape to stand for a short phrase like "all right" or "house fly".
*common visual marker: any visual sign which can be added to a number of word-shapes or letters or syllabary signs, and which has the same meaning in each case. For instance, a common color or a common diacritical mark might be used to mark all silent letters in a text.
writing-system: a coding system or a set of rules for representing in visual form the words and/or text of a language. Such a set of rules need not be logically consistent nor consistently applied nor free from exceptions. In the case of alphabetically-written languages, a major component of the writing-system is the spelling-system, which is a set of rules for using the letters. The writing system for representing the standard text of a language is thus referred to as the conventional or standard writing-system. There may be non-standard writing systems of a given language that use non-standard texts, symbols or letters; eg, Pitman's Shorthand for English.
additional phonetic information: phonetic information which either goes beyond or corrects or clarifies that which is supplied or suggested in the conventional writing-system or spelling-system of a given language.
syllabary: a system of phonetic symbols each of which represents an entire syllable. This is contrasted with an alphabet where the symbols (letters) normally represent phonemes.
phoneme: one of the set of speech sounds in any given language that serve to distinguish one word from another. Thus /p/ and /b/ are separate phonemes in English because they distinguish such words as "pet" and "bet", whereas the light and dark /l/ sounds in "little" are allophones not separate phonemes since they may be transposed without changing meaning. (Definition as per The Collins Dictionary of the English Language).
allophone: a major variation within a phoneme. See under phoneme above.
phonetic:
1: of or relating to phonetics.
2: denoting any perceptible distinction between one speech sound and another, irrespective of whether the sounds are phonemes or allophones. [Compare phonemic.]
3: conforming to pronunciation. (Collins Dictionary definitions).
* display-option: any of a set of various writing-systems in which a text, and usually some information about its pronunciation, may be displayed, especially on an electric screen. Display-options may be differentiated from each other by the writing-systems they employ to identify the words, as well as by the amount or type of additional phonetic information they make visible and the means they use to display or distinguish it. They are not differentiated by merely aesthetic choices, as for instance by font selection, or letter size, or page layout or paragraph style, though they may have specific default settings for these. Display-options are so called because the decision to use one or more of them will normally be at the discretion of the user.
user: the person applying the methods or systems of the invention to a text; sometimes synonymous with "reader", "learner", "writer", or "editor".
converter algorithm, converter program or converter: an algorithm or program, associated with a specific display-option, that does the following: 1. carries out that display-option's characteristic selections from among the additional phonetic information and/or the graphic information that is available for a given text; 2. usually incorporates and assembles this information into a new version of the text, called processed text which contains all necessary information for a display-system to represent that text in a specific display-option; 3. upon request, engages the display-system and provides it with the relevant processed text or with the necessary graphic information.
* color-coding of word shapes: the use of color to indicate those parts or regions of a word-shape which are to be processed in a particular way, or to be considered in a particular way, or which correspond to particular parts of another word-shape.
* line-coding of word shapes: as for color-coding, but with the use of different types of line or curve (e.g. thinner/thicker, heavier/lighter, dotted/unbroken, pulsing/non-pulsing) instead of differences of color.
pulsing line or pulsing image: a line or curve, or an image or a part of an image which is rendered more conspicuous by its doing any of the following:
1. repetitively appearing and disappearing (like the cursor on a computer screen),
2. repetitively changing, over part or all of itself, its brightness, its thickness, its color(s) or any other conspicuous visual attribute,
3. carrying out such changes, as per the previous Clause, in such a spatial and temporal sequence as to create the illusion of movement (as often seen in neon signs), or
4. repetitively changing between two or more different forms.
cartouche: a line (which may also be a "pulsing line") that encloses or semi-encloses or brackets or otherwise indicates a word or a phrase or an image, or a section of a word or a section of an image. It thus indicates a portion of the text or of a word-shape which is to be processed differently or which is intended to receive different attention from other portions. (In Egyptology a "cartouche" is a shape which surrounds and marks a royal name. Here it is given a wider definition).
homonym: a word-shape or spelling which has distinct and essentially unrelated meanings. These may or may not be pronounced differently. For instance: "wind", "wound", "read", "lead", "bat", and "cleave".
relevant homonym: a homonym whose ambiguity needs to be resolved before the correct equivalent word-shape for it can be determined in a given alternative writing-system or display-option, or in a range of writing-systems or display-options. (Homonyms can often retain their multiple meanings under an alternative writing-system, in which case they are not "relevant homonyms" for conversion to that writing-system).
* reconversion homonyms: pairs or groups of words or word-shapes which are distinguished from each other in standard text, but which become identical in one or more other writing-systems or display-options. For instance the English words "beer" and "bier" may become indistinguishable in a fully phonetic writing-system. Such words, before being converted from standard text, need to be identified and marked by a reconversion homonym resolver (being an appropriate algorithm or program) in such a way that they can be reliably re-converted to their correct word-shapes.
* reconversion homonym resolver: see above.
relevant homonym filter or homonym filter: an algorithm or computer program that can check the words of a standard text in a given language against a check-list of relevant homonyms either for that language or for a given set of display-options in that language.
homonym parser: an algorithm or computer program that uses the context and possibly the grammar to determine which meaning a homonym has in a given text context.
non-homophonous homonym or phonemic homonym: a spelling or word-shape which changes its phonemic pronunciation according to its meaning. For instance: "wind", "wound", "invalid", "live", "lives". Non-homophonous/phonemic homonyms commonly need to be resolved when transferring to a display-option or writing-system that provides additional phonetic information.
homophonous homonym: a homonym in which all meanings are pronounced and spelt alike. For instance: "tender," "bat," "cleave".
*non-homophonous homonym parser or phonetic homonym parser: an algorithm or computer program that uses the context and possibly the grammar to determine which meaning, and hence which pronunciation, a non-homophonous/phonemic homonym has in a given text context.
* alternating display: displaying alternately, or in repetitive sequence, the word-shapes either of the same word or of the same portion of text, in two or more writing-systems or display-options. Each version in turn is displayed, but not both at once. The alternation may appear as a fairly rapid flickering and be machine-generated. Otherwise, the alternate display of the word or text portion may be requested by the user, and either replaces or is superimposed upon the initial display-option. This may continue until either the user revokes the command or a fixed period has elapsed. In practice, this may mean the user highlighting a given word or phrase then clicking with a pointing device to see either how the word is pronounced or how it looks in a different writing system.
* inter-leaving: setting closely together (so that they may be compared) two or more word-shapes or versions of the same piece of text in different writing-systems or in different display-options. Interleaving may be done on a line-by-line basis, with lines in different writing-systems presented one above the other. The term also covers cases where the two versions appear side-by-side, or otherwise close together.
* text-morphing: causing an image of a word, syllable, morpheme or phrase in one writing-system or display-option to dissolve or mutate into the corresponding image in another display-option or writing-system. This may be done to help the reader or learner associate the two images (word-shapes) or groups of images, and sometimes in such a way as to create a kind of visual memory-trail between the two images via the intermediate stages.
standard letters: the letters used in the dominant alphabetic written form of a language. For instance the 26 letters of the English alphabet.
non-standard letters: any other phonemic symbols, including those of other languages or of the International Phonetic Alphabet (IPA). Note: In a case like that of those Slavonic languages, which are sometimes written in Cyrillic and sometimes in Roman alphabets, "non-standard letters" can mean the non-dominant alphabet of a given language.
* set of phonetic letter-variants: a set of variants deliberately made in the shapes or appearances of the standard letters of a given language, so that each of the resulting variant shapes of a given standard letter can be allocated to one of the different phonemes or allophones which that letter commonly represents in the conventional spelling of that language. (This can be a device to retain conventional spelling, yet achieve phonetic accuracy).
dialect: a variety of a language whose pronunciation differs frequently at allophonic level and sometimes at phonemic level from other varieties of the same language. The term is not pejorative. Dialects are normally regional in origin. They may be distinguished from styles of pronunciation which often have more to do with social situations, and which may involve a range of registers that are sometimes designated by dictionaries as "formal", "colloquial", "racy", etc.
style of pronunciation: see above, under dialect.
pronouncing-dictionary data-base: A list of words or word-shapes matched with full phonemic information, as commonly found in the pronouncing guides of dictionaries. Additional allophonic information may also be provided.
schwa vowel: the neutral vowel or "murmur vowel" or "obscure vowel", heard in the last syllables of the English words: "fatal", "jewel", "civil", "gallop", "consul".
text: the words of something printed or written; a piece or a portion of a piece of writing that makes some sense.
standard text: text in the form that is normal or conventional for a given language; e.g. alphabetic letters with conventional spelling and punctuation. Note that, in English, both so-called 'British' and 'American' spellings (and numerous intermediate 'house spelling styles' of various publishers) can equally be considered as standard text, since the difference between them is minor and rarely troubles the reader. The standard writing-system of a language by definition employs standard text and, for English, standard letters.
alphabetic standard text: the common or conventional way of representing a given language in letters (even if it is more commonly written in logographs).
* resolved text: text to which markers have been added so as to resolve those homonyms which might otherwise cause ambiguities and thus impede conversion into enriched or processed text, or which might impede re-conversion to standard text.
* enriched text: a version of resolved text to which additional phonetic or semantic information has been added in order to facilitate its conversion into a variety of display-options. (Note that the term is not related to the Rich Text Format or RTF provided in some wordprocessing systems). The additional information may be in the form of a number of types of markers added to the text. In this case, different converter-programs or converter-algorithms may make different selections from among these markers as part of the process of creating processed texts for different display-options.
* processed text: a version of a text containing all necessary phonetic and graphic information, as required by a given display-system, for displaying that text in a specific display-option.
* graphic information: the specifically graphic (i.e. visual-display) information about a text that the text-displaying elements of a given system require in order to show that particular text in a given display-option. Depending on the given system (and on the symbols used in a given display-option) the required graphic information may involve detailed descriptions of logographic or other non-standard symbols and their relative positions; or it may merely require the use of a relatively simple code to identify the symbols and groupings required.
"pictograph or logograph" or "pictographic or logographic": For brevity, these fixed phrases are used to refer to any and all of the ways in which a single symbol can represent an entire word, word-element or phrase. They refer to non-alphabetic writing-systems that use any variety or combination of logical, pictorial, phonetic, or symbolic principles or codes to create word-shapes; and they include traditional writing-systems that are substantially arbitrary or unpredictable.
BACKGROUND OF INVENTION and DISCUSSION OF PRIOR ART
A major function of any writing-system is to connect its visual code to the spoken word or the underlying ideas in a systematic manner. While the dominant writing-systems of all major modern languages are at least partly phonetic, few provide sufficient phonemic detail to systematically connect speaking and writing. Similarly, while over 80% of modern Chinese logographs contain phonetic elements, most of the phonetic clues are now too cryptic, out of date or inappropriate for most Chinese speakers. The lack of phonetic clues in English has produced great annoyance and difficulty for very many years. The lexicographer H. W. Fowler in his influential Modern English Usage summed up more than a century of debate with the remark that a suitable phonetically-accurate spelling system for English "would be of incalculable value". It is widely recognised that writing-systems that lack predictability tend to produce massive illiteracy and sub-literacy. [Dr Valerie Yule noted (Yule, p. 10) that while a European high-school student "may read about 50,000 words, a Chinese adult can rarely recognise more than about 4000 logographs."]
Despite the obvious inadequacies of the written form of most languages and the many attempts at reform, few changes have resulted. American-English spelling is only slightly more 'rational' than British-English spelling; and the post-Revolution simplified Chinese characters are no more 'rational' than the traditional, though they can be written more easily. It is evident that the written form of a major language may become entrenched to the point where significant reform is all but impossible. Ironically, the rapidly growing use of wordprocessors with inbuilt dictionaries, spell-checkers and thesauruses only serves to further entrench the traditional 'irrational' written forms of the major languages.
The advent of computer programs that allow words or phrases to be read aloud upon command certainly helps users learn pronunciation and associate the visual and spoken forms of such words or phrases. [See for example, US patents 5,366,377 to Miller (1994) "Method of manufacturing reading materials to improve reading skills", 5,737,725 to Case (1998) "Method and system for automatically generating new voice files...", and 4,121,051 to Place (1978) "Speech synthesizer".] However, speech-production is a slow process and such programs do not assist users to visually identify and employ phonetic clues in the text in a manner that facilitates un-prompted reading.
US patent 4,443,199 to Sakai (1984) related to a method for teaching the pronunciation and spelling of words to children in which the words were spelt out by aligning flat tiles of varying shapes and colors, each of which carried a single letter. The letters themselves were not colored but the tiles were sometimes given different colors or shapes to indicate silent letters or particular pronunciations of given letters. Or different-shaped tiles, bearing accurate phonetic information, were inserted among the misleading letters of the conventional spelling. While a useful learning device for children under the direction of a trained teacher, it was not suited to self-learning and required cumbersome apparatus.
US patent 4,609,357 to Clegg (1986), US patent 4,650,423 to Sprague et al, and US patent 4,696,492 to Hardin et al (1987) disclose yet more alternative universal phonetic scripts which, like the better known International Phonetic Alphabet (IPA), can be used to write most languages with phonemic accuracy. Like the IPA, however, such scripts are alternative writing systems that are of little help in learning a standard writing system such as English or Chinese.
Many proposals have been advanced to allow the traditional Chinese logographs to be keyed into computers using standard (or near-standard) alphanumeric keyboards. Huang (US Patent No. 4,500,872, 1985) proposed a method for representing up to 40,000 Chinese logographs using no more than 6 ASCII characters, the first four being phonetic. Such a code simplified the problem of storing and ordering the vast amount of graphic information required for Chinese or Japanese logographs. But Huang's system was only one of many closely-competing patents for coding or for improved keyboard-entry or computer-handling of Chinese logographs: cf. US patent Nos. 5,475,767 Du 1995; No. 5,164,900 Bernath 1992; No. 4,505,602 Wong 1985; No. 4,559,615 Goo et al., 1993; No. 4,096,934 Kirmser et al, 1978; No. 4,698,758 Larsen 1987; No. 5,079,702 Ho, 1992; No. 4,879,653 Shinoto 1989; No. 5,410,306 Ye 1995; No. 5,360,343 Tang 1994; No. 5,257,938 Tien 1993; No. 5,270,927 Sproat, 1993; No. 4,937,745 Carmon 1990; No. 4,484,305 Ho, 1984; and No. 5,047,932 Hseih 1991.
The challenge of converting phonetic Japanese scripts to kanji logographs also attracted such patents as: US patent Nos. 4,777,600 Saito et al 1988 ("Phonetic data to kanji character converter"); No. 5,208,547 Sakurai et al, 1993; and Shinoto 1989 listed above. Patent No. 4,679,951 by King et al, 1987 was not restricted to Chinese or Japanese writing-systems, and offered a method for entering "symbolic language texts" via a keyboard code.
None of these systems offered means for learning standard Chinese logographs by visual association with either the phonetic form or the pictorial form of the logograph in a mnemonically effective manner.
Other US patents of some general interest are: No. 5,689,616 by Li 1997 "Automatic language identification/verification system" and No. 5,529,496 by Barrett, 1996 "Method and device for teaching reading of a foreign language based on Chinese logographs". However, neither relates to user-selected display of alternative writing systems.
OUTLINE OF THE INVENTION
From one aspect, the present invention comprises a text processor for facilitating user familiarisation with the word-shapes of the standard text of a language by enabling a user to select between a plurality of non-standard texts that differ from one another according to the degree to which each incorporates clues to the identity of spoken words that correspond to the word-shapes of the non-standard text. Where the standard text is alphabetical, the non-standard texts may differ from one another according to the degree to which their word-shapes incorporate phonetic clues. Where the standard text is logographic, the non-standard texts may differ from one another according to the degree to which they incorporate pictographic clues. The text processor will normally be computer-based so that the standard and non-standard texts can be displayed on a computer screen or monitor by suitable inputs from the user, operating a computer keyboard, mouse or other input device.
From another aspect, the invention employs computer-based text display means to facilitate familiarisation with the standard written form of a natural language by allowing a user to selectively display text in any one of a plurality of display modes that vary incrementally in the degree to which they depart from standard text according to the degree to which phonetic and/or pictographic clues are incorporated therein. The display modes thus form a hierarchy according to the level of phonetic/pictographic explicitness, there being preferably at least three steps in the hierarchy:
• the standard text mode,
• a transparently phonetic or pictographic mode (which is more predictable or more self-evident), and
• an intermediate mode which has characteristics of both the transparent and the standard modes.
It will usually be desirable to include more than one intermediate mode in the hierarchy, the mode levels other than the 'top' standard language level being called 'lower' levels and the level including the most explicit phonetic or pictographic clues being called the 'bottom' level. In one method of use, the text display means may contain a separate and complete version of the text for each display mode so that the user may switch between them or display them side-by-side. In another method, the user may select the number of lower levels required and cause the computer system to generate the desired lower levels from all, or only selected portions, of a passage of standard text. Generation of such lower levels will require the use of a parser routine capable of determining the meaning and pronunciation of a word from its context (eg, to distinguish the two sounds of 'wound' in English). Where the pronunciation of a technical term - such as a medical or biological term with a Latin or Greek root - is required, it may only be necessary to highlight that term in the top level display and call for its re-display in a selected lower level. It is envisaged that the system will provide the user with the option of alternating between display modes of the selected word or phrase in a pulsing or 'flickering' manner. Finding out how one word or phrase is pronounced will provide the learner with clues to certain more or less regular phonetic patterns in the writing-system and, thus, to the pronunciation of other words and phrases. In any event, these display-options or levels are preferably such that readers can move between them with relatively little investment in learning-time.
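The hierarchy of display modes described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each letter of an enriched word carries an optional clue and the lowest level at which that clue becomes visible; the `Letter` class, the bracket notation and the level numbers are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass

@dataclass
class Letter:
    char: str        # the standard letter
    clue: str = ""   # hypothetical phonetic clue, e.g. "silent"
    level: int = 0   # lowest display level at which the clue is shown

def render(word, level):
    """Render one word at a display level; level 0 is plain standard text."""
    parts = []
    for letter in word:
        if letter.clue and 0 < letter.level <= level:
            # Show the clue in brackets after the letter (illustrative notation only)
            parts.append(f"{letter.char}[{letter.clue}]")
        else:
            parts.append(letter.char)
    return "".join(parts)

# "knife": silent letters appear at level 1, the vowel value at level 2
KNIFE = [
    Letter("k", "silent", level=1),
    Letter("n"),
    Letter("i", "aɪ", level=2),
    Letter("f"),
    Letter("e", "silent", level=1),
]

for lvl in range(3):
    print(lvl, render(KNIFE, lvl))
```

Level 0 reproduces the standard text, and each higher level adds clues without altering the underlying spelling, so suppressing the clues again restores the standard word-shape.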
It is noted that there has been much recent interest in language disambiguation methods in the context of automatic language translation. Such parsing methods and algorithms may be applied in the present invention to perform the parsing functions indicated above. For example, Church et al, in US Patent No. 5,541,836 (1996) disclosed a method for enabling a computer to detect from the context whether, for instance, the word "sentence" means a grammatical unit or a penalty imposed for an offence. Dahlgren et al, in patent No. 5,794,050 in 1998 offered a method for teaching a computer to cope with the multiple ambiguities of natural language yet without leading to the "combinational excess" produced by considering all theoretically possible meanings. Such advances can be used or adapted to provide the means for automatic computerised conversion between alphabetic and pictographic or logographic writing.
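Before any such parser is invoked, the relevant homonym filter defined in the Terminology can narrow the work to the words that actually need disambiguation. The following is a minimal sketch only: the check-list contents, the function name and the `phonetic_target` flag are assumptions made for illustration, with the example words drawn from the definitions of homophonous and non-homophonous homonyms.

```python
import re

# Hypothetical check-list: spelling -> True if the pronunciation changes with
# the meaning (non-homophonous), False if all meanings sound alike.
HOMONYM_CHECKLIST = {
    "wind": True,
    "wound": True,
    "invalid": True,
    "live": True,
    "tender": False,
    "bat": False,
    "cleave": False,
}

def find_relevant_homonyms(text, phonetic_target=True):
    """Return (word position, word) pairs that need resolving before conversion."""
    hits = []
    for index, match in enumerate(re.finditer(r"[A-Za-z']+", text)):
        word = match.group().lower()
        if word not in HOMONYM_CHECKLIST:
            continue
        # For a phonetic display-option, homophonous homonyms keep their
        # ambiguity harmlessly and so are not "relevant".
        if phonetic_target and not HOMONYM_CHECKLIST[word]:
            continue
        hits.append((index, word))
    return hits

print(find_relevant_homonyms("The bat flew into the wind"))
```

Only the flagged words would then be passed to a context-based parser, or to the user, for resolution.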
From another aspect, the invention also comprises methods whereby the user can be greatly helped in learning to read text in alternative writing-systems by the computer-graphic process (designated as text-morphing) of gradual progressive mutation on a display-screen of the visual image (word-shape) of a given word. Using this, together with the electronic display-techniques of inter-leaving and alternation, the user can create trails of visual association that assist readers who are already familiar with one writing-system to associate its word-shapes with the corresponding word-shapes of a different (even a very different) writing-system.
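One simple way to picture the gradual mutation involved in text-morphing is linear interpolation between two outlines. This is an assumption made for illustration, not the method of the specification: the sketch supposes that both word-shapes have been reduced to outlines with the same number of corresponding points, and the function name is hypothetical.

```python
def morph_frames(start, end, steps):
    """Return `steps` outlines running from `start` to `end` inclusive.

    Each outline is a list of (x, y) points; the two outlines must have the
    same number of points, paired in order.
    """
    if len(start) != len(end):
        raise ValueError("outlines must have the same number of points")
    if steps < 2:
        raise ValueError("need at least two frames")
    frames = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([
            (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            for (x0, y0), (x1, y1) in zip(start, end)
        ])
    return frames

# Two-point toy outlines morphed in three stages
for frame in morph_frames([(0, 0), (2, 0)], [(0, 4), (2, 2)], 3):
    print(frame)
```

Playing the frames in order, or in reverse, gives the intermediate stages that form the visual memory-trail between the two word-shapes.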
From another aspect the invention includes methods and systems for converting an alphabetic standard text (which means standard letters and conventional spellings) firstly into a resolved text and secondly into a phonetically or pictorially enriched text. In the case of electronic books or voice-recognition software, text may be supplied to users already in resolved or enriched form. The enriched text is then, thirdly, converted into the appropriate processed texts for a series of display-options. These display-options offer various amounts of additional phonetic information, and may display or distinguish it with various degrees of conspicuousness. This can be done for the user's choice of dialect or style of pronunciation.
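The three conversions just described (standard text to resolved text, resolved text to enriched text, enriched text to processed text) can be sketched as a small pipeline. Everything named here is hypothetical: the pronunciation table, the sense labels and the slash notation are illustrative assumptions, and the homonym senses are supplied by hand rather than by a parser.

```python
# Hypothetical pronouncing data keyed by (word, sense); sense is None when
# the word is not a relevant homonym.
PRONUNCIATIONS = {
    ("wind", "air"): "wɪnd",
    ("wind", "turn"): "waɪnd",
    ("the", None): "ðə",
    ("clock", None): "klɒk",
}
HOMONYMS = {"wind"}

def resolve(words, senses):
    """Stage 1: attach a sense marker to each relevant homonym."""
    return [(w, senses.get(i) if w in HOMONYMS else None)
            for i, w in enumerate(words)]

def enrich(resolved):
    """Stage 2: add phonetic information to every resolved word."""
    return [{"word": w, "sense": s, "ipa": PRONUNCIATIONS.get((w, s))}
            for w, s in resolved]

def process(enriched, show_phonetics):
    """Stage 3: select what one display-option needs and build its text."""
    out = []
    for tok in enriched:
        if show_phonetics and tok["ipa"]:
            out.append(f'{tok["word"]} /{tok["ipa"]}/')
        else:
            out.append(tok["word"])
    return " ".join(out)

words = ["wind", "the", "clock"]
enriched = enrich(resolve(words, {0: "turn"}))
print(process(enriched, show_phonetics=False))
print(process(enriched, show_phonetics=True))
```

The enriched form is built once, while each display-option makes its own selection from it, so the standard appearance can always be recovered by selecting no phonetic information.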
The choice of display-option is normally made by the user. Indeed the invention shifts from editors and publishers to users and readers the power to decide in what writing-system a given text shall be encoded and presented. Thus each user who owns or has the use of a computer system can convert their existing skill in reading standard text into a skill at reading other writing-systems that are significantly different. This can be done at the user/recipient's own pace and choice.
The result is this: users or readers who receive a text can rapidly display it (and can also print it) with the degree of phonetic annotation they prefer. They can carry out this process on the entire text or on the portions they select or on single words. They can also switch the screen display rapidly between two or more writing-systems. Or they may inter-leave the same text in two or more writing systems (or display-options) in any way that makes it easier for them to transfer their reading skills from one writing-system to another. Thus literate adults may become multi-literate in several writing-systems.
In the case of alphabetic writing-systems, they can also suppress the phonetic markers so that the text, including amendments or additions, resumes the exact appearance of standard text. This can make private use of user-selected or user-customised writing-systems practical for those creating or re-editing texts that they later intend to distribute.
Methods are also disclosed for maintaining standard letters and conventional spelling while offering full phonemic or allophonic information.
Once the concept and practice of offering text in multiple display-options is established, several alternative writing-systems become practicable, including ones that are partly or wholly logographic. These may also be highly pictorial. Methods are also disclosed for inter-leaving and combining display-options as a teaching method either for (1) native-speakers or foreign learners who need to learn traditional logographs, or (2) those who wish to learn newly-invented pictographic or logographic writing-systems.
It is possible to have the best of both logographic and phonetic/alphabetic writing-systems if the two can be connected or inter-leaved, in some of the ways disclosed below, as alternative display-options for the same text. The inter-leaving of display-options enables a learner to transfer reading skills from alphabetic writing-systems (which are more easily learnt) to logographic writing-systems, whether newly-devised or traditional, which are more visually salient. A valuable bridge to literacy in traditional Chinese or Japanese logographs may be achieved, both for native-speaking children and for foreign learners.
The invention is best presented in a format that makes practical and psychological sense to users, and fits with existing habits. One likely format is similar to that of some existing options in modern word-processing packages. Just as users are at present invited to make their own selection of fonts and font sizes, so they might be invited to select among a range of display-options (ranged under one or more icons) which use differing writing-systems or display varying amounts of phonetic information.
The invention is also, and alternatively, indicated in the claims appended hereto.
DESCRIPTION OF EXAMPLES
Having portrayed the nature of the invention, particular examples will now be described by way of illustration. In the following description reference will be made to the accompanying drawings, a brief description of the Figures of which follows.
Brief Description of Figures
FIGURE 1 is a flow-chart illustrating the process steps of the first example, where the user is relied upon to resolve phonemic homonyms and enriched text is generated.
FIGURE 2 is a flow chart illustrating the process steps of the first example, where a plurality of different, user-selectable, phonetically-modulated display modes are generated.
FIGURE 3 is a flow chart illustrating the processing steps of the second example, where phonemic homonyms are resolved automatically and where a plurality of phonetically-modulated display modes is generated.
FIGURE 4 is a code chart, providing an example of a set of phonetic letter-variants suitable for use in the resolution of homonyms in English.
FIGURE 5 is a tabulation illustrating how the same piece of enriched text can be represented in a gradated set of seven display-options, each successive option (a) - (g) providing an increased amount of phonetic information using the phonetic letter-variants of Figure 4.
FIGURE 6 is a table illustrating one example of the text-morphing of the Chinese pronouns "WO", "NI" and "TA" ['me', 'you' and 'him'/'her'/'it'] in successive steps (a) - (g) from their alphabetic word-shapes into their logographic shapes.
FIGURE 7 is a table illustrating a second method of text-morphing of the Chinese pronouns of Figure 6 in which the text-morphing (which may move in either direction) extends to the corresponding English word-shapes. In this case, nine stages [(a)-(i)] are employed.
FIGURE 8 is a table illustrating a third method of text-morphing of one of the Chinese pronouns of Figure 6 employing cartouches.
FIGURE 9 illustrates a second method of text-morphing of one of the Chinese pronouns of Figure 6 employing similar elements.
FIGURE 10 illustrates a three-step text-morphing process for the Chinese character for 'ant'.
FIGURE 11 illustrates a three-step text-morphing process for the Chinese character for 'mother'.
FIGURE 12 illustrates a four-step text-morphing process for the Chinese character for 'horse'.
FIGURE 13 is a tabulation of the same English text reproduced in each of three alternative writing/spelling systems.
FIGURE 14 is a tabulation showing at (a) a passage of English in standard text, the same passage in the Shaw Alphabet at (b) and the letters of the Shaw Alphabet at (c).
FIGURE 15 is a tabulation of three display-options of part of Lincoln's speech beginning, "Fourscore and seven years ago ...". Each option combines additional phonetic information with legibility and conventional spelling and uses the "allographs" of Figure 4.
EXAMPLE 1: Broad Alphabetic Methods
Figures 1 to 3 are flow-charts which collectively describe algorithms or computing programs which (1) convert standard text into resolved text; (2) convert resolved text into enriched text; and (3) prepare a specific version of the enriched text (now called processed text) which contains all the relevant phonetic information for a given display-option.
FIGURE 1 shows a set of steps for the production of phonetically enriched text from standard text, using user input to resolve any phonemic homonyms. The first two steps of FIGURE 1 deal with the resolution of homonyms. Step 1 is the marking of reconversion homonyms by a reconversion homonym resolver. This device, though not commercially available, may be readily constructed by those skilled in the art. Standard text is considered as a string of words, each of which is classified, according to the information found in a dictionary-style database, as creating or not creating ambiguities upon reconversion. Those words like lee/lea or its/it's whose reconversion from a more phonetic writing system creates problems of ambiguity are identified, and the ambiguity is resolved by adding electronic markers (which need not be visible upon the screen or in print-outs) which will tell the reconversion program which of two or more phonetically identical conventional spellings to use.
(Obviously it is important that the standard text should previously have been checked to ensure that it is accurately spelled, but computer-based spelling checkers are well known in the art.) In the first step of the process of Figure 1, the reconversion homonym resolver examines the words of a piece of standard text. It detects reconversion homonyms like "lea"/"lee", "its"/"it's", "their"/"they're", and "beer"/"bier" which may become identical in an alternative display-option. It then adds markers to ensure they can be reconverted automatically and unambiguously to standard text. The reconversion homonym resolver may be a free-standing algorithm or program which uses a checklist of reconversion homonyms that are relevant to a given language and a given set of display-options. Or the resolver may be collated into the pronouncing-dictionary data-base in such a way that those words in the data-base that are liable to become reconversion homonyms carry markers which can be detected and directly incorporated into enriched text by the algorithm or program which (in the second stage) creates enriched text.
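By way of illustration only, the marking step just described might be sketched as follows. The checklist contents, the marker names and the function name are assumptions made for this sketch and are not part of the specification; a real checklist would be drawn from a dictionary-style data-base.

```python
# Hypothetical checklist: each reconversion homonym maps to an electronic
# marker recording which conventional spelling the author used.
RECONVERSION_HOMONYMS = {
    "lea": "LEA", "lee": "LEE",
    "its": "ITS", "it's": "IT_S",
    "their": "THEIR", "they're": "THEY_RE",
    "beer": "BEER", "bier": "BIER",
}

def mark_reconversion_homonyms(words):
    """Return (word, marker) pairs; the marker is None for unambiguous words."""
    return [(w, RECONVERSION_HOMONYMS.get(w.lower())) for w in words]

marked = mark_reconversion_homonyms("Their boat lay in the lee".split())
```

The markers travel with the words and need not be displayed; they matter only when a display-option that merges the two spellings is later reconverted to standard text.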
When producing alphabetic display-options that provide additional phonetic information, the first step may be entirely omitted if none of the display-options use non-standard letters or alternative spellings.
The second step of the process of Figure 1 is the resolution of non-homophonous homonyms (also referred to as "phonemic homonyms", that is, words spelled alike but pronounced differently). Once again, a dictionary-style database identifies those words whose pronunciation is ambiguous. Resolution by the user is requested, and accepted if provided, otherwise the "default resolution" is either to show both phonetic values of the word or to use the determination of a parser algorithm or program that has been held in reserve. An alternative flow-path would pass such homonyms directly to the parser algorithm or program (as in FIGURE 2 or FIGURE 3) and then invite human over-ride.
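The order of precedence described in this step (user resolution first, then a parser held in reserve, then showing both values) can be sketched as below. This is a minimal sketch under stated assumptions: the pronunciation strings are placeholder labels rather than a real phonetic notation, and the function and table names are hypothetical.

```python
# Hypothetical checklist of phonemic homonyms; the pronunciation labels
# are illustrative placeholders only.
PHONEMIC_HOMONYMS = {
    "wind": ["wInd", "waInd"],
    "wound": ["wu:nd", "waUnd"],
}

def resolve_phonemic_homonym(word, user_choice=None, parser=None):
    """Return the list of pronunciations to attach to `word`.

    Returns None if the word is not a phonemic homonym.
    """
    options = PHONEMIC_HOMONYMS.get(word.lower())
    if options is None:
        return None                 # nothing to resolve
    if user_choice in options:
        return [user_choice]        # resolution provided by the user
    if parser is not None:
        guess = parser(word)
        if guess in options:
            return [guess]          # determination of the reserve parser
    return list(options)            # default: show both phonetic values
```

The alternative flow-path mentioned above would simply call the parser first and then offer the user the chance to over-ride its result.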
In the second step the text is passed to a phonemic homonym filter. This checks the text's words against a checklist of relevant homonyms. This checklist is specific to the language in question. It might also be made more narrowly specific to a given writing-system or display-option or dialect or style of pronunciation. However, it is simpler to use a fuller check-list that detects all potential homonyms for a given language. Resolving all of these then creates a resolved text from which a single enriched text, and thence all the processed texts for the various alphabetic display-options, can reliably be derived. In a system where only alphabetic display-options are provided, the relevant homonym filter is normally concerned only with phonemic homonyms (also known as non-homophonous homonyms), that is, with those ambiguous word-shapes (like English "wind" or "wound") whose pronunciation changes according to their meaning. Note that in the non-alphabetic display-options, including pictographic or logographic ones, that are dealt with in some of the later claims, homophonous homonyms also become relevant. Hence a different and larger check-list and a different homonym parser, defined earlier as a relevant homonym filter, are required.
In some languages non-homophonous homonyms are so rare that it may be satisfactory to simply flag their presence and offer as alternatives the possible phonemic renderings of a given word-shape, while perhaps inviting the user to decide between them. In English, however, non-homophonous homonyms are sufficiently common that a parser program or algorithm will normally be required to offer a probable resolution of them.
The parser program resembles those in existing spelling checkers and grammar-checkers in that it uses clues based on context and/or grammar. Human over-ride of the parser's determination may also be invited, and if provided, incorporated in the resulting resolved text.
Note that a parser for non-homophonous homonyms may not be required when the text is input by voice-recognition. Or the author or supplier of electronic text may sort out such homonyms by human agency and supply text in resolved (or even in enriched) form.
The resolved text next passes to the last two steps of FIGURE 1. These resemble the computing program (now available on the Internet) that was developed by the American typographer, Edmund Rondthaler, and by computing engineer Ed Lias, for electronic conversion of standard text into the American Literacy Council's (ALC's) phonemically-accurate reformed spelling called "American Fonetic". Like Lias's program, the program or algorithm for creating enriched text uses a dictionary-style data-base, similar to the pronouncing-guides of major dictionaries. This provides common or most-common modern pronunciations of words.
However, its output is not in reformed spelling but in enriched text. That is, the same standard letters may still occur and in the same order; but electronic markers are added, where needed, to specify which phonemes the letters represent in given words. For instance, in texts in English the letter "a" may need as many as five different markers to specify its various common sounds, as heard in the words "father", "hat", "hate", "final", "head" (in this last example the "a" being silent). These added markers need not be visible on screen or in print-outs. The next stage of the process is shown in FIGURE 2. Here the resolved and phonetically enriched text passes to a range of display-converters which control the visual display of this phonetically-enriched text and offer a gradated series of displays. In these displays, the phonetic markers in enriched text are converted into visual phonetic markers. Thus the fact that the letter "c" in a given word is "soft" (as in "cell") will be represented by a marker in the enriched text. A given display-option may then present that letter with a visual marker, such as a cedilla.
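One possible in-memory form of enriched text, offered purely as a sketch, pairs each standard letter with an optional electronic marker. The data-base entries, the marker names ("name", "mute", "soft") and the function names below are assumptions for illustration; the specification does not prescribe any particular encoding.

```python
# Hypothetical pronouncing-dictionary entries: one marker slot per letter,
# with None where no marker is needed.
PRONOUNCING_DB = {
    "hate": [None, "name", None, "mute"],   # "a" has its name value, "e" is silent
    "cell": ["soft", None, None, None],     # "c" is soft
}

def enrich(word):
    """Pair each letter of `word` with its marker; unknown words get no markers."""
    markers = PRONOUNCING_DB.get(word.lower(), [None] * len(word))
    return list(zip(word, markers))

def to_standard(enriched):
    """Discard all markers, recovering the standard letters in their original order."""
    return "".join(letter for letter, _ in enriched)
```

Because the letters and their order are untouched, discarding the markers always returns the standard spelling.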
Each converter corresponds to a given display-option; and, as it converts enriched text to the processed text required by that display-option, it makes its own characteristic selection from among the electronic markers in enriched text. Those markers which are retained or activated in a given processed text act as signals to the display-system to produce various visual effects. Once again, such converters are not commercially available, but can readily be constructed by those skilled in the art, once their usefulness in this larger system is recognised. Thus, by the method shown in FIGURE 1 and FIGURE 2, the first stage of the converter need produce only a single new version of phonetically enriched text. (The resulting text-files might use the extension code PET for "Phonetically Enriched Text"). Texts in this form may be exchanged with other users. In those display-options that offer fuller or maximum additional phonetic information, visual phonetic markers are produced that correspond to most or all of the electronic markers present in the enriched text. Yet at least one display-option displays none of the phonetic markers added to the enriched text, and hence presents exactly the appearance of standard text. Thus even text that has been both enriched and then amended by the user can still be displayed or printed as standard text. At the other extreme, at least one display-option shows additional phonetic information to at least full phonemic, and possibly to allophonic level.
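A converter of this kind can be sketched as a filter over the markers. In the sketch below the display-option names, the marker names and the rendering rule (a cedilla for a soft "c") are illustrative assumptions; an actual system would draw its visual markers from a chart such as that of Figure 4.

```python
# Hypothetical display-options: each retains its own subset of marker kinds.
DISPLAY_OPTIONS = {
    "standard": frozenset(),                       # shows no markers at all
    "full": frozenset({"soft", "mute", "name"}),   # shows every marker
}

def convert(enriched, option):
    """Produce processed text: keep only the markers this display-option uses."""
    keep = DISPLAY_OPTIONS[option]
    return [(letter, m if m in keep else None) for letter, m in enriched]

def render(processed):
    """Turn retained markers into visual markers (only soft 'c' is handled here)."""
    out = []
    for letter, marker in processed:
        if marker == "soft" and letter == "c":
            out.append("ç")
        else:
            out.append(letter)
    return "".join(out)

ENRICHED_CELL = [("c", "soft"), ("e", None), ("l", None), ("l", None)]
```

With these assumptions, the "standard" option renders the word exactly as in standard text, while the "full" option renders the soft "c" with a cedilla.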
An alternative more direct path from resolved to processed text is shown in FIGURE 3. Instead of first creating an enriched text containing all the information that any of the display-options may require, each converter takes directly from a pronouncing-dictionary data-base only the additional phonetic information required for its own display.
An ingenious alternative path is also possible from enriched text to a range of alphabetic display-options. This path uses the properties of font-sets to dispense with the intermediate processed text. For this purpose, texts are displayed in a set of fonts. The fonts are so designed that some or all of the phonetic letter-variants (a defined term) which appear as different characters in the fonts used to provide full or maximum additional phonetic information become indistinguishable in other fonts. That is, they appear as the standard letter, without phonetic variations. By switching between an appropriately designed set of fonts, the various converters can produce the range of phonetically modulated display-options from a single stored version of enriched text. The converters may still be required to set default options for the various display-options and to pre-format the text accordingly.
The writing systems used in the sets of alphabetic display-options that are provided for a given language will tend to be specific to that language. "Universal" display-options or writing-systems in which text in any language might be described with full phonemic accuracy are possible, but they may involve an inconveniently large set of phonetic letter-variants.
Skilled readers take in familiar alphabetic words at a glance, almost as logographs. Hence it is important that the overall visual outlines of word-shapes should remain similar in most of these display-options. This helps the learner/user transfer from literacy in one display-option or writing-system to literacy in others. The subjective effect, when switching between most alphabetic display-options, should be rather like that of switching to an unfamiliar type-face.
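The font-set idea can be modelled, very roughly, by treating each font as a table from stored character codes to glyphs. The codes ("a:name", "e:mute") and the glyphs chosen below are assumptions for this sketch and do not reproduce the letter-variants of Figure 4.

```python
# A single stored version of enriched text, as a sequence of character codes.
ENRICHED = ["h", "a:name", "t", "e:mute"]

# Hypothetical fonts: each maps variant codes to a glyph. In the "standard"
# font the variants collapse onto the plain letter; in the "full" font they
# are drawn as distinct shapes.
FONTS = {
    "standard": {"a:name": "a", "e:mute": "e"},
    "full": {"a:name": "ā", "e:mute": "ė"},
}

def display(codes, font_name):
    """Render stored codes in the chosen font; plain letters pass through unchanged."""
    font = FONTS[font_name]
    return "".join(font.get(code, code) for code in codes)
```

Switching the font name changes the display-option while the stored text stays the same, which is the property this path relies on.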
While each display-option has fixed rules for its selection among phonetic markers, and also a default setting for its visual display of them, there can also be scope for user-selected adjustments to the conspicuousness of the visual phonetic markers. It is desirable that these should range from visually obvious to relatively inconspicuous and finally invisible.
For instance, in a display-option where the markers consist of diacritical marks, there can be scope for adjusting their size and color. By dimming or switching them to less conspicuous colors it may be possible to create an indefinite number of visually gradated stages. This has important uses. For instance, language-learners who have read a chapter of an electronic book in a display-option that provides full and conspicuous phonemic information, may now wish to tone-down the conspicuousness of the visual phonetic markers within that display-option in order to check that they can still recognise the word-shapes and remember their pronunciation, even when the phonetic clues are dimmed or de-emphasised and the display is thus returned to something very close to the appearance of standard text.
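The gradated dimming described here could be achieved in many ways. One simple possibility, sketched below on the assumption that colors are held as RGB triples, is to blend the marker color towards the ordinary text color under a user-controlled setting; the function name and the 0.0 to 1.0 scale are hypothetical.

```python
def dim_color(marker_rgb, text_rgb, conspicuousness):
    """Blend a marker color towards the text color.

    conspicuousness = 1.0 shows the marker at full strength;
    conspicuousness = 0.0 makes it indistinguishable from ordinary text.
    Values outside that range are clamped.
    """
    t = max(0.0, min(1.0, conspicuousness))
    return tuple(round(tc + (mc - tc) * t) for mc, tc in zip(marker_rgb, text_rgb))
```

Since the setting is continuous, the number of visually gradated stages is limited only by the display.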
EXAMPLE 2: Re-converting to standard text
As suggested, it is important that text which has been altered or edited by the user while in a particular display-option can be automatically and unambiguously returned to conventional form. Where this is not the case, users are likely to use the display-option perhaps for reading, but not for editing or revising or composing text. So long as the additional phonetic information is expressed merely by using a set of phonetic letter-variants, automatic re-conversion to standard text remains possible. This may be done either by suppressing the visual expression of the additional phonetic markers or else by providing a program or algorithm to reverse the process of the original text-conversion and to remove the phonetic markers altogether. (The file extension might then be DOC rather than PET).
FIGURE 4 is a code chart that shows how conventional spelling may be retained, yet be combined with a set of phonetic letter-variants that offer a full phonemic description of the common pronunciations of English words. It offers sufficient variants of each letter to allocate one variant for each of the phonemes which that letter represents in standard text. Up to four "anomalous" variants of each vowel letter are used to cover erratic uses, like the "i"-sound of the letter "o" in "women". [The rights for artwork and design of FIGURES 4-13 belong to Mr Peter Burns of Townsville, who produced them at the request of the inventor.] Consonant letter-variants (for a given standard letter) appear on the Code Chart in the following order: cardinal values, mutes (shown in shadow imprint font), and variants. Vowel letter-variants are listed in the following order: "cardinal" (e.g. par, pet, pit, pot, putt); "mute", shown in shadow imprint; "regressive" (regressing towards schwa, as in above, below, candid, conceit, superb) shown in low bold font with dieresis; "burred" (as in fern, bird, colonel, turn) shown in fine italics with dieresis; "name values" (i.e. having the sound of the vowel-letter's English name, as in: fate, feet, fight, vote, mute); "continental or Italian values" (as in: fast, fete, police, fought, lute); "erratic pure vowel variants" (as in: yacht, pretty, lingerie, blood, food, foot, women, put, busy, bury); "quasi-consonantals, including rhotics" (like the "u" in suede); and finally "residual diphthongs".
Display-options using alternative spelling-systems and/or non-standard letters still permit of automatic re-conversion, though attention is required, as explained above, to the problem of words which become reconversion homonyms in the new spelling-system. Even when amendments to the text are entered via the keyboard, they may still be typed in conventional letters and spelling, and continuously submitted to the converters for conversion into resolved, enriched and processed text and thence integrated into whichever display-option the user is currently using. Where relevant homonyms are keyed in by the user, a program may invite the user to resolve them, much as currently happens in some word-processing programs when the user keys in a problematic spelling. If amendments are added by voice-recognition, the process is even simpler, since the text may in effect arrive in enriched form. Thus even when the keyboard skills of a given user confine them to entering text in standard form, they will still be able to view the resulting amended text in the writing-system that best suits them. They may also be offered the option of hearing it read aloud, though this is usually a slower process.
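Re-conversion from an alternative spelling-system might, for illustration, proceed as sketched below. The respellings in the reverse table are invented for this example and are not taken from any published reformed spelling; the table and function names are likewise assumptions. Each token carries the reconversion marker (here simply the intended standard spelling, or None) that was attached before conversion.

```python
# Hypothetical reverse table: respelled word-shape -> candidate standard spellings.
REVERSE = {
    "lee": ["lea", "lee"],      # two standard words share one respelling
    "beer": ["beer", "bier"],
    "frend": ["friend"],        # unambiguous respelling
}

def reconvert(tokens):
    """Convert (respelled_word, marker) pairs back to standard text."""
    out = []
    for respelled, marker in tokens:
        candidates = REVERSE[respelled]
        if len(candidates) == 1:
            out.append(candidates[0])
        elif marker in candidates:
            out.append(marker)          # the marker settles the ambiguity
        else:
            raise ValueError(f"unresolved reconversion homonym: {respelled!r}")
    return " ".join(out)
```

A token that is ambiguous and carries no usable marker raises an error in this sketch; an interactive system would instead invite the user to resolve it.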
EXAMPLE 3: Dialects
The user can be offered a choice of regional pronouncing-dictionary data-bases. In the case of English, the most obvious two would be based on the major North American and on the major British dictionaries; but other possibilities include Indian, Australian, Scottish, and Irish. Within each of these, different styles of pronunciation (ceremonial, formal, colloquial) might also be offered.
For simplicity, the claims refer to selecting among several such pronouncing-dictionary data-bases for different dialects. Yet it may not be necessary to maintain them as fully separate. Markers can be added within a single data-base to indicate which among the alternative pronunciations of a word is most favored in a given dialect area or style of pronunciation. For instance, if the user has chosen "Southern British" and the data-base is then queried as to the pronunciation of the words "missile" or "hostile", markers will enable the converter program or algorithm to select the variant in which the second syllable is pronounced long (to rhyme with "style") rather than short. Note that only a minority of words change phonemic pronunciation when the dialect changes. (Otherwise we would be dealing with a change of language, not of dialect).
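A single data-base carrying dialect markers might be queried as in the following sketch. The entry layout, the variant labels and the assignment of the short variant to "North American" are assumptions made for illustration; only the "Southern British" long variant is stated above.

```python
# Hypothetical single data-base: each variant pronunciation of a word lists
# the dialects (or styles) in which it is the favored one.
PRONUNCIATIONS = {
    "missile": {
        "second syllable long": {"Southern British"},
        "second syllable short": {"North American"},
    },
}

def select_variant(word, dialect):
    """Return the variant favored in `dialect`, or the first listed variant
    if the data-base carries no marker for that dialect."""
    variants = PRONUNCIATIONS[word]
    for name, dialects in variants.items():
        if dialect in dialects:
            return name
    return next(iter(variants))
```

Words whose pronunciation does not vary between dialects would need only one entry and no dialect markers, which keeps the combined data-base small.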
The list of relevant homonyms for reconversion to standard text may need to alter slightly according to the range of dialects or styles of pronunciation which are offered, or which the user selects.
EXAMPLE 4: Alternative Spelling-Systems
Alternative spelling-systems involve changes to word-shapes, yet the text may remain recognisable. Motivated readers of phonetic spellings of English or French sometimes claim they can achieve satisfactory reading speeds within a day or so, though others take longer. These display-options are, initially at least, most likely to be used by language learners and by beginners in literacy. For them, means of making a smooth transition to standard text will be important, and may include the interleaving of a standard-text display-option. Other adult users may at first be restricted to reformed-spelling enthusiasts. For them it may be essential that texts they create or edit in this form can be readily returned to standard text.
The American Literacy Council provides a program which translates, in both directions, between "American Fonetic Spelling" and standard text. It incorporates a simple homonym-parser program to decide between alternative pronunciations of non-homophonous homonyms like wind, wound, bow, row, invalid, live, lives. This is based on noting the surrounding words, and is said to be about 80% successful. Spell checkers may of course be provided for these alternative spelling-systems.
Alternative spelling-systems and their display-options need not be designed purely for phonetic accuracy. Visual saliency may instead be the issue. Long words may, for instance, be abbreviated to make text shorter and easier to scan. The reader who needs to see a full phonetic rendition of such words may then be required to slow-flicker them into another display-option.
EXAMPLE 5: Phonetic Letter Variants
However, the writing-systems that are likely initially to have most popular appeal are those that can offer additional phonetic information while retaining conventional spelling. This involves using a set of phonetic letter-variants, examples of which are depicted in FIGURES 4, 5 and 15. FIGURE 5 shows how the code table of phonetic letter-variants displayed in FIGURE 4 can be used to produce a set of seven display-options that show an incremental range of additional phonetic information. The display-options range from one in which all additional phonetic detail is suppressed (producing the appearance of standard text) to one in which full phonemic and even allophonic detail are provided. In display-option (a) no word-shapes are changed from standard text. In display-option (b), designed for advanced literates, only difficult proper names or place names are changed, while "gill" in the sense of a liquid measure has its soft "g"-sound indicated. (The user is presumed to need no help with "gill", the fish's organ, which has a hard "g"-sound). In display-option (c), words whose pronunciation might trouble an advanced learner of English are clarified. For instance, it shows that the "ss" in "scissors" is atypically pronounced as "z". It also shows the difference between "gibber" (as in "gibber plain" with a hard "g") and its other meaning (with either pronunciation of the "g"). Display-option (d) marks with a dieresis those vowels which in everyday speech are liable to regress to, or towards, the schwa vowel. At the other extreme of the range, display-option (f) provides full phonemic information; and the final version, (g), provides allophonic information of the sort commonly required by foreign learners, but not by native speakers. It shows, for instance, that the "p" in "examples" is not aspirated, but the "p" in "pie" is.
A particular problem with the representation of vowel phonemes in English is that the language has a strong stress-accent, and the vowels in unaccented syllables are commonly "slurred" to the schwa vowel. Yet when the vowel-letters in most unstressed syllables are replaced by a symbol for the schwa vowel, the visual disruption (for those who are used to standard text) is considerable. As well, some people refuse to accept that so many of the vowels they produce are commonly or correctly reduced to the schwa vowel. Further, there is often genuine doubt. Is the first vowel of "convention" as fully slurred to schwa as the last vowel? Must, or should, the second vowels or second syllables of "hyper-" and "hypo-" be considered identical schwa-vowels?
A solution to this otherwise-unresolvable argument is offered by the writing-system depicted in FIGURES 4, 5 and 15. The method is to retain the original letter(s) for all vowel-sounds which either have regressed to, or else are regressing towards the schwa vowel, but to identify such letters by a common visual marker, for instance a common color. Thus both phonetician (or learner) and purist (or etymologist) can be satisfied. If schwa vowels are indicated in English, it is normally unnecessary to mark stressed and unstressed syllables, though this may be done as well. If this is desired, appropriate phonetic markers can be incorporated in the pronouncing-dictionary data-bases and in the enriched text. With the schwa problem solved, it is possible to design a writing-system that provides full phonemic information, yet can retain conventional spelling with all its possible aesthetic or historical values. See FIGURES 4 and 5.
Color alone can in theory be used to produce a sufficient set of phonetic letter-variants. This has not previously been proposed as a writing-system for adults. However, Caleb Gattegno's "Words In Colour" teaching method was widely trialed in the 1960s. (It was not patented). Gattegno advocated teaching small children to read with simplified standard texts. Most of the materials were in black and white, but classrooms were furnished with wall-charts called "Fidels" on which sample words were provided in conventional letters and spelling but with the colors used as a further guide to pronunciation. About 50 English phonemes were distinguished by using the same number of different colors for the letters on these charts. "Visual dictation" involved the teacher's pointer indicating words on these charts. The class then chanted these words aloud to form simple sentences. Dr Yule reports that Gattegno's teaching method slipped out of use in the 1970s largely because it had "too many colors". A further problem for any more general use of such a writing-system (which, however, Gattegno did not advocate) is that most literate adults would not want to face such a rainbow-like range of colors on every page. Because Gattegno's method was not designed for electronic display nor for display-options nor for use by adults, it is not specifically excluded from the Claims. The Claims list "color" among a long list of visual markers that may be used to form phonetic letter-variants. If a display-option is provided that uses a set of phonetic letter-variants based purely on color, it should include provision for the user to render the colors less conspicuous, much as when the color setting on a TV is adjusted towards monochrome. Similarly, the colors need not be used when printing out the text.
A writing-system that is better suited to adult use is set out in FIGURES 4 and 5. It relies on shape rather than color to distinguish phonetic letter-variants. (See FIGURE 4). It may use, as in FIGURE 15, a (much smaller) range of colors which are therefore much easier to contrast and distinguish. (Note that owing to the restriction that prohibits color in Figures, FIGURE 15 is submitted as a black and white photocopy. However the letters that appear to be in various shades of fainter ink are in fact intended by the inventor to be in various colors.) These colors are used not to constitute individual phonetic letter-variants but as common visual markers to indicate broad phonetic categories. For instance, all silent letters can be marked in a given color, all "cardinal-value" letters in another color, and all "wild" letters in another.
FIGURE 15 shows the use of color to combine additional phonetic information with legibility and conventional spelling. The phonetic letter-variants (also known as "allographs") are essentially as in Figure 4, but color is also used as a common visual marker for broader categories. The three versions or levels correspond roughly to the second, fourth and seventh versions in Figure 5. In the first version, colors are used as common visual markers which apportion letters to three categories: cardinal values, mute, and other values. Since "other values" include the schwa vowel which is found in most unstressed syllables, the red color tends to dominate, warning the foreign learner how few letters are to be taken at face value when pronouncing English words. The second and third versions use additional colors and phonetic letter-variants to make more subtle phonemic, and in the third version allophonic, distinctions.
The heavy use of color in all three display-options may strike those who are already literate as garish and excessive, but it is likely to be what learners will prefer. (An advantage of display-options is that one doesn't have to please everyone). The user should be given the option of adjusting the colors towards monochrome, or of printing in monochrome or standard text.
A proposed alternative display-option or writing-system or teaching method that has advantages for text in English and possibly other languages, is one that does use colors as phoneme-markers, but only or mainly for the (19 or so) vowel-phonemes, since few of the consonant letters cause major confusion for learners. This is a more practicable number of colors to distinguish. The number of colors, and the amount of color on the page, may be further reduced by not using color for vowels used with their most common value, or by substituting other common visual markers (such as line-quality or italics or diacritics) for these and other general categories, such as letters representing the schwa/regressive vowel. To further help the learner or user remember which colors correspond to which vowel-sounds, mnemonic pictures may be provided. The pictures may also be associated with pairs of rhyme-words, such as: green scene, black sack, gray clay, red bed, feared beard, gold fold, brown gown, etc. Sets of phonetic letter-variants need not necessarily serve to retain conventional spelling. They may also be used to eliminate two-letter combinations, like English TH and CH, or to permit a fully logical phonemically-accurate spelling based on one-symbol-per-phoneme.
Note that a writing-system is commonly described as phonemically accurate or as "highly phonetic" even if the pronunciations suggested by the spelling are not the only nor necessarily the most common ones in use. This is the case with alphabetic standard text in most languages. In particular, an otherwise accurate spelling system may choose to represent fuller or more formal pronunciations than are commonly heard in rapid speech. It may use "and", for instance, rather than "'nd" or "'n". An advantage of proceeding thus is that written text can be divided into a number of separate modules (the written words), each of which tends to have a single invariant word-shape. This invariant image facilitates both swift reading, sometimes called sight-reading, and automatic conversion into alternative writing-systems.
EXAMPLE 6: Allophones
Since native-speakers are already familiar with the phonemes of their language, they normally need only basic phonemic information in order to pronounce a new word. However, the foreign learner also needs allophonic (sub-phonemic) information; and this level of information is normally provided by language-teaching text-books. A display-option that shows allophonic variants may also be useful for those wishing to write either in, or about, a specific dialect.
To provide a display-option offering this level of phonetic clarity it is necessary to have a trained phonetician insert some additional markers in the pronouncing-dictionary data-base. It may also require the use either of some additional variant shapes for certain letters or else of one or more diacritical marks or other markers to indicate the allophonic variants of a given letter. Some or most of such markers may be common visual markers in that they consistently represent certain broad categories of phonetic change that tend to create allophones. For many languages these categories will include: voicing, de-voicing, aspirating, de-aspirating, lip-rounding, de-rounding, palatalising, etc. The addition of allophonic detail normally provides no obstacle to automatic re-conversion of the text to standard alphabetic form.
EXAMPLE 7: Idiolects
A possible display-option is one which shows the phonemes (or even the allophones) used in a specific spoken performance of a text.
Someone learning a language may need to match the text, as displayed in a given display-option, to the precise phonemes they are hearing in a specific spoken performance of the text. To prepare text for such a display-option is not straightforward. In some languages, for instance in English, there is considerable latitude for speakers to decide whether certain words should be "slurred". Slurring may mean that syllables are dropped or that the schwa vowel replaces the vowel used in more formal pronunciations. For instance many speakers regularly slur "February" to "Febry" and "gradually" to "grajily". Such a display-option cannot be prepared by automatic computer-conversion from standard written text. Instead it might be supplied by appropriate voice-recognition software. Or, in the case of educational material, it may be pre-prepared by a phonetician who has listened to the speech-performance in question. Or the speaker may simply take care to pronounce the words as they will be represented in the display-option.
EXAMPLE 8: Summary of Alphabetic Possibilities
The number and range of alphabetic display-options which will prove useful for texts of a given language depends largely on how full is the phonetic information already provided by standard text. Apart from languages like English and Danish whose conventional spelling is often unphonetic, there are languages like French or modern Greek whose conventional spelling fairly accurately indicates pronunciation yet contains additional details or silent letters that are not predictable from the spoken language.
In the case of alphabetic Chinese, the official Roman-alphabet transcription called Pinyin is too recent to have acquired significant spelling irregularities. Though not perfect, it offers such adequate phonetic information for the Mandarin dialect that the only obvious phonetic enrichment is the addition of intonation markers to indicate the four tones of this dialect. Further alphabetic display-options, at least for Mandarin, might not involve additional phonetic information. Instead they might offer additional semantic information, as described below. By contrast, there is large scope for display-options offering phonetic representations of the other major Chinese dialects. Some of these are more truly separate languages. Some have additional tones, and all show many phonemic as well as allophonic changes. Yet all use the same traditional hanzi logographs (though usually in the somewhat simplified forms introduced by the mainland Chinese government after World War II). This means that, at least in formal writing, the word-order tends to be the same. Hence automatic conversion is possible (by straightforward use of a dictionary-style data-base and without true language-translation) from hanzi logographs into alphabetic or partly alphabetic display-options that are phonetically accurate for a given dialect. (An alternative alphabetic display-option indicating the Mandarin-dialect pronunciation can, if desired, be interleaved).
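The dictionary-style conversion from hanzi logographs to a dialect's phonetic word-shapes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `PRONUNCIATIONS`, the function names `to_phonetic` and `interleave`, and the use of Jyutping romanisation for Cantonese are not taken from the specification, and a real data-base would cover thousands of logographs and would have to handle logographs with more than one reading.

```python
# Hypothetical dictionary-style data-base: one entry per hanzi logograph,
# giving an alphabetic word-shape for each dialect (illustrative entries only).
PRONUNCIATIONS = {
    "我": {"mandarin": "wǒ", "cantonese": "ngo5"},
    "你": {"mandarin": "nǐ", "cantonese": "nei5"},
    "他": {"mandarin": "tā", "cantonese": "taa1"},
}


def to_phonetic(hanzi_text, dialect):
    """Convert logographs one by one into the word-shapes of one dialect.

    Logographs missing from the data-base are passed through unchanged.
    """
    shapes = []
    for logograph in hanzi_text:
        entry = PRONUNCIATIONS.get(logograph)
        shapes.append(entry[dialect] if entry else logograph)
    return " ".join(shapes)


def interleave(hanzi_text, dialects):
    """Return one line per display-option, with the logographs first."""
    lines = [" ".join(hanzi_text)]
    lines.extend(to_phonetic(hanzi_text, d) for d in dialects)
    return lines
```

Calling `interleave("我你", ["cantonese", "mandarin"])` gives three parallel lines that could be displayed one above the other, which is one simple way of interleaving a dialect display-option with the Mandarin one.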
Such phonetic writing-systems for Chinese dialects can be extremely useful, especially when combined with the methods described below for disambiguating Chinese alphabetic standard text and thus allowing it to compete with hanzi logographs as a fully adequate representation of Chinese texts. The phonetic representations of particular Chinese dialects can be produced by essentially the same systems as were previously described for selecting a given dialect or style of pronunciation in any other alphabetic language. A modification is that the appropriate dictionary-style data-base for a given dialect would preferably match not the Pinyin alphabetic word-shapes (which are highly ambiguous) but the traditional hanzi logographs to particular dialectal pronunciations and their (phonetic) alphabetic word-shapes. Finally, there is the bolder option of introducing non-conventional alphabets and non-conventional spelling systems.
FIGURE 13 shows the same English text in (a) Standard Text, (b) "Cut Spelling", and (c) "American Fonetic Spelling". These versions were automatically generated from standard text by converter programs. Conversion into Cut Spelling has been made with Alan Mole's BTRSPL program. Note that the two reformed spellings are notably more economical in space and therefore probably (for those once habituated to their word-shapes) in visual saliency. They are also clearly far easier to learn and to spell with confidence, since each of them follows a few clear spelling rules. The introduction of such "improved" systems has been almost impossible so long as it was a matter of persuading the whole of society to switch to a new writing system. However the concept of offering the user a free choice among alternative systems has the potential not merely to aid the learner but to create a swing of public preference towards systems found to be briefer or easier.
FIGURE 14 (An Alternative Alphabet for English) shows a further possibility: the introduction, via individual choice, of alternative writing systems that, while not very similar to standard text, are briefer or more visually salient. It shows a passage of English in standard text at (a), the same passage in the Shaw Alphabet at (b), and the letters of the Shaw Alphabet at (c). The two versions of the text are set out for comparison. The first is in standard form. The second, printed in the same number of lines, combines non-standard letters and non-conventional (phonetic) spelling. The script is Kingsley Read's Shaw Alphabet, which provides one letter per English phoneme. Note that the outlines of its letters are much simpler than those of standard letters. This allows the text to be printed in less space yet seem (to the experienced eye) more salient. The Read Alphabet, which is public property under the terms of Bernard Shaw's will, predates electronic display-options yet can be readily adapted to them. It is set out below, matched to a list of English phonemes (viz. the sounds represented by the letters in bold print in the short English words).
EXAMPLE 9: Display-options offering non-alphabetic writing-systems
Once some users have become used to escaping the strait-jacket of alphabetic standard text, they may wish the display-options to include other optional writing-systems. Some of these display-options may be non-alphabetic or non-phonetic but more visually salient. Or they may be hybrid, part-alphabetic systems. The possibility of users choosing to learn (initially for their own private purposes) these additional writing-systems would depend largely on the provision of teaching methods or self-teaching methods such as alternating display, inter-leaving and various types of text-morphing.
As mentioned, the skilled reader takes in alphabetic word-shapes at a glance, almost as logographs. Yet strings of letters do not make very salient logographs. They have, on the page, "all the charm of Morse Code". Hence there could be advantage in offering additional display-options (or writing-systems) in which some alphabetic words are replaced by pictographic or logographic symbols that are more visually salient. Few users would attempt an immediate transfer to a full-scale symbol-writing system like C.K. Bliss's Bliss Symbols; but the idea becomes much more practical if the symbols are introduced piecemeal via a series of user-selected display-options. For instance a display-option in which such common words as "the" and "a" and "who" and "which" automatically convert into more visually salient symbols need pose only a momentary obstacle to readers. Thereafter it makes text easier to scan because the eye can more easily pick out where most of the nouns are and also where the relative clauses begin. Further slight changes of this sort might again produce advantages in legibility that individual users might find outweigh the initial strangeness.
A form of inter-leaving that might be useful in such a hybrid writing-system is one in which the new symbol/logograph, for a period selected by the user, appears beside rather than in place of its alphabetic equivalent. Enthusiasts might then wish to progress by stages to the mastery of a display-option in which only relatively uncommon words are represented alphabetically. The use of colors, or of other common visual markers, so that all the symbols for a given part of speech share a common color or other common visual quality, might also make it easier for readers to progress in such hybrid systems. Prose style might also be helped. For instance, if all the logographs for oppositional conjunctions (such as but, however, though, although, yet) share a common color, and if this color is different from that used for non-oppositional conjunctions, the eye can see at a glance the grammatical patterns marked by these conjunctions. Purely alphabetic texts can also, of course, benefit by display-options which use color-coding or other common visual markers to distinguish the parts of speech and to clarify the syntactic flow of a piece of text.
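As a rough illustration of color-coding conjunctions, the Python sketch below tags each token of a sentence with a color name. The oppositional set is the one listed above; the non-oppositional set, the two color names and the function name `color_code` are assumptions made for the example. A plain word list also cannot tell when "yet" or "so" is used as an adverb, so a practical display-option would need a parser.

```python
import re

# Oppositional conjunctions are those listed in the text; the second set and
# both color names are assumptions made only for this illustration.
OPPOSITIONAL = {"but", "however", "though", "although", "yet"}
NON_OPPOSITIONAL = {"and", "or", "so", "because"}


def color_code(text):
    """Return (token, color) pairs; color is None for unmarked tokens."""
    coded = []
    # Tokens are runs of word characters or single punctuation marks.
    for token in re.findall(r"\w+|[^\w\s]", text):
        word = token.lower()
        if word in OPPOSITIONAL:
            coded.append((token, "red"))
        elif word in NON_OPPOSITIONAL:
            coded.append((token, "blue"))
        else:
            coded.append((token, None))
    return coded
```

A display routine could then draw every "red" token in one color and every "blue" token in another, leaving the rest of the text unchanged.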
A graphics program that converted all English words into different stylised picture-symbols or logographs might be cumbersome; but it is certainly possible for the average personal computer to run a program that provides symbols for a few hundred of the most common words in a given language, and that reliably returns them to standard text when the text is to be printed or sent to another user. Note that Japanese newspapers and publications have long used a mixture of logographs and phonetic scripts.
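A minimal sketch of such a program, in Python, is given here for a handful of words only. The symbol table `SYMBOLS` and the function names are hypothetical, and two simplifying assumptions are made so that re-conversion stays reliable: only lower-case words are replaced, and the chosen symbols never occur in standard text.

```python
import re

# Hypothetical symbol table: the symbols are arbitrary placeholders and must
# not occur in the standard text, otherwise re-conversion would be ambiguous.
SYMBOLS = {"the": "◆", "a": "◇", "who": "○", "which": "▷"}
WORDS = {symbol: word for word, symbol in SYMBOLS.items()}


def to_symbols(standard_text):
    """Replace whole lower-case words that have a symbol; keep all else."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: SYMBOLS.get(m.group(0), m.group(0)),
        standard_text,
    )


def to_standard(symbol_text):
    """Restore the alphabetic word for every symbol in the table."""
    return "".join(WORDS.get(ch, ch) for ch in symbol_text)
```

Because each symbol maps back to exactly one word, `to_standard(to_symbols(text))` returns the original text whenever the assumptions above hold.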
EXAMPLE 10: Resolved Text for Logographs.
Existing word-processing programs often supply a range of symbols, some of which might be adopted as logographs. However, full-scale pictographic or logographic display-options require more elaborate display-converter programs. These, when bridging from an alphabetic writing-system towards a non-alphabetic writing-system, should preferably take their input (on the alphabetic side) from the resolved or enriched form of the text, since this form will have already resolved the problems of non-homophonous homonyms like bow/bow, row/row, wound/wound without losing the distinction between such (visually-different) reconversion homonyms as lore/law, court/caught and threw/through. However, as set out in some of the Claims, they may also need either a parser or skilled human input to resolve an additional number of homophonous homonyms.
EXAMPLE 11: Text-morphing
Text-morphing offers an important bridge from alphabetic display-options to traditional logographs. Computer graphics now make it possible to begin with either a phonetically predictable word-shape or a pictographic image that clearly suggests a given word, and then to mutate that word-shape or image by degrees into the traditional visual representation of that word in any writing-system. Different display-options may be inter-leaved to help the learner/user associate the word-shapes of the same text in different writing-systems. Or individual word-shapes or small groups of them may be flickered, slow-flickered or text-morphed to the same end. The use of colors, and the color-coding of different parts of word-shapes, can also help greatly in creating either visual associations or a visual memory trail (involving intermediate shapes) between two seemingly unrelated images. As well, the process of text-morphing need not be one-way. It may reverse. A word-shape may also be made to alternate or (more rapidly) to flicker back and forth between two states.
Triple and multiple inter-leaving can be a powerful teaching method for language-learning. For instance students of Japanese may wish to read texts while seeing simultaneously the traditional writing (which is largely kanji logographs) plus a phonetic rendering plus a translation. They can thus see form, pronunciation and meaning at once.
FIGURE 6 shows the text-morphing of the Chinese pronouns "WO", "NI" and "TA" from their alphabetic word-shapes into their logographic shapes. Each column reads from the top down. The third row gives the Pinyin alphabet word-shapes plus standard intonation marks. If a smooth transition is desired, more intermediate shapes would be required, but the principle is clearly exemplified.
FIGURE 7 shows an alternative method of text-morphing of the Chinese pronouns of Figure 6 in which the text-morphing (which may move in either direction) extends to the corresponding English word-shapes (first row). Note how line-coding (thinner lines) draws attention to certain elements in the changing shapes.
As Figure 6 shows, while the text-morphing process should take advantage of any pre-existing similarities between the two word-shapes, it does not depend on such similarities existing. Any word-shape can be mutated into any other word-shape with some appearance of continuity, even if only by changing it over section by section, or by dimming one set of lines while strengthening another. However, the visual memory path between the two shapes will be more easily memorised if the user is able to focus upon elements which remain the same or similar during the text-morphing process, [cf. FIGURES 7-3.] All these visual processes will usually be more effective educationally when at least partially under the control of the learner or user. Such methods can also be used in interactive teaching programs, as indicated in several of the later Claims.
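The simplest of the transitions mentioned above, dimming one set of lines while strengthening another, amounts to a cross-fade. The Python sketch below computes only the opacity of each shape for every frame; drawing the shapes is left to whatever graphics system is in use. The function names `crossfade_frames` and `alternate` are hypothetical, and a linear fade is an assumption rather than anything the text prescribes.

```python
def crossfade_frames(steps):
    """Return (source_opacity, target_opacity) pairs for a morph.

    The source lines dim from 1.0 to 0.0 while the target lines strengthen
    from 0.0 to 1.0; `steps` is the number of frames and must be >= 2.
    """
    if steps < 2:
        raise ValueError("a morph needs at least a first and a last frame")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # fraction of the transition completed
        frames.append((1.0 - t, t))
    return frames


def alternate(frames):
    """Play a morph forwards and then back again, without repeating the end frame."""
    return frames + frames[-2::-1]
```

Letting the learner choose `steps`, or step through the frames by hand, is one way of keeping the process under the user's control.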
EXAMPLE 12: Code
To handle logographic writing-systems a computer program needs some code to label the logographs. The conventional spellings of standard text would provide one such alphabetic code, with perhaps some additional ASCII characters required to distinguish homophonous homonyms. Once pictographs or logographs are stored by the computer in a given code, automatic re-conversion to alphabetic form (and to conventional spelling for those languages where such spelling exists) is straightforward via a dictionary-style data-base.
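One possible labelling scheme of this kind is sketched below in Python. The convention of appending "#" and a digit to separate homonyms, the table `LOGOGRAPH_CODES` and the function name are assumptions chosen for illustration; the text asks only for some additional ASCII characters.

```python
# Hypothetical code table: each logograph is labelled by the conventional
# spelling plus, where needed, "#" and a digit to separate homonyms.
LOGOGRAPH_CODES = {
    "bat#1": "flying mammal",
    "bat#2": "sporting implement",
    "horse": "horse",
}


def to_standard_text(codes):
    """Re-convert stored codes to conventional spelling.

    The part before "#" is the spelling; the suffix only selects a logograph.
    """
    words = []
    for code in codes:
        if code not in LOGOGRAPH_CODES:
            raise KeyError(f"unknown logograph code: {code}")
        words.append(code.split("#", 1)[0])
    return " ".join(words)
```

Re-conversion discards the suffix, so both codes for "bat" return to the same conventional spelling, while the stored text keeps the distinction.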
EXAMPLE 13: International Pictographs and Traditional Logographs
Traditional logographic systems are very restrictive. Their arbitrary nature makes them hard to memorise, leading to written vocabularies as small as 4000 words. Yet the history of China, Japan and (in part) Korea, shows that humans have been prepared to make very large sacrifices in order to have logographic writing-systems that can link languages and dialects. Today the need for internationally comprehensible signage and communication exerts a similar pressure on all languages. The use of icons is growing. Logographic writing-systems may be the way of the future. Yet they are so poor at providing crucial phonetic information that any newly-designed logographic system needs, in practice, to lean upon a phonetic alphabet or syllabary. Display-options provide a practical means for it to do so.
One proposed display-option involves the creation, for a given language, of a new set of pictographs each corresponding to one of the traditional logographs, and clearly suggesting its meaning. Such a set of pictographs is far easier to learn and to remember than the traditional logographs. Text-morphing can then create paths from it to the traditional logographs. See FIGURES 10, 11, 12.
FIGURE 8 shows a third method of text-morphing one of the Chinese pronouns of Figure 6 employing cartouches. The cartouche draws attention to an element that remains similar during the transition. It thus helps to create a memory trail between the two unrelated word-shapes. Note also how the shaped cartouche draws attention to the rotation of the element inside it.
FIGURE 9 illustrates a further method of text-morphing one of the Chinese pronouns shown in the third column of Figure 6. Note how the substitution of a capital T for the lower-case "t" in the word "TA" creates a point of similarity with the corresponding traditional logograph, which is then exploited in the text-morphing process.
FIGURES 10-12 offer three examples of how a self-explanatory pictograph can be simultaneously designed to suggest, or to readily text-morph into, a corresponding non-pictorial logograph. (The "horse" example may re-enact the historical process by which the Chinese logograph evolved from an early pictograph). Note how both the pictograph and the largely arbitrary logograph for "mother" can be simultaneously visible. This is a form of "slow flickering" or alternation as the term is defined above.
In fact an invented pictograph can always be so designed as to contain one element that is reminiscent of a given logograph, and the points of resemblance can be color-coded or line-coded to draw attention to them. Thus even when looking at the pictograph in its original form, the learner can know which elements to concentrate upon. [cf. FIGURES 10-12] For instance, a picture-symbol that is a stylised line-drawing of a bird might need to text-morph into an arbitrary conventional logograph for "bird" that looks quite unbirdlike. Yet a visual memory trail might be created if, for instance, a fold in the bird's feathers in one part of the pictograph is designed to show a pattern reminiscent of the shape of the logograph, and if this single point of resemblance between two otherwise disparate visual shapes is made conspicuous by color-coding or line-coding, so that the learner or user can connect the two images. Similarly a fold in the apron or clothing of the pictograph for "chef" or "waiter" or "mother" might text-morph towards the traditional logograph while the rest of the pictograph fades away. FIGURES 10-12 show that there can also be intermediate images in which both pictograph and traditional logograph are clearly visible. Text may be presented in such a display-option.
EXAMPLE 14: A Chinese Syllabary
A further learning method, which would not be practicable in most other languages, is made possible by the phonetic system of Chinese and Japanese. Especially in Chinese, most words or morphemes are monosyllabic. Further, there is a remarkably limited number of syllables in the Mandarin dialect: less than 500 if one ignores the four tones or intonations, and less than 2,000 if one takes note of them. This contrasts with tens of thousands of syllables in most European languages. It also makes possible the use of a syllabary. Since most syllables correspond to several different logographs with different meanings, it is easy to associate each symbol in the syllabary with some object that suggests a pictographic representation. Hence it is possible with modern display-systems to create a pictographic or essentially-pictographic syllabary such that the interpretation of its symbols is almost self-evident and hence readily learned and remembered. The full range of less-than-2000 syllables might be unambiguously indicated by a syllabary containing less than 500 symbols, provided the tones are indicated by diacritical marks or other convenient common visual markers. Such a syllabary provides phonetic information to full-phonemic level. When the learner wishes to transfer from literacy in such an easily-learned system to literacy in the traditional logographs, color-coding and line-coding can help connect the disparate word-shapes.
EXAMPLE 15: Improved alphabetic writing-systems and display-options for Chinese and Japanese.
Chinese schools often use the Pinyin alphabet, written small above the logographs, as the initial learning medium. Similarly the Japanese use the phonetic hiragana syllabary as their introductory script for children. Pinyin was intended by the Chinese government to become China's official writing-system; but text in Pinyin is often ambiguous because of the numerous homophonous homonyms in modern Mandarin. Pinyin has no way to distinguish these. Yet they need to be represented by different traditional logographs. Hence they form awkward reconversion homonyms when Pinyin text is to be converted to logographs.
Automatic translation from logographs to the phonetic Pinyin script, or to Japanese phonetic syllabaries, is not a problem; but automatic computerised translation from Pinyin to conventional Chinese logographs requires more powerful context-sensitive parser programs than are currently available to resolve the problem of numerous homonyms that are also homophones. (However, the US patents, cited above, by Bernath (1992), Sproat (1993) and Ho (1984) already claim such methods). Humans, too, sometimes have trouble sorting out the ambiguities of Pinyin script. Any display-option which resolves even some of the ambiguities of an alphabetic writing-system for Chinese can make it a much more credible rival to the traditional logographs. Such a method can also greatly improve the speed and certainty of any parser program or algorithm used to resolve the remaining ambiguities.
For instance, a symbol that helps to resolve the ambiguity can be added to most such ambiguous word-shapes. An alternative method is also proposed that requires only alphabetic letters. In this, a particular synonym or "guide-word" is paired with an ambiguous word or word-element. The guide-word is shown adjacent to it, but in a different color or shading or otherwise distinguished. The guide-word thus serves as a silent guide to meaning but not to pronunciation. The conspicuousness of such annotations may, as in previously described methods, be varied through the user's own adjustments to the default settings of one or more display-options. A modification to this method is to use only the opening letter or letters of the "guide word" where these are enough to resolve ambiguity.
A combination of the above two methods is also possible whereby a reduced list of just a few dozen "guide-words" is provided. They are then used not as exact synonyms but as clues to the desired area of meaning. The guide-words themselves may be abbreviated or replaced by symbols or diacritic marks, or by pictographs or logographs, whether new or traditional. What is essential is that they remain reliably associated with the word-shape of each alphabetic homonym that needs to be resolved. They thus create a fixed logograph which permits of rapid reading. The use of such logographs can create a resolved text that permits both of automatic re-conversion and of direct conversion into a range of display-options, including traditional logographs. This method can be extended to Japanese and to other languages with similar homonym problems.
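The guide-word idea can be sketched in Python as a data-base keyed by the pair (Pinyin word-shape, guide-word). Everything named here is hypothetical: the table `RESOLVED`, the English guide-words, and the square brackets, which merely stand in for the different color or shading that a real display-option would use.

```python
# Hypothetical data-base keyed by (Pinyin word-shape, guide-word).
# The English guide-words are illustrative choices, not a proposed standard.
RESOLVED = {
    ("shì", "be"): "是",
    ("shì", "matter"): "事",
    ("shì", "city"): "市",
}


def annotate(pinyin, guide, letters=None):
    """Pair an ambiguous word-shape with its guide-word or its opening letters."""
    shown = guide if letters is None else guide[:letters]
    return f"{pinyin}[{shown}]"


def to_logograph(pinyin, shown_guide):
    """Resolve a word-shape using a full or abbreviated guide-word."""
    matches = [
        hanzi
        for (shape, guide), hanzi in RESOLVED.items()
        if shape == pinyin and guide.startswith(shown_guide)
    ]
    if len(matches) != 1:
        raise ValueError("guide-word does not resolve the ambiguity")
    return matches[0]
```

In this small table a single opening letter is enough to pick out each logograph; a larger data-base would need longer abbreviations wherever two guide-words for the same word-shape begin alike.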
We thus have a writing-system (or display-option) for such languages which offers a full phonemic description of the (often ambiguous) spoken language. When used with the restricted vocabulary of small children, it can function much like a conventional alphabetic system. But as vocabulary becomes less restricted and the need to distinguish homophonous homonyms grows, the user may switch to display-options in which the guide-words or their symbols are more heavily used or more conspicuous. Guide-words or their indicators may also of course be added to the symbols of a syllabary, whether it is a traditional syllabary or an invented one like the Chinese syllabary described above. Applied to Japanese, this method may enable the modification of traditional syllabaries or of imported alphabets, to compete more strongly with traditional kanji logographs.
In a variant on the main method, or an addition to it, the text in either the traditional logographs or in the new (partly alphabetic) logographs produced by the above method is stored as a series of numeric or alpha-numeric codes, each piece of code corresponding to a single word or word-element of the text. Converter programs or algorithms can then produce whatever word-shapes are required for a particular display-option: whether traditional logographs, standard alphabetic word-shapes, pictographs, or semantically enriched alphabetic logographs as just described.
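A minimal Python sketch of this variant follows. The code numbers, the table `CODE_BOOK` and the function name `render` are hypothetical, and only two display-options are shown.

```python
# Hypothetical code book: one numeric code per word or word-element,
# with the word-shape to use in each display-option.
CODE_BOOK = {
    101: {"hanzi": "我", "pinyin": "wǒ"},
    102: {"hanzi": "你", "pinyin": "nǐ"},
    103: {"hanzi": "他", "pinyin": "tā"},
}


def render(stored_codes, display_option):
    """Produce the word-shapes of one display-option from the stored codes."""
    return [CODE_BOOK[code][display_option] for code in stored_codes]
```

The stored text is the list of codes alone, so adding a further display-option means adding one more column to the code book rather than re-encoding the text.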
EXAMPLE 16: Two Pictographic Systems
The later claims cover two different kinds of largely pictographic writing-systems. One of them is intended to be international, at some cost in idiomatic quality and range of vocabulary. The other aims to represent most of the words of a given natural language through a combination of pictographs, ideographs and letters. Modern color display-screens and color printers allow pictographic symbols to be far more visually vivid and recognisable than traditional logographs. Freed of the restriction to a single color of ink, and freed of the need to save the scribe's labor by reducing the clarity and detail of the symbol, one can afford to make symbols so vivid that the reader can recognise their general meaning at a glance. A further advantage is that the resulting writing-system is potentially international, not linked to a given language.
A primarily pictographic writing-system is proposed that provides one symbol per semantic area for each of several hundred semantic areas. Such semantic areas (areas of meaning) should be no larger, on average, than each of the 1000-odd sections in a traditional thesaurus, and may be much smaller. The upper practical limit to the number of pictographs is set by the problems of keeping them sufficiently compact yet sufficiently distinct and memorable. Simple pictographs can be combined, following logical rules, into more complex pictographs or logographs. As well, non-pictographic symbols can be used, much like prefixes or suffixes, for grammatical distinctions or for such semantic concepts as: the opposite, or the absence of a quality, or for "large example of" or "small example of" or "forbidden to". Such an "algebraic multiplication" can turn a set of pictographs into two or three times as many compound logographs.
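The arithmetic behind this "algebraic multiplication" is simply that n pictographs combined with m modifier signs give n × m compounds, so two or three modifiers give two or three times as many compound logographs as there are pictographs. A short Python sketch, with placeholder names for the pictographs and modifiers, makes this concrete.

```python
from itertools import product

# Illustrative base pictographs and modifier signs; the names are placeholders.
PICTOGRAPHS = ["house", "dog", "river"]
MODIFIERS = ["large example of", "small example of"]


def compound_logographs(pictographs, modifiers):
    """Combine every modifier with every pictograph, prefix-fashion."""
    return [
        (modifier, pictograph)
        for pictograph, modifier in product(pictographs, modifiers)
    ]
```

With the three pictographs and two modifiers above, the function returns six compounds, each a (modifier, pictograph) pair that a display routine could draw as one compound logograph.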
EXAMPLE 17: International Pictographs
When one is converting standard text (of a given natural language) into such a writing-system there remains the problem of distinguishing all the closely related words in a given area of meaning. A pictograph or an ideograph may clearly indicate a watercourse. Yet is the intended word creek, or brook, or river or stream or channel? Sufficient pictorial detail may provide sufficient semantic precision to give the answer often enough, at least, to satisfy a small child writing (with a restricted vocabulary) to a penfriend or electronic penfriend. If the penfriend speaks a different language the writing-system will have a further advantage in that its symbols may be equally intelligible in both languages.
EXAMPLE 18: Sign Language
Such a writing-system is also at least as precise as most kinds of sign-language (except when they adopt the slow process of spelling out the letters of a word in a given natural language) and might well be combined with sign-language systems. A variant of the same writing-system may also be designed to correspond to the vocabulary of a simplified natural language, such as Basic English. It may also be inter-leaved with an alphabetic display-option for that language.
EXAMPLE 19: Pictographs Plus Letters
Yet for an adult writing in a given natural language, and wanting to use the wider resources of that language's vocabulary, there is an alternative way. Letters may be added to pictographs to indicate precisely which common word (out of the range offered by the thesaurus in that given area of meaning) is intended. These letters will commonly be the first or last letters of a given word.
The result is then a set of compound logographs, adequate for the writing of a given natural language. Each logograph typically consists of one easily interpretable pictograph and one or two very simple phonetic elements (letters). Line-coding and/or color-coding may also be used on the letter(s) so that they indicate not merely a letter used in the conventional spelling of that word, but the pronunciation the letter carries. Color may also be used as a structural principle in creating and distinguishing the pictographs. The more sources of visual discrimination that are offered, the more compact the writing-system can be without losing clarity and visual salience.
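One way of choosing the letters for each compound logograph is sketched below in Python, using a group of watercourse words. The rule used here, the shortest run of opening letters that is unique within the group, is an assumption made for the example, since the text allows either first or last letters. The function names and the label "WATERCOURSE" are likewise hypothetical.

```python
def distinguishing_letters(words):
    """Map each word to its shortest opening letters unique within the group."""
    letters = {}
    for word in words:
        others = [w for w in words if w != word]
        for n in range(1, len(word) + 1):
            prefix = word[:n]
            if not any(other.startswith(prefix) for other in others):
                letters[word] = prefix
                break
        else:
            # The whole word is a prefix of another word: keep it in full.
            letters[word] = word
    return letters


def compound_logograph(pictograph, word, letters):
    """Pair the pictograph for a semantic area with the letters for one word."""
    return (pictograph, letters[word])
```

For this group one letter suffices for "brook", "river" and "stream", while "creek" and "channel" need two letters each because they share their first.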
This writing-system is surprisingly practicable even in a language as rich in vocabulary as English. One reason is that only a few of the many near-synonyms listed in a given section of the thesaurus may require use of the specific pictograph assigned to that area of meaning. For instance of all the words and phrases listed under "devil" in a medium-size English thesaurus, probably only the following would qualify: "devil", "fiend", "demon", "imp" --and perhaps the proper names "Beelzebub", "Satan", "Lucifer", and "Mephistophilis". By contrast, the pictograph for devil need not be used for such soubriquets as: "the Tempter", "the prince of darkness", "the father of lies", "the evil angel", and "the Anti-Christ". Similarly, for the word "angel" the only relevant synonyms requiring its pictograph might be "seraph" and "cherub", plus "archangel" (represented presumably by the pictograph for "angel" plus a modifying sign used in algebraic fashion for "big"). Rarer words, like the names of less well-known angels and devils, might well be represented alphabetically, since most readers will require information on their pronunciation. Finally, even the pictograph for devil might be replaced by a combined symbol for "bad" + "angel". Many of the rare or technical words are largely international, so their use in alphabetic form need not detract much from the ability of foreigners, or foreign learners of a given language, to read such a writing-system (or signage or display-option) with a fair level of understanding.
Such a writing-system forms a possible initial teaching method for small children learning to read in their native language. Later they can transfer their reading skills to standard text. Especially when combined with voice-production by a computer, this writing-system might also allow them to make some rapid progress in a second language, sufficient for instance to write to (electronic?) penfriends of their own age in it. This writing-system is not, when used purely by itself, suitable for adult learners of foreign languages, since they require more detailed phonetic information, and this is better provided by an alphabetic writing-system or a syllabary. However this writing-system, when it is used as a display-option and inter-leaved with an alphabetic display-option, can be ideal both for young native-speakers and for foreign learners.
The great advantage for the language-learner is that since the semi-pictographic writing-system is semi-intuitive, they will usually be able to grasp the meaning and syntactic structure, while an interleaved alphabetic display-option offers the pronunciation. This can mean throwing away that traditional language-learner's crutch, the "vocabulary" or word-list of supposed synonyms in one's own language, and grappling directly with the foreign language.
The same writing-system may also prove a superior method of presenting text even for adult literate native-speakers. They may find such semi-pictographic text more scannable and more visually salient than text in purely alphabetic writing-systems.
Such a semi-pictographic writing-system is not suitable for individual scribes. Hence the formulation of its rules can and should be controlled by the compilers of the relevant programs or algorithms, so as to produce a standardised set of word-shapes. Such a set should be worked out for each closely related group of words in the thesaurus of a given language.
EXAMPLE 20: Signage
A further merit of such a writing-system is that when it is used on signage, or even in printed text, its meaning is partly intelligible to those who do not speak the language used. By contrast, a sign bearing the Modern Greek word EPIKINTHINOS (even if it were written, as here, in the Roman alphabet) would fail to warn non Greek-speakers they were in danger.
EXAMPLE 21: Homonyms and Pictographs
The creation of homonyms in such a writing-system (for a given natural language) should generally be avoided. Once bat (a flying mammal) has been allocated a clear pictograph, a separate pictograph should not be offered for, for instance, bat (the sporting implement). Just as it is not necessarily a problem that one pronunciation has several meanings, so it is not a problem if all these meanings are represented by a single spelling or a single logograph. A comprehensive dictionary of a given natural language will show that many or most words have multiple meanings. It is very difficult to distinguish all such meanings in writing. Yet no very clear line can be drawn between homonyms and words of multiple meaning. English-speakers for instance cope well with the multiple ambiguities of words like bat, ball, court, tender. A pictographic or logographic system that provided separate symbols for each of these meanings would be possible, but its list of word-shapes would thereby become much larger and harder to memorise. It would also involve endless fine distinctions. Is court as in tennis-court really a different word-element from court as in law court, or in court-room, or in courtly, or in courtship? It may be wise not to meddle.
Where a word has several meanings, it can be advantageous to base its logograph or pictograph on the meaning that is easiest to depict. This principle leads to an important teaching method for Chinese or Japanese, as set out in some of the later claims. However, most or all such ambiguities do need to be resolved for a genuinely international system of pictographs, or for a universal (trans-lingual) semantic code.
EXAMPLE 22: International Resolved Text. An invented pictographic writing-system offers the opportunity, though (as just stated) not the necessity, to resolve many homonyms and to display many distinctions of meaning that are not made in a given spoken language, nor normally in standard alphabetic text. For instance, the pictographs might clearly distinguish bat (a sporting implement) from bat (a flying mammal). To convert a text into such a display-option, after resolving all such homonyms and providing different pictographs or logographs for them, would be in effect to translate the text (though with some loss of idiomatic richness and of precise vocabulary) into a universal semantic code from which it could be automatically re-translated into any other language. Word-order and some idiomatic details might need to be adjusted in the interests of international communication.
For such an international or semi-international writing-system, resolution of homonyms and multiple meanings would usually be essential. (Exceptions might be words like democracy or nature, whose multiple meanings or ambiguities may be common to several languages). Skilled human input would, at present, be needed to check that all relevant distinctions were made. When converting from such a writing-system into the alphabetic standard text of a given language, it is of course essential to mark the reconversion homonyms.
EXAMPLE 24: International Writing = International Translation?
When (as above) appropriately fine distinctions of meaning are resolved and stored in an appropriate electronic code, automatic conversion of the text is possible from that code into an international or semi-international set of pictographs/ideographs (as described above) which are comprehensible by speakers of any language. Once again, some idiomatic subtleties will be lost, and some adjustment of word-order and of grammatical markers may be required.
Automatic conversion is also possible from such an electronic coding system into a simplified version of a given natural language such as Basic English or Globish or Inglingo, provided an appropriate database is available and also an appropriate parser/arranger program or algorithm for those cases where the conversion to simpler vocabulary requires changes to word-order. (Basic English simplifies vocabulary via a table of synonyms. Globish also regularises spelling. Inglingo also shortens some words and simplifies some grammatical expressions). Also possible, once the ambiguities of individual words have been resolved in sufficient detail, is automatic translation of text from the electronic coding system into other languages, either directly or via an international semantic code. Such translation (as opposed to conversion) requires not only appropriate data-bases of linguistic information but also an appropriate (unspecified) context-sensitive and grammar-sensitive translation program. Such a program needs to be capable of resolving any remaining ambiguities caused by the ambiguous interplay of words in natural languages.
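The table-of-synonyms conversion mentioned above might be sketched as follows. This is a minimal illustration only: the table entries and the function name `simplify_vocabulary` are assumptions made for the example, not part of Basic English or of any existing program, and no change of word-order is attempted.

```python
import re

# Hypothetical synonym table (illustrative entries only): each less common
# word is mapped to a simpler equivalent.
SIMPLE_SYNONYMS = {
    "purchase": "buy",
    "commence": "start",
    "assist": "help",
}

def simplify_vocabulary(text, table=SIMPLE_SYNONYMS):
    """Replace every word found in the table with its simpler synonym."""
    def swap(match):
        word = match.group(0)
        simple = table.get(word.lower())
        if simple is None:
            return word  # word is not in the table: leave it unchanged
        # Preserve an initial capital letter.
        return simple.capitalize() if word[0].isupper() else simple
    return re.sub(r"[A-Za-z]+", swap, text)
```

Cases where the simpler vocabulary requires a change of word-order would still need the parser/arranger step described in the text.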
Some Useful Applications
The most-likely short-term application is as a free-standing computer program or as a major additional feature or plug-in for word-processing packages or for text-display programs (including electronic books, whether for adults, children or language-learners, and including the pronouncing guides of computerised dictionaries). The various writing-systems proposed above and in the Claims are designed to be suitable for use initially within a series of display-options, but might later find independent use. Other applications of this invention may include: (1) interactive computer packages for teaching children to read, or for teaching foreign languages, including those written in non-alphabetical systems; and (2) as one element in compound word-processing and translating software packages which can understand spoken or written texts in any of several languages and can also translate such text into other languages in their repertoire and can then produce the translated version either as text in that language's conventional writing-system, or as text in a more phonetic or more visually-salient writing-system or display-option, or as spoken language.
In the case of the purely alphabetic display-options (covered in the earlier Claims) the procedure and algorithms both for translating conventional text into phonetically-enriched text, and for displaying the added phonetic markers in various gradations of conspicuousness, are relatively simple. Once the idea is clearly stated, and its usefulness realised, it will be obvious that algorithms and computer programs can be written to carry out this work.
The computer-graphic display-converter programs required for text-morphing between alphabetic and non-alphabetic writing-systems are more complex and require more labor. Yet they are clearly within the scope of existing programmers' art once the objective is clearly stated. Like most programs involving computer graphics, these might be carried out in many differing ways, resulting in differing levels of effectiveness and excellence.
Hence it is the general principles of display-options, of phonetic modulation of text and of text-morphing, rather than their embodiment in specific algorithms or computer programs, that are claimed.
Not every possible permutation is spelled out in the above examples or the following Claims, since most of them will soon become obvious to anyone skilled in the art, but enough are stated to make clear the range of possibilities which designers of sets of phonetically-modulated alphabetic display-options or of logographic display-options should consider.
In offering worked examples of phonetic modulation of text I have used specific writing-systems. Yet the methods set out below need not depend (except as specified in individual Claims) upon which types of writing system are offered as display-options. The Claims also set no theoretical limit to the number of display-options that may be offered.
From one perspective, the invention involves bringing together many existing computational processes to produce a kind of macro system for international text processing. The substantially-new concepts and techniques disclosed include: providing text in multiple user-selected display-options, phonemic modulation of text through a number of display-options, non-homophonous homonym parsers and filters, reconversion homonym resolvers, distribution of enriched and processed text, user-selected pronouncing- and dialect-dictionaries, the use of sets of phonetic letter-variants, automatic conversion between writing systems and display-options, cartouches, shaped-cartouches, color-coding and line-coding, alternating display (flickering), inter-leaving, and text-morphing. The linking of all these powerful processes and their associated data-bases into a single international customised reading and writing system is now practical.
The application of these many techniques is too closely linked for them to be patented separately. Yet to set out the ways they can best be combined, without producing chains of multiple-dependent claims, requires an unusually large number of claims, albeit with some repetitive elements.
Bibliography.
Rondthaler, Edward, and Edward J. Lias, editors. Dictionary of American spelling: a simplified alternative spelling for the English language: written as it sounds, pronounced as it's written, New York (680 Fifth Ave., New York): American Language Academy, c. 1986.
Scragg, CD. A History of English Spelling, Manchester Uni. Press, 1974.
Yule, Valerie. Orthographic Factors in Reading: Spelling and Society, PhD Thesis, Monash University, 1991.
Gattegno, Caleb. Reading with Words In Colour, publ. by Educational Explorers Ltd, Reading, UK, 1969, SBN 85225 512 8.

Claims

1 Computer-based text processor means for facilitating user familiarisation with the word-shapes of the standard text of a language, characterised in that the user is enabled to select between a plurality of non-standard texts that differ from one another according to the degree to which each said non-standard text incorporates clues to the identity of spoken words that correspond to the word-shapes of the non-standard text.
2 Text processor means according to claim 1 wherein the standard text is alphabetical and the non-standard texts differ from one another according to the degree to which their word-shapes incorporate phonetic clues.
3 Text processor means according to claim 1 wherein the standard text is logographic and the non-standard texts differ from one another according to the degree to which they incorporate pictographic clues.
4 A text processor for visually depicting the text of an alphabetically-written natural language, characterised in that: additional phonetic information is made visible; said additional phonetic information is supplied while maintaining the text's conventional spelling and the basic outlines of its word-shapes, which remain recognisable to the reader; the user can choose the amount of additional phonetic information by choosing among a range of display-options; the said range extends from: a display-option in which all additional phonetic information is visually suppressed, to a display-option in which all available additional phonetic information is displayed.
5 The text processor of Claim 4 with the additional feature that: the user can select the portions of the text, or single words, to which a display-option will be applied.
6 A system for visually depicting the text of an alphabetically-written natural language in a range of display-options that offer varying amounts of additional phonetic information, said system comprising: means for storing a text in several versions; means for displaying text in a range of display-options; means for the user to select a standard text or a portion of standard text and to specify the display-option in which it will be presented; means for ensuring said standard text is correctly spelled; a reconversion homonym resolver; one or more pronouncing-dictionary data-bases; an algorithm or computer-program for selecting from the said data-base(s) all the available additional phonetic information that relates to the words of a particular text, and for inserting this information into an enriched text in the form of various phonetic markers; means for user-selection of a desired display-option; one or more or a range of display-converters, each of which is capable of
(a) selecting from among the phonetic markers in the enriched text those required to produce the selection of additional phonetic information that is characteristic of a given display-option, and (b) converting the enriched text into processed text;
(c) storing the processed text; additional memories or means for storing the text in various forms containing the specific information required for a particular display-option or display-options; means for further user-selected modification, if appropriate, of the default settings of a given display-option.
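The route from standard text to enriched text to processed text set out in Claim 6 might be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the dictionary entries, the marker kinds ("silent", "schwa"), the display-option names and the bracket/asterisk rendering are all hypothetical stand-ins for a real pronouncing-dictionary data-base and a graphical display.

```python
from dataclasses import dataclass, field

# Hypothetical pronouncing dictionary: for each word, a list of
# (letter_index, marker_kind) pairs. Entries are illustrative only.
PRONOUNCING_DICT = {
    "knife": [(0, "silent"), (4, "silent")],
    "about": [(0, "schwa")],
}

# Hypothetical display-options, from no added detail to all available detail.
DISPLAY_OPTIONS = {
    "plain": set(),
    "silent-only": {"silent"},
    "full": {"silent", "schwa"},
}

@dataclass
class EnrichedWord:
    spelling: str                      # conventional spelling is never altered
    markers: list = field(default_factory=list)

def enrich(standard_text):
    """Attach every available phonetic marker to each word (enriched text)."""
    return [EnrichedWord(w, list(PRONOUNCING_DICT.get(w.lower(), [])))
            for w in standard_text.split()]

def convert(enriched, option):
    """Display-converter: keep only the markers the display-option shows."""
    wanted = DISPLAY_OPTIONS[option]
    return [EnrichedWord(w.spelling, [m for m in w.markers if m[1] in wanted])
            for w in enriched]

def render(processed):
    """Crude text rendering: [x] marks a silent letter, x* a schwa vowel."""
    words = []
    for w in processed:
        kinds = dict(w.markers)        # letter index -> marker kind
        letters = []
        for i, ch in enumerate(w.spelling):
            if kinds.get(i) == "silent":
                letters.append("[" + ch + "]")
            elif kinds.get(i) == "schwa":
                letters.append(ch + "*")
            else:
                letters.append(ch)
        words.append("".join(letters))
    return " ".join(words)
```

For example, `render(convert(enrich("knife about"), "silent-only"))` marks only the silent letters, while the "plain" option returns the standard text unchanged.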
7 The system of any one of claims 4 - 6, with the following additional features: some or all of the display-options provide, as means to supply the additional phonetic information, some or all of the following visual indicators: diacritical marks, colors, qualities or thicknesses of line, variations to positions of letters and to letter-shapes, new letters, new alphabets, syllabaries.
8. The system of any one of Claims 4 - 7 with the following additional elements; a range of pronouncing-dictionary data-bases corresponding to different dialectal pronunciations or styles of pronunciation; said data-bases providing additional phonetic detail to phonemic or allophonic level; means for user-selection of the desired dialectal pronunciation or style of pronunciation.
9 The system of Claim 6, with the following additional features permitting an alternative route from standard to processed texts: a pronouncing-dictionary data-base or data-bases incorporating a range of phonetic markers to distinguish differing types or levels of additional phonetic information; means for selecting directly from the said pronouncing-dictionary data-base(s) the additional phonetic information that is (a) characteristic of a given display-option, and (b) relevant to the words of a particular sample of text, said text being an example of standard text; means of incorporating said characteristic selection of additional phonetic information directly into processed texts.
10 The system of Claim 6, with the following additional features permitting an alternative route from enriched texts to display-options, bypassing processed text: a text-storing system whereby much or all of the additional phonetic information in the enriched text is either stored as, or is represented for display purposes as, a set of phonetic letter-variants; the visual appearance of these phonetic letter-variants therefore depending on the font that is chosen; a set of fonts, in some of which fonts either none or only certain of the phonetic letter-variants are distinguished from each other; the said set of fonts being so designed that the range of distinctions among phonetic letter-variants made visible in each font corresponds to that desired in a given display-option; the effect being that, by selecting a given font the converter produces the phonetic letter-variants required in a given display-option; the converter retaining the function of providing any further necessary graphic information, including default settings, to the display system.
11. The system of Claim 6 with the further provision of means responsive for: distributing or making available to other users the stored versions of resolved or enriched or processed text that have been derived from a standard text; and accepting and storing and displaying resolved, enriched, or processed text that has been acquired from other users or suppliers.
12. The system of Claim 6, with the following additions: a non-homophonous homonym filter, being means for checking each word of a correctly-spelled standard text against a list of non-homophonous homonyms found in its language; means for referring each occurrence of a non-homophonous homonym, together with a sufficient amount of surrounding text, to the attention of the user; means for receiving back from the user a determination of the pronunciation for each non-homophonous homonym; means for combining such information from the user with other phonetic information derived from the pronouncing-dictionary data-base(s).
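The font-based route of Claim 10 might be sketched as follows, on the assumption that each stored letter carries an optional variant tag and that each font simply declares which tags it draws differently. The font names, the variant tags and the `letter.variant` glyph-naming scheme are hypothetical.

```python
# Hypothetical fonts: each one lists the letter-variants it makes visible.
FONT_DISTINCTIONS = {
    "Plain": frozenset(),                        # no variants distinguished
    "SilentMarked": frozenset({"silent"}),       # only silent letters shown
    "FullPhonetic": frozenset({"silent", "schwa"}),
}

def glyph_name(letter, variant, font):
    """Return the glyph a font would draw for one stored letter-variant."""
    if variant in FONT_DISTINCTIONS[font]:
        return letter + "." + variant            # a distinct variant glyph
    return letter                                # falls back to the plain letter

def glyph_run(enriched_letters, font):
    """Map stored (letter, variant) pairs to glyph names for a chosen font."""
    return [glyph_name(letter, variant, font)
            for letter, variant in enriched_letters]
```

The stored text is the same for every font; choosing a different font is what changes the amount of phonetic information that becomes visible.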
13 The system of Claim 6, with the following additions: a non-homophonous homonym filter, being means for checking each word of a correctly-spelled standard text against a list of non-homophonous homonyms found in its language; means for referring each occurrence of a non-homophonous homonym, together with a sufficient amount of surrounding text, to a non-homophonous homonym parser; a non-homophonous homonym parser; means for receiving back from the said non-homophonous homonym parser a determination of the probable pronunciation for each non-homophonous homonym; means for displaying said determination of non-homophonous homonyms and for seeking possible user over-ride; means for accepting such user over-ride, if offered; means for combining the phonetic information resulting from the non-homophonous homonym parser's determination, and from the user over-ride, if offered, with other phonetic information derived from the pronouncing-dictionary data-base(s).
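The non-homophonous homonym filter of Claims 12 and 13 might be sketched as follows. The word list, the respelled pronunciations and the `choose` callback (standing in for either the user of Claim 12 or the parser of Claim 13) are assumptions made for illustration.

```python
# Hypothetical list of non-homophonous homonyms, each with its alternative
# pronunciations given in an informal respelling.
HETERONYMS = {
    "lead": ["LEED (to guide)", "LED (the metal)"],
    "wind": ["WIND (moving air)", "WYND (to turn)"],
    "tear": ["TEER (from the eye)", "TAIR (to rip)"],
}

def find_heteronyms(words, context=2):
    """Return (index, word, surrounding text) for each listed homonym."""
    hits = []
    for i, raw in enumerate(words):
        key = raw.lower().strip(".,;:!?")
        if key in HETERONYMS:
            lo = max(0, i - context)
            hits.append((i, key, " ".join(words[lo:i + context + 1])))
    return hits

def resolve(words, choose, context=2):
    """Ask `choose` (a user or a parser) to pick a pronunciation for each hit.

    `choose(word, surrounding_text, options)` must return an index into
    `options`. The result maps word position to the chosen pronunciation.
    """
    resolved = {}
    for i, key, snippet in find_heteronyms(words, context):
        options = HETERONYMS[key]
        resolved[i] = options[choose(key, snippet, options)]
    return resolved
```

A user over-ride, as in Claim 13, could be layered on top by calling `resolve` first with the parser and then updating individual positions with the user's choices.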
14 A method of converting standard text in an alphabetically-written natural language into enriched text, said method including the following steps: ensuring the standard text is correctly spelled; resolving reconversion homonyms, if relevant to the system's range of display-options, with a reconversion homonym resolver; determining the pronunciation of each word of the standard text from a pronouncing-dictionary database; incorporating into the text the additional phonetic information, derived from the pronouncing-dictionary data-base, in the form of various phonetic markers; storing the resulting enriched text.
15 A method of displaying the text of an alphabetically-written natural language with additional phonetic information, said method being the method of Claim 14 with the following additional steps: seeking and receiving the user's selection of the portion of the text to which the display-option is to be applied; selecting the required converter-algorithm or converter-program for a specified display-option; requesting the required converter-algorithm or converter-program to convert the appropriate enriched text into the correct processed text for its display-option; that is, to select, assemble and incorporate the additional phonetic information that is required for displaying the said text in the selected display-option, storing the resulting processed text; displaying the processed text in the specified display-option, with that display-option's characteristic selection of additional phonetic information.
16 The method of Claim 15, with the additional steps of:
seeking and receiving the user's selection of the preferred display-option.
17 The method of Claim 16, with the additional steps of: where a selected display-option permits of user-adjustment to its default settings, seeking and receiving such user-adjustment; and modifying the display-option's display of a given text accordingly.
18 The method of Claim 15 with the following additional steps: checking each word of the standard text against a list of non-homophonous homonyms found in that language;
referring each non-homophonous homonym, together with a sufficient amount of surrounding text, to a non-homophonous homonym parser; receiving back from said parser a determination of the probable pronunciation for each non-homophonous homonym; where appropriate, seeking and receiving the user's over-ride of the said parser's determination; incorporating the phonetic information derived from the parser or parser-plus-user-over-ride process so as to produce resolved text; further determining the pronunciation of the words of the said resolved text from a pronouncing-dictionary data-base so as to produce enriched text.
19 The method of Claim 14 with the following additional or modified steps: seeking and receiving the user's selection of the preferred dialectal pronunciation and/or style of pronunciation; determining the pronunciation of each word of the text from a pronouncing-dictionary data-base which corresponds to, or which includes markers specific to, the previously-selected dialectal pronunciation and/or style of pronunciation; the said pronouncing-dictionary data-base registering the word-shapes of non-alphabetic or logographic standard text, where they are required, in any appropriate code.
20 A display-option which represents the phonemes or allophones of a given spoken performance of a text, the match between display-option and spoken performance being achieved by either:
(1) the use of phonetic information provided by voice-recognition software;
(2) the use of phonetic information provided by skilled human agency;
(3) the performer of the spoken performance taking care to produce the phonemes or allophones already specified in the display-option which is to be offered as the written record of the performance.
21 The method of Claim 15 with the following additional steps: standing by to repeat the method, unless otherwise requested by the user, upon any amendments or additions which the user may make to the text, and to then update the stored versions of resolved, enriched and processed text; standing by to reconvert, if requested, the latest updated version of resolved, enriched or processed text into an updated version of the original standard text; said reconversion involving either the removal or the rendering invisible of the additional phonetic information.
22 The method of Claim 14 with the additional step of distributing to other users either:
(a) stored versions of processed text, so that said other users can immediately display the text in the same display-option, or
(b) the latest stored versions of the resolved or enriched text, so that said other users can display the text according to the range of display-options in Claim 1.
23 The method of any one of Claims 14 - 22 with an additional step relating to non-homophonous homonyms which may occur in standard text, said step being either:
(a) to flag or highlight in display-options the presence of each non-homophonous homonym, while displaying alternative word-shapes for its alternative pronunciations; or
(b) to display, in their contexts, the list of non-homophonous homonyms found in the text, and invite the user to choose between their alternative pronunciations; then when such determination is provided by the user, to incorporate it into a resolved text; or where such determination is not provided, to proceed as in (a).
24. The method of any one of Claims 14 - 23 with additional display-options which use an alternative (non-conventional) spelling-system or spelling-systems, with consequent changes to word-shapes, such method involving the following additional steps: standard letters or a set of phonetic letter-variants corresponding to standard letters are used; the correct letter-combinations in said alternative spelling-system(s) are found by consulting a dictionary-style data-base in which corresponding word-shapes (letter-combinations) for the two writing-systems are matched; where reconversion homonyms are created in a given alternative spelling-system, and where the reconversion homonym resolver has not yet added such markers as will be required for the automatic reconversion of such homonyms to standard text, the relevant converter adds them to the processed text.
25 The method of any one of Claims 14 -24, with the following modifications: additional display-option or display-options are provided; said display-option(s) include non-standard letters, with consequent changes to word-shapes; said non-standard letters remain associated with the letters they replace; in cases where reconversion homonyms are created that are peculiar to this writing system and have not already been resolved by the reconversion homonym resolver, the relevant converter adds appropriate additional markers to the processed text.
26 The method of Claim 25 with the following additional steps: a non-standard spelling-system or spelling-systems is used in one or more of the/those display-option(s) which also use non-standard letters; the correct letter-combinations for such display-options, and for their corresponding processed texts, are found by consulting a dictionary-style data-base in which corresponding word-shapes (letter-combinations) are matched; where conversion to such processed texts creates reconversion homonyms which are peculiar to such writing-systems, and which have not already been resolved by the reconversion homonym resolver, the relevant converter adds appropriate additional markers to the processed text.
27 A method of modifying display-options, whether alphabetic or logographic, to make visible the grammatical or syntactical structure of sentences, whereby line-coding or color-coding or other convenient common visual markers are used to distinguish some or all of the grammatical parts of speech.
28 A method, suitable for use in the text-processor of any one of Claims 1 - 6, of representing printed texts of an alphabetically-written language with additional phonetic detail, the method being characterised in that: standard letters are used; the conventional spelling of a given language is preserved; a set of phonetic letter-variants is created, for most or all of the standard letters; the number of resulting variations for any given letter is sufficient to provide a separate variant of that letter for either:
(a) all the separate phonemes frequently represented by that letter in the conventional spelling of the language; or
(b) all the separate phonemes and some or all of the allophones frequently represented by that letter in the conventional spelling of the language; the resulting variant shapes or appearances of any given letter are produced by change to, or in, any or all of the following: the color; or the overall shape of the letter; the slant or orientation or rotation of the letter; the type or quality of lines making up the letter; the use or omission or modification of serifs; the use or non-use of hollow-print or of a solid-filled letter-shape; presence or positioning of diacritical marks; presence or positioning of cross-hatching, of pulsing lines or of cartouches; presence or positioning of dotted lines or curves; of a break or breaks at specified points in the lines or curves of a letter; presence or absence of a spot or of a mark or of a thickening of line or of a cross-stroke at any of a range of positions inside or upon the letter; any other suitable changes in visual appearance may also be used to distinguish variant forms of a letter; word-shapes, even if all the standard letters are replaced by such variant forms, remain sufficiently similar to those in standard text to be recognisable.
29 The method of Claim 28, with the following additional steps: text may be shown in two or more display-options falling within the following range: from, a display-option in which all additional phonetic information is visually suppressed, to, a display-option in which all available additional phonemic and allophonic information is displayed; the extent to which variant letter-shapes replace standard letters, and hence the extent to which the appearance of text alters from that of standard text, varies according to the amount and kind of additional phonetic information that a given display-option is designed to supply.
30 The method of Claim 29, with the following additional step: the difference in appearance between standard text and text as displayed in a given display-option from the above range, can be increased or decreased; such change involving means responsive for simultaneously adjusting the size or conspicuousness of some or all of the changes in visual appearance that create the set of phonetic letter-variants.
31 The method of Claim 29, with the following additional steps: all silent letters are retained, as in conventional spelling, but are marked by a common visual marker, being a common variation in appearance; when representing text in English, all vowels regressing to or towards the schwa vowel continue to be represented by the same letter as in conventional spelling, but are marked by a common visual marker that is different from that used for silent letters; the said common visual marker may be or include any or all of the following: a common color; a common change in brightness, color-saturation or any other quality of color; a common diacritical mark; a common displacement or sloping or rotation of the letter; a common increase or diminution of the letter's size; a common change in the letter's font or type style; a common thickness or faintness or any other quality of line in all or in specific part or parts of each letter; use of pulsing lines or of cartouches; cross hatching; use or omission or modification of serifs or other ornamental features; presence of a spot or mark or cross stroke or a thickening of line at a specified position or positions inside or on each letter; as well as any other convenient common visual marker.
32 The method of any one of Claims 28 - 31 with the following additional step: where the set of phonetic letter-variants, as used in a given display-option, is insufficient to represent all the phonemic or allophonic uses to which a given standard letter may be put in the conventional spelling of a given language, any occurrence of such a letter where its sound-value in a particular word cannot be reliably indicated is marked as "wild"; all "wild" letters are marked with a common color or other convenient common visual marker.
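The marking of "wild" letters in Claim 32 might be sketched as follows. The variant table, the phoneme labels and the variant names are hypothetical, and the treatment of letters that have no variants at all (shown as plain standard letters) is an assumption of this sketch rather than something the claim specifies.

```python
# Hypothetical letter-variant table: for each standard letter, the phonemes
# a given display-option can show, mapped to an illustrative variant name.
LETTER_VARIANTS = {
    "c": {"k": "c-hard", "s": "c-soft"},
    "g": {"g": "g-hard", "j": "g-soft"},
}

WILD = "wild"  # common marker for sound-values the variant set cannot show

def tag_letters(pairs):
    """Tag each (letter, phoneme) pair with a variant name, WILD, or None."""
    tagged = []
    for letter, phoneme in pairs:
        variants = LETTER_VARIANTS.get(letter.lower())
        if variants is None:
            tagged.append((letter, None))        # assumed: plain standard letter
        elif phoneme in variants:
            tagged.append((letter, variants[phoneme]))
        else:
            tagged.append((letter, WILD))        # sound-value not representable
    return tagged
```

A display-converter could then give every letter tagged `WILD` the same color or other common visual marker.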
33 A teaching method (designated as "text-morphing"), for use in a system possessing capacities for text-storage and for electronic display of text or for other flexible means of text-display, said method being a method of enabling the learner, who already knows a word-shape in one writing- system, to associate that word-shape with its corresponding form in a different writing-system; said method being to mutate the word-shape, via a series of intermediate stages, from its form in one writing-system to its form in the other writing-system; regardless of whether the intermediate stages are shown for long enough to be individually discerned or whether they succeed each other so swiftly that the human eye sees normally a smooth transition between the said word-shapes.
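The series of intermediate stages in text-morphing might, in the simplest case, be produced by linear interpolation between two outlines. The sketch below assumes that each word-shape has already been reduced to a list of (x, y) points and that the two lists have been brought to equal length with corresponding points matched; that preparation step, and the function name `morph_frames`, are assumptions of the example.

```python
def morph_frames(start, end, steps):
    """Return steps + 1 outlines moving linearly from `start` to `end`.

    `start` and `end` are equal-length lists of (x, y) points; frame 0 is
    `start` and the last frame is `end`.
    """
    if len(start) != len(end):
        raise ValueError("outlines must have the same number of points")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for s in range(steps + 1):
        t = s / steps                      # 0.0 at the start, 1.0 at the end
        frames.append([((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
                       for (x0, y0), (x1, y1) in zip(start, end)])
    return frames
```

Showing each frame for a noticeable time gives individually discernible stages; a large `steps` value with rapid display gives the smooth transition also covered by the claim.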
34 A teaching method to assist the user in learning to read unfamiliar writing-systems, the method being the method of Claim 14 with the additional steps of selecting not just one but two or more display-options; enabling the user to compare the appearance of the same text or word-shape in two or more display-options, the method of so enabling the user being any single one, or any combination, of the following methods: interleaving the display-options; or alternating between the display-options; or text-morphing between the display-options.
35 The text-morphing method of Claim 33 with the additional step of: creating a visual link between two word-shapes by emphasizing, exaggerating, or enlarging, during the intermediate stages, some element or elements in the first word-shape that is/are similar or identical to an element or elements in the second word-shape.
36 The method of any one of claims 33 - 35 with the additional step of: color-coding or line-coding some parts of the two said word-shapes so as to create or to clarify or to make more conspicuous a similarity between them.
37 The method of Claim 33 with the specific/additional step of using a cartouche or cartouches to draw attention to regions or elements either in the two word-shapes or in the intermediate shapes between them, which show visual similarities or relationships that may help the learner to associate the two word-shapes.
38 The method of Claim 37 with the specific/additional step of giving the cartouche a shape, with a right way up, such that the learner can readily perceive when it and its contents have been rotated.
39 A method according to any one of Claims 33 - 38 with the added step of selecting among existing variants of, or creating a new minor variant of one or both of the said word-shapes such that there is then created either: a stronger visual similarity between the two word-shapes than previously existed, or a point or points of visual similarity between them that did not previously exist.
40. The system of any one of claims 6 - 13, with the further provision of: appropriate capabilities for coding, storing and displaying pictographic or logographic symbols, or syllabaries; appropriate capabilities for interleaving, alternating display, or text-morphing between such writing-systems as: alphabetic; pictographic or logographic; syllabaries; additional display-options (and means responsive thereto) wherein some, or many, or all of the words or syllables or morphemes of alphabetic standard text are represented instead by pictographic or logographic writing-systems, or syllabaries; a logographic-dictionary data-base, being a dictionary-style data-base in which pictographic or logographic equivalents, or syllabary equivalents, are stored in an appropriate code and are matched with the word-shapes of standard text; appropriate converter-programs or converter-algorithms, plus any other means responsive, which can provide either automatic conversion, or automatic conversion subject to user over-ride, from standard text into suitable processed texts; such conversion being either (a) directly from standard text which is correctly spelled, or (b) from resolved or enriched text, as used to prepare the processed texts for other purely-alphabetic displays.
41 The system of Claim 40 with the following additional features: a relevant homonym filter or filters, at least one of which uses an expanded check-list which includes and identifies not only those homonyms relevant to alphabetic displays but also those relevant to pictographic or logographic display-options, or to syllabary display-options, or to both; an appropriate homonym parser or parsers to determine the meanings of any such relevant homonyms; means for receiving back from the said homonym parser or parsers a determination of the probable meaning for each relevant homonym; means for displaying said determination of relevant homonyms and for seeking possible user over-ride; means for accepting such user over-ride; means for producing a kind of phonetically and graphically enriched text which combines the information resulting from the homonym parser's or parsers' determinations, and from the user's over-ride if offered, with any necessary additional phonetic or graphic information derived from the pronouncing-dictionary or logographic dictionary data-bases; means for storing such enriched text; converters for converting such enriched text into varieties of processed text, and for activating the corresponding display-options.
42 The system of Claim 40 or 41 with the additional provision of: means for storing such enriched or processed text, and for distributing it, as appropriate, to other users; means for receiving same from other users or suppliers.
43. The system of any one of Claims 40 - 42, with the additional provision of: a pictographic-and-logographic-to-alphabetic dictionary-style data-base or data-bases in which the pictographs or logographs of a given writing-system are stored and accessed, and from which their equivalents in alphabetic writing-systems can be determined; an algorithm or program or sub-program for automatic conversion or re-conversion of pictographic or logographic text into alphabetic text; means responsive for the display and storage of the resulting alphabetic text.
44. A method for using the system of any one of Claims 40 - 43, with the following modified or additional procedures: upon seeking and receiving the user's selection of which display-option is to be used: such display-option being a non-alphabetic or substantially non-alphabetic one which requires the resolution of ambiguities caused by homonyms; producing resolved text by either: (a) passing the enriched text, in which the pronunciations and thus the meanings of non-homophonous homonyms have already been resolved, to a relevant homonym filter that identifies those homophonous homonyms capable of affecting the logographic or partly logographic display-options, then using a homonym parser to resolve those remaining relevant homonyms; or (b) passing the text, in its standard alphabetic form, to a different relevant homonym filter which identifies all relevant homonyms, including non-homophonous homonyms, and then using a homonym parser to resolve all relevant homonyms; adding to the resulting resolved text any additional phonetic or graphic information which may be required for a given display-option; storing the resulting resolved text (which may also be enriched text); subsequently converting the resulting resolved or enriched text into processed text; storing the processed text.
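By way of non-limiting illustration only, the filter-then-parse procedure of option (b) of Claim 44 might be sketched in Python as follows. The check-list `RELEVANT_HOMONYMS`, its sense labels and cue words, and the function names `homonym_filter`, `homonym_parser` and `resolve` are all hypothetical and are not taken from the specification. The cue-word parser is a deliberately naive stand-in for whatever parser an implementation would actually use, and the sketch splits on whitespace only, so punctuation is not handled.

```python
# Hypothetical check-list: each relevant homonym maps to its candidate senses,
# and each sense lists context words that hint at it (illustrative data only).
RELEVANT_HOMONYMS = {
    "bank": {"bank/river": {"river", "water"}, "bank/money": {"money", "loan"}},
    "lead": {"lead/metal": {"pipe", "heavy"}, "lead/guide": {"will", "team"}},
}


def homonym_filter(words):
    """Return the positions of words that appear on the check-list."""
    return [i for i, w in enumerate(words) if w.lower() in RELEVANT_HOMONYMS]


def homonym_parser(words, index):
    """Pick the sense whose cue words overlap most with the other words."""
    senses = RELEVANT_HOMONYMS[words[index].lower()]
    context = {w.lower() for i, w in enumerate(words) if i != index}
    # max() keeps the first sense on a tie, so dict order acts as the default.
    return max(senses, key=lambda s: len(senses[s] & context))


def resolve(text, overrides=None):
    """Produce resolved text as a list of (word, sense-or-None) pairs.

    `overrides` maps a word position to a sense chosen by the user,
    which takes precedence over the parser's determination.
    """
    words = text.split()
    overrides = overrides or {}
    resolved = [(w, None) for w in words]
    for i in homonym_filter(words):
        sense = overrides[i] if i in overrides else homonym_parser(words, i)
        resolved[i] = (words[i], sense)
    return resolved
```

In this sketch the user over-ride is simply a mapping from word position to sense, for example `resolve("the bank", {1: "bank/money"})`; a real system would present the parser's determination to the user before accepting any such over-ride.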
45 The method of Claim 44, with the subsequent further steps of: referring the resolved text to a sub-program or sub-algorithm of the relevant converter which makes its own characteristic selection as to which words of the text are to be converted to pictographic or logographic form; referring the selected words of the resolved text to an alphabetic-to-pictographic-or-logographic dictionary-style data-base in which logographs or pictographs are provided for each alphabetic word-shape and for the separate meanings of some homonyms; determining from the said data-base the required visual form or the identificatory code of each pictographic or logographic word-shape that is needed; combining and assembling such information, and storing the resulting processed text; creating, in combination with the system's means of display, the required display in which some or all of the words of the text appear in non-alphabetic word-shapes.
46 The method of Claim 45, with the further step of: standing by to repeat the same process, unless otherwise requested by the user, on any amendments or additions which the user may make to the text, and to then update the latest stored versions of resolved, enriched and processed text.
47 The method of Claim 46, with the further step of: standing by to re-convert such amended processed text, if requested, to standard text in alphabetic form via either a reverse-use of the said alphabetic-to-pictographic-or-logographic dictionary-style database or else via the direct use of a pictographic-or-logographic-to-alphabetic dictionary-style data-base.
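As an illustrative sketch only, the dictionary look-up of Claim 45 and the reverse use of the same data-base described in Claim 47 might look like the following Python fragment. The table `TO_LOGOGRAPH`, the sense labels, and the function names are hypothetical; the three glyphs are placeholders chosen for the example and are not the logographs of any display-option described in the specification.

```python
# Hypothetical alphabetic-to-logographic data-base keyed by (word, sense).
# A sense of None marks a word that is not a relevant homonym.
TO_LOGOGRAPH = {
    ("sun", None): "日",
    ("bank", "bank/river"): "岸",
    ("bank", "bank/money"): "銀",
}

# Reverse use of the same data-base: logograph -> alphabetic word-shape.
TO_ALPHABETIC = {glyph: word for (word, _sense), glyph in TO_LOGOGRAPH.items()}


def to_processed(resolved):
    """Replace each (word, sense) pair that has an entry; keep other words."""
    return [TO_LOGOGRAPH.get((word.lower(), sense), word) for word, sense in resolved]


def to_standard(processed):
    """Re-convert processed text to alphabetic word-shapes."""
    return [TO_ALPHABETIC.get(token, token) for token in processed]
```

Because each placeholder glyph corresponds to exactly one alphabetic word-shape, the reverse table is unambiguous in this example. Re-conversion recovers the word but not the sense label, which would have to be kept in the stored resolved text if it is needed later.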
48 A teaching method, being the methods of Claim 36, with the addition of a specific text-morphing method whereby: the letters of a word or morpheme or phrase (including any diacritical marks) are displaced from the horizontal and joined (or approached) by connecting lines or connecting curves which are either in the shape or will later be text-morphed into the shape of the corresponding pictograph or logograph; and the alphabetic letters subsequently either disappear or else merge into the shape of the corresponding pictograph or logograph.
49 The teaching method of claim 48, with the additional step that as the alphabetic letters disappear or merge into the corresponding pictograph or logograph, they contribute some lines or curves to its emerging shape.
50 The teaching method of claim 48, with the modification that either: the direction of text-morphing is reversed, so that the alphabetic letters appear out of the logograph, thus facilitating the user's conversion of pictographic or logographic reading skills into alphabetic reading skills; or the display, on request, flickers, slow-flickers or interleaves between the two word-shapes in such a way as to facilitate learning in either direction or especially in the direction of alphabetic reading skills.
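Purely as an illustration of how a text-morphing display of the kind recited in Claims 48 to 50 could be driven, the following Python sketch interpolates between two shapes given as lists of control points. The specification does not prescribe any interpolation scheme; straight-line (linear) interpolation, the point-list representation, and the names `morph` and `morph_frames` are assumptions made for this example.

```python
def morph(start, end, t):
    """Linearly interpolate between two equal-length lists of (x, y) points.

    t = 0.0 gives the start shape and t = 1.0 gives the end shape.
    """
    if len(start) != len(end):
        raise ValueError("shapes must have the same number of control points")
    return [
        ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
        for (x0, y0), (x1, y1) in zip(start, end)
    ]


def morph_frames(start, end, steps, reverse=False):
    """Yield steps + 1 frames from start to end.

    With reverse=True the frames run from the end shape back to the start
    shape, e.g. from the logograph back to the letters.
    """
    if reverse:
        start, end = end, start
    for i in range(steps + 1):
        yield morph(start, end, i / steps)
```

Here `start` would hold the control points of the displaced letters and their connecting lines, and `end` those of the pictograph or logograph. An intermediate shape could be handled by chaining two such morphs.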
51 The method of Claim 48 with a variation to its specific text-morphing procedure whereby: the lines and curves that join or approach the letters do not text-morph directly towards a required traditional logograph, but first pass through a special kind of intermediate shape; said special kind of intermediate shape having been selected for its possessing the following two qualities:
(a) it is either a recognisable pictograph for the meaning of a traditional logograph or it can be readily associated with a traditional logograph via a phonetic pun or rebus, or some other logical or symbolic connection;
(b) it either shows a point or points of visual similarity to a traditional logograph sufficient to facilitate the text-morphing process between it and the logograph, or it shows points of visual similarity to many or all of such traditional logographs as share its phonemic pronunciation.
52 The teaching method of Claim 51, with the additional step of using various colors or various types of line for the lines and curves that make up the word-shapes or pictographs that are being text-morphed, in such a way that each carries within itself the beginnings of several different color-coded or line-coded images corresponding to such various traditional logographs as share the same phonemic pronunciation.
53 A teaching method for logographic writing systems, in which the method of Claim 52 is modified in that: the said special intermediate shapes are designed to be immediately intelligible to the learner without their needing to be approached via alphabetic display-option(s).
54 A teaching method for native-speaking children or learners in languages that have (a) limited numbers of syllables, (b) mainly monosyllabic words or morphemes and (c) a traditional logographic writing-system that is partly arbitrary: wherein the method is the method of Claim 52, with the modifications that: the intended path for the learner from alphabetic word to conventional logographic shape is via a phonetic syllabary, and hence an additional display-option or display-options are provided which offer a syllabary-based writing-system for that language; in the said syllabary-based writing-system each syllable's symbol has been chosen for both:
(a) its visual similarity or other clear connection to some thing or object suggested by one of that syllable's meanings in the given language, and
(b) its visual potential to be text-morphed in the direction of some or several or all of the most common traditional logographs which share the phonemic pronunciation of that syllable.
55 The teaching method of Claim 54, with the modification that the alphabetic display-options are disregarded or omitted, and the path for the learner is direct from the syllabary to the conventional logographs.
56 The teaching method of Claim 55 with the further step that: the part or parts of a syllabary symbol that is/are most closely related to a specific traditional logograph may be marked by a cartouche.
57 The method of Claim 56 with the further steps that said cartouche either: contributes part of its own shape or line to the visual emergence of the required traditional logograph; or indicates by its shape or color or line-quality, or other convenient visual attribute, the required category of logograph or of logograph-meaning which is to be associated with that portion of the syllabary symbol.
58 A teaching method to help learners convert their ability to read a traditional logographic writing-system into an ability to read either some other (unspecified) alternative pictographic or logographic writing-system, or a phonetic syllabary; the method being the method of Claim 55, but with the direction of text-morphing reversed, and with the use of alternating display, and interleaving to facilitate the learner's transfer of reading skills from the old to the new writing-system.
59 A writing-system suitable for display-options and for color-screens and color printers, having the following characteristics: a set of at least some hundreds of pictographs is provided and allocated so that there is a separate pictograph for each main area of semantic meaning that is found in one or more natural languages; the pictographs are designed to be as self-evident or as near to self-evident as possible; the pictographs, designed to be produced electronically rather than drawn by hand, may include ones that are stylised and simple but also include some others that use details as fine and colors as numerous as the display-system or printing system can conveniently and clearly produce; some pictographs are produced by the combination of two or more simpler pictographs; as well as the pictographs, some more abstract, logical or symbolic logographs are used; the meanings of the pictographs are modified, and multiplied, where appropriate, by the semi-algebraic use of diacritical marks or of other convenient common visual markers standing for broad semantic categories, such as largeness or smallness or absence of a quality or the opposite of a quality; common visual markers are provided, where appropriate, to indicate grammar and to distinguish parts of speech; similar indicators are provided, if appropriate in a given natural language, to indicate which words are connected with which, and thus how a given sentence is to be construed; the order of pictographs may follow the word-order of a given natural language or of an original standard text, but it may also follow a conventional order that is a feature of the writing-system, or a conventional order that is designed to be internationally comprehensible.
60 The writing-system of 59 with the following modifications to enable it to be used as the written form of a given sign-language: the areas of meaning covered by individual pictographs are so arranged that there is a pictograph to correspond to the area of meaning of each common hand-sign and, where relevant, of some common combinations of hand-signs; an alternative display-option (and writing system) is offered in which the pictographs are replaced by images of or stylised representations of or coded representations of the corresponding hand-signs; the user or learner can associate these two display-options or writing-systems with each other and with standard text by alternating display, interleaving, or text-morphing between any two or between all three of them.
61 The writing-system of 59 with the following modifications to enable it to be used as the written form of a given simplified natural language that uses a restricted vocabulary: the areas of meaning covered by individual pictographs are so arranged that there is a pictograph to correspond to the area of meaning of each major item of vocabulary; the user or learner can associate this display-option or writing-system with that of standard text by alternating display, interleaving, or text-morphing between the two.
62 The writing-system of 59, with the following modifications to enable it to represent specific words in a specific natural language: most pictographs have attached to them at least one letter, being normally the first letter of the word in question; where there are two or more words within the pictograph's area of meaning that start with the same letter, the final letter of a word can be used instead, or as well; or two or more letters may be used at either or both ends; final letters can be distinguished from initial letters, whether by their position adjacent to the pictograph or by some other common visual marker; the result being to create a set of new compound logographs, whose general areas of meaning are indicated by their pictographic elements; each new compound logograph precisely indicating either one word, or one word-element, or one short connected phrase, in a given natural language.
63 A teaching or self-teaching method, being the writing-system or display-option of 62, when either interleaved with standard-text, or with the provision of a user-controlled option of alternating display or text-morphing between the two writing-systems or display-options, thus assisting the user who knows one of these systems to learn the other.
64 A writing-system for languages in which the number of homophonous homonyms creates levels of ambiguity that are an obstacle to the use of alphabetic standard text, said writing-system having the following characteristics: it is primarily alphabetic; a range of symbols or diacritics or colors or line-qualities or other convenient common visual markers is provided; each homophonous homonym, when it bears a given one of its meanings, is consistently associated with a given common visual marker (or, for one of its meanings only, with the lack of such a marker), thus forming part of a set of new logographs that correspond to each of the main meanings of that homonym; the number of such common visual markers is relatively limited, but sufficient in most cases to indicate the intended area of meaning: for instance, whether abstract, concrete, living, non-living, man-made, noun, verb, adjective, or any other broad but useful category; where no exact category of common visual marker is available that is suitable to resolve a given ambiguity, special wildcard common visual markers may be used, or an ordinary common visual marker may be used irregularly; the use of such new logographs creating a form of semantically enriched text that permits of unambiguous reconversion into an original alphabetic form, and also into a range of display-options, including traditional logographs (where these already exist for a given language).
65 A writing-system that reduces or removes the ambiguities caused by homophonous homonyms in the standard alphabetic text of some languages, or in the standard syllabary text, and having the following characteristics: standard letters and conventional spellings, or standard syllabary symbols, are retained; each ambiguous word-shape (homonym) is paired, on each occurrence, with a guide-word which is a synonym or near-synonym of its meaning on that occurrence; the guide-word is not pronounced, but serves as a silent guide to the area of meaning intended; the guide-word is represented, either in full or in abbreviated form, close to the homonym; the guide-words are distinguished from the main text by any convenient common visual marker.
66 The method of claim 65 with the further modification that the visual salience of the guide-words can be modified, even to the point of making them invisible in a display-option or in a print-out.
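For illustration only, the guide-word display of Claims 65 and 66 might be rendered as in the following Python sketch. The square brackets used as the common visual marker, the three salience levels, the use of the first letter as the abbreviated form, and the function name `render` are all assumptions made for this example and are not prescribed by the claims.

```python
def render(tokens, salience="visible"):
    """Render (word, guide_word_or_None) pairs as a line of plain text.

    salience: "visible"     -> full guide-word in brackets after the homonym
              "abbreviated" -> first letter of the guide-word only
              "hidden"      -> guide-words suppressed entirely
    """
    if salience not in ("visible", "abbreviated", "hidden"):
        raise ValueError(f"unknown salience: {salience}")
    out = []
    for word, guide in tokens:
        if guide is None or salience == "hidden":
            out.append(word)
        elif salience == "abbreviated":
            out.append(f"{word}[{guide[0]}]")
        else:
            out.append(f"{word}[{guide}]")
    return " ".join(out)
```

With the hypothetical input `[("the", None), ("bank", "shore"), ("flooded", None)]`, the three settings give `the bank[shore] flooded`, `the bank[s] flooded` and `the bank flooded` respectively. The stored pairs are unchanged in every case, so hiding the guide-words does not lose the resolution of the homonym.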
67 The method of 65 or of 66, with the following modifications: a reduced set of guide-words is used, sufficient to offer general clues to the intended area of meaning for a given occurrence of a given homonym: for instance whether abstract, concrete, living, non-living, man-made, noun, verb, adjective, or any other broad but useful category; where no exact guide-word is available to resolve a given ambiguity, a special wildcard guide-word may be used, or an ordinary guide-word may be used irregularly.
68 The method of claim 64, with the following modifications: most or all of the common visual markers which are used for resolving the ambiguities of homonyms stand for members of a reduced set of guide-words; the markers may be in symbolic, diacritical, pictographic or other non-alphabetic forms, as well as in alphabetic forms.
69 The methods or writing-systems of any one of claims 62 - 66 with the following additions: when a text is stored in electronic form, each member of any of the resulting sets of new logographs that represent resolved homonyms, is stored via a unique code; thus producing a form of electronic resolved text which has the following qualities: such text can be automatically converted or reconverted, without ambiguity, into various display-options, including standard alphabetic text; where the homonyms whose resolution is marked in this method of storage include the relevant homonyms for conversion to or reconversion from the traditional logographs of that language, such conversion can be automatically carried out by an appropriate converter that has access to an appropriate dictionary-style data-base; where the homonyms whose resolution is marked in this method of storage include the relevant homonyms for conversion or reconversion to a set of pictographs that is largely or wholly comprehensible across language barriers, such conversion can be automatically carried out by an appropriate converter that has access to an appropriate dictionary-style data-base; if it should be the case that the homonyms whose resolution is marked in this method of storage include all those ambiguities and multiple meanings that require to be resolved to make possible automatic translation by a given translation program into another natural language or languages or into an international semantic code, then such automatic translation into that language or languages or semantic code is possible.
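As a minimal sketch of the storage scheme of Claim 69, and under the assumption that each resolved homonym is assigned one integer code, electronic resolved text could be stored and reconverted as follows. The code table `CODES`, its numeric values, the meaning labels, and the function names are hypothetical; the claim does not specify any particular coding.

```python
# Hypothetical code table: one unique integer per (word-shape, meaning) pair.
CODES = {
    ("bank", "shore"): 1001,
    ("bank", "finance"): 1002,
    ("bat", "animal"): 1003,
    ("bat", "club"): 1004,
}
DECODE = {code: pair for pair, code in CODES.items()}


def encode(tokens):
    """Store resolved homonyms as codes; ordinary words stay as strings.

    A resolved pair that is missing from the table raises KeyError rather
    than being stored ambiguously.
    """
    return [CODES[(w, m)] if m is not None else w for w, m in tokens]


def decode(stored):
    """Recover the (word, meaning) pairs without ambiguity."""
    return [DECODE[item] if isinstance(item, int) else (item, None) for item in stored]


def to_standard_text(stored):
    """Reconvert stored resolved text into standard alphabetic text."""
    return " ".join(word for word, _meaning in decode(stored))
```

Because every code maps back to exactly one (word-shape, meaning) pair, the same stored text can feed a converter for standard alphabetic text, as here, or a converter that looks the pair up in a logographic or pictographic dictionary-style data-base.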
70 A method or system according to any preceding claim, with the additional feature or step of a guess-promoting interactive teaching-display, designed for those who are either learning to read, or else are learning to read more fluently in an unfamiliar writing system, whereby: an acoustic rendering of a small part of the text is provided just after its word-shapes (in whatever writing-system, or in two or more writing-systems) are made visible; the learner being thereby prompted to guess at the word before it is spoken.
71 The method of Claim 51 with the additional step of providing a guess-promoting interactive teaching-display whereby: an acoustic rendering of a small part of the text is provided just after its word-shapes (in whatever writing-system, or in two or more writing-systems) are made visible; the learner being thereby prompted to guess at the word before it is spoken.
72 The method or teaching method of Claim 70, with further steps whereby: a word-shape is displayed in the writing-system which is to be learned; the learner is invited to say the word aloud or to give some other sign of recognising it; if the learner does not do so within a specified time, the program begins alternating text display, or interleaving or text-morphing the word to show how it appears in other writing-systems or display-options that offer clearer phonetic or other clues, until either the learner guesses it or the system provides an acoustic rendering of the word; the word-shape then returns or text-morphs back to the shape which the learner was originally unable to guess.
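The sequence of steps in Claim 72 can be illustrated, by way of example only, with the following Python sketch, which records the events a guess-promoting display would pass through. Real timing and speech output are left out: a response of `None` stands for the learner failing to answer within the specified time. The event names and the function `guessing_session` are hypothetical.

```python
def guessing_session(word, hard_shape, clue_shapes, responses):
    """Simulate one round of a guess-promoting display.

    word        -- the word the learner is expected to say
    hard_shape  -- the word-shape in the writing-system being learned
    clue_shapes -- the same word in display-options offering clearer clues
    responses   -- the learner's answer after each display (None = timed out)
    Returns the list of (event, value) pairs the display goes through.
    """
    events = [("show", hard_shape)]
    answers = iter(responses)
    guessed = next(answers, None) == word
    for clue in clue_shapes:
        if guessed:
            break
        events.append(("show_clue", clue))
        guessed = next(answers, None) == word
    if not guessed:
        # No clue helped, so the system provides the acoustic rendering.
        events.append(("speak", word))
    if len(events) > 1:
        # Return to the shape the learner was originally unable to guess.
        events.append(("show", hard_shape))
    return events
```

If the learner recognises the word at first sight, the result is the single event `("show", hard_shape)`. Otherwise the clues are shown in order until a correct guess or the spoken rendering, after which the original word-shape is shown again.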
73 The method of claim 72, but with the said clues taking acoustic as well as visual form, as for instance by the system acoustically producing part of the word or phrase to be guessed.
74 A method or system according to any preceding method claim, with the additional feature or step of using appropriate unspecified means to read aloud or to offer a spoken performance or demonstration of the pronunciation of part or all of a text.
75 The system of any preceding system claim, with provision of the following additional feature: a word-order algorithm or program which alters the word-order or the order of the word-shapes in some or all of the pictographic or logographic display-options, thus producing an order different to that of the standard text; such altered word-order being either: more satisfactory for swift reading or reliable comprehension in such a writing-system, or more international: meaning more comprehensible to poor-speakers or to non-speakers of a given language, when said word-order is used with such a writing-system.
76 The system of Claim 75, with the following additional feature: an unspecified translation algorithm or program is provided as alternative to the use of dictionary-style data-bases in producing processed text; said unspecified translation algorithm or program resembles those used for translation between natural languages in that it is context-sensitive and grammar-sensitive; it translates standard text to processed text according to the overall meaning, rather than strictly word for word, or word-shape to word-shape; it is capable of replacing or over-ruling or co-operating with a word-order algorithm or program as described in Claim 75; it thus translates alphabetic standard text in a given language not merely into a logographic or partly logographic writing-system but into a variant of the given language; said variant of the given language having as a distinctive feature the use of a word-order that is less idiomatic or more suitable for comprehension by speakers of other languages than is the word-order of standard text; the resulting writing-system, when highly pictographic or based on symbols with internationally recognised meanings, being capable of being understood by readers who speak different languages, and who may be unaware of the language of the original author or of an original standard text which has been converted into this writing-system.
PCT/AU2000/000286 1999-04-05 2000-04-05 Text processing and display methods and systems WO2000060560A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU35442/00A AU780472B2 (en) 1999-04-05 2000-04-05 Text processing and display methods and systems
GB0124973A GB2364160A (en) 1999-04-05 2000-04-05 Text processing and display methods and systems

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AUPP9604A AUPP960499A0 (en) 1999-04-05 1999-04-05 Text processing and displaying methods and systems
AUPP9604 1999-04-05

Publications (1)

Publication Number Publication Date
WO2000060560A1 (en) 2000-10-12

Family

ID=3813802

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2000/000286 WO2000060560A1 (en) 1999-04-05 2000-04-05 Text processing and display methods and systems

Country Status (3)

Country Link
AU (1) AUPP960499A0 (en)
GB (1) GB2364160A (en)
WO (1) WO2000060560A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002050799A2 (en) * 2000-12-18 2002-06-27 Digispeech Marketing Ltd. Context-responsive spoken language instruction
WO2004053725A1 (en) * 2002-12-10 2004-06-24 International Business Machines Corporation Multimodal speech-to-speech language translation and display
EP1727053A3 (en) * 2005-05-27 2007-09-05 Dybuster AG Method and system for spatial, appearance and acoustic coding of words and sentences
US8275620B2 (en) 2009-06-11 2012-09-25 Microsoft Corporation Context-relevant images
US8672682B2 (en) 2006-09-28 2014-03-18 Howard A. Engelsen Conversion of alphabetic words into a plurality of independent spellings
KR20180023864A (en) * 2016-08-26 2018-03-07 스타십벤딩머신 주식회사 Apparatus and method for creating image contents
CN108780439A (en) * 2016-03-08 2018-11-09 威兹瑞德有限责任公司 For system and method abundant in content and for instructing reading and realizing understanding
CN110853116A (en) * 2019-10-30 2020-02-28 天津大学 Saliency-enhanced line drawing automatic generation method
US10657327B2 (en) 2017-08-01 2020-05-19 International Business Machines Corporation Dynamic homophone/synonym identification and replacement for natural language processing
CN111242114A (en) * 2020-01-08 2020-06-05 腾讯科技(深圳)有限公司 Character recognition method and device
CN111260965A (en) * 2020-01-17 2020-06-09 宇龙计算机通信科技(深圳)有限公司 Word stock generation method and related device
WO2023087051A1 (en) * 2021-11-16 2023-05-25 Christopher Colin Stephen Method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4007548A (en) * 1975-01-31 1977-02-15 Kathryn Frances Cytanovich Method of teaching reading
WO1981001478A1 (en) * 1979-11-16 1981-05-28 M Sakai Teaching aid:phoneti-peuter mobile
US4713008A (en) * 1986-09-09 1987-12-15 Stocker Elizabeth M Method and means for teaching a set of sound symbols through the unique device of phonetic phenomena
US5366377A (en) * 1993-07-30 1994-11-22 Miller Edward R Method of manufacturing reading materials to improve reading skills

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002050799A2 (en) * 2000-12-18 2002-06-27 Digispeech Marketing Ltd. Context-responsive spoken language instruction
WO2002050799A3 (en) * 2000-12-18 2003-01-23 Digispeech Marketing Ltd Context-responsive spoken language instruction
WO2004053725A1 (en) * 2002-12-10 2004-06-24 International Business Machines Corporation Multimodal speech-to-speech language translation and display
EP1727053A3 (en) * 2005-05-27 2007-09-05 Dybuster AG Method and system for spatial, appearance and acoustic coding of words and sentences
US7607918B2 (en) 2005-05-27 2009-10-27 Dybuster Ag Method and system for spatial, appearance and acoustic coding of words and sentences
US8672682B2 (en) 2006-09-28 2014-03-18 Howard A. Engelsen Conversion of alphabetic words into a plurality of independent spellings
US8275620B2 (en) 2009-06-11 2012-09-25 Microsoft Corporation Context-relevant images
CN108780439A (en) * 2016-03-08 2018-11-09 威兹瑞德有限责任公司 For system and method abundant in content and for instructing reading and realizing understanding
KR20180023864A (en) * 2016-08-26 2018-03-07 스타십벤딩머신 주식회사 Apparatus and method for creating image contents
KR102037179B1 (en) 2016-08-26 2019-10-28 스타십벤딩머신 주식회사 Apparatus and method for creating image contents
US10657327B2 (en) 2017-08-01 2020-05-19 International Business Machines Corporation Dynamic homophone/synonym identification and replacement for natural language processing
CN110853116A (en) * 2019-10-30 2020-02-28 天津大学 Saliency-enhanced line drawing automatic generation method
CN110853116B (en) * 2019-10-30 2023-08-29 天津大学 Automatic generation method of saliency-enhanced line drawing
CN111242114A (en) * 2020-01-08 2020-06-05 腾讯科技(深圳)有限公司 Character recognition method and device
CN111242114B (en) * 2020-01-08 2023-04-07 腾讯科技(深圳)有限公司 Character recognition method and device
CN111260965A (en) * 2020-01-17 2020-06-09 宇龙计算机通信科技(深圳)有限公司 Word stock generation method and related device
WO2023087051A1 (en) * 2021-11-16 2023-05-25 Christopher Colin Stephen Method and system

Also Published As

Publication number Publication date
AUPP960499A0 (en) 1999-04-29
GB2364160A (en) 2002-01-16
GB0124973D0 (en) 2001-12-05

Similar Documents

Publication Publication Date Title
Habash Introduction to Arabic natural language processing
Cook The English writing system
McGuinness Early reading instruction: What science really tells us about how to teach reading
US6292768B1 (en) Method for converting non-phonetic characters into surrogate words for inputting into a computer
Mattingly Linguistic awareness and orthographic form
Cook et al. An introduction to researching second language writing systems
Taouka et al. The cognitive processes involved in learning to read in Arabic
KR20070024498A (en) A method for teaching a language
Winer Orthographic standardization for Trinidad and Tobago: Linguistic and sociopolitical considerations in an English Creole community
Kessler et al. Writing systems: Their properties and implications for reading
WO2000060560A1 (en) Text processing and display methods and systems
Miller Children’s early understanding of writing and language: The impact of characters and alphabetic orthographies
RU2470354C2 (en) Method of studying system of writing chinese characters and based on chinese characters writing systems of other languages
KR20060111602A (en) Language phonetic system and method thereof
Yule The design of spelling to match needs and abilities
CN110716654B (en) Chinese character input method, voice synthesis method and Chinese character input system
AU780472B2 (en) Text processing and display methods and systems
Odinye Phonology of mandarin chinese: a comparison of Pinyin and IPA
Goodman The process of reading in non-alphabetic languages: An introduction
Perfetti et al. 11 Writing Systems and Global Literacy Development
AU2022228148B2 (en) Method and system
Leong Orthographic and psycholinguistic considerations in developing literacy in Chinese
KR20000053095A (en) Method for converting non-phonetic characters into surrogate words for inputting into a computer
US20160063886A1 (en) Color Reading and Language Teaching Method
KR20080049606A (en) Learning method of phonetics

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 35442/00

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 09937639

Country of ref document: US

ENP Entry into the national phase

Ref country code: GB

Ref document number: 200124973

Kind code of ref document: A

Format of ref document f/p: F

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWG Wipo information: grant in national office

Ref document number: 35442/00

Country of ref document: AU