MXPA99001290A - Text processor - Google Patents

Text processor

Info

Publication number
MXPA99001290A
Authority
MX
Mexico
Prior art keywords
text
presentation
rules
language
enhancing
Application number
MXPA/A/1999/001290A
Other languages
Spanish (es)
Inventor
C Walker Randall
Original Assignee
C Walker Randall
Application filed by C Walker Randall


Abstract

A text enhancement method and apparatus for the presentation of text for improved human reading. The method includes extracting text specific attributes from machine readable text and varying the text presentation in accordance with the attributes. The preferred embodiment includes extracting parts of speech and punctuation from a sentence, applying folding rules which use the parts of speech to determine folding points, using the folding points to divide the sentence into text segments, applying horizontal displacement rules (111) to determine horizontal displacement for the text segments and presenting the text segments (113) each on a new line having the determined horizontal displacement. Another embodiment displays text color based on parts of speech.

Description

TEXT PROCESSOR
FIELD OF THE INVENTION
The present invention relates to computer-aided presentation of text. More specifically, it relates to improving text presentation by using content-specific attributes of the text to enhance the reading experience.
BACKGROUND OF THE INVENTION
The purpose of reading is to make an impact on the mind of the reader. This is true whether the text being read is a novel or an aircraft heading display. Material presented in a non-textual medium conveys information well suited to human absorption, beyond what the corresponding one-dimensional text conveys. Still images present object attributes such as color, relative size, relative location, pattern, grouping, and hierarchy. An object is seen in context. For example, an image of a small object placed with others in a corner conveys different information than the same object enlarged, alone, and centered. Opening sentences and paragraphs likewise add contextual information through their placement alone. Moving images add motion and change over time as further attributes. Much information reaches the brain visually and through pattern recognition. Other information arrives as audio, conveyed through tone, changing inflection, and changing volume.
Computer presentation of text for human reading has failed to engage much of the brain's capacity. As a result, only part of the available bandwidth is used, and computer text presentation is regarded as unfriendly, monotonous, mechanical, and in some way deficient. Given the option, many people prefer to read a printed book rather than a book on a computer screen. Computer text presentation currently compares poorly with a book, and even more poorly with its own potential.
Some work has been done on computerized text presentation. Huanng (U.S. Patent No. 4,880,385) discloses an opto-mechanical speed-reading device that lets text printed on paper be viewed through a display window one line at a time, advancing the lines automatically. Advances have also been made in methods for computerized parsing of natural language into parts of speech. Schabes et al. (U.S. Patent No. 5,475,588) describe an improved parsing system for creating parse trees. Black, Jr. et al. (U.S. Patent No. 5,331,556) disclose a method for storing part-of-speech information in a file along with the original text for later enhanced text searching. Okamoto et al. (U.S. Patent No. 4,661,924) and Church (U.S. Patent No. 5,146,405) describe methods for disambiguating words that have multiple possible parts of speech. Zamora et al. (U.S. Patent No. 4,887,212), van Vliembergen (U.S. Patent No. 5,068,789), and Hemphill et al. (U.S. Patent No. 5,083,268) describe methods for grammatically parsing natural-language text. All of the above-cited patents are incorporated herein by reference.
Reading is a complex process, and many factors determine differences in reading performance between readers, and even for the same reader on different occasions. These include innate neurophysiological conditions such as dyslexia, but also age; behavioral and motivational factors; level of education and prior reading practice; and perceptual limitations. A reader may also have different reading objectives that affect how the reader approaches the material. Despite all of this, computers present text at worst as one-dimensional "beads on a string" and at best as two-dimensional text similar to that of a book. Indeed, whether text is presented in print or on an electronic display matters little, because the presentation is essentially identical. Examples of this uniform presentation are numerous. Topic sentences are presented the same as the other sentences in a paragraph. Technical words are presented no differently from non-technical words. Adjectives look the same as nouns. Educationally difficult terms are presented the same as simple terms. The last paragraph of a chapter is presented the same as the first.
Text is presented left-justified, in broken lines of constant width that force the eye to sweep from the far right back to the far left at regular intervals. Lines break mid-sentence, mid-phrase, and mid-idea, following old typographic rules. Such text forces the eye to travel back and forth over long distances, imitating a typewriter carriage. The text advances manually, broken into pieces determined by how many lines fit on the screen. There are good reasons for not printing a book with only one enhanced sentence per page, but those reasons may have been carried over inappropriately to the computer display of text. The possibility of using the power of digital manipulation to modify text presentation, and so to enhance the reader's ability to read, appears to have been underestimated by those most knowledgeable in computing.
SUMMARY OF THE INVENTION
The present invention is directed to a text enhancement method and apparatus for the presentation of text for improved human reading. The invention includes extracting specific attributes from machine-readable text and creating a three-dimensional visual product (the two-dimensional reading area plus time) to enhance the reading experience. The preferred embodiment extracts attributes such as the parts of speech of an incoming sentence and displays that sentence as text segments cascading down and across the screen. Segmentation and horizontal displacement are determined by applying rules that use the parts of speech, punctuation, and reader preferences. Text and background colors can also vary with the parts of speech, with the position of sentences within paragraphs, and with the position of paragraphs within chapters.
The invention adds significant visual attributes to computer-displayed text, improving on the uniform, mechanical text display of current systems. A strong visual cue is created that relates to the content of each new sentence and its relation to the previous sentence. This cue is available before the text is read, giving the text context. Sentence parsing uses the punctuation and content of the sentence to create a system of meaningful visual cues, including distinct phrases, that promote faster recognition of the words in a sentence and of their meanings. The enhanced sentence cascades down the page in a pattern of meaningful phrases determined by the content of the text and the reader's preferences, so that the eyes move only a short distance from phrase to phrase. A sentence represents a complete thought, and a paragraph represents a discrete theme or argument.
Reading comprehension improves when one sentence appears at a time, and when the transition from one paragraph to the next is visually indicated and includes a pause before the new text appears. The invention creates visual attributes suited to providing these cues. Visual attributes can include text segmentation, horizontal offset of one line relative to another, text and background color, text brightness, and animation. Animation can include blinking, bumping, progressive time-dependent illumination of text elements, and movement of text from a standard presentation into the cascading presentation.
In an exemplary reading session, the reader selects a text to read and informs the reading system of the type of text, for example a novel. The reading system retrieves the environment previously stored for that reader reading that type of text. The reader has the opportunity to edit the reading rules and word sets but declines. Starting at the beginning of the chapter where the reader left off, the text is presented one sentence at a time, cascading down and across the screen. Because the displayed lines are broken into meaningful text segments rather than ending arbitrarily at 80 characters, the reader can read and understand an entire segment at a time, with the eye moving from the center of one text segment down and across the screen to the next. The background color of each sentence is a function of the sentence's position within the paragraph and the paragraph's position within the chapter. All sentences in a paragraph may share the same hue, with small but discernible changes in color saturation or darkness from sentence to sentence, and small but discernible changes in hue from paragraph to paragraph.
The background color thus provides non-literal position information, signaling the beginning and end of a paragraph and the beginning and end of a chapter. Sentences are segmented according to rules approved by the reader. Each preposition splits the sentence in a predictable way, producing two text segments, each on its own line. The content of the line above affects the horizontal displacement of the line below. The same preposition splits and horizontally positions the text in the same way whenever possible, so it produces a similar pattern that becomes familiar to the reader over time. Parts of speech affect text color according to rules approved by the reader. As a result, both the structure of the sentence and its parts of speech are recognizable at once through pattern and color, even before a word of text is read, giving the reader visual, non-literal cues. Sentences advance at a rate set by a rule the reader selected earlier. The formula is a function of the type of text, the number of words, the grade level, and the number of syllables in the line. The advance rate is set slightly faster than the reader's comfortable rate, to increase comprehension. A longer pause between sentence presentations signals a new paragraph before it appears. The reader can easily interact with the reading system, holding difficult sentences on screen longer and making the presentation run faster or slower. The reading session described above is illustrative only; it neither limits nor exhausts the possible embodiments within the scope of the invention.
The enhancement process exploits the fact that text elements have non-literal qualities that can be used to produce a text presentation, in time and space, that is more meaningful and that increases the reader's ability to understand the literal meaning of the text beyond what existing computer or paper formats allow. Phrases cascading down and across the screen provide a visual corollary of how the text would be experienced if read aloud, without the reader having to subvocalize it. Reading text enhanced in this way is almost like hearing it with the eyes, but as a complete visual package experienced almost simultaneously.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a flowchart of a method used in one embodiment of the invention;
Figure 2 is a table illustrating various attributes of the elements in a sentence;
Figure 3 contains an example of enhanced text that includes cascading and right descent angles;
Figure 4 presents the sentence of Figure 3 as enhanced text that includes cascading and both right and left descent angles;
Figure 5 illustrates step 110 in detail;
Figure 6 illustrates word sets that establish a hierarchy of secondary split points;
Figure 7 illustrates the locations of split points in a sentence;
Figure 8 illustrates step 136 in detail;
Figure 9 illustrates the sentence of Figure 7 presented as enhanced text;
Figure 10 illustrates a Historical Sampling Map of Presentation Intervals, in which the time needed to read is a function of difficulty and complexity; and
Figure 11 illustrates the sentence of Figure 7 showing visual pronunciation of emphasized syllables.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Figure 1 shows a high-level flow diagram of a method used in one embodiment of the invention.
The steps in Figure 1 and in all of the figures can be divided into additional steps or combined into fewer steps and still describe the same embodiment.
In step 102, the reading device is started and is given the identification (ID) of the reader and of the text file to be read. In a preferred embodiment, the reader's identification is obtained from the operating system. In another embodiment, the reader is prompted for identification. In yet another embodiment, a single reader of the device is assumed. Given the reader's identification, the reader-specific parameters are retrieved in step 104. The parameters are discussed in detail below. Given the identification of the text, the text file and the text-specific parameters are retrieved in step 104. In a preferred embodiment, the parameters have default values and do not require reader input. In a more preferred embodiment, the reader is allowed to enter reader-specific parameters, the values are accepted, and the parameter values are stored, as in step 108. In one embodiment, the user can open pull-down menus containing dialog boxes, view default parameters, and modify them using standard user interface controls, including numeric entry, text entry, sliders, and buttons. In a preferred embodiment, given appropriate access permission, modification of global parameters that apply to all readers is also allowed. The reader can set parameters for the current text only, for all texts of the same type as the current text, or for all texts read by this reader.
Reader-specific parameters include: display field dimensions; color palette for background and text; word set entries for rules; minimum text segment length; maximum line length; minimum and maximum text segment weight; descent angles; horizontal displacement rule entries; horizontal justification preferences; inter-phrase gap length; advance speed; ratios of inter-sentence to inter-paragraph intervals; tagging definitions; animation specifications; parameter definitions for deriving content-dependent presentation intervals; parameter definitions and weights for deriving sentence and text topic, weight, density, complexity, and content; and special event script identities.
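The scoping described for step 108 can be sketched as a layered lookup. Parameters may apply to the current text, to all texts of that type, or to all texts read by this reader, on top of global parameters. This minimal Python sketch assumes the most specific scope that defines a parameter wins. The function name, the parameter keys, and the descent-angle and gap-length defaults are hypothetical. The ten- and thirty-five-character lengths follow the embodiment described under Minimum Text Segment Length and Maximum Line Length.

```python
from collections import ChainMap

# Hypothetical global defaults. The 10/35 character lengths follow the embodiment
# described in the text; the descent angle and gap length values are assumptions.
GLOBAL_DEFAULTS = {
    "min_segment_chars": 10,
    "max_line_chars": 35,
    "descent_angle": 2,
    "gap_length": 2,
}


def resolve_parameters(text_params, text_type_params, reader_params,
                       global_params=GLOBAL_DEFAULTS):
    """Return the effective parameters, taking each value from the most specific
    scope that defines it: current text, then text type, then all texts read by
    this reader, then global defaults."""
    # ChainMap searches its mappings left to right; dict() takes a snapshot so
    # later edits to the result do not write back into any scope.
    return dict(ChainMap(text_params, text_type_params, reader_params, global_params))
```

For example, `resolve_parameters({"gap_length": 0}, {"descent_angle": 3}, {})` keeps the zero gap set for the current text and the descent angle set for its text type, and takes everything else from the global defaults.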
Dimensions of the Display Field
The display field dimensions can specify the size of the window in which text is displayed in the two dimensions X and Y, plus a third dimension, time. The field may be the full screen or a smaller portion of it. The time dimension can include the time over which time-dependent animations such as blinking are presented.
Color Palette
The color palette for background and text allows specification of preferred background and text colors that vary with the text. In a preferred embodiment, the hue and brightness of the background color vary with both the position of the sentence within a paragraph and the position of the paragraph within a chapter. In a more preferred embodiment, the brightness is set to a first level for the first sentence in a paragraph and to a second level for the last sentence in the paragraph, with the brightness of each intermediate sentence progressing from the first level to the second. In a preferred embodiment, the background hue is set to a first value for the first paragraph in a chapter and to a second value for the last paragraph, with the hue of each intermediate paragraph progressing from the first value to the second. In a more preferred embodiment, the first hue is green and the second hue is violet. In preferred embodiments, text color may vary with the categorical and continuous attributes of a word. A categorical attribute is a function of the category into which a word fits. In a more preferred embodiment, the categories include parts of speech, for example verb, and technical words. Continuous attributes are those that can be measured, such as grade level (for example, second year of secondary school), number of syllables, and word length. In a preferred embodiment, different parts of speech have different text colors. In a more preferred embodiment, text color varies with background color to provide the contrast the reader prefers. Word sets can be used to specify some parts of speech. For example, "I" is a pronoun, "have" is a verb, "an" is an article, and "weight" could be a noun, verb, or adjective. The parts of speech need only be probable parts of speech, as with "weight".
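The background progression described above can be sketched as two linear interpolations. Hue runs from green (first paragraph of a chapter) to violet (last paragraph). Brightness runs from a first level (first sentence of a paragraph) to a second level (last sentence). This minimal sketch uses Python's standard `colorsys` module. Several details are assumptions: the 120 and 270 degree hue values, the brightness levels, the fixed saturation, and the linear progression itself. The text says only that the values progress from the first setting to the second.

```python
import colorsys

GREEN_HUE = 120 / 360   # assumed hue for the first paragraph of a chapter
VIOLET_HUE = 270 / 360  # assumed hue for the last paragraph of a chapter


def lerp(start, end, index, count):
    """Linearly interpolate from start (index 0) to end (index count - 1)."""
    if count <= 1:
        return start
    return start + (end - start) * index / (count - 1)


def background_rgb(par_index, par_count, sent_index, sent_count,
                   first_brightness=0.95, last_brightness=0.75, saturation=0.25):
    """Background color for one sentence as an (R, G, B) tuple of 0-255 ints.

    Hue follows the paragraph's position in the chapter; brightness (HSV value)
    follows the sentence's position in the paragraph. Default levels are
    hypothetical reader preferences."""
    hue = lerp(GREEN_HUE, VIOLET_HUE, par_index, par_count)
    value = lerp(first_brightness, last_brightness, sent_index, sent_count)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return tuple(round(c * 255) for c in (r, g, b))
```

The first sentence of the first paragraph comes out as a pale green. The last sentence of the last paragraph is a darker violet, with blue as its strongest channel.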
An example of the parts of speech in a single sentence is shown in Figure 2. Column 150 shows the multiple possible parts of speech, and column 152 shows the parts of speech after disambiguation. In the preferred embodiments, word sets are entered by two methods. The first method uses a structured text format such as the Standard Generalized Markup Language (SGML), allowing large word sets and dictionaries to be imported from and exported to various sources. Standard SGML parsers are then used to store the word sets in a convenient database. The second method allows interactive editing of word sets through a hierarchy of pull-down menus and dialog boxes. In a preferred embodiment, reader-specified word sets are stored in a database separate from the dictionary or glossary word sets, allowing separate, faster storage, editing, and retrieval.
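The following sketch looks up a word's probable part of speech and maps it to a text color. It checks the reader's word sets before the dictionary word sets, reflecting their separate storage. It is a minimal sketch. The dictionary contents, the color values, and the function names are hypothetical. Taking the first listed part of speech is a stand-in for real disambiguation, such as the methods cited in the background.

```python
# Hypothetical color table: the text says parts of speech get different text
# colors but does not specify which colors.
POS_COLORS = {"noun": "#1f3a93", "verb": "#a31515", "adjective": "#2e7d32",
              "pronoun": "#6a1b9a", "article": "#555555"}
DEFAULT_COLOR = "#000000"

# Hypothetical dictionary word set: every possible part of speech, most probable first.
DICTIONARY = {"i": ["pronoun"], "have": ["verb"], "an": ["article"],
              "weight": ["noun", "verb"]}


def part_of_speech(word, reader_sets, dictionary=DICTIONARY):
    """Return the probable part of speech, consulting reader word sets first.

    reader_sets maps a part of speech to a set of lowercase words."""
    key = word.lower()
    for pos, words in reader_sets.items():
        if key in words:
            return pos
    candidates = dictionary.get(key)
    return candidates[0] if candidates else None


def text_color(word, reader_sets):
    """Text color for a word; unknown words get the default color."""
    return POS_COLORS.get(part_of_speech(word, reader_sets), DEFAULT_COLOR)
```

A reader word set such as `{"verb": {"weight"}}` overrides the dictionary's first choice for "weight" without touching the dictionary itself. This is the practical benefit of keeping the two stores separate.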
Minimum Text Segment Length and Maximum Line Length
A preferred embodiment includes a reader-specified minimum text segment length. A text segment is a piece of text presented on one line of a display. The minimum text segment length, which can be measured in words, is the smallest amount of text the reader wants to see on a separate line. The maximum line length is the greatest length the reader wants to see on a line. Lengths can be measured in characters, words, syllables, grade level, or any sum of products of these. In one embodiment, the minimum line length is ten characters and the maximum line length is thirty-five characters.
Phrase Weight
Phrase weight is an attribute derived from a phrase (a text segment or potential text segment) that gives some measure of the amount of material in the phrase. In one embodiment, phrase weight is simply the number of words in the phrase. In the preferred embodiment, phrase weight includes phrase density and phrase complexity. Phrase density can include the number of technical words or the number of words above a certain grade level. Phrase complexity can include the number of spelling similarities between words in the phrase, the number of ambiguous words, and the total weight of words whose weights are specified by the reader.
Minimum and Maximum Text Segment Weight
A preferred embodiment includes reader-specified minimum and maximum text segment weights, which bound the amount of text the reader wants to see on a separate line. In the preferred embodiments, the weight of a text segment is the sum of the weights of the phrases within it.
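Phrase weight and segment weight can be sketched as follows. The sketch assumes a phrase's weight is its word count plus one point per technical word (density) and one point per ambiguous word (complexity), plus any reader-assigned word weights. A segment's weight is the sum of its phrases' weights, as stated above. The additive combination, the function names, and the keyword arguments are assumptions. Spelling similarity between words, which the text also lists under complexity, is omitted for brevity.

```python
def phrase_weight(words, technical_words=frozenset(), ambiguous_words=frozenset(),
                  reader_weights=None):
    """Hypothetical phrase weight: word count, plus a density term (technical
    words) and complexity terms (ambiguous words and reader-weighted words)."""
    reader_weights = reader_weights or {}
    lowered = [w.lower() for w in words]
    density = sum(1 for w in lowered if w in technical_words)
    complexity = sum(1 for w in lowered if w in ambiguous_words)
    complexity += sum(reader_weights.get(w, 0) for w in lowered)
    return len(lowered) + density + complexity


def segment_weight(phrases, **weight_options):
    """A text segment's weight is the sum of the weights of its phrases."""
    return sum(phrase_weight(p, **weight_options) for p in phrases)


def within_weight_bounds(phrases, min_weight, max_weight, **weight_options):
    """True if the segment falls within the reader's minimum and maximum weight."""
    return min_weight <= segment_weight(phrases, **weight_options) <= max_weight
```

With no word sets supplied, the weight reduces to the simplest embodiment described above, the number of words.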
Horizontal Justification Rules
Horizontal justification rules specify the horizontal justification of a line of text relative to the line above. They can specify the type of justification for the line or phrase being placed: right, left, or center. They can also specify the portion of the line above against which justification is measured (the entire line of text or a single phrase) and the point within that portion used as the reference (for example, its leftmost point, its rightmost point, or its center). In one embodiment, horizontal justification is simply measured within the line being placed rather than relative to the line above. In a preferred embodiment, the first phrase in a line is center-justified, measured from the center of the last phrase in the line immediately above. In another embodiment, the entire line of text is centered below the center of the line above. In yet another embodiment, the "center of gravity" of the text segment, calculated using the difficulty of each word, is used as the center of the segment for justification purposes.
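One way to compute the difficulty-weighted "center of gravity" of a text segment is to take the average of each word's midpoint column, weighted by that word's difficulty. This is a minimal sketch under that assumption. The `difficulty` callable is a placeholder for whatever measure the reader's parameters define (syllables, grade level, and so on). Words are assumed to be separated by single spaces.

```python
def center_of_gravity(words, difficulty):
    """Difficulty-weighted center of a text segment, in character columns from
    the segment's left edge, assuming words are separated by single spaces."""
    if not words:
        return 0.0
    position = 0
    weighted_sum = 0.0
    total_weight = 0.0
    for word in words:
        midpoint = position + len(word) / 2
        weight = difficulty(word)
        weighted_sum += midpoint * weight
        total_weight += weight
        position += len(word) + 1  # advance past the word and one space
    if total_weight == 0:
        return (position - 1) / 2  # no difficulty information: geometric center
    return weighted_sum / total_weight
```

Using word length as a crude difficulty, `center_of_gravity(["photosynthesis", "is", "fun"], len)` lands left of the segment's geometric center at column 10.5. This pulls the justification point toward the hardest word.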
Descent Angle
A descent angle is the amount of horizontal displacement applied to each new line, in addition to the horizontal position required by the horizontal justification rules alone. By definition, each text segment is presented on a new line. In a preferred embodiment, the descent angle is specified in character units. The descent angle and the horizontal justification together at least partially determine the "cascade" of text that descends and traverses the screen in the preferred embodiments. An example of cascaded text is illustrated in Figures 3 and 4. A descent angle can be zero, in which case the horizontal position of the text segment is determined solely by the horizontal justification rules. A descent angle can be left, where the lower line is offset to the left relative to the upper line, or right, where it is offset to the right. Figure 3 illustrates only right descent angles. In one embodiment, the descent angle is constant for each new line. In a preferred embodiment, the descent angle is a function of the text segment weight of the line above. In another preferred embodiment, the horizontal justification rules call for center justification below the center of each line immediately above. In that embodiment, the descent angle is calculated so that, when all lines of text are presented, the path from line center to line center is substantially straight, running from upper left to lower right across the display surface. In a preferred embodiment, the inputs to the descent angle rules include attributes of the text on the line above. In a preferred embodiment, these inputs include the reason the line above was split, that is, a primary split point, a secondary split point, or the folding rule. In a preferred embodiment, a more positive descent angle is called for when the line immediately above was split at a primary split point rather than a secondary split point.
In another preferred embodiment, the inputs include the text segment weights of the current line and the line above. It is recognized that the horizontal justification rule may call for left justification, with horizontal displacement measured from the left margin, which, combined with a zero descent angle, yields left-justified text on every line. It is also recognized that horizontal text placement can be carried out in numerous ways equivalent to the example above. In particular, text positions can be calculated by justifying first and then shifting, or by shifting first and then justifying, with equivalent results.
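The preferred placement described above can be sketched in two steps. The first phrase of each new line is centered under the center of the last phrase on the line above. The line is then shifted right by a descent angle that depends on why the line above was split. This is a minimal sketch that justifies first and then shifts. The `Line` structure, the field names, and the angle values are hypothetical. The text fixes only the ordering: a primary split point gets a more positive angle than a secondary one.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical descent angles in character columns, keyed by why the line above
# was split. The text says only that a primary split point calls for a more
# positive angle than a secondary one; these numbers are assumptions.
DESCENT_BY_SPLIT = {"primary": 4, "secondary": 2, "folding": 1}


@dataclass
class Line:
    phrases: List[str]
    left: float        # column of the line's first character
    split_reason: str  # why this line was ended: "primary", "secondary", or "folding"


def place_next(above: Line, phrases: List[str], split_reason: str) -> Line:
    """Center the new line's first phrase under the center of the last phrase of
    the line above, then shift right by the descent angle for the line above."""
    last = above.phrases[-1]
    # Phrases on one line are assumed to be joined by single spaces.
    last_start = above.left + len(" ".join(above.phrases)) - len(last)
    anchor = last_start + len(last) / 2 + DESCENT_BY_SPLIT[above.split_reason]
    left = max(0.0, anchor - len(phrases[0]) / 2)
    return Line(phrases, left, split_reason)
```

Calling `place_next` repeatedly, each time on the line it just returned, produces a rightward cascade. Setting every angle to zero and justifying from the left margin instead would give the left-justified special case noted above.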
Gap Length
In one embodiment, gaps are associated with split points whose locations have been determined but which, because of other rules, remain on the same line and do not cause a split. A gap of zero or more spaces is added after a split point where the split point has not caused creation of a new line. In a preferred embodiment, the gap length is a reader-specified parameter, and a gap length of zero results in no gaps being created. Gaps give a visual indication of the existence of phrases even where the phrases have not caused formation of a new line.
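The minimum text segment length, maximum line length, and gap length can interact as follows. Each split point normally starts a new line. If the current line is still shorter than the minimum and the next phrase fits, the phrase stays on the line, and the unused split point is marked by a gap. This is a greedy sketch, assuming lengths measured in characters and phrases already divided at split points. It does not wrap a single phrase longer than the maximum line length. The function and parameter names are hypothetical.

```python
def build_lines(phrases, min_segment=10, max_line=35, gap_length=2):
    """Greedily pack phrases (already divided at split points) into display lines.

    A split point normally starts a new line. If the current line is still
    shorter than min_segment and the next phrase fits within max_line, the
    phrase stays on the same line, and the unused split point is marked by
    gap_length extra spaces after the ordinary word space."""
    separator = " " * (1 + gap_length)
    lines = []
    current = ""
    for phrase in phrases:
        if not current:
            current = phrase
        elif (len(current) < min_segment
              and len(current) + len(separator) + len(phrase) <= max_line):
            current += separator + phrase
        else:
            lines.append(current)
            current = phrase
    if current:
        lines.append(current)
    return lines
```

With the defaults, `build_lines(["In the", "beginning", "of the chapter"])` keeps the short "In the" on the same line as "beginning", separated by a widened gap, and starts "of the chapter" on a new line. With `gap_length=0`, the same call joins them with an ordinary single space, matching the zero-gap case described above.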
Advance Speeds
Advance speeds specify the display durations and the time intervals between presentation of one grouping of text and the next. In a preferred embodiment, one sentence is presented on the screen at a time. In a preferred embodiment, the display duration may be a function of the number of appropriately weighted words in the text, the grade level, pronunciation time, number of phrases, number of syllables, or sentence weight. Time intervals can include the intervals between sentences and between paragraphs. In a preferred embodiment, the interval between sentences differs from the interval between paragraphs. In this way, the constant rate at which current systems deliver text can be replaced by a rate that depends on where the text lies and what it contains.
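A content-dependent display duration, with a longer interval before a new paragraph, might be sketched as follows. The sketch uses only two of the inputs listed above, word count and syllable count. A `speedup` factor below 1 keeps the presentation slightly faster than the reader's comfortable rate, as in the exemplary session. All coefficients, the vowel-group syllable heuristic, and the function names are assumptions, not values from the text.

```python
import re


def count_syllables(word):
    """Crude syllable estimate: count groups of consecutive vowels (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def display_seconds(sentence, per_word=0.15, per_syllable=0.1, speedup=0.9):
    """Content-dependent display duration for one sentence, in seconds.

    speedup < 1 makes the presentation slightly faster than the reader's
    comfortable rate. All coefficients are hypothetical reader parameters."""
    words = re.findall(r"[A-Za-z']+", sentence)
    syllables = sum(count_syllables(w) for w in words)
    return (per_word * len(words) + per_syllable * syllables) * speedup


def pause_after(ends_paragraph, sentence_pause=0.5, paragraph_ratio=3.0):
    """Interval before the next sentence; longer when a new paragraph follows."""
    return sentence_pause * (paragraph_ratio if ends_paragraph else 1.0)
```

Because the coefficients are parameters, the same functions could be fitted to a reader's measured speed during reader-controlled presentation and then reused to set intervals automatically.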
Stations
Stations are places on the display surface where text is displayed. Stations can be designated windows within which text is presented, or points at which presentation of text begins. A preferred embodiment includes an active text station, where the text being read is presented. Another preferred embodiment includes a pre-reading station that displays text about to be read and a post-reading station that displays text already read. In one embodiment, the total text presented in all stations on the display surface is governed by the expected reading or pronunciation time. In one embodiment, only about one minute of material is on the screen at a time.
Animation
Animation is the time-dependent presentation of text. Examples include blinking, dissolving, and striking, that is, illuminating successive portions of text at a specified speed. In a preferred embodiment, a modified scrolling effect is produced in three stations. Text to be read is stacked in flat rows in a pre-reading station at the top of the screen. Previously read rows are stacked in a post-reading station at the bottom of the screen. The active sentence cascades in an active station across the middle of the screen. In one embodiment, the text color and/or background color of pre-read or post-read text differs from that of the text currently being read.
Tagging
Tagging includes designating whether a word will be tagged with a definition or a drawing related to the word. In one preferred embodiment, a word set specifies which words will be tagged with a definition. When a reader selects a tagged word, a definition, drawing, or other electronic illustration of the word can be displayed. In a preferred embodiment, a tagged word, when selected, displays a pop-up window containing the definition or drawing. Preferred selection methods include using a mouse and its right button.
Reading Event
A reading event includes the appearance and disappearance of a sentence and any additional time before the next sentence appears. A reading event contains a series of reading moments. A reading moment is the presentation of a frozen picture of the text to the eyes of a reader trying to comprehend it. For many readers, the reading moment will find the eye centered on a phrase, the phrase alone on a line, and the reader comprehending that single phrase.
Reading Speed
The presentation interval is the duration for which a sentence is presented on the screen. It can be controlled by the reader or determined automatically. In one embodiment, reader control is via mouse clicks. In another embodiment, reader control may be via detection of eye movement or of brain wave changes. Brain wave detection operates by generating a request for a new sentence when the brain waves corresponding to reading activity change to the brain waves associated with having finished reading the sentence. Reading speed is measured in text content units per unit of time. Text content units can be measured in units of length, such as number of words, syllables, pronunciation time, or any other previously discussed measure of phrase length or weight, or any sum of products thereof. Text content units can also be measured in units of complexity, such as phrase complexity or weight as discussed above, or any sum of products of these. In a preferred embodiment, text content units are calculated for each presented sentence. They are used to determine the reader's actual reading speed, in text content units per unit time, during reader-controlled presentation intervals. They are also used to set the presentation interval when presentation is controlled automatically. In this way, the presentation interval can be content-dependent and can be made to track the reader's probable preferred presentation speed.
Magic Reading Glass
In one embodiment, presentation can begin with ordinary, unformatted text. When the reader wishes to see text presented as enhanced text, the reader designates a portion of the text for enhancement by placing an icon, such as a "magic reading glass" icon, over part of a sentence. The next sentence and the sentences after it are presented as enhanced text. Enhanced presentation continues until the reader turns off the magic reading glass. The magic reading glass can thus serve as an alternative entry point into the enhanced text processor.
Special Events
Special events are time-dependent events that occur during a reading event. They include a visual or audible cue signaling the time remaining. For example, when 75 percent of the content-dependent presentation interval has passed, a visual cue appears on the screen. Special events can be handled through a special event script. Complex special events comprise a series of special events.
Visual Pronunciation
Visual pronunciation involves staggering in time the changes in color or brightness of individual phrases, words, and syllables, though not necessarily all of these. One embodiment uses highlighting to emphasize longer, more difficult words that take more time to pronounce. Visual pronunciation is an example of a complex special event.
Pull-Down to Cascade
In one embodiment, text is displayed in a pre-reading station for a percentage of the content-dependent presentation interval, and then each text segment or line is presented in the cascade in turn. Pull-down to cascade is an example of a complex special event.
Non-Linear Text
Linear text is text presented and read from beginning to end. A book designed to be read from beginning to end is a common example. Non-linear text involves presenting and reading text in an order other than linear. An example of non-linear text is hypertext, in which certain words are presented in a way that indicates a link, for example in angle brackets or in blue. Non-linear presentation can be either reader-directed or automatic. One example of automatic presentation is depth-first presentation of hypertext, following the first hypertext link and ignoring cycles. Another example is breadth-first presentation. It presents the first level of text, then every hypertext link from the main level, then every hypertext link from the next level, continuing until the lowest depth is reached or the reader intervenes. Yet another example is pruning either the depth-first or the breadth-first presentation to include only portions that contain certain keywords of interest.
An Example of Method
In Figure 1, step 110, the text is pre-processed.
Step 110 is shown in detail in Figure 5. The text is parsed to identify paragraphs, sentences, words, and punctuation. Paragraphs can be identified by blank lines, paragraph markers, indentation characters, tab characters, or any other convenient feature of the text. Sentences can be identified using grammar rules based on periods, spaces, capitalization of first words, and the presence or absence of abbreviations. In a preferred embodiment for well-behaved text, a period, question mark, or exclamation point, either alone or followed by a period, followed by two spaces or the end of a paragraph, marks the end of a sentence. In step 124, each sentence is tokenized into words and punctuation. Emphasis specified by the original author, for example italics or underlining, is preserved in preferred embodiments. One embodiment uses a standard lexical analyzer such as Lex (TRADEMARK), in which the grammar rules denote the end of a word by a blank or punctuation. Another embodiment uses a hand-written lexical analyzer. One embodiment stores formatting characters such as tabs and indentation as punctuation. The location of each word is preferably stored as an attribute of the word, to provide links to, and searching within, the original work. A preferred embodiment also allows groups of words to be "stapled" together so that they are recognized as a single unit. In one embodiment, these word groups are recognized by the lexical analyzer. In another embodiment, they are recognized by a pre-processor that runs before the lexical analyzer, ensuring that they are recognized as a phrase rather than as individual words. For example, the stapled words "Prince of Wales" would be recognized as a single phrase and preferably would not be split at the preposition into two phrases displayed on two lines. In step 126, the words are looked up in dictionaries, glossaries, and tables to determine their attributes. The text is further processed to determine categorical and continuous attributes.
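A minimal Python sketch of sentence identification and the tokenizing of step 124, assuming plain-text input. Sentences end at ".", "?" or "!" followed by two or more spaces or by the end of the paragraph (the optional trailing mark mentioned above is not handled), and a hypothetical `STAPLED` list holds word groups such as "Prince of Wales" that are merged back into single tokens. The names `split_sentences`, `tokenize`, and `STAPLED` are illustrative, not taken from the patent.

```python
from __future__ import annotations

import re

# Sentence end for well-behaved text: '.', '?' or '!' followed by
# two or more spaces, or by the end of the paragraph.
SENTENCE_END = re.compile(r"(?<=[.?!])(?:  +|$)")

# Words (with an optional apostrophe part) or single punctuation marks.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

# Hypothetical list of "stapled" word groups that must stay together.
STAPLED = ["Prince of Wales"]


def split_sentences(paragraph: str) -> list[str]:
    """Split a paragraph into sentences using the two-space rule."""
    return [s.strip() for s in SENTENCE_END.split(paragraph) if s.strip()]


def tokenize(sentence: str, stapled: list[str] = STAPLED) -> list[str]:
    """Tokenize into words and punctuation, merging stapled phrases."""
    tokens = TOKEN.findall(sentence)
    phrases = [p.split() for p in stapled]
    out, i = [], 0
    while i < len(tokens):
        for words in phrases:
            if tokens[i:i + len(words)] == words:
                out.append(" ".join(words))  # keep the group as one token
                i += len(words)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

For example, `split_sentences("Mr. Smith came.  He left.")` keeps "Mr. Smith came." together because the abbreviation is followed by a single space, which is the reason the two-space rule works on well-behaved text.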
In a preferred embodiment, the important categorical attributes include parts of speech, and the important continuous attributes include word location, education level, pronunciation time, number of syllables, sound, and level of vocal emphasis. Identifying parts of speech with 100 percent accuracy would require extensive programming to determine the context of each word in the text. Such precision is not required to practice the invention, since errors matter less when the reader is a human rather than a machine. The possible parts of speech are determined by first looking the word up in a dictionary or glossary. This dictionary or glossary needs to contain only the probable parts of speech for each word, not its definition. For example, the word "weight" could be a noun, verb, or adjective. A preferred embodiment stores the part-of-speech attribute as a bitmap, preserving multiple possible parts of speech. One embodiment explicitly stores an ambiguity attribute, which indicates whether the word still has multiple possible parts of speech. Another embodiment treats the existence of more than one possible part of speech as the indication of ambiguity. In a preferred embodiment, parts of speech come from a default dictionary that can be consulted. In a more preferred embodiment, word sets can be added to override or supplement the default set. In another embodiment, technical words are specified by word sets entered by the user. Figure 6 illustrates nine word sets, arranged in a hierarchy, that specify parts of speech. These word sets and their hierarchy are also used as inputs to the splitting rules described later. A preferred embodiment checks the word sets illustrated in Figure 6 in order, starting with Class 1, Subclass 1, and ending with Class 3, Subclass 1. The search ends as soon as the word or phrase is found.
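One way to realize the bitmap attribute, sketched in Python with the standard `enum.IntFlag`: each possible part of speech occupies one bit, and ambiguity is derived, as in the second embodiment above, from whether more than one bit is set. The bit assignments, the `GLOSSARY` contents, and the function names are illustrative assumptions.

```python
from enum import IntFlag


class POS(IntFlag):
    """Part-of-speech bitmap: one bit per possible part of speech."""
    NOUN = 1
    VERB = 2
    ADJECTIVE = 4
    ADVERB = 8
    PREPOSITION = 16
    ARTICLE = 32
    PRONOUN = 64
    AUXILIARY = 128


# Hypothetical glossary entries: probable parts of speech only, no definitions.
GLOSSARY = {
    "weight": POS.NOUN | POS.VERB | POS.ADJECTIVE,
    "a": POS.ARTICLE,
    "of": POS.PREPOSITION,
}


def is_ambiguous(tags: POS) -> bool:
    """Derive ambiguity from the bitmap: more than one bit is set."""
    return bin(int(tags)).count("1") > 1


def lookup_pos(word: str) -> POS:
    """Return the stored bitmap, or an empty bitmap for unknown words."""
    return GLOSSARY.get(word.lower(), POS(0))
```

Testing a single part of speech is then a bitwise AND, for example `lookup_pos("weight") & POS.VERB`.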
Given the limited vocabulary and static nature of the word sets in Figure 6, a more preferred embodiment uses a faster hand-written parser to look up words and phrases in those sets. Preferred embodiments include parsers written in C, C++, Perl, compiled AWK, AWK, AWK translated to C, C with regular expression functions, or any other convenient language. One parser embodiment uses YACC. In one embodiment, the dictionary is a commercially available electronic dictionary, for example on compact disc read-only memory (CD-ROM). The standard dictionary is parsed to determine word attributes such as parts of speech and number of syllables. Since word definitions are unnecessary in many embodiments, a large number of words can be stored together with their syllable counts and parts of speech. In a more preferred embodiment, the most commonly used and most recently used words are stored in fast-access memory such as random access memory (RAM). In embodiments where the dictionaries must be built by hand, a fast method using hashing with collision detection and buckets is preferred. In embodiments where the word sets are fixed before reading, perfect hashing without buckets is preferred. In yet another embodiment, the level of pronunciation emphasis is derived as an attribute that depends in part on the part of speech. In a more preferred embodiment, pronunciation emphasis is categorized as primary, secondary, or none. In one embodiment, the pronunciation time and the actual sound, for example as found in a sound file, are also retrieved from the dictionary or glossary and stored as word attributes. In step 128, the text-specific and reader-specific word sets are searched. In a preferred embodiment, the reader-specified word sets are stored in a database separate from the dictionary or glossary word sets, allowing separate storage and faster retrieval.
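A small Python sketch of a hierarchical word lookup with an in-memory cache of recently used words. It assumes the order reader-specified set, then text-specific set, then default dictionary; the set contents and the name `lookup` are hypothetical. Python dictionaries are hash tables with collision handling, not perfect hashing; a fixed word set could instead use a perfect hash generated ahead of time, for example with GNU gperf.

```python
from functools import lru_cache

# Hypothetical word sets, searched in priority order: reader-specified,
# then text-specific, then the default dictionary. Values are probable
# parts of speech; real entries could also carry syllables, emphasis, etc.
READER_SET = {"kernel": ("noun",)}
TEXT_SET = {"ngong": ("noun",)}
DEFAULT_DICTIONARY = {
    "kernel": ("noun", "adjective"),
    "farm": ("noun", "verb"),
    "had": ("verb",),
}
HIERARCHY = (READER_SET, TEXT_SET, DEFAULT_DICTIONARY)


@lru_cache(maxsize=4096)  # most recently used words stay in RAM
def lookup(word: str):
    """Return the first entry found, searching the hierarchy in order."""
    key = word.lower()
    for word_set in HIERARCHY:  # each dict is a hash table
        if key in word_set:
            return word_set[key]
    return None  # unknown word: parts of speech stay undetermined
```

Because results are cached, `lookup.cache_clear()` would need to be called whenever a reader edits one of the word sets.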
In one embodiment, the reader-specified word sets are checked before the dictionaries, and the dictionaries are checked only if the necessary words and attributes are not found in the reader-specified word sets. Preferred embodiments look words up through a hierarchy of databases. In step 130, ambiguity among multiple parts of speech is removed. In one embodiment, a micro-grammar is used to determine the probable parts of speech. A micro-grammar uses adjacent or nearby words to determine more precisely the most probable part of speech of a word. For example, the word "weight" in the phrase "a weight of" is probably a noun, since it is preceded by an article and followed by a preposition. As another example, if a word could be a noun or a verb and is preceded by a pronoun or an auxiliary verb, the word is probably a verb; "weight" preceded by "I" would probably be a verb. In another embodiment, all ambiguity is removed by simply choosing the statistically most probable use of the word. In yet another embodiment, ambiguity is not removed automatically but only manually, through human editing. In a preferred embodiment, an ambiguity attribute is stored for each word, indicating whether multiple possible parts of speech remain after disambiguation. In yet another embodiment, the ambiguity attribute is not stored but is derived from the existence of multiple possible parts of speech stored for a word. In one embodiment, ambiguity is shown visually by displaying the text in stripes or alternating colors, one for each possible part of speech. For example, if verbs are orange and adjectives are yellow, a word that could be a verb or an adjective could be displayed with stripes or with characters alternating between yellow and orange.
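The two micro-grammar examples above can be expressed as neighbor-based rules. This Python sketch encodes only those two rules over a hypothetical `CANDIDATES` table; a real micro-grammar would contain many more rules, and the function names are illustrative.

```python
from __future__ import annotations

# Hypothetical candidate parts of speech for a few words.
CANDIDATES = {
    "weight": {"noun", "verb", "adjective"},
    "a": {"article"},
    "the": {"article"},
    "of": {"preposition"},
    "i": {"pronoun"},
    "will": {"auxiliary"},
}


def candidates(words: list[str], i: int) -> set[str]:
    """Candidate parts of speech of words[i], or an empty set out of range."""
    if 0 <= i < len(words):
        return CANDIDATES.get(words[i].lower(), set())
    return set()


def disambiguate(words: list[str], i: int) -> set[str]:
    """Apply the two micro-grammar rules from the text to words[i]."""
    cands = candidates(words, i)
    if len(cands) <= 1:
        return cands
    prev, nxt = candidates(words, i - 1), candidates(words, i + 1)
    # Rule 1: article + X + preposition -> X is a noun ("a weight of").
    if "noun" in cands and "article" in prev and "preposition" in nxt:
        return {"noun"}
    # Rule 2: pronoun or auxiliary + X -> X is a verb ("I weight").
    if "verb" in cands and prev & {"pronoun", "auxiliary"}:
        return {"verb"}
    return cands  # still ambiguous; could be displayed striped
```

A result with more than one element is exactly the case the stored or derived ambiguity attribute describes.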
In one embodiment of the invention, parts of speech are determined in part by searching for morphemes (root words), and part-of-speech attributes are assigned based on word endings, for example -ly, -ize, or -ing. In step 132, the attributes determined for words and phrases are stored, creating an "enriched sentence" that will probably remain unchanged across readers. For example, the education level and part of speech of a word remain the same for different readers, even though those readers may want different text segment lengths and presentation speeds. In one embodiment, the enriched sentence is stored in persistent storage such as a file. In another embodiment, the enriched text is stored on CD-ROM. In a preferred embodiment, the enriched sentence is implemented as a linked list of nodes, each node holding the word and phrase attributes described above, including the word's position in the original text. In Figure 5, step 134, the primary split points are determined by applying primary split point rules. Split points are points between characters at which the text is divided. In a preferred embodiment, split points are classified as primary or secondary. Primary split points are located by primary splitting rules based on punctuation. Figure 7 illustrates a primary split point after the comma that follows "Africa." Primary split points divide the text into "Super-phrases." In a preferred embodiment, primary split points are located at each comma, colon, semicolon, and left parenthesis, bracket, and brace. The location of a split point can be stored as an attribute of a node in the linked list of nodes that makes up the enriched sentence. Secondary split points are determined by applying secondary split point rules.
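A minimal Python sketch of the enriched sentence as a singly linked list, with primary split points marked from punctuation and the list then cut into Super-phrases. It assumes the split falls after a comma, colon, or semicolon and before a left parenthesis, bracket, or brace, and it uses the token index as a stand-in for the word's position in the original text. `Node`, `build_enriched`, and `super_phrases` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

SPLIT_AFTER = {",", ":", ";"}   # split after these marks
SPLIT_BEFORE = {"(", "[", "{"}  # split before left paired marks


@dataclass
class Node:
    """One token of the enriched sentence, kept as a singly linked list."""
    token: str
    position: int                      # token index, standing in for location
    split_after: Optional[str] = None  # "primary" here; "secondary" later
    next: Optional[Node] = None


def build_enriched(tokens: list[str]) -> Optional[Node]:
    """Link the tokens and mark primary split points on the nodes."""
    head = tail = None
    for pos, tok in enumerate(tokens):
        node = Node(tok, pos)
        if tail is None:
            head = node
        else:
            if tok in SPLIT_BEFORE:
                tail.split_after = "primary"
            tail.next = node
        if tok in SPLIT_AFTER:
            node.split_after = "primary"
        tail = node
    return head


def super_phrases(head: Optional[Node]) -> list[list[str]]:
    """Walk the list and cut it at every primary split point."""
    phrases, current, node = [], [], head
    while node is not None:
        current.append(node.token)
        if node.split_after == "primary":
            phrases.append(current)
            current = []
        node = node.next
    if current:
        phrases.append(current)
    return phrases
```

Applied to the tokens of "I had a farm in Africa, at the foot of the Ngong Hills.", this yields the two Super-phrases separated at the comma after "Africa."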
In preferred embodiments, secondary split points and rules are ranked in a hierarchy, and the secondary splitting rules accept parts of speech as inputs. In a more preferred embodiment, the secondary splitting rules also accept as inputs text-content attributes of the text segments and phrases being processed. For example, a secondary split point may be required for a text segment that exceeds the reader's preferred maximum text segment weight, even when the maximum text segment length has not been reached. Continuous attributes such as phrase difficulty, density, complexity, power, and pronunciation time can be used as inputs to a rule that modifies the rank assigned by a table such as that of Figure 6, which uses parts of speech alone to determine secondary split point ranks. For example, a text segment whose weight is more than 35 percent above the text average would be assigned a Class 1 rank, regardless of the rank the table in Figure 6 would otherwise assign. In one preferred embodiment, phrase weight or power alone, rather than parts of speech, determines the ranks of secondary split points. In an alternative embodiment, the splitting rules simply wrap text based on the number of characters per line, and each word is displayed in a color corresponding to its part of speech. This embodiment may not offer the advantages of the cascade, but it does provide visual cues based on the content of the text. Figure 6 illustrates a table used in a preferred embodiment to determine secondary split points. For example, prepositions determine Class 3, Subclass 1 secondary split points. In Figure 7, there are Class 3 secondary split points before the prepositions "in" and "of." Secondary split points divide Super-phrases into "Mini-phrases," as illustrated in Figure 7.
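The 35 percent weight override can be sketched as a small rule layered on the table lookup. This Python sketch assumes the override promotes the candidate split points inside a heavy segment to Class 1 rather than creating new split points; `RANK_TABLE` and `split_rank` are hypothetical, and only the Class 3 rank for prepositions comes from the text.

```python
# Hypothetical fragment of a Figure 6 style table: word -> class rank.
RANK_TABLE = {"in": 3, "at": 3, "of": 3}  # prepositions: Class 3


def split_rank(word, segment_weight, average_weight,
               table=RANK_TABLE, threshold=0.35):
    """Rank of the secondary split point before `word`, or None if none."""
    rank = table.get(word.lower())
    if rank is not None and segment_weight > (1 + threshold) * average_weight:
        return 1  # heavy segment: promote its split points to Class 1
    return rank
```

Raising the rank this way makes heavy segments fold earlier, because higher-ranked rules are applied first.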
Mini-phrases are related to text segments: a Mini-phrase is often identical to a text segment, and each is frequently displayed on its own line. However, once Mini-phrases are identified, rules may dictate that more or fewer than one Mini-phrase appear as a text segment on a line. The primary splitting rules are applied first, followed by the secondary splitting rules, applied in order of rank. An example of secondary splitting rule ranking is shown in Figure 6, where the rank of each word set determines the rank of the secondary split points it gives rise to. Some preferred embodiments use sentence weight or power, rather than only parts of speech, to determine secondary split point ranks. A more preferred embodiment lets the reader choose whether parts of speech or weight/power determines the rank of secondary split points. Some readers prefer text segmentation based on structure, while others prefer segmentation based on complexity or on the estimated time to read a text segment. In a preferred embodiment, secondary splitting rules are applied only until a limit is reached. This limit is frequently the minimum line length. A method for locating secondary split points is shown in Figure 5, step 136, and in detail in Figure 8, step 136. In one embodiment, where applying a secondary splitting rule to a Super-phrase would produce a Mini-phrase shorter than the specified minimum line length, that splitting rule is not applied, and no further splitting rules are applied to that Super-phrase. Conversely, when a line exceeding the maximum line length has no split point, a forced split is applied, dividing the text into two lines. When no splitting rules remain to be applied to any Super-phrase, the splitting process is complete.
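A Python sketch of the secondary folding loop just described: rules are applied in rank order, folding a Super-phrase stops at the first rule that would create a line shorter than the minimum, and any line still longer than the maximum is force-split at word boundaries. The `RANK` table is a hypothetical stand-in for Figure 6 (only "prepositions are Class 3" comes from the text), lengths are measured in characters, and `min_len=11` is chosen only so the sketch reproduces the Figure 9 grouping.

```python
from __future__ import annotations

# Hypothetical stand-in for the Figure 6 table (lower class = applied first).
# Only "prepositions are Class 3" comes from the text; "and"/"but" are assumed.
RANK = {"and": 1, "but": 1, "in": 3, "at": 3, "of": 3, "on": 3}


def text_len(words: list[str]) -> int:
    return len(" ".join(words))


def pieces(words: list[str], cuts: set[int]) -> list[list[str]]:
    """Cut `words` before each index in `cuts`."""
    bounds = [0, *sorted(cuts), len(words)]
    return [words[a:b] for a, b in zip(bounds, bounds[1:])]


def fold(words: list[str], min_len: int = 11, max_len: int = 40) -> list[str]:
    """Divide one Super-phrase (a list of words) into lines of Mini-phrases."""
    cuts: set[int] = set()
    for rank in sorted(set(RANK.values())):
        new = {i for i, w in enumerate(words) if i > 0 and RANK.get(w.lower()) == rank}
        if not new:
            continue
        if any(text_len(p) < min_len for p in pieces(words, cuts | new)):
            break  # would create a too-short line: stop folding this Super-phrase
        cuts |= new
    # Forced split: wrap any piece still longer than max_len at word boundaries.
    lines = []
    for piece in pieces(words, cuts):
        line: list[str] = []
        for w in piece:
            if line and text_len(line + [w]) > max_len:
                lines.append(" ".join(line))
                line = []
            line.append(w)
        lines.append(" ".join(line))
    return lines
```

With these settings, "I had a farm in Africa," stays on one line because "in Africa," would be only 10 characters, while "at the foot of the Ngong Hills." folds before "of".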
In Figure 1, step 111, the parameters, attributes, and splitting rules can be used as inputs to the horizontal displacement rules. The horizontal displacement rules determine the horizontal location of each text segment. In a preferred embodiment, the horizontal displacement rules include both horizontal justification rules and descent angle rules. In this embodiment, the horizontal displacement is the sum of the results of the horizontal justification rule and the descent angle rule. In an easy-to-implement embodiment, the horizontal displacement rule is simply the descent angle applied to a center-justified text segment. This embodiment does not use the splitting rule that ended the previous text segment as an input, and it keeps eye movement to a minimum while the reader reads the sentence cascade. Another embodiment adds a leftward shift when the previous segment ended at a Class 1 split point and a rightward shift when it ended at a Class 3 split point. A preferred embodiment lets the reader specify additional right or left shifts for split points, including values for primary split points and for each class and subclass of secondary split points. One embodiment stores these added shifts in a table, in units of characters. The text is then displayed with the determined horizontal displacement. In the example of Figure 9, Super-phrase 1, "I had a farm in Africa," is separated from Super-phrase 2, "at the foot of the Ngong Hills.", by the primary split point after the comma. Super-phrase 2 is divided into two Mini-phrases by the secondary split point before the preposition "of." In the embodiment illustrated in Figure 9, the two Mini-phrases of Super-phrase 1 are displayed as a single text segment, since the Mini-phrase "in Africa" is shorter than the reader-specified minimum length. The first Mini-phrase of Super-phrase 2, "at the foot," is placed on a new line, centered under the previous text segment and shifted to the right by a right descent angle.
The last Mini-phrase, "of the Ngong Hills," is shifted to the left by a horizontal displacement rule that requires a leftward shift when the previous line ended at a secondary rather than a primary split point. Thus, in the example of Figure 9, the total horizontal displacement is determined by a combination of the descent angle and the line justification. In a preferred embodiment, paired punctuation marks, including parentheses, brackets, braces, and quotation marks, serve as punctuation that determines primary split point locations. In one embodiment, this paired punctuation is included as an input to the horizontal displacement rules, including the horizontal justification and descent angle rules. For example, a long multi-sentence text segment enclosed in parentheses could have reduced horizontal shifts and reduced vertical displacement, that is, less line-to-line spacing than the other text in the active display area of the display surface.
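The combination of center justification, descent angle, and split-point shifts can be sketched in Python as a running center column. The slope of 2 columns per line, the starting center, and the `SHIFT` values (no shift after a primary split point, 4 columns left after a secondary one) are hypothetical parameters picked so that the output mirrors the Figure 9 description; `displacements` and `render` are illustrative names.

```python
# Hypothetical reader-adjustable shifts, in characters, keyed by the kind of
# split point that ended the previous line.
SHIFT = {"primary": 0, "secondary": -4}


def displacements(segments, ends, slope=2, first_center=20):
    """Return the starting column of each text segment.

    Each segment is centered under the previous segment's center, which moves
    right by `slope` columns per line (the descent angle) plus SHIFT[...] for
    the split point that ended the previous line.
    """
    cols, center = [], first_center
    for i, seg in enumerate(segments):
        if i > 0:
            center += slope + SHIFT.get(ends[i - 1], 0)
        cols.append(max(0, center - len(seg) // 2))
    return cols


def render(segments, ends, **kwargs):
    """Indent each segment by its computed displacement, one per line."""
    cols = displacements(segments, ends, **kwargs)
    return "\n".join(" " * c + s for c, s in zip(cols, segments))
```

For the three Figure 9 lines, ended by primary, secondary, and primary split points, the columns come out as 9, 17, and 11: the second line drifts right with the descent angle and the third is pulled back left.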
In step 112, the enhanced text is created. In this step, the codes needed to display each text segment properly are generated. For example, where the reader's specifications call for technical words to be colored red and the enriched text indicates that a word is a technical word, an escape sequence can be generated that the display step will interpret as requiring red text. Similar encoding may be required for animation. The enhanced text can be stored at this point for later display. In step 113, the enhanced text is displayed on the display device, one text segment per newly formed line. The enhanced text can also include the animation, background color, text color, labeling, and presentation speeds discussed above. In a preferred embodiment, the background color varies as a function of sentence and paragraph position. In another embodiment, illustrated in Figure 11, some text is first displayed in one color or brightness for a period of time and then in a second color or brightness. In Figure 11, the sentence is displayed on three newly formed lines, indicated by arrows 200, 202, and 203. The text of each of these lines is actually displayed on a single line; Figure 11 shows the words on separate lines only to illustrate the change over time from initial text color/marking 206 to text color/marking 207, and from initial text color/marking 208 to 209. The text "Af" at 206 is the stressed syllable of "Africa" and is therefore initially displayed in color/marking 206. Likewise, "gong" is the stressed syllable of "Ngong" and has initial color/marking 208, followed by color/marking 209. The preferred embodiment shown in Figure 1 allows editing of the enriched text of step 132 and the enhanced text of step 113.
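The patent does not specify an escape-sequence format. As one possibility, this Python sketch emits ANSI 256-color foreground sequences (SGR `38;5;n`) for a terminal that supports them: technical words in red, single parts of speech in their reader-chosen color, and ambiguous words with characters alternating between colors, as in the orange/yellow example earlier. The `COLORS` mapping and function names are hypothetical.

```python
RESET = "\x1b[0m"


def fg(code: int) -> str:
    """ANSI 256-color foreground escape sequence (SGR 38;5;n)."""
    return f"\x1b[38;5;{code}m"


# Hypothetical reader specification: attribute -> 256-color palette index.
COLORS = {"technical": 196, "verb": 208, "adjective": 226}  # red, orange, yellow


def enhance(word, tags):
    """Wrap a word in the escape sequences its attributes call for."""
    if "technical" in tags:  # reader asked for technical words in red
        return fg(COLORS["technical"]) + word + RESET
    codes = [COLORS[t] for t in tags if t in COLORS]
    if not codes:
        return word
    if len(codes) == 1:
        return fg(codes[0]) + word + RESET
    # Ambiguous part of speech: alternate colors character by character.
    return "".join(fg(codes[i % len(codes)]) + ch for i, ch in enumerate(word)) + RESET
```

Because the codes are plain strings, the enhanced text can be stored after step 112 and replayed later by a much simpler display program.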
In a preferred embodiment, a pointing device such as a mouse is used to select a portion of the text to be edited. The enriched sentence corresponding to that portion of the enhanced text is selected internally, and its attributes are made available for editing. This can be implemented using pop-up menus that allow each attribute of the enriched text and the enhanced text to be edited. When editing the enriched text could affect the presentation, the text is pre-processed again according to step 110. In the editing session of step 114, reader annotations, either hidden or immediately visible, are accepted and can be stored in the enriched and enhanced sentences. These annotations serve the same function as handwritten notes in a book. Editable features include parts of speech, definition, color, text, split points, and horizontal displacement. The edited attributes and display can also be saved in step 116. In one embodiment, only the enhanced sentences are stored. In a preferred embodiment, both enriched and enhanced text are stored. In preferred embodiments, edits are recorded either as reader-specific or as global, for presentation to all readers of the text. Manual editing of the enhanced text is especially useful when the same text will be viewed repeatedly by others, such as with an electronic book. In a variation of the embodiment of Figure 1, steps 113, 114, and 116 are omitted, so there is no human intervention and no immediate display. In this embodiment, the enriched and enhanced text is created and stored for future display. The enhanced text can be stored in a standard word processing format such as Microsoft Word (TRADEMARK) binary or Corel WordPerfect (TRADEMARK). In this embodiment, the presentation software can be simple, small, and fast compared with the software required to look up words and analyze text.
This is a preferred embodiment for mass distribution of enhanced text to be read as "electronic books." In a related variation of the embodiment of Figure 1, previously stored enhanced text is retrieved in step 106, without pre-processing and without allowing edits. This variation is also a preferred embodiment for mass distribution of enhanced text as "electronic books." Referring to Figure 1, step 113, the display of each screen of enhanced text can be triggered manually, for example by a mouse click. In a preferred embodiment, the presentation speed is controlled by reader-specified parameters, including the display time for the text and the inter-sentence and inter-paragraph delay times. In a more preferred embodiment, the Text Content is measured, and the text display interval depends on that Text Content. In a preferred embodiment, the pronunciation time of the text is used as the measure of Text Content to determine the display interval. In another embodiment, sentence weights are used to measure Text Content and determine the display interval. A preferred embodiment allows readers to extend the display time for a sentence, as well as to speed up and slow down the presentation. This speed can be recorded, along with the sentence length and sentence difficulty corresponding to a particular display interval. Figure 10 illustrates an example three-dimensional plot, a "Historical Interval Presentation Sampling Map," representing the time needed to read a sentence as a function of two attributes, sentence difficulty and sentence length. The time needed to read can be measured through the reader's preferences or through the eye movement tracking or brain wave activity mentioned previously. Time can be correlated against two attributes, as in Figure 10, or against any number of attributes.
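A content-dependent display interval can be sketched in Python using syllable count as a rough proxy for pronunciation time, together with the 75 percent time-remaining cue described under Special Events. The vowel-group syllable counter is a crude stand-in for dictionary syllable counts, and `base` and `seconds_per_syllable` are hypothetical reader parameters, not values from the patent.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[A-Za-z']+")


def syllables(word: str) -> int:
    """Crude stand-in for dictionary syllable counts: count vowel groups."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def display_interval(sentence, seconds_per_syllable=0.2, base=0.5):
    """Content-dependent display interval: base time plus time per syllable."""
    content = sum(syllables(w) for w in WORD.findall(sentence))
    return base + seconds_per_syllable * content


def warning_time(interval, fraction=0.75):
    """Special event: elapsed time at which the time-remaining cue appears."""
    return fraction * interval
```

With these parameters, "I had a farm in Africa" counts 8 syllables and gets a 2.1 second interval, with the cue at about 1.6 seconds.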
In a preferred embodiment, the display interval is continuously adjusted to match the predicted required reading time. In yet another embodiment, more appropriate when reading speed can be measured accurately, the inputs to the splitting rules are varied and the resulting reading speed is tracked. In an embodiment suited to creating an optimal mass-market reading product, the inputs to the splitting rules are varied and reading speeds are recorded for a sample population. The inputs are varied to determine the optimal reading speed and comprehension. For example, the relative importance that the secondary splitting rules give to parts of speech versus phrase weight is varied to determine the optimal reading speed. In another embodiment, the reader's manual edits to the display of the initial sentences are analyzed for the relative contributions of sentence structure, parts of speech, length, and complexity. After this initial "tuning" or "training" period, similar weightings are used in the splitting rules for automatic text enhancement. Subsequent edits can be used to further refine the splitting rules. In step 118, text presentation stops when there is no more text or the reader asks to stop. Step 120 performs cleanup, including storage of the enriched and enhanced text and of history data about the reading session. Numerous features and advantages of the invention covered by this document have been set forth in the foregoing description. It will be understood, however, that this disclosure is, in many respects, only illustrative. Changes can be made in details, particularly in matters of combining, separating, and arranging steps, without exceeding the scope of this invention. The scope of the invention is, of course, defined in the language in which the appended claims are expressed.
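One simple way to keep the display interval tracking a reader's predicted reading time is exponential smoothing of the observed seconds per Text Content Unit. This Python sketch is a swapped-in technique for illustration, not the patent's method; the class name, the starting rate, and the smoothing factor `alpha` are hypothetical.

```python
class IntervalTracker:
    """Learns a reader's seconds per Text Content Unit by exponential smoothing."""

    def __init__(self, seconds_per_unit: float = 0.25, alpha: float = 0.2):
        self.rate = seconds_per_unit
        self.alpha = alpha  # weight of the newest observation, 0 < alpha <= 1

    def predict(self, content_units: float) -> float:
        """Predicted reading time, used as the next display interval."""
        return self.rate * content_units

    def observe(self, content_units: float, seconds: float) -> float:
        """Record how long the reader actually took; return the updated rate."""
        if content_units <= 0:
            return self.rate
        observed = seconds / content_units
        self.rate += self.alpha * (observed - self.rate)
        return self.rate
```

A larger `alpha` follows recent sentences more closely; a smaller one smooths out occasional pauses. A Figure 10 style map would instead fit reading time against several attributes at once.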

Claims (28)

1. A method for enhancing the presentation of text, comprising: a) extracting text-specific attributes from the text; and b) varying the presentation of the text according to said attributes.
2. A method for enhancing the presentation of text as recited in claim 1, wherein the text-specific attributes include the location of the text within a body of text, the presentation of the text includes background color, and varying the presentation includes varying the background color according to the text location.
3. A method for enhancing the presentation of text as recited in claim 1, wherein the text-specific attributes include a measure of text difficulty, the presentation of the text includes an automatic text advance rate, and varying the presentation includes varying the text advance rate according to the measure of text difficulty.
4. A method for enhancing the presentation of text as recited in claim 3, wherein the measure of text difficulty includes an estimated pronunciation time of the text.
5. A method for enhancing the presentation of text as recited in claim 3, wherein the measure of text difficulty includes an estimated education level of the text.
6. A method for enhancing the presentation of text as recited in claim 3, wherein the measure of text difficulty includes the number of syllables in the text.
7. A method for enhancing the presentation of text as recited in claim 1, wherein: a) the attributes include punctuation and parts of speech; b) the extracting includes parsing the text into punctuation and parts of speech; c) the varying of the text presentation is implemented using rules that have inputs and outputs; d) the rule inputs include the parts of speech; e) the enhanced text presentation includes visual attributes; and f) the rule outputs include the visual attributes.
8. A method for enhancing the presentation of text as recited in claim 7, wherein: a) the rules include splitting rules; b) the splitting rules divide the text into text segments; and c) the splitting rule inputs include the punctuation.
9. A method for enhancing the presentation of text as recited in claim 8, wherein the splitting rule inputs also include the parts of speech.
10. A method for enhancing the presentation of text as recited in claim 9, wherein the visual attributes include displaying the text segments in a color that depends on the parts of speech.
11. A method for enhancing the presentation of text as recited in claim 9, wherein the visual attributes include displaying the text segments on new lines.
12. A method for enhancing the presentation of text as recited in claim 11, wherein: a) the rules include horizontal displacement rules; b) the visual attributes include horizontal displacement of the text segments; and c) the horizontal displacement rules take the parts of speech as inputs and produce horizontal displacement as outputs.
13. A method for enhancing the presentation of text as recited in claim 12, further comprising: a) accepting user edits to the split points of the displayed text; and b) storing the user edits in machine-readable form.
14. A method for enhancing the presentation of text as recited in claim 13, further comprising: a) accepting user edits to the parts of speech of the displayed text; and b) storing the user edits in machine-readable form.
15. A method for enhancing the presentation of text as recited in claim 13, further comprising: a) accepting user annotations to the parts of speech of the displayed text; and b) storing the user annotations in machine-readable form.
16. A method for enhancing the presentation of text as recited in claim 12, wherein the splitting rules include primary rules and secondary rules, the primary rules taking punctuation as inputs and the secondary rules taking parts of speech as inputs.
17. A method for enhancing the presentation of text as recited in claim 16, wherein the splitting rules further include a micro-grammar for removing the ambiguity of parts of speech.
18. A method for enhancing the presentation of text, comprising: a) providing a plurality of splitting rules that use punctuation; b) providing text segment horizontal displacement rules; c) parsing the text into words and punctuation; d) determining split point locations in the text by applying the splitting rules, using the punctuation, thereby dividing the text into text segments; e) applying the text segment horizontal displacement rules to the text segments, thereby determining a horizontal displacement for each text segment; and f) displaying each text segment on a new line, each displayed text segment having the determined horizontal displacement.
19. A method for enhancing the presentation of text as recited in claim 18, wherein the splitting rules use parts of speech as inputs, further comprising determining the probable parts of speech of the words from the parsing step.
20. A method for enhancing the presentation of text as recited in claim 19, wherein the text is a sentence.
21. A method for enhancing the presentation of text as recited in claim 19, further comprising: a) providing color display rules, wherein the color display rules take parts of speech as inputs and produce text color as an output; and b) displaying the text according to the text color output.
22. A method for enhancing the presentation of text, comprising: a) providing splitting rules using parts of the language and punctuation, where the splitting rules include primary rules and secondary rules, using the primary rules punctuation marks, and using the secondary rules language parts, where the secondary splitting rules include a microgram to remove the ambiguity of the language parts; b) provide rules of horizontal displacement of the text segment, where the rules use parts of the language, - c) provide a minimum length of text segment, - d) provide a maximum length of text segment, - e) analyze syntactically the text in words and punctuation; f) determine the probable parts of the language of the words in step e, - g) determine the places of the primary split points in the text by applying the primary splitting rules, dividing the text in Super-phrases; h) determine places of primary split points in the superscripts by applying the secondary splitting rules, dividing the super-phrases into text segments, - i) repeat step h until all the text segments are not larger that the maximum length of the text segment, and not less than the minimum length of text, - j) apply the rules of horizontal displacement of the text segment to the text segments, thereby determining the horizontal displacement for each text segment, - and k) displaying each text segment in a new line, each text segment having the determined horizontal displacement.
23. A method for enhancing the presentation of text, as mentioned in claim 22, which further comprises: a) accepting the user's editions of the split points of the displayed text; and b) store the user's edits in machine readable form.
24. A method for enhancing the presentation of text, as recited in claim 22, further comprising: a) accepting user edits to the parts of speech of the displayed text; and b) storing the user edits in machine-readable form.
25. A method for enhancing the presentation of text, as recited in claim 22, further comprising: a) accepting user annotations to the parts of speech of the displayed text; and b) storing the user annotations in machine-readable form.
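Claims 23 through 25 require only that user edits and annotations be stored in machine-readable form. The claims specify no format. The following Python sketch assumes edits are keyed by token index and serialized as JSON. The `UserEdits` record, its field names, and `apply_pos_overrides` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserEdits:
    """Hypothetical record of a reader's edits to one displayed text."""
    fold_points: List[int] = field(default_factory=list)         # token indices to fold before
    pos_overrides: Dict[int, str] = field(default_factory=dict)  # token index -> corrected tag
    annotations: Dict[int, str] = field(default_factory=dict)    # token index -> note

    def to_json(self) -> str:
        """Store the edits in machine-readable form."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "UserEdits":
        data = json.loads(text)
        # JSON object keys are always strings, so convert token indices back to int.
        return cls(
            fold_points=[int(i) for i in data["fold_points"]],
            pos_overrides={int(k): v for k, v in data["pos_overrides"].items()},
            annotations={int(k): v for k, v in data["annotations"].items()},
        )


def apply_pos_overrides(tokens: List[Tuple[str, str]], edits: UserEdits) -> List[Tuple[str, str]]:
    """Replace automatically assigned tags with the user's corrections."""
    return [(word, edits.pos_overrides.get(i, tag)) for i, (word, tag) in enumerate(tokens)]
```

Keying edits by token index keeps the stored record independent of how the text is later folded or displayed. The tradeoff is that stored indices must be revalidated if the underlying text changes.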
26. A device for enhancing the presentation of text, comprising: a) an element for parsing the text into words and punctuation; b) an element for determining probable parts of speech for the words resulting from element a; c) an element for determining folding point locations in the text by applying folding rules, which use parts of speech and punctuation marks, thereby dividing the text into text segments; d) an element for determining text segment horizontal displacement by applying the text segment horizontal displacement rules to the text segments, thereby determining the horizontal displacement for each text segment; and e) an element for displaying the text segments, each on a new line and each having the determined horizontal displacement.
27. A device for enhancing the presentation of text, as recited in claim 26, further comprising: a) an element for selecting a color based on the parts of speech; and b) an element for displaying the text in the selected color.
28. A device for enhancing the presentation of text, comprising: a) a parser capable of parsing the text into words and punctuation; b) a table containing words and the probable parts of speech corresponding to the words; c) a folding point location determiner, which applies folding rules that use parts of speech and punctuation marks, dividing the text into text segments; d) a text segment horizontal displacement determiner, which applies the text segment horizontal displacement rules and determines the horizontal displacement for each text segment; and e) a text segment display for displaying the text segments, each on a new line, using the determined horizontal displacement.
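Claim 28 recites a table of words and their probable parts of speech. Claim 22 recites a micro-grammar for disambiguating those parts of speech. The claims specify neither, so the following Python sketch substitutes a deliberately simple scheme with two tables:

- `LEXICON` lists each word's candidate tags, most probable first.
- `MICRO_GRAMMAR` lists which tags to prefer after a given preceding tag.

The micro-grammar is applied greedily, left to right. Both tables, the function `tag_words`, and the default of NOUN for unknown words are assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical table: word -> candidate parts of speech, most probable first.
LEXICON: Dict[str, List[str]] = {
    "a": ["DET"],
    "the": ["DET"],
    "they": ["PRON"],
    "of": ["ADP"],
    "can": ["AUX", "NOUN", "VERB"],
    "fish": ["NOUN", "VERB"],
    "saw": ["VERB", "NOUN"],
}

# Hypothetical micro-grammar: previous tag -> tags to prefer, in order.
MICRO_GRAMMAR: Dict[Optional[str], List[str]] = {
    "DET": ["NOUN", "ADJ"],   # a determiner is usually followed by a noun or adjective
    "PRON": ["AUX", "VERB"],  # a subject pronoun is usually followed by a verb
    "AUX": ["VERB"],          # an auxiliary is usually followed by a main verb
}


def tag_words(words: List[str]) -> List[Tuple[str, str]]:
    """Look up each word's candidate tags, then disambiguate greedily, left to right."""
    tagged: List[Tuple[str, str]] = []
    for word in words:
        candidates = LEXICON.get(word.lower(), ["NOUN"])  # unknown words default to NOUN
        choice = candidates[0]  # most probable tag from the table
        prev = tagged[-1][1] if tagged else None
        for preferred in MICRO_GRAMMAR.get(prev, []):
            if preferred in candidates:
                choice = preferred
                break
        tagged.append((word, choice))
    return tagged
```

With these tables, "They can fish" is tagged PRON AUX VERB. "A can of fish" is tagged DET NOUN ADP NOUN. The same table entry for "can" yields different tags depending on the preceding tag.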
Application MXPA/A/1999/001290A (Text processor): priority date 1996-08-07, filed 1999-02-04, published as MXPA99001290A.

Applications Claiming Priority (1)

US08693444, priority date 1996-08-07

Publications (1)

MXPA99001290A, published 2000-02-02
