WO2014041365A1 - Improving the readability of text - Google Patents

Improving the readability of text

Info

Publication number
WO2014041365A1
Authority
WO
WIPO (PCT)
Prior art keywords
words
line
text
word
formatted
Prior art date
Application number
PCT/GB2013/052405
Other languages
English (en)
Inventor
Timothy William Gerald BADDELEY
Michael Wayne FLINT
Neil William DAVIES
Simon John HUTTON
Kimberley Dawn BAYLISS
James Richard TAYLOR
Jess MCINTOSH
Original Assignee
Purple Secure Systems Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Purple Secure Systems Ltd
Publication of WO2014041365A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/189 Automatic justification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents

Definitions

  • the present invention concerns improving the readability of text, an improved text product, and a computer system and method for creating such a text product. More particularly, but not exclusively, this invention concerns computer software for converting a source text into a new text product (for example in the form of a data file) which may improve readability of the text for certain readers.
  • the invention also concerns a novel formatted text product, a method of converting text comprising a multiplicity of sentences into a formatted text data product, a computer system for converting a source text comprising a multiplicity of sentences into a formatted text data product, a computer software product for programming a computer to create such a computer system, and a method of improving the ease of reading a literary work.
  • US 3,611,593 provides a proposal for improving readability of a text by means of providing indications in the margins of the text at the start of each line, or every other line, to guide the reader's eye from one line to the next.
  • indications such as bullet points or other marks
  • the appearance of such special extra markings in the text may make readers suffering from reading difficulties feel patronised or otherwise made to feel different from more able readers.
  • the solution offered by US 3,611,593 is thus rather intrusive.
  • the present invention seeks to mitigate the above-mentioned problems.
  • the present invention seeks to provide a text product which may ease reading of the text by certain readers.
  • the present invention seeks to provide a method and/or apparatus and/or software which enable the creation of such an improved text product.
  • the present invention provides, according to a first aspect, a formatted text product comprising text formatted so that the majority of pairs of adjacent lines of text have been modified such that one of the last word or words of the first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • the present invention is designed to help readers with tracking when the eye moves to locate the following line. This is achieved by means of the pairs of repeated word or words from the end of one line to the beginning of the next which effectively add a link or connection between adjacent lines. It is believed that the repetition of a word at the end of a line at the beginning of the next will assist certain readers in tracking from one line to the next.
  • it is reckoned to be a sufficiently 'soft' solution that its application will not bother fluent readers, since the brain rapidly ceases to notice any difference between conventional text and this 'connected' text.
  • the formatted text product may be in the form of printed matter, such as a paper-based publication, for example a book.
  • the formatted text product may be an electronic data product, for example a data file.
  • the text product may be displayed on a portable electronic book reader, such as for example the Kindle™ device sold by Amazon.
  • the text product may be displayed within a web-page, for example within or overlaying a browser window on the display device of a computer system.
  • the electronic data product could be stored on physical media, such as the memory of a computer, a hard drive, memory stick or other memory device, or a data disc (such as a CD-ROM, DVD or the like).
  • the text product may be displayed on an electronic display device, for example associated with a computer.
  • the computer may be a tablet computer, smart phone, desk-based PC, laptop, or similar computing apparatus.
  • the text product may be displayed in a distinct part of the display separate from the source text, which may also be displayed, optionally at the same time, on the same device.
  • the text product may be displayed so that it overlays, partially or wholly, or otherwise masks or hides (at least partially), the display of the source text.
  • the text product may be displayed in a window, pop-up, or similar.
  • Such a window, pop-up, or similar object may have the appearance, to the user, of a lens which transforms the source text into the formatted text product.
  • the formatted text product may be such that substantially all pairs of adjacent lines within a paragraph of text have been modified such that one of the last word or words of the first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • the formatted text product may comprise multiple paragraphs of text. For example, there may be three or more paragraphs of text. There may be ten or more paragraphs of text.
  • the formatted text product may be such that the majority of pairs of adjacent lines of text, and more preferably substantially all pairs of adjacent lines, within the majority of, and more preferably substantially all of, the paragraphs of text have been modified such that one of the last word or words of the first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • the first word or words of the second line of the pair of adjacent lines that are repeated from the first line of the pair of adjacent lines may be formatted differently from the bulk formatting of an adjacent line.
  • the first word or words of the second line may be positioned within a left-hand margin spatially separated from the portions of the text product not comprising the repeated words at the beginning of lines.
  • An embodiment of the invention may therefore have an appearance as set out below (where the letters "A" and "B" each represent words in a line, the letter B in this case being the repeated word and appearing in the left-hand margin, the number of lines in the paragraph being variable of course):
  • the first word or words of the second line may be in a different colour from the bulk formatting of an adjacent line.
  • An embodiment of the invention may therefore have an appearance as set out below, where the letter "A" represents a word having a first colour - or format - and the letter "B" represents a word having a second different colour - or format - with all adjacent pairs of lines in the text being such that the word represented by the last letter (in this case always represented by the letter "A") on the first line of the pair is repeated on the next line of the pair (in this case always represented by the letter "B"):
  • the text may be formatted such that, for example, when reading from one line to the next, the last word or words on one line may be one colour (or format), the first (repeated) word(s) on the next line may be the same colour (or format), optionally with subsequent words on that next line being a different colour (or format). It may for example be that both the last word or words of the first line of the pair of adjacent lines and the repeated first word or words of the second line of the pair of adjacent lines are formatted in the same way but differently from the bulk formatting of an adjacent line.
  • An embodiment of the invention may therefore have an appearance as set out below, where the letter "A" represents a word having a first colour - or format - and the letter "B" represents a word having a second different colour - or format - with all adjacent pairs of lines in the text being such that the word represented by the last letter (in this case always represented by the letter "B") on the first line of the pair is repeated on the next line of the pair (in this case also always represented by the letter "B"):
  • the text may be formatted such that, for example, when reading from one line to the next, the last word or words on one line may be one colour (or format), the first (repeated) word(s) on the next line may be the same colour (or format), with subsequent words on that next line being the same colour (or format).
  • the subsequent words on that next line may be the same colour (or format) as the first (repeated) word(s), for example up until but not including the last word(s) on that line, if such last word(s) are repeated on the line thereafter.
  • the preceding words on a given line before the last word(s) that are repeated on said one line
  • An embodiment of the invention may therefore have an appearance as set out below, where, as above, the letter "A" represents a word having a first colour - or format - and the letter "B" represents a word having a second different colour - or format - with all adjacent pairs of lines in the text being such that the word represented by the last letter (whether "A" or "B") on one line is repeated as the first word on the next line:
    AAAAAAAAAAAAAAAAAAAAB
  • the text may be formatted such that, for example, when reading from one line to the next, the last word or words on one line may be one colour (or format), the first (repeated) word(s) on the next line may be the same colour (or format), with subsequent words on that next line being a different colour (or format).
  • the subsequent words on that next line may be the same colour (or format) as the last word(s) on that line.
  • An embodiment of the invention may therefore have an appearance as set out below (again, the letters "A" and "B" representing words of different colours/formats):
  • the text may be formatted such that the last word or words on the first line (which are repeated on the second line) may be in a different colour from the bulk formatting of an adjacent line.
  • An embodiment of the invention may use three or more different colours for the repeated words, and therefore for example have an appearance as set out below (in a similar manner to the preceding illustrations, the letters "A", "B", "C" and "D" representing words of different colours/formats):
  • the bulk formatting used in one line may be the same as one only of the first (repeated) word(s) and the last (repeated) word(s) on that line.
  • Alternative embodiments of the invention may therefore have an appearance as set out in the two illustrative examples below (in a similar manner to the preceding illustrations, the letters "A", "B" and "C" representing words of different colours/formats):
    AAAAAAAAAAAAAAAAAAAAB
    BBBBBBBBBBBBBBBBBBBBBC
    CCCCCCCCCCCCCCCCCCCA
    AAAAAAAAAAAAAAAAAAAAAAAB
    BBBBBBBBBBBB
  • a method of converting text comprising a multiplicity of sentences into a formatted text data product for display on a display area having an available line width, wherein the method includes modifying the text such that for the majority of pairs of adjacent lines of text one of the last word or words of the first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • the method may include outputting the formatted text data product so created, for example by displaying the text on a display screen, by printing the text, and/or by saving the text for subsequent use.
  • the step of outputting of the formatted text data product may be in the form of outputting a data file, for example in the form of an electronic data product.
  • the method may include providing a source text to be converted, the source text comprising a series of words.
  • the method may include identifying words in a source text, and in particular, identifying the last word or words in a line.
  • each of the "words" so identified when performing the step of identifying the words may consist of a series of characters, for example including punctuation, and not necessarily being in the form of words that would normally be found in a conventional dictionary.
  • a "word" may simply be defined as the series of characters between a space and the end of a line or the next space. There may therefore be no analysis of the word so identified, and the step of identifying a word may for example simply comprise detecting the start and end of a word, without requiring knowledge of the content of the word so "identified".
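The word definition above can be illustrated with a minimal sketch. The function names `identify_words` and `last_word` are hypothetical and not taken from the application; the sketch also assumes that any run of whitespace counts as a single separator, which the text does not state.

```python
from typing import List, Optional


def identify_words(line: str) -> List[str]:
    """Return the 'words' of a line: runs of characters between spaces.

    No dictionary lookup is performed, so punctuation stays attached
    to the neighbouring characters, as the description allows.
    Assumption: consecutive whitespace is treated as one separator.
    """
    return line.split()


def last_word(line: str) -> Optional[str]:
    """Return the last 'word' on the line, or None if the line is blank."""
    words = identify_words(line)
    return words[-1] if words else None
```

For example, `last_word("end of line.")` returns `"line."`, punctuation included, because the content of the word is never analysed.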
  • the method may include identifying lines of words in a text, for example to allow identification of the last word or words in a line.
  • the method may include identifying the last line of words in a paragraph.
  • the step of identifying a line of words in a text may distinguish between a line which only contains a web-link and/or images and a line of words. A line which comprises only one word, not being a web-link, may be treated as not being a line of words.
  • a line which includes formatting of a certain type may be identified as not being a line of words (or being a line of words but not one that is to be processed so as to produce connected text - such as a repeated word in an adjacent line).
  • the one or more lines of text so identified may be stored as data in a different way such that the data includes additional line information allowing a different means for subsequently identifying separate lines in the text.
  • the step of identifying lines of words in a text may include dividing a string of words, for example without any information concerning line width, into a set of lines of words. Dividing a string of words into a set of lines of words may include using page width information and character spacing information.
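One possible way of dividing a string of words into lines from page width and character spacing information is a greedy fill, sketched below. This is an illustration only: the function name `divide_into_lines` is hypothetical, and the sketch assumes a fixed character width (monospaced text), which the application does not require.

```python
from typing import List


def divide_into_lines(text: str, page_width: float, char_width: float) -> List[str]:
    """Greedily divide a string of words into lines that fit the page width.

    Assumption: every character occupies `char_width` units, so a line
    can hold int(page_width // char_width) characters.
    """
    max_chars = int(page_width // char_width)
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        # An over-long word is still placed on its own line.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```

With `page_width=100` and `char_width=10`, the string `"the quick brown fox jumps"` is divided into `["the quick", "brown fox", "jumps"]`.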
  • the method may include creating as part of the formatted text data product a first line of words, in dependence on the available line width of a display area, by populating the line with a first sequence of words forming part of the series of words from the source text.
  • the method may include a step of inserting, at the start of a line of words in the formatted text data product, a copy of the last word or words of the preceding line of words. Such a step may be repeated a multiplicity of times.
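The two steps above (populating a line in dependence on the available width, then starting the next line with a copy of the last word) can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the applicant's implementation: the name `connect_text` is hypothetical, line width is measured in characters, and only a single word is repeated.

```python
from typing import List


def connect_text(words: List[str], line_width: int) -> List[str]:
    """Build lines in which each line starts with the last word of the previous line.

    Assumptions: width is counted in characters, and every line must
    contain at least one new source word so that the loop always advances.
    """
    lines: List[str] = []
    current: List[str] = []   # words on the line being built
    fresh = 0                 # source words (not repeats) on the current line
    for word in words:
        fits = len(" ".join(current + [word])) <= line_width
        if fits or fresh == 0:
            current.append(word)
            fresh += 1
        else:
            lines.append(" ".join(current))
            # Start the next line with a copy of the last word, then continue.
            current = [current[-1], word]
            fresh = 1
    if current:
        lines.append(" ".join(current))
    return lines
```

Calling `connect_text("the cat sat on the mat and purred".split(), 11)` gives `["the cat sat", "sat on the", "the mat and", "and purred"]`. Because the repeated word consumes part of each line, the connected text generally needs more lines than the source text.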
  • the method may use a computer to perform one or more of the steps of the method. As such, there may be a step of providing a computer having a memory and a processor.
  • One or more software modules may be provided to carry out steps of the method.
  • the method may comprise inputting into the memory of the computer, data representing the source text to be converted.
  • the computer may for example use one or more software modules to perform the following sub-steps:
  • the method may include (as step (e) for example) repeating step (c) but in respect of the second line of words, and repeating step (d) to create a third line of words after the second line of words.
  • the steps (a) to (d) are preferably carried out in that order.
  • the source text when provided as a text data product, may be converted into the formatted text data product (for example so that after the method is performed the source text so provided only survives as the resultant formatted text data product).
  • the source text may thus be progressively edited/changed during performance of the method.
  • the formatted text data product may be separate from the data representing the source text.
  • the step of populating the line with a first sequence of words forming part of the series of words from the source text may simply be performed by manipulating the source text.
  • the step of providing or creating, for example, a line of words, in dependence on the available line width of the display area, by providing, or populating, the line with a sequence of words may be performed simply by using, and adjusting as necessary, the source text.
  • the step of populating the line with a first sequence of words forming part of the series of words from the source text may be performed by creating a new data product (for example the formatted text data product) separate from, but at least partially copied from, the source text.
  • steps (c) and (d) are preferably repeated (as step (e) for example) mutatis mutandis so as to create a formatted text data product in which the majority of pairs of adjacent lines of text in the formatted text data product are such that one of the last word or words of the first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • the method may be performed within a software application, for example a third party software application, which performs other functions and/or has one or more software modules.
  • a software application will for the sake of convenience be referred to as a host software application.
  • the method may be performed within a software application, which is provided with, or has access to, one or more software modules, for example library functions.
  • Such one or more modules may include word-processing functions that allow for manipulation and/or display on-screen of text.
  • the software module that is utilised to identify the last word or words in a line of words may advantageously make use of such word-processing functions. Using such functions may allow for the software module(s) that provide the text conversion functions of the present invention to be small in size (i.e. the code required to provide the functionality of the present invention may be provided in a relatively small size of file - fewer bytes).
  • the step of inserting at the start of a line (for example "the second line") a (duplicate) copy of a word or words may require formatting of the subsequent lines of text (for example in order to accommodate the extra inserted word in the available line width, which may require shifting one or more words from the end of a line to the start of the next and so on).
  • a function is provided by a software module of the host software application.
  • Such a function may be useful when the method creates a copy of the source text, and then manipulates that copy of the source text to produce the formatted text data product, (or when the method simply works by directly manipulating the source text to produce the formatted text data product), rather than creating a new formatted text data product line by line from the source text.
  • the host software application may in some embodiments be a bespoke software implementation of the present invention.
  • the host software application may make use of library functions.
  • One such library function may be one that enables the last word on a line of text to be readily identified and/or copied.
  • Another such library function may be the ability to readjust lines of text to accommodate the insertion of a word or words at the beginning of an earlier line.
  • the host software application may be in the form of an internet browser application.
  • the host software application may be in the form of a word processing package.
  • the word-processing functions may allow for information, for example information concerning the relative position of a word, to be extracted in relation to words or group of words. For example, information that may be extracted may enable the software to ascertain whether a word is located at the end of a line or not.
  • the information that may be extracted may comprise display offset information that concerns the position relative to a display area of a word or words.
  • where the host software application is in the form of an internet browser and the word or words in the text are provided as tagged elements, the last word or words in a line of words may advantageously be identified by means of comparing a vertical offset value in respect of successive tagged text elements.
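The vertical-offset comparison can be sketched as follows. The input format is an assumption made for illustration: each tagged element is modelled as a `(word, vertical_offset)` pair in reading order, standing in for whatever offset value the browser reports. The function name `last_words_by_offset` is hypothetical.

```python
from typing import List, Tuple


def last_words_by_offset(elements: List[Tuple[str, int]]) -> List[str]:
    """Return the last word of each displayed line.

    `elements` holds (word, vertical offset) pairs in reading order.
    A word is taken to end a line when the next element has a
    different vertical offset; the final element always ends a line.
    """
    last_words: List[str] = []
    for (word, top), (_, next_top) in zip(elements, elements[1:]):
        if next_top != top:
            last_words.append(word)
    if elements:
        last_words.append(elements[-1][0])
    return last_words
```

For elements `[("The", 0), ("quick", 0), ("brown", 18), ("fox", 18), ("jumps", 36)]` the function returns `["quick", "fox", "jumps"]`. The approach relies on the browser having already laid the text out, so the available line width is taken into account implicitly.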
  • the word-processing functions may include the ability to select the last word on a line of a page, for example in the case where the host software application is in the form of a word processing package, and the last word or words in a line of words may be identified by the method by means of using this word-processing function.
  • the word-processing functions provided may include functions different from those found in conventional word-processing packages, especially when the host software application is not a word-processing package.
  • Such word-processing functions may for example include those functions that reposition subsequent words, when new text is inserted, in order to fit the lines of text into the available line width.
  • the way in which the method takes account of the available line width of a display area may be effected implicitly.
  • the word-processing package or browser may in any case ensure that a line of words is contained within the available line width and shift text at the end of one line to the beginning of the next as appropriate. Text may be automatically resized to be accommodated within the available width on a line.
  • the method may produce a formatted text data product in the form of a data file.
  • Such a data file may for example contain data representing the formatted text.
  • a modified method of the second aspect of the invention may be provided in the form of a method of providing a data file, the data file containing data representing formatted text for display on a display area having an available line width, the method comprising the following steps:
  • step (d) determining a second line of words after the first line of words, in dependence on the available line width of the display area, by inserting at the start of the second line a copy of the last word or words of the first line of words of the formatted text data product and thereafter populating the second line with a second sequence of words, the second sequence of words immediately following the first sequence of words in the source text, and (e) repeating step (c) but in respect of the second line of words, and repeating step (d) to determine a third line of words after the second line of words, creating a data file representing formatted text consisting of the first, second and third determined lines of text (for example in order, one after the other, and possibly concatenated with intervening line breaks), and
  • a computer system for converting a source text comprising a multiplicity of sentences into a formatted text data product in accordance with the first aspect of the invention or the second aspect of the invention.
  • the computer system may comprise a module for identifying words within a text data product.
  • the computer system may comprise a module for identifying the last word or words of a given line of words in a text data product.
  • the computer system may comprise a module for creating a formatted text data product comprising multiple lines of words, the lines having a given line width, such that, for the majority of pairs of adjacent lines of text, the last word or words of the first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • the computer system may comprise a module for outputting a formatted text data product.
  • a computer software product for programming a computer to create the computer system of the third aspect of the invention.
  • a computer software product may include at least a module for creating a formatted text data product comprising multiple lines of words, the lines having a given line width, such that, for the majority of pairs of adjacent lines of text, the last word or words of the first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • the computer system may for example already have one or more software application(s) already installed thereon, for example a host application as described above, providing one, and preferably all, of the following functions:
  • the host application may also provide a function in which individual lines of text may be identified and distinguished from each other.
  • a method of improving the ease of reading a literary work comprising a multiplicity of sentences each sentence comprising a multiplicity of words, wherein the method includes a step of creating a text product from the literary work by repeating one of the last word or words of the first line of a pair of adjacent lines as the first word or words of the second line of the pair of adjacent lines, for at least the majority of pairs of adjacent lines of text.
  • Figure 1 shows a paragraph of text
  • Figure 2 shows the paragraph of text of Figure 1 processed by a method according to a first embodiment of the invention
  • Figure 3a shows a paragraph of text
  • Figure 3b shows the paragraph of text of Figure 3a processed by a method according to a second embodiment of the invention
  • Figure 4 shows a flow-chart for a macro according to a third embodiment of the invention
  • Figure 5 shows a flow-chart for a web-browser plug-in according to a fourth embodiment of the invention for converting text within a web-page;
  • Figure 6a shows tagged html text prior to conversion by the browser plug-in of the fourth embodiment
  • Figure 6b shows how the text of Figure 6a is displayed to the user in the web browser
  • Figure 7a shows tagged html text after conversion by the browser plug-in of the fourth embodiment
  • Figure 7b shows how the text of Figure 7a is displayed to the user in the web browser
  • Figure 8 shows a flow-chart for a computer-implemented method according to a fifth embodiment of the invention for converting text
  • Figure 9 shows the paragraph of text of Figure 1 processed by a method according to a sixth embodiment of the invention.
  • Figure 10 shows the paragraph of text of Figure 1 processed by a method according to a seventh embodiment of the invention.
  • Figure 11 shows the paragraph of text of Figure 1 processed by a method according to an eighth embodiment of the invention.
  • the first embodiment of the invention relates to a system for converting printed text so that the last word of each line (apart from the last line in the paragraph) also appears as the first word of the following line.
  • the system is preferably implemented in a computer programmed with appropriate software.
  • Figure 1 shows some sentences of printed text in a paragraph.
  • Figure 2 shows the text of Figure 1 modified so that the last word of each line also appears as the first word of the following line.
  • the repeated words are in the same font, colour and general style as the words of the primary text (the original text in Figure 1) so that the repeated words are not isolated or otherwise distinguished from the primary text.
  • the repeated words can however assist readers with tracking when the eye moves to locate the following line.
  • the term "connected text" will be used in relation to the drawings to mean text that has been converted from a source text - or primary text - to produce a text in which the last word or words in each of one or more lines of the text are repeated as the first word or words on the next line.
  • the connected text is displayed/printed in landscape, not portrait, in Figure 2.
  • Certain embodiments of the invention have particular application to a text printed in landscape, which would normally present difficulties with tracking, as the eye of the reader has further to travel (laterally) to the next line, and the reader is more prone to losing his/her place in the text.
  • An advantage of landscape display/printing, which embodiments of this invention would render more practicable, is that there is less tracking to do, since there are fewer line breaks in a text, by virtue of the text being broader.
  • a 'compound-word-element' denotes a group of words like 'brother in law', where the group forms a commonly used unit of words and where to break the group up would create an obstacle to fluency and understanding.
  • the group of words is repeated as if it were one word.
  • the system that is used to convert a source text into a connected text has access to a dictionary of such compound word elements, so as to ascertain whether the last two or three words on a line belong to a compound-word-element.
  • the system may also ensure that a compound-word-element is not split across two lines in the connected text, if there is insufficient room at the end of the line for the compound-word-element, by pushing the compound-word-element and the preceding word onto the second of the two lines and repeating said preceding word at the end of the first line.
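The dictionary check for compound-word-elements might look like the sketch below. Only 'brother in law' comes from the description; the other dictionary entries, the name `trailing_unit`, and the punctuation-stripping rule are assumptions added for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical dictionary; only ("brother", "in", "law") appears in the source text.
COMPOUND_WORD_ELEMENTS: Set[Tuple[str, ...]] = {
    ("brother", "in", "law"),
    ("mother", "in", "law"),
    ("ice", "cream"),
}


def trailing_unit(line_words: List[str],
                  compounds: Set[Tuple[str, ...]] = COMPOUND_WORD_ELEMENTS) -> List[str]:
    """Return the word(s) to repeat on the next line.

    The last three words, then the last two, are checked against the
    dictionary; a match is repeated as if it were one word. Otherwise
    only the last word is returned.
    """
    for size in (3, 2):
        tail = line_words[-size:]
        key = tuple(w.lower().strip('.,;:!?"\'') for w in tail)
        if len(tail) == size and key in compounds:
            return tail
    return line_words[-1:]
```

For the line `"she met her brother in law"` the unit to repeat is `["brother", "in", "law"]`, whereas for `"the cat sat"` it is simply `["sat"]`.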
  • Figures 3a and 3b relate to a second embodiment, as a related variation of the first embodiment.
  • Figure 3a shows a source text 92, before having been formatted according to the second embodiment.
  • the last word on each line (apart from the last line in the paragraph) of the primary text 92 is repeated as an isolated word 94 in the left hand margin in the following line.
  • the repeated word 94 is isolated so that its inclusion can be seen to be additional to the primary text 92 and present to aid tracking.
  • the repeated word 94 can be displayed in such a way that its isolation from the primary text 92 is clear. In Figure 3b, this is achieved by indenting the primary text 92 with a sufficiently large margin so as to accommodate most if not all of the repeated words 94 in the left hand margin and leave a gap between the repeated word 94 and the following text on the same line.
  • the repeated words 94 are left justified in the left hand margin.
  • the repeated word 94 could also be displayed in a different colour, shade, style of font. In this case, the repeated word 94 is italicised. It will be clear that the vertical line 96 separating the repeated word and the primary text is merely to illustrate the position of the left hand margin and would normally not be displayed or visible to the user.
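The margin layout of the second embodiment can be approximated in plain text as sketched below. The function name `layout_with_margin` and the `gap` parameter are hypothetical; the sketch assumes monospaced output and non-empty lines, sizes the margin to hold every repeated word, and does not reproduce the italic styling.

```python
from typing import List


def layout_with_margin(lines: List[str], gap: int = 2) -> List[str]:
    """Place the last word of each line in the left-hand margin of the next line.

    The primary text is indented by a margin wide enough for the longest
    repeated word plus `gap` spaces; repeated words are left justified.
    Assumption: every line contains at least one word.
    """
    repeats = [""] + [line.split()[-1] for line in lines[:-1]]
    margin = max(len(r) for r in repeats) + gap
    return [r.ljust(margin) + line for r, line in zip(repeats, lines)]
```

Applied to `["the cat sat", "on the mat", "and purred"]`, the result is:

```
     the cat sat
sat  on the mat
mat  and purred
```

Unlike the first embodiment, the primary text keeps its original line breaks, because the repeated word occupies the margin rather than the line itself.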
  • the computer system used to produce the connected text of Figure 2 may be any suitable system that converts text, such that pairs of adjacent lines in the modified text are such that one of the last word or words of a first line of a pair of adjacent lines is repeated as the first word or words of the second line of the pair of adjacent lines.
  • both are add-ons to existing software applications.
  • the add-on in each case utilises inherent functions in the software application, which for example provide the ability to identify the last word or words in a line of words.
  • Figure 4 shows, according to a third embodiment, a flow chart of a macro (a computer program) that is designed for use within a word processing package, in this case the word processing package produced by Microsoft Corp under the name "Microsoft Office Word".
  • the macro is designed to produce a new document with the same formatting as the source document but converting the text into a connected text format.
  • First (step 10) the macro copies the source document and creates a new file.
  • the macro then performs the same loop of steps in respect of each successive line of text on screen. A check is performed to ensure that the line does not represent the end of the paragraph. This is performed by checking whether the last character (including non-printing characters and characters otherwise not visible to the user) on the line is a carriage return or a new paragraph mark. If the line in question is at the end of a paragraph then the macro proceeds to the next paragraph in the document, or if there are no further paragraphs, the macro ends (box 12).
  • the text unit at the end of the line is selected (box 14), using the corresponding commands in MS Word (in the same way as if a user with the cursor on the line in question presses the "END" key and then simultaneously presses "SHIFT", "CTRL” and "LEFT ARROW”).
  • the text unit selected will be a word, whereas in other cases it will not.
  • the character or characters selected may be one or more punctuation marks.
  • the macro checks whether the text unit is not a word unit and if so, then additionally selects the word unit immediately preceding the text unit (box 16) and intervening space characters. For example, if the two words and comma at the start of this sentence were the last two words on a line, the macro would initially select a single-character text unit consisting of the comma, and the macro would then additionally select the word
  • the text that is selected (with or without punctuation) is then copied and used to create a text insert string.
  • the text insert string is created (box 20) by adding a number of normal spaces (i.e. not non-breaking spaces) before the copied text equal to the free space at the end of the line.
  • the text insert string is then added at the end of the line (box 22).
  • the spaces effectively fill the space at the end of the line, ensuring that the first word on the next line (including punctuation, if any is present) is the same as the last word on the line before.
  • MS Word then adjusts subsequent lines automatically to ensure that each line of text does not exceed the available width, thus shifting words at the end of one line to the beginning of the next, as appropriate.
  • the process is then repeated for each line of each paragraph (apart from the last lines of each paragraph).
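The selection and padding steps of the macro (boxes 14 to 22) can be sketched outside MS Word as follows. This is a hedged illustration only: it does not use any Word API, it measures free space in characters rather than in rendered width, and the names `last_text_unit` and `build_insert` are hypothetical.

```python
import re


def last_text_unit(line):
    """Return the last word on a line plus any trailing punctuation."""
    # A word (letters, digits, apostrophes, hyphens) followed by any
    # non-word characters running up to the end of the line.
    match = re.search(r"\w[\w'-]*\W*$", line.rstrip())
    return match.group(0) if match else ""


def build_insert(line, width):
    """Build the text insert string: padding spaces followed by the copied text."""
    # Normal spaces fill the free space left on the line, so the copied
    # text cannot fit and is pushed onto the next line when text is re-flowed.
    free = max(width - len(line), 0)
    return " " * free + last_text_unit(line)


print(repr(build_insert("For example,", 16)))
```

When the line ends in punctuation, the regular expression keeps the preceding word and the punctuation together, which mirrors the additional selection step of box 16.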
  • the macro can include other optional processes, such as changing the font colour and background colour to aid readability. There may also be some extra formatting steps carried out to suit a particular display device. If for example, the text is to be outputted on an electronic reader device, such as Amazon's Kindle™ device, it may be necessary for the text to be displayed with certain pre-set margins and font-size for the repeated words in the connected text to display properly. It may also be necessary to format the start of a paragraph with the use of a single TAB character to be compatible with the default way in which paragraphs are formatted for the device.
  • FIG. 5 shows a flow chart illustrating the operation of an internet browser plug-in, according to a fourth embodiment.
  • the browser in this embodiment is Microsoft's Internet Explorer™, but the principles of operation apply equally to other commonly used browsers, such as for example Mozilla's FireFox™ or Google's Chrome™ browsers.
  • Such browsers are provided with rendering engines that facilitate the arrangement and display of text in a given window-size. If the window is resized, the rendering engine is able to re-display the text appropriately.
  • the plug-in uses html tags to process text to be displayed on screen. Tags are used in html to identify different elements of text, for example to display / format the tagged text in a certain way. Example text-formatting tags and their meanings are set out below: <B> </B> Bold text
  • the plug-in processes the paragraphs one-by-one.
  • the start of a paragraph is denoted by the <p> tag and the end with a </p> tag.
  • the plug-in starts (box 30) by converting the text within a paragraph into separately tagged words.
  • Each word is tagged with a <SPAN> tag, the <SPAN> tag being associated with a single identifier in the form of the string "ConnText" so that if text has already been processed by the plug-in it will not be inadvertently processed a second time.
  • Tagging each word in this manner also makes use of the browser's inherent features that enable the relative position of a tagged element on-screen to be determined.
  • the browser effectively automatically calculates a vertical offset parameter for each tagged element, the vertical offset parameter providing information concerning the relative positions vertically down the page/window of the text being displayed on-screen.
  • the plug-in compares successive pairs of tagged words (a step represented by box 34) until either the end of the paragraph is reached, which is signified by the tag </P>, or the vertical offset parameters for the pair of tagged words are different, thus indicating that the second word of the pair is on the line of text immediately after the first word of the pair (decision path 36). If the second word of a pair is on a new line, then (box 38) a manual line break <BR> is inserted after the first word of the pair and a copy of the first word (with the <SPAN> tag) is inserted after the line break and before the second word, so that the first word on the new line repeats the last word on the previous line.
  • the rendering engine recalculates the positions of the remaining text (shifting along and moving words that spill over onto new lines, as appropriate).
  • the vertical offset parameters associated with any words that are pushed onto the next line down are updated as appropriate.
  • the plug-in processes the next line of text, adds another duplicate word, then the rendering engine adjusts the subsequent text, and so on.
  • the plug-in can be of a size that is much less than might otherwise be the case, resulting in a relatively small file-size for the plug-in.
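The interplay between the plug-in and the rendering engine can be simulated in a few lines. The sketch below is an assumption-laden model, not browser code: `layout` stands in for the rendering engine and returns a line number per word (playing the role of the vertical offset parameter), widths are measured in characters, and both function names are hypothetical.

```python
def layout(tokens, width):
    """Stand-in for the rendering engine: return a line number for each token."""
    offsets, line, used = [], 0, 0
    for tok in tokens:
        need = len(tok["text"]) + (1 if used else 0)  # +1 for the joining space
        if used and used + need > width:
            line, used = line + 1, 0
            need = len(tok["text"])
        offsets.append(line)
        used += need
        if tok["br"]:  # a manual line break follows this token
            line, used = line + 1, 0
    return offsets


def connect(words, width):
    """Compare successive word pairs; on a line change, break and duplicate."""
    tokens = [{"text": w, "br": False} for w in words]
    i = 0
    while i + 1 < len(tokens):
        offsets = layout(tokens, width)  # the engine re-flows after every change
        if offsets[i] != offsets[i + 1]:
            tokens[i]["br"] = True
            tokens.insert(i + 1, {"text": tokens[i]["text"], "br": False})
        i += 1
    offsets = layout(tokens, width)
    lines = [[] for _ in range(offsets[-1] + 1)] if tokens else []
    for tok, off in zip(tokens, offsets):
        lines[off].append(tok["text"])
    return [" ".join(line) for line in lines]


print(connect("the cat sat on the mat today".split(), 11))
```

Re-running the layout after each insertion is deliberately naive; it mirrors the description, in which the engine recalculates positions after every duplicate word is added.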
  • FIGS. 6a to 7b show how html text is processed and displayed by means of the plug-in of the fourth embodiment.
  • Figure 6a shows some source html text
  • Figure 6b shows how this text is displayed in a window of the browser. It will be seen that the html text of Figure 6b includes paragraph tags, <p> and </p>, and some text formatting tags.
  • the text formatting tags in this example are underlining of the words "Dyslexia and reading" by means of <u> and </u> tags, emphasising with bold font the text "10% and 15%" with the <b> and </b> tags, and italicising the string " 'tracking' " with the <i> and </i> tags.
  • the format style 'display: block' is sometimes used in websites to denote captions or "asides" within a body of text and other content.
  • the duplicated word in this embodiment is provided within the main body of text, and is not positioned within the left-hand margin or otherwise emphasised as being a special duplicated word within the text.
  • Figure 8 shows, according to a fifth embodiment, a flow chart of a computer program, for converting a source text into a connected text format.
  • the program copies the content, including source text, from the source document and creates a new file into which the source text is inserted (with other content being associated with the text as appropriate).
  • the new file is structured so as to be formatted in a series of lines of text in dependence on character font size, character spacing and display width settings. Each line has a fixed length and is typically populated by one or more words from the source text.
  • the program performs the same loop of steps in respect of each successive line of text in the new document. If the line in question is at the end of a paragraph then the program proceeds to the next paragraph in the document, or if there are no further paragraphs, the program ends (box 112).
  • the program moves to the first (or next) valid line of text (box 114).
  • the program checks whether the text unit at the end of the line is punctuation (or otherwise not a word), and if so, copies the word unit immediately preceding the non-word text (box 116) up to and including the text at the end of the line. Otherwise, the program copies the text unit (the word) at the end of the line (box 118).
  • a line return ("<LR>") is then inserted at the end of the line (box 120) and the copied text unit (comprising the last word on the preceding line) is inserted at the beginning of the next line (box 122).
  • this line-by-line process is performed paragraph-by-paragraph for the whole document.
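A minimal sketch of this line-by-line conversion is given below. It is only one possible reading of the flow chart: Python's standard `textwrap` module stands in for the fixed-length line formatting, widths are counted in characters, paragraphs are assumed to be separated by blank lines, and the names `convert_paragraph` and `convert_document` are hypothetical.

```python
import textwrap


def convert_paragraph(text, width):
    """Fix one line at a time, carrying its last word onto the next line."""
    lines, rest = [], text
    while True:
        wrapped = textwrap.wrap(
            rest, width, break_long_words=False, break_on_hyphens=False
        )
        if len(wrapped) <= 1:
            # The last line of a paragraph is never repeated.
            return lines + wrapped
        first = wrapped[0]
        lines.append(first)
        words = first.split()
        # Assumption: a line holding a single word is not repeated, which
        # guarantees that the loop always makes progress.
        carry = [words[-1]] if len(words) > 1 else []
        rest = " ".join(carry + wrapped[1:])


def convert_document(text, width):
    """Apply the conversion paragraph by paragraph."""
    return [convert_paragraph(p, width) for p in text.split("\n\n") if p.strip()]


print(convert_document("the cat sat on the mat today\n\nhello world", 11))
```

Because words are split on whitespace, punctuation attached to the last word is carried along with it.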
  • Figure 9 shows the source text 2 shown in Figure 1 after having been formatted by a method according to a sixth embodiment.
  • the last word 204a, 206a on each line (apart from the last line in the paragraph) of the primary text 2 (the text from Figure 1) is inserted as a repeated word 204b, 206b on the following line.
  • the repeated words 204b, 206b that are inserted are the first words that appear on the lines concerned.
  • the words on each line after the repeated words 204b, 206b follow immediately with a spacing that matches the rest of the text.
  • the repeated words 204b, 206b are not isolated in a separate column (or margin) as in Figure 3b, but are provided in-line in the same paragraph as the rest of the text (i.e. with a spatial layout identical to the text product of Figure 2 - the first embodiment).
  • This sixth embodiment differs in that the repeated words 204b, 206b which appear as the first words on a line are presented in a colour (for example) different from the non-repeated words in the text (i.e. the rest of the text).
  • the words that follow each repeated word 204b, 206b on each line are all presented in the same colour (black for example) including the last words 204a, 206a on each line. Two distinct colours are used: blue for any repeated words (at the start of each line concerned) and black for the rest.
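The two-colour scheme of the sixth embodiment could be rendered as HTML along the following lines. This is a sketch under stated assumptions: the input is a paragraph already converted to connected text (so every line after the first begins with a repeated word), and the function name `colour_repeats` and the inline `style` attribute are illustrative choices rather than anything specified in the patent.

```python
from html import escape


def colour_repeats(lines, colour="blue"):
    """Wrap the first (repeated) word of every line after the first in a span."""
    out = []
    for i, words in enumerate(lines):
        cells = [escape(w) for w in words]
        if i > 0:
            cells[0] = f'<span style="color:{colour}">{cells[0]}</span>'
        out.append(" ".join(cells))
    # Manual line breaks keep the line structure fixed.
    return "<br>\n".join(out)


print(colour_repeats([["the", "cat", "sat"], ["sat", "on", "the"], ["the", "mat"]]))
```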
  • Figure 10 shows the source text 2 shown in Figure 1 after having been formatted by a method according to a seventh embodiment.
  • This seventh embodiment is similar to the sixth embodiment described above, so only the notable differences will now be described.
  • both the last word 304a, 305a, 306a on the first line of the pair and the repeated word 304b, 305b, 306b on the second line of the pair are coloured in the same colour. The colour changes from one pair to the next.
  • the last word 304a on the first line and the first (repeated) word 304b at the start of the second line are presented in a first colour, namely green.
  • the last word 305a on the second line and the first (repeated) word 305b at the start of the third line are presented in a second colour, namely blue.
  • the last word on the third line and the first (repeated) word at the start of the fourth line are presented in a third colour, namely red.
  • the colour cycle is then restarted such that the last word 306a on the fourth line and the first (repeated) word 306b (the word "is” in this example) at the start of the fifth line are presented in the first colour (green).
  • the words in the paragraph not being the repeated words 304b, 305b, 306b, or their associated paired last words on the lines are all presented in the same bulk colour (black for example). Four distinct colours are used: red, green and blue for the repeated words (at the start of each line concerned) and the associated paired last words and black for the rest.
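The pairwise colour cycle of the seventh embodiment can be expressed compactly. In the sketch below, which is illustrative only, each line is a list of words from an already connected paragraph, the result pairs every word with a colour name, and the function name `colour_pairs` is hypothetical.

```python
def colour_pairs(lines, cycle=("green", "blue", "red"), bulk="black"):
    """Colour each last word and its repeat alike, cycling colours per pair."""
    coloured = [[(w, bulk) for w in line] for line in lines]
    for k in range(len(lines) - 1):
        colour = cycle[k % len(cycle)]
        coloured[k][-1] = (lines[k][-1], colour)       # last word of line k
        coloured[k + 1][0] = (lines[k + 1][0], colour)  # its repeat on line k+1
    return coloured


demo = [["a", "b"], ["b", "c"], ["c", "d"], ["d", "e"], ["e", "f"]]
for row in colour_pairs(demo):
    print(row)
```

With three colours in the cycle, the fourth pair returns to green, matching the restart described above.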
  • Figure 11 shows the source text 2 shown in Figure 1 after having been formatted by a method according to an eighth embodiment.
  • This eighth embodiment is similar to the seventh embodiment described above.
  • both the last word 404a, 405a, 406a on the first line of the pair and the repeated word 404b, 405b, 406b on the second line of the pair are coloured in the same colour, the colour cycling through three different colours on successive lines (as in the seventh embodiment).
  • the words all preceding a last word 404a, 405a, 406a in the line which is added as a repeated word 404b, 405b, 406b at the start of a following line are in the same colour.
  • the colour of the text cycles through three different colours on successive lines such that the colour changes after the first (repeated) word on a line.
  • the colour of the first word is different from the following words on that line but the same as the words immediately preceding it on the previous line.
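One way to read the eighth embodiment is that every line has its own colour from the cycle, except that its first (repeated) word keeps the colour of the previous line. The sketch below encodes that reading; the function name `colour_runs` and the specific colour names are assumptions made for illustration.

```python
def colour_runs(lines, cycle=("green", "blue", "red")):
    """Colour whole lines, with the repeated first word in the previous line's colour."""
    out = []
    for k, line in enumerate(lines):
        own = cycle[k % len(cycle)]
        prev = cycle[(k - 1) % len(cycle)]
        out.append(
            [(w, prev if (k > 0 and j == 0) else own) for j, w in enumerate(line)]
        )
    return out


for row in colour_runs([["a", "b"], ["b", "c"], ["c", "d"]]):
    print(row)
```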
  • a tenth embodiment of the invention not illustrated concerns a computer system for converting text comprising a multiplicity of sentences using software to output a formatted text data product in which the text is formatted in a plurality of lines, wherein the software adds extra duplicate words so that the last word or words of each line also appears as the first word or words of the following line.
  • the software may be arranged to output text such that in certain lines the last group of words of a first line also appears as the first group of words of the following line.
  • the software may be arranged to output text such that in certain lines the last single word of each line also appears as the first word in the following line.
  • the formatted text data product may include a left-hand margin and a body of text comprising said multiplicity of sentences positioned to the right of the left-hand margin and the last word or words of a first line also appear as an isolated word or words within the left hand margin of the following line.
  • the software could instead be arranged to insert an appropriate (calculated) number of spaces after the first (duplicated) word at the beginning of a given line to give the appearance of a left hand margin, to the right of first (duplicated) word, so that the vast majority of words after the first (duplicated) word on each line are left justified in line with each other.
  • the system may further include a display device, wherein the formatted text data product is displayed on the display device.
  • the software may comprise a) a text input module for receiving input text in digital form, b) a text processing module that receives: i) the input text in digital form, and ii) output requirement data including font size data and page width data, and processes the input text to produce successive lines of text formatted using the font size data and page width data and to determine the last word or words on a line of text, then inserting such word or words on the next line of text, the successive lines of text thus defining the formatted text data product, and c) a text output module which outputs the formatted text data product produced by the text producing module.
  • the text producing module may analyse the last words of each line and determine whether the last words are part of a compound-word-element and, if not, insert the last word only on the next line and, if so, insert all of the last words that form the compound-word-element onto the next line.
  • the system may include an electronic data base of compound-word-elements to enable the text producing module to determine whether two or more words form a compound-word-element or not.
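A lookup against a database of compound-word-elements might be sketched as follows. The set `COMPOUNDS` is a hypothetical stand-in for the electronic database, its entries are invented examples, and the function name `words_to_repeat` is likewise an assumption.

```python
# Hypothetical stand-in for the database of compound-word-elements.
COMPOUNDS = {("brother", "in", "law"), ("New", "York")}


def words_to_repeat(line_words, compounds=COMPOUNDS):
    """Return the trailing words of a line that should be repeated as a group."""
    longest = max((len(c) for c in compounds), default=1)
    # Try the longest possible trailing group first, then shorter ones.
    for n in range(min(longest, len(line_words)), 1, -1):
        if tuple(line_words[-n:]) in compounds:
            return line_words[-n:]
    # No compound found: repeat the last word only.
    return line_words[-1:]


print(words_to_repeat(["my", "brother", "in", "law"]))
print(words_to_repeat(["the", "cat"]))
```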
  • a computer software product for use as the software of the computer system of the fourth embodiment.
  • a formatted text data product as produced by a computer system according to the fourth embodiment.
  • Other styles of text, such as table or list may be appropriately converted into connected text by the browser plug-in of the third embodiment.
  • Connected text could be provided from scratch without a source text.
  • a source text could be converted into connected text by means of a non-computerised system, for example with the use partly or wholly of human assistance.
  • the software of the embodiment may be configured so that the resizing of the window causes the software to process the text so displayed in the resized window to be re-processed, preferably automatically, thus ensuring that extra duplicate words are added so that the last word or words of each line also appears as the first word or words of the following line, but removing any duplicate words previously added (before resizing of the window) if the resizing of the window causes such previously added duplicate words to be displayed on the same line.
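Re-processing after a window resize can be sketched by keeping track of which words are duplicates. The code below takes the simplest route, which is an assumption rather than the patent's stated method: it removes every previously added duplicate and rebuilds the connected text at the new width. Widths are in characters, and `connect`, `reflow` and `render` are hypothetical names.

```python
def connect(words, width):
    """Greedy line fill; each new line starts with a flagged duplicate word."""
    lines, current = [], []
    for word in words:
        text = " ".join(w for w, _ in current)
        if current and len(text) + 1 + len(word) > width:
            lines.append(current)
            current = [(current[-1][0], True)]  # True marks an added duplicate
        current.append((word, False))
    if current:
        lines.append(current)
    return lines


def reflow(lines, new_width):
    """On resize: drop all added duplicates, then rebuild at the new width."""
    original = [w for line in lines for (w, is_dup) in line if not is_dup]
    return connect(original, new_width)


def render(lines):
    return [" ".join(w for w, _ in line) for line in lines]


narrow = connect("the cat sat on the mat today".split(), 11)
print(render(narrow))
print(render(reflow(narrow, 20)))
```

Flagging duplicates explicitly avoids mistaking a genuinely repeated word in the source text for one that was added.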
  • compound word elements may be duplicated as a group only if connected by special punctuation (for example non-breaking spaces or hyphens).
  • a dictionary of such compound words need not be required.
  • the phrase "brother in law" (the words "brother", "in" and "law" being separated by standard space characters), if it appears at the end of a line of text, would not be repeated as a group of words on the next line.
  • the software may simply deem the text in between two space characters, or between a space and a character or other indication indicating the end of a line or paragraph, as a word (even if the "word" so identified includes punctuation, is hyphenated or includes other non text characters).
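The dictionary-free variant can be illustrated by splitting only on the standard space character. In Python this means calling `split(" ")` explicitly, since `str.split()` with no argument would also split on non-breaking spaces. The function names below are hypothetical.

```python
def split_words(text):
    """Treat anything between standard spaces (U+0020) as one word."""
    # Non-breaking spaces (U+00A0) and hyphens stay inside the token.
    return [t for t in text.split(" ") if t]


def last_unit(line):
    """Return the unit that would be repeated on the next line."""
    words = split_words(line)
    return words[-1] if words else ""


print(last_unit("my brother in law"))       # standard spaces: last word only
print(last_unit("my brother-in-law"))       # hyphens: repeated as a group
print(repr(last_unit("visit New\u00a0York")))  # non-breaking space: a group
```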
  • the repeated words when repeated in the margin, could be right justified in the left hand margin, rather than left-justified.
  • the repeated words could be displayed in a different font / colour.
  • the embodiments could be applied to non-English text and could be applicable to foreign language text.
  • the direction of reading may be from right to left.
  • the last word of text on a line may therefore be the leftmost word, and therefore the duplication of that last word on the next line would therefore appear on the right-hand side of the page.
  • the words left and right can be deemed as swapped over when the invention is applied in the context of a language that is written from right to left.
  • the terms "left" and "right" may be replaced by "top" and "bottom" as appropriate.
  • the computer program may, in view of character spacing and page width (or display width information), divide the text into successive lines. Such lines may be stored as an array in data.
  • Source text may be converted into a formatted text product with repeated words, by cutting and pasting (electronically for example) words one at a time into a new document or file. Once a line is populated with words and the next word to be added will cause a new line to be required, a line return may be inserted at the end of the line, and the last word on the line copied and pasted as the first word on the next line. The new document may thus be populated line by line, word by word, in this manner.
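The word-at-a-time population described above maps naturally onto a greedy loop. The following is a minimal sketch, assuming widths are counted in characters and that the width comfortably exceeds the length of any two adjacent words; the function name `connect_lines` is hypothetical.

```python
def connect_lines(words, width):
    """Populate lines word by word, repeating the last word after each break."""
    lines, current = [], []
    for word in words:
        candidate = " ".join(current + [word])
        if current and len(candidate) > width:
            # The next word would overflow: end the line here and start the
            # new line with a copy of the line's last word.
            lines.append(" ".join(current))
            current = [current[-1], word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


print(connect_lines("the cat sat on the mat today".split(), 11))
```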

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

According to the invention, a formatted text product comprises text formatted such that the majority of pairs of adjacent lines of text have been modified such that one of the last word or words (304a, 305a, 306a) of the first line of a pair of adjacent lines is repeated as the first word or words (304b, 305b, 306b) of the second line of the pair of adjacent lines. A computer system and software are accordingly described which provide the ability to convert a source text into such a formatted text product.
PCT/GB2013/052405 2012-09-15 2013-09-13 Improving readability of text WO2014041365A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1216505.6 2012-09-15
GBGB1216505.6A GB201216505D0 (en) 2012-09-15 2012-09-15 Improving readability of text

Publications (1)

Publication Number Publication Date
WO2014041365A1 (fr) 2014-03-20

Family

ID=47144342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2013/052405 WO2014041365A1 (fr) 2012-09-15 2013-09-13 Improving readability of text

Country Status (2)

Country Link
GB (1) GB201216505D0 (fr)
WO (1) WO2014041365A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3611593A (en) 1969-11-14 1971-10-12 Harry S Shapiro Line-group sequential reading aid
WO2004023330A2 (fr) * 2002-09-05 2004-03-18 Vistaprint Technologies Limited System and method for identifying line breaks
US20070030502A1 (en) * 2005-08-02 2007-02-08 Van Cauwenberghe Jean-Thomas F Zebratext
US20110231755A1 (en) * 2008-07-14 2011-09-22 Daniel Herzner Method of formatting text in an electronic document to increase reading speed

Also Published As

Publication number Publication date
GB201216505D0 (en) 2012-10-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13783620

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13783620

Country of ref document: EP

Kind code of ref document: A1