MXPA00006111A

MXPA00006111A - Publication file conversion and display

Info

Publication number: MXPA00006111A
Application number: MXPA/A/2000/006111A
Authority: MX
Inventors: Michael W D Jones; Nicholas Geen; Mischka Hughes
Original assignee: Nicholas Geen; Mischka Hughes; Jones Michael William Dudleston
Priority date: 1997-12-19
Filing date: 2000-06-19
Publication date: 2002-03-26

Abstract

A computerized information display system extracts text data, lists of keywords, story rankings in order of story importance, and image maps identifying the location of stories from an input of publication files from a publisher. When a particular story is selected, the system can generate a simultaneous display of the page image in which the story appears side-by-side with the text of the story, so that a viewer can read the text while referring to the page image for visual cues about the text passage. The viewer can select a story from a displayed list of stories ranked in order of importance relative to other stories appearing on a page. The story rankings are derived by comparing one or more story importance indicators: location of the story on the page; size of type font of a headline associated with the story; size of type font associated with the story text; and size of text content for the story. The viewer can navigate to the text for a story on a displayed page by clicking in the story area on the page, which is linked by image maps to the corresponding text passage. The viewer can also navigate to a text passage and page image by clicking on a keyword from a list of keywords extracted from the text input from the publisher. These computerized contextual display and image navigation tools allow the viewer a highly interactive experience with the publication. They allow a publication to be converted to electronically viewable form frequently, e.g., several times per day, and in a more user-friendly form than the original printed copy.

Description

PUBLICATION FILE CONVERSION AND DISPLAY

FIELD OF THE INVENTION The present invention relates to a method and system for converting digital publication files into digital information and to the use of that information to generate a display on a computer system. Aspects of the invention relate to a system for displaying information, and more particularly to an information display system which provides simultaneous display of a graphic representation of a printed publication, or part of a publication, and text information that appears in the printed publication.

BACKGROUND OF THE INVENTION In today's society, particularly in the business community, there is a need to receive published information as quickly as possible. This is especially important for financial information. The desire to provide such information electronically has therefore spread rapidly in recent years. In the United Kingdom, there are a number of providers of news information delivered electronically for on-screen consumption or by other means. These can be divided into a number of categories: a) electronic text feeds of general and specific news articles and information, where the only structure consists of headings that detail news categories (for example, Press Association); b) electronic text feeds of news articles directed to specific market sectors (for example, Extel Finance); c) electronic text feeds (not in real time) that provide textual information contained in previously published material, serving primarily as a record and archive facility (for example, FT Profile). The common feature of these services is their emphasis on editorial quantity, leaving editorial and sub-editorial functions to the consumer. Essentially, they supply a raw material to be used by the customer base as one ingredient in the production of their own products, or as information for customers to filter and generate information for their own internal or external use. Thus, with this vast supply of raw information, with no relative importance attached to the individual news articles, the user is forced to filter through irrelevant and/or unimportant information to find what is required. In addition, the text feeds are in general objective rather than subjective. A further disadvantage of this method of supplying information is that only text information can be provided.
Although this text can be searched or processed, unlike a graphic or microfiche image of the publication, it contains less information than the publication. In particular, editorial information is lost. The above problems of the prior art information systems demonstrate the need for improvement. Specifically, there is a need for an information display system that can make use of information provided in publications such as newspapers and magazines in real time, thereby benefiting from the editorial experience of the editors. In addition, since a large amount of information can be obtained from the layout of the publication, this need can be better met by providing a simultaneous image of the actual publication together with the actual text in a clear and legible manner.

BRIEF DESCRIPTION OF THE INVENTION The present invention provides an on-screen information display system which uses both graphic images of pages of a printed publication and its text information. The present invention allows the simultaneous display of an image of the pages of a publication and text information. It is not enough simply to provide a legible image of the pages of the publication, as this only provides a microfiche representation. While this allows the user to read the text, the user does so at a level of detail that does not give the general perspective. That the user "cannot see the forest for the trees" is a realistic analogy. The purpose of providing a simultaneous image of the publication is to allow the user to interpret the editorial importance that has been attached to the articles, allowing the user to benefit from the editorial experience of the editors as well as having immediate access to the edited text. The present invention allows a user to select a text passage comprising an article or story on the displayed page of the publication, whereupon the system simultaneously displays the text of the passage adjacent to the image of the complete page of the publication. This allows the user to read the article clearly if desired. Because the page image of the publication is small, the text in it is unclear, and it is therefore highly advantageous to provide a clear copy of the text separately. Providing the text separately also allows additional advantages of the present invention, which include allowing identifying words, such as company names, to be clearly seen, for example highlighted. The present invention provides additional information about an identifying word, for example company information, which is displayed by selecting the identifying word.
Additional information, for example company reports, can then be displayed simultaneously with the image of the publication page. An additional feature of the present invention is that a list of the contents of the pages of the publication can be displayed, where the content list of each page is displayed so that the text passages (articles or stories) are listed in the order of importance which can be attached to them from the way in which they are laid out on the publication page by the editors. Thus, the content list for the publication provided by the present invention provides an easy means for a user to identify the important passages in the publication. When a particular passage is identified which the user wants to read, it can be selected and the text is displayed along with the image of the publication page from which the text is taken. According to a first embodiment, a computerized method is provided for generating a display of information from an input of publication files containing text, graphics and other information that can be viewed as page images of a publication having stories (text passages) and graphic images appearing in it, comprising the steps of: extracting text information from the publication files corresponding to stories that appear in the page images of the publication and storing it as text information files; processing page images from the publication files and storing them as page image files; mapping story areas for respective stories that appear on the page images, indexing each story area to a text information file corresponding to the text passage in the story area, and storing the mapped story areas as image map files; and generating a display on a computer system of a page image using the page image files, and linking the stories in the story areas of the displayed page images to the corresponding text information using the text information files and image map files.
According to a second embodiment, a computerized method is provided for generating a display of information from an input of publication files containing text, graphics and other information that can be viewed as page images of a publication having stories (text passages) and graphic images appearing in it, comprising the steps of: extracting text information from the publication files corresponding to stories that appear in the page images of the publication and storing it as text information files; analyzing the text information to find predetermined keywords that appear in it, indexing each keyword to a page number and a story number for the story corresponding to the text passage in which the keyword was found, and storing the indexed keywords in a list of keywords; processing page images from the publication files and storing them as page image files; and generating a display on a computer system of the list of keywords and, when a keyword is selected from the list, displaying the page image containing the story in which the selected keyword appears.
According to a third embodiment, a computerized method is provided for generating a display of information from an input of publication files containing text, graphics and other information that can be viewed as page images of a publication having stories (text passages) and graphic images appearing in it, comprising the steps of: extracting text information from the publication files corresponding to stories that appear in the page images of the publication and storing it as text information files; processing page images from the publication files and storing them as page image files; assigning to each story that appears on a page of the publication the page number on which the story appears and a story number rank corresponding to the importance of the story relative to other stories on the page; indexing the text information files to the page numbers and story number ranks for the corresponding stories that appear in the page images of the publication; and generating a display on a computer system of a page image using the page image files and a side-by-side display of a list of story titles for the stories that appear on the displayed page, sorted in the order of their assigned story number ranks.
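The image map linking described in the first embodiment can be illustrated with a minimal sketch. It assumes rectangular story areas in page-image pixel coordinates; the names `StoryArea`, `story_at` and the file names are hypothetical and are not taken from the described implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoryArea:
    """One mapped story area on a page image (hypothetical layout)."""
    page_no: int
    story_no: int
    left: int
    top: int
    right: int
    bottom: int
    text_file: str  # text information file indexed to this area

    def contains(self, x: int, y: int) -> bool:
        # Half-open rectangle test in page-image pixel coordinates.
        return self.left <= x < self.right and self.top <= y < self.bottom


def story_at(image_map: list[StoryArea], page_no: int, x: int, y: int) -> Optional[StoryArea]:
    """Return the story area under a click on the given page, if any."""
    for area in image_map:
        if area.page_no == page_no and area.contains(x, y):
            return area
    return None


# Illustrative image map for page 33 with two stacked story areas.
image_map = [
    StoryArea(33, 1, 0, 0, 400, 300, "p33_s1.txt"),
    StoryArea(33, 2, 0, 300, 400, 600, "p33_s2.txt"),
]

hit = story_at(image_map, 33, 120, 350)
if hit is not None:
    print(f"display {hit.text_file} beside the image of page {hit.page_no}")
```

A click that falls outside every story area returns `None`, in which case a viewer would simply keep the current display.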

DESCRIPTION OF THE DRAWINGS Figure 1 illustrates an exemplary embodiment of a system for implementing the present invention; Figure 2 illustrates a display generated during the operation of the system illustrated in Figure 1; Figure 3 illustrates another display generated by the system of Figure 1; Figure 4 illustrates an additional display generated by the system of Figure 1; Figure 5 illustrates another display generated by the system of Figure 1; Figure 6 illustrates an additional display generated by the system of Figure 1; Figure 7 illustrates another display generated by the system of Figure 1; Figure 8 shows an overall process flow for converting raw publisher data input into a simultaneous text/image display of a publication; Figure 9a is a flow diagram illustrating a method for converting the publication files provided by the publisher into an information structure; Figure 9b is a flowchart of an example of an importance-determining model for ordering a list of stories on a page by relative importance; Figure 10 illustrates the use of keyword lists to navigate to the text passage and page image containing the keywords; Figure 11 illustrates the use of image maps to navigate to the text passage from a page image containing the article.

DETAILED DESCRIPTION OF THE INVENTION Referring to the drawings and initially to Figure 1, an exemplary embodiment of a system for implementing the present invention is illustrated. Information is received from the publisher in electronic form by the central storage and processing unit 10. Although it is highly desirable that the information be received from the publisher in electronic form, this is not essential to the principle of the present invention. Any means of providing images of the publication and text information separately will suffice. Within the central storage and processing unit 10, portions of each page image which correspond to the text passages are defined, and the defined portions are correlated with the text passages. A content list is then generated for the text passages by selecting the headline of each text passage and arranging the headlines in the order of importance which can be attached to each text passage by studying the image of the publication page. For example, when an article has the largest headline on a page of a newspaper, it is clearly the most important story on that page. Similarly, if an article has the smallest headline, it is the least important story on that page and is thus placed at the end of the content list for that page. Once the content list is generated, it is stored for later use by the system. A detailed description of an exemplary procedure for this is provided below in the section entitled "Sorting Text Passages and Generating a Content List". The image received from the publisher or obtained from the publication may require visual quality improvement; therefore, in one embodiment of the present invention, the received image is processed to improve its definition and thus make it clearer when displayed. A detailed description of an exemplary procedure for this is provided below in the section entitled "Improvement of Visual Quality of Page Images".
Within the text information, there will be certain words, such as company names, which can serve as identifying words for which the central storage and processing unit 10 has additional information that can be made available to the user. Therefore, the text information that is received from the publisher is searched and compared with known identifying words such as company names. Identifying words that are found are then marked in the text and are also entered into an index, which is stored for later use by the system. A detailed description of an exemplary procedure for this is provided in the section entitled "Generation of a List of Identified Words". Further, within the page images and text information, there may be share price information from a variety of stock exchanges around the world, together with the price movements of those shares. These prices and price movements will be those that held at the time of publication of the newspaper. Within the central storage and processing unit 10 there is additional information about many such companies, including the current real-time price of their shares. Therefore, the text information and images received from the publisher are searched, the particular companies mentioned within the publication are identified, and the additional information and real-time price information held within the central storage and processing unit 10 can be made available to the user through the system. A detailed description of an exemplary procedure for this is provided in the section entitled "Identified Information Link with Additional Data" below.
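The search for identifying words, their marking in the text and their entry into an index can be sketched as follows. This is a minimal illustration under stated assumptions: stories are held in memory keyed by (page number, story number), matching is a case-insensitive whole-word search, and the function names, the `[[ ]]` markers and the company names are all hypothetical.

```python
import re
from collections import defaultdict


def build_keyword_index(stories, keywords):
    """Map each keyword to the (page_no, story_no) pairs in which it occurs.

    stories:  dict of (page_no, story_no) -> story text (assumed structure)
    keywords: list of known identifying words, e.g. company names
    """
    index = defaultdict(list)
    for (page_no, story_no), text in sorted(stories.items()):
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                index[keyword].append((page_no, story_no))
    return dict(index)


def mark_keywords(text, keywords, open_mark="[[", close_mark="]]"):
    """Wrap every occurrence of a keyword in markers so a viewer can highlight it."""
    if not keywords:
        return text
    # Longest keywords first so that longer names win over their prefixes.
    alternatives = "|".join(
        re.escape(k) for k in sorted(keywords, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)


# Illustrative data only; the company names are invented.
stories = {
    (33, 1): "Acme Corp shares rose.",
    (33, 2): "Bonds fell; Globex and Acme Corp slipped.",
    (34, 1): "Weather outlook for the weekend.",
}
keywords = ["Acme Corp", "Globex"]

index = build_keyword_index(stories, keywords)
marked = mark_keywords(stories[(33, 2)], keywords)
print(index)
print(marked)
```

Selecting a keyword in a displayed list would then look up its first (page number, story number) entry and display that story's text beside the corresponding page image.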
Thus, the information available in the central storage and processing unit 10 is: a series of images of pages of the publication; the text corresponding to the articles or passages in the publication, which text passages have been correlated with particular portions of the images; a content list of the text passages listed in order of importance for each page; an identifying word index that identifies the words in the text, for example company names, for which additional information is available; and additional information about the identifying words, for example company prospectuses or statistical information, along with real-time share price information and any other information about the companies within the newspaper's market price pages. This series of steps is performed to compile this information for each publication. Thus, in the case of a newspaper for which there are several editions per day, this procedure must be carried out for each edition as quickly as possible in order to make the information available to users without delay. The central storage and processing unit 10 can then communicate the stored information over a communication link 20a and 20b to a single user or to groups of users 30 such as a financial institution. In Figure 1, the communication link is a high-speed ISDN telephone line. However, any form of communication, such as internet, cable, satellite or radio, can be used. Typically, in such an institution where a plurality of users require the information, each user is provided with a personal computer or terminal 40 which is connected via a local area network (LAN) to a central processor, for example a file server 50, which receives the information via the communication link 20a and 20b from the central storage and processing unit 10.
A detailed description of a specific implementation of this is provided in the section entitled "Output of Information System Displays to Users". Thus, each personal computer or terminal 40 has access to all the information available from the central storage and processing unit 10 at the remote location 60. The central storage and processing unit temporarily stores the information in a memory as a digital information structure before transmission to a remote location, and each personal computer or terminal 40 stores the information in a memory as a digital information structure on reception. Each personal computer or terminal 40 comprises a central processing unit 41, a memory 42, a display 43 and an input device 44 such as a keyboard and/or a pointing device such as a mouse or trackball. Referring to Figure 2, a display is shown which is generated when the embodiment of Figure 1 is in operation. In the right half of the screen there is a page preview of a page of the publication, and the page number (page 33) is indicated, as well as the title, in the upper left part of the display. On the left side of the screen there is an expanded content list for the pages of the publication, listed by page number, and for each page number the articles are listed in order of importance. The content list can be scrolled vertically or horizontally, and the next and previous pages of the publication can be selected for display in the page preview on the right side of the display, although in this figure there is no previous page since the publication has no pages before page 33. Icons are provided at the top of the screen to allow the image of the previous page or of the next page to be displayed. Each icon is part of a control item, the other part being a link to the page image file of the next or previous page of the publication, respectively.
Selecting an icon activates the control item, which uses the link to retrieve and display the corresponding page image file. The display of the content list is chosen by selecting the contents option in the upper left part of the display by moving the cursor and pressing the mouse button, that is, "clicking" on that icon. This activates a control item, and the link to the content list file is used to retrieve and display the content list. It is also possible to select an article on a page for display by moving the cursor to the article listed in the contents and clicking on it. This activates a control item, and the text file is retrieved through a link so that the text of the article is displayed in the left part of the display while the image of the page on which the article appears is displayed in the right part of the display. Referring to Figure 3, in this display the article headed "German Animate By Actions and Bonuses" has been selected by moving the cursor to the corresponding portion of the image and clicking on it. The image portion is then highlighted by a colored border, or indicated by a web browser with an "active" icon, for example a pointing finger like the one used in the Netscape Navigator (TM) browser, while on the left side of the screen the text of the article is displayed. The displayed text can be scrolled vertically or horizontally in a conventional manner. In the upper left part of the screen, icons are provided to allow the selection of either the previous or the next story. Each icon is part of a control item, the other part being a link to the text file of the previous or next story, respectively. Selecting the icon activates the control item, which uses the link to retrieve and display the corresponding text file. In the display of Figure 3, there is no previous story, since the selected story or article is the first of the publication.
Within the story or article, references to companies may appear. When such references occur, they are highlighted in the text, and a user can choose to view additional information about that company by moving the cursor to the highlighted text, which acts as an identifying word, and clicking on it. The highlighted text (identifying word) acts as an icon and is part of a control item, the other part of the control item being a link to the additional information. Clicking on the identifying word activates the control item and causes the additional information to be retrieved and displayed in at least the left part of the screen. That additional information may be, for example, a company prospectus or a company report. Figures 2 and 3 also show that a "find" icon is available in the lower left part of the display. With this, it is possible to enter a text string which the user wishes to find within the text of the publication: the text string is entered in the string entry field and the "find" icon is activated. Once the text string is found within the text, the article in which it appears is displayed on the left side of the display, along with the page on which it appears on the right side of the display. The text string within the article is highlighted. The display in this embodiment of the present invention also provides the ability to select a company index. This is provided in the lower left corner of the screen as a "company" icon. This icon is part of a control item, the other part being a link to a company index. Selecting the icon activates the control item, which uses the link to retrieve and display the company index. When the icon is selected, the display of Figure 4 is generated. In Figure 4, in the left half of the screen, an index of the companies referred to in the publication is provided.
By moving the cursor to a particular company name and clicking on it, the text in which the first mention of the company name occurs is displayed on the left side, and the associated page of the publication is displayed on the right side of the display. When there are a number of editions per day, the company index can indicate next to a company name the number of the edition during the day in which that company is mentioned. This gives additional information about the number of times a company is mentioned in the editions throughout the day and thus gives an indication of the importance of the activities involving that company. Figure 5 illustrates a display of financial information in the publication. On the publication page a financial sector can be selected, and under that sector the financial information about the companies can be displayed. The available financial information may exceed what is available in the publication, since additional financial information can be obtained from other sources and collected in the central storage and processing unit to make it available to users. Figure 6 illustrates a further display in which the text on the left side of the display not only includes highlighted company names but also includes a processed image originating from the image portion in the page preview on the right side of the display under the heading "Good Relationships Stumble on Iraqi Shockwaves". The processed image of the graphic can be manipulated by the user. Information additional to that available from the publication may be included in such processed images. Such additional information may be available from alternative sources and may be combined within the central storage and processing unit 10.
Figure 7 illustrates a further display in which additional information, beyond that available in the publication, is selected and displayed. In the page preview on the right side of the display, there is an advertisement for a computer manufacturer. When the cursor is moved to this portion of the image and clicked, additional information consisting of further advertising information is displayed on the left side of the display. When the option to request additional information is selected, the software switches from the current application to another application that contains the additional information required. Such additional information may take any form, such as graphic, textual and video information, thus enabling the present invention to operate as a multimedia software system. Thus, the information display system of the present invention, by providing both a graphic image of a publication and textual information, acts as a gateway through the publication to a broad set of additional information which may be made available to the user via the central storage and processing unit 10. A specific implementation of the information display system of the present invention for a particular example of an electronic publication is described in detail below. For the described implementation, Figure 8 shows an overall procedure for converting raw publisher input into a simultaneous text/image display of a publication. The conversion procedure includes the steps of extracting text information and related graphic images and processing page images (indicated in blocks 81, 82, 83) from the raw publisher input (80), generating a content list (84) and a list of company names (85), creating the simultaneous text/image displays of the publication (86, 87, 88, 89), and providing the information display system as an output for servers/users on a network (90).
It is understood that the invention is not limited to the described implementation and that it can be implemented in any equivalent manner using the described principles of the invention.

Sorting Text Passages and Generating a Content List The information display system requires two basic types of input: text information and page images of a publication. Typically, the publisher provides new information for an edition of a publication in electronic, digital form, for example as publication files such as Quark XPress (TM) files or as PDF (Portable Document Format) files used in readers and page layout systems offered by Adobe Systems, Inc. of Boston, Massachusetts. The text is extracted from the Quark XPress (TM) or PDF files using the built-in functionality of the page layout program and classified as information entries for storage in and retrieval from a database as digital text information files. The text of each story has a corresponding digital text information file. Page images can be created from Quark XPress (TM) files by first producing EPS (Encapsulated PostScript) files, or from PDF files by first converting them to EPS files. Each page image is stored as a digital page image file. The processing of the publication files to create page image files can be automated as described in the section "Improvement of Visual Quality of Page Images". The publication files are in a format suitable for editing or printing the publication document. The publication document consists of a number of pages, each of which contains one or more stories. Each story has at least a headline and a text portion and may also have an associated figure. A representation of each page in the published document is produced from the publication files and stored as a digital page image file, as described in that section. Each page image file is associated with the page of the publication which it depicts and can be used to reproduce the image of the page on a visual display unit. Each page image file can be a bitmap of a page of the publication.
The publication files are also processed to extract, for each page, the stories which are on that page and, for each of those stories, the headline, text portion and any figure associated with that story. This extraction procedure can be achieved in three ways. According to a first method, the publication files contain additional format information which identifies where each story is placed on each page and where each story begins and ends; where each story headline begins and ends and the font size of the text used within the headline; where the body of text that constitutes the story begins and ends and the font size of that text; and where a figure associated with the story is placed on the page. This format information cannot be seen in the image of the published document, but it describes or controls the format of the published document. A digital processor operates on the digital publication files to extract this additional format information and create information files for each story, which include: a header text file containing information that identifies at least the content of the headline text and the font size of the headline; a story text file that contains the text of the story and information that identifies the font size of the text; a figure file containing sufficient information to reproduce a figure associated with the story, such as a bitmap image of the figure; a figure position file indicating where the associated figure is placed in relation to the text; and a story position file indicating the boundary of the story on the page image. A second method can be used in the absence of additional format information in the publication files. In this method, format information is derived from the publication files by digital processing. The first stage is the determination of the number of separate stories on a page of the publication.
This is achieved by using the formatting that divides individual stories, which can be blank lines or margins, for example, to identify the boundary of each story. After identifying the number of stories on the page, the processing takes each story in turn and produces for it information files that include: a story position file, a header text file, a story text file and, if appropriate, a figure file and a figure position file. The story position file is produced by identifying the boundary of the story within the page image. The header text file is produced by identifying the text within the boundary of the story which has the largest font size. The header text file stores the headline text and information that identifies its font size. The processing can then assign any remaining text within the body of the story to the story text file and also store information that identifies the font size of that text. The processing can then identify figures within the story boundary and create a figure file containing a bitmap image of the figure and a figure position file that stores information identifying where the figure is placed within the story. The processing then repeats the same process for each of the stories on the page and for each page within the publication. According to a third method, an operator creates the information files, which include the header text file, the story text file, the story position file and any figure files and figure position files, by selecting areas of the publication image displayed on a visual display unit using a cursor control device. The story boundary on the page image is selected first, and this information is stored in the story position file. The operator then selects the story headline and creates a header text file, which stores the text information and information that identifies the font size of the text.
The operator then selects any figure in the story; the bitmap image of the figure is stored in a figure file and the position of the figure within the story is stored in a figure position file. Digital processing is used to store, in a story text file, the remaining text within the story boundary that has not previously been selected, together with information that identifies the font size of the text.
An information structure is now produced, which links together the different components of the publication, including the information files and the page image files. A RECORD is created for each story on each page. Each RECORD has a one-to-one correspondence with a story on a page. Each RECORD contains a number of fields, which associate the RECORD with the information files and page image files of its corresponding story. The first field is a POINTER to the corresponding header text file, the second field is a POINTER to the corresponding story text file, the third field is a POINTER to any figure file associated with the corresponding story, the fourth field is a POINTER to the figure position file associated with the corresponding story and the fifth field is a POINTER to the corresponding story position file. Consequently, the digital processing parses the publication into pages and from there into stories, and each story into its component items such as headings, text portions and figures. This produces an information structure consisting of a plurality of information files, page image files and RECORDS, which link the components of the publication and from which the publication can be recreated in different electronic formats.
The RECORDS are now indexed. Each RECORD is indexed by a page number (page no) and a story number (story no).
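As an illustration of the structure just described, the following sketch models one RECORD with its five POINTER fields and builds an index keyed by page number and story number. The class name, field names and the use of file paths as POINTERS are assumptions made for the example; the description does not prescribe a storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoryRecord:
    """One RECORD per story; the file names stand in for the POINTER fields."""
    page_no: int
    story_no: int
    header_text_file: str                 # field 1
    story_text_file: str                  # field 2
    figure_file: Optional[str]            # field 3 (None if the story has no figure)
    figure_position_file: Optional[str]   # field 4
    story_position_file: str              # field 5


def build_index(records):
    """Index RECORDS by (page_no, story_no); the pair must be unique."""
    index = {}
    for rec in records:
        key = (rec.page_no, rec.story_no)
        if key in index:
            raise ValueError(f"duplicate record for page {key[0]}, story {key[1]}")
        index[key] = rec
    return index
```

A lookup such as `index[(33, 1)]` then returns the RECORD whose fields lead to every file belonging to that story.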
For a particular RECORD, the page number indicates the page of the publication on which the corresponding story appears and the story number identifies the corresponding story among the other stories on that page. Consequently, the combination of story number and page number uniquely identifies each RECORD and its corresponding story. The story number (story no) is used not only to identify a story on a page, but also to indicate the importance of a story compared to the other stories on the page. The most important story on a page is assigned story number 1, with the value of the story number increasing as the importance of the story decreases. The story number can be assigned based on an operator's judgment or through digital processing.
Each RECORD contains fields that hold POINTERS to information files containing all the information associated with a story. The RECORDS corresponding to the stories on a particular page can be processed to determine the relative importance of the stories on that page. For each of the RECORDS, the processing has access to the associated header text file, story text file and story position file. From these files, the processing can determine, for the stories on the page, their positions in relation to each other, their header font sizes in relation to each other and their story text font sizes in relation to each other. Based on this information, the processing can order the stories by relative importance. Generally, any story that continues from a previous page is given the greatest relative importance and the remaining stories are ranked according to the font size of their headings, with any two stories that have the same heading font size being differentiated based on the position of the story within the page and the font size of the text in the body of the story.
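The general ranking rule above can be sketched as a sort key. The order of the tie-breakers (higher on the page first, then further left, then larger body font) is an assumption chosen for illustration; the description only states that position and body font size are used to separate stories with equal heading sizes. The `StoryLayout` type and its field names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoryLayout:
    """Hypothetical per-story layout facts taken from the information files."""
    title: str
    header_font_size: float
    body_font_size: float
    top: float    # distance of the story boundary from the top of the page
    left: float   # distance of the story boundary from the left edge
    continued_from_previous_page: bool = False


def rank_stories(stories):
    """Return (story_no, story) pairs, with story number 1 the most important."""
    def importance_key(s):
        return (
            not s.continued_from_previous_page,  # continued stories sort first
            -s.header_font_size,                 # then larger headings
            s.top,                               # assumed tie-break: higher on page
            s.left,                              # assumed tie-break: further left
            -s.body_font_size,                   # assumed tie-break: larger body text
        )
    ordered = sorted(stories, key=importance_key)
    return list(enumerate(ordered, start=1))
```

A publication with a different editorial style would swap in a different key function, which is the point made about using different models for different publications.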
It will be appreciated that the model used to weigh the relative importance of the different types of format information will depend on the particular editorial style of the publication and that a different model, with different weights applied to the different types of format information, can be used for different publications. A flow chart of an example of a suitable model for determining the relative importance of a story within a page and for creating a list of the stories on a page sorted by relative importance is shown in Figure 9b. The procedure for creating the information structure, including extraction of the information files and page image files and creation of the RECORDS, is illustrated in Figure 9a and in steps 81, 82 and 83 of Figure 8.
Once all the stories have been indexed through RECORDS, the information structure is processed by digital processing to produce output files, or an output signal, which can be used by an end user to access the information stored within the information structure, and therefore within the publication, and to display that information on a visual display unit (VDU). The end user will be able to view accurate representations of the pages of the publication via the page image files. The end user will also be able to view the text of each story in a clear form via the story text files. In addition, the VDU being used by the end user will display a series of icons, which can be selected using a pointing device such as a mouse. By selecting an icon, the end user can navigate through the publication. According to one example, the digital processing processes the information structure and produces output in an HTML format suitable for use in end-user browser software such as Netscape Navigator (TM) or Internet Explorer (TM).
The processing transforms the information structure into code which, on an end-user machine, produces an electronic publication having selectable control items. Each control item comprises a visual symbol on the VDU of the end-user machine, such as a word or an icon, and a link from the visual symbol to other information. In HTML this can be achieved by creating an anchor and a hyperlink. Activating the visual symbol using a pointing device accesses the other information and allows it to be displayed on the VDU. Consequently, when an end user loads the code into a computer, a display is produced as illustrated in Figures 2 to 7, having a page preview produced from the page image files, a clear text portion produced from the information files and a number of icons for navigating through the publication, produced by the processing of the information structure. These icons include previous/next story icons, previous/next page icons and a contents icon.
The previous/next page icons allow the end user to move through the pages of the publication. If the next page icon is selected, the page image file associated with the page that follows the one currently displayed is loaded for viewing by the end user. If the previous page icon is selected, the page image file associated with the preceding page of the publication is loaded for viewing by the end user.
The previous/next story icons allow the end user to navigate through the stories on a particular page. When the next story icon is selected, the story with the next lower level of importance on the page is displayed in clear text format. This is equivalent to accessing the story corresponding to the RECORD that has the same page number but the next higher story number than the RECORD corresponding to the story currently displayed on the screen.
When the previous story icon is selected, the story with the next higher level of importance on the page is displayed in clear text format. This is equivalent to accessing the story corresponding to the RECORD that has the same page number but the next lower story number than the RECORD corresponding to the story currently displayed.
Selecting the table of contents icon displays an ordered list of titles. This is equivalent to sorting the RECORDS first by their associated page number, ordering those RECORDS with the same page number by their story number, and then accessing the header text file for each story through the first field of each RECORD and displaying the list of headings in the same order as the RECORDS. Consequently, a table of contents can be produced, as illustrated in Figure 2, which lists the titles for each page of the publication, ordered according to their relative importance. Each title on the table of contents page is an anchor for an interactive link to the story in a clear text format and/or a page view format.
When a particular page image file is loaded, the end user can place the cursor over a particular story in the page image and select that story. The story text file associated with the selected story is then loaded and the story is displayed in a clear text format, as illustrated in Figure 3. When the selection is made, both the page number associated with the page image file currently displayed and the location of the cursor within the page image are known. Displaying the selected story is equivalent to examining the RECORDS to select the one that is associated with the correct page number and has a POINTER in its fifth field to the story position file defining the area in which the cursor is placed when the selection is made, and then displaying the text information and other information of the selected RECORD on the VDU.
Once the text information has been extracted and a story number and page number have been assigned to the text passages, a contents list for the publication, and a series of links from each of the entries in the contents list to the corresponding page image file, can be generated using digital processing. Each entry in the list is one part of a control item; the other part is a link to the page image file representing the page on which the header entry appears. Selecting an entry in the contents list by pointing and clicking on the header entry activates the control item, which uses the link to retrieve and display the image of the page on which the header appears. At the output of the information display system, a contents list can be displayed on the left side of the screen simultaneously with a page image on the right side of the screen (see Figure 2), to act as a guide for users to the content of the current page and of the previous and next pages of the publication. Each entry in the list is a selectable "icon" that forms a link to the page on which the entry appears.
As shown in Figure 11, the text passages (text stories) are linked by image maps IMx to the areas A-i of the corresponding stories in the page images. Each of the story areas A-i in the page images is similar to an icon, since it is one part of a control item, the other part being a link to the corresponding text passage (story). Clicking on a story area activates a control item, which uses the link to retrieve and display the text file corresponding to the story in the selected story area.
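The contents ordering and the previous/next story navigation described above can be sketched as follows. Records are reduced here to `(page_no, story_no, headline)` tuples, and the link target pattern `p33s1.html` is a made-up naming scheme for the example; only the sort order and the anchor/hyperlink idea come from the description.

```python
from html import escape


def contents_html(records):
    """Build an HTML contents list ordered by page number, then story number."""
    lines = []
    current_page = None
    for page_no, story_no, headline in sorted(records, key=lambda r: (r[0], r[1])):
        if page_no != current_page:
            if current_page is not None:
                lines.append("</ul>")
            lines.append(f"<h2>Page {page_no}</h2>")
            lines.append("<ul>")
            current_page = page_no
        href = f"p{page_no}s{story_no}.html"  # hypothetical link target
        lines.append(f'<li><a href="{href}">{escape(headline)}</a></li>')
    if current_page is not None:
        lines.append("</ul>")
    return "\n".join(lines)


def adjacent_story(records, page_no, story_no, step):
    """Return the story number reached by next (+1) or previous (-1), or None."""
    numbers = sorted(s for p, s, _ in records if p == page_no)
    position = numbers.index(story_no) + step
    if 0 <= position < len(numbers):
        return numbers[position]
    return None
```

Because story numbers encode importance, stepping by +1 moves to the next less important story on the same page and stepping by -1 moves back toward the lead story.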
At the output of the information display system, the text passage can be displayed on the left side of the screen simultaneously with the page image, with the story highlighted, on the right side of the screen (see Figure 3), allowing users to view the text in detail and interact with any links in it, alongside the contextual and editorial cues provided by the page image. Image maps are used to make the story area act as a selectable icon or button; that is, a user requests a text passage by clicking on the story area and the text passage is retrieved by following its link in the associated image map for the page image. Image maps can be created using mapping software, for example the so-called Web Map, which is available as shareware, and are stored as digital image map files. Typically, an operator draws a rectangle or other shape over the processed image, which links the pixels within the shape of the image map to a page number and a story number in the database. This can be done by indexing the text file to the pixel group using a corresponding file naming convention, for example a suffix of "P1S2" for the text file corresponding to the article area outlined on page #1 as story #2. The text files are read into the database, which stores the coordinates of the pixels contained in the map file with the record for that story. This is done using the file name to identify the corresponding record in the database, in this case the text record for page #1, story #2. A field in the database is updated to contain the indexed information.
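A minimal sketch of the image map lookup follows, assuming rectangular story areas and names that end in the "P1S2" suffix mentioned above. The `StoryArea` type, the pixel coordinate convention and the function names are assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Matches a trailing suffix such as "P1S2" (page 1, story 2).
NAME_RE = re.compile(r"P(\d+)S(\d+)$")


def parse_story_name(name):
    """Extract (page_no, story_no) from a name ending in the PnSm suffix."""
    match = NAME_RE.search(name)
    if match is None:
        raise ValueError(f"no PnSm suffix in {name!r}")
    return int(match.group(1)), int(match.group(2))


@dataclass
class StoryArea:
    """Hypothetical rectangular image map area, in page image pixels."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def story_at(areas, page_no, x, y):
    """Return (page_no, story_no) for the area under the cursor, or None."""
    for area in areas:
        page, story = parse_story_name(area.name)
        if page == page_no and area.contains(x, y):
            return page, story
    return None
```

The returned pair is the same key used to index the story records, so a click on the page image resolves directly to the text passage to display.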

The procedure for converting the input information may include the extraction of other figures and graphics that appear in the page images, whether related to the text passages or independent, such as comic strips, advertisements and other graphics, which it may be desirable to display in their own right simultaneously with the page images. Graphic images are extracted from the EPS or PDF files into individual graphics files using standard graphics editing tools, for example the Adobe Illustrator (TM) system. Graphics related to the stories, such as a photograph of a person in a story or a columnist's note, are indexed in the database to the stories by page number and story number. In addition to extracting the PostScript images in the manner described above, sufficient quality can also be obtained by using "screen-over-printer dumps" of the same collating files and separating the bitmapped components. This can be achieved, for example, using the Adobe Photoshop (TM) system. The independent graphics can be linked to their positions in the page images using a control item and the image mapping described above. At the output of the information display system, the story-related graphic images that appear in the page images can be retrieved, manipulated and displayed on the left side of the screen in a window adjacent to the text passage (see Figure 6). Independent graphic images can be displayed on the left side of the screen in response to a mouse click, or they can be used to trigger an external retrieval procedure that results in the display of a linked graphics file, such as an advertisement (the "Dell" logo linked to the advertisement in Figure 7), an externally retrieved output (the updated stock chart in Figure 5) or the display of an associated text passage. The extracted text information, contents list, image maps and extracted graphic images are stored in the database together with the processed page images.
The database thus contains an ordered, structured and transformed version of all the related graphic and text components, linked to their positions in the page images.

Generation of a List of Identifying Words

A list of important identifying words that appear on the publication pages can also be generated from the extracted text information. Important identifying words may include the names of companies, important persons, well-known products, media programs, etc., that are reported in the publication. At the output of the information display system, a list of the company names reported in the publication can be displayed on the left side of the screen (see Figure 4). Clicking on or entering a selected company name results in a display, on the right side of the screen, of the page image with the story in which the company name appears highlighted, and of the corresponding text passage on the left side of the screen (see Figure 6). Similarly, displaying a text passage with the names of important companies highlighted in it allows the user to click on the highlighted name or word and request a further display of additional information about the company. Keywords are often marked in the text by the publisher, for example using special typefaces such as bold for company names or italics for author names or publication references. This marking in the text constitutes format information and provides a convenient way to identify keywords in the publisher's input. For example, the names of companies in the input from the publisher can be marked by means of bold font tags. In this way, a list of company names can be generated using digital processing to parse the digital text information and extract the names delimited by the bold tags into a company index file. The company names in the list are then indexed to the page numbers and story numbers where they appear in the page images, as well as to their positions in the text as delimited by the bold tags in the text passages.
Each keyword text position is consequently indexed to a page number and a story number, and a link is formed between the text position and the story (text and/or image) in which the keyword appears. The text position and the link form a control item, which is activated by clicking on the text position. Activating the control item causes the story to be retrieved and displayed. Indexing company names to their text positions allows them to be highlighted in text displays and defined, during the system creation stage, as control items that have anchors for interactive links to additional information. The resulting company index file is stored in the database for the simultaneous text/image information display system. Company names can also be added to a library list of company names, which is built up over time. In this way, extensive lists of keywords can be developed and used for alternative methods of automated keyword parsing. An alternative method for generating the list of important identifying words (keywords) is to use digital processing to find text strings in the extracted digital text information that match entries in the stored library lists of known company names, names of important people, product names, media names, etc. Library lists can be updated from processed electronic files and/or through manual entry by an operator when a new keyword is recognized. When important identifying words are found in the text passages, the digital processing adds the names to the keyword list, indexes the names to their page numbers and story numbers and to the positions of the words in the text passages, and creates a link between the name and the story in which the name appears. Keywords can also be added to the keyword list manually by the operator. Keyword lists serve as a powerful navigation method for the stories covered in the simultaneous text/image information display system.
Figure 10 illustrates the selection of a keyword in a list of keywords to navigate to the text passage and page image containing the keyword. Image maps also provide the ability to navigate between stories on a page and request the corresponding text passages by clicking on the mapped areas of the stories.
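The bold-tag approach to building the company index can be sketched as below. The sketch assumes the publisher marks company names with HTML-style `<b>...</b>` tags, which is an assumption; the description only says that bold font tags delimit the names. The function name and the shape of the input and output are also illustrative.

```python
import re
from collections import defaultdict

# Assumption: bold text is delimited by <b>...</b> in the publisher's text.
BOLD_RE = re.compile(r"<b>(.*?)</b>", re.IGNORECASE | re.DOTALL)


def index_bold_keywords(stories):
    """Map each bold-tagged name to (page_no, story_no, text position) entries.

    `stories` maps (page_no, story_no) to the tagged text of that story.
    The position is the offset of the name within the tagged text.
    """
    index = defaultdict(list)
    for (page_no, story_no), text in sorted(stories.items()):
        for match in BOLD_RE.finditer(text):
            name = " ".join(match.group(1).split())  # normalise whitespace
            if name:
                index[name].append((page_no, story_no, match.start(1)))
    return dict(index)
```

Each entry in the resulting index holds what a keyword control item needs: the story to retrieve and the text position to highlight. Matching against a stored library list, the alternative method described above, would replace the regular expression with a search for known names.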

Improved Visual Quality of Page Images

In conjunction with the above, the page images are produced by processing the Encapsulated PostScript files from the publisher's input files to form bitmapped page images at 72 dpi, or at any other resolution suitable for the intended output medium. The Internet, for example, generally requires images in GIF (Graphics Interchange Format), where file sizes are minimized to improve download speed. Optimized palettes are also used to minimize file size and increase visual quality. Image files are manipulated using bitmap processing software, such as Adobe Photoshop (TM) or DeBabelizer (TM) software, to produce page images that are visually enhanced and/or compressed so as to be of reasonably acceptable quality and small in file size. Scripts can be written to batch process EPS or PDF files into appropriate page image files in a fully automated procedure. These scripts list a series of routines commonly used in image manipulation software such as Photoshop (TM) or DeBabelizer (TM) software.

Linking Identified Information with Additional Data

Additional information is stored in the database by indexing it to its original page number/story number. "Regular" features, for example where the same page/story is always written by the same author and can include a picture of the author, can be added automatically as a default in the database. Others are identified by an operator, who can use a menu of stored regular features or can insert the name manually. The naming convention "P1S2G3" can typically be used to indicate graphic #3 connected to page #1, story #2. A typical figure or graphic image may appear somewhere before the main body of the text. Its position is indicated in the database by a number, which instructs the database to output the link to this item after the correspondingly numbered text item. When a figure/graphic element is to appear within the main body of the text, a convention of "[n]" (a number inside square brackets) may be used for the graphic number, instructing the output stage of the database to replace this sequence with the full, longer form of the graphic name. This is designed to avoid the operator errors that arise from typing longer sequences of characters than necessary in these manual operations. Links to external information sources, that is, sources external to the original publication, are typically achieved by linking to a predetermined series of hooks in the database. For example, a share price can be obtained for a company identified by the keyword indexing procedure, using the company's official name or stock exchange symbol stored in the database. After looking up the unique identifier in the database, the system performs a share price lookup with an external information source and returns the retrieved share price for use in the display system.
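The "[n]" shorthand can be illustrated with a short sketch that expands each placeholder into the longer "P1S2G3" form at output time. The function names are hypothetical, and the sketch assumes that every bracketed number in the story text is a graphic placeholder.

```python
import re

# A number inside square brackets, e.g. "[3]".
PLACEHOLDER_RE = re.compile(r"\[(\d+)\]")


def graphic_name(page_no, story_no, graphic_no):
    """Build the long-form name, e.g. page 1, story 2, graphic 3 -> 'P1S2G3'."""
    return f"P{page_no}S{story_no}G{graphic_no}"


def expand_graphic_placeholders(text, page_no, story_no):
    """Replace each '[n]' in a story's text with its full graphic name."""
    return PLACEHOLDER_RE.sub(
        lambda m: graphic_name(page_no, story_no, int(m.group(1))), text
    )
```

The operator only types the short bracketed number; the page and story numbers are supplied by the record being output, which is what keeps the manual step short.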

Output of the Information Display System to the User

When the conversion of the publisher's input information into the database has been completed, a software routine in the display system creates a sequence of files containing the desired sequence and style of displays, links to both internal and external information and other interactive functions for the information display system, as illustrated in Annex 1. The links between story areas and related graphic images, text passages, keywords in the text passages and image maps for the page images, defined in the information conversion stage, are used to define display buttons, highlighted stories, highlighted words and linked displays in the display creation stage. Figures 2-6 show examples, in accordance with the invention, of text passages displayed simultaneously with the page images, which provide contextual cues for the text passages to the user. The resulting processed files constitute a digital information structure that can be viewed using web browser software, such as Netscape Navigator (TM) or Microsoft Internet Explorer (TM), on a file server running server software such as Novell (TM) Netware, Appleshare (TM) or Windows NT (TM) Server. The digital information structure can also be uploaded to a web server running Netscape (TM) Server, Microsoft (TM) Information Server or Apache (TM) server software. Once in the database, given the structuring of the information as described, the created files can be converted to a digital information structure in one of several possible formats and stored in a memory. The web-viewable files (the digital information structure) are transmitted from the memory of the processing unit 10 to a server using appropriate transmission software, which first identifies each file as either new or unchanged since a previous transmission. The web-viewable files (the digital information structure) are stored in a memory on the server.
If the files have changed, they are compressed into a single file and transmitted over ISDN, PSTN or a leased line to the receiving server. The receiving server unpacks the compressed file into its components and copies them to a suitable location on the user's server. This method is used for efficiency when multiple destination types may be required. It does not matter whether the user's server is a true file server or a web server. In an information display system configured for an intranet, as shown in Figure 1, each user is provided with a personal computer linked to a central file server, which provides the necessary information. The information display system can also be configured as a server for the Internet, which a universe of users and server nodes can access.
The information display system of the present invention can be modified and expanded in other ways. For example, since the text information is extracted from the publisher's input and maintained in the system's database, it can easily be examined by any search engine to find target stories, names and references and to retrieve the publication pages that contain them. The processed information, in the form of stories prioritized by importance and keyword lists, can be used to help perform high-quality searches with high efficiency. In this way, published editions can be converted into an information resource that is fully accessible and searchable by external users. The processing of the publisher's input into the system output files and the creation of links between text passages, keywords, graphics and page images can be developed further into fully automated processing. Batch processing scripts can be developed to automatically extract text information, graphic images and keywords, generate image maps and update system library files.
The stories can be tagged in the database in such a way that the advertisements served by the system change as different stories are selected. This would allow advertising opportunities to be tailored by associating different types of story with different advertisements. The processed information obtained in the present invention can also be used in other ways to provide additional advantages. For example, the image maps that define the story areas for the page images can be used with the original PDF files to provide enhanced features. The image map can be laid over the PDF file itself and, with a click, allow the simultaneous display of a chosen text, similar to the display output described previously. Additionally, the PDF file can retain the zoom capabilities inherent in the file reader software. Clicking on the story area of an image map can be used to trigger an internal process, such as zooming in or out on a page view, or an external process, such as connecting to a database of related supporting information.
It should be understood that the foregoing description of the present invention is illustrative only. Although some examples of the present invention have been described in detail, the principles of the present invention can be adapted to different variations without departing from the spirit of the invention.

LEGENDS OF FIGURES

Figure 2: PREVIOUS STORY, NEXT STORY, CONTENTS, PREVIOUS PAGE, NEXT PAGE. PAGE PREVIEW. PAGE NUMBER (33). EVENING STANDARD BUSINESS DAY. Page 33 • German ovation for stocks and bonds • £90m BAe concentrate on the Jetstream deal • The Japanese put £520m of faith in Scotland. Page 35 • Everything looks good in Laporte • Exco in the money • The new acquisition of Guthrie's. Page 36/37 • New engine to reverse actions • How Al made his August with the lottery • Take your free color map and guide yourself today in the City. Page 38 • High pressure zone on the Atlantic • Everything is possible for the technological revolution. CONTENTS PAGE (A MOUSE CLICK ON AN ARTICLE DISPLAYS THE COMPLETE STORY AND THE PAGE PREVIEW ON THE RIGHT). COMPANY (COMPANY INDEX). FIND (FIND TEXT STRING IN ANY STORY). FIND: STRING ENTRY FIELD.

Figure 3: PREVIOUS STORY, NEXT STORY, CONTENTS, PREVIOUS PAGE, NEXT PAGE. PAGE PREVIEW (CURSOR ROLLOVER ACTIVATES THE GREEN BORDER TO SHOW THE AVAILABLE CONTENT; A MOUSE CLICK GOES TO THAT CONTENT AND DISPLAYS IT IN THE SCROLLING FIELD ON THE LEFT). PAGE NUMBER (33). EVENING STANDARD BUSINESS DAY. German ovation for stocks and bonds. By Angus McCrone. The stock and bond markets rebounded thanks to figures suggesting that one of the demons of 1994, the uncontrollable German money supply, is coming back under control. Exceeding expectations, the figures for the German M3 money supply helped the FT-SE 100 index of leading shares to rise modestly, after falling 41.8 points yesterday.

Figure 4: PREVIOUS STORY, NEXT STORY, CONTENTS, PREVIOUS PAGE, NEXT PAGE. PAGE PREVIEW. PAGE NUMBER (34). EVENING STANDARD BUSINESS DAY. Companies referred to in this edition. COMPANY INDEX (A MOUSE CLICK ON A COMPANY NAME GOES TO THE FIRST REFERENCE TO THAT COMPANY IN THE EDITION AND DISPLAYS THE PAGE PREVIEW ON THE RIGHT AND THE STORY, WITH THE COMPANY REFERENCE HIGHLIGHTED, IN THIS FIELD). (COMPANY NAME INSERTED IN THE "FIND" FIELD TO CONTINUE THE SEARCH).

Figure 5: PREVIOUS STORY, NEXT STORY, CONTENTS, PREVIOUS PAGE, NEXT PAGE. PAGE PREVIEW. PAGE NUMBER (34). EVENING STANDARD BUSINESS DAY. LIST OF SECTORS (A MOUSE CLICK ON A SECTOR DISPLAYS THE SECTOR LIST IN THE BOTTOM FIELD): Electrical Engineering and Metal; Food, Hotels; Health, Home; Insurance; Leisure; Media. Find. FIND (NAME OF COMPANY IN ANY SECTOR): BOC. RESULT DISPLAYED IN LOWER FIELD.

Figure 6: PREVIOUS STORY, NEXT STORY, CONTENTS, PREVIOUS PAGE, NEXT PAGE. PAGE PREVIEW. PAGE NUMBER (39). EVENING STANDARD BUSINESS DAY. Stanley has also presented a negative trend in stocks, cutting its forecast earnings to 46p from 51p per share for 1996 and to 41p from 55p for 1997, on a forecast decline in sales of the anti-ulcer drug Zantac. They have cut their sales estimate to £2.056 billion from £2.278 billion for 1996 and to £1.7 billion for 1997, against previous expectations of £2.2 billion.

Figure 7: PREVIOUS STORY, NEXT STORY, CONTENTS, PREVIOUS PAGE, NEXT PAGE. PAGE PREVIEW. PAGE NUMBER (40). EVENING STANDARD BUSINESS DAY. QUESTION: "WHERE THE DEVIL CAN I GET A PC WITHOUT PAYING AN ARM AND A LEG?" ANSWER

Claims

NOVELTY OF THE INVENTION CLAIMS

1. - A computerized system for generating an information display from an input of publication files containing text and non-text material in a plurality of content areas that appear in page images of a publication, by extracting text data (block 81, figure 8) from the publication files corresponding to the text material appearing in the text-containing content areas of the page images of the publication, and processing the page images (block 83, figure 8) from the publication files as page image data, characterized in that the improvement comprises: a computer-operated module/component for identifying the content areas containing the text material as individual fields in the page images (figure 11); a computer-operated module/component for indexing each individual field of a text-containing content area to the extracted text data corresponding to the text material in that content area (figure 11), said indexing generating content mapping data; a computer-operated module/component for generating a display on a computer system of a page image of the publication from the page image data; a computer-operated module/component for receiving an input from a user viewing the display of the page image to select a particular field of a text-containing content area that appears in the displayed image; a computer-operated module/component for using said content mapping data to retrieve the extracted text data corresponding to the text material in the text-containing content area (block 87, figure 8); and a computer-operated module/component for displaying the extracted text data as a readable text display simultaneously with the display of the page image (figure 3), whereby the user is able to navigate each page image of the publication and select a particular field in order to display the text contained in it in readable form while viewing the page image for contextual cues from the graphic presentation of the corresponding text-containing content area.

2. A computerized system for generating an information display from an input of publication files containing text and non-text material in a plurality of content areas that appear in page images of a publication, by extracting text data from the publication files corresponding to the text material that appears in the text-containing content areas of the page images of the publication (block 81, figure 8), and processing the page images of the publication files as page image data (block 83, figure 8), wherein the improvement comprises: a computer-operated module/component for maintaining a library list of predetermined keywords that have potential significance for users; a computer-operated module/component for parsing the extracted text data to find any matches with the predetermined keywords from said library that appear in it; a computer-operated module/component for indexing the text data containing each matched keyword to a respective page number and content area corresponding to the text material in which the matched keyword is located, said indexing generating a list of keywords and associated mapping data (block 85, figure 8); a computer-operated module/component for generating a display on a computer system of the list of keywords (figure 4); a computer-operated module/component for receiving a user input to select a keyword from the keyword list; and using the keyword list mapping data to retrieve the extracted text data corresponding to the text material containing the selected keyword and/or the page image of the publication containing the content area in which the selected keyword appears (figure 10).

3. A computerized system for generating an information display from an input of publication files containing text and non-text material in a plurality of content areas that appear in page images of a publication, by extracting text data from the publication files corresponding to the text material that appears in the text-containing content areas of the page images of the publication (block 81, figure 8), and processing the page images of the publication files as page image data (block 83, figure 8), wherein the improvement comprises: a computer-operated module/component for assigning to each text-containing content area that appears on a page image of the publication a page number and a content area number that corresponds to a ranking of the relative importance of the content area with respect to the other content areas contained in the page image (figure 9b); a computer-operated module/component for indexing the extracted text data to the page numbers and content area numbers corresponding to the text-containing content areas in the page images of the publication in which the respective text material appears, said indexing generating a content list and associated mapping data (block 84, figure 8); a computer-operated module/component for generating a display on a computer system of the content list (figure 2); a computer-operated module/component for receiving an input from a user to select a particular text-containing content area from the content list; and a computer-operated module/component for using said content list mapping data to retrieve the extracted text data corresponding to the text material in the text-containing content area and/or the page image of the publication containing the content area in which the selected text material appears (figure 2).

4. The system according to claim 2, further characterized in that the matched keywords are indexed automatically by performing a text string search of the extracted text data based on the text string entries contained in the library list of keywords.

5. The system according to claim 3, further characterized in that the ranking of the relative importance of the text-containing content areas is based on any of the following group of importance indicators: location of the content area on the page image; font size of a headline associated with the content area; font size associated with the text material in the content area; and length of the text material in the content area.

6. The system according to claim 2 or 3, further characterized in that said keyword list mapping data or content list mapping data is used to generate a simultaneous display of the text material corresponding to the text-containing content areas along with the page images on which the text material appears.

7. The system according to any of claims 1 to 3, further characterized by comprising mapping graphic content areas containing respective graphic images that appear in the page images, and a computer-operated module/component for indexing each mapped graphic content area to a page number of the page on which it appears and a graphic content area number as content mapping data.

8. The system according to claim 7, further characterized in that the content mapping data is used to retrieve stored or externally obtained data related to the contents of the respective content areas when the respective content areas are selected by the user.

SUMMARY OF THE INVENTION

A computerized information display system extracts text data, keyword lists, story rankings in order of story importance, and image maps that identify the location of stories from an input of publication files from a publisher. The system can generate a simultaneous display of a page image in which a story appears side by side with the text of the story when a particular story is selected, so that a user can read the text while referring to the page image for visual cues about the text passage. The user can select a story from a displayed list of stories ranked in order of importance relative to other stories that appear on a page. The story rankings are derived by comparing one or more story importance indicators: location of the story on the page; font size of a headline associated with the story; font size associated with the story text; and size of the text content of the story. The user can navigate to the text of a story on a displayed page by clicking in the story area on the page, which is linked by image maps to the corresponding text passage. The user can also navigate to a text passage and page image by clicking on a keyword from a list of keywords extracted from the text input from the publisher. These computerized contextual display and image navigation tools allow the user a highly interactive experience with the publication. They allow a publication to be converted frequently into an electronically viewable form, for example several times a day, and in a form more user-friendly than the original printed copy.
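
The three navigation aids described above (a ranked content list, image-map selection of a story area, and a keyword index) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the implementation disclosed in the specification: the names ContentArea, importance, build_content_list, area_at and keyword_index, the rectangular box representation of a content area, and the numeric weights in the scoring function are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class ContentArea:
    page: int            # page number in the publication
    box: tuple           # (left, top, right, bottom) in page-image pixels (assumed)
    headline_pt: float   # font size of the headline
    body_pt: float       # font size of the story text
    text: str            # extracted text data for this area


def importance(area, page_height):
    """Combine the four importance indicators into one score.

    The weights are arbitrary assumptions; the source only names the
    indicators (location, headline size, text size, text length).
    """
    position = 1.0 - area.box[1] / page_height  # nearer the top scores higher
    return (2.0 * area.headline_pt + area.body_pt
            + 10.0 * position + 0.01 * len(area.text))


def build_content_list(areas, page_height):
    """Assign each area a (page number, content area number) by rank per page."""
    content_list = []
    for page in sorted({a.page for a in areas}):
        on_page = [a for a in areas if a.page == page]
        on_page.sort(key=lambda a: importance(a, page_height), reverse=True)
        for number, area in enumerate(on_page, start=1):
            content_list.append((page, number, area))
    return content_list


def area_at(content_list, page, x, y):
    """Image-map lookup: return (page, area number) for a click, or None."""
    for p, number, area in content_list:
        left, top, right, bottom = area.box
        if p == page and left <= x < right and top <= y < bottom:
            return (p, number)
    return None


def keyword_index(content_list, library):
    """Map each library keyword to the (page, area number) pairs containing it.

    Uses a plain case-insensitive substring search, so "rain" would also
    match "training"; a real system would likely match on word boundaries.
    """
    index = {}
    for page, number, area in content_list:
        lowered = area.text.lower()
        for keyword in library:
            if keyword.lower() in lowered:
                index.setdefault(keyword, []).append((page, number))
    return index
```

In this sketch the content area number doubles as the rank of the story on its page, so the same (page number, content area number) pair serves as the key for the content list, the image map, and the keyword list.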