WO2001057786A1 - Automatic conversion of static documents into dynamic documents - Google Patents

Automatic conversion of static documents into dynamic documents

Info

Publication number
WO2001057786A1
WO2001057786A1 (PCT/US2001/003557)
Authority
WO
WIPO (PCT)
Prior art keywords
document
component
static
static document
documents
Prior art date
Application number
PCT/US2001/003557
Other languages
English (en)
Inventor
Su Chen
Hong Dong
Siamak Khoubyari
Original Assignee
Scansoft, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Scansoft, Inc. filed Critical Scansoft, Inc.
Priority to EP01908813A priority Critical patent/EP1252603A1/fr
Publication of WO2001057786A1 publication Critical patent/WO2001057786A1/fr

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/00127 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture
    • H04N1/00326 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a data reading, recognizing or recording apparatus, e.g. with a bar-code apparatus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/00127 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture
    • H04N1/00326 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a data reading, recognizing or recording apparatus, e.g. with a bar-code apparatus
    • H04N1/00328 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a data reading, recognizing or recording apparatus, e.g. with a bar-code apparatus with an apparatus processing optically-read information
    • H04N1/00331 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a data reading, recognizing or recording apparatus, e.g. with a bar-code apparatus with an apparatus performing optical character recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/00127 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture
    • H04N1/00204 Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a digital computer or a digital computer system, e.g. an internet server

Definitions

  • the present invention relates generally to document image processing. More specifically, the present invention relates to pattern recognition, logical structure recognition, image understanding, and multi-modal document presentation.
  • OCR: Optical Character Recognition.
  • Textual processing deals with the text components of a document image.
  • Textual processing typically determines the skew, or tilt, at which the document may have been scanned into the computer, and finds columns, paragraphs, text lines, and words.
  • Graphics processing deals with the graphics, or non-textual components, of a document image, determining line size, thickness, and direction, and the existence of corners and curves.
  • The contents of a document that has been scanned to put it in electronic form are generally static in nature. More particularly, when the document is first scanned, the result is a monolithic image.
  • Even after recognition processing divides the contents of the document into individual elements, e.g., words, lines, etc., that can be edited by the user, these elements are independent of one another, and hence the document retains its static quality.
  • The document is considered to be "dynamic" in the sense that various components of the document interact with one another through "metadata," which is essentially information that describes the components of the data and their relationships. For instance, a user can activate an embedded link at one portion of the document, such as an entry in a table of contents, to immediately view another portion of the document.
  • A markup language is a type of language in which dynamic documents are created using such metadata.
  • The increasing popularity of the Internet has accelerated the desire for documents in such a format, since it permits users to easily navigate within a single document, as well as among multiple documents, and is able to provide functionality within documents, such as animation.
  • HTML: Hypertext Markup Language.
  • GUIs: Graphical User Interfaces.
  • HTML editors are available that allow a user to design, create, and publish a web page in much the same manner as they would use a familiar word processing program.
  • Some popular HTML editors include FrontPage® by Microsoft Corporation, Netscape Navigator® Gold by Netscape Inc., HoTMetaL Pro® by SoftQuad, and Hot Dog by Anna Wave.
  • With these editors, users may incorporate much of the functionality of HTML code without specific knowledge of the code itself. For example, users may create documents with a hierarchical structure, create tables, import graphics, or create hyperlinks by using the proper commands provided within the GUI.
  • Although HTML editors incorporate much functionality, presently available HTML editors do not automatically recognize the structure of a pre-existing static document and create a correspondingly structured HTML document. To the contrary, presently available HTML editors require the user to either input the structure or create it. For example, in the creation of tables in common HTML editors, the user must manually create the table. In the case of hierarchical structure, the user must indicate the heading level of the text being typed within the HTML editor, for example, heading level one, heading level two, and so on.
  • While some HTML editors allow a user to import graphics from an external device such as a scanner, most of these applications have no technique for recognizing textual characters or logical structure.
  • Typical OCR applications, while providing a technique for recognizing textual characters, do not provide a method for recognizing document structure.
  • Applications have recently been developed which recognize some structure within documents. For example, one technique uses template-based publishing to allow optimization between hard copies and electronic copies of the same document. In this technique, similar templates are used for both types of documents; however, differences are provided which allow for optimization in each of the respective media in which they will be published.
  • For example, font information in an electronic document template may include information such as color, whereas the template for the hard-copy, printed version may not provide color information if the printer used to print the document is not able to print in color.
  • In this manner, each of the templates may be optimized for the medium in which the corresponding document is viewed.
  • the template used to print a hard copy of a document will maintain an aspect ratio corresponding to the aspect ratio of the paper size on which it is to be printed.
  • the template used to create an electronic document will be optimized for the aspect ratio and color combination of the screen on which it is to be displayed.
  • Further modifications to the electronic template may make a document HTML compatible.
  • a font style field in a hard copy template may be modified to include hypertext linking information from the text displayed within that field. In this manner, various text field information may be mapped to different text field information for different types of document displays.
  • A drawback of template-based electronic publishing is that it works well only when the source documents follow a consistent structure.
  • The Open Document Architecture (ODA) also utilizes logical structure, which derives the hierarchy of a document by breaking the document down into the sections intended by the original author. Examples of this include paragraphs, distinctions between normal and emphasized text, grouping of figures with captions, position of information within tables, and address information.
  • Another structure analyzed with the ODA is referencing structure, which provides information for hypertext links. This includes information such as table of contents entries, references to figure numbers within the text of a document, references to various sections of text within the text, and references to network addresses. In this manner, ODA may provide various linking relationships between different pages of a document or different sections of a page.
  • In accordance with the present invention, a dynamic document may be created from a scanned or other static document image without the user needing any specific knowledge of the structure of the document; an automated process, transparent to the user, creates the dynamic document without requiring the user to know the rules for structuring the document.
  • To this end, the present invention provides a system and method that allows a user to scan or otherwise load a static document image, partition the document into zones corresponding to various types of information within a document image, perform optical character recognition (OCR) on the document image, outline the document image by performing logical structure recognition on the structure of the document image, and export a converted document in which structural and hierarchical elements have been recognized.
  • Figure 1 is an exemplary computer system in which the system and method of the present invention may be employed.
  • Figure 2 is a flow chart of a method performed by the present invention, according to one embodiment.
  • Figure 3 is a flow chart of a method performed by a logical structure recognition component, in accordance with one embodiment of the present invention.
  • Figure 4 is a flow chart of the operation of a logical component identification engine, in accordance with one embodiment of the present invention.
  • Figure 5 is a flow chart of the operation of a heading level assignment component, in accordance with one embodiment of the present invention.
  • Figure 6 is a flow chart of the operation of a cross-reference identification component, in accordance with one embodiment of the present invention.
  • Figure 7 is an illustration of a first example of the user interface of one embodiment of the present invention.
  • Figure 8 is an illustration of another example of the user interface of one embodiment of the present invention.
  • Figure 9 is an illustration of a further example of the user interface of one embodiment of the present invention.
  • Figure 10 is an illustration of an additional example of the user interface of one embodiment of the present invention.
  • Figure 11 is an illustration of a first portion of an HTML document, displayed in a web browser, rendered by one embodiment of the present invention.
  • Figure 12 is an illustration of a second portion of an HTML document, displayed in a web browser, rendered by one embodiment of the present invention.
  • Figure 13 is an illustration of a third portion of an HTML document, displayed in a web browser, rendered by one embodiment of the present invention.
  • An exemplary computer system of the type in which the present invention can be employed is illustrated in block diagram form in Figure 1.
  • the structure of the computer itself does not form part of the present invention. It is briefly described here for subsequent understanding of the manner in which the features of the invention cooperate with the structure of the computer.
  • the computer system includes a computer 100 having a variety of peripheral devices 108 connected thereto.
  • The computer 100 includes a central processing unit 112, a main memory which is typically implemented in the form of a random access memory (RAM) 118, a static memory that can comprise a read only memory (ROM) 120, and a permanent storage device, such as a magnetic or optical disk 122.
  • the CPU 112 communicates with each of these forms of memory through an internal bus 114.
  • the peripheral devices 108 include a data entry device, such as a keyboard 124, and a pointing or cursor control device 102, such as a mouse, trackball, or the like.
  • A display device 104, such as a CRT monitor or an LCD screen, provides a visual display of the information that is being processed within the computer, such as the contents of a document.
  • a hard copy of this information can be provided through printer 106, or similar device.
  • Hard copies of documents from other sources can be scanned into the computer to form a document image by way of a scanning device 107, such as a scanner or digital camera.
  • Alternatively, a static document that already exists in electronic form, e.g., an MS Word file, WordPerfect file, PostScript document, PDF file, etc., can be loaded into the computer directly.
  • Each of these external peripheral devices communicates with CPU 112 by means of one or more input/output ports 110 on the computer.
  • the input/output ports 110 also allow the computer 100 to interact with an external network 128, such as a local area network (LAN) or wide area network (WAN), or the Internet 130.
  • Computer 100 typically includes an operating system, which controls the allocation and usage of the hardware resources such as memory, central processing unit time, disc space, and peripheral devices. In addition to an operating system, computer 100 may also execute a variety of software applications, thereby adding functionality for the computer user. Computer software applications may reside as a component of the operating system, or may be stored in memory or on any type of machine readable medium, such as disk 122. The software application which performs the automatic conversion of static paper documents into dynamic documents is further described herein in connection with one of the preferred embodiments of the present invention.
  • the software application of the present invention performs a method which converts static documents into dynamic documents.
  • these dynamic documents comprise web pages, or HTML documents.
  • the present invention need not be limited to the conversion of static documents into HTML documents, as a variety of dynamic documents may be rendered by the present invention using an automatic conversion from static documents.
  • the present invention may be utilized with a variety of document markup languages, some of which provide dynamic links to a user.
  • Extensible Markup Language (XML)
  • Biopolymer Markup Language (BIOML)
  • Standard Generalized Markup Language (SGML)
  • Forms Markup Language (FML)
  • Mathematical Markup Language (MathML)
  • Bioinformatic Sequence Markup Language (BSML)
  • Dental Charting Markup Language (DCML)
  • Electronic Data Markup Language
  • Pattern Markup Language (PML)
  • Chemical Markup Language (CML)
  • Vector Markup Language (VML)
  • Java Speech Markup Language (JSML)
  • Drawing Markup Language (DrawML)
  • Real Estate Listing Markup Language (RELML)
  • Cold Fusion Markup Language (CFML)
  • TeX and LaTeX
  • FIG. 2 illustrates one embodiment of the present invention wherein static document images are converted to dynamic web pages, e.g. HTML documents.
  • static document images 200 are loaded into the computer's memory 118. Each image may comprise one page of a document, for example, in a bitmap form.
  • a document may originally exist on paper and be converted into electronic form by way of a peripheral scanner 107. Alternatively, the document images may already exist in electronic form and be loaded into the computer by way of a network 128, the Internet 130, or a disk 122.
  • an OCR engine 202 performs optical character recognition on the document image, recognizing known textual characters.
  • Certain elements of scanned documents, such as graphics, tables, and text styles, may be recognized and utilized for rendering the overall OCR-converted document. While multiple pages of static document images 200 may be loaded into the computer's memory 118, OCR functionality is generally performed on a page-by-page basis. It is contemplated that OCR functions may be carried out on multiple pages at one time.
  • Once the static document images 200 have been recognized by the OCR engine 202, they are passed to a buffering component 204, which stores the OCR information as separate pages.
  • A logical structure recognition (LSR) component 206 analyzes the buffered OCR information, recognizing logical components, heading levels, cross-reference information, and various other document attributes in preparation for creating a dynamic document, such as a web page or HTML document.
  • After the LSR component 206 has recognized the general structure of the buffered OCR information, a rendering component 208 renders a dynamic document utilizing the information from the OCR engine 202, the buffering component 204, the LSR component 206, and user style settings 207 to produce logically structured, hierarchical dynamic documents 210.
  • the input document could be a text file, or a combination of textual data and graphical data, rather than a bit-mapped image.
  • the OCR engine 202 could be replaced by a parser component.
  • This parser component could extract textual and graphical elements directly from electronic files of various formats (e.g., MS Word, WordPerfect, PostScript, or PDF files) and feed this information directly to a buffering component or an LSR component. An illustrative sketch of the overall conversion pipeline appears after this list.
  • Figure 3 illustrates the general operation of the LSR component 206.
  • text and graphical information that has been buffered into pages by the buffering component 204 is analyzed by the LSR component 206 and is interpreted by a logical components identification engine 302.
  • This engine 302 classifies a physical paragraph, or grouping of symbols, into structural elements such as a title, heading, body text, header, footer, caption, table, or other element.
  • A basic unit of document structure is a physical paragraph; exceptions are graphics and tables, which may contain multiple physical paragraphs. If the document undergoes OCR processing, the individual paragraphs can be identified by the OCR engine 202 and passed on to the engine 302.
  • A heading level assignment component 304 assigns a heading level to each heading paragraph identified by the logical components identification engine 302.
  • A cross-reference identification component 306 identifies cross-references and potential dynamic links, such as email addresses and URL addresses.
  • After the identification of logical components by the logical components identification engine 302, the assignment of heading levels by the heading level assignment component 304, and cross-reference identification by the cross-reference identification component 306 have been carried out, a markup component 308 translates the original OCR information pages 300 into tagged OCR information pages 310, in preparation for rendering a dynamic document with the rendering component 208. The markup component 308 indicates the location and label of each detected logical component. In this manner, the LSR component 206 assigns a unique label to each basic unit of the OCR information pages and tags the OCR information pages to produce the tagged OCR information pages 310.
  • the present invention may be configured to allow a user to aid the LSR component 206 in recognizing document structure. This may be accomplished, for example, by a user indicating various preferences. Also, a user may prompt the recognition of various structural elements within a page by defining zones within the page which correspond to various elements. In this manner, a user may define various zones, such as rectangular-shaped zones, irregular-shaped zones, text zones, graphics zones, and table zones, and may define various columns or other formatting structural features to be recognized.
  • Figure 4 illustrates the operation of the logical components identification engine 302. This engine is essentially a maximum a posteriori (MAP) classifier.
  • This engine receives OCR information pages 300 and uses an attribute gathering component 402 to gather statistics about the document attributes, such as the font and paragraph formatting and style distributions.
  • A probability calibration component 404 uses the distributions gathered by the attribute gathering component 402 to calibrate the pre-defined probability distributions shown in Equations 1 and 2 below:
    $P_1 = P(A_i \mid L, D)$   (1)
    $P_2 = P(A_i \mid D)$   (2)
  • In Equations 1 and 2, $P$ represents a probability, and $P_1$ and $P_2$ represent the specific probabilities expressed in Equations 1 and 2, respectively.
  • $L$ represents the label of the paragraph under consideration.
  • $A_i$ represents the $i$-th attribute of the paragraph under consideration, where $i$ may range between 1 and some number $N$, and $D$ represents the given document. Examples of attributes which might be taken into account include style (e.g., bold, italic, underline), justification, font, character size, etc.
  • The probability $P_1$ represents the conditional probability that a given paragraph attribute $A_i$ exists given a paragraph label variable $L$ and the document $D$.
  • The probability $P_2$ represents the conditional probability that a given paragraph attribute $A_i$ occurs given that it is within the given document $D$.
  • The probability $P_3$, shown in Equation 3, describes the conditional probability that the paragraph label $L$ occurs within the document $D$:
    $P_3 = P(L \mid D)$   (3)
    This value is then used to initialize the probabilities to be determined by the probability calibration component 404. Each of these probabilities can be determined empirically, e.g., by statistical analysis of a large number of sample documents.
  • The probability distributions $P_1$, $P_2$, and $P_3$ from the calibration component 404 are used in a Bayesian classifier 406.
  • The Bayesian classifier 406 assigns a posterior probability to each of the possible paragraph labels given all of its observed attributes within the document, as shown below in Equation 4:
    $P(L \mid A_1, \ldots, A_N, D) = \dfrac{P(A_1, \ldots, A_N \mid L, D)\, P(L \mid D)}{P(A_1, \ldots, A_N \mid D)}$   (4)
  • Equation 4, when simplified by assuming independence among the paragraph attributes, yields Equation 5, shown below:
    $P(L \mid A_1, \ldots, A_N, D) = P(L \mid D) \displaystyle\prod_{i=1}^{N} \dfrac{P(A_i \mid L, D)}{P(A_i \mid D)}$   (5)
  • This classification provides an initial estimate of the posterior probabilities of the paragraph labels, without using the contextual paragraph attributes. It has the benefit of reducing the overall complexity of the calculations carried out by the contextual knowledge integration component 408, which applies contextual knowledge experts to improve the initial posterior probability estimates.
  • The contextual knowledge experts applied by the contextual knowledge integration component 408 may include expert systems, intelligent agents, neural networks, or other techniques devised to apply a set of contextual models to a document to decide where a particular element is intended to fit within the overall structure of the document.
  • Some of the contextual experts used by the contextual knowledge integration component 408 may include, but are not limited to: a header expert, a footer expert, a numbered list expert, a bulleted list expert, a consistency expert, a language-dependent expert, and various other experts.
  • the contextual knowledge integration component 408 produces a modified Bayesian probability estimate, or group of estimates 410.
  • The logical components identification engine 302 may then process this data again in the probability calibration component 404, the Bayesian classifier 406, and the contextual knowledge integration component 408 to thereby provide revised, modified Bayesian probability estimates.
  • Repeating these functions to achieve revised, modified Bayesian probability estimates, as shown by branch 412 of Figure 4, may be carried out a number of times to provide increasing accuracy. In one embodiment of the present invention, a single iteration is used in order to provide high accuracy with minimal computational power. However, one skilled in the art will recognize that multiple iterations may be performed as computing power allows, to provide increasingly accurate modified Bayesian probability estimates.
  • A label assignment component 414 then selects, for each paragraph, the label that maximizes the modified posterior probabilities. Once the label assignment component 414 has produced identified OCR information pages 416, the information is passed by the logical components identification engine 302 to the heading level assignment component 304, as shown in Figure 3. An illustrative sketch of this classification step appears after this list.
  • Figure 5 illustrates the general operation of the heading level assignment component 304, shown in Figure 3, which assigns a heading level to each heading paragraph. Any desired number of heading levels can be employed. In the case of HTML documents, for example, six levels are employed.
  • Identified OCR information pages 416 from the label assignment component 414 are processed by an initial heading level assignment component 502, which sorts the heading paragraphs according to their importance in the document, based on the paragraph attributes. Once the paragraphs have been sorted, the initial heading level assignment component 502 assigns an initial heading level value. In the embodiment of the present invention wherein the dynamic documents 210 produced are web pages or HTML documents, the initial heading level value assigned by the initial heading level assignment component 502 may be larger than six.
  • The heading level consolidation component 504 then attempts to merge the initial heading level groupings by minimizing a cost function. This merging process continues until an optimal number of heading levels has been reached; in the case of HTML documents, an optimal number of heading levels is six or fewer.
  • The heading level consolidation component 504 outputs OCR information pages with heading levels 506, which is the output of the heading level assignment component 304 of Figure 3. An illustrative sketch of this consolidation step appears after this list.
  • The cross-reference identification component 306, also shown in Figure 3, detects cross-references within the pages.
  • Figure 6 shows the general operation of the cross-reference identification component 306, which is configured to detect links, such as email addresses and URL addresses, and find cross-references to figures, tables, and section headings.
  • A keyword generator 602 processes the OCR information pages with heading levels 506 to generate a cross-reference keyword candidate list by determining locations of predefined keywords within the document text. These predefined keywords may correspond to the particular language being used in the document. For example, the symbol "@" may be used to determine that an email address is present within a document.
  • Upon generation of the cross-reference keyword candidate list by the keyword generator 602, a pair establishment component 604 establishes pairs between sources and destinations by attempting to match keywords, identifying the types of references they represent (for example, email addresses, URL addresses, and the like), and identifying the source and destination for each cross-reference. Upon establishing source-destination pairs, the pair establishment component 604 produces cross-referenced OCR information pages 606. These cross-referenced OCR information pages 606 are subsequently analyzed by the markup component 308 of Figure 3, which tags the pages to produce the tagged OCR information pages 310. An illustrative sketch of this cross-reference detection step appears after this list.
  • The rendering component 208 is then able to produce dynamic documents, such as web pages or HTML documents 210, based on user-defined style settings or pre-defined themes.
  • The rendering component 208 may also employ a multi-modal document presentation.
  • This multi-modal presentation provides the user with the ability to view both an HTML version of the original document in hypertext format and the original document image. In this manner, the user can navigate between these views by way of hypertext links and image map files, for example, retaining all of the original information contained within the original document.
  • The image map identifies the particular page, or portion of a page, on which a structural element appears, so that the appropriate image file is retrieved when the user clicks on a structural element in the HTML view. Any information which may potentially have been lost as a result of OCR functions, LSR functions, or otherwise will still be retained within the original image. An illustrative sketch of the rendering step appears after this list.
  • Figure 7 is an illustration of a user interface that can be employed in one embodiment of the present invention.
  • A window 700 contains various on-screen buttons for minimizing, resizing, and maximizing the window, scroll bars to scroll within the documents displayed therein, tool bars 710, 712, and 714, on which various commands may be represented by on-screen buttons in a variety of forms, and a status bar 728 which is used to display the status of the document or documents being viewed.
  • The window 700 includes various sub-windows, or panes, partitioned within the larger window 700. Thumbnail images of all of the pages within the document which have been loaded into the computer memory, either by scanning, from disk, or by some other method, are displayed within a left pane 702. The image of the page currently being displayed, in this case page one, which is highlighted in sub-window 702, is displayed within sub-window 704. The overall hierarchical structure of the entire document, encompassing all of the pages displayed within sub-window 702, is shown within sub-window 706.
  • The dynamic document, in this case an HTML document, is shown in sub-window 708.
  • the standard tool bar 710 allows a user to depress various on-screen buttons to accomplish functions within the application, such as beginning a new document, opening a file, saving the current document, printing the active window, or a variety of other functions.
  • the zone tool bar 712 allows a user to perform various functions relating to the computer's recognition of zones on a page. Using the zone tool bar 712, the user may, for example, draw zones within a scanned document to influence how the computer interprets the structure of the document. For example, the user may draw rectangular, irregular, row, or column zones, or may add to, subtract from, or reorder zones within the document image. The user may also define particular zone properties allowing for greater customization in their interpretation by the logical structure recognition component 206.
  • An Auto Web tool bar 714 contains a variety of different buttons, which allow for convenient access to functions contained within the application.
  • An Auto command button 716 allows a user to select between an Auto Web configuration and a web wizard configuration, wherein the user may be prompted by the application to input various data in order to achieve a particular configuration within a finished, dynamic document.
  • a load image button 718 allows a user to select between retrieving an image file from a disk, or capturing an image using a peripheral scanning device.
  • a zone command button 720 allows a user to select between a variety of different types of documents which he or she wishes to scan. For example, a user may select between single-column pages, multiple-column pages, spreadsheet pages, and mixed pages.
  • the user may provide the application with information concerning the general form of the document about to be loaded into the application, thereby providing a way for the computer to more effectively utilize contextual experts to determine the logical components of the document.
  • the OCR command button 722 allows the user to choose between performing OCR functions and performing OCR functions with proofreading capabilities.
  • the export command button 726 allows the user to choose between various saving and exporting options, such as saving a document in various file formats, saving the document in an HTML format and launching a browser in which to view the document, or deferring export of the document until a later time.
  • the status bar 728 provides the user with various information such as the page currently being viewed, the type of hierarchical structure currently being highlighted, and various other informational items.
  • thumbnail images of all of the pages of the document are displayed. For example, a thumbnail of the first page 730 is displayed, with a page indication 732 in the lower left-hand corner and a status indication icon 734 in the lower right-hand corner.
  • the status indication icon 734 changes as the page for which it is being displayed undergoes various operations.
  • the status indication icon 734 is in the form of eyeglasses representing that the document has been recognized using OCR functionality. This icon changes to represent, for example, that the document has been outlined, or separated into zones.
  • The title of the page, "HotFudge Business Plan" 735, is displayed within sub-window 704, which shows the original scanned image of page one.
  • The second item 736 on the scanned image of page one contains address information and authorship information.
  • Each of these elements 735, 736 is displayed in various forms within the other sub-windows.
  • The title 735 from the original document image shown in sub-window 704 is also displayed in sub-window 708, which shows the HTML version of the document, as the HTML title 738.
  • Likewise, the title 735 from the original document shown in sub-window 704 is displayed in sub-window 706 as the document title 740. The same is true for the additional remaining information on this and other pages within the document.
  • The address and authorship information 736 shown in sub-window 704 is also displayed in sub-window 706, wherein an email address is recognized and identified by an envelope icon and a URL is recognized as an Internet address and identified by a world icon.
  • Sub-window 706 illustrates the general structure of the overall document including each of the pages shown in sub-window 702. Various heading levels are shown spanning all pages of the document.
  • An outlining tool bar 742 is provided within sub-window 706; however, this tool bar could also be incorporated within the general tool bars of window 700. This tool bar provides the user with the ability to promote or demote various structural items by increasing or decreasing their priority within the document, change various items to headers and footers, delete certain items, and filter objects.
  • FIG. 8 is an illustration of a second example of the user interface of Figure 7.
  • In window 800, a different page of the same document that is displayed within window 700 is shown.
  • window 800 comprises several sub-windows 802, 804, 806, and 808 within the larger overall window.
  • The overall view of the document, along with thumbnail images of each of the pages within the document, is shown within sub-window 802.
  • The original scanned document image, with outlining indications, is shown within sub-window 804.
  • The overall outline of the dynamic document, in this case an HTML document, is shown within sub-window 806.
  • The rendered dynamic document, in this case a web page or HTML document, is shown within sub-window 808.
  • Each of these sub-windows utilizes scroll bars and on-screen buttons, and is adjustable in size to allow for convenient manipulation by the user.
  • The view of the document image shown in sub-window 804 corresponds to page 7 of the overall document. This information can be ascertained by viewing sub-window 802, wherein the thumbnail image of page 7 of the document is highlighted. As before, the thumbnail images of the pages of the document indicate page number and status by way of a status icon. As can be seen in the thumbnail image 830 within sub-window 802, this page of the document contains text and a small image, which corresponds to the image shown in sub-window 804 near the top of the page.
  • The structure of the document, whose image is shown in sub-window 804, is outlined in sub-window 806. For example, a header 810 has been identified and indicated within sub-window 806 as the header 812. Near the text of the header 812 within the document outline shown in sub-window 806 is an icon representing a header within a page. As can be seen in sub-window 808, this header is rendered as a hypertext header 814.
  • The structure of the document outlined in sub-window 806 is further organized, in that the main heading 816 on the document page, which is shown in sub-window 804, is assigned the highest heading level, H1, as shown by the representation of the title 818 within the document outline contained in sub-window 806, which has an H1 indicator next to it.
  • each subheading is assigned a lower priority, or lower heading level, within the same page.
  • the heading is rendered within the HTML document shown in sub-window 808 as the largest text 820 on this page of the HTML document.
  • Within the text contained immediately below the heading 816 of the section on page 7 of the document shown in sub-window 804 is a reference 822 to Figure 5.1.
  • This Figure 5.1 is also represented as a graphic 832 within the document outline contained in sub-window 806, which provides an indication that a graphic is contained within the page and indicates the page and paragraph number where it is located, in this case on page 7, in the second paragraph.
  • an icon representing a graphic is displayed, for ease in identification to a user.
  • This graphic, Figure 5.1, though not shown in sub-window 808, is also rendered within the HTML document, in a format germane to HTML documents, for example JPEG or GIF format.
  • The subheading 834, "5.1 Domestic Packaging", shown within the document image contained in sub-window 804, is recognized as a subheading below the heading 816.
  • An indication 836 that this subheading is assigned heading level 2, H2, is shown within the document outline contained in sub-window 806. In this manner, the organization of the overall document is preserved.
  • the subheading 834 indicated by the organizational element 836 is rendered as a subheading within the HTML document shown in sub-window 808.
  • In Figure 9, a further example of the user interface in accordance with the embodiment of Figure 7 is shown.
  • page 8 is the page currently being displayed, which is indicated by the highlighted thumbnail 930 of page 8.
  • The header 910 is indicated within the document outline shown in sub-window 906 as element 912. This header is created within the HTML document as an HTML header 914 displayed within sub-window 908.
  • The subheading 916, which is a subheading under the heading 816 shown in Figure 8, is assigned heading level 2, H2, as shown in conjunction with element 918 displayed in the document outline of sub-window 906.
  • Also shown on this page is a table 922, which has been recognized by the LSR component 206.
  • This table 922 is indicated within the document outline as element 924, which assigns the name of the first cell as a title, and displays an icon representing a table in sub-window 906.
  • the table is rendered within the HTML document as table 926, shown in sub-window 908. It will be recognized by those skilled in the art, that the icons displayed within Figures 7-9 are illustrative only, and that a variety of different iconic representations of headings, headers, footers, links, graphics, and tables may be displayed utilizing various different graphical forms.
  • In Figure 10, another example of the user interface in accordance with the embodiment of Figure 7 is shown.
  • Here, the same page of the overall document shown in Figure 8, namely page 7 of the document, is shown.
  • This figure illustrates that all views of a document need not be shown at all times.
  • In window 1000, the images of each page of the document are not displayed in a sub-window. Rather, the original, outlined document is displayed within sub-window 1004, the document outline is displayed within sub-window 1006, and the rendered dynamic document, in this case an HTML document, is displayed within sub-window 1008.
  • The various sub-windows may be selectively displayed within the window 1000 as desired by the user.
  • the windows which are to be displayed may be selected by the user or by various default commands programmed into the application to provide convenience for a user.
  • a user may wish, for example, not to view the HTML rendered document shown in sub-window 1008. Accordingly, a user may select an option which hides this window, or may resize the window to a decreased size.
  • an HTML document rendered by the present invention is shown displayed in a web browser window 1100.
  • The web browser used is Microsoft's Internet Explorer®; however, it will be recognized by those skilled in the art that a variety of different browsers capable of displaying an HTML document, such as Netscape Navigator®, could be used to view the document.
  • the HTML document rendered by the present invention is created in a manner such that the size of the screen is utilized to optimize the amount of information displayed to the user within a single frame.
  • the information displayed in Figure 11 corresponds to pages 1 and 2 of the document shown in Figure 7.
  • The present invention optimizes the amount of information shown so as to create web pages, each of which displays the maximum possible information from the document.
  • The manner in which the present invention determines page breaks within the HTML document may be set as part of a group of user preferences, or may form part of a group of pre-determined rules. These web pages are displayed within a sub-window 1102 contained within the web browser 1100.
  • A table of contents is created and displayed within sub-window 1104 of the web browser 1100.
  • This table of contents is created from the document outline, displayed within sub-window 706 of Figure 7, for example.
  • The HTML document is displayed with the title 1106, which corresponds to the title 735 shown within the original image of the document contained in sub-window 704 of Figure 7, to the element 740 within the document outline shown in sub-window 706, also of Figure 7, and to the title 738 within the HTML document rendered by the present invention, shown in sub-window 708 of Figure 7.
  • The email address, which has been recognized by the present invention, is created as a hypertext link 1108 rendered by the rendering component 208 shown in Figure 2, providing the user with an easy means of sending an email message to the address listed.
  • Also shown is a URL address 1109, which has likewise been rendered as a hypertext link for the convenience of the user.
  • the first heading " 1 Introduction" 1110 is displayed beneath the document title.
  • the table of contents contains various links to the heading levels within the document. For example, when selected, a link to the document title 1112 allows a user to move and to display this portion of the document within sub-window 1102.
  • the link to the introduction 1114 may be selected to display the corresponding portion of the HTML document within sub-window 1102.
  • Various navigational hypertext links 1116 and 1118 are displayed at the top and bottom of the page.
  • The two links shown on this page are a link to the table of contents, which allows a user to display the table of contents within the main sub-window 1102, and a link to the next page of the HTML document that follows in sequence after the page currently displayed in sub-window 1102. Additional navigational hypertext links, such as a link to the previous page of the HTML document, can also be displayed, as appropriate.
  • The navigational links 1116 and 1118 contained at the top and the bottom of the page need not be limited to links to the previous and following pages and the table of contents. Rather, dynamic links to various important pages, graphics, tables, or other important items within the document may be displayed as navigational hypertext links.
  • Figure 12 is an illustration of a user interface, wherein an HTML document rendered by the present invention is displayed within web browser 1200.
  • the portion of the document being displayed in Figure 12 corresponds to the portion of the document displayed in Figure 8, namely page 7 of the document.
  • The web browser 1200 is divided into sub-windows, including a main sub-window 1202 and a table of contents sub-window 1204.
  • A main heading 1206 is displayed in the main sub-window 1202.
  • a reference to Figure 5.1 has been identified by the present invention and a hypertext link 1208 has been created from the text to the image 1210.
  • the heading 1206 is followed by several subheadings, for example, subheadings 1212 and 1214.
  • The heading and subheadings are also shown within the table of contents displayed within sub-window 1204 as hypertext links 1216, 1218, and 1220, respectively.
  • A user may, by selecting one of these hypertext links within the table of contents sub-window 1204, view the portion of the document corresponding to these headings or subheadings within the main sub-window 1202.
  • In Figure 13, a view of page 8 of the document, also shown in Figure 9, is displayed within a sub-window 1302 of a web browser window 1300. Also contained within the web browser window 1300 is a table of contents sub-window 1304, which contains the same table of contents shown in sub-window 1104 of Figure 11.
  • Two subheadings 1306 and 1308 are displayed within sub-window 1302, and corresponding links 1312 and 1314 are contained within the table of contents.
  • Also shown is a table 1310, which corresponds to the table 922 contained within the original document shown in Figure 9.
  • A user currently viewing a different page of the document within sub-window 1302 could view the table 1310 by selecting link 1312 contained within the table of contents displayed in sub-window 1304.
  • This type of HTML document rendering is convenient because it requires minimal effort on the part of the user, and provides maximum functionality to potential clients accessing the information on the Internet.
  • the convenience of the present invention is in its ability to automatically create and render dynamic documents, such as HTML documents, from static documents, which may be loaded from a scanner or an electronic source.
  • A user may, by depressing buttons 718, 720, 722, 724, and 726, shown in Figure 7, create from a piece of paper an HTML document suitable for publication on the World Wide Web (WWW).
  • The present invention thus provides a system and method for creating dynamic documents automatically from static documents. In one embodiment, this is accomplished by scanning or loading document images into the application's memory and carrying out functionality such as optical character recognition, logical structure recognition, and hypertext rendering on the document images to produce an HTML document.
  • The present invention is not, however, limited to this one embodiment.
  • the present invention may, for example, be used in conjunction with a variety of different markup languages, web browsers, and document sources.
  • the presently disclosed embodiments are, therefore, considered in all respects to be illustrative and not restrictive.
  • the scope of the invention is indicated by the appended claims, rather than the foregoing description, and all changes that come within the meaning and range of equivalence thereof are intended to be embraced therein.
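The following is a minimal Python sketch of the overall conversion pipeline described in connection with Figure 2: document images are recognized page by page, buffered, analyzed by logical structure recognition, and rendered as a dynamic document. All class, function, and parameter names here (Paragraph, Page, convert_static_to_dynamic, ocr_engine.recognize, lsr.analyze, renderer.render) are illustrative assumptions, not names used in the patent or in any particular product.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Paragraph:
    """Assumed interchange format: one recognized paragraph and its attributes."""
    text: str
    attributes: dict            # e.g. {"bold": True, "font_size": 18}
    label: str = "body"         # assigned later by logical structure recognition
    heading_level: int = 0      # assigned later for heading paragraphs

@dataclass
class Page:
    paragraphs: List[Paragraph] = field(default_factory=list)

def convert_static_to_dynamic(images, ocr_engine, lsr, renderer, style_settings):
    """Hypothetical end-to-end pipeline: images -> OCR -> buffer -> LSR -> render."""
    buffered_pages = []
    for image in images:                          # OCR is performed page by page
        buffered_pages.append(ocr_engine.recognize(image))   # assumed to return a Page
    tagged_pages = lsr.analyze(buffered_pages)    # labels, heading levels, cross-references
    return renderer.render(tagged_pages, style_settings)     # e.g. an HTML string
```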
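The Bayesian labeling step of Equations 4 and 5 can be sketched as follows: for each paragraph, the posterior of each candidate label is the prior P(L | D) multiplied, over the observed attributes, by P(A_i | L, D) / P(A_i | D), and the label with the maximum posterior is selected. The probability tables passed in are assumed to have been estimated empirically, as the description suggests; the function and parameter names are hypothetical, and the small smoothing constant is an added assumption to avoid division by zero.

```python
def classify_paragraph(attributes, labels, p_label, p_attr_given_label, p_attr, eps=1e-6):
    """Select the paragraph label maximizing the posterior of Equation 5.

    attributes:            observed attributes, e.g. {"bold": True, "font_size": 18}
    labels:                candidate labels, e.g. ["title", "heading", "body text", "caption"]
    p_label[label]:        P(L | D), the prior of Equation 3
    p_attr_given_label[(attr, value, label)]:  P(A_i | L, D), Equation 1
    p_attr[(attr, value)]: P(A_i | D), Equation 2
    eps:                   smoothing constant for unseen attribute values (added assumption)
    """
    best_label, best_posterior = None, -1.0
    for label in labels:
        posterior = p_label.get(label, eps)
        for attr, value in attributes.items():
            numerator = p_attr_given_label.get((attr, value, label), eps)
            denominator = p_attr.get((attr, value), eps)
            posterior *= numerator / denominator   # independence assumption of Equation 5
        if posterior > best_posterior:
            best_label, best_posterior = label, posterior
    return best_label, best_posterior
```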
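The description states that initial heading levels are merged by minimizing a cost function until an optimal number of levels (six or fewer for HTML) remains, but it does not spell the cost function out. The sketch below is therefore only one plausible reading: it uses the difference in representative font size between adjacent heading-level groups as a stand-in cost and greedily merges the cheapest pair until the target number of levels is reached.

```python
def consolidate_heading_levels(headings, max_levels=6):
    """Merge initial heading levels into at most max_levels final levels.

    headings: list of (initial_level, font_size) pairs for the heading paragraphs,
    where level 1 is the most important. Returns a mapping from each initial level
    to a final level in the range 1..max_levels. The cost of merging two adjacent
    level groups is taken here to be their font-size difference (an assumption).
    """
    # One representative font size per initial level, ordered by importance.
    by_level = sorted({level: size for level, size in headings}.items())
    sizes = [size for _, size in by_level]
    groups = [[level] for level, _ in by_level]

    while len(groups) > max_levels:
        # Merge the pair of adjacent groups whose merge cost is the smallest.
        costs = [abs(sizes[i] - sizes[i + 1]) for i in range(len(groups) - 1)]
        i = costs.index(min(costs))
        groups[i] += groups.pop(i + 1)
        sizes[i] = (sizes[i] + sizes.pop(i + 1)) / 2.0

    return {level: rank + 1 for rank, group in enumerate(groups) for level in group}
```

With the default target of six, the result maps however many initial levels were found onto the h1 through h6 range that HTML supports.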
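A minimal sketch of the cross-reference identification step: predefined keywords and patterns (such as the "@" symbol mentioned in the description) locate candidate references, and each candidate is then paired with a destination. The regular expressions and the anchor dictionaries below are illustrative assumptions, not the keyword lists actually used by the keyword generator 602.

```python
import re

# Illustrative keyword patterns; actual keyword lists would be language-dependent.
PATTERNS = {
    "email":  re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "url":    re.compile(r"(?:https?://|www\.)\S+"),
    "figure": re.compile(r"\bFigure\s+(\d+(?:\.\d+)?)", re.IGNORECASE),
    "table":  re.compile(r"\bTable\s+(\d+(?:\.\d+)?)", re.IGNORECASE),
}

def find_cross_references(pages, figure_anchors, table_anchors):
    """Return (kind, source_location, destination) triples for detected references.

    pages: list of page texts.
    figure_anchors / table_anchors: dicts mapping a figure or table number (str)
    to the location of that figure or table, e.g. {"5.1": ("page7", "para2")}.
    Email and URL matches become self-contained link targets.
    """
    links = []
    for page_no, text in enumerate(pages):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                source = (page_no, match.start())
                if kind == "email":
                    links.append((kind, source, "mailto:" + match.group(0)))
                elif kind == "url":
                    links.append((kind, source, match.group(0)))
                elif kind == "figure":
                    dest = figure_anchors.get(match.group(1))
                    if dest:                      # pair the reference with its figure
                        links.append((kind, source, dest))
                else:  # table reference
                    dest = table_anchors.get(match.group(1))
                    if dest:
                        links.append((kind, source, dest))
    return links
```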
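Finally, a sketch of how a rendering step might emit HTML from labeled paragraphs, mapping heading levels to the h1 through h6 tags and adding the navigational links described for Figures 11 through 13. The markup layout, file naming, and function signature are all assumptions made for illustration, not the output format of the patented system.

```python
import html

def render_page_html(paragraphs, page_no, total_pages, title="Document"):
    """Render one buffered page of labeled paragraphs as a simple HTML page.

    paragraphs: objects with .text, .label ("heading", "body", ...) and
    .heading_level (1..6 for headings). Navigation links to the table of
    contents and the next page are emitted at the top and bottom of the page.
    """
    nav = '<p><a href="toc.html">Table of Contents</a>'
    if page_no + 1 < total_pages:
        nav += ' | <a href="page%d.html">Next Page</a>' % (page_no + 1)
    nav += "</p>"

    body = [nav]
    for p in paragraphs:
        text = html.escape(p.text)
        if p.label == "heading":
            level = min(max(p.heading_level, 1), 6)   # HTML supports h1..h6 only
            body.append("<h%d>%s</h%d>" % (level, text, level))
        else:
            body.append("<p>%s</p>" % text)
    body.append(nav)

    return "<html><head><title>%s</title></head><body>%s</body></html>" % (
        html.escape(title), "\n".join(body))
```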

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention concerns a system and method for automatically converting static documents into dynamic documents. In one embodiment, static paper document images, which may be either scanned into a computer or loaded into a computer from electronic storage as image files, are converted into dynamic documents, such as documents authored by means of a markup language. One important example of dynamic document creation applied in the invention is the creation of HTML documents suitable for Internet publication on the World Wide Web (WWW). The invention applies an optical character recognition engine and a logical structure recognition engine in order to recognize both textual and structural components of the static paper document, and uses a dynamic document rendering component in order to create dynamic documents, such as HTML documents.
PCT/US2001/003557 2000-02-01 2001-02-01 Automatic conversion of static documents into dynamic documents WO2001057786A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP01908813A EP1252603A1 (fr) 2000-02-01 2001-02-01 Automatic conversion of static documents into dynamic documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US49494100A 2000-02-01 2000-02-01
US09/494,941 2000-02-01

Publications (1)

Publication Number Publication Date
WO2001057786A1 true WO2001057786A1 (fr) 2001-08-09

Family

ID=23966580

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/003557 WO2001057786A1 (fr) 2000-02-01 2001-02-01 Automatic conversion of static documents into dynamic documents

Country Status (2)

Country Link
EP (1) EP1252603A1 (fr)
WO (1) WO2001057786A1 (fr)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003050673A2 (fr) * 2001-12-06 2003-06-19 Ge Financial Assurance Holdings, Inc. System and method for distributing documents electronically
EP1331591A2 (fr) * 2002-01-25 2003-07-30 Xerox Corporation Method and apparatus to convert bitmapped images for use in a structured text/graphics editor
WO2004029865A1 (fr) * 2002-09-25 2004-04-08 Koninklijke Philips Electronics N.V. Entering a text string
GB2405508A (en) * 2003-08-27 2005-03-02 Hewlett Packard Development Co System and method for generating an electronically publishable document
US7136082B2 (en) 2002-01-25 2006-11-14 Xerox Corporation Method and apparatus to convert digital ink images for use in a structured text/graphics editor
WO2006132584A1 (fr) * 2005-06-08 2006-12-14 Printdreams Ab System and method for linking between digital and paper environments
US20110044539A1 (en) * 2009-08-20 2011-02-24 Fuji Xerox Co., Ltd. Information processing device, computer readable medium storing information processing program, and information processing method
US20130047102A1 (en) * 2011-08-19 2013-02-21 Newsoft Technology Corporation a Taiwan Corporation Method for browsing and/or executing instructions via information-correlated and instruction-correlated image and program product
US10192127B1 (en) 2017-07-24 2019-01-29 Bank Of America Corporation System for dynamic optical character recognition tuning
US10346702B2 (en) 2017-07-24 2019-07-09 Bank Of America Corporation Image data capture and conversion
CN110765743A (zh) * 2019-09-25 2020-02-07 青岛励图高科信息技术有限公司 System for editing and displaying mathematical formulas in HTML and exporting them to Word documents
US11397762B2 (en) 2020-01-24 2022-07-26 Accenture Global Solutions Limited Automatically generating natural language responses to users' questions
US11449556B2 (en) * 2020-02-04 2022-09-20 Accenture Global Solutions Limited Responding to user queries by context-based intelligent agents

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0843276A1 (fr) * 1996-11-18 1998-05-20 Canon Information Systems, Inc. Générateur HTML
US5781914A (en) * 1995-06-30 1998-07-14 Ricoh Company, Ltd. Converting documents, with links to other electronic information, between hardcopy and electronic formats
US5963966A (en) * 1995-11-08 1999-10-05 Cybernet Systems Corporation Automated capture of technical documents for electronic review and distribution

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781914A (en) * 1995-06-30 1998-07-14 Ricoh Company, Ltd. Converting documents, with links to other electronic information, between hardcopy and electronic formats
US5963966A (en) * 1995-11-08 1999-10-05 Cybernet Systems Corporation Automated capture of technical documents for electronic review and distribution
EP0843276A1 (fr) * 1996-11-18 1998-05-20 Canon Information Systems, Inc. Générateur HTML

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GANN R: "ACCURATE OCR FOR COMPLEX PAGES", PC USER, 2 October 1996 (1996-10-02), XP002055986 *
GANN R: "CAERE IMPROVES ITS OCR INTERFACE", PC USER, 21 August 1996 (1996-08-21), XP002055985 *
YUAN Y ET AL: "Automatic document processing: A survey", PATTERN RECOGNITION,US,PERGAMON PRESS INC. ELMSFORD, N.Y, vol. 29, no. 12, 1 December 1996 (1996-12-01), pages 1931 - 1952, XP004015742, ISSN: 0031-3203 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003050673A3 (fr) * 2001-12-06 2003-11-13 Ge Financial Assurance Holding System and method for distributing documents electronically
WO2003050673A2 (fr) * 2001-12-06 2003-06-19 Ge Financial Assurance Holdings, Inc. System and method for distributing documents electronically
US7576753B2 (en) 2002-01-25 2009-08-18 Xerox Corporation Method and apparatus to convert bitmapped images for use in a structured text/graphics editor
EP1331591A2 (fr) * 2002-01-25 2003-07-30 Xerox Corporation Method and apparatus to convert bitmapped images for use in a structured text/graphics editor
EP1331591A3 (fr) * 2002-01-25 2006-05-10 Xerox Corporation Method and apparatus to convert bitmapped images for use in a structured text/graphics editor
US7136082B2 (en) 2002-01-25 2006-11-14 Xerox Corporation Method and apparatus to convert digital ink images for use in a structured text/graphics editor
US8875016B2 (en) 2002-01-25 2014-10-28 Xerox Corporation Method and apparatus to convert digital ink images for use in a structured text/graphics editor
WO2004029865A1 (fr) * 2002-09-25 2004-04-08 Koninklijke Philips Electronics N.V. Entering a text string
GB2405508A (en) * 2003-08-27 2005-03-02 Hewlett Packard Development Co System and method for generating an electronically publishable document
WO2006132584A1 (fr) * 2005-06-08 2006-12-14 Printdreams Ab System and method for linking between digital and paper environments
US20110044539A1 (en) * 2009-08-20 2011-02-24 Fuji Xerox Co., Ltd. Information processing device, computer readable medium storing information processing program, and information processing method
US8824798B2 (en) * 2009-08-20 2014-09-02 Fuji Xerox Co., Ltd. Information processing device, computer readable medium storing information processing program, and information processing method
US20130047102A1 (en) * 2011-08-19 2013-02-21 Newsoft Technology Corporation a Taiwan Corporation Method for browsing and/or executing instructions via information-correlated and instruction-correlated image and program product
CN102955836A (zh) * 2011-08-19 2013-03-06 力新国际科技股份有限公司 Method for browsing or executing instructions via information-correlated and instruction-correlated images
US10192127B1 (en) 2017-07-24 2019-01-29 Bank Of America Corporation System for dynamic optical character recognition tuning
US10346702B2 (en) 2017-07-24 2019-07-09 Bank Of America Corporation Image data capture and conversion
CN110765743A (zh) * 2019-09-25 2020-02-07 青岛励图高科信息技术有限公司 System for editing and displaying mathematical formulas in HTML and exporting them to Word documents
US11397762B2 (en) 2020-01-24 2022-07-26 Accenture Global Solutions Limited Automatically generating natural language responses to users' questions
US11449556B2 (en) * 2020-02-04 2022-09-20 Accenture Global Solutions Limited Responding to user queries by context-based intelligent agents

Also Published As

Publication number Publication date
EP1252603A1 (fr) 2002-10-30

Similar Documents

Publication Publication Date Title
US6199080B1 (en) Method and apparatus for displaying information on a computer controlled display device
US5986654A (en) System and method for rendering on-screen iconic buttons with dynamic textual link
US8869023B2 (en) Conversion of a collection of data to a structured, printable and navigable format
RU2357284C2 (ru) Способ обработки цифровых рукописных примечаний для распознавания, привязки и переформатирования цифровых рукописных примечаний и система для его осуществления
US7085999B2 (en) Information processing system, proxy server, web page display method, storage medium, and program transmission apparatus
AU2003204478B2 (en) Method and system for associating actions with semantic labels in electronic documents
US6920610B1 (en) Method and system for browsing a low-resolution image
US6966029B1 (en) Script embedded in electronic documents as invisible encoding
US20070240057A1 (en) User interface element for displaying contextual information
JP4344693B2 (ja) ブラウザの文書編集のためのシステムおよびその方法
US7519906B2 (en) Method and an apparatus for visual summarization of documents
CN100449485C (zh) 信息处理装置和方法
US7225400B2 (en) Techniques for invoking system commands from within a mark-up language document
CA2519216A1 (fr) Procede et systeme expert pour la conversion de documents
US20040202352A1 (en) Enhanced readability with flowed bitmaps
WO2001057786A1 (fr) Conversion automatique de documents statiques en documents dynamiques
US7661063B2 (en) Document processing apparatus and control method thereof
US20020105546A1 (en) Browser container for hypertext application
WO1998036365A1 (fr) Outil de developpement de langage html
US20020166111A1 (en) Navigation in computer software applications developed in a procedural language
KR101115523B1 (ko) 테일러링을 지원하는 멀티미디어 웹 에디터 장치 및 방법
Sharma et al. OpenOffice. org Overview
Look Word 2007 Overview
Miyabe et al. Structured Document Preparation System AutoLayouter
Parker Now–That’s Your Style!!!!!

Legal Events

Date Code Title Description
AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 2001908813

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2001908813

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2001908813

Country of ref document: EP