EP1272940A1 - Systemes et procedes de traitement de documents numeriques - Google Patents

Systemes et procedes de traitement de documents numeriques

Info

Publication number
EP1272940A1
EP1272940A1 EP20010923807 EP01923807A EP1272940A1 EP 1272940 A1 EP1272940 A1 EP 1272940A1 EP 20010923807 EP20010923807 EP 20010923807 EP 01923807 A EP01923807 A EP 01923807A EP 1272940 A1 EP1272940 A1 EP 1272940A1
Authority
EP
European Patent Office
Prior art keywords
document
data
content
objects
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP20010923807
Other languages
German (de)
English (en)
Inventor
Majid Picsel Technologies Limited ANWAR
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
PICSEL TECHNOLOGIES Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB0009129.8A external-priority patent/GB0009129D0/en
Priority claimed from US09/703,502 external-priority patent/US7055095B1/en
Application filed by PICSEL TECHNOLOGIES Ltd filed Critical PICSEL TECHNOLOGIES Ltd
Publication of EP1272940A1 publication Critical patent/EP1272940A1/fr
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/106Display of layout of documents; Previewing

Definitions

  • the invention relates to data processing systems, and more particularly, to methods and systems for processing digital documents to generate an output representation of a source document as a visual display, a hardcopy, or in some other display format.
  • digital document is used to describe a digital representation of any type of data processed by a data processing system which is intended, ultimately, to be output in some form, in whole or in part, to a human user, typically by being displayed or reproduced visually (e.g., by means of a visual display unit or printer) , or by text-to-speech conversion, etc.
  • a digital document may include any features capable of representation, including but not limited to the following: text; graphical images; animated graphical images; full motion video images; interactive icons, buttons, menus or hyperlinks.
  • a digital document may also include non-visual elements such as audio (sound) elements .
  • Data processing systems such as personal computer systems, are typically required to process "digital documents," which may originate from any one of a number of local or remote sources and which may exist in any one of a wide variety of data formats ("file formats").
  • file formats data formats
  • the computer system In order to generate an output version of the document, whether as a visual display or printed copy, for example, it is necessary for the computer system to interpret the original data file and to generate an output compatible with the relevant output device (e.g., monitor, or other visual display device or printer) .
  • this process will involve an application program adapted to interpret the data file, the operating system of the computer, a software "driver" specific to the desired output device and, in some cases (particularly for monitors or other visual display units) , additional hardware in the form of an expansion card.
  • the systems and methods described herein provide a display technology that separates the underlying functionality of an application program from the graphical display process, thereby eliminating or reducing the application's need to control the device display and to provide graphical user interface tools and controls for the display. Additionally, such systems reduce or eliminate the need for an application program to be present on a processing system when displaying data created by or for that application program, such as a document or video stream.
  • the systems and methods described herein can display content, including documents, video streams, or other content, and will provide the graphical user functions for viewing the displayed document, such as zoom, pan, or other such functions, without need for the underlying application to be present on the system that is displaying the content.
  • the advantages over the prior art of the systems and methods described herein include the advantage of allowing different types of content from different application programs to be shown on the same display within the same work space. Many more advantages will be apparent to those of ordinary skill in the art and those of those of ordinary skill in the art will also be able to see numerous way of employing the underlying technology of the invention for creating additional systems, devices, and applications. These modified systems and alternate systems and practices will be understood to fall within the scope of the invention.
  • the systems and methods described herein include a digital content processing system that comprises an application dispatcher for receiving an input byte stream representing source data in one of a plurality of predetermined data formats and for associating the input byte stream with one of the predetermined data formats.
  • the system may also comprise a document agent for interpreting the input byte stream as a function of the, associated predetermined data format and for parsing the input byte stream into a stream of document objects that provide an internal representation of primitive structures within the input byte stream.
  • the systems also include a core document engine for converting the document objects into an internal representation data format and for mapping the internal representation data to a location on a display.
  • a shape processor within the system processes the internal representation data to drive an output device to present the content as expressed through the internal representation.
  • Figure 1 is a block diagram illustrating an embodiment of a digital document processing system in accordance with the present invention.
  • Figure 2 is a block diagram that presents in great, detail the system depicted in Figure 1;
  • FIG. 3 is a flowchart diagram of one document agent
  • Figure 4 depicts schematically an exemplary document of the type that can be processed by the system of Figure 1;
  • Figure 5 depicts flowchart diagrams of two exemplary processes employed to reduce redundancy within the internal representation of a document.
  • Figures 6-8 depict an exemplary data structure for storing an internal representation of a processed source document. Detailed Description of Certain Illustrated Embodiments
  • the systems and methods described herein include computer programs that operate to process an output stream or output file generated by an application program for the purpose of presenting the output on an output device, such as a video display.
  • the applications according to the invention can process these streams to create an internal representation of that output and can further process that internal representation to generate a new output stream that may be displayed on an output device as the output generated by the application according to the invention.
  • the systems of the invention decouple the application program from the display process thus relieving the application program from having to display its output onto a particular display device and further removes the need to have the application program present when processing the output of that application for the purpose of displaying that output.
  • Figure 1 provides a high-level functional block diagram of a system 10 that allows a plurality of application programs, shown collectively as element 13, to deliver their output streams to a computer process 8 that processes those output streams and generates a representation of the collective output created by those streams for display on the device 26.
  • the collective output of the application programs 13 is depicted in Figure 1 by the output printer device 26 that presents the output content generated by the different application programs 13.
  • the output device 26 is presenting output generated by the computer process 8 and that this output collectively carries the content of the plural application programs 13.
  • the presented content comprises a plurality of images and the output device 26 is a display.
  • the content may be carried in a format other than images, such as auditory tactile, or any other format, or combination of formats suitable for conveying information to a user.
  • the type of output device 26 will vary according to the application and may include devices for presenting audio content, video content, printed content, plotted content or any other type of content.
  • the systems and methods described herein will largely be shown as displaying graphical content through display devices, yet it will be understood that these exemplary systems are only for the purpose of illustration, and not to be understood as limiting in anyway.
  • each of the representative outputs appearing on display 26 is termed a document, and each of the depicted documents can be associated with one of the application programs 13.
  • the term document as used herein will encompass documents, streamed video, streamed audio, web pages, and any other form of data that can be processed and displayed by the computer process 8.
  • the computer process 8 generates a single output display that includes within that display one or more of the documents generated from the application programs 13.
  • the collection of displayed documents represents the content generated by the application programs 13 and this content is displayed within the program window generated by the computer process 8.
  • the program window for the computer process 8 also may include a set of icons representative of tools provided with the graphical user interface and capable of allowing a user to control the operation, in this case the display, of the documents appearing in the program window.
  • each application program form its own display would result in a presentation on the display device 26 that included several program windows, typically one for each application program 13.
  • each different type of program window would include a different set of tools for manipulating the content displayed in that window.
  • the system 10 of the invention has the advantage of providing a consistent user interface, and only requiring knowledge of one set of tools for displaying and controlling the different documents.
  • the computer process 8 operates on the output of the application programs 13 , thus only requiring that output to create the documents that appear within the program window. Accordingly, it is not necessary that the application programs 13 be resident on the same machine as the process 8, nor that the application programs 13 operate in concert with the computer process 8.
  • the computer process 8 needs only the output from these application programs 13 , and this output can be derived from stored data files that were created by the application programs 13 at an earlier time.
  • the systems and methods described herein may be employed as part of systems wherein an application program is capable of presenting its own content, controlling at least a portion of the display 26 and presenting that content within a program window associated with that application program.
  • the systems and methods of the invention can work as separate applications that appear on the display within a portion of the display provided for its use.
  • Figure 1 depicts a plurality of application programs 13.
  • application programs can include word processing programs such as Word, WordPerfect, or any other similar word processing program. It can further include programs such as Netscape Composer that generates HTML files, Adobe Acrobat that processes PDF files, a web server that delivers XML or HTML, a streaming server that generates a stream of audio-visual data, an e-mail client or server, a database, spreadsheet or any other kind of application program that delivers output either as a file, data stream, or in some other format suitable for use by a computer process .
  • each of the application programs 13 presents its output content to the computer process 8. In operation this can occur by having the application process 13 direct its output stream as an input byte stream to the computer process 8.
  • the application program 13 can create a data file such as a Word document, that can be streamed into the computer process 8 either by a separate application or by the computer process 8.
  • the computer process 8 is capable of processing the various input streams to create the aggregated display shown on display device 26. To this end, and as will be shown in greater detail hereinafter, the computer process 8 processes the incoming streams to generate an internal representation of each of these input streams. In one practice this internal representation is meant to look as close as possible to the output stream of the respective application program 13. However, in other embodiments the internal representation may be created to have a selected, simplified or partial likeness to the output stream generated by the respective application program 13. Additionally and optionally, the systems and methods described herein may also apply filters to the content being translated thereby allowing certain portions of the content to be removed from the content displayed or otherwise presented.
  • the systems and methods described herein may allow alteration of the structure of the source document, allowing for repositioning content within a document, rearranging the structure of the document, or selecting only certain types of data.
  • content can be added during the translation process, including active content such as links to web sites.
  • the internal representation created by computer process 8 may be further processed by the computer process 8 to drive the display device 26 to create the aggregated image represented in Figure 1.
  • Figure 2 depicts the system 10 which includes that computer process 8, the source documents 11, a and a display device 26.
  • the computer process 8 includes a plurality of document agents 12, an internal representation format file and process 14, buffer storage 15, a library of generic objects 16, a core document engine that in this embodiment comprises a parsing module 18, and a rendering module 19, an internal view 20, a shape processor 22 and a final output 24.
  • Figure 2 further depicts an optional input device 30 for transmitting user input 40 to the computer process 8.
  • the depicted embodiment includes a process 8 that comprises a shape processor 22.
  • process 8 is only exemplary and that the process 8 may be realized through alternate processes and architectures.
  • the shape processor 22 may optionally be realized as a hardware component, such as a semiconductor device, that supports the operation of the other elements of the process 8.
  • Figure 2 presents process 8 as a functional block diagram that comprises a single system, it may be that process 8 is distributed across a number of different platforms, and optionally it may be that the elements operate at different times and that the output from one element of process 8 is delivered at a later time as input to the next element of process 8.
  • each source document 11 is associated with a document agent 12 that is capable of translating the incoming document into an internal representation of the content of that source document 11.
  • the system 10 of Figure 1 includes an application dispatcher (not shown) that controls the interface between application programs and the system 10.
  • an application dispatcher (not shown) that controls the interface between application programs and the system 10.
  • API application programming interface
  • the application dispatcher which passes data, calls the appropriate document agent 12, or otherwise carries out the request made by the application program.
  • the application dispatcher advertises the source document 11 to all the loaded document agents 12. These document agents 12 then respond with information regarding their particular suitability for translating the content of the published source document 11.
  • the application dispatcher selects a document agent 12 and passes a pointer, such as a URI of the source document 11, to the selected document agent 12.
  • the computer process 8 may be run as a service under which a plurality of threads may be created thereby supporting multi-processing of plural document sources 11. In other embodiments, the process 8 does, not support multi-threading and the document agent 12 selected by the application dispatcher will be called in the current thread.
  • the exemplary embodiment of Figure 2 provides a flexible and extensible front end for processing incoming data streams of different file formats.
  • the application dispatcher determines that the system lacks a document agent 12 suitable for translating the source document 11, the application dispatcher can signal the respective application program 13 indicating that the source document 11 is in an unrecognized format.
  • the application program 13 may choose to allow the reformatting of the source document 11, such as by converting the source document 11 produced by the application program 13 from its present format into another format supported by that application program 13.
  • an application program 13 may determine that the source document 11 needs to be saved in a different format, such as an earlier version of the file format.
  • the application program 13 can resave the source document 11 in this supported format in order that a document agent 12 provided by the system 10 will be capable of translating the source document 11.
  • the application dispatcher upon detecting that the system 10 lacks a suitable document agent 12, can indicate to a user that a new document agent of a particular type may be needed for translating the present source document 11.
  • the computer process 8 may indicate to the user that a new document agent needs to be loaded into the system 10 and may direct the user to a location, such as a web site, from where the new document agent 12 may be downloaded.
  • the system could fetch automatically the document agent without asking the user, or could identify a generic agent 12, such as a generic text agent that can extract portions of the source document 11 representative of text. Further, agents that prompt a user for input and instruction during the translation process may also be provided.
  • an application dispatcher in conjunction with the document agents 12 acts as an input module that identifies the file format of the source document 11 on the basis of any one of a variety of criteria, such as an explicit file-type identification within the document, from the file name, including the file name extension, or from known characteristics of the content of particular file types .
  • the bytestream is input to the document agent 12, specific to the file format of the source document 11.
  • the system 10 may also be applied to input received from an input device such as a digital' camera or scanner as well as from an application program that can directly stream its output to the process 8, or that has its output streamed by an operating system to the process 8.
  • the input bytestream may originate directly from the input device, rather from a source document 11.
  • the input bytestream will still be in a data format suitable for processing by the system 10 and, for the purposes of the invention, input received from such an input device may be regarded as a source document 11.
  • the document agent 12 employs the library 16 of standard objects to generate the internal representation 14, which describes the content of the source document in terms of a collection of document objects whose generic types are as defined in the library 16, together with parameters defining the properties of specific instances of the various document objects within the document.
  • the library 16 provides a set of types of objects which the document agents 12, the parser 18 and the system 10 have knowledge of.
  • the document objects employed in the internal representation 14 may include: text, bitmap graphics and vector graphics document objects which may or may not be animated and which may be two- or three-dimensional: video, audio and a variety of types of interactive objects such as buttons and icons.
  • Vector graphics document objects may be PostScript-like paths with specified fill and transparency.
  • Bitmap graphic document objects may include a set of sub-object types such as for example JPEG, GIF and PNG object types.
  • Text document objects may declare a region of stylized text. The region may include a paragraph of text, typically understood as a set of characters that appears between two delimiters, like a pair of carriage returns.
  • Each text object may include a run of characters and the styling information for that character run including one or more associated typefaces, points and other such styling information.
  • the parameters defining specific instances of document objects will generally include dimensional co-ordinates defining the physical shape, size and location of the document object and any relevant temporal data for defining document objects whose properties vary with time, thereby allowing the system to deal with dynamic document structures and/or display functions.
  • a stream of video input may be treated by the system 10 as a series of figures that are changing at a rate of, for example, 30 frames per second.
  • the temporal characteristic of this figure object indicates that the figure object is to be updated 30 times per second.
  • the parameters will normally also include a font and size to be applied to a character string.
  • Object parameters may also define other properties, such as transparency.
  • the internal representation may be saved/stored in a file format native to the system and that the range of possible source documents 11 input to the system 10 may include documents in the system's native file format. It is also possible for the internal representation 14 to be converted into any of a range of other file formats if required, using suitable conversion agents.
  • Figure 3 depicts a flow chart diagram of one exemplary process that may be carried out by a document agent 12.
  • Figure 3 depicts a process 50 that represents the operation of an example document agent 12, in this case a document agent 12 suitable for translating the contents of a Microsoft Word document into an internal representation format.
  • the process 50 includes an initialization step 52 wherein the process 50 initializes the data structures, memory space, and other resources that the process 50 will employ while translating the source document 11.
  • the process 50 proceeds to a series of steps, 54, 58 and 60, wherein the source document 11 is analyzed and divided into subsections.
  • steps 54, 58 and 60 subdivide the source document 11 as it is streamed into the document agent 12 first into sections, then subdivides the sections into paragraphs and then subdivides paragraphs into the individual characters that make up that paragraph.
  • the sections, paragraphs and characters identified within the source document 11 may be identified within a piece table that contains pointers to the different subsections identified within the source document 11.
  • the piece table depicted in Figure 3 represents a construct employed by MSWord for providing pointers to different subsections of a document. It will further be understood that the use of a piece table or a piece table like construct is optional and depends on the application at hand, including depending on the type of document being processed.
  • step 60 may proceed to step 62 wherein a style is applied to the character or set of characters identified in step 60.
  • the application of a style is understood to associated the identified characters with a style of presentation that is being employed with those characters.
  • the style of presentation may include properties associated with the character including font type, font size, whether the characters are bold, italic, or otherwise stylized.
  • the process can determine whether the characters are rotated, or being positioned for following a curved path or other shape.
  • style associated with the paragraph in which the characters occur may also be identified and associated with the characters. Such properties can include the line spacing associated with the paragraph, the margins associated with the paragraph, the spacing between characters, and other such properties.
  • step 70 the process 50 proceeds to step 70 wherein the internal representation is built up.
  • the object which describes the structure of the document is created in Step 64 as an object within the internal representation, and the associated style of this object, together with the character run it contains, is created separately within the internal representation at Step 68.
  • decision block 72 the process 50 determines whether the paragraph associated with the last processed character is complete. If the paragraph is not complete the process 50 returns to step 60 wherein the next character from the paragraph is read.
  • the process 50 proceeds to decision block 74 wherein the process 50 determines whether the section is complete. If the section is complete the process returns to step 58 and the next paragraph is read from the piece table. Alternatively if the section is complete the process 50 proceeds to step 54 wherein the next section, if there is a next section is read from the piece table and processing continues. Once the document has been processed the system 8 can transmit, save, export or otherwise store the translated document for subsequent use.
  • the system can store the translated file in a format compatible with the internal representation, and optionally in other formats as well including formats compatible with the file formats of the source documents 11 (for which it may employ 'export document agents' not shown capable of receiving internal representation data and creating source document data) , or in a binary form, a textual document description structure, marked-up text or in any other suitable format; and may employ a universal text encoding model, including Unicode, shiftmapping, big-5, and a luminance/chrominance model .
  • the format of the internal representation 14 separates the "structure” (or “layout") of the documents, as described by the object types and their parameters, from the "content” of the various objects; e.g. the character string (content) of a text object is separated from the dimensional parameters of the object; the image data (content) of a graphic object is separated from its dimensional parameters.
  • This allows document structures to be defined in a compact manner and provides the option for content data to be stored remotely and to be fetched by the system only when needed.
  • the internal representation 14 describes the document and its constituent objects in terms of "high-level" descriptions.
  • the document agent 12 described above with reference to Figure 3 is capable of processing a data file created by the MSWord word processing application and translating that data file into an internal representation that is formed from a set of object types selected from the library 16, that represents the content of the processed document. Accordingly, the document agent 12 analyzes the Word document and translates the structure and content of that document into an internal representation known to the computer process 8.
  • Figure 4 depicts a Word document 32 of the type created by the MSWord application program.
  • the depicted document 32 comprises one page of information wherein that one page includes two columns of text 34 and one figure 36.
  • Figure 4 further depicts that the columns of text 34 and the figure 36 are positioned on the page 38 in such a way that one column of text runs from the top of the page 38 to the bottom of the page 38 and the second column of text runs from about the center of the page to the bottom of the page with the figure 36 being disposed above the second column of text 34.
  • the document agent 12 begins processing the document 32 by determining that the document 32 comprises one page and contains a plurality of different objects. For the one page found by the document agent 12 , the document agent 12 identifies the style of the page, which for example may be a page style of an 8.5 x 11 page in portrait format. The page style identified by the document agent 12 is embodied in the internal representation for later use by the parser 18 in formatting and flowing text into the document created by the process 8.
  • page style information can include that a document comprises a plurality of pages and that the pages are of a certain size.
  • Other page style information may be identified by the document agent 12 and the page style information identified can vary according to the application.
  • different page style information may be identified by a document agent capable of processing a Microsoft Excel document or a real media data stream.
  • the document agent 12 may begin to break the document 32 down into objects that can be mapped to document objects known to the system and typically stored in the library 16.
  • the document agent 12 may process the document 32 to find text objects, bitmap objects and vector graphic objects. Other type of object types may optionally be provided including video type, animation type, button type, and script type.
  • the document agent 12 will identify a text object 34 whose associated style has two columns. The paragraphs of text that occur within the text object 34 may be analyzed for identifying each character in each respective paragraph.
  • Process 50 may apply style properties to each identified character run and each character run identified within the document 32 may be mapped to a text object of the type listed within the library 16.
  • Each character run and the applied style can be understood as an object identified by the document agent 12 as having been found within the document 32 and having been translated to a document object, in this case a text object of the type listed within the library 16.
  • This internal representation object may be streamed from the document agent 12 into the internal representation 14.
  • the document agent 12 may continue to translate the objects that appear within the document 32 into document objects that are known to the system 10 until each object has been translated.
  • the object types may be appropriate for the application and may include object types suitable for translating source data representative of a digital document, an audio/visual presentation, a music file, an interactive script, a user interface file and an image file, as well as any other file types.
  • Figure 5 depicts a process 80 wherein step 82 has a primitive library object A being processed by, in step 84, inserting that primitive object into the document that is becoming the internal representation of the source document 11.
  • step 88 another object B, provided by the document agent 12 is delivered to the internal representation file process 14.
  • the process 80 then undertakes the depicted sequence of steps 92 through 98 wherein characteristics of object A are compared to the characteristics of object B to determine if the two objects have the same characteristics.
  • object A and object B represent two characters such as the letter P and the letter N
  • both characters P and N are the same color, same font, same size and the same style such as bold or italicized
  • the process 80 in step 94 joins the two objects together within one object classification stored within the internal representation. If these characteristics do not match then the process 80 adds them to the internal representation as two separate objects.
  • Figure 5 depicts a process 80 wherein the internal representation file 14 compacts the objects as a function of the similarity of physically adjacent objects.
  • the compaction process may comprise a process for compacting objects that are visually adjacent.
  • Figures 6, 7 and 8 depict the structure of the internal representation of a document that has been processed by the system depicted in Figures 1 and 2.
  • the internal representation of the document may be embodied as a computer file or as data stored in core memory.
  • data structure selected for capturing or transporting the internal representation may vary according to the application and any suitable data structure may be employed with the systems and methods described herein without departing from the scope of the invention.
  • the structure of the internal representation of the processed document separates the structure of the document from the content of the document.
  • the structure of the document is captured by a data structure that shows the different document objects that make up the document, as well as the way that these document objects are arranged relative to each other.
  • This separation of structure from content is shown in Figure 6 wherein the data structure 110 captures the structure of the document being processed and stores that structure in a data format that is independent of the actual content associated with that document.
  • the data structure 110 includes a resource Table 112 and a document structure 114.
  • the resource table 112 provides a list of resources for constructing the internal representation of the document.
  • the resource table 112 can include one or more tables of common structures that occur within the document, such as type faces, links, and color lists. These common structures may be referenced numerically within the resource table 112.
  • the resources of resource table 112 relate to the document objects that are arranged within the document structure 114.
  • the document structure 114 includes a plurality of containers 118 that are represented by the sets of the nested parentheses . Within the containers 118 are a plurality of document objects 120. As shown in Figure 6 the containers 118 represent collections of document objects that appear within the document being processed. As further shown by Figure 6 the containers 118 are also capable of holding sub- containers.
  • the document structure 114 includes one top-level container, identified by the set of outer parentheses labeled 1, and has three nested containers 2, 3 and 4. Additionally, the container 4 is double nested within container 1 and container 3.
  • Each container 118 represents features within a document, wherein the features may be a collection of individual document objects, such as the depicted document objects 120.
  • a document such as the document 32 depicted in Figure 4
  • the different document objects 120 that occur within the character run container may, for example, be representative of the different paragraphs that occur within that character run.
  • the character run container has a style associated with it.
  • the character run depicted in Figure 4 can include style information representative of the character font type, font size, styling, such as bold or italic styling, and style information representative of the size of the column, including width and length, in which the character run, or at least a portion of that character run, occurs.
  • This style information may be later used by the parser 18 to reformat and reflow the text within the context specific view 20.
  • Another example of a container may be a table that, for example, could appear within a column 34 of text in document 32.
  • the table may be a container with objects.
  • the other types and uses of containers will vary according to the application at hand and the systems and methods of the invention are not limited to any particular set of object types or containers .
  • the document agent 12 translates the source document 11, it will encounter objects that are of known object types, and the document agent 16 will request the library 16 to create an object of the appropriate object type. The document agent 12 will then lodge that created document object into the appropriate location within document structure 114 to preserve the overall structure of the source document 11. For example, as the document agent 12 encounters the image 36 within the source document 11, the document agent 12 will recognize the image 36, which may for example be a JPEG image, as an object of type bitmap, and optionally sub-type JPEG. This document agent 12, as shown in steps 64 and 68 of Figure 3, can create the appropriate document object 120 and can lodge the created document object 120 into the structure 114. Additionally, the data for the JPEG image document object 120, or in another example, the data for the characters and their associated style for a character run, may be stored within the data structure 150 depicted in Figure 8.
  • the document agent 12 may identify other containers wherein these other containers may be representative of a subfeature appearing within an existing container, such as a character run.
  • these subfeatures may include links to referenced material, or clipped visual regions or features that appear within the document and that contain collections of individual document objects 120.
  • the document agent 12 can place these document objects 120 within a separate container that will be nested within the existing container.
  • the arrangement of these document objects 120 and the containers 118 are shown in Figure 7A as a tree structure 130 wherein the individual containers 1, 2, 3 and 4 are shown as container objects 132, 134, 138 and 140 respectively.
  • the containers 118 and the document objects 120 are arranged in a tree structure that shows the nested container structure of documents structure 114 and the different document objects 120 that occur within the containers 118.
  • the tree structure of Figure 7A also illustrates that the structure 114 records and preserves the structure of the source document 11, showing the source document as a hierarchy of document objects 120, wherein the document objects 120 include the style information, such as for example the size of columns in which a run of characters appears, or temporal information, such as the frame rate for streamed content.
  • style information such as for example the size of columns in which a run of characters appears
  • temporal information such as the frame rate for streamed content.
  • Table 1 presents an example of parameters that may be used to describe a document's graphical structure.
  • Table one presents examples of such parameters, such as the object type, which in this case is a Bitmap object type.
  • a bounding box parameter is provided and gives the location of the document object within the source document 11.
  • Table one further provides the Fill employed and an alpha factor that is representative of the degree of transparency for the object.
  • a Shape parameter provides a handle to the shape of the object, which in this case could be a path that defines the outline of the object, including irregularly shaped objects.
  • Table 1 also presents a time parameter representative of the temporal changing for that object. In this example, the image is stable and does not change with time. However, if the image object presented streamed media, then this parameter could contain a temporal characteristic that indicates the rate at which the object should change, such as a rate comparable to the desired frame rate for the content .
  • the structural elements are containers with flowable data content, with this flowable data held separately and referenced by a handle from the container.
  • any or all data content can be held remotely from the document structure.
  • This data structure allows for rapid progressive rendering of the internal representation of the source document 11, as the broader and higher level objects can be rendered first, and the finer features can be rendered in subsequent order.
  • the separate structure and data allows visual document to be rendered while streaming data to "fill" the content.
  • the separation of content and structure allows the content of the document to readily be edited or changed. As the document structure is independent from the content, different content can be substituted into the document structure. This can be done on container by container basis or for the whole document.
  • the structure of the document can be delivered separately from the content and the content provided later, or made present on the platform to which the structure is delivered.
  • Figure 7A shows that the structure of a source document 11 can be represented as a tree structure 130.
  • the tree structure may be modified and edited to change the presentation of the source document 11.
  • the tree structure may be modified to add additional structure and content to the tree 130.
  • Figure 7B shows the original tree structure of Figure 7A duplicated and presented under a higher level container.
  • Figure 7B shows that a new document structure, and therefore new representation, may be created by processing the tree structure 130 produced by the document agent 12. This allows the visual position of objects within a document to change, while the relative position of different objects 120 may remain the same.
  • the systems described herein can edit and modify content.
  • the systems described herein can edit the tree structure to duplicate the image of the document, and present side by side images of the document.
  • the tree structure 130 can be edited and supplemented to add additional visual information, such as by adding the image of a new document or a portion of that document.
  • the systems described herein can create the illusion of a document gradually changing, such as sliding across a display, such as display device 26, or gradually changing into a new document.
  • Other effects, such as the creation of thumbnail views and other similar results can be achieved and those of ordinary skill by making modifications to the systems and methods described herein and such modified systems and methods will fall within the scope of the invention.
  • each document object 120 includes a pointer to the data associated with that object and this information may be arranged within an indirection list such as the indirection list 160 depicted in Figure 8.
  • each document object 120 is numbered and an indirection list 152 is created wherein each document object number 154 is associated with an offset value 158.
  • the document object number 1, identified by reference number 160 may be associated with the offset 700, identified by reference number 162.
  • the indirection list associates the object number 1 with the offset 700.
  • the offset 700 may represent a location in core memory, or a file offset, wherein the data associated with object 1 may reside.
  • a data structure 150 may be present wherein the data that is representative of the content associated with a respective document object 120 may be stored.
  • the depicted object 1 at jump location 700 may include the Unicode characters representative of the characters that occur within the character run of the container 1 depicted in Figure 6.
  • the object 2 data, depicted in Figure 8 by reference number 172, and associated with in core memory location 810, identified by reference numeral 170 may be representative of the JPEG bit map associated with a bit map document object 120 referenced within the document structure 114 of Figure 6.
  • the parser 18 parses the objects identified in the structure section of the internal representation, and with reference to the data content associated with this object, it re-applies the position and styling information to each object.
  • the renderer 19 generates a context-specific representation or "view" 20 of the documents represented by the internal representation 14. The required view may be of the all the documents, a whole document or of parts of one or some of the documents.
  • the renderer 19 receives view control inputs 40 which define the viewing context and any related temporal parameters of the specific document view which is to be generated.
  • the system 10 may be required to generate a zoomed view of part of a document, and then to pan or scroll the zoomed view to display adjacent portions of the document.
  • the view control inputs 40 are interpreted by the renderer 19 to determine which parts of the internal representation are required for a particular view and how, when and for how long the view is to be displayed.
  • the context-specific representation/view 20 is expressed in terms of primitive shapes and parameters.
  • the renderer 19 may also perform additional pre- processing functions on the relevant parts of the internal representation 14 when generating the required view 20 of the source document 11.
  • the view representation 20 is input to a shape processor 22 for processing to generate an output in a format suitable fore driving an output device 26, such as a display device or printer.
  • the pre-processing functions of the renderer 19 may include colour correction, resolution adjustment/enhancement and anti-aliasing.
  • Resolution enhancement may comprise scaling functions which preserve the legibility of the content of objects when displayed or reproduced by the target output device.
  • Resolution adjustment may be context-sensitive; e.g. the display resolution of particular objects may be reduced while the displayed document view is being panned or scrolled and increased when the document view is static.
  • the output from the renderer 19 expresses the document in terms of primitive objects.
  • the representation from the renderer 19 defines the object at least in terms of a physical, rectangle boundary box, the actual outline path of the object bounded by the boundary box, the data content of the object, and its transparency.
  • the shape processor 22 interprets the primitive object and converts it into an output frame format appropriate to the target output device 26; e.g. a dot-map for a printer, vector instruction set for a plotter, or bitmap for a display device.
  • An output control input 44 to the shape processor 22 provides information to the shape processor 22 to generate output suitable for a particular output device 26.
  • the shape processor 22 preferably processes the objects defined by the view representation 20 in terms of "shape” (i.e. the outline shape of the object) , "fill” (the data content of the object) and “alpha” (the transparency of the object) , performs scaling and clipping appropriate to the required view and output device, and expresses the object in terms appropriate to the output device (typically in terms of pixels by scan conversion or the like, for most types of display device or printer) .
  • the shape processor 22 optionally includes an edge buffer which defines the shape of an object in terms of scan-converted pixels, and preferably applies anti- aliasing to the outline shape. Anti-aliasing may be performed in a manner determined by the characteristics of the output device 26, by applying a grey-scale ramp across the object boundary.
  • This approach enables memory efficient shape-clipping and shape-intersection processes, and is memory efficient and processor efficient as well.
  • a look-up table, or other technique may be employed to define multiple tone response curves, allowing non-linear rendering control.
  • the individual primitive objects processed by the shape processor 22 are combined in the composite output frame.
  • the design of one shape processor suitable for use with the systems described herein is shown in greater detail in the patent application entitled Shape Processor, filed on even date herewith, the contents of which are incorporated by reference. However, any suitable shape processor system or process may be employed without departing from the scope of the invention.
  • the process 8 depicted in Figure 1 can be realized as a software component operating on a data processing system such as a hand held computer, a mobile telephone, set top box, facsimile machine, copier or other office equipment, an embedded computer system, a Windows or Unix workstation, or any other type of computer/processing platform capable of supporting, in whole or in part, the document processing system described above.
  • a data processing system such as a hand held computer, a mobile telephone, set top box, facsimile machine, copier or other office equipment, an embedded computer system, a Windows or Unix workstation, or any other type of computer/processing platform capable of supporting, in whole or in part, the document processing system described above.
  • the system can be implemented as a C language computer program, or a computer program written in any high level language including C++, Fortran, Java or Basic.
  • the systems can be realized as a computer program written in microcode or written in a high level language and compiled down to microcode that can be executed on the platform employed.
  • the development of such systems is known to those of skill in the art, and such techniques are set forth in Intel® StrongARM processors SA-1110 Microprocessor Advanced Developer ' s Manual .
  • general techniques for high level programming are known, and set forth in, for example, Stephen G. Kochan, Programming in C, Hayden Publishing (1983).
  • DSPs are particularly suited for implementing signal processing functions, including preprocessing functions such as image enhancement through adjustments in contrast, edge definition and brightness. Developing code for the DSP and microcontroller systems follows from principles well known in the art .
  • Figs. 1 and 2 graphically depicts the computer process 8 as comprising a plurality of functional block elements, it will be apparent to one of ordinary skill in the art that these elements can be realized as computer programs or portions of computer programs that are capable of running on the data processing platform to thereby configure the data processing platform as a system according to the invention.
  • Fig. 1 depicts the system 10 as an integrated unit of a document processing process 8 and a display device 26, it will be apparent to those of ordinary skill in the art that this is only one embodiment, and that the systems described herein can be realized through other architectures and arrangements, including system architectures that separate the document processing functions of the process 8 from the document display operation performed by the display 26.
  • the systems of the invention are not limited to those systems that include a display or output device, but that the systems of the invention will encompass those processing systems that process one or more digital documents to create output that can be presented on an output device.
  • this output may be stored in a data file for subsequent presentation on a display device, for long term storage, for delivery over a network, or for some other purpose than for immediate display.
  • the systems and methods described herein can support many different document and content processing applications and that the structure of the system or process employed for a particular application will vary according to the application and the choice of the designer.
  • system of the present invention may be "hard-wired” ; e.g. implemented in ROM and/or integrated into ASICs or other single-chip systems, or may be implemented as firmware (programmable ROM such as flashable ePROM) , or as software, being stored locally or remotely and being fetched and executed as required by a particular device.
  • firmware programmable ROM such as flashable ePROM
  • software being stored locally or remotely and being fetched and executed as required by a particular device.
  • the systems and methods described herein may be stand alone systems for processing source documents 11, but optionally these systems may be incorporated into a variety of types of data processing systems and devices, and into peripheral devices, in a number of different ways.
  • the system of the present invention may be incorporated alongside the operating system and applications of the host system or may be incorporated fully or partially into the host operating system.
  • the systems described herein enable rapid display of a variety of types of data files on portable data processing devices with LCD displays without requiring the use of browsers or application programs.
  • Examples of portable data processing devices which may employ the present system include "palmtop” computers, portable digital assistants (PDAs, including tablet-type PDAs in which the ⁇ primary user interface comprises a graphical display with which the user interacts directly by means of a stylus device) , internet-enabled mobile telephones and other communications devices.
  • PDAs portable digital assistants
  • This class of data processing devices requires small size, low power processors for portability.
  • these devices employ advanced RISC-type core processors designed in to ASICs (application specific integrated circuits) , in order that the electronics package is small and integrated.
  • ASICs application specific integrated circuits
  • This type of device also has limited random access memory and typically has no non-volatile data store (e.g. hard disk) .
  • the system of the invention may be integrated at the BIOS level of portable data processing devices to enable document processing and output with much lower overhead than conventional system models.
  • these systems may be implemented at the lowest system level just above the transport protocol stack.
  • the system may be incorporated into a network device (card) or system, to provide in-line processing of network traffic (e.g. working at the packet level in a TCP/IP system) .
  • the systems herein can be configured to operate with a predetermined set of data file formats and particular output devices; e.g. the visual display unit of the device and/or at least one type of printer.
  • the systems described herein may also be incorporated into low cost data processing terminals such as enhanced telephones and "thin” network client terminals (e.g. network terminals with limited local processing and storage resources) , and "set-top boxes” for use in interactive/internet- enabled cable TV systems.
  • the systems may also be incorporated into peripheral devices such as hardcopy devices (printers and plotters) , display devices (such as digital projectors) , networking devices, input devices (cameras, scanners, etc.) and also multi-function peripherals (MFPs) .
  • MFPs multi-function peripherals
  • the present system directly generates a dot-mapped image of the source document suitable for output by the printer (this is true whether the system is incorporated into the printer itself or into the host system) . Similar considerations apply to other hardcopy devices such as plotters.
  • the system When incorporated into a display device, such as a projector, the system again enables the device to display the content of the original data file correctly without the use of applications or drivers on the host system, and without the need for specific configuration of the host system and/or display device.
  • Peripheral devices of these types when equipped with the present system, may receive and output data files from any source, via any type of data communications network.
  • the systems and methods described herein may be incorporated into in-car systems for providing driver information or entertainment systems, to facilitate the delivery of information within the vehicle or to a network that communicates beyond the vehicle. Further, it will be understood that the systems described herein can drive devices having multiple output sources to maintain a consistent display using modifications to only the control parameters. Examples include, but are not limited to, a STB or in-car system incorporating a visual display and print head, thereby enabling viewing and printing of documents without the need for the source applications and drivers.
  • system of the present invention may be "hard-wired” ; e.g. implemented in ROM and/or integrated into ASICs or other single-chip systems, or may be implemented as firmware (programmable ROM such as flashable ePROM) , or as software, being stored locally or remotely and being fetched and executed as required by a particular device.
  • firmware programmable ROM such as flashable ePROM

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)
  • Document Processing Apparatus (AREA)

Abstract

L'invention concerne des techniques d'affichage permettant de séparer la fonctionnalité sous-jacente d'un programme d'application d'un processus d'affichage graphique, d'où la réduction ou la suppression de la nécessité pour cette application de commander l'affichage du dispositif et de fournir des commandes et des outils d'interface utilisateur graphique pour l'affichage. Par ailleurs, ces systèmes permettent de réduire ou de supprimer la nécessité d'utiliser un programme d'application sur un système de traitement pendant l'affichage de données créées par ou pour ce programme d'application (document ou flot vidéo, par exemple). Ainsi, dans un mode de réalisation, les systèmes et les procédés de l'invention permettent d'afficher un contenu, à savoir des documents, des flots vidéo ou d'autres contenus, et d'obtenir les fonctions d'interface utilisateur graphique en vue d'une visualisation du document affiché, et notamment un zoom, un panoramique ou d'autres fonctions similaires, sans que la présence de l'application sous-jacente ne soit nécessaire sur le système affichant ce contenu. Les avantages par rapport à la technique antérieure incluent la possibilité de présenter différents types de contenu provenant de différents programmes d'application sur un même affichage au sein du même espace de travail.
EP20010923807 2000-04-14 2001-04-17 Systemes et procedes de traitement de documents numeriques Ceased EP1272940A1 (fr)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
GBGB0009129.8A GB0009129D0 (en) 2000-04-14 2000-04-14 Digital document processing
GB0009129 2000-04-14
US09/703,502 US7055095B1 (en) 2000-04-14 2000-10-31 Systems and methods for digital document processing
US703502 2000-10-31
PCT/GB2001/001720 WO2001080069A1 (fr) 2000-04-14 2001-04-17 Systemes et procedes de traitement de documents numeriques

Publications (1)

Publication Number Publication Date
EP1272940A1 true EP1272940A1 (fr) 2003-01-08

Family

ID=38666813

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20010923807 Ceased EP1272940A1 (fr) 2000-04-14 2001-04-17 Systemes et procedes de traitement de documents numeriques

Country Status (5)

Country Link
EP (1) EP1272940A1 (fr)
JP (1) JP2003531441A (fr)
AU (1) AU5049401A (fr)
HK (1) HK1057111A1 (fr)
WO (1) WO2001080069A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8184128B2 (en) 2004-10-27 2012-05-22 Hewlett-Packard Development Company, L. P. Data distribution system and method therefor
WO2011152224A1 (fr) * 2010-06-01 2011-12-08 日本電気株式会社 Terminal, procédé de sélection de processus, programme de commande, et support d'enregistrement

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03252760A (ja) * 1990-03-02 1991-11-12 Hitachi Ltd 文書編集方法と文書編集装置
JPH05282304A (ja) * 1992-03-31 1993-10-29 Toshiba Corp 情報処理装置
JP3689441B2 (ja) * 1994-12-05 2005-08-31 キヤノン株式会社 電子ファイリング装置および電子ファイリング方法
JP3683925B2 (ja) * 1994-11-18 2005-08-17 キヤノン株式会社 電子ファイリング装置
AU2585797A (en) 1996-03-15 1997-10-01 University Of Massachusetts Compact tree for storage and retrieval of structured hypermedia documents
EP0925542B1 (fr) * 1996-09-09 2002-08-07 Design Intelligence, Inc. Mise en page et mise en forme automatiques d'un contenu avec une presentation dans un support
JPH1153275A (ja) * 1997-08-06 1999-02-26 Hitachi Ltd エージェント候補の選択表示方法
JPH1191187A (ja) * 1997-09-24 1999-04-06 Fuji Xerox Co Ltd ドキュメント編集システム
JPH11175741A (ja) * 1997-12-12 1999-07-02 Fuji Xerox Co Ltd 文書情報処理装置および文書情報処理方法
US6857102B1 (en) * 1998-04-07 2005-02-15 Fuji Xerox Co., Ltd. Document re-authoring systems and methods for providing device-independent access to the world wide web
GB0009129D0 (en) * 2000-04-14 2000-05-31 Picsel Technologies Limited Digital document processing
US7055095B1 (en) * 2000-04-14 2006-05-30 Picsel Research Limited Systems and methods for digital document processing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HALASZ F ET AL: "THE DEXTER HYPERTEXT", COMMUNICATIONS OF THE ACM, ASSOCIATION FOR COMPUTING MACHINERY, INC, UNITED STATES, vol. 37, no. 2, 1 February 1994 (1994-02-01), pages 30,32 - 39, XP000425933, ISSN: 0001-0782, DOI: 10.1145/175235.175237 *
RADA R ET AL: "HYPERTEXT INTERCHANGE USING ICA", JOURNAL OF DOCUMENTATION, ASLIB. LONDON, GB, vol. 51, no. 2, 1 June 1995 (1995-06-01), pages 99 - 117, XP000605025, ISSN: 0022-0418 *
See also references of WO0180069A1 *

Also Published As

Publication number Publication date
AU5049401A (en) 2001-10-30
HK1057111A1 (en) 2004-03-12
WO2001080069A1 (fr) 2001-10-25
JP2003531441A (ja) 2003-10-21

Similar Documents

Publication Publication Date Title
US6925597B2 (en) Systems and methods for digital document processing
JP5306429B2 (ja) デジタルドキュメント処理のための方法
US5897644A (en) Methods and apparatus for fixed canvas presentations detecting canvas specifications including aspect ratio specifications within HTML data streams
US20030016233A1 (en) Method and system for manipulation of graphics information
EP1272922B1 (fr) Traitement de documents numeriques
WO2001080069A1 (fr) Systemes et procedes de traitement de documents numeriques
CA2452079A1 (fr) Procede et systeme de manipulation d'informations graphiques

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020919

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17Q First examination report despatched

Effective date: 20031031

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: PICSEL (RESEARCH) LIMITED

17Q First examination report despatched

Effective date: 20031031

17Q First examination report despatched

Effective date: 20031031

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: SAMSUNG ELECTRONICS CO., LTD.

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: SAMSUNG ELECTRONICS CO., LTD.

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20150922