WO2005025204A1 - A data structure for an electronic document and related methods - Google Patents

A data structure for an electronic document and related methods

Info

Publication number
WO2005025204A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
pattern
document
file
data structure
Prior art date
Application number
PCT/EP2004/051940
Other languages
French (fr)
Inventor
Miguel Angel Albarran
Andrew Mackenzie
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P.
Priority to US10/571,075 (US20080114777A1)
Priority to EP04766624A (EP1716698A1)
Publication of WO2005025204A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/32 Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
    • H04N1/32037 Automation of particular transmitter jobs, e.g. multi-address calling, auto-dialing
    • H04N1/32042 Automation of particular transmitter jobs, e.g. multi-address calling, auto-dialing with reading of job-marks on a page
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304 Detection arrangements using opto-electronic means
    • G06F3/0317 Detection arrangements using opto-electronic means in co-operation with a patterned surface, e.g. absolute position or relative movement detection for an optical mouse or pen positioned with respect to a coded surface
    • G06F3/0321 Detection arrangements using opto-electronic means in co-operation with a patterned surface, e.g. absolute position or relative movement detection for an optical mouse or pen positioned with respect to a coded surface by optically sensing the absolute position with respect to a regularly patterned surface forming a passive digitiser, e.g. pen optically detecting position indicative tags printed on a paper sheet
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K19/00 Record carriers for use with machines and with at least a part designed to carry digital markings
    • G06K19/06 Record carriers for use with machines and with at least a part designed to carry digital markings characterised by the kind of the digital marking, e.g. shape, nature, code
    • G06K19/06009 Record carriers for use with machines and with at least a part designed to carry digital markings characterised by the kind of the digital marking, e.g. shape, nature, code with optically detectable marking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00 Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404 Methods for optical code recognition
    • G06K7/1439 Methods for optical code recognition including a method step for retrieval of the optical code
    • G06K7/1443 Methods for optical code recognition including a method step for retrieval of the optical code locating of the code in an image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/21 Intermediate information storage

Definitions

  • This invention relates to a data structure for an electronic document and related methods.
  • the device may have an imaging system, such as an infra red camera, within it, which is arranged to image a small area of the page close to the pen nib.
  • the pen includes a processor having image processing capabilities and a memory and is triggered by a force sensor in the nib to record images from the camera as the pen is moved across the document. From these images the pen can determine the position of any marks made on the document by the pen.
  • the pen markings can be stored directly as graphic images, which can then be stored and displayed in combination with other markings on the document.
  • the simple recognition that a mark has been made by the pen on a predefined area of the document can be recorded, and this information used in any suitable way.
  • This allows, for example, forms with check boxes on to be provided and the marking of the check boxes with the pen to be detected.
  • the pen markings are analysed using character recognition tools and stored digitally as text. Systems using this technology are available from Anoto AB and described on their website www.Anoto.com.
  • media generally paper, on which the position identification markings have been provided.
  • This media may comprise a plain media, or may comprise a form or the like on which information is provided in addition to the position identification markings. A user may then use his/her pen to add to the media whether or not it has such additional information.
  • Prior art solutions have provided content (e.g. a layout of a form, etc.) and metadata (i.e. data about the position identification markings) in a variety of formats. Prior solutions have suffered from problems of marrying the correct content to the correct metadata to produce the document required by the user.
  • a data structure which defines an electronic document, the data structure comprising first and second substantially separate portions of data; the first portion of data defining the content of the document and the second portion comprising data relating to a pattern of position identification markings such that, when the electronic document is printed, a pattern reading device, such as a pen, is able to determine its position relative to the position identification markings.
  • the data structure most conveniently comprises a single data file with the first and second data portions being embedded within the data file.
  • data structure we mean a set of data which is stored in a structured manner. For example it may be electronic data stored in the memory of a computer or across a number of computers or memories.
  • a single data file defining a data structure such as a single electronic data file, can typically be identified as a collection of data that can be accessed through a single, common, file descriptor, such as a file name.
  • An advantage of such a data structure is that it provides a convenient means of storing the electronic document. As such it may be simpler than prior art systems to transfer the electronic document defined by the data structure to various locations between processing apparatus, etc., to electronically process the document and the like. A user can access both the content and the pattern data from a single file, and the content and pattern will not be easily separated.
  • the electronic document can be printed out in hard copy, thus providing a digital document for use with a digital pen. The pattern and content may be superimposed in this digital document.
  • the data structure may be written in such a form that it may be converted from one format to other formats without losing any of the information from the file, particularly information about the pattern. This may be achieved by providing the second portion of data as metadata and providing one or more controls which control the way in which the second portion of data is converted between formats to preserve the pattern.
  • the second portion of data may comprise XMP (Extensible Metadata Platform) metadata.
  • PDF (Portable Document Format, as provided by the Adobe™ corporation)
  • JPEG (Joint Photographic Experts Group)
  • SVG (Scalable Vector Graphics)
  • GIF (Graphics Interchange Format)
  • TIFF (Tagged Image File Format)
  • PNG (Portable Network Graphics)
  • Use of the XMP format for the metadata means that the data structure can readily be converted using proprietary software known in the art between these formats whilst preserving the pattern information defined by the data. Any software which can scan a file for metadata will be able to identify the pattern data as distinct from the content and so determine which pattern is needed when printing.
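As a rough illustration of the metadata scan described above, the sketch below searches a file's raw bytes for an XMP packet wrapper, that is, the `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions used by XMP. The function name `find_xmp_packet` and the sample payload are hypothetical and are not taken from the patent; a real packet would carry RDF/XML rather than the single illustrative element shown here.

```python
import re
from typing import Optional

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
_XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packet(data: bytes) -> Optional[bytes]:
    """Return the body of the first XMP packet found in `data`, or None."""
    match = _XPACKET.search(data)
    return match.group(1).strip() if match else None


# Hypothetical file: opaque content bytes followed by an embedded packet.
sample = (
    b"%content-bytes%"
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b"<pattern shelf='3' page='12'/>"
    b'<?xpacket end="w"?>'
)
packet = find_xmp_packet(sample)
```

Because the scan never interprets the content bytes, the same function applies whichever container format carries the packet, which is the property the passage relies on.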
  • For details of XMP metadata, the reader is referred to "Embedding XMP Metadata in Application files", June 2002, Adobe Systems Incorporated, 345 Park Avenue, San Jose, CA 95110-2704, USA.
  • the content in the data structure is stored in a graphical format with pattern metadata embedded within the data structure.
  • the graphical format could be a bitmap or vector based format.
  • Prior art data structures are limited in their flexibility as they do not provide such data defining a pattern of position identification markings in the same file as the content yet in a separate portion of the file allowing it to be moved across formats. For example, in the past a single bitmap or vector format file defining both pattern and content suitable for sending to a printer is known. Such a data structure cannot be converted to other formats since specific information indicating which part of the data structure is pattern data and which is content data is not available. If this information is lost as the file is converted to another format the pattern could be lost or corrupted and then the electronic document cannot be printed correctly.
  • the first portion could also contain data other than content data, such as metadata defining the content or other information.
  • the content data could define text characters or graphical marks or other human-identifiable and/or readable information. Of course, in some situations it could comprise zero content in which case the digital document, when printed, may be blank other than for a pattern of positional markings.
  • the second portion defining the pattern may comprise metadata.
  • This metadata may completely define a portion of pattern needed to print the digital document such that it can be understood by a printer driver or a printer and rendered to form the pattern. It could alternatively comprise an entry of information which is a self-describing definition of a portion of pattern within a pattern space.
  • the metadata information about the pattern contained in the data structure may comprise the co-ordinates of at least one corner of a portion of pattern from a two-dimensional pattern space and optionally its size. If the space is fully characterised by a two-dimensional co-ordinate system this is all that is needed for a suitably enabled printer driver to generate the pattern. Additionally or alternatively, it may define the length of at least one side of the portion, the shape of the portion or a set of absolute co-ordinates defining the boundary of the portion in the pattern space.
  • the metadata which is embedded in the second portion of the data structure may identify the location of a portion of pattern in a pattern space in many other ways. It could be a pointer to a server on which the pattern is stored, or which is capable of allocating the pattern to the document. To be useful a pattern space should be very large allowing it to be allocated to many hundreds or thousands of documents such that each document is allocated a unique portion of pattern. To make this more manageable the pattern space could be divided according to rules into sub-regions of known size, each of which may be referred to as a shelf of position identification markings. Each of the shelves may be further subdivided into individual pages.
  • an (X,Y) co-ordinate may be defined for each point within the page of position identification markings to define any portion of the position identification markings used within a printed document.
  • the data embedded in the second data portion may comprise data identifying both a shelf, a page on that shelf and the co-ordinates of a portion of pattern within that page.
  • an algorithm or the like may generate the portion of pattern from the data by identifying co-ordinates or other meta-data identifying the portion of the position identification marking.
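The shelf, page and co-ordinate addressing described above could be modelled along the following lines. This is a minimal sketch under stated assumptions: the class name `PatternAddress`, its field names and the choice of a corner plus width and height are illustrative, not the patent's own encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternAddress:
    """Hypothetical address of a rectangular portion of the pattern space."""

    shelf: int     # sub-region of the overall pattern space
    page: int      # page within that shelf
    x: float       # corner of the portion within the page
    y: float
    width: float   # extent of the portion from that corner
    height: float

    def contains(self, px: float, py: float) -> bool:
        """True if page co-ordinate (px, py) lies inside this portion."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


addr = PatternAddress(shelf=3, page=12, x=10.0, y=20.0, width=50.0, height=15.0)
```

A printer driver given such a record would know which shelf and page to draw from and which rectangle of that page to render.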
  • the data structure may comprise a data file written in a mark up language such as XML and the second portion of data may comprise XML metadata embedded within the data file.
  • the data file may be in any one of a number of different formats, for example PDF. It could in fact be in any known language which can be interpreted by a suitable printer driver or printer.
  • a method for generating an electronic document comprising: creating an electronic file and storing in that file at least some content and at least some position identification markings arranged to allow a pattern reading device to determine its position within the position identification markings, the electronic file being capable of generating an electronic document.
  • the method may allow the electronic document to be converted from a first file format, in which it is stored, to a second file format.
  • the first and second formats may be any one of the following: PDF (Portable Document Format as provided by the Adobe corporation); JPEG (Joint Photographic Experts Group); SVG (Scalable Vector Graphics); GIF (Graphics Interchange Format); TIFF (Tagged Image File Format); PNG (Portable Network Graphics); or any other suitable file format.
  • the invention provides a digital document production application suitable for producing a data structure defining a digital document comprising: content receiving means for receiving the content of the digital document; pattern receiving means for receiving data defining a pattern of position identification markings allocated to at least a portion of the document; and data structure generating means for generating a data structure defining the digital document which data structure comprises a first portion of data defining the content and a second portion of data defining the pattern.
  • the content receiving means may include a graphical user interface. This may present to a user an image of a document on a screen to which a user can add content. Alternatively, it may call up a content file containing content.
  • the content file could be a text file from a word processing package, or a spreadsheet from a database or a drawing from a drawing package. It may obtain content from more than one file.
  • the pattern receiving means may include a means for requesting pattern from a server or from a store of locally held pattern information.
  • the program may make this request once a user has indicated that the design of the document content is complete.
  • the data structure may be generated by the program automatically once a user has indicated that the design of the content is complete.
  • a data carrier containing instructions which when read onto a computer cause that computer to perform the method of the second aspect of the invention or provide the application of the third aspect.
  • a data carrier containing instructions which when read onto a computer provide the electronic document of the first aspect of the invention.
  • the data carrier of any of the above aspects of the invention can comprise a floppy disk, a CD-ROM, a DVD ROM/RAM (including +RW, -RW), a hard drive, a non-volatile memory, any form of magneto optical disk, a wire, a transmitted signal (which may comprise an internet download, an ftp transfer, or the like), or any other form of computer readable medium.
  • a source file for a printed digital document comprising content and a pattern of position identification markings arranged to allow a pattern reading device to determine its position within the position identification markings, the source file comprising at least a first portion defining the content and a second portion comprising metadata which comprises a self-defining description of the pattern.
  • Figure 1 shows a digital document created from an embodiment of a data structure according to an embodiment of the present invention;
  • Figure 2 shows in detail part of the digital document of Figure 1;
  • Figure 3 shows a prior art digital pen for use with the document of Figure 1;
  • Figure 4 is a flow diagram showing a method of generating a digital document in accordance with an embodiment of the present invention;
  • Figure 5 shows the allocation of pattern space to the document of Figure 1, in accordance with an embodiment of the present invention;
  • Figure 6 shows an electronic file defining the document of Figure 1, in accordance with an embodiment of the present invention.
  • a digital document 100 for use in a digital pen and paper system comprises a carrier 102 in the form of a single sheet of paper 104 with position identifying markings 106 printed on some parts of it to define pattern areas 107 of a position identifying pattern 108. Also printed on the paper 104 are further markings 109 which are clearly visible to a human user of the document 100. These markings make up the content of the document 100.
  • the content 109 will obviously depend entirely on the intended use of the document. In this case an example of a very simple two page questionnaire is shown, and the content includes a number of boxes 110, 112 which can be pre-printed with user specific information such as the user's name 114 and a document identification number 116.
  • the content further comprises a number of check boxes 118 any one of which is to be marked by the user, and two larger boxes 120, 121 in which the user can write comments.
  • the document content also includes a send box 122 to be checked by the user when he has completed the questionnaire to initiate a document completion process by which pen stroke data is forwarded for processing, and typographical information on the document 100 such as the headings or labels 124 for the various boxes 110, 112, 118, 120.
  • the position identifying pattern 108 is only printed onto the parts of the document 100 which the user is expected to write on or mark, that is within the check boxes 118, the comments boxes 120, 121 and the send box 122.
  • the position identifying pattern 108 is made up of a number of dots 130 arranged on an imaginary grid 132.
  • the grid 132 can be considered as being made up of horizontal and vertical lines 134, 136 defining a number of intersections 140 where they cross.
  • the intersections 140 are of the order of 0.3mm apart.
  • One dot 130 is provided at each intersection 140, but offset slightly in one of four possible directions (up, down, left or right) from the actual intersection 140 by about 1/6th of the grid spacing.
  • the dot offsets are arranged to vary in a systematic way so that any group of a sufficient number of dots 130, for example any group of 36 dots arranged in a six by six square, will be unique within a very large area of the pattern.
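The dot geometry described above (a grid of roughly 0.3 mm pitch, each dot displaced by about one sixth of the pitch in one of four directions) can be sketched as follows. This is only an illustration of placing a dot and recovering its direction; the function names are hypothetical, and the systematic rule that makes every six by six window unique is not described in this text and is not reproduced here.

```python
GRID_MM = 0.3             # approximate spacing between grid intersections
OFFSET_MM = GRID_MM / 6   # displacement of each dot from its intersection

# Direction -> unit vector; y grows downwards, as on the printed page.
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def dot_position(col, row, direction):
    """Position in mm of the dot belonging to intersection (col, row)."""
    dx, dy = DIRECTIONS[direction]
    return (col * GRID_MM + dx * OFFSET_MM, row * GRID_MM + dy * OFFSET_MM)


def decode_direction(x, y):
    """Recover the offset direction of a dot from its measured position."""
    col, row = round(x / GRID_MM), round(y / GRID_MM)
    ex, ey = x - col * GRID_MM, y - row * GRID_MM  # residual from intersection
    if abs(ex) > abs(ey):
        return "right" if ex > 0 else "left"
    return "down" if ey > 0 else "up"


# Each dot takes one of four states, so a 6 x 6 window has at most
# 4**36 distinct arrangements available to the encoding.
WINDOW_STATES = 4 ** 36
```

The size of `WINDOW_STATES` is an upper bound only, but it suggests why a small window of dots can be unique within a very large pattern area.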
  • This large area is defined as a total imaginary pattern space, and only a small part of the pattern space is taken up by the pattern on the document 100.
  • the document and any position on the patterned parts of it can be identified from the pattern printed on it.
  • An example of this type of pattern is described in WO 01/26033. It will be appreciated that other position identifying patterns can equally be used. Some examples of other suitable patterns are described in WO 00/73983 and WO 01/71643.
  • a pattern reading device comprising a pen 300 comprises a writing nib 310, and a camera 312 made up of an infra red (IR) LED 314 and an IR sensor 316.
  • the camera 312 is arranged to image a circular area of diameter 3.3mm adjacent to the tip 311 of the pen nib 310.
  • a processor 318 processes images from the camera 312.
  • a pressure sensor 320 detects when the nib 310 is in contact with the document 100 and triggers operation of the camera 312. Whenever the pen is being used on a patterned area of the document 100, the processor 318 can therefore determine from the pattern 108 the position of the nib of the pen whenever it is in contact with the document 100.
  • the processor can determine the position and shape of any marks made on the patterned areas of the document 100.
  • This information is stored in a memory 320 in the pen as it is being used.
  • When the user has finished marking the document, in this case when the questionnaire is completed, this is recorded in a document completion process, for example by making a mark with the pen in the send box 122.
  • the pen is arranged to recognise the pattern in the send box 122 and determine from that pattern the identity of the document 100. Suitable pens are available from Logitech under the trade mark Logitech Io.
  • the first step is the design and creation of an electronic document file containing content.
  • the electronic document can be printed as a hard copy digital document or displayed on a screen.
  • the design of the content of the document is carried out on a PC using an application (Step 600).
  • the application is Acrobat Reader and the PC also runs a number of other applications including a word processing package such as 'Word', a database package such as 'Access', and a spreadsheet package such as 'Excel'.
  • Each of these applications can be used to design the content of the document.
  • the user defines areas of the document to which the pattern 108 is to be applied, for example using a digital document creation application or form design tool (FDT) in the form of an Acrobat 5.0 plug-in.
  • the content is converted to a Portable Document Format (PDF) file (Step 602).
  • Pattern areas for the document are then defined using the FDT (Step 604).
  • the split of the pattern areas within the document is defined (Step 606) producing a digital document defining both the content and the positions and shapes of the pattern areas.
  • the format of this digital document will again comprise a PDF file, the data structure of which will be described hereinafter.
  • the steps of designing the content and the pattern could both be performed by the FDT.
  • the pattern areas 107 can be defined in terms of their absolute positions, sizes and shapes on the document, or in relation to the content, such as by an indication of which of the boxes 114, 116, 118, 120, 121, 122 are to have the pattern 108 printed in them.
  • the pattern areas 107 can be defined by a combination of their absolute positions, sizes and shapes on the document, and in relation to the content printed in them. Association of a pattern area 107 with a content feature, such as a check box, can be used such that moving the content feature within the document design moves the associated pattern area 107 with it. This is helpful when designing and modifying the document.
  • Although a specific pattern area 107 is associated with each of the printed boxes 118, 120, 121, 122, the pattern areas 107 do not have to correspond exactly to the areas of the printed boxes 118, 120, 121, 122.
  • Each of the pattern areas 107 will generally be made larger than the box 118, 120, 121, 122 with which it is associated. This allows for inaccurate positioning of a user made mark upon the page, whilst ensuring that the pen 300 will still be able to detect where the mark is on the page.
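One simple way to make a pattern area larger than its printed box is to grow the box by a fixed margin on every side. The sketch below assumes rectangular boxes given as (left, top, right, bottom); the function names, the units and the 2.0 margin are illustrative choices, not values from the patent.

```python
def expand_area(left, top, right, bottom, margin):
    """Grow a printed box by `margin` on every side to get its pattern area."""
    return (left - margin, top - margin, right + margin, bottom + margin)


def hit(area, x, y):
    """True if the point (x, y) falls inside the rectangle `area`."""
    left, top, right, bottom = area
    return left <= x <= right and top <= y <= bottom


check_box = (100.0, 200.0, 105.0, 205.0)   # printed box, illustrative units
pattern_area = expand_area(*check_box, margin=2.0)
```

A mark at (106.0, 203.0) misses the printed check box but still lands on patterned paper, so the pen can still report where it is.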
  • the pattern areas 107 have respective positions within the total pattern space area allocated to them. These allocated positions within the total pattern space are requested from, and allocated by, a pattern allocating server.
  • a single page 700 of pattern space required for the document 100 can be broken down by the FDT into a number of separate pattern space areas 718, 720, 721, 722. These pattern space areas 718, 720, 721, 722 are to be allocated to the respective boxes 118, 120, 121, 122 on the document 100 (Step 606). These pattern space areas 718, 720, 721, 722 are arranged on the page 700 of pattern space in any suitable way. In particular, the relative positions of the pattern space areas 718, 720, 721, 722 on the pattern space page 700 can differ from their relative positions on the final document 100.
  • Each area is identified by its coordinates on the page 700.
  • all allocated pattern space areas will be rectangular, and each is identified by the position of its top left and bottom right corners.
  • the coordinate system used has its origin at the top left hand corner 724 of the page and includes an x coordinate indicating the distance to the right of the origin, and a y coordinate indicating the distance down from the origin.
  • the pattern space area 720 is identified by the coordinates (0,0; x1,y1).
  • some embodiments may store co-ordinates for a corner and a depth and height of the rectangular area.
  • Other embodiments may not assume that the areas are rectangular. They may assume, for example, that the area is circular and as such store a co-ordinate for the centre of the area together with a radius and/or diameter for the area.
  • Other embodiments can specify the shape of the area (for example square, circular, elliptical, and the like) and then store information defining that area.
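The alternative area descriptions above (two corners, a corner plus extents, or a centre plus radius) might be represented as shown below. This is a sketch under assumed names: `RectArea`, `CircleArea` and `rect_from_corner` are hypothetical, and the co-ordinates follow the page convention with the origin at the top left and y increasing downwards.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RectArea:
    """Rectangle identified by its top left and bottom right corners."""

    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class CircleArea:
    """Circle identified by its centre and radius."""

    cx: float
    cy: float
    radius: float

    def contains(self, x, y):
        return math.hypot(x - self.cx, y - self.cy) <= self.radius


def rect_from_corner(left, top, width, height):
    """Build the two-corner form from one corner plus the extents."""
    return RectArea(left, top, left + width, top + height)


rect = rect_from_corner(0.0, 0.0, 40.0, 10.0)
circle = CircleArea(5.0, 5.0, 2.0)
```

Giving each shape the same `contains` method lets later code test a pen position without caring which description was stored.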
  • Functions, if any, associated with the various patterned areas 718, 720, 721, 722 are defined (Step 608). This allows an application using the document 100 to process data received back when the document 100 has been written on.
  • the pattern areas in the larger boxes 120, 121 are identified as graphical input areas, for which any pen markings should be stored graphically, or perhaps analysed using character recognition and stored as text.
  • the pattern associated with the check boxes 118 is associated with the respective response options so that the checking of the boxes 118 on a number of the documents 100 produces a standard mark, such as a cross, in the check box of the stored document.
  • the PDF file 800 comprises a first portion which includes graphical information 802 defining the content of the document 100, and a second portion 804 which comprises a pattern area definition defining the sizes and positions of the pattern areas 107 on the document 100.
  • the file may contain other, optional, features.
  • the file 800 may also contain information relating to the functions (if any) associated with the pattern areas on the document 100 and the relative positions of the pattern areas within the pattern space page 700 allocated to the document 100.
  • the PDF file 800 may contain a document ID 806, a traceability code 808 of the pattern associated with the send box 122, and other active information 810, associated with pattern areas other than the send box 122.
  • the traceability code 808 and active information 810 are used when the pattern areas upon the document 100 are passed over by the digital pen 300 such that a correlation between the location of a pattern area within the document and the pattern area's activity can be established by a processor, either within the pen 300 or remote from the pen 300.
  • the PDF file 800 may also contain mapping information 812 for mapping data from databases or other sources onto the document 100. For example, data such as the location of the user's name 114 and document ID number 116 within the database 414 can be extracted therefrom. Also, if pre-filled fields are used within the document 100, values 814 for filling these pre-filled fields can be extracted from the database 414. For example, the user's name 114 and ID 116 can be extracted from the database 414 and automatically printed upon the document 100.
  • the PDF file 800 also contains, as in this example, a document instance ID
  • the PDF file 800 basically provides a data structure comprising a first portion of data relating to content of the document 100 (i.e. the graphical information 802) and a second portion of metadata relating to position identification markings within the document (i.e. pattern area definition 804).
  • the pattern data indicates which portions of an overall position identification pattern space have been used within a document and the location of these portions within the document.
  • Such a data structure allows a device, such as a pen 300, to determine its position within the position identification markings and what content is located at the current position of the pen 300.
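To illustrate how pattern data of this kind lets a position read from the pattern be related to a place on the document, the sketch below maps a pattern-space co-ordinate to the document area it was allocated to. All names (`AreaMapping`, `locate`) are hypothetical, the areas are assumed rectangular, and both spaces are assumed to use the same units.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AreaMapping:
    name: str                              # illustrative label for the area
    pattern_origin: Tuple[float, float]    # top left on the pattern space page
    document_origin: Tuple[float, float]   # top left on the printed document
    size: Tuple[float, float]              # width, height

    def to_document(self, px, py) -> Optional[Tuple[float, float]]:
        """Translate a pattern-space point to document co-ordinates."""
        ox, oy = self.pattern_origin
        w, h = self.size
        if not (ox <= px <= ox + w and oy <= py <= oy + h):
            return None
        dx, dy = self.document_origin
        return (dx + (px - ox), dy + (py - oy))


def locate(areas, px, py):
    """Return (area name, document position) for a pattern-space point."""
    for area in areas:
        pos = area.to_document(px, py)
        if pos is not None:
            return area.name, pos
    return None


areas = [
    AreaMapping("comments", (0.0, 0.0), (20.0, 100.0), (150.0, 40.0)),
    AreaMapping("send", (0.0, 50.0), (150.0, 250.0), (10.0, 10.0)),
]
```

Here the two areas sit close together on the pattern space page yet far apart on the document, which reflects the point that relative positions in the two spaces need not match.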
  • the document defined by the data structure must be printed before it can be used with the pen (or perhaps displayed on a screen).
  • the data structure entries relating to position identification markings within the document can include semantic data about graphical items; typical graphical items include a check box or a text box.
  • the information that the text box is to be used to introduce a phone number can be linked to the text box, as can which portion of the overall position identification pattern relates to the text box.
  • This semantic data for the text box is stored as metadata within the PDF file 800.
  • the details of a server which is to be contacted for access permissions, control and tracking of the overall position identification pattern are also typically stored as metadata within the PDF file 800.
  • details of how to print the portion of the overall position identification pattern that relates to the text box, perhaps using the server, and data relating to the pattern printing rights and/or licences can also be embedded within the PDF file 800, typically as XML data.
  • the data structure entries relating to position identification markings within the document and/or data identifying content within the electronic document 100 may be thought of as metadata (i.e. data about data).
  • the metadata for example XML
  • Appendix A shows a sample XML file where "Pattern (X,Y)" details which section of the overall position identification pattern is to be inserted within a document and "Page(X,Y)" describes the position of the section of position identification pattern within a page of a document to be printed. "Layer" describes the layer where the section of position identification pattern will be pasted.
  • the metadata may be organised into related groups of properties.
  • the groups may be relevant to certain modules in a system used to manage, distribute and print an electronic document provided by the PDF file 800.
  • the groups may be implemented as schemas that define an XML namespace, such that elements and attributes can have the same name but originate from differing sources. This allows mark up elements within an XML file from the differing sources to be identified.
  • a set of rules is also defined in order to preserve the metadata when a file is opened and then saved in a file format different to that in which it was opened.
  • Custom properties can be added to a document such as a PDF file, each custom property having a name and a value. These "name"/"value" pairs are stored within the data file as metadata and, when a file's format is changed, the metadata is transcribed to the correct location within the new file format, keeping the pairs together.
  • the portion of data defining pattern may be an XML packet containing metadata relating to the position identification pattern.
  • the pattern data is contained in a metadata stream within a PDF object within the file.
  • a JPEG file will use a marker (known as an APP1 marker) to designate the location of the XML packet containing the position identification pattern metadata. Therefore, when transcribing a PDF file to a JPEG file, the XML packet containing the position identification pattern metadata should be transcribed to the correct location within the APP1 marker of the JPEG file, and vice-versa. Similar transcriptions of metadata location data must take place when changing between any file formats, such as GIF, PNG, TIFF, SVG or any other suitable file format.
  • Metadata is enclosed within a file 800 as metadata
  • documents retain their context when they exit their original system or environment.
  • the form and properties of the documents are preserved when the program that uses the documents is not the final authority, i.e. when the program used to read, represent and translate the properties is a different program from that used to create the metadata.
  • the electronic document file 800 having metadata embedded therein allows a single file 800 to be distributed for a given document rather than needing to distribute multiple files, each relating to a separate property of the document.
  • the use of separate multiple files describing a single document has a number of disadvantages including managing a plurality of versions, ensuring all of the files relate to the same version of the document, and the increased risk of loss or corruption of a single file resulting in the loss of a complete document.
  • Providing a single file 800 results in the data content and metadata of the file 800 being edited at the same time.
  • the embedded metadata may include XML schema.
  • the metadata can be embedded using a file embedding mechanism that allows applications to more easily locate metadata in files by scanning the file 800 rather than needing to parse a specific application's file format. Such an arrangement makes the metadata more accessible and further aids document interchange and management.
  • the metadata is embedded within the data defining the pattern areas 718, 720, 721, 722 as an invisible font in the file 800.
  • text characters are defined in a predetermined manner by a string of data, and part of the string for each character defines the font in which the character will be printed.
  • the data defining the pattern areas 718, 720, 721, 722 is therefore put into the format of a series of text characters, with a non-valid font definition so that they will not be printed as characters by the printer.
  • a printer or other processing device used to print the file 800, or otherwise process it, is arranged to recognize the non-printable text characters, by means of the non-valid font definition.
  • the printer interprets the data defining the non-printable text characters in a different manner to standard, printable, text characters as identifying the size, shape, and position of the required pattern areas 718, 720, 721, 722.
  • the non-valid font definition either provides the pattern of the position identification markings or provides instructions as to how the printer can obtain the pattern, typically from a networked resource, such as a server.
  • the definition of the pattern areas 718, 720, 721, 722 can be further enhanced by means of tags at the ends of the data string defining them. These tags alert the printer, or other processing device, to the fact that the data between them is to be interpreted as a definition of the pattern areas 718, 720, 721, 722.
  • the creation of the data set defining the digital document is performed by a form design tool, requiring pattern to be allocated at the design stage.
  • the data structure may be created by a printer driver upon receipt of a file which comprises content and a file which defines at least one pattern area. Before receipt by the printer driver the area need not have actual pattern allocated to it, this being performed by the printer driver, perhaps by accessing a pattern allocation server.
  • the output of such a printer driver would be a data structure in accordance with at least one embodiment of the present invention.
  • the output of the form design tool could be an embodiment of a data structure which is within the scope of at least one aspect of the invention.
  • the output of the FDT may comprise a data structure which includes a portion of data defining content and a second portion of data which defines the location of pattern areas within the document rather than the location of pattern for those areas within pattern space.
  • this second portion of data may comprise metadata about where in the document pattern is to be placed.
  • the metadata may indicate that the designed document is to contain some pattern at its upper left corner, and that the pattern is to cover one third of the page.
  • the printer driver - upon reading this metadata - allocates an appropriate portion of pattern and replaces the original metadata with new metadata defining the position of a portion of pattern in pattern space.
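
By way of illustration only, the replacement of placement metadata by allocated pattern metadata described in the last few items might be sketched as follows in Python. The class `SimpleAllocator`, the function `resolve_placement` and all dictionary keys are hypothetical names chosen for this sketch; the allocator is a trivial stand-in for a pattern allocation server and does not reflect any real allocation protocol.

```python
class SimpleAllocator:
    """Hypothetical stand-in for a pattern allocation server.

    It hands out consecutive horizontal strips of a single pattern-space page.
    """

    def __init__(self):
        self._next_y = 0.0

    def allocate(self, width, height):
        # Return the pattern-space origin of a fresh strip of the given height.
        origin = (0.0, self._next_y)
        self._next_y += height
        return origin


def resolve_placement(placement, allocator):
    """Replace 'where on the page' metadata with 'where in pattern space' metadata."""
    px, py = allocator.allocate(placement["width"], placement["height"])
    return {
        "pattern_x": px,
        "pattern_y": py,
        "width": placement["width"],
        "height": placement["height"],
    }


allocator = SimpleAllocator()
# Metadata as it might leave the form design tool: position on the page only.
upper_left_third = {"page_x": 0.0, "page_y": 0.0, "width": 70.0, "height": 99.0}
first = resolve_placement(upper_left_third, allocator)
second = resolve_placement(upper_left_third, allocator)
print(first, second)
```

In this sketch two requests for the same page area receive different strips of pattern space, which reflects the aim that each printed document carries a unique portion of pattern.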

Abstract

A data structure which defines an electronic document comprises first and second substantially separate portions of data. The first portion of data defines the content of the document and the second portion comprises data relating to a pattern of position identification markings (106), such that when the electronic document is printed a pattern reading device, such as a pen (300), is able to determine its position relative to the position identification markings.

Description

A DATA STRUCTURE FOR AN ELECTRONIC DOCUMENT AND RELATED METHODS
FIELD OF THE INVENTION
This invention relates to a data structure for an electronic document and related methods.
BACKGROUND OF THE INVENTION
It is known to use documents having position identification markings in combination with a pattern reading device such as a digital pen. The device may have an imaging system, such as an infra red camera, within it, which is arranged to image a small area of the page close to the pen nib. The pen includes a processor having image processing capabilities and a memory and is triggered by a force sensor in the nib to record images from the camera as the pen is moved across the document. From these images the pen can determine the position of any marks made on the document by the pen. The pen markings can be stored directly as graphic images, which can then be stored and displayed in combination with other markings on the document. In some applications the simple recognition that a mark has been made by the pen on a predefined area of the document can be recorded, and this information used in any suitable way. This allows, for example, forms with check boxes on to be provided and the marking of the check boxes with the pen to be detected. In further applications the pen markings are analysed using character recognition tools and stored digitally as text. Systems using this technology are available from Anoto AB and described on their website www.Anoto.com.
It will be appreciated that in order to use a pen and a document using the position identification markings it is necessary to have media, generally paper, on which the position identification markings have been provided. This media may comprise a plain media, or may comprise a form or the like on which information is provided in addition to the position identification markings. A user may then use his/her pen to add to the media whether or not it has such additional information.
Prior art solutions have provided content (e.g. a layout of a form, etc.) and metadata (i.e. data about the position identification markings) in a variety of formats. Prior solutions have suffered from problems of marrying the correct content to the correct metadata to produce the document required by the user.
SUMMARY OF THE INVENTION
According to a first aspect of the invention there is provided a data structure which defines an electronic document, the data structure comprising first and second substantially separate portions of data; the first portion of data defining the content of the document and the second portion comprising data relating to a pattern of position identification markings such that when the electronic document is printed a pattern reading device, such as a pen, is able to determine its position relative to the position identification markings.
The data structure most conveniently comprises a single data file with the first and second data portions being embedded within the data file.
The skilled man will understand that in using the term data structure we mean a set of data which is stored in a structured manner. For example it may be electronic data stored in the memory of a computer or across a number of computers or memories. A single data file defining a data structure, such as a single electronic data file, can typically be identified as a collection of data that can be accessed through a single, common, file descriptor, such as a file name. There must exist something that links the data together to form the single file, perhaps by storing them together or naming each piece of data with a common identifier that links them. This common link allows all of the data to be accessed or moved together at the request of the user.
An advantage of such a data structure is that it provides a convenient means of storing the electronic document. As such it may be simpler than prior art systems to transfer the electronic document defined by the data structure to various locations between processing apparatus, etc., to electronically process the document and the like. A user can access both the content and the pattern data from a single file, and the content and pattern will not be easily separated. The electronic document can be printed out in hard copy, thus providing a digital document for use with a digital pen. The pattern and content may be superimposed in this digital document.
The data structure may be written in such a form that it may be converted from one format to other formats without losing any of the information from the file, particularly information about the pattern. This may be achieved by providing the second portion of data as metadata and providing one or more controls which control the way in which the second portion of data is converted between formats to preserve the pattern. In one embodiment of the invention the second portion of data may comprise XMP (Extensible Metadata Platform) metadata. This can be embedded in a data structure saved in the following formats: a PDF format (Portable Document Format as provided by the Adobe™ corporation); JPEG (Joint Photographic Experts Group); SVG format (Scalable Vector Graphics); GIF (Graphics Interchange Format); TIFF (Tagged Image File Format); PNG (Portable Network Graphics). Use of the XMP format for the metadata means that the data structure can readily be converted using proprietary software known in the art between these formats whilst preserving the pattern information defined by the data. Any software which can scan a file for metadata will be able to identify the pattern data as distinct from the content and so determine which pattern is needed when printing. For a more detailed explanation of such XMP metadata the reader is referred to "Embedding XMP Metadata in Application files", June 2002, Adobe Systems Incorporated, 345 Park Avenue, San Jose, CA 95110-2704, USA.
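By way of illustration only, scanning a file for an embedded XMP packet without parsing the host file format might be sketched as follows in Python. The sketch relies on the `<?xpacket begin=... ?>` and `<?xpacket end=... ?>` processing instructions that wrap an XMP packet; the function name `find_xmp_packet` and the sample bytes are assumptions made for this example, and the scan will not find a packet stored in a compressed or encrypted stream.

```python
def find_xmp_packet(data: bytes):
    """Return the first XMP packet found in raw file bytes, or None."""
    start = data.find(b"<?xpacket begin=")
    if start == -1:
        return None
    end_marker = data.find(b"<?xpacket end=", start)
    if end_marker == -1:
        return None
    # The packet finishes at the '?>' that closes the trailing instruction.
    end = data.find(b"?>", end_marker)
    if end == -1:
        return None
    return data[start:end + 2]


# Made-up file bytes: opaque content surrounding one packet.
sample = (
    b"...opaque file content..."
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"></x:xmpmeta>'
    b'<?xpacket end="w"?>'
    b"...more opaque content..."
)
packet = find_xmp_packet(sample)
print(packet)
```

Because the scan works on raw bytes, the same function can be applied to a file in any of the formats listed above, provided the packet is stored uncompressed.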
It is preferred that the content in the data structure is stored in a graphical format with pattern metadata embedded within the data structure. The graphical format could be a bitmap or vector based format.
Prior art data structures are limited in their flexibility as they do not provide such data defining a pattern of position identification markings in the same file as the content yet in a separate portion of the file allowing it to be moved across formats. For example, in the past a single bitmap or vector format file defining both pattern and content suitable for sending to a printer is known. Such a data structure cannot be converted to other formats since specific information indicating which part of the data structure is pattern data and which is content data is not available. If this information is lost as the file is converted to another format the pattern could be lost or corrupted and then the electronic document cannot be printed correctly.
The first portion could also contain data other than content data, such as metadata defining the content or other information. The content data could define text characters or graphical marks or other human-identifiable and/or readable information. Of course, in some situations it could comprise zero content in which case the digital document, when printed, may be blank other than for a pattern of positional markings.
The second portion defining the pattern may comprise metadata. By this we mean "data about data". This metadata may completely define a portion of pattern needed to print the digital document such that it can be understood by a printer driver or a printer and rendered to form the pattern. It could alternatively comprise an entry of information which is a self-describing definition of a portion of pattern within a pattern space.
For example, the metadata information about the pattern contained in the data structure may comprise the co-ordinates of at least one corner of a portion of pattern from a two-dimensional pattern space and optionally its size. If the space is fully characterised by a two-dimensional co-ordinate system this is all that is needed for a suitably enabled printer driver to generate the pattern. Additionally or alternatively, it may define the length of at least one side of the portion, the shape of the portion or a set of absolute co-ordinates defining the boundary of the portion in the pattern space.
The metadata which is embedded in the second portion of the data structure may identify the location of a portion of pattern in a pattern space in many other ways. It could be a pointer to a server on which the pattern is stored, or which is capable of allocating the pattern to the document. To be useful a pattern space should be very large, allowing it to be allocated to many hundreds or thousands of documents such that each document is allocated a unique portion of pattern. To make this more manageable the pattern space could be divided according to rules into sub-regions of known size, each of which may be referred to as a shelf of position identification markings. Each of the shelves may be further subdivided into individual pages. Within each page an (X,Y) co-ordinate may be defined for each point within the page of position identification markings to define any portion of the position identification markings used within a printed document. In this case, the data embedded in the second data portion may comprise data identifying a shelf, a page on that shelf and the co-ordinates of a portion of pattern within that page.
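By way of illustration only, the shelf, page and co-ordinate addressing just described might be modelled as follows in Python. The class name `PatternPortion`, its field names and the example numbers are assumptions made for this sketch; the source does not define a concrete record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternPortion:
    """Hypothetical record locating a rectangular portion of pattern space."""

    shelf: int      # sub-region ("shelf") of the overall pattern space
    page: int       # page within that shelf
    x: float        # co-ordinate of one corner within the page
    y: float
    width: float    # optional size of the portion
    height: float

    def corners(self):
        """Return the given corner and the diagonally opposite corner."""
        return (self.x, self.y), (self.x + self.width, self.y + self.height)


portion = PatternPortion(shelf=3, page=12, x=10.0, y=20.0, width=50.0, height=15.0)
print(portion.corners())
```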
Providing data about the pattern as metadata within a file in this way ensures that together with some knowledge of the rules which define the pattern space the document contains all the information needed to print the correct content with the correct pattern. All that is needed is knowledge of the pattern space the portion is selected from.
In other words, an algorithm or the like may generate the portion of pattern from the data by identifying co-ordinates or other meta-data identifying the portion of the position identification marking.
The data structure may comprise a data file written in a mark up language such as XML and the second portion of data may comprise XML metadata embedded within the data file. The data file may be in any one of a number of different formats, for example PDF. It could in fact be in any known language which can be interpreted by a suitable printer driver or printer.
According to a second aspect of the invention there is provided a method for generating an electronic document comprising: creating an electronic file and storing in that file at least some content and at least some position identification markings arranged to allow a pattern reading device to determine its position within the position identification markings, the electronic file being capable of generating an electronic document. The method may allow the electronic document to be converted from a first file format, in which it is stored, to a second file format. The first and second formats may be any one of the following: a PDF format (Portable Document Format as provided by the Adobe corporation); JPEG (Joint Photographic Experts Group); SVG format (Scalable Vector Graphics); GIF (Graphics Interchange Format); TIFF (Tagged Image File Format); PNG (Portable Network Graphics); or any other suitable file format.
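By way of illustration only, writing pattern metadata as XML and reading it back might be sketched as follows using Python's standard `xml.etree.ElementTree` module. The element name `patternAreas`, the element name `area` and every attribute name are hypothetical; they are not taken from the sample XML of the source and merely show the round trip.

```python
import xml.etree.ElementTree as ET


def build_pattern_metadata(areas):
    """Serialise a list of area dictionaries into an XML string."""
    root = ET.Element("patternAreas")
    for area in areas:
        # XML attribute values must be strings.
        ET.SubElement(root, "area", {key: str(value) for key, value in area.items()})
    return ET.tostring(root, encoding="unicode")


def read_pattern_metadata(xml_text):
    """Parse the XML string back into a list of attribute dictionaries."""
    root = ET.fromstring(xml_text)
    return [dict(element.attrib) for element in root.findall("area")]


xml_text = build_pattern_metadata(
    [{"name": "send_box", "patternX": 0, "patternY": 0, "pageX": 120, "pageY": 250}]
)
areas = read_pattern_metadata(xml_text)
print(xml_text)
```

Values come back as strings after the round trip, so a consumer such as a printer driver would convert them to numbers before use.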
According to a third aspect the invention provides a digital document production application suitable for producing a data structure defining a digital document comprising: content receiving means for receiving the content of the digital document; pattern receiving means for receiving data defining a pattern of position identification markings allocated to at least a portion of the document; and data structure generating means for generating a data structure defining the digital document which data structure comprises a first portion of data defining the content and a second portion of data defining the pattern.
The content receiving means may include a graphical user interface. This may present to a user an image of a document on a screen to which a user can add content. Alternatively, it may call up a content file containing content. The content file could be a text file from a word processing package, or a spreadsheet from a database or a drawing from a drawing package. It may obtain content from more than one file.
The pattern receiving means may include a means for requesting pattern from a server or from a store of locally held pattern information. The program may make this request once a user has indicated that the design of the document content is complete. The data structure may be generated by the program automatically once a user has indicated that the design of the content is complete.
Of course, it will be readily understood that a technically competent user could produce such a data structure directly using a text editor. The program of this aspect of the invention makes the process considerably simpler and allows users of low technical ability to produce digital documents.
According to a fourth aspect of the invention there is provided a data carrier containing instructions which when read onto a computer cause that computer to perform the method of the second aspect of the invention or provide the application of the third aspect.
According to a fifth aspect of the invention there is provided a data carrier containing instructions which when read onto a computer provide the electronic document of the first aspect of the invention.
The data carrier of any of the above aspects of the invention can comprise a floppy disk, a CDROM, a DVD ROM/RAM (including +RW, -RW), a hard drive, a non-volatile memory, any form of magneto optical disk, a wire, a transmitted signal (which may comprise an internet download, an ftp transfer, or the like), or any other form of computer readable medium.
According to a further aspect of the invention there is provided a source file for a printed digital document, the printed document comprising content and a pattern of position identification markings arranged to allow a pattern reading device to determine its position within the position identification markings, the source file comprising at least a first portion defining the content and a second portion comprising metadata which comprises a self-defining description of the pattern.
Preferred embodiments of a data structure defining a digital document in accordance with the present invention will now be described by way of example only with reference to the accompanying drawings in which:
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a digital document created from an embodiment of a data structure according to an embodiment of the present invention;
Figure 2 shows in detail part of the digital document of Figure 1;
Figure 3 shows a prior art digital pen for use with the document of Figure 1;
Figure 4 is a flow diagram showing a method of generating a digital document in accordance with an embodiment of the present invention;
Figure 5 shows the allocation of pattern space to the document of Figure 1, in accordance with an embodiment of the present invention;
Figure 6 shows an electronic file defining the document of Figure 1, in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring to Figure 1, a digital document 100 for use in a digital pen and paper system comprises a carrier 102 in the form of a single sheet of paper 104 with position identifying markings 106 printed on some parts of it to define pattern areas 107 of a position identifying pattern 108. Also printed on the paper 104 are further markings 109 which are clearly visible to a human user of the document 100. These markings make up the content of the document 100. The content 109 will obviously depend entirely on the intended use of the document. In this case an example of a very simple two page questionnaire is shown, and the content includes a number of boxes 110, 112 which can be pre-printed with user specific information such as the user's name 114 and a document identification number 116. The content further comprises a number of check boxes 118 any one of which is to be marked by the user, and two larger boxes 120, 121 in which the user can write comments. The document content also includes a send box 122 to be checked by the user when he has completed the questionnaire to initiate a document completion process by which pen stroke data is forwarded for processing, and typographical information on the document 100 such as the headings or labels 124 for the various boxes 110, 112, 118, 120. The position identifying pattern 108 is only printed onto the parts of the document 100 which the user is expected to write on or mark, that is within the check boxes 118, the comments boxes 120, 121 and the send box 122.
Referring to Figure 2, the position identifying pattern 108 is made up of a number of dots 130 arranged on an imaginary grid 132. The grid 132 can be considered as being made up of horizontal and vertical lines 134, 136 defining a number of intersections 140 where they cross. The intersections 140 are of the order of 0.3mm apart. One dot 130 is provided at each intersection 140, but offset slightly in one of four possible directions, up, down, left or right, from the actual intersection 140 by about 1/6th of the grid spacing. The dot offsets are arranged to vary in a systematic way so that any group of a sufficient number of dots 130, for example any group of 36 dots arranged in a six by six square, will be unique within a very large area of the pattern. This large area is defined as a total imaginary pattern space, and only a small part of the pattern space is taken up by the pattern on the document 100. By allocating a known area of the pattern space to the document 100, for example by means of a co-ordinate reference, the document and any position on the patterned parts of it can be identified from the pattern printed on it. An example of this type of pattern is described in WO 01/26033. It will be appreciated that other position identifying patterns can equally be used. Some examples of other suitable patterns are described in WO 00/73983 and WO 01/71643.
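By way of illustration only, the geometry of the dot pattern described above might be sketched as follows in Python. The grid spacing and the one-sixth offset are taken from the description; the function name, the direction labels and the choice of a y axis that increases downwards are assumptions, and the rule that decides which offset each dot receives is not modelled here.

```python
GRID_MM = 0.3                 # approximate spacing between grid intersections
OFFSET_MM = GRID_MM / 6       # each dot is displaced by about 1/6 of the spacing

# Unit displacement for each of the four possible directions (y grows downwards).
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def dot_position(col, row, direction):
    """Return the printed (x, y) position in mm of the dot at a grid intersection."""
    dx, dy = DIRECTIONS[direction]
    return (col * GRID_MM + dx * OFFSET_MM, row * GRID_MM + dy * OFFSET_MM)


# A six by six window holds 36 dots, each with four possible offsets, so the
# number of distinct windows is bounded above by 4 ** 36.
WINDOW_COMBINATIONS = 4 ** 36

print(dot_position(2, 3, "right"), WINDOW_COMBINATIONS)
```

The upper bound of 4 ** 36 (about 4.7 x 10^21) gives a feel for why a six by six window can be unique over a very large pattern space; the usable space of any real pattern is smaller than this bound.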
Referring to Figure 3, a pattern reading device comprising a pen 300 comprises a writing nib 310, and a camera 312 made up of an infra red (IR) LED 314 and an IR sensor 316. The camera 312 is arranged to image a circular area of diameter 3.3mm adjacent to the tip 311 of the pen nib 310. A processor 318 processes images from the camera 312. A pressure sensor 320 detects when the nib 310 is in contact with the document 100 and triggers operation of the camera 312. Whenever the pen is being used on a patterned area of the document 100, the processor 318 can therefore determine from the pattern 108 the position of the nib of the pen whenever it is in contact with the document 100. From this the processor can determine the position and shape of any marks made on the patterned areas of the document 100. This information is stored in a memory 320 in the pen as it is being used. When the user has finished marking the document, in this case when the questionnaire is completed, this is recorded in a document completion process, for example by making a mark with the pen in the send box 122. The pen is arranged to recognise the pattern in the send box 122 and determine from that pattern the identity of the document 100. Suitable pens are available from Logitech under the trade mark Logitech Io.
The foregoing discussion is related to known systems and preferred embodiments of the present invention are described hereinafter. In order to produce a digital document 100, the first step is the design and creation of an electronic document file containing content. The electronic document can be printed as a hard copy digital document or displayed on a screen. Referring to Figure 4, the design of the content of the document is carried out on a PC using an application (Step 600). In this example the application is Acrobat Reader and the PC also runs a number of other applications including a word processing package such as 'Word', a database package such as 'Access', and a spreadsheet package such as 'Excel'. Each of these applications can be used to design the content of the document. The user defines areas of the document to which the pattern 108 is to be applied using, for example, a digital document creation application or form design tool (FDT) in the form of an Acrobat 5.0 plug-in.
In this example the content is converted to a Portable Document Format (PDF) file (Step 602). Pattern areas for the document are then defined using the FDT (Step 604). The split of the pattern areas within the document is defined (Step 606), producing a digital document defining both the content and the positions and shapes of the pattern areas. The format of this digital document will again comprise a PDF file, the data structure of which will be described hereinafter. Of course, it will be appreciated that the steps of designing the content and the pattern could both be performed by the FDT.
Depending on the FDT, the pattern areas 107 can be defined in terms of their absolute positions, sizes and shapes on the document, or in relation to the content, such as by an indication of which of the boxes 114, 116, 118, 120, 121, 122 are to have the pattern 108 printed in them. Alternatively, the pattern areas 107 can be defined by a combination of their absolute positions, sizes and shapes on the document, and in relation to the content printed in them. Association of a pattern area 107 with a content feature, such as a check box, can be used such that moving the content feature within the document design moves the associated pattern area 107 with it. This is helpful when designing and modifying the document. Although a specific pattern area 107 is associated with each of the printed boxes 118, 120, 121, 122, the pattern areas 107 do not have to correspond exactly to the areas of the printed boxes 118, 120, 121, 122. Each of the pattern areas 107 will generally be made larger than the box 118, 120, 121, 122 with which it is associated. This allows for inaccurate positioning of a user made mark upon the page, whilst ensuring that the pen 300 will still be able to detect where the mark is on the page.
The pattern areas 107 have respective positions within the total pattern space area allocated to them. These allocated positions within the total pattern space are requested from, and allocated by, a pattern allocating server. Referring to Figure 5, a single page 700 of pattern space required for the document 100 can be broken down by the FDT into a number of separate pattern space areas 718, 720, 721, 722. These pattern space areas 718, 720, 721, 722 are to be allocated to the respective boxes 118, 120, 121, 122 on the document 100 (Step 606). These pattern space areas 718, 720, 721, 722 are arranged on the page 700 of pattern space in any suitable way. In particular, the relative positions of the pattern space areas 718, 720, 721, 722 on the pattern space page 700 can differ from their relative positions on the final document 100.
Each area is identified by its coordinates on the page 700. In this case it is assumed that all allocated pattern space areas will be rectangular, and each is identified by the position of its top left and bottom right corners. The coordinate system used has its origin at the top left hand corner 724 of the page and includes an x coordinate indicating the distance to the right of the origin, and a y coordinate indicating the distance down from the origin. The pattern space area 720, for example, is identified by the coordinates (0,0; x1,y1).
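By way of illustration only, making a pattern area somewhat larger than its printed box might be sketched as follows in Python. The tuple layout (left, top, right, bottom), the function names and the margin value are assumptions made for this example.

```python
def expand_area(box, margin):
    """Grow a (left, top, right, bottom) rectangle by a margin on every side."""
    left, top, right, bottom = box
    return (left - margin, top - margin, right + margin, bottom + margin)


def contains(area, point):
    """Return True if the (x, y) point lies within the rectangle."""
    left, top, right, bottom = area
    x, y = point
    return left <= x <= right and top <= y <= bottom


check_box = (40.0, 60.0, 45.0, 65.0)          # printed box, hypothetical units
pattern_area = expand_area(check_box, margin=2.0)

# A mark that lands slightly outside the printed box is still on pattern.
stray_mark = (46.0, 66.0)
print(contains(check_box, stray_mark), contains(pattern_area, stray_mark))
```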
It would of course be possible to use other co-ordinate systems. For example, some embodiments may store co-ordinates for a corner and a depth and height of the rectangular area. Other embodiments may not assume that the areas are rectangular. They may assume, for example, that the area is circular and as such store a co-ordinate for the centre of the area together with a radius and/ or diameter for the area. Other embodiments can specify the shape of the area (for example square, circular, elliptical, and the like) and then store information defining that area.
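By way of illustration only, the following Python sketch shows how a position read in pattern space might be mapped back to a position on the printed document when the areas are arranged differently on the pattern space page than on the document. It assumes rectangular areas identified by top left and bottom right corners, an origin at the top left, and the same scale in pattern space and on the document; the class name, area names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocatedArea:
    name: str
    pattern_rect: tuple   # (x0, y0, x1, y1) on the pattern space page
    doc_origin: tuple     # (x, y) of the area's top left corner on the document

    def contains(self, px, py):
        x0, y0, x1, y1 = self.pattern_rect
        return x0 <= px <= x1 and y0 <= py <= y1

    def to_document(self, px, py):
        """Translate a pattern-space point into document co-ordinates."""
        x0, y0, _, _ = self.pattern_rect
        dx, dy = self.doc_origin
        return (dx + (px - x0), dy + (py - y0))


def locate(areas, px, py):
    """Return (area name, document position) for a pattern-space point, or None."""
    for area in areas:
        if area.contains(px, py):
            return area.name, area.to_document(px, py)
    return None


areas = [
    AllocatedArea("comments_120", (0, 0, 80, 30), (20, 150)),
    AllocatedArea("send_122", (0, 40, 10, 50), (170, 260)),
]
print(locate(areas, 5, 45))
```

Here the two areas sit one above the other on the pattern space page but far apart on the document, so the translation is stored per area rather than for the page as a whole.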
Functions associated with the various patterned areas, if any, 718, 720, 721 , 722 are defined (Step 608). This allows an application using the document 100 to process data received back when the document 100 has been written on. In the case of the questionnaire document 100 the pattern areas in the larger boxes 120, 121 are identified as a graphical input areas, for which any pen markings should be stored graphically, or perhaps analysed using character recognition and stored as text. The pattern associated with the check boxes 118 is associated with the respective response options so that the checking of the boxes 1 18 on a number of the documents 100 produces a standard mark, such as a cross, in the check box of the stored document.
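By way of illustration only, associating a function with each pattern area and using it to process returned pen data might be sketched as follows in Python. The function labels "check_box" and "graphical_input", the area names and the shape of the returned records are assumptions made for this example.

```python
def handle_strokes(area_functions, area_name, strokes):
    """Process pen strokes according to the function assigned to the area."""
    function = area_functions.get(area_name)
    if function == "check_box":
        # Any mark at all counts as ticking the box; a standard mark is stored.
        return {"area": area_name, "checked": len(strokes) > 0}
    if function == "graphical_input":
        # Keep the strokes so they can be stored graphically or passed
        # to character recognition later.
        return {"area": area_name, "strokes": list(strokes)}
    return None  # no function defined for this area


area_functions = {"box_118a": "check_box", "box_120": "graphical_input"}
one_stroke = [[(1.0, 1.0), (2.0, 2.0)]]

print(handle_strokes(area_functions, "box_118a", one_stroke))
print(handle_strokes(area_functions, "box_120", one_stroke))
```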
Finally, the designed electronic document 100 is saved as a single electronic document file and allocated a document name (Step 610).
Upon completion of the design of the document 100 a data structure (in this example a PDF file 800) will have been created, as shown in Figure 6. The PDF file 800 comprises a first portion which includes graphical information 802 defining the content of the document 100, and a second portion 804 which comprises a pattern area definition defining the sizes and positions of the pattern areas 107 on the document 100.
Also as shown in Figure 8, the file may contain other, optional, features. For example, the file 800 may also contain information relating to the functions (if any) associated with the pattern areas on the document 100 and the relative positions of the pattern areas within the pattern space page 700 allocated to the document 100.
Additionally, the PDF file 800 may contain a document ID 806, a traceability code 808 of the pattern associated with the send box 122, and other active information 810, associated with pattern areas other than the send box 122. The traceability code 808 and active information 810 are used when the pattern areas upon the document 100 are passed over by the digital pen 300 such that a correlation between the location of a pattern area within the document and the pattern area's activity can be established by a processor, either within the pen 300 or remote from the pen 300.
The PDF file 800 may also contain mapping information 812 for mapping data from databases or other sources onto the document 100. For example, data such as the location of the user's name 114 and document ID number 116 within the database 414 can be extracted therefrom. Also, if pre-filled fields are used within the document 100, values 814 for filling these pre-filled fields can be extracted from the database 414. For example, the user's name 114 and ID 116 can be extracted from the database 414 and automatically printed upon the document 100.
The PDF file 800 also contains, as in this example, a document instance ID 816 which is unique to the individual document to be printed. Usually, this data is not placed into the file 800 until the time of printing. Normally, there will only be one printed document with a particular instance ID 816 so that individual documents can be tracked and identified. However, in some instances it is desirable to be able to print more than one copy of exactly the same document with the same document instance ID 816, for example in secret ballots where anonymity is desirable. Therefore provision is made to allow the printing of more than one copy of a given document with the same instance ID 816.
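The contents of the file 800 listed above can be pictured with the following Python sketch. It is not the file layout of the specification: the class `DocumentFile`, its field names and types, the helper `prepare_for_printing` and the use of a random UUID as instance ID are all assumptions chosen for illustration. The reference numerals in the comments tie each field back to the description.

```python
import uuid
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class DocumentFile:
    graphical_information: bytes                             # 802: document content
    pattern_area_definition: str                             # 804: sizes and positions of pattern areas
    document_id: Optional[str] = None                        # 806
    traceability_code: Optional[str] = None                  # 808: pattern of the send box
    active_information: dict = field(default_factory=dict)   # 810: other active areas
    mapping_information: dict = field(default_factory=dict)  # 812: database mappings
    prefill_values: dict = field(default_factory=dict)       # 814: values for pre-filled fields
    instance_id: Optional[str] = None                        # 816: normally set at print time


def prepare_for_printing(doc, reuse_instance_id=None):
    """Return a copy of the file stamped with a document instance ID.

    Passing reuse_instance_id models the case where several copies are
    deliberately printed with the same instance ID (e.g. a secret ballot).
    """
    instance_id = reuse_instance_id or uuid.uuid4().hex
    return replace(doc, instance_id=instance_id)
```

Leaving `instance_id` empty until `prepare_for_printing` is called mirrors the statement that this data is usually not placed into the file until the time of printing.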
Thus, the PDF file 800 basically provides a data structure comprising a first portion of data relating to content of the document 100 (i.e. the graphical information 802) and a second portion of metadata relating to position identification markings within the document (i.e. pattern area definition 804). The pattern data indicates which portions of an overall position identification pattern space have been used within a document and the location of these portions within the document. Such a data structure allows a device, such as a pen 300, to determine its position within the position identification markings and what content is located at the current position of the pen 300. Of course, the document defined by the data structure must be printed before it can be used with the pen (or perhaps displayed on a screen).
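A minimal sketch of the lookup this enables is given below in Python. It assumes, purely for illustration, that pattern space and page space use the same units and scale, and that each placement is a rectangle; `Placement` and `pattern_to_page` are hypothetical names. The sample values are taken from one of the entries legible in Appendix A.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """One section of pattern and where it sits on the printed page."""
    pattern_x: int
    pattern_y: int
    page_x: int
    page_y: int
    width: int
    height: int


def pattern_to_page(placements, px, py):
    """Map a position read from the pattern to a position on the page.

    Returns None when the position does not belong to this document.
    """
    for p in placements:
        dx, dy = px - p.pattern_x, py - p.pattern_y
        if 0 <= dx < p.width and 0 <= dy < p.height:
            return (p.page_x + dx, p.page_y + dy)
    return None


placements = [Placement(1126, 365, 321, 91, 124, 64)]
page_position = pattern_to_page(placements, 1130, 370)
```

Once the page position is known, the content located there can be found from the first portion of data.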
The data structure entries relating to position identification markings within the document can include semantic data about graphical items; typical graphical items include a check box or a text box. For example, the information that the text box is to be used to enter a phone number can be linked to the text box, as can which portion of the overall position identification pattern relates to the text box. This semantic data for the text box is stored as metadata within the PDF file 800.
The details of a server which is to be contacted for access permissions, control and tracking of the overall position identification pattern are also typically stored as metadata within the PDF file 800. Similarly, details of how to print the portion of the overall position identification pattern that relates to the text box, perhaps using the server, and data relating to the pattern printing rights and/or licences can also be embedded within the PDF file 800, typically as XML data.
The data structure entries relating to position identification markings within the document and/or data identifying content within the electronic document 100 may be thought of as metadata (i.e. data about data). In one embodiment, the metadata (for example XML) is embedded within the PDF file 800 and is used to provide a self-describing representation of the position identification markings. Appendix A shows a sample XML file where "Pattern(X,Y)" details which section of the overall position identification pattern is to be inserted within a document and "Page(X,Y)" describes the position of the section of position identification pattern within a page of a document to be printed. "Layer" describes the layer where the section of position identification pattern will be pasted. This "Layer" descriptor is useful where an overlap of sections of position identification pattern occurs, as the layer with the lowest "Layer" value will be printed. "IsMagicBox" is merely an attribute for the section of position identification pattern contained within the document.
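The following Python sketch shows one way such XML could be read and the "Layer" rule applied. The element and attribute names follow Appendix A as far as they can be read there; the sample values are adapted (the second area is moved and given layer 2 so that the two areas overlap), and the functions `parse_areas` and `area_printed_at` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Adapted sample: pageX and layer of the second area are changed so that
# the two areas overlap on the page.
SAMPLE = """<FormPage>
  <DrawingArea patternX="1126" patternY="365" pageX="321" pageY="91"
               width="124" height="64" layer="1" IsMagicBox="0" />
  <DrawingArea patternX="1301" patternY="365" pageX="400" pageY="91"
               width="159" height="74" layer="2" IsMagicBox="0" />
</FormPage>"""


def parse_areas(xml_text):
    """Read every DrawingArea element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    areas = []
    for el in root.iter("DrawingArea"):
        areas.append({
            "pattern": (int(el.get("patternX")), int(el.get("patternY"))),
            "page": (int(el.get("pageX")), int(el.get("pageY"))),
            "size": (int(el.get("width")), int(el.get("height"))),
            "layer": int(el.get("layer")),
            "is_magic_box": el.get("IsMagicBox") == "1",
        })
    return areas


def area_printed_at(areas, x, y):
    """Return the area whose pattern is printed at page position (x, y).

    Where areas overlap, the one with the lowest layer value wins.
    """
    covering = [
        a for a in areas
        if a["page"][0] <= x < a["page"][0] + a["size"][0]
        and a["page"][1] <= y < a["page"][1] + a["size"][1]
    ]
    return min(covering, key=lambda a: a["layer"], default=None)


areas = parse_areas(SAMPLE)
```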
The metadata may be organised into related groups of properties. For example, the groups may be relevant to certain modules in a system used to manage, distribute and print an electronic document provided by the PDF file 800. The groups may be implemented as schemas that define an XML namespace, such that elements and attributes can have the same name but originate from differing sources. This allows mark-up elements within an XML file from the differing sources to be identified. A set of rules is also defined in order to preserve the metadata when a file is opened and then saved in a file format different to that in which it was opened.
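The effect of namespaces can be illustrated with a short Python sketch. The namespace URIs, the element names and the values are invented for the example; only the mechanism (two elements sharing a local name but belonging to different namespaces) reflects the description.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces for two modules of the system.
PRINT_NS = "http://example.com/ns/print"
TRACK_NS = "http://example.com/ns/tracking"

PACKET = f"""<metadata xmlns:print="{PRINT_NS}" xmlns:track="{TRACK_NS}">
  <print:id>pattern-licence-7</print:id>
  <track:id>instance-42</track:id>
</metadata>"""

root = ET.fromstring(PACKET)

# Both elements are called "id", yet each is found unambiguously through
# its namespace (ElementTree writes qualified names as "{uri}local").
print_id = root.find(f"{{{PRINT_NS}}}id").text
track_id = root.find(f"{{{TRACK_NS}}}id").text
```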
When transcribing between file formats the original representation used in the writing of the metadata should be preserved in the output. Custom properties can be added to a document such as a PDF file, each custom property having a name and a value. These "name"/"value" pairs are stored within the data file as metadata and, when a file's format is changed, the metadata is transcribed to the correct location within the new file format, keeping pairs together.
For example, in an embodiment of a data structure which is in the form of a PDF file, the portion of data defining the pattern may be an XML packet containing metadata relating to the position identification pattern. The pattern data is contained in a metadata stream within a PDF object within the file. On the other hand, if the data structure is in the form of a JPEG file, and the pattern data is again provided in an XML packet, the file will use a marker (known as an APP1 marker) to designate the location of the XML packet containing the position identification pattern metadata. Therefore, when transcribing a PDF file to a JPEG file the XML packet containing the position identification pattern metadata should be transcribed to the correct location within the APP1 marker of the JPEG file and vice-versa. Similar transcriptions of metadata location data must take place when changing between any file formats, such as GIF, PNG, TIFF, SVG or any other suitable file format.
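For the JPEG case, the placement of an XML packet in an APP1 segment can be sketched in Python as below. The JPEG marker values and the two-byte big-endian length field (which counts its own two bytes) are standard JPEG facts; the identifier `PACKET_ID` placed in front of the packet and the function names are assumptions made for this example, and the sketch handles only a single packet that fits in one segment.

```python
import struct

SOI = b"\xff\xd8"    # start of image
EOI = b"\xff\xd9"    # end of image
SOS = b"\xff\xda"    # start of scan: compressed image data follows
APP1 = b"\xff\xe1"   # application segment 1
PACKET_ID = b"pattern-metadata\x00"  # hypothetical identifier for the packet


def embed_in_jpeg(jpeg, xml_packet):
    """Insert the XML packet as an APP1 segment directly after SOI."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream")
    payload = PACKET_ID + xml_packet
    length = len(payload) + 2  # the length field includes its own two bytes
    if length > 0xFFFF:
        raise ValueError("packet too large for a single APP1 segment")
    segment = APP1 + struct.pack(">H", length) + payload
    return jpeg[:2] + segment + jpeg[2:]


def extract_from_jpeg(jpeg):
    """Walk the marker segments and return the XML packet, if present."""
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos:pos + 2]
        if marker in (SOS, EOI):
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        body = jpeg[pos + 4:pos + 2 + length]
        if marker == APP1 and body.startswith(PACKET_ID):
            return body[len(PACKET_ID):]
        pos += 2 + length
    return None
```

Transcribing from PDF to JPEG then amounts to reading the packet from the PDF metadata stream and passing the same bytes, unchanged, to `embed_in_jpeg`, so that the original representation of the metadata is preserved.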
Therefore, because the metadata is enclosed within a file 800 as metadata, documents retain their context when they exit their original system or environment. Thus, the form and properties of the documents are preserved when the program that uses the documents is not the final authority, i.e. when the program used to read, represent and translate the properties is a different program from that used to create the metadata.
Use of metadata for pattern information enables users to store, retrieve, distribute and share digital paper documents that can be easily and correctly viewed by any user with access to them. Further, the electronic document file 800 having metadata embedded therein allows a single file 800 to be distributed for a given document rather than needing to distribute multiple files, each relating to a separate property of the document. The use of separate multiple files describing a single document has a number of disadvantages including managing a plurality of versions, ensuring all of the files relate to the same version of the document, and the increased risk of loss or corruption of a single file resulting in the loss of a complete document. Providing a single file 800 results in the data content and metadata of the file 800 being edited at the same time. Further, the embedded metadata may include XML schema.
Further, the metadata can be embedded using a file embedding mechanism that allows applications to locate metadata in files more easily, by scanning the file 800 rather than needing to parse a specific application's file format. Such an arrangement makes the metadata more accessible and further aids document interchange and management.
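Locating metadata by scanning can be as simple as searching the raw bytes for a pair of delimiters, as in the Python sketch below. The delimiter strings `BEGIN` and `END` and the function name are hypothetical; the specification does not name the delimiters used by the embedding mechanism.

```python
# Hypothetical delimiters wrapped around the embedded metadata packet.
BEGIN = b"<?pattern-packet begin?>"
END = b"<?pattern-packet end?>"


def scan_for_packet(data):
    """Find the metadata packet in raw file bytes without parsing the format."""
    start = data.find(BEGIN)
    if start == -1:
        return None
    start += len(BEGIN)
    end = data.find(END, start)
    if end == -1:
        return None
    return data[start:end]
```

Because the scan ignores everything outside the delimiters, the same routine works whether the surrounding bytes belong to a PDF, a JPEG or another container, provided the packet is stored uncompressed.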
In an alternative embodiment the metadata is embedded within the data defining the pattern areas 718, 720, 721, 722 as an invisible font in the file 800. For example, text characters are defined in a predetermined manner by a string of data, and part of the string for each character defines the font in which the character will be printed. The data defining the pattern areas 718, 720, 721, 722 is therefore put into the format of a series of text characters, with a non-valid font definition so that they will not be printed as characters by the printer. In this embodiment a printer or other processing device used to print the file 800, or otherwise process it, is arranged to recognize the non-printable text characters, by means of the non-valid font definition. The printer, or other processing device, interprets the data defining the non-printable text characters in a different manner to standard, printable, text characters as identifying the size, shape, and position of the required pattern areas 718, 720, 721, 722. The non-valid font definition either provides the pattern of the position identification markings or provides instructions as to how the printer can obtain the pattern, typically from a networked resource, such as a server.
The definition of the pattern areas 718, 720, 721, 722 can be further enhanced by means of tags at the ends of the data string defining them. These tags alert the printer, or other processing device, to the fact that the data between them is to be interpreted as a definition of the pattern areas 718, 720, 721, 722.
Thus, when the PDF file 800 is sent for printing each graphic object contained within the PDF file 800 is received by the printer and the valid graphic objects are printed in the conventional manner. Those characters with non-valid font definitions are interpreted such that the pattern areas 718, 720, 721, 722 are printed in their defined areas of the document.
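The printing behaviour of this alternative embodiment can be sketched in Python as follows. Everything concrete here is assumed for illustration: the font name `PATTERN_FONT`, the tag strings, the comma-separated encoding of an area, and the representation of a text object as a dictionary are not taken from the specification.

```python
PATTERN_FONT = "__pattern__"              # hypothetical non-valid font name
OPEN_TAG, CLOSE_TAG = "[PAT]", "[/PAT]"   # hypothetical delimiting tags


def encode_area(pattern_x, pattern_y, page_x, page_y, width, height):
    """Write a pattern area definition as a text object in the non-valid font."""
    values = (pattern_x, pattern_y, page_x, page_y, width, height)
    body = ",".join(str(v) for v in values)
    return {"font": PATTERN_FONT, "text": f"{OPEN_TAG}{body}{CLOSE_TAG}"}


def process_text_object(obj, valid_fonts):
    """Decide what the printer does with one text object."""
    if obj["font"] in valid_fonts:
        return ("print_text", obj["text"])  # conventional printing
    text = obj["text"]
    if (obj["font"] == PATTERN_FONT
            and text.startswith(OPEN_TAG) and text.endswith(CLOSE_TAG)):
        # Text between the tags is read as a pattern area definition.
        values = text[len(OPEN_TAG):-len(CLOSE_TAG)].split(",")
        return ("print_pattern", tuple(int(v) for v in values))
    return ("skip", None)  # unknown font and no pattern definition
```

A device that does not know about the convention simply fails to render the characters, so the visible content of the document is unaffected.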
In the embodiments described hereinbefore it is stated that the creation of the data set defining the digital document is performed by a form design tool, requiring pattern to be allocated at the design stage. This need not be the case in other embodiments. For example, the data structure may be created by a printer driver upon receipt of a file which comprises content and a file which defines at least one pattern area. Before receipt by the printer driver the area need not have actual pattern allocated to it, this being performed by the printer driver, perhaps by accessing a pattern allocation server. The output of such a printer driver would be a data structure in accordance with at least one embodiment of the present invention. Also, it is possible that the output of the form design tool could be an embodiment of a data structure which is within the scope of at least one aspect of the invention.
The output of the FDT may comprise a data structure which includes a portion of data defining content and a second portion of data which defines the location of pattern areas within the document rather than the location of pattern for those areas within pattern space. As before, this second portion of data may comprise metadata about where in the document pattern is to be placed. As an example, the metadata may indicate that the designed document is to contain some pattern at its upper left corner, and that the pattern is to cover one third of the page. The printer driver - upon reading this metadata - allocates an appropriate portion of pattern and replaces the original metadata with new metadata defining the position of a portion of pattern in pattern space.
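The printer driver step described above might look like the following Python sketch. The dictionary keys, the function `allocate_pattern` and the class `StripAllocator` (a local stand-in for a pattern allocation server that hands out pattern left to right) are hypothetical. Note that this sketch keeps the page location alongside the new pattern space position rather than discarding it.

```python
def allocate_pattern(page_areas, allocate):
    """Add a pattern space position to areas that only have a page location.

    `allocate` is any callable taking (width, height) and returning the
    (x, y) position of a free portion of pattern space.
    """
    result = []
    for area in page_areas:
        pattern_x, pattern_y = allocate(area["width"], area["height"])
        # Build new metadata; the input dictionaries are left untouched.
        result.append({**area, "patternX": pattern_x, "patternY": pattern_y})
    return result


class StripAllocator:
    """Stand-in for a pattern allocation server."""

    def __init__(self):
        self.next_x = 0

    def __call__(self, width, height):
        x = self.next_x
        self.next_x += width  # next request starts where this one ends
        return (x, 0)


designed = [
    {"pageX": 0, "pageY": 0, "width": 70, "height": 99},
    {"pageX": 100, "pageY": 0, "width": 30, "height": 10},
]
printable = allocate_pattern(designed, StripAllocator())
```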
APPENDIX A
<Document DLDVersionNumber="0" DLDSubVersionNumber="1" DLDPrintableVersionNumber="0.1" NumberOfForms="1">
  <Form numberOfPages="1" formID="gfggg" userdata="Not Used" formInstanceID="" templateID="PODTemplate V1" local="0" standardSize="A4" pagesizeheight="0" pagesizewidth="0">
    <FormPage pageOrientation="Portrait" tcX="0" tcY="601" initialXMargin="0" initialYMargin="0">
      <DrawingArea patternX="1402" patternY="1165" pageX="597" pageY="891" width="82"
      [Figure imgf000024_0001: the listing continues in an image in the source publication]
      <DrawingArea patternX="1126" patternY="365" pageX="321" pageY="91" width="124" height="64" layer="1" IsMagicBox="0" />
      <DrawingArea patternX="1301" patternY="365" pageX="496" pageY="91" width="159" height="74" layer="1" IsMagicBox="0" />
    </FormPage>
  </Form>
</Document>

Claims

1. A data structure which defines an electronic document, the data structure comprising first and second substantially separate portions of data; the first portion of data defining the content of the document and the second portion comprising data relating to a pattern of position identification markings such that when the electronic document is printed a pattern reading device, such as a pen, is able to determine its position relative to the position identification markings, the data structure comprising a single data file with the first and second data portions being embedded within the data file.
2. A data structure according to claim 1 which is written in such a form that the data structure can be converted from one format to other formats without losing any of the information from the document.
3. A data structure according to any preceding claim in which the second portion of data comprises metadata and in which the data structure includes one or more controls which control the way in which the second portion of data is converted between formats to preserve the pattern.
4. A data structure according to any preceding claim in which the data in the second portion comprises any one or more of the following: data from which an algorithm or the like can generate the pattern; co-ordinates or other metadata identifying the portion of the position identification marking.
5. A data structure according to any preceding claim in which the at least one portion providing the position of the position identification markings within the document and/or data identifying the content of the position identification marking in the document is provided in XML.
6. A data structure according to any preceding claim in which a schema, generally an XML schema, is provided.
7. An application adapted to produce an electronic document, the application comprising: content receiving means for receiving the content of the electronic document, pattern receiving means for receiving data defining a pattern of positional markings allocated to at least a portion of the document; and data structure generating means for generating a data structure defining the electronic document which data structure comprises first and second substantially separate portions of data, the first portion of data defining the content and the second portion of data relating to the pattern.
8. A method for generating an electronic document comprising creating an electronic file and storing in that file data and metadata, the data defining at least some content and the metadata relating to a pattern of position identification markings arranged to allow a device, such as a pen, to determine its position within the position identification markings, the electronic file capable of generating an electronic document.
9. A method according to claim 8 in which a file embedding mechanism is used to embed metadata, generally XML metadata, within the electronic document.
10. A data carrier containing instructions which when read onto a computer cause that computer to perform the method of claim 8 or claim 9.
11. A data carrier containing instructions which when read onto a computer cause that computer to provide the data structure of any of claims 1 to 6.
12. A data carrier containing instructions which when read onto a computer cause that computer to provide the digital document creation application of claim 7.
13. A data carrier containing instructions which when read onto a computer cause that computer to perform the method of claim 8 or 9.
14. A source file for a digital document, the digital document comprising content and a pattern of position identification markings arranged to allow a device, such as a pen, to determine its position within the position identification markings, the source file comprising at least first and second portions of data, the first portion defining the content and the second portion comprising metadata which provides a self-defining description of the pattern.
PCT/EP2004/051940 2003-09-10 2004-08-27 A data structure for an electronic document and related methods WO2005025204A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/571,075 US20080114777A1 (en) 2003-09-10 2004-08-27 Data Structure for an Electronic Document and Related Methods
EP04766624A EP1716698A1 (en) 2003-09-10 2004-08-27 A data structure for an electronic document and related methods

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0321171.1 2003-09-10
GBGB0321171.1A GB0321171D0 (en) 2003-09-10 2003-09-10 A data structure for an electronic document and related methods

Publications (1)

Publication Number Publication Date
WO2005025204A1 true WO2005025204A1 (en) 2005-03-17

Family

ID=29226816

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2004/051940 WO2005025204A1 (en) 2003-09-10 2004-08-27 A data structure for an electronic document and related methods

Country Status (4)

Country Link
US (1) US20080114777A1 (en)
EP (1) EP1716698A1 (en)
GB (1) GB0321171D0 (en)
WO (1) WO2005025204A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7753274B2 (en) * 2006-04-12 2010-07-13 Fuji Xerox Co., Ltd. Writing information processing device, writing information processing method and computer-readable medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0321167D0 (en) * 2003-09-10 2003-10-08 Hewlett Packard Development Co Printing digital documents
US20080084573A1 (en) * 2006-10-10 2008-04-10 Yoram Horowitz System and method for relating unstructured data in portable document format to external structured data
US8358964B2 (en) * 2007-04-25 2013-01-22 Scantron Corporation Methods and systems for collecting responses
JP2009098710A (en) * 2007-10-12 2009-05-07 Canon Inc Portable terminal, print method for content in its terminal, printer communicating with its terminal, its control method and print system
US20140250055A1 (en) * 2008-04-11 2014-09-04 Adobe Systems Incorporated Systems and Methods for Associating Metadata With Media Using Metadata Placeholders
WO2010114845A1 (en) * 2009-03-31 2010-10-07 Adapx Inc. Determining an object location relative to a digital document

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5987136A (en) * 1997-08-04 1999-11-16 Trimble Navigation Ltd. Image authentication patterning
WO2001016691A1 (en) * 1999-08-30 2001-03-08 Anoto Ab Notepad
US20020020750A1 (en) * 1998-04-01 2002-02-21 Xerox Corporation Marking medium area with encoded identifier for producing action through network
US20030091233A1 (en) * 1999-05-25 2003-05-15 Paul Lapstun Method and system for note taking using a form with coded marks
WO2003042912A1 (en) * 2001-09-13 2003-05-22 Anoto Ab Product with a coding pattern and method, device and computer program for coding and decoding the pattern

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5477012A (en) * 1992-04-03 1995-12-19 Sekendur; Oral F. Optical position determination
US5668931A (en) * 1993-03-31 1997-09-16 Dermer; Richard A. Method for automatic trap selection for correcting for plate misregistration in color printing
US5613046A (en) * 1993-03-31 1997-03-18 Miles Inc. Method and apparatus for correcting for plate misregistration in color printing
US5313570A (en) * 1993-03-31 1994-05-17 Miles, Inc. Method for determining color boundaries for correcting for plate misregistration in color printing
GB2305525B (en) * 1995-09-21 1997-08-20 Ricoh Kk Document data administrating system and method of administrating document data
US6081261A (en) * 1995-11-01 2000-06-27 Ricoh Corporation Manual entry interactive paper and electronic document handling and processing system
JP3478681B2 (en) * 1996-10-07 2003-12-15 株式会社リコー Document information management system
JP3478725B2 (en) * 1997-07-25 2003-12-15 株式会社リコー Document information management system
US6518950B1 (en) * 1997-10-07 2003-02-11 Interval Research Corporation Methods and systems for providing human/computer interfaces
US6065021A (en) * 1998-04-07 2000-05-16 Adobe Systems Incorporated Apparatus and method for alignment of graphical elements in electronic document
US7105753B1 (en) * 1999-05-25 2006-09-12 Silverbrook Research Pty Ltd Orientation sensing device
US6792165B1 (en) * 1999-05-25 2004-09-14 Silverbrook Research Pty Ltd Sensing device
US6502756B1 (en) * 1999-05-28 2003-01-07 Anoto Ab Recording of information
US7070098B1 (en) * 2000-05-24 2006-07-04 Silverbrook Res Pty Ltd Printed page tag encoder
US6871204B2 (en) * 2000-09-07 2005-03-22 Oracle International Corporation Apparatus and method for mapping relational data and metadata to XML
US6912555B2 (en) * 2002-01-18 2005-06-28 Hewlett-Packard Development Company, L.P. Method for content mining of semi-structured documents
EP1333402B1 (en) * 2002-02-04 2008-09-10 Baumer Optronic GmbH Redundant twodimensional code and decoding method
US7120872B2 (en) * 2002-03-25 2006-10-10 Microsoft Corporation Organizing, editing, and rendering digital ink
SE523931C2 (en) * 2002-10-24 2004-06-01 Anoto Ab Information processing system arrangement for printing on demand of position-coded base, allows application of graphic information and position data assigned for graphical object, to substrate for forming position-coded base
US20040145610A1 (en) * 2003-01-17 2004-07-29 Vortx Group Customized wall border imaging solution
US7653876B2 (en) * 2003-04-07 2010-01-26 Adobe Systems Incorporated Reversible document format
US6962450B2 (en) * 2003-09-10 2005-11-08 Hewlett-Packard Development Company L.P. Methods and apparatus for generating images
US20050052700A1 (en) * 2003-09-10 2005-03-10 Andrew Mackenzie Printing digital documents

Also Published As

Publication number Publication date
US20080114777A1 (en) 2008-05-15
EP1716698A1 (en) 2006-11-02
GB0321171D0 (en) 2003-10-08

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2004766624

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2004766624

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 10571075

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 10571075

Country of ref document: US