WO2006133136A2 - Structuring data for word processing documents - Google Patents

Structuring data for word processing documents

Info

Publication number
WO2006133136A2
Authority
WO
WIPO (PCT)
Prior art keywords
document
modular
computer
relationship
word processor
Prior art date
Application number
PCT/US2006/021825
Other languages
English (en)
Other versions
WO2006133136A3 (fr
Inventor
Brian Jones
Robert Little
Andrew Bishop
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/398,339 external-priority patent/US7617451B2/en
Application filed by Microsoft Corporation filed Critical Microsoft Corporation
Publication of WO2006133136A2 publication Critical patent/WO2006133136A2/fr
Publication of WO2006133136A3 publication Critical patent/WO2006133136A3/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/131 Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting

Definitions

  • An open file format is used to represent the features and data associated with a word processing application within a document.
  • the open file format is directed at simplifying the way a word processing application organizes document features and data, and presents a logical model that is easily accessible.
  • a document structured according to the open file format is designed such that it is made up of a collection of modular parts that are stored within a container.
  • the modular parts are logically separate but are associated with one another by one or more relationships.
  • Each of the modular parts is capable of being interrogated separately regardless of whether or not the application that created the document is running.
  • Each modular part is capable of having information extracted from it and copied into another document and reused. Information may also be changed, added, and deleted from each of the modular parts.
  • Common data, such as strings and functions, may be stored in their own modular part so that the document does not contain excessive amounts of redundant data.
  • code, personal information, comments, as well as any other determined information might be stored in a separate modular part such that the information may be easily parsed and/or removed from the document.
  • the open file formats not only work with document files, but also work with templates.
  • the improved schema supports features that work in templates. For example, such features may include auto text, the ability to have auto text as a collection of document fragments richly formatted inside of a template accessible for insertion into documents.
  • FIGURE 1 illustrates an exemplary computing device that may be used in exemplary embodiments of the present invention
  • FIGURE 2 shows an exemplary document container with modular parts
  • FIGURE 3 shows a high-level relationship diagram of a word processing document within a container
  • FIGURES 4a-4c are block diagrams illustrating a document relationship hierarchy for various modular parts utilized in a file format for representing a word processing document.
  • FIGURES 5-6 are illustrative routines performed in structuring data for word processing documents in a modular content framework, in accordance with aspects of the invention.
  • FIGURE 1 and the corresponding discussion are intended to provide a brief, general description of a suitable computing environment in which embodiments of the invention may be implemented. While the invention will be described in the general context of program modules that execute in conjunction with program modules that run on an operating system on a personal computer, other types of computer systems and program modules may be used.
  • program modules include routines, programs, operations, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.
  • other computer system configurations including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like may be used.
  • a distributed computing environment where tasks are performed by remote processing devices that are linked through a communications network may also be utilized.
  • program modules may be located in both local and remote memory storage devices.
  • Referring now to FIGURE 1, an illustrative computer architecture for a computer 100 will be described.
  • the computer architecture shown in FIGURE 1 illustrates a computing apparatus, such as a server, desktop, laptop, or handheld computing apparatus, including a central processing unit 5 ("CPU"), a system memory 7, including a random access memory 9 (“RAM”) and a read-only memory (“ROM”) 11, and a system bus 12 that couples the memory to the CPU 5.
  • the computer 100 further includes a mass storage device 14 for storing an operating system 16, application programs, and other program modules, which will be described in greater detail below.
  • the mass storage device 14 is connected to the CPU 5 through a mass storage controller (not shown) connected to the bus 12.
  • the mass storage device 14 and its associated computer-readable media provide non-volatile storage for the computer 100.
  • computer-readable media can be any available media that can be accessed by the computer 100.
  • Computer-readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks ("DVDs"), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 100.
  • the computer 100 may operate in a networked environment using logical connections to remote computers through a network 18, such as the Internet.
  • the computer 100 may connect to the network 18 through a network interface unit 20 connected to the bus 12.
  • the network interface unit 20 may also be utilized to connect to other types of networks and remote computer systems.
  • the computer 100 may also include an input/output controller 22 for receiving and processing input from a number of other devices, including a keyboard, mouse, or electronic stylus (not shown). Similarly, an input/output controller 22 may provide output to a display screen, a printer, or other type of output device.
  • a number of program modules and data files may be stored in the mass storage device 14 and RAM 9 of the computer 100, including an operating system 16 suitable for controlling the operation of a networked personal computer, such as the WINDOWS XP operating system from MICROSOFT CORPORATION of Redmond, Washington.
  • the mass storage device 14 and RAM 9 may also store one or more program modules.
  • the mass storage device 14 and the RAM 9 may store a word processor application program 10.
  • the word processor application program 10 is operative to provide functionality for the creation and structure of a word processor document, such as a document 27, in an open file format 24.
  • the word processor application program 10 and other application programs 26 comprise the OFFICE suite of application programs from MICROSOFT CORPORATION including the WORD, EXCEL, and POWERPOINT application programs.
  • the open file format 24 simplifies and clarifies the organization of document features and data.
  • the word processor program 10 organizes the 'parts' of a document (styles, strings, document properties, application properties, custom properties, functions, and the like) into logical, separate pieces, and then expresses relationships among the separate parts. These relationships, and the logical separation of 'parts' of a document, make up a file organization that can be easily accessed without having to understand a proprietary format.
  • the open file format 24 may be formatted according to extensible markup language ("XML"). XML is a standard format for communicating data.
  • In the XML data format, a schema is used to provide XML data with a set of grammatical and data type rules governing the types and structure of data that may be communicated.
  • the modular parts may also be included within a container. According to one embodiment, the modular parts are stored in a container according to the ZIP format. Additionally, since the open file format 24 is expressed as XML, formulas within a word processor document are represented as standard text making them easy to locate as well as modify.
  • the openness of the open file format also translates to more secure and transparent files.
  • Documents can be shared confidently because personally identifiable information and business sensitive information, such as user names, comments and file paths, can be easily identified and removed from the document.
  • files containing content such as OLE objects or Visual Basic® for Applications (VBA) code can be identified for special processing.
  • FIGURE 2 shows an exemplary document container with modular parts.
  • document container 200 includes document properties 210, markup language 220, custom-defined XML 230, embedded code/macros 240, strings 250, functions 260, personal information 270, other properties 280, and document 1 (290) that are associated with a word processor document (See FIGURE 3 and related discussion).
  • Each modular part (210 - 290) is enclosed by container 205.
  • the container is a ZIP container.
  • the combination of XML with ZIP compression allows for a very robust and modular format that enables a large number of new scenarios.
  • Each file may be composed of a collection of any number of parts that defines the document.
  • For example, the parts may be XML files that describe application data, metadata, and even customer data stored inside the container 205.
  • Other non-XML parts may also be included within the container, and include such parts as binary files representing images or OLE objects embedded in the document.
  • Parts of the document specify a relationship to other parts (See FIGURES 4a-4c and related discussion). While the parts make up the content of the file, the relationships describe how the pieces of content work together. The result is an open file format for documents that is tightly integrated but modular and highly flexible.
  • When users save or create a document, a single file is written to storage within container 205.
  • the container 205 may then easily be opened by any application that can process XML. By wrapping the individual parts of a file in a container 205, each document remains a single file instance. Once a container 205 has been opened, developers can manipulate any of the modular parts (210 - 290) that are found within the container 205 that define the document.
  • a developer can open a word processor document container that uses the open file format, locate the XML part that represents a particular portion of the word processor document, such as sheet 1, alter the part corresponding to document 1 (290) by using any technology capable of editing XML, and return the XML part to the container package 205 to create an updated word processor document.
  • This scenario is only one of the essentially countless others that will be possible as a result of open format.
  • the modularity of the parts making up the document enables a developer to quickly locate a specific part of the file and work directly with just that part.
  • the individual parts can be edited, exchanged, or even removed depending on the desired outcome of a specific business need.
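  • The open-edit-return scenario described above can be sketched in a few lines. This is only a minimal illustration, assuming a ZIP container and Python's standard zipfile and xml.etree modules; the part names (document1.xml, media/logo.png), the markup, and the helper replace_text_in_part are hypothetical and are not taken from the file format itself.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def replace_text_in_part(container_bytes, part_name, old, new):
    """Return a new container in which one XML part has been edited.

    Every other part is copied through unchanged.
    """
    source = zipfile.ZipFile(io.BytesIO(container_bytes))
    out_buffer = io.BytesIO()
    with source, zipfile.ZipFile(out_buffer, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            data = source.read(info.filename)
            if info.filename == part_name:
                # Edit only the requested part with ordinary XML tooling.
                root = ET.fromstring(data)
                for element in root.iter():
                    if element.text and old in element.text:
                        element.text = element.text.replace(old, new)
                data = ET.tostring(root, encoding="utf-8")
            target.writestr(info.filename, data)
    return out_buffer.getvalue()


# Build a tiny stand-in container (hypothetical part names and markup).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("document1.xml", "<document><p>Draft report</p></document>")
    z.writestr("media/logo.png", b"\x89PNG-placeholder")

updated = replace_text_in_part(demo.getvalue(), "document1.xml", "Draft", "Final")
with zipfile.ZipFile(io.BytesIO(updated)) as z:
    edited_xml = z.read("document1.xml").decode("utf-8")
    untouched = z.read("media/logo.png")
```

  • The sketch rewrites the whole container because a ZIP member cannot be edited in place with this module; the binary part is passed through byte for byte.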
  • the modular parts can be of different physical content types.
  • the parts used to describe program data are stored as XML. These parts conform to the XML reference schema(s) (220, 230) that defines the associated feature or object. For example, in a word processor file, the data that represents a word processor document header is found in an XML part that adheres to the schema for a Word Processor Document.
  • modules may be stored in their native content type.
  • images may be stored as binary files (.png, .jpg, and so on) within the container 205. Therefore, the container 205 may be opened by using a ZIP utility and the image may then be immediately viewed, edited, or replaced in its native format. Not only is this storage approach more accessible, but it requires less internal processing and disk space than storing an image as encoded XML.
  • Other example parts that may be stored natively as binary parts include VBA projects and embedded OLE objects. Obviously, many other parts may also be stored natively. For developers, accessibility makes many scenarios more attractive. For instance, a developer could implement a solution that iterates a collection of word processor documents to update an embedded spreadsheet with an updated value/string/function.
  • Code that is included within documents is not the only potential security threat. Developers can circumvent potential risks from binaries, such as OLE objects or even images, by interrogating the documents and removing any exposures that arise. For example, if a specific OLE object is identified as a known issue, a program could be created to locate and cleanse or quarantine any documents containing the object. Likewise, any external references being made from a document can be readily identified. This identification allows solution developers to decide if external resources being referenced from a document are trustworthy or require corrective action.
  • the open file format enables access to this information that may be useful in other ways.
  • a developer may create a solution that uses the personal information 270 to return a list of documents authored by an individual person or from a specific organization. This list can be produced without having to open an application or use its object model with the open file format.
  • an application could loop through a folder or volume of documents and aggregate all of the comments within the documents. Additional criteria could be applied to qualify the comments and help users better manage the collaboration process as they create documents. This transparency helps increase the trustworthiness of documents and document-related processes by allowing programs or users to verify the contents of a document without opening the file.
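  • As a rough sketch of the comment-aggregation idea, the following loops over a set of containers and collects comments without involving the authoring application. The part name comments.xml, the comment markup, and the helper names are assumptions made for illustration; containers are held in memory so the example is self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

COMMENTS_PART = "comments.xml"  # hypothetical part name


def collect_comments(containers):
    """Map each document name to the (author, text) pairs in its comments part."""
    found = {}
    for name, raw in containers.items():
        with zipfile.ZipFile(io.BytesIO(raw)) as container:
            if COMMENTS_PART not in container.namelist():
                continue  # this document has no comments part
            root = ET.fromstring(container.read(COMMENTS_PART))
            found[name] = [(c.get("author"), c.text) for c in root.iter("comment")]
    return found


def make_container(parts):
    """Helper for the demo: pack a dict of part name -> content into a ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        for part_name, content in parts.items():
            z.writestr(part_name, content)
    return buffer.getvalue()


documents = {
    "a.docx": make_container({
        "document.xml": "<document/>",
        "comments.xml": '<comments><comment author="Ann">Check figures</comment></comments>',
    }),
    "b.docx": make_container({"document.xml": "<document/>"}),
}
all_comments = collect_comments(documents)
```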
  • the open file format enables users or applications to see and identify the various parts of a file and to choose whether to load specific components. For example, a user can choose to load macro-code independently from document content and other file components.
  • the ability to identify and handle embedded code 240 supports compliance management and helps reduce security concerns around malicious document code.
  • Likewise, personally identifiable or business-sensitive information (for example, comments, deletions, user names, file paths, and other document metadata) can be identified and removed from the document.
  • FIGURE 3 shows a high-level relationship diagram of a word processor document within a container.
  • the exemplary container 300 includes word processor document 310, document properties 320, application properties 322, and custom properties 324.
  • the word processor document includes a reference to styles 340, strings 342 and chart 344.
  • Many other configurations of the modular parts and the relationships may be defined. For example, referring to FIGURES 4a-4c which provides more detail regarding relationships among modular parts, it can be seen that a word processor document may include many more modular parts and relationships.
  • the relationships are the method used to specify how the collection of parts comes together to form the actual document.
  • the relationships are defined by using XML, which specifies the connection between a source part and a target resource. For example, the connection between a word processor document and a style that appears in that word processor document is identified by a relationship.
  • the relationships are stored within XML parts or "relationship parts" in the document container 300. If a source part has multiple relationships, all subsequent relationships are listed in the same XML relationship part. Each part within the container is referenced by at least one relationship.
  • the implementation of relationships makes it possible for the parts never to directly reference other parts, and connections between the parts are directly discoverable without having to look within the content.
  • the references to relationships are represented using a Relationship ID, which allows all connections between parts to stay independent of content-specific schema.
  • For example, a relationship may name a target such as media/image1.jpeg.
  • the relationships may represent not only internal document references but also external resources. For example, if a document contains linked pictures or objects, these are represented using relationships as well. This makes links in a document to external sources easy to locate, inspect and alter. It also offers developers the opportunity to repair broken external links, validate unfamiliar sources or remove potentially harmful links.
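  • To make the idea concrete, here is a minimal sketch that reads a relationship part and separates internal targets from external ones. The relationship markup (the Id, Type, Target, and TargetMode attributes), the type URIs, and the part name _rels/document.xml.rels are simplified assumptions for illustration, not markup quoted from the format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed, simplified relationship markup (hypothetical type URIs).
RELS_XML = """<Relationships>
  <Relationship Id="rId1" Type="http://example.com/rel/styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="http://example.com/rel/image" Target="media/image1.jpeg"/>
  <Relationship Id="rId3" Type="http://example.com/rel/hyperlink"
                Target="http://example.com/report" TargetMode="External"/>
</Relationships>"""


def read_relationships(container, rels_part):
    """Return one dict per relationship listed in the given relationship part."""
    root = ET.fromstring(container.read(rels_part))
    return [
        {
            "id": rel.get("Id"),
            "type": rel.get("Type"),
            "target": rel.get("Target"),
            "external": rel.get("TargetMode") == "External",
        }
        for rel in root.iter("Relationship")
    ]


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("document.xml", "<document/>")
    z.writestr("_rels/document.xml.rels", RELS_XML)

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as container:
    rels = read_relationships(container, "_rels/document.xml.rels")

external_targets = [r["target"] for r in rels if r["external"]]
image_targets = [r["target"] for r in rels if r["type"].endswith("/image")]
```

  • Note that the document content itself is never parsed: the external link and the image are both found from the relationship part alone.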
  • Relationships simplify the process of locating content within a document.
  • The document's parts do not need to be parsed to locate content, whether that content is an internal or an external document resource.
  • Relationships also allow a user to quickly take inventory of all the content within a document. For example, if the number of footnotes in a word processor document needs to be counted, the relationships could be inspected to determine how many footnote parts exist within the container.
  • the relationships may also be used to examine the type of content in a document.
  • relationships allow developers to manipulate documents without having to learn application specific syntax or content markup. For example, without any knowledge of how to program a word processor application, a developer solution could easily remove a footnote by editing the document's relationships.
  • documents saved in the open file format are considered to be macro-free files and therefore do not contain code. This behavior helps to ensure that malicious code residing in a default document can never be unexpectedly executed. While documents can still contain and use macros, the user or developer specifically saves these documents as a macro-enabled document type. This safeguard does not affect a developer's ability to build solutions, but allows organizations to use documents with more confidence.
  • Macro-enabled files have the same file format as macro-free files, but contain additional parts that macro-free files do not. The additional parts depend on the type of automation found in the document.
  • a macro-enabled file that uses VBA contains a binary part that stores the VBA project. Any word processor document that utilizes macros that are considered safe, such as XLM macros, may be saved as a macro-enabled file. If a code-specific part is found in a macro-free file, whether placed there accidentally or maliciously, an application may be configured to not allow the code to execute.
  • Documents saved by using the open file format may be identified by their file extensions.
  • the extensions borrow from existing binary file extensions by appending a letter to the end of the suffix.
  • the default extensions for documents created in MICROSOFT WORD, EXCEL, and POWERPOINT using the open file format append the letter "x" to the file extension resulting in .docx, .xlsx, and .pptx, respectively.
  • the file extensions may also indicate whether the file is macro-enabled versus those that are macro-free.
  • Documents that are macro-enabled have a file extension that ends with the letter "m" instead of an "x."
  • For example, a macro-enabled word processor document has a .docm extension, which allows any user or software program, before a document opens, to immediately identify that it might contain code.
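  • The extension convention and the rule about code parts in macro-free files can be combined into a simple pre-open check. The sketch below is illustrative only: the macro-enabled extensions are derived from the "m"-for-"x" rule stated above, while the code part name vbaProject.bin and both function names are assumptions.

```python
import io
import os
import zipfile

MACRO_ENABLED_EXTENSIONS = {".docm", ".xlsm", ".pptm"}
CODE_PART_SUFFIXES = ("vbaProject.bin",)  # assumed name of a binary code part


def is_macro_enabled_name(filename):
    """True when the extension announces that the file may contain code."""
    return os.path.splitext(filename)[1].lower() in MACRO_ENABLED_EXTENSIONS


def should_block_code(filename, container_bytes):
    """True when a code part sits inside a file whose extension says macro-free."""
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as container:
        has_code = any(n.endswith(CODE_PART_SUFFIXES) for n in container.namelist())
    return has_code and not is_macro_enabled_name(filename)


def make_container(part_names):
    """Helper for the demo: a ZIP holding placeholder parts with the given names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        for name in part_names:
            z.writestr(name, b"placeholder")
    return buffer.getvalue()


with_code = make_container(["document.xml", "vbaProject.bin"])
without_code = make_container(["document.xml"])

blocked = should_block_code("report.docx", with_code)    # code in a macro-free file
allowed = should_block_code("report.docm", with_code)    # declared macro-enabled
clean = should_block_code("report.docx", without_code)   # no code part at all
```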
  • One such scenario could involve personalizing thousands of documents to distribute to customers.
  • Information programmatically extracted from an enterprise database or customer relationship management (CRM) application could be inserted into a standard document template by a server application that uses XML.
  • Creating these documents is highly efficient because there is no requirement that the creating programs need to be run; yet the capability still exists for producing high-quality, rich documents.
  • Using custom schemas in one or more applications is another way documents can be leveraged to share data. Information that was once locked in a binary format is now easily accessible, and documents can therefore serve as openly exchangeable data sources. Custom schemas not only make insertion or extraction of data simple, but they also add structure to documents and are capable of enforcing data validation.
  • Editing the contents of existing documents is another valuable example where the open file format enhances a process.
  • the edit may involve updating small amounts of data, swapping entire parts, removing parts, or adding new parts altogether.
  • the open file format makes content easy to find and manipulate.
  • The use of XML and XML schema means that common XML technologies, such as XPath and XSLT, can be used to edit data within document parts in virtually endless ways.
  • Another scenario might be one in which an existing document must be updated by changing only a single part in its entirety.
  • an entire spreadsheet or chart that contained old data or outdated calculation models could be replaced with a new one by simply overwriting its part.
  • This kind of updating also applies to binary parts.
  • An existing image or even an OLE object could be swapped out for a new one, as necessary.
  • a drawing embedded as an OLE object in a document, for instance, could be updated by overwriting that binary part.
  • URLs in hyperlinks could be updated to point to new locations.
  • a word processor application may store only one copy of repetitive text within the word processor file.
  • the word processor application may implement a shared table.
  • the shared table may be stored in a document part such as "fontTable.xml." Each unique font value found within a word processor document may then only be listed once in this part of the document. Individual document parts then reference the table to derive their values.
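  • The shared-table idea amounts to storing each distinct value once and replacing every use with a reference. The following is a minimal sketch of that technique using list indexes as references; the function name and the index-based reference scheme are assumptions, since the text does not specify how parts refer to table entries.

```python
def build_shared_table(values):
    """Store each distinct value once and return (table, references).

    references[i] is the index in table of the i-th original value.
    """
    table = []
    index_of = {}
    references = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(table)
            table.append(value)
        references.append(index_of[value])
    return table, references


fonts_used = ["Arial", "Times New Roman", "Arial", "Arial", "Courier New", "Times New Roman"]
font_table, font_refs = build_shared_table(fonts_used)

# A consuming part derives its values by looking the references up again.
resolved = [font_table[i] for i in font_refs]
```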
  • the modularity of the open file format opens up the possibility for generating content once and then repurposing it in a number of other documents.
  • a number of core templates could be created and used as building blocks for other documents.
  • One example scenario is building a repository of images used in documents.
  • a developer can create a solution that extracts images out of a collection of documents and allow users to reuse them from a single access point. Since the documents may store the images in their native format, the solution could build and maintain a library of images without much difficulty.
  • a developer could build a similar application that reuses document "thumbnail" images extracted from documents, and add a visual aspect to a document management process.
  • Each word processor document is a separate part that is readily accessible, as it is self-contained in its own XML part within the container.
  • a custom solution can leverage this architecture to automate the assembly process.
  • Custom XML could be used to hold metadata pertaining to the individual word processor document, thus allowing users to easily search it by using predefined keywords.
  • Because the open file formats segment, store, and compress file components separately, they reduce the risk of corruption and improve the chances of recovering data from within damaged files.
  • a cyclic redundancy check (CRC) error detection may be performed on each part within a document container to help ensure the part has not been corrupted. If one part has been corrupted, the remaining parts can still be used to open the remainder of the file. For example, a corrupt image or error in an embedded macro does not prevent users from opening the entire file, or from recovering the XML data and text-based information. Programs that utilize the open file format can easily deal with a missing or corrupt part by ignoring it and moving on to the next, so that any accessible data is salvaged.
  • Because the file formats are open and well documented, anyone can create tools for recovering parts that have been created improperly, for correcting XML parts that are not well formed, or for compensating when required elements are missing.
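  • A per-part integrity check of this kind can be sketched with Python's zipfile module, which verifies each member's CRC-32 when the member is read. The part names and the salvage_parts helper are hypothetical; the demo deliberately flips one byte of an uncompressed part to simulate corruption.

```python
import io
import zipfile
import zlib


def salvage_parts(container_bytes):
    """Read every part, keeping the good ones and listing the corrupt ones."""
    good, bad = {}, []
    with zipfile.ZipFile(io.BytesIO(container_bytes)) as container:
        for info in container.infolist():
            try:
                # zipfile checks the stored CRC-32 while reading the member.
                good[info.filename] = container.read(info)
            except (zipfile.BadZipFile, zlib.error):
                bad.append(info.filename)  # skip it and move on to the next part
    return good, bad


# Build a small container with uncompressed parts (hypothetical names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as z:
    z.writestr("word/document.xml", b"<doc>hello</doc>")
    z.writestr("media/image1.bin", b"PIXELS-PIXELS")

# Simulate damage: flip one byte inside the image part's stored data.
raw = bytearray(buffer.getvalue())
position = raw.find(b"PIXELS-PIXELS")
raw[position] ^= 0xFF

good_parts, bad_parts = salvage_parts(bytes(raw))
```

  • The damaged image is reported and skipped, while the XML part is still recovered intact.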
  • the open file format also addresses compatibility with both past file formats and future file formats that have not been anticipated. For example, a compatibility mode that automatically restricts features and functionality that are unavailable in target versions helps to ensure that users can exchange files seamlessly with other versions of an application or collaborate in mixed environments with no loss of fidelity or productivity.
  • the word processor document container 300 includes both user entered information as well as the feature and formatting information. Once the container 300 is opened and the desired file is accessed, there are a number of different ways to locate information. One way is by using an arbitrary schema for mapping data. A set of XML vocabularies defined within the schemas included herein fully define the features for the word processor document application.
  • Word processor documents, such as word processor document 310, may be created without ever launching the word processor application. For example, suppose that a customer of a Wall Street analyst company has access to information on certain companies. The customer accesses the analyst's website, logs on, and chooses to view the metrics for evaluating a company in the automotive industry. The information returned could be streamed into a newly created word processor document that was never touched by the word processor application but which is now a word processor file, such that when the customer selects the file, the word processor application opens it up.
  • common information is stored within its own part. Some examples include common strings, functions, and name ranges.
  • the open file format is designed such that previous and future versions of an application may still work with a document.
  • a future storage area is included within a part such that information that has not been thought of yet may be included within a document. In this way, a future version of the word processor application could access information within the future storage area, whereas a current version of the word processor application does not.
  • the future storage area resides in the schema, and the schema allows any kind of content to be in there. In this way, previous versions of an application may still appear to work without corrupting the values for the future versions.
  • Referring now to FIGURES 4a-4c, block diagrams illustrating a document relationship hierarchy for various modular parts utilized in a file format 24 for representing a word processor document are shown.
  • the document relationship hierarchy lists specific file format relationships, some with a reference indicator 418 indicating a reference to that relationship in the content of the modular part, for example via a relationship identifier.
  • the checkbox containers in the indicator indicate either that there is no explicit reference to the relationship in the content (not checked) or that there is an explicit reference to the relationship in the content (checked). In some embodiments, it may not be enough to just have the relationship to the image part from a parent or referring modular part, for example from a document part.
  • the parent part may also need to have an explicit reference to that image part relationship inline so that it is known where the image goes.
  • Non-explicit indicators indicate that a referring modular part is associated, but not called out directly in the parent part's content.
  • An example of this may be a styles part, where it is implied that there is always a styles part associated, and therefore there is no need to call out the styles part in the content. All anyone needs to do to find the styles part is just look for a relationship of that type.
  • the various modular parts or components of the word processor document hierarchy are logically separate but are associated by one or more relationships.
  • Each modular part is also associated with a relationship type and is capable of being interrogated separately and understood with or without the word processor application program 10 and/or with or without other modular parts being interrogated and/or understood.
  • it is easier to locate the contents of a document because instead of searching through all the binary records for document information, code can be written to easily inspect the relationships in a document and find the document parts effectively ignoring the other features and data in the open file format.
  • the code is written to step through the document in a much simpler fashion than previous interrogation code. Therefore, an action such as removing all the code, personal information, and the like, while tedious in the past, is now less complicated.
  • a modular content framework may include a file format container associated with the modular parts.
  • the modular parts include the document part 420, operative as a guide for properties of the word processor document.
  • the document hierarchy may also include a document properties part 417 containing built-in properties associated with the file format 24, an application part 405 containing core document properties associated with the file format 24, and a hyperlink part 419 facilitating a relationship with an external target part. It should be appreciated that each modular part is capable of being extracted from or copied from the document and reused in a different document along with associated modular parts identified by traversing relationships of the modular part reused. Associated modular parts are identified when the word processor application 10 traverses inbound and outbound relationships of the modular part reused.
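  • Identifying the associated parts of a reused part is, in effect, a graph traversal over relationships. The sketch below is a simplified illustration that follows outbound relationships only, using a breadth-first walk; the part names and the relationships mapping are hypothetical stand-ins, not the hierarchy of FIGURES 4a-4c.

```python
from collections import deque


def parts_needed_for_reuse(start_part, outbound):
    """Breadth-first walk over outbound relationships starting at start_part.

    outbound maps a part name to the list of parts it has relationships to.
    Returns the set of parts that must travel with start_part.
    """
    seen = {start_part}
    queue = deque([start_part])
    while queue:
        part = queue.popleft()
        for target in outbound.get(part, []):
            if target not in seen:  # guard against shared parts and cycles
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical relationship map for a small document.
relationships = {
    "document": ["styles", "header", "footnotes", "fontTable"],
    "header": ["image"],
    "styles": ["fontTable"],
}

header_bundle = parts_needed_for_reuse("header", relationships)
document_bundle = parts_needed_for_reuse("document", relationships)
```

  • Copying only the header therefore brings its image along, while a part shared by several relationships, such as the font table here, is collected once.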
  • FIGURE 4b other modular parts are illustrated that are associated with start part or word document part 420.
  • the connections between the parts may be determined by locating the same reference number and/or title.
  • FIGURE 4b includes styles part 423, word lists part 422, settings part 426, comments part 424, and a data store item part 425 associated with an item properties part 427.
  • the modular parts also include a header part 430, a footer part 432, footnotes part 434, endnotes part 435, image part 437, and a fontTable part 438.
  • the modular parts that are shared in more than one relationship are typically only written to memory once.
  • Some modular parts may be global, and thus, can be used anywhere in the file format. In contrast, some modular parts are non- global and thus, can only be shared on a limited basis.
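To illustrate how associated parts might be identified when a modular part is reused, here is a small sketch that traverses outbound and inbound relationships. The triple representation, the part names, and both helper functions are assumptions made for this example; the source does not specify a traversal algorithm.

```python
from collections import deque

# Hypothetical (source part, relationship type, target part) triples.
RELATIONSHIPS = [
    ("document", "header", "header1"),
    ("header1", "image", "image1"),
    ("document", "styles", "styles"),
    ("document", "comments", "comments"),
    ("comments", "image", "image1"),
]


def outbound_closure(relationships, start):
    """Collect every part reachable from `start` by following outbound relationships."""
    reachable = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source, _rel_type, target in relationships:
            if source == current and target != start and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return reachable


def inbound_sources(relationships, part):
    """Collect the parts that point directly at `part`."""
    return {source for source, _rel_type, target in relationships if target == part}


# Reusing the header in another document would also require its image.
print(outbound_closure(RELATIONSHIPS, "header1"))
# The image is shared: both the header and the comments part refer to it.
print(inbound_sources(RELATIONSHIPS, "image1"))
```

In this sketch the shared image appears once in the part set even though two relationships target it, which matches the statement that shared parts are typically written only once.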
  • FIGURES 5-6 are illustrative routines performed in structuring and representing word processor documents in a modular content framework.
  • With respect to the routines presented herein, it should be appreciated that the logical operations of various embodiments of the present invention are implemented (1) as a sequence of computer-implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance requirements of the computing system. Accordingly, the logical operations making up the embodiments described herein are referred to variously as operations, structural devices, acts, or modules. These operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and in any combination thereof.
  • The routine 500 begins at operation 510, where an application program, such as a word processor application, writes a document part.
  • The routine 500 continues from operation 510 to operation 520, where the application program queries the document for relationship types to be associated with modular parts that are logically separate from the document part but associated with it by one or more relationships.
  • The application writes modular parts of the file format separate from the document part.
  • Writing modular parts may include opening and storing referenced modular parts. Additional details regarding writing modular parts are described below with respect to FIGURE 6.
  • Each modular part is capable of being interrogated separately without other modular parts being interrogated and understood. According to one embodiment, any modular part to be shared between other modular parts is written only once.
  • The routine 500 then continues to operation 540.
  • The application 10 establishes relationships between newly written and previously written modular parts. The routine 500 then terminates at the end operation.
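The flow of routine 500 can be sketched roughly as below. The `write_document` function, the dictionary-based package, and the shape of the `references` input are hypothetical and chosen only to make the order of steps concrete; the source does not prescribe any of them.

```python
def write_document(body, references):
    """Write a document part, then its modular parts, then the relationships.

    `references` is a list of (relationship_type, part_name, data) tuples;
    this input shape is an assumption made for the sketch.
    """
    package = {"parts": {}, "relationships": []}

    # Operation 510: write the document part.
    package["parts"]["document"] = body

    # Operation 520: query which relationship types the document needs.
    rel_types = sorted({rel_type for rel_type, _name, _data in references})

    # Next step: write each modular part separately from the document part.
    # A part shared by several references is written only once.
    for rel_type in rel_types:
        for ref_type, name, data in references:
            if ref_type == rel_type and name not in package["parts"]:
                package["parts"][name] = data

    # Operation 540: establish relationships between the written parts.
    for rel_type, name, _data in references:
        link = ("document", rel_type, name)
        if link not in package["relationships"]:
            package["relationships"].append(link)

    return package


pkg = write_document(
    "<body/>",
    [
        ("styles", "styles", "<styles/>"),
        ("image", "image1", "PNG-bytes"),
        ("image", "image1", "PNG-bytes"),  # same image referenced twice
    ],
)
print(sorted(pkg["parts"]))
```

Relationships are recorded only after the parts exist, mirroring the ordering in the text, where relationships are established between newly written and previously written parts as the final step.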
  • FIGURE 6 illustrates a process for writing modular parts of a document.
  • An application examines data in the word processor application.
  • The application opens a referenced modular part.
  • The routine 600 then continues to detect operation 620, where a determination is made as to whether the data has been written to the open modular part.
  • When the data has not yet been written, the routine 600 continues from detect operation 620 to operation 630, where the word processor application writes a modular part including the data examined.
  • The application stores the modular part including the new data.
  • The routine 600 then continues to detect operation 640, described below.
  • When, at detect operation 620, the data examined has already been written to a modular part, the routine 600 continues from detect operation 620 to detect operation 640. At detect operation 640, a determination is made as to whether all the data has been examined. If all the data has been examined, the routine 600 returns control to other operations at return operation 660. When there is still more data to examine, the routine 600 continues from detect operation 640 to operation 650, where the word processor application points to other data. The routine 600 then returns to operation 610, described above.
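The loop in routine 600 can be sketched as follows. This is an illustrative reading of the flow, assuming each data item is a (part name, data) pair and that "already written" means a part with that name is already in the store; `write_modular_parts` and its return value are hypothetical.

```python
def write_modular_parts(data_items, store=None):
    """Write each data item to its modular part unless it was already written.

    Returns the part store and the number of parts actually written.
    """
    store = {} if store is None else store
    writes = 0
    index = 0

    # Detect operation 640: keep going until all data has been examined.
    while index < len(data_items):
        # Examine the current data item and open the part it refers to.
        part_name, data = data_items[index]

        # Detect operation 620: has this data already been written?
        if part_name not in store:
            # Operation 630: write and store the modular part with the new data.
            store[part_name] = data
            writes += 1

        # Operation 650: point to the next piece of data.
        index += 1

    # Return operation 660: hand control back to the caller.
    return store, writes


store, writes = write_modular_parts(
    [("styles", "<styles/>"), ("image1", "PNG-bytes"), ("image1", "PNG-bytes")]
)
print(writes)
```

Because the second reference to `image1` is detected as already written, the shared part is stored once, which is the behavior the text describes for parts shared between other modular parts.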

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

According to the invention, an open file format is used to structure the features and data in a document associated with a word processing application. The file format simplifies the way a word processing application organizes the features and data of a document, and presents a logical model that is easily accessible. The file format is made up of a collection of modular parts stored in memory. The content of the modular parts is stored in XML, which is based on ASCII. The XML schema provides a framework for defining how the modular parts interact. The content allows tools to interrogate a word processing document in order to examine and use the content, and to ensure that the file is written correctly. Each modular part can be interrogated separately, regardless of whether the application that created the document is running. Data can also be changed, added to, or deleted from each of the modular parts.
PCT/US2006/021825 2005-06-03 2006-06-05 Structuration de donnees pour des documents de traitement de textes WO2006133136A2 (fr)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US68726105P 2005-06-03 2005-06-03
US60/687,261 2005-06-03
US71680505P 2005-09-13 2005-09-13
US60/716,805 2005-09-13
US11/398,339 US7617451B2 (en) 2004-12-20 2006-04-05 Structuring data for word processing documents
US11/398,339 2006-04-05

Publications (2)

Publication Number Publication Date
WO2006133136A2 true WO2006133136A2 (fr) 2006-12-14
WO2006133136A3 WO2006133136A3 (fr) 2009-04-16

Family

ID=37499028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/021825 WO2006133136A2 (fr) 2005-06-03 2006-06-05 Structuration de donnees pour des documents de traitement de textes

Country Status (1)

Country Link
WO (1) WO2006133136A2 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030237048A1 (en) * 2002-06-24 2003-12-25 Microsoft Corporation Word processor for freestyle editing of well-formed XML documents
US20040205539A1 (en) * 2001-09-07 2004-10-14 Mak Mingchi Stephen Method and apparatus for iterative merging of documents
US20050108278A1 (en) * 2002-06-28 2005-05-19 Microsoft Corporation Word-processing document stored in a single XML file that may be manipulated by applications that understand XML

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VALORIS.: 'Comparative Assessment of Open Documents Formats Market Overview' SPECIFIC AGREEMENT, [Online] 06 April 2004, Retrieved from the Internet: <URL:http://ec.europa.eu/idabden/document/2387> [retrieved on 2004-07-03] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8924395B2 (en) 2010-10-06 2014-12-30 Planet Data Solutions System and method for indexing electronic discovery data
CN110073349A (zh) * 2016-12-15 2019-07-30 微软技术许可有限责任公司 考虑频率和格式化信息的词序建议
CN110073349B (zh) * 2016-12-15 2023-10-10 微软技术许可有限责任公司 考虑频率和格式化信息的词序建议
CN113065154A (zh) * 2021-03-19 2021-07-02 深信服科技股份有限公司 一种文档检测方法、装置、设备和存储介质
CN113065154B (zh) * 2021-03-19 2023-12-29 深信服科技股份有限公司 一种文档检测方法、装置、设备和存储介质

Also Published As

Publication number Publication date
WO2006133136A3 (fr) 2009-04-16

Similar Documents

Publication Publication Date Title
US7617451B2 (en) Structuring data for word processing documents
US20070022128A1 (en) Structuring data for spreadsheet documents
US20060277452A1 (en) Structuring data for presentation documents
US7617444B2 (en) File formats, methods, and computer program products for representing workbooks
AU2006200047B2 (en) Data store for software application documents
EP1672526A2 (fr) Formats de fichiers, procédés et produits de programme informatique pour représenter des documents
US20050289446A1 (en) System and method for management of document cross-reference links
KR101311123B1 (ko) 문서의 xml 데이터 저장소에 대한 프로그램가능성
US20070174307A1 (en) Graphic object themes
WO2007030586A1 (fr) Programmabilite de magasins de donnees xml destinee a des documents
US20060259854A1 (en) Structuring an electronic document for efficient identification and use of document parts
JP2002099428A (ja) ハッシュコンパクトxmlパーサ
US7130862B2 (en) Methods, systems and computer program prodcuts for validation of XML instance documents using Java classloaders
US7865481B2 (en) Changing documents to include changes made to schemas
WO2008130768A1 (fr) Description des relations d'entité attendues dans un modèle
US20070061351A1 (en) Shape object text
US20080065678A1 (en) Dynamic schema assembly to accommodate application-specific metadata
US20110265058A1 (en) Embeddable project data
US20110078552A1 (en) Transclusion Process
WO2006133136A2 (fr) Structuration de donnees pour des documents de traitement de textes
US20080263070A1 (en) Common drawing objects
US9965453B2 (en) Document transformation
Headquarters XMP–Extensible Metadata Platform

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06772218

Country of ref document: EP

Kind code of ref document: A2