US20070038930A1 - Method and system for an architecture for the processing of structured documents - Google Patents


Info

Publication number
US20070038930A1
Authority
US
United States
Prior art keywords
document
circuit
output
instructions
transformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/413,051
Inventor
John Derrick
Richard Trujillo
Daniel Cermak
Bryan Dobbs
Howard Liu
Rakesh Bhakta
Udi Kalekin
Russell Davoli
Clifford Hall
Avinash Palaniswamy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Priority to US11/413,051
Assigned to INTEL CORPORATION. Assignment of assignors interest (see document for details). Assignors: CONFORMATIVE SYSTEMS, INC
Publication of US20070038930A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/154 Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/221 Parsing markup language streams

Definitions

  • the invention relates in general to methods and systems for processing structured documents, and more particularly, to the design and implementation of efficient architectures for the processing, transformation or rendering of structured documents.
  • Structured documents may be loosely defined as any type of document that adheres to a set of rules. Because the structured document conforms to a set of rules it enables the cross-platform distribution of data, as an application or platform may process or render a structured document based on the set of rules, no matter the application that originally created the structured document.
  • The use of structured documents to facilitate the cross-platform distribution of data is not without its own set of problems, however.
  • the structured document does not itself define how the data it contains is to be rendered, for example for presentation to a user. Exacerbating the problem is the size of many of these structured documents.
  • these structured documents may contain a great deal of meta-data, and thus may be larger than similar proprietary documents, in some cases up to twenty times larger or more.
  • instructions may be provided for how to transform or render a particular structured document.
  • one mechanism implemented as a means to facilitate processing XML is the extensible stylesheet language (XSL) and stylesheets written using XSL.
  • Stylesheets may be written to transform XML documents from one markup definition (or “vocabulary”) defined within XML to another vocabulary, from XML markup to another structured or unstructured document form (such as plain text, word processor, spreadsheet, database, pdf, HTML, etc.), or from another structured or unstructured document form to XML markup.
  • stylesheets may be used to transform a document's structure from its original form to a form expected by a given user (output form).
  • structured documents are transformed or rendered with one or more software applications.
  • the use of software applications to transform or render these structured documents may be prohibitively inefficient.
  • FIG. 1 depicts an embodiment of an architecture for the implementation of web services.
  • FIG. 2 depicts one embodiment of the processing of structured documents using a document processor.
  • FIG. 3 depicts one embodiment of an architecture for a device for the processing of structured documents.
  • FIG. 4 depicts one embodiment of an architecture for the processing of structured documents utilizing an embodiment of the device depicted in FIG. 3 .
  • FIG. 1 depicts an embodiment of one such architecture for implementing a web service.
  • web services provide a standard means of interoperating between different software applications running on a variety of platforms and/or frameworks.
  • a web service provider 110 may provide a set of web services 112 .
  • Each web service 112 may have a described interface, such that a requestor may interact with the web service 112 according to that interface.
  • a user at a remote machine 120 may wish to use a web service 112 provided by web service provider 110 .
  • the user may use a requester agent to communicate message 130 to a service agent associated with the desired web service 112 , where the message is in a format prescribed by the definition of the interface of the desired web service 112 .
  • the definition of the interface describes the message formats, data types, transport protocols, etc. that are to be used between a requester agent and a provider agent.
  • the message 130 may comprise data to be operated on by the requested web service 112 . More particularly, message 130 may comprise a structured document and instructions for transforming the structured document.
  • message 130 may be a SOAP (e.g. Simple Object Access Protocol) message comprising an eXtensible Markup Language (XML) document and an XSL Transformation (XSLT) stylesheet associated with the XML document.
  • transformation instructions e.g. a DTD, schema, or stylesheet
  • the transformation instructions may be extracted from the document before being utilized in any subsequent method or process.
  • the provider agent associated with a particular web service 112 may receive message 130 ; web service 112 may process the structured document of message 130 according to the instructions for transforming the structured document included in message 130 ; and the result 140 of the transformation may be returned to the requester agent.
  • many structured documents may be sent to a particular web service 112 with one set of transformation instructions, so that each of these documents may be transformed according to the identical set of instructions.
  • one structured document may be sent to a particular web service 112 with multiple sets of transformation instructions to be applied to the structured document.
  • For web services 112 , it may be highly desirable to process these structured documents as efficiently as possible, so that web services 112 may be used on many data sets and large data sets without creating a bottleneck during the processing of the structured documents, and so that the processing resources of web service provider 110 may be effectively utilized.
  • Embodiments of the present invention may allow a transformation to be performed on a structured document according to transformation instructions.
  • embodiments of the architecture may comprise logical components including a parser, a pattern expression processor, a transformation engine and an output generator, one or more of which may be implemented in hardware circuitry, for example a hardware processing device such as an Application Specific Integrated Circuit (ASIC) which comprises all the above mentioned logical components
  • embodiments of the invention may compile the transformation instructions to create instruction code and a set of data structures.
  • the parser parses the structured document associated with the transformation instructions to generate structures representative of the structured document.
  • the pattern expression processor (PEP) identifies data in the structured document corresponding to definitions in the transformation instructions.
  • the transformation engine transforms the parsed document or identified data according to the transformation instructions and the output generator assembles this transformed data into an output document.
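The four logical components described above can be pictured as stages of a pipeline. What follows is a minimal software sketch in Python, not the hardware design disclosed here. The function names (`parse`, `match_patterns`, `transform`, `generate_output`) and the toy rule format in `RULES` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

def parse(document_text):
    """Parser stage: build a structure representative of the document."""
    return ET.fromstring(document_text)

def match_patterns(root, rules):
    """Pattern expression stage: find nodes whose tag matches a rule."""
    return [(node, rules[node.tag]) for node in root.iter() if node.tag in rules]

def transform(matches):
    """Transformation stage: apply each rule's action to its matched node."""
    return [action(node) for node, action in matches]

def generate_output(results):
    """Output stage: assemble the transformed pieces into one document."""
    return "\n".join(results)

# Hypothetical stand-in for compiled transformation instructions:
# element name -> action producing output text.
RULES = {
    "title": lambda n: "# " + (n.text or ""),
    "para": lambda n: n.text or "",
}

def process(document_text, rules=RULES):
    root = parse(document_text)
    return generate_output(transform(match_patterns(root, rules)))

if __name__ == "__main__":
    print(process("<doc><title>Hello</title><para>World</para></doc>"))
```

Keeping each stage behind its own function mirrors the separation of components in the text, so any one stage could in principle be swapped for a different implementation without touching the others.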
  • the transformation instructions may be analyzed to determine which of the transformation instructions may be executed substantially simultaneously, or in parallel, to speed the transformation of a structured document (it will be understood that, for purposes of this disclosure, the occurrence of two events substantially simultaneously indicates that each of the two events may at least partially occur before the completion of the other event).
  • similar content in a structured document may be identified such that any transformations on this content may also be done substantially in parallel.
  • multiple sets of instruction code corresponding to various jobs may also be executed in parallel.
  • the compiler may be implemented in software and the logical components for the architecture implemented in hardware.
  • transformation instructions e.g. stylesheets and/or schemas, etc.
  • a given stylesheet may be applied to multiple documents before any changes to a stylesheet are made (e.g. to an updated stylesheet or to apply a different stylesheet altogether).
  • capturing the relatively invariant information from the transformation instructions in data structures that may be efficiently accessed by dedicated, custom hardware e.g. logical components
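The benefit of compiling once and reusing the result across many documents can be sketched with a simple cache. This is an illustrative assumption about how a software host might avoid recompilation, not a description of the disclosed compiler. `compile_instructions` is a hypothetical stand-in whose "compiled form" is just a tuple of non-empty lines.

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def compile_instructions(stylesheet_text):
    """Stand-in for an expensive compile step; cached per stylesheet text."""
    return tuple(line.strip() for line in stylesheet_text.splitlines() if line.strip())

def process_many(stylesheet_text, documents):
    """Apply one set of (compiled) instructions to many documents."""
    results = []
    for doc in documents:
        compiled = compile_instructions(stylesheet_text)  # cache hit after the first call
        results.append((doc, len(compiled)))
    return results
```

With three documents and one stylesheet, `compile_instructions.cache_info()` would report one miss and two hits, which is the reuse pattern the text relies on.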
  • having compilation of transformation instructions performed in software provides the flexibility to accommodate different formats for transformation instructions and to implement changes in the language specifications for these transformation instructions without having to change the custom hardware.
  • XSLT, XPath, and XML schema may evolve and new features may be added to these languages in the future.
  • the compiler may be adapted to handle these new features.
  • the compiler may be implemented in hardware; one or more of the logical components may be implemented in software; or both the logical components and compiler may be implemented in a combination of hardware and software.
  • a structured document may be received at a web service 112 from a variety of sources such as a file server, database, internet connection, etc. Additionally, a set of transformation instructions, for example an XSLT stylesheet, may also be received. Document processor 210 may apply the transformation instructions to the structured document to generate an output document which may be returned to the requesting web service 112 , which may, in turn, pass the output document to the requestor.
  • compiler 220 which may comprise software (i.e. a plurality of instructions) executed on one or more processors (e.g. distinct from document processor 210 ) may be used to compile the transformation instructions to generate data structures and instruction code in memory 270 for use by document processor 210 .
  • Document processor 210 may be one or more ASICs operable to utilize the data structures and instruction code generated by compiler 220 to generate an output document.
  • FIG. 3 depicts a block diagram of one embodiment of an architecture for a document processor operable to produce an output document from a structured document.
  • Document processor 210 comprises Host Interface Unit (HIU) 310 , Parser 320 , PEP 330 , Transformation Engine (TE) 340 , Output Generator (OG) 350 , each of which is coupled to memory interface 360 , to Local Command Bus (LCB) 380 and, in some embodiments, to one another through signal lines or shared memory 270 (e.g. a source unit may write information to be communicated to a destination unit to the shared memory and the destination unit may read the information from the shared memory), or both.
  • Shared memory 270 may be any type of storage known in the art, such as RAM, cache memory, hard-disk drives, tape devices, etc.
  • HIU 310 may serve to couple document processor 210 to one or more host processors (not shown). This coupling may be accomplished, for example, using a Peripheral Component Interconnect eXtended (PCI-X) bus. HIU 310 also may provide an Applications Programming Interface (API) through which document processor 210 can receive jobs. Additionally, HIU 310 may interface with LCB 380 such that various tasks associated with these jobs may be communicated to components of document processor 210 .
  • these jobs may comprise context data, including a structured document and the data structures and instruction code generated from the transformation instructions by the compiler.
  • the API may allow the context data to be passed directly to HIU 310 , or, in other embodiments, may allow references to one or more locations in shared memory 270 where context data may be located to be provided to HIU 310 .
  • HIU 310 may maintain a table of the various jobs received through this API and direct the processing of these jobs by document processor 210 .
  • these jobs may be substantially simultaneously processed (e.g. processed in parallel) by document processor 210 , allowing document processor 210 to be more efficiently utilized (e.g. higher throughput of jobs and lower latency).
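The job table described above can be sketched as a small data structure. The class and field names below (`Job`, `JobTable`, `document_ref`, `code_ref`) and the status strings are hypothetical; the source says only that a table of jobs is maintained and that context data may be passed directly or by reference to shared memory.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Job:
    job_id: int
    document_ref: str      # hypothetical reference to the structured document
    code_ref: str          # hypothetical reference to compiled instruction code
    status: str = "queued"

@dataclass
class JobTable:
    _ids: count = field(default_factory=lambda: count(1))
    jobs: dict = field(default_factory=dict)

    def submit(self, document_ref, code_ref):
        """Record a new job and return its identifier."""
        job = Job(next(self._ids), document_ref, code_ref)
        self.jobs[job.job_id] = job
        return job.job_id

    def start(self, job_id):
        self.jobs[job_id].status = "running"

    def complete(self, job_id):
        self.jobs[job_id].status = "done"

    def in_flight(self):
        """Identifiers of jobs currently being processed, in ascending order."""
        return sorted(j.job_id for j in self.jobs.values() if j.status == "running")
```

Several jobs can be in the "running" state at once, which is the software analogue of the substantially simultaneous processing described in the text.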
  • Parser 320 may receive and parse a structured document, identifying data in the structured document for PEP 330 and generating data structures comprising data from the structured document by, for example, creating data structures in shared memory 270 for use by TE 340 or OG 350 .
  • An exemplary embodiment of parser 320 is illustrated in Appendix A.
  • PEP 330 receives data from parser 320 identifying data of the structured document being processed and compares data identified by the parser 320 against expressions identified in the transformation instructions. PEP 330 may also create one or more data structures in shared memory 270 , where the data structures comprise a list of data in the structured document which match expressions. An exemplary embodiment of PEP 330 is illustrated in Appendix A.
  • Transformation engine 340 may access the data structures built by parser 320 and PEP 330 and execute instruction code generated by compiler 220 and stored in memory 270 to generate results for the output document.
  • one or more instructions of the instruction code generated by compiler 220 may be operable to be independently executed (e.g. execution of one instruction does not depend directly on the result of the output of the execution of another instruction), and thus execution of the instruction code by transformation engine 340 may occur in substantially any order.
  • An exemplary embodiment of a transformation engine is illustrated in Appendix A.
  • Output generator 350 may assemble the results generated by transformation engine 340 in an order specified by the transformation instructions or corresponding to the structured document and provide the output document to the initiating web service 112 through HIU 310 , for example, by signaling the web service 112 or a host processor that the job is complete and providing a reference to a location in memory 270 where an output document exists.
  • An exemplary embodiment of an output generator is illustrated in Appendix A.
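The interplay between out-of-order execution and ordered assembly can be sketched in software. The example below uses a thread pool as a stand-in for independent hardware execution; `run_independent` and `assemble` are hypothetical names, and the string methods used as "instructions" are placeholders for compiled instruction code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_independent(instructions, nodes, workers=4):
    """Execute each instruction on its node in any order, then return the
    results in the order the output document needs."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(func, node): index
            for index, (func, node) in enumerate(zip(instructions, nodes))
        }
        # Completion order is arbitrary; the stored index restores output order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(results))]

def assemble(fragments):
    """Output stage: concatenate fragments in their final order."""
    return "".join(fragments)

if __name__ == "__main__":
    instructions = [str.upper, str.lower, str.title]
    nodes = ["alpha ", "BETA ", "gamma"]
    print(assemble(run_independent(instructions, nodes)))
```

Tagging each unit of work with its output position is one simple way to decouple execution order from document order; other schemes are possible.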
  • While embodiments of the present invention may be applied with respect to almost any structured document (e.g. a document having a defined structure that can be used to interpret the content), whether the content is highly structured (such as an XML document, HTML document, .pdf document, word processing document, database, etc.) or loosely structured (such as a plain text document whose structure may be, e.g., a stream of characters), and associated transformation instructions (a term used here to refer generally to a file which may be used with reference to a structured document, e.g. document type definitions (.dtd), schema such as .xsd files, XSL transformation files, etc.) for the structured document, it may be helpful to illustrate various embodiments of the present invention with respect to a particular example of a structured document and transformation instructions.
  • an XML document is a structured document which has a hierarchical tree structure, where the root of the tree identifies the document as a whole and each other node in the document is a descendant of the root.
  • Various elements, attributes, and document content form the nodes of the tree.
  • the elements define the structure of the content that the elements contain. Each element has an element name, and the element delimits content using a start tag and an end tag that each include the element name.
  • An element may have other elements as sub-elements, which may further define the structure of the content. Additionally, elements may include attributes (included in the start tag, following the element name), which are name/value pairs that provide further information about the element or the structure of the element content.
  • XML documents may also include processing instructions that are to be passed to the application reading the XML document, comments, etc.
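The tree structure of elements, attributes and content described above can be inspected with Python's standard `xml.etree.ElementTree` module. The sample document in `SOURCE` and the helper `describe` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root element with two sub-elements.
SOURCE = '<order id="42"><item sku="A1">Widget</item><item sku="B2">Gadget</item></order>'

def describe(element, depth=0):
    """Return one line per element: name, attributes, and text content."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {dict(element.attrib)} {text!r}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

if __name__ == "__main__":
    print("\n".join(describe(ET.fromstring(SOURCE))))
```

Each output line corresponds to one element node; the indentation reflects its depth below the root.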
  • An XSLT stylesheet is a set of transformation instructions which may be viewed as a set of templates.
  • Each template may include: (i) an expression that identifies nodes in a document's tree structure; and (ii) a body that specifies a corresponding portion of an output document's structure for nodes of the source document identified by the expression.
  • Applying a stylesheet to a source document may comprise attempting to find a matching template for one or more nodes in the source document, and instantiating the structures corresponding to the body of the matching template in an output document.
  • the body of a template may include one or more of: (i) literal content to be instantiated in the output document; (ii) instructions for selection of content from the matching nodes to be copied into the output document; and (iii) statements that are to be evaluated, with the result of the statements being instantiated in the output document.
  • the content to be instantiated and the statements to be evaluated may be referred to as “actions” to be performed on the nodes that match the template.
  • the body of a template may include one or more “apply templates” statements, which include an expression for selecting one or more nodes and causing the templates in the stylesheet to be applied to the selected nodes, thus effectively nesting the templates. If a match to the apply templates statement is found, the resulting template is instantiated within the instantiation of the template that includes the apply templates statement.
  • Other statements in the body of the template may also include expressions to be matched against nodes (and the statements may be evaluated on the matching nodes).
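The template mechanism, including nesting through "apply templates", can be sketched with a toy processor. This is not an XSLT implementation: templates here match on element name only, unmatched nodes are silently dropped (real XSLT defines built-in rules for them), and the names `apply_templates`, `TEMPLATES` and `transform` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def apply_templates(nodes, templates):
    """Instantiate the matching template body for each node, in order."""
    out = []
    for node in nodes:
        body = templates.get(node.tag)
        if body is not None:
            # The body may call `apply` on selected nodes, nesting that result
            # inside the instantiation of the enclosing template.
            out.append(body(node, lambda selected: apply_templates(selected, templates)))
    return "".join(out)

# Hypothetical stylesheet: element name -> body(node, apply) returning output text.
TEMPLATES = {
    "list": lambda n, apply: "<ul>" + apply(list(n)) + "</ul>",
    "entry": lambda n, apply: "<li>" + (n.text or "") + "</li>",
}

def transform(source):
    return apply_templates([ET.fromstring(source)], TEMPLATES)

if __name__ == "__main__":
    print(transform("<list><entry>a</entry><entry>b</entry></list>"))
```

The `list` template contains literal content (`<ul>`, `</ul>`) and an apply-templates call over its children, while the `entry` template copies content from the matched node.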
  • the expressions used in a stylesheet may generally comprise node identifiers and/or values of nodes, along with operators on the node identifiers to specify parent/child (or ancestor/descendant) relationships among the node identifiers and/or values.
  • Expressions may also include predicates, which may be extra condition(s) for matching a node.
  • a predicate is an expression that is evaluated with the associated node as the context node (defined below), where the result of the expression is either true (and the node may match the expression node) or false (and the node does not match the expression).
  • an expression may be viewed as a tree of nodes to be matched against a document's tree.
  • a given document node may satisfy an expression if the given document node is selected via evaluation of the expression. That is, the expression node identifiers in the expression match the given document node's identifier or document node identifiers having the same relationship to the given document node as specified in the expression, and any values used in the expression are equal to corresponding values related to the given document node.
  • a document node may also be referred to as a “matching node” for a given expression if the node satisfies the given expression.
  • a node may be referred to as an “expression node” if the node is part of an expression tree, and a node may be referred to as a “document node” if the node is part of the document being processed.
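Matching an expression against a document tree, including a predicate, can be sketched as follows. The expression is represented as a chain of parent/child steps, each a hypothetical `(name, predicate)` pair; this representation and the function name `matching_nodes` are assumptions, not the expression tree format used by the disclosed hardware.

```python
import xml.etree.ElementTree as ET

def matching_nodes(root, steps):
    """Return document nodes satisfying a parent/child chain of steps.

    Each step is (name, predicate): name "*" matches any element, and
    predicate is None or a function of the candidate node returning a bool.
    """
    def step_ok(node, step):
        name, predicate = step
        return (name == "*" or node.tag == name) and (predicate is None or predicate(node))

    current = [root] if step_ok(root, steps[0]) else []
    for step in steps[1:]:
        current = [child for node in current for child in node if step_ok(child, step)]
    return current

if __name__ == "__main__":
    doc = ET.fromstring(
        '<library><book year="1999"><title>A</title></book>'
        '<book year="2005"><title>B</title></book></library>'
    )
    steps = [
        ("library", None),
        ("book", lambda n: int(n.get("year")) > 2000),  # predicate on the context node
        ("title", None),
    ]
    print([n.text for n in matching_nodes(doc, steps)])
```

The predicate is evaluated with the candidate `book` element as its context node, so only titles under books that pass the test become matching nodes.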
  • a node identifier may comprise a name (e.g. element name, attribute name, etc.) or may comprise an expression construct that identifies a node by type (e.g. a node test expression may match any node, or a text test expression may match any text node).
  • a name may belong to a specific namespace.
  • the node identifier may be a name associated with a namespace.
  • the namespace provides a method of qualifying element and attribute names by associating them with namespace names.
  • the node identifier may be the qualified name (the optional namespace prefix, followed by a colon, followed by the name).
  • a name, as used herein (e.g. element name, attribute name, etc.) may include a qualified name.
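Splitting a qualified name into its optional prefix and local name, and resolving the prefix against a set of namespace declarations, might look like the sketch below. The helper names `split_qname` and `expand` are hypothetical, and the handling of unprefixed names is deliberately simplified.

```python
def split_qname(qname):
    """Split 'prefix:local' into (prefix, local); prefix is None if absent."""
    prefix, sep, local = qname.partition(":")
    return (prefix, local) if sep else (None, qname)

def expand(qname, namespaces):
    """Resolve a qualified name to (namespace name, local name).

    `namespaces` maps prefixes to namespace names; an unprefixed name
    resolves through the key None if present (a simplification).
    """
    prefix, local = split_qname(qname)
    return (namespaces.get(prefix), local)

if __name__ == "__main__":
    ns = {"xsl": "http://www.w3.org/1999/XSL/Transform"}
    print(expand("xsl:template", ns))
    print(expand("para", ns))
```

Two names with different prefixes compare equal after expansion when their prefixes map to the same namespace name, which is why comparison is usually done on the expanded form.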
  • transformation instructions may comprise any specification for transforming a source document to an output document, which may encompass, for example, statements intended to identify data of the source document or statements for how to transform data of the source document.
  • the source and output documents may be in the same language (e.g. the source and output documents may be different XML vocabularies), or may differ (e.g. XML to pdf, etc.).
  • In FIG. 4 , an example application of one embodiment of the present invention to an XML document and an XSLT stylesheet is illustrated. It is noted that, while the description herein may include examples in which transformation instructions are applied to a single source document, other examples may include applying multiple sets of transformation instructions to a source document (either concurrently or serially, as desired) or applying a set of transformation instructions to multiple source documents (either concurrently with context switching or serially, as desired).
  • an XML document and an associated XSL stylesheet may be received by web service 112 .
  • Web service 112 may invoke embodiments of the present invention to transform the received document according to the received stylesheet.
  • compiler 220 may be used to compile the XSL stylesheet to generate data structures and instruction code for use by document processor 210 .
  • Compiler 220 may assign serial numbers to node identifiers in the stylesheet so that expression evaluation may be performed by document processor 210 by comparing numbers, rather than node identifiers (which would involve character string comparisons).
  • Compiler 220 may also store a mapping of these node identifiers to serial numbers in one or more symbol tables 410 in memory 270 . Additionally, compiler 220 may extract the expressions from the stylesheet and generate expression tree data structures in memory 270 to be used by the document processor 210 for expression matching (e.g. one or more parse-time expression trees 420 comprising expression nodes). Still further, compiler 220 may generate an instruction table 430 in memory 270 with instructions to be executed for one or more matching expressions. The instructions in the instruction table 430 may be executable by document processor 210 that, when executed by the document processor 210 , may result in performing the actions defined when an expression associated with the instruction is matched. In some embodiments, the instructions may comprise the actions to be performed (i.e.
  • the compiler may also generate whitespace tables 440 defining how various types of whitespace in the source document are to be treated (e.g. preserved, stripped, etc.), an expression list table 450 , a template list table 460 and one or more DTD tables 462 to map entity references to values or specify default values for attributes.
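The idea of replacing node identifiers with serial numbers, so that later matching compares integers rather than character strings, can be sketched as a small symbol table. The class `SymbolTable`, the use of 0 for unknown names, and the slash-separated path syntax accepted by `compile_expression` are all assumptions for illustration.

```python
class SymbolTable:
    """Map node identifiers (names) to small integer serial numbers."""

    def __init__(self):
        self._by_name = {}

    def intern(self, name):
        # Assign the next serial number the first time a name is seen.
        return self._by_name.setdefault(name, len(self._by_name) + 1)

    def lookup(self, name):
        # 0 is used here as a hypothetical "name not in stylesheet" marker.
        return self._by_name.get(name, 0)

def compile_expression(path, symbols):
    """Turn a path such as 'order/item' into a tuple of serial numbers."""
    return tuple(symbols.intern(step) for step in path.split("/"))

if __name__ == "__main__":
    symbols = SymbolTable()
    print(compile_expression("order/item", symbols))
    print(compile_expression("order/item/price", symbols))
    print(symbols.lookup("unknown"))
```

Because the same name always maps to the same number, a parser that shares the table can emit numbers for document nodes and leave string comparison out of the matching step entirely.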
  • Parser 320 receives the structured document and accesses the symbol tables 410 , whitespace tables 440 , or DTD tables 462 in memory 270 to parse the structured document, identify document nodes, and generate events (e.g. to identify document nodes parsed from the document) to PEP 330 . More particularly, parser 320 converts node identifiers in the source document to corresponding serial numbers in the symbol tables 410 , and transmits these serial numbers as part of the events to the PEP 330 . Additionally, parser 320 may generate a parsed document tree 470 representing the structure of the source document in memory.
  • Nodes of the parsed document tree may reference corresponding values stored in one or more parsed content tables 472 created in memory by parser 320 .
  • PEP 330 receives events from the parser 320 and compares identified document nodes (e.g. based on their serial numbers) against parse-time expression tree(s) 420 in memory 270 . Matching document nodes are identified and recorded in template or expression match lists 480 in memory 270 .
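The event flow from parser to PEP can be sketched as a streaming matcher. In this software approximation the parser emits start and end events carrying serial numbers, and the matcher keeps a stack of open nodes and records a match whenever the tail of the stack equals a compiled expression. The names `parse_events` and `match_events`, the dictionaries used as symbol table and expression set, and the numbering of nodes in document order are assumptions.

```python
import io
import xml.etree.ElementTree as ET

def parse_events(xml_text, symbols):
    """Yield ('start'|'end', serial) events; unknown names map to 0."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for kind, element in ET.iterparse(source, events=("start", "end")):
        yield kind, symbols.get(element.tag, 0)

def match_events(events, expressions):
    """Record (expression_id, node_number) for each node whose path of
    serial numbers ends with a compiled expression."""
    stack, matches, node_number = [], [], 0
    for kind, serial in events:
        if kind == "start":
            node_number += 1
            stack.append(serial)
            for expr_id, expr in expressions.items():
                if tuple(stack[-len(expr):]) == expr:
                    matches.append((expr_id, node_number))
        else:
            stack.pop()
    return matches

if __name__ == "__main__":
    symbols = {"order": 1, "item": 2, "price": 3}
    expressions = {"items": (1, 2), "prices": (2, 3)}
    xml_text = "<order><item><price>5</price></item><note/><item/></order>"
    print(match_events(parse_events(xml_text, symbols), expressions))
```

The resulting list plays the role of a match list: it says which expression matched which document node, without the matcher ever comparing element names as strings.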
  • Transformation engine 340 executes instructions from instruction table 430 .
  • transformation engine 340 may access the template or expression match lists 480 , the parsed document tree 470 , the parsed content tables 472 or the instruction table 430 in memory 270 .
  • These instructions may, in turn, be associated with one or more templates of a stylesheet.
  • Transformation engine 340 may execute the instructions on each of the document nodes that matches the expression associated with the template, for example to transform or format document nodes according to the template. Transformation engine 340 may request that the results of the execution of these instructions be stored in one or more output data structures 490 in memory 270 .
  • a set of output data structures 490 are created in memory 270 representing the structure of an output document, and content for the output document is placed in, or associated with, these output data structures 490 .
  • Output generator 350 may receive results from transformation engine 340 for storing in output data structures 490 in memory 270 . Output generator may access these output data structures 490 or data structures 410 , 420 , 450 , 460 , 470 , 472 created by parser 320 or PEP 330 to assemble an output document. In some embodiments, output generator 350 may access a set of formatting parameters for the assembly of the output document. After the output document is assembled, or as the output document is being assembled, the output document (or portions thereof) may be returned to the proper web service 112 .

Abstract

Embodiments of systems, methods and apparatuses for an architecture for the processing of structured documents are disclosed. More specifically, embodiments of the architecture may comprise hardware circuitry operable to parse a structured document and transform the document according to a set of transformation instructions to produce an output document.

Description

    RELATED APPLICATIONS
  • This application claims a benefit of priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Nos. 60/675,349, by inventors Howard Tsoi, Daniel Cermak, Richard Trujillo, Trenton Grale, Robert Corley, Bryan Dobbs and Russell Davoli, entitled “Output Generator for Use with System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005; 60/675,347, by inventors Daniel Cermak, Howard Tsoi, John Derrick, Richard Trujillo, Udi Kalekin, Bryan Dobbs, Ying Tong, Brendon Cahoon and Jack Matheson, entitled “Transformation Engine for Use with System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005; 60/675,167, by inventors Richard Trujillo, Bryan Dobbs, Rakesh Bhakta, Howard Tsoi, Jack Randall, Howard Liu, Yongjian Zhou and Daniel Cermak, entitled “Parser for Use with System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005 and 60/675,115, by inventors John Derrick, Richard Trujillo, Daniel Cermak, Bryan Dobbs, Howard Liu, Rakesh Bhakta, Udi Kalekin, Russell Davoli, Clifford Hall and Avinash Palaniswamy, entitled “General Architecture for a System for Creation of Multiple, Hierarchical Documents”, filed on Apr. 27, 2005 the entire contents of which are hereby expressly incorporated by reference for all purposes.
  • TECHNICAL FIELD OF THE INVENTION
  • The invention relates in general to methods and systems for processing structured documents, and more particularly, to the design and implementation of efficient architectures for the processing, transformation or rendering of structured documents.
  • BACKGROUND OF THE INVENTION
  • Electronic data, entertainment and communications technologies are growing increasingly prevalent with each passing day. In the past, the vast majority of these electronic documents were in a proprietary format. In other words, a particular electronic document could only be processed or understood by the application that created that document. Up until relatively recently this has not been especially troublesome.
  • This situation became progressively more problematic with the advent of networking technologies, however. These networking technologies allowed electronic documents to be communicated between different and varying devices, and as these network technologies blossomed, so did users' desires to use these networked devices to share electronic data.
  • Much to the annoyance of many users, however, the proprietary formats of the majority of these electronic documents prevented them from being shared between different platforms: if a document was created by one type of platform it usually could not be processed, or rendered, by another type of platform.
  • To that end, data began to be placed in structured documents. Structured documents may be loosely defined as any type of document that adheres to a set of rules. Because the structured document conforms to a set of rules it enables the cross-platform distribution of data, as an application or platform may process or render a structured document based on the set of rules, no matter the application that originally created the structured document.
  • The use of structured documents to facilitate the cross-platform distribution of data is not without its own set of problems, however. In particular, in many cases the structured document does not itself define how the data it contains is to be rendered, for example for presentation to a user. Exacerbating the problem is the size of many of these structured documents. To facilitate the organization of data intended for generic consumption these structured documents may contain a great deal of meta-data, and thus may be larger than similar proprietary documents, in some cases up to twenty times larger or more.
  • In many cases, instructions may be provided for how to transform or render a particular structured document. For example, one mechanism implemented as a means to facilitate processing XML is the extensible stylesheet language (XSL) and stylesheets written using XSL. Stylesheets may be written to transform XML documents from one markup definition (or “vocabulary”) defined within XML to another vocabulary, from XML markup to another structured or unstructured document form (such as plain text, word processor, spreadsheet, database, pdf, HTML, etc.), or from another structured or unstructured document form to XML markup. Thus, stylesheets may be used to transform a document's structure from its original form to a form expected by a given user (output form).
  • Typically, structured documents are transformed or rendered with one or more software applications. However, as many definitions for these structured languages were designed and implemented without taking into account conciseness or efficiency of parsing and transformation, the use of software applications to transform or render these structured documents may be prohibitively inefficient.
  • Thus, as can be seen, there is a need for methods and systems for an architecture for the efficient processing of structured documents.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The drawings accompanying and forming part of this specification are included to depict certain aspects of the invention. A clearer impression of the invention, and of the components and operation of systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawings, wherein identical reference numerals designate the same components. Note that the features illustrated in the drawings are not necessarily drawn to scale.
  • FIG. 1 depicts an embodiment of an architecture for the implementation of web services.
  • FIG. 2 depicts one embodiment of the processing of structured documents using a document processor.
  • FIG. 3 depicts one embodiment of an architecture for a device for the processing of structured documents.
  • FIG. 4 depicts one embodiment of an architecture for the processing of structured documents utilizing an embodiment of the device depicted in FIG. 3.
  • DETAILED DESCRIPTION
  • Embodiments of the invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well known starting materials, processing techniques, components and equipment are omitted so as not to unnecessarily obscure the invention in detail. Skilled artisans should understand, however, that the detailed description and the specific examples, while disclosing preferred embodiments of the invention, are given by way of illustration only and not by way of limitation. Various substitutions, modifications, additions or rearrangements within the scope of the underlying inventive concept(s) will become apparent to those skilled in the art after reading this disclosure.
  • Reference is now made in detail to the exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts (elements).
  • Before describing embodiments of the present invention it may be useful to describe an exemplary architecture for a web service. Although web services are known in the art, a description of such an architecture may be helpful in better explaining the embodiments of the invention depicted herein.
  • FIG. 1 depicts an embodiment of one such architecture for implementing a web service. Typically, web services provide a standard means of interoperating between different software applications running on a variety of platforms and/or frameworks. A web service provider 110 may provide a set of web services 112. Each web service 112 may have a described interface, such that a requestor may interact with the web service 112 according to that interface.
  • For example, a user at a remote machine 120 may wish to use a web service 112 provided by web service provider 110. To that end the user may use a requester agent to communicate message 130 to a service agent associated with the desired web service 112, where the message is in a format prescribed by the definition of the interface of the desired web service 112. In many cases, the definition of the interface describes the message formats, data types, transport protocols, etc. that are to be used between a requester agent and a provider agent.
  • The message 130 may comprise data to be operated on by the requested web service 112. More particularly, message 130 may comprise a structured document and instructions for transforming the structured document. For example, message 130 may be a SOAP (e.g. Simple Object Access Protocol) message comprising an eXtensible Markup Language (XML) document and an XSL Transformation (XSLT) stylesheet associated with the XML document. It should be noted that, in some cases, transformation instructions (e.g. a DTD, schema, or stylesheet) may be embedded in a structured document, for example, either directly or as a pointer. In such cases the transformation instructions may be extracted from the document before being utilized in any subsequent method or process.
  • Thus, in some cases the provider agent associated with a particular web service 112 may receive message 130; web service 112 may process the structured document of message 130 according to the instructions for transforming the structured document included in message 130; and the result 140 of the transformation may be returned to the requester agent.
  • In some cases, many structured documents may be sent to a particular web service 112 with one set of transformation instructions, so that each of these documents may be transformed according to the identical set of instructions. Conversely, one structured document may be sent to a particular web service 112 with multiple sets of transformation instructions to be applied to the structured document.
  • Hence, as can be seen from this brief overview of the architecture for implementing web services 112, it may be highly desired to process these structured documents as efficiently as possible such that web services 112 may be used on many data sets and large data sets without creating a bottleneck during the processing of the structured documents and processing resources of web service provider 110 may be effectively utilized.
  • Attention is now directed to embodiments of systems, methods and apparatuses for a general architecture for the efficient transformation or processing of structured documents. Embodiments of the present invention may allow a transformation to be performed on a structured document according to transformation instructions. To this end, embodiments of the architecture may comprise logical components including a parser, a pattern expression processor, a transformation engine and an output generator, one or more of which may be implemented in hardware circuitry, for example a hardware processing device such as an Application Specific Integrated Circuit (ASIC) which comprises all the above mentioned logical components.
  • More particularly, embodiments of the invention may compile the transformation instructions to create instruction code and a set of data structures. The parser parses the structured document associated with the transformation instructions to generate structures representative of the structured document. The pattern expression processor (PEP) identifies data in the structured document corresponding to definitions in the transformation instructions. The transformation engine transforms the parsed document or identified data according to the transformation instructions and the output generator assembles this transformed data into an output document.
  • By compiling transformation instructions corresponding to the structured document, and processing the structured document accordingly, certain efficiency advantages may be attained by embodiments of the present invention.
  • Specifically, the transformation instructions may be analyzed to determine which of the transformation instructions may be executed substantially simultaneously, or in parallel, to speed the transformation of a structured document (it will be understood that for purposes of this disclosure that the occurrence of two events substantially simultaneously indicates that each of the two events may at least partially occur before the completion of the other event). Similarly, by analyzing a structured document before the transformation takes place, similar content in a structured document may be identified such that any transformations on this content may also be done substantially in parallel. Likewise, by producing instruction code from transformation instructions where the code is executable to transform at least a portion of a structured document, multiple sets of instruction code corresponding to various jobs, may also be executed in parallel.
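  • As a loose software analogy only (the disclosure describes hardware circuitry, and nothing below is taken from it), the following minimal Python sketch shows independent instructions running concurrently and their results being reassembled in document order. The instruction functions `upper` and `wrap`, the `tasks` list and the thread pool are all assumptions introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical "instructions": each is independent of the others, so the
# order in which they run does not affect the final result.
def upper(text):
    return text.upper()

def wrap(text):
    return f"<b>{text}</b>"

# Each task pairs an output position with an instruction and its input data.
tasks = [(0, upper, "alpha"), (1, wrap, "beta"), (2, upper, "gamma")]

def run(task):
    position, instruction, data = task
    return position, instruction(data)

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(run, task) for task in tasks]
    # Results are collected in completion order, which may differ per run.
    results = [future.result() for future in as_completed(futures)]

# Reassemble in document order regardless of completion order.
output = "".join(text for _, text in sorted(results))
print(output)
```

Tagging each result with its output position is what allows execution in substantially any order; the sort at the end plays the role the text assigns to assembling the output.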
  • Certain other advantages may also accrue to the architecture described according to embodiments of the present invention. As mentioned above, in one embodiment the compiler may be implemented in software and the logical components for the architecture implemented in hardware. In many cases, transformation instructions (e.g. stylesheets and/or schemas, etc.) may change relatively infrequently as compared to the number of documents being processed. For example, a given stylesheet may be applied to multiple documents before any changes to a stylesheet are made (e.g. to an updated stylesheet or to apply a different stylesheet altogether). Accordingly, capturing the relatively invariant information from the transformation instructions in data structures that may be efficiently accessed by dedicated, custom hardware (e.g. logical components) may provide a high performance solution to the transformation of structured documents. Additionally, having compilation of transformation instructions performed in software provides the flexibility to accommodate different formats for transformation instructions and to implement changes in the language specifications for these transformation instructions without having to change the custom hardware. For example, XSLT, XPath, and XML schema may evolve and new features added to these languages in the future. The compiler may be adapted to handle these new features.
  • While the advantages discussed above have been discussed with respect to a compiler implemented in software and logical components implemented in hardware, in other embodiments, the compiler may be implemented in hardware; one or more of the logical components may be implemented in software; or both the logical components and compiler may be implemented in a combination of hardware and software.
  • Turning to FIG. 2, a block diagram for the transformation of structured documents using embodiments of the present invention is depicted. A structured document may be received at a web service 112 from a variety of sources such as a file server, database, internet connection, etc. Additionally, a set of transformation instructions, for example an XSLT stylesheet, may also be received. Document processor 210 may apply the transformation instructions to the structured document to generate an output document which may be returned to the requesting web service 112, which may, in turn, pass the output document to the requestor.
  • In one embodiment, compiler 220, which may comprise software (i.e. a plurality of instructions) executed on one or more processors (e.g. distinct from document processor 210) may be used to compile the transformation instructions to generate data structures and instruction code in memory 270 for use by document processor 210. Document processor 210 may be one or more ASICs operable to utilize the data structures and instruction code generated by compiler 220 to generate an output document.
  • FIG. 3 depicts a block diagram of one embodiment of an architecture for a document processor operable to produce an output document from a structured document. Document processor 210 comprises Host Interface Unit (HIU) 310, Parser 320, PEP 330, Transformation Engine (TE) 340, Output Generator (OG) 350, each of which is coupled to memory interface 360, to Local Command Bus (LCB) 380 and, in some embodiments, to one another through signal lines or shared memory 270 (e.g. a source unit may write information to be communicated to a destination unit to the shared memory and the destination unit may read the information from the shared memory), or both. Shared memory 270 may be any type of storage known in the art, such as RAM, cache memory, hard-disk drives, tape devices, etc.
  • HIU 310 may serve to couple document processor 210 to one or more host processors (not shown). This coupling may be accomplished, for example, using a Peripheral Component Interconnect eXtended (PCI-X) bus. HIU 310 also may provide an Applications Programming Interface (API) through which document processor 210 can receive jobs. Additionally, HIU 310 may interface with LCB 380 such that various tasks associated with these jobs may be communicated to components of document processor 210.
  • In one embodiment, these jobs may comprise context data, including a structured document and the data structures and instruction code generated from the transformation instructions by the compiler. Thus, the API may allow the context data to be passed directly to HIU 310, or, in other embodiments, may allow references to one or more locations in shared memory 270 where context data may be located to be provided to HIU 310. HIU 310 may maintain a table of the various jobs received through this API and direct the processing of these jobs by document processor 210. By allowing multiple jobs to be maintained by HIU 310, these jobs may be substantially simultaneously processed (e.g. processed in parallel) by document processor 210, allowing document processor 210 to be more efficiently utilized (e.g. higher throughput of jobs and lower latency).
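  • The job table described above might be pictured, in software terms, roughly as follows. This is a sketch under stated assumptions rather than the HIU's actual design: the `Job` and `JobTable` classes, the state names and the `mem:` reference strings are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Job:
    job_id: int
    document_ref: str   # hypothetical reference to a document in shared memory
    code_ref: str       # hypothetical reference to compiled instruction code
    state: str = "pending"

@dataclass
class JobTable:
    jobs: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def submit(self, document_ref, code_ref):
        """Record a new job by reference and return its identifier."""
        job = Job(next(self._ids), document_ref, code_ref)
        self.jobs[job.job_id] = job
        return job.job_id

    def start_pending(self):
        """Mark every pending job as running, so several are in flight at once."""
        started = []
        for job in self.jobs.values():
            if job.state == "pending":
                job.state = "running"
                started.append(job.job_id)
        return started

    def complete(self, job_id):
        self.jobs[job_id].state = "done"

table = JobTable()
a = table.submit("mem:0x1000", "mem:0x8000")
b = table.submit("mem:0x2000", "mem:0x8000")  # same compiled code, new document
running = table.start_pending()
table.complete(a)
states = {job.job_id: job.state for job in table.jobs.values()}
print(states)
```

Both jobs here share one `code_ref`, mirroring the case where a single set of compiled transformation instructions is applied to many documents.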
  • Parser 320 may receive and parse a structured document, identifying data in the structured document for PEP 330 and generating data structures comprising data from the structured document by, for example, creating data structures in shared memory 270 for use by TE 340 or OG 350. An exemplary embodiment of parser 320 is illustrated in Appendix A.
  • PEP 330 receives data from parser 320 identifying data of the structured document being processed and compares data identified by the parser 320 against expressions identified in the transformation instructions. PEP 330 may also create one or more data structures in shared memory 270, where the data structures comprise a list of data in the structured document which match expressions. An exemplary embodiment of PEP 330 is illustrated in Appendix A.
  • Transformation engine 340 may access the data structures built by parser 320 and PEP 330 and execute instruction code generated by compiler 220 and stored in memory 270 to generate results for the output document. In some embodiments, one or more instructions of the instruction code generated by compiler 220 may be operable to be independently executed (e.g. execution of one instruction does not depend directly on the result of the output of the execution of another instruction), and thus execution of the instruction code by transformation engine 340 may occur in substantially any order. An exemplary embodiment of a transformation engine is illustrated in Appendix A.
  • Output generator 350 may assemble the results generated by transformation engine 340 in an order specified by the transformation instructions or corresponding to the structured document and provide the output document to the initiating web service 112 through HIU 310, for example, by signaling the web service 112 or a host processor that the job is complete and providing a reference to a location in memory 270 where an output document exists. An exemplary embodiment of an output generator is illustrated in Appendix A.
  • While it should be understood that embodiments of the present invention may be applied with respect to almost any structured document (e.g. a document having a defined structure that can be used to interpret the content), whether the content is highly structured (such as an XML document, HTML document, .pdf document, word processing document, database, etc.) or loosely structured (such as a plain text document whose structure may be, e.g., a stream of characters), and associated transformation instructions (a term used generally to refer to a file which may be used with reference to a structured document, e.g. document type definitions (.dtd), schemas such as .xsd files, XSL transformation files, etc.) for the structured document, it may be helpful to illustrate various embodiments of the present invention with respect to a particular example of a structured document and transformation instructions.
  • Generally, an XML document is a structured document which has a hierarchical tree structure, where the root of the tree identifies the document as a whole and each other node in the document is a descendant of the root. Various elements, attributes, and document content form the nodes of the tree. The elements define the structure of the content that the elements contain. Each element has an element name, and the element delimits content using a start tag and an end tag that each include the element name. An element may have other elements as sub-elements, which may further define the structure of the content. Additionally, elements may include attributes (included in the start tag, following the element name), which are name/value pairs that provide further information about the element or the structure of the element content. XML documents may also include processing instructions that are to be passed to the application reading the XML document, comments, etc.
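  • To make the tree structure concrete, the short Python sketch below parses a small XML document with the standard library's `xml.etree.ElementTree` module and walks its element nodes. The sample `order` document and the `walk` helper are invented for illustration and do not appear in the disclosure.

```python
import xml.etree.ElementTree as ET

source = """<order id="42">
  <item sku="A1">Widget</item>
  <item sku="B2">Gadget</item>
</order>"""

root = ET.fromstring(source)

def walk(element, depth=0):
    """Yield (depth, element name, attributes, text) for each element node."""
    text = (element.text or "").strip()
    yield depth, element.tag, dict(element.attrib), text
    for child in element:
        yield from walk(child, depth + 1)

nodes = list(walk(root))
for depth, name, attrs, text in nodes:
    # Indentation reflects each node's depth below the root.
    print("  " * depth + name, attrs, repr(text))
```

The root element `order` carries one attribute and two sub-elements; each `item` is a descendant of the root with its own attribute and text content.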
  • An XSLT stylesheet is a set of transformation instructions which may be viewed as a set of templates. Each template may include: (i) an expression that identifies nodes in a document's tree structure; and (ii) a body that specifies a corresponding portion of an output document's structure for nodes of the source document identified by the expression. Applying a stylesheet to a source document may comprise attempting to find a matching template for one or more nodes in the source document, and instantiating the structures corresponding to the body of the matching template in an output document.
  • The body of a template may include one or more of: (i) literal content to be instantiated in the output document; (ii) instructions for selection of content from the matching nodes to be copied into the output document; and (iii) statements that are to be evaluated, with the result of the statements being instantiated in the output document. Together, the content to be instantiated and the statements to be evaluated may be referred to as “actions” to be performed on the nodes that match the template.
  • The body of a template may include one or more “apply templates” statements, which include an expression for selecting one or more nodes and causing the templates in the stylesheet to be applied to the selected nodes, thus effectively nesting the templates. If a match to the apply templates statement is found, the resulting template is instantiated within the instantiation of the template that includes the apply templates statement. Other statements in the body of the template may also include expressions to be matched against nodes (and the statements may be evaluated on the matching nodes).
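  • The template model described in the preceding paragraphs can be imitated in a few lines of Python. This is a simplified sketch, not an XSLT processor: match expressions are reduced to bare element names, and the `TEMPLATES` table, the two template functions and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def order_template(node, apply_templates):
    # Literal content plus a nested "apply templates" over the child items.
    return "<ul>" + apply_templates(node.findall("item")) + "</ul>"

def item_template(node, apply_templates):
    # Content selected from the matching node is copied into the output.
    return "<li>" + node.text + "</li>"

# Each template pairs a match expression (here just an element name)
# with a body that produces output for the nodes it matches.
TEMPLATES = {"order": order_template, "item": item_template}

def apply_templates(nodes):
    """Find a matching template for each node and instantiate its body."""
    parts = []
    for node in nodes:
        body = TEMPLATES.get(node.tag)
        if body is not None:
            parts.append(body(node, apply_templates))
    return "".join(parts)

doc = ET.fromstring("<order><item>Widget</item><item>Gadget</item></order>")
result = apply_templates([doc])
print(result)
```

The `item` results are instantiated inside the `order` result, which is the nesting effect the "apply templates" statement is described as producing.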
  • The expressions used in a stylesheet may generally comprise node identifiers and/or values of nodes, along with operators on the node identifiers to specify parent/child (or ancestor/descendant) relationships among the node identifiers and/or values. Expressions may also include predicates, which may be extra condition(s) for matching a node. A predicate is an expression that is evaluated with the associated node as the context node (defined below), where the result of the expression is either true (and the node may match the expression node) or false (and the node does not match the expression). Thus, an expression may be viewed as a tree of nodes to be matched against a document's tree.
  • A given document node may satisfy an expression if the given document node is selected via evaluation of the expression. That is, the expression node identifiers in the expression match the given document node's identifier or document node identifiers having the same relationship to the given document node as specified in the expression, and any values used in the expression are equal to corresponding values related to the given document node.
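  • Parent/child relationships and predicates of the kind just described can be tried out with the limited XPath subset supported by Python's `xml.etree.ElementTree`. The sample `orders` document below is an assumption made for illustration; the paths use only syntax documented for that module.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<orders>"
    "<order status='open'><item>Widget</item></order>"
    "<order status='closed'><item>Gadget</item></order>"
    "<order status='open'><note>rush</note></order>"
    "</orders>"
)

# Parent/child relationship: item elements that are children of order elements.
items = doc.findall("./order/item")

# Predicate on an attribute value: only orders whose status is "open".
open_orders = doc.findall("./order[@status='open']")

# Two predicates: open orders that also have an item child element.
open_with_item = doc.findall("./order[@status='open'][item]")

print(len(items), len(open_orders), len(open_with_item))
```

Each predicate narrows the set of matching nodes: a node matches only if every condition evaluates to true with that node as the context.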
  • A document node may also be referred to as a “matching node” for a given expression if the node satisfies the given expression. In some cases in the remainder of this discussion, it may be helpful for clarity to distinguish nodes in expression trees from nodes in a structured document. Thus, a node may be referred to as an “expression node” if the node is part of an expression tree, and a node may be referred to as a “document node” if the node is part of the document being processed. A node identifier may comprise a name (e.g. element name, attribute name, etc.) or may comprise an expression construct that identifies a node by type (e.g. a node test expression may match any node, or a text test expression may match any text node). In some cases, a name may belong to a specific namespace. In such cases, the node identifier may be a name associated with a namespace. In XML, the namespace provides a method of qualifying element and attribute names by associating them with namespace names. Thus, the node identifier may be the qualified name (the optional namespace prefix, followed by a colon, followed by the name). A name, as used herein (e.g. element name, attribute name, etc.) may include a qualified name. Again, while XSLT stylesheets may be used in one example herein of transformation instructions, generally “transformation instructions” may comprise any specification for transforming a source document to an output document, which may encompass, for example, statements intended to identify data of the source document or statements for how to transform data of the source document. The source and output documents may be in the same language (e.g. the source and output documents may be different XML vocabularies), or may differ (e.g. XML to pdf, etc.).
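  • The effect of qualified names on node identifiers can be seen in the following Python sketch, which uses the standard `xml.etree.ElementTree` module. The namespace URIs and the `inv` and `bill` prefixes are made up for illustration.

```python
import xml.etree.ElementTree as ET

source = (
    '<inv:order xmlns:inv="urn:example:inventory" xmlns:bill="urn:example:billing">'
    "<inv:item>Widget</inv:item>"
    "<bill:item>Invoice line</bill:item>"
    "</inv:order>"
)
root = ET.fromstring(source)

# ElementTree expands each qualified name to "{namespace-uri}local-name",
# so the two "item" elements have different node identifiers.
expanded = [child.tag for child in root]

# A prefix map lets a match expression use qualified names.
prefixes = {"inv": "urn:example:inventory", "bill": "urn:example:billing"}
inventory_items = root.findall("inv:item", prefixes)

print(expanded)
print([item.text for item in inventory_items])
```

Two elements that share the local name `item` are distinct nodes for matching purposes because they belong to different namespaces.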
  • Moving now to FIG. 4, an example application of one embodiment of the present invention to an XML document and an XSLT stylesheet is illustrated. It is noted that, while the description herein may include examples in which transformation instructions are applied to a single source document, other examples may include applying multiple sets of transformation instructions to a source document (either concurrently or serially, as desired) or applying a set of transformation instructions to multiple source documents (either concurrently with context switching or serially, as desired).
  • Returning to the example of FIG. 4, an XML document and an associated XSL stylesheet may be received by web service 112. Web service 112 may invoke embodiments of the present invention to transform the received document according to the received stylesheet. More specifically, in one embodiment, compiler 220 may be used to compile the XSL stylesheet to generate data structures and instruction code for use by document processor 210. Compiler 220 may assign serial numbers to node identifiers in the stylesheet so that expression evaluation may be performed by document processor 210 by comparing numbers, rather than node identifiers (which would involve character string comparisons).
  • Compiler 220 may also store a mapping of these node identifiers to serial numbers in one or more symbol tables 410 in memory 270. Additionally, compiler 220 may extract the expressions from the stylesheet and generate expression tree data structures in memory 270 to be used by the document processor 210 for expression matching (e.g. one or more parse-time expression trees 420 comprising expression nodes). Still further, compiler 220 may generate an instruction table 430 in memory 270 with instructions to be executed for one or more matching expressions. The instructions in the instruction table 430 may be executable by document processor 210 that, when executed by the document processor 210, may result in performing the actions defined when an expression associated with the instruction is matched. In some embodiments, the instructions may comprise the actions to be performed (i.e. there may be a one-to-one correspondence between instructions and actions). In other embodiments, at least some actions may be realized by executing two or more instructions. The compiler may also generate whitespace tables 440 defining how various types of whitespace in the source document are to be treated (e.g. preserved, stripped, etc.), an expression list table 450, a template list table 460 and one or more DTD tables 462 to map entity references to values or specify default values for attributes.
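  • The serial-number scheme might look roughly like the Python sketch below, which interns node identifiers into integers at compile time so that later comparisons need no string handling. The `SymbolTable` class, the `compile_expression` helper, the simple slash-separated paths and the use of 0 for unknown names are all assumptions, not details from the disclosure.

```python
class SymbolTable:
    """Maps node identifiers (names) to serial numbers, assigned on first use."""

    def __init__(self):
        self._serials = {}

    def intern(self, name):
        # Compile time: give a new name the next serial number, reuse old ones.
        return self._serials.setdefault(name, len(self._serials) + 1)

    def lookup(self, name):
        # Parse time: names absent from the stylesheet map to 0, a
        # hypothetical "unknown" serial that no compiled expression contains.
        return self._serials.get(name, 0)

def compile_expression(path, symbols):
    """Turn a path such as 'order/item' into a tuple of serial numbers."""
    return tuple(symbols.intern(step) for step in path.split("/"))

symbols = SymbolTable()
expressions = [compile_expression(p, symbols) for p in ("order/item", "order/note")]
print(expressions)
print(symbols.lookup("item"), symbols.lookup("price"))
```

Once compiled, checking whether a document path matches `order/item` reduces to comparing small integers, which is the benefit the text attributes to serial numbers.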
  • At this point, processing of the source document by document processor 210 may begin. Parser 320 receives the structured document and accesses the symbol tables 410, whitespace tables 440, or DTD tables 462 in memory 270 to parse the structured document, identify document nodes, and generate events (e.g. to identify document nodes parsed from the document) to PEP 330. More particularly, parser 320 converts node identifiers in the source document to corresponding serial numbers in the symbol tables 410, and transmits these serial numbers as part of the events to the PEP 330. Additionally, parser 320 may generate a parsed document tree 470 representing the structure of the source document in memory. Nodes of the parsed document tree may reference corresponding values stored in one or more parsed content tables 472 created in memory by parser 320. PEP 330 receives events from the parser 320 and compares identified document nodes (e.g. based on their serial numbers) against parse-time expression tree(s) 420 in memory 270. Matching document nodes are identified and recorded in template or expression match lists 480 in memory 270.
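  • The parser-to-PEP hand-off can be approximated in software with an event stream, as in the minimal Python sketch below. It stands in for the hardware flow only loosely: the `SYMBOLS` table, the single compiled `EXPRESSION`, and the suffix-matching rule are assumptions chosen to keep the example small.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical compiler output: a symbol table and one compiled expression.
SYMBOLS = {"order": 1, "item": 2, "note": 3}
EXPRESSION = (1, 2)  # serial numbers for the path order/item

def parse_events(xml_bytes):
    """Yield (event, serial number, element) tuples, one per start or end tag."""
    stream = io.BytesIO(xml_bytes)
    for event, element in ET.iterparse(stream, events=("start", "end")):
        yield event, SYMBOLS.get(element.tag, 0), element

def match_expression(events, expression):
    """Record nodes whose path of serial numbers ends with the expression."""
    path, matches = [], []
    for event, serial, element in events:
        if event == "start":
            path.append(serial)
            # Integer comparison only; no element names are compared here.
            if tuple(path[-len(expression):]) == expression:
                matches.append(element)
        else:
            path.pop()
    return matches

xml_bytes = b"<order><item>Widget</item><note>rush</note><item>Gadget</item></order>"
match_list = match_expression(parse_events(xml_bytes), EXPRESSION)
matched_text = [element.text for element in match_list]
print(matched_text)
```

The resulting `match_list` corresponds in spirit to the template or expression match lists: it records which document nodes satisfied the expression, for later use by the transformation step.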
  • Transformation engine 340 executes instructions from instruction table 430. When executing these instructions, transformation engine 340 may access the template or expression match lists 480, the parsed document tree 470, the parsed content tables 472 or the instruction table 430 in memory 270. These instructions may, in turn, be associated with one or more templates of a stylesheet. Transformation engine 340 may execute the instructions on each of the document nodes that matches the expression associated with the template, for example to transform or format document nodes according to the template. Transformation engine 340 may request that the results of the execution of these instructions be stored in one or more output data structures 490 in memory 270. Thus, as transformation engine 340 executes instructions of instruction table 430, a set of output data structures 490 is created in memory 270 representing the structure of an output document, and content for the output document is placed in, or associated with, these output data structures 490.
  • Output generator 350 may receive results from transformation engine 340 for storing in output data structures 490 in memory 270. Output generator 350 may access these output data structures 490 or data structures 410, 420, 450, 460, 470, 472 created by parser 320 or PEP 330 to assemble an output document. In some embodiments, output generator 350 may access a set of formatting parameters for the assembly of the output document. After the output document is assembled, or as the output document is being assembled, the output document (or portions thereof) may be returned to the proper web service 112.
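  • The division of labor between the transformation engine and the output generator might be sketched as follows. The instruction table contents, the match list entries and the use of a dictionary keyed by document order are hypothetical stand-ins for the structures named in the text.

```python
# Hypothetical instruction table: template id -> instructions to execute.
INSTRUCTION_TABLE = {
    "item": [str.strip, str.upper, lambda text: f"<li>{text}</li>"],
}

# Hypothetical match list: (document order, template id, node content).
# Entries are deliberately out of order to show that execution order is free.
MATCH_LIST = [(2, "item", " gadget "), (1, "item", " widget ")]

def transform(match_list, instruction_table):
    """Run each template's instructions on the nodes that matched it."""
    output_structures = {}
    for order, template_id, content in match_list:
        for instruction in instruction_table[template_id]:
            content = instruction(content)
        # Store the result in a slot keyed by its place in the document.
        output_structures[order] = content
    return output_structures

def generate_output(output_structures):
    """Assemble stored results in document order."""
    return "".join(output_structures[key] for key in sorted(output_structures))

document = "<ul>" + generate_output(transform(MATCH_LIST, INSTRUCTION_TABLE)) + "</ul>"
print(document)
```

Keeping result storage separate from assembly means the transformation step never needs to know the final ordering; only the assembly step does.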
  • In the foregoing specification, the invention has been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of invention. For example, it will be apparent to those of skill in the art that although the present invention has been described with respect to a protocol controller in a routing device the inventions and methodologies described herein may be applied in any context which requires the determination of the protocol of a bit stream.
  • Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

Claims (23)

1. An apparatus, comprising:
a parser circuit operable to parse a structured document to create a first set of data structures;
a pattern expression processor circuit operable to create a second set of data structures based on an output of the parser circuit;
a transformation engine circuit operable to generate a set of results corresponding with an output document utilizing the first data structure or second data structure, wherein the output document corresponds to a transformation of the structured document according to a set of transformation instructions; and
an output generator circuit, operable to create a set of output document structures, associate the set of results generated by the transformation engine with the set of output data structures and assemble the output document from the output data structures.
2. The apparatus of claim 1, wherein the transformation engine circuit is operable to execute a set of instructions generated from the transformation instructions.
3. The apparatus of claim 2, comprising a host interface circuit wherein the host interface circuit is operable to receive the structured document and pass the structured document to the parser.
4. The apparatus of claim 3, comprising a memory interface coupled to each of the parser circuit, the pattern expression processor circuit, the transformation engine circuit and the output generator circuit, and operable to interface between each of the parser circuit, the pattern expression processor circuit, the transformation engine circuit, the output generator circuit and a memory.
5. The apparatus of claim 4, wherein the first data structure, the second data structure and the set of instructions are in the memory.
6. The apparatus of claim 5, comprising a first bus coupled to each of the parser circuit, the pattern expression processor circuit, the transformation engine circuit and the output generator circuit.
7. The apparatus of claim 6, comprising a second bus coupling the parser to the pattern expression processor.
8. The apparatus of claim 7, comprising a bus coupling the transformation engine to the output generator circuit.
9. A system, comprising:
a compiler operable to generate a set of instructions from a set of transformation instructions corresponding to a structured document; and
a document processor circuit operable to execute the set of instructions to generate an output document corresponding to a transformation of the structured document according to a set of transformation instructions, the document processor circuit comprising:
a parser circuit operable to parse the structured document to create a first set of data structures;
a pattern expression processor circuit operable to create a second set of data structures based on an output of the parser circuit;
a transformation engine circuit operable to execute the set of instructions to generate a set of results corresponding with the output document utilizing the first data structure or second data structure; and
an output generator circuit, operable to assemble the output document from the set of results.
10. The system of claim 9, wherein the transformation engine circuit executes the set of instructions utilizing the first set of data structures or second set of data structures.
11. The system of claim 10, wherein the document processor circuit comprises a host interface circuit wherein the host interface circuit is operable to receive a structured document and pass the structured document to the parser.
12. The system of claim 11, further comprising a memory, wherein the document processor circuit comprises a memory interface coupled to each of the parser circuit, the pattern expression processor circuit, the transformation engine circuit and the output generator circuit, and operable to interface between each of the parser circuit, the pattern expression processor circuit, the transformation engine circuit, the output generator circuit and the memory.
13. The system of claim 12, wherein the first data structure, the second data structure and the set of instructions are in the memory.
14. The system of claim 13, wherein the document processor circuit comprises a first bus coupled to each of the parser circuit, the pattern expression processor circuit, the transformation engine circuit and the output generator circuit.
15. The system of claim 14, wherein the document processor circuit comprises a second bus coupling the parser to the pattern expression processor.
16. The system of claim 15, wherein the document processor circuit comprises a third bus coupling the transformation engine to the output generator circuit.
17. A method, comprising:
parsing a first structured document to create a first data structure representative of the first structured document;
generating a second set of data structures, each of the second set of data structures comprising a related set of data associated with the first structured document;
executing a first set of instructions to generate a first set of results associated with a first output document corresponding to a transformation of the first structured document according to a first set of transformation instructions, wherein the first set of instructions were generated from the first set of transformation instructions; and
generating a first output document from the first set of results, wherein generating the first output document comprises assembling the first set of results in an order corresponding to the first output document.
18. The method of claim 17, comprising:
generating a first set of output data structures; and
associating the first set of results with the first output data structures.
19. The method of claim 18, wherein executing the first set of instructions further comprises accessing the first data structure and the second set of data structures.
20. The method of claim 19, wherein generating the first output document comprises formatting the set of results according to a type of the first output document.
21. The method of claim 17, comprising executing a second set of instructions to generate a second set of results associated with a second output document corresponding to a transformation of the second structured document according to a second set of transformation instructions, wherein the second set of instructions were generated from the first set of transformation instructions, and the second set of instructions is executed substantially simultaneously with the first set of instructions.
22. The method of claim 21, comprising generating a second output document from the second set of results, wherein generating the second output document comprises assembling the second set of results in an order corresponding to the second output document, and the second output document is generated substantially simultaneously with the first output document.
23. The method of claim 22, wherein the first output document and the second output document are output substantially as they are generated.
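
The method of claims 17 and 18 can be pictured in software as a four-stage pipeline. The following is a minimal sketch only, assuming an XML input and a toy rule format of (source element name, output element name) pairs; the claims are directed to circuits, and every name here (`Instruction`, `compile_rules`, `index_by_tag`, `execute`, `assemble`, `transform`) is hypothetical and does not come from the application.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """Hypothetical compiled form of one transformation rule."""
    match_tag: str   # source element name to select
    output_tag: str  # element name to emit in the output document
    position: int    # slot of the result within the output document


def compile_rules(rules):
    """Compiler: turn (match_tag, output_tag) rules into instructions."""
    return [Instruction(m, o, i) for i, (m, o) in enumerate(rules)]


def parse(document):
    """Parser stage: create the first data structure (an element tree)."""
    return ET.fromstring(document)


def index_by_tag(root):
    """Pattern expression stage: second data structures grouping related nodes."""
    index = {}
    for node in root.iter():
        index.setdefault(node.tag, []).append(node)
    return index


def execute(instructions, index):
    """Transformation engine stage: produce (position, tag, text) results."""
    results = []
    for ins in instructions:
        for node in index.get(ins.match_tag, []):
            text = (node.text or "").strip()
            results.append((ins.position, ins.output_tag, text))
    return results


def assemble(results, root_tag="output"):
    """Output generator stage: order the results and serialize the document."""
    out = ET.Element(root_tag)
    # sorted() is stable, so results sharing a position keep document order.
    for _, tag, text in sorted(results, key=lambda r: r[0]):
        ET.SubElement(out, tag).text = text
    return ET.tostring(out, encoding="unicode")


def transform(document, rules):
    """Run all four stages on one structured document."""
    root = parse(document)
    index = index_by_tag(root)
    return assemble(execute(compile_rules(rules), index))


if __name__ == "__main__":
    doc = "<order><item>pen</item><buyer>Ann</buyer><item>ink</item></order>"
    print(transform(doc, [("buyer", "customer"), ("item", "product")]))
```

In this sketch the rules are compiled once and could be reused across several input documents, which loosely mirrors the reuse of one set of transformation instructions in claim 21; it does not attempt to model the buses, memory interface, or simultaneous execution recited elsewhere in the claims.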
US11/413,051 2005-04-27 2006-04-27 Method and system for an architecture for the processing of structured documents Abandoned US20070038930A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/413,051 US20070038930A1 (en) 2005-04-27 2006-04-27 Method and system for an architecture for the processing of structured documents

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US67534705P 2005-04-27 2005-04-27
US67511505P 2005-04-27 2005-04-27
US67534905P 2005-04-27 2005-04-27
US67516705P 2005-04-27 2005-04-27
US11/413,051 US20070038930A1 (en) 2005-04-27 2006-04-27 Method and system for an architecture for the processing of structured documents

Publications (1)

Publication Number Publication Date
US20070038930A1 (en) 2007-02-15

Family

ID=37215515

Family Applications (4)

Application Number Title Priority Date Filing Date
US11/413,070 Abandoned US20070012601A1 (en) 2005-04-27 2006-04-27 Method, system and apparatus for an output generator for use in the processing of structured documents
US11/413,052 Expired - Fee Related US8065685B2 (en) 2005-04-27 2006-04-27 Method, system and apparatus for a transformation engine for use in the processing of structured documents
US11/413,051 Abandoned US20070038930A1 (en) 2005-04-27 2006-04-27 Method and system for an architecture for the processing of structured documents
US11/412,698 Abandoned US20070136698A1 (en) 2005-04-27 2006-04-27 Method, system and apparatus for a parser for use in the processing of structured documents

Country Status (2)

Country Link
US (4) US20070012601A1 (en)
WO (4) WO2006116650A2 (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7823127B2 (en) * 2003-11-25 2010-10-26 Software Analysis And Forensic Engineering Corp. Detecting plagiarism in computer source code
US20070168857A1 (en) * 2006-01-17 2007-07-19 Oracle International Corporation Transformation of Source Data in a Source Markup Language to Target Data in a Target Markup Language
JP4447596B2 (en) * 2006-06-28 2010-04-07 パナソニック株式会社 Pulse generation circuit and modulator
US7539953B1 (en) * 2006-12-05 2009-05-26 Xilinx, Inc. Method and apparatus for interfacing instruction processors and logic in an electronic circuit modeling system
US8204856B2 (en) 2007-03-15 2012-06-19 Google Inc. Database replication
US7865823B2 (en) 2007-06-28 2011-01-04 Intel Corporation Method and apparatus for schema validation
US7937395B2 (en) * 2008-02-22 2011-05-03 Tigerlogic Corporation Systems and methods of displaying and re-using document chunks in a document development application
US8126880B2 (en) * 2008-02-22 2012-02-28 Tigerlogic Corporation Systems and methods of adaptively screening matching chunks within documents
US8145632B2 (en) 2008-02-22 2012-03-27 Tigerlogic Corporation Systems and methods of identifying chunks within multiple documents
US8924374B2 (en) * 2008-02-22 2014-12-30 Tigerlogic Corporation Systems and methods of semantically annotating documents of different structures
US8924421B2 (en) * 2008-02-22 2014-12-30 Tigerlogic Corporation Systems and methods of refining chunks identified within multiple documents
US8078630B2 (en) 2008-02-22 2011-12-13 Tigerlogic Corporation Systems and methods of displaying document chunks in response to a search request
US9129036B2 (en) * 2008-02-22 2015-09-08 Tigerlogic Corporation Systems and methods of identifying chunks within inter-related documents
US9965453B2 (en) * 2009-10-15 2018-05-08 Microsoft Technology Licensing, Llc Document transformation
US20110119262A1 (en) * 2009-11-13 2011-05-19 Dexter Jeffrey M Method and System for Grouping Chunks Extracted from A Document, Highlighting the Location of A Document Chunk Within A Document, and Ranking Hyperlinks Within A Document
US8307277B2 (en) * 2010-09-10 2012-11-06 Facebook, Inc. Efficient event delegation in browser scripts
US9772889B2 (en) * 2012-10-15 2017-09-26 Famous Industries, Inc. Expedited processing and handling of events
US10908929B2 (en) 2012-10-15 2021-02-02 Famous Industries, Inc. Human versus bot detection using gesture fingerprinting
US11386257B2 (en) 2012-10-15 2022-07-12 Amaze Software, Inc. Efficient manipulation of surfaces in multi-dimensional space using energy agents
US10877780B2 (en) 2012-10-15 2020-12-29 Famous Industries, Inc. Visibility detection using gesture fingerprinting
US9501171B1 (en) 2012-10-15 2016-11-22 Famous Industries, Inc. Gesture fingerprinting
US10223637B1 (en) 2013-05-30 2019-03-05 Google Llc Predicting accuracy of submitted data
US9038004B2 (en) 2013-10-23 2015-05-19 International Business Machines Corporation Automated integrated circuit design documentation
US10248474B2 (en) * 2014-01-29 2019-04-02 Microsoft Technology Licensing, Llc Application event distribution system
US9454412B2 (en) 2014-10-03 2016-09-27 Benefitfocus.Com, Inc. Systems and methods for classifying and analyzing runtime events
US10325014B2 (en) 2015-04-30 2019-06-18 Workiva Inc. System and method for convergent document collaboration
US20170154019A1 (en) * 2015-11-30 2017-06-01 Open Text Sa Ulc Template-driven transformation systems and methods
JP2018069512A (en) * 2016-10-27 2018-05-10 セイコーエプソン株式会社 Printer and control method for the same
US11562143B2 (en) * 2017-06-30 2023-01-24 Accenture Global Solutions Limited Artificial intelligence (AI) based document processor
JP7261083B2 (en) 2019-05-09 2023-04-19 株式会社日立製作所 Software analysis support system
US11755825B2 (en) 2019-09-12 2023-09-12 Workiva Inc. Method, system, and computing device for facilitating private drafting
JP2021051420A (en) * 2019-09-24 2021-04-01 株式会社東芝 Virtualization support device and control method of virtualization support device
US11100281B1 (en) * 2020-08-17 2021-08-24 Workiva Inc. System and method for maintaining links and revisions
US11443108B2 (en) 2020-08-17 2022-09-13 Workiva Inc. System and method for document management using branching
US11100277B1 (en) 2021-02-15 2021-08-24 Workiva Inc. Systems, methods, and computer-readable media for flow-through formatting for links
US11354362B1 (en) 2021-05-06 2022-06-07 Workiva Inc. System and method for copying linked documents
US11640495B1 (en) 2021-10-15 2023-05-02 Workiva Inc. Systems and methods for translation comments flowback

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6631379B2 (en) * 2001-01-31 2003-10-07 International Business Machines Corporation Parallel loading of markup language data files and documents into a computer database
US20040064475A1 (en) * 2002-09-27 2004-04-01 International Business Machines Corporation Methods for progressive encoding and multiplexing of web pages
US20040221319A1 (en) * 2002-12-06 2004-11-04 Ian Zenoni Application streamer
US7062708B2 (en) * 2002-09-19 2006-06-13 International Business Machines Corporation Tree construction for XML to XML document transformation
US7131116B1 (en) * 2002-12-30 2006-10-31 Oracle International Corporation Transformation of electronic messages to an extensible data format
US7134075B2 (en) * 2001-04-26 2006-11-07 International Business Machines Corporation Conversion of documents between XML and processor efficient MXML in content based routing networks
US7243156B2 (en) * 2000-11-01 2007-07-10 Digital Integrator, Inc. Information distribution method and system
US7251697B2 (en) * 2002-06-20 2007-07-31 Koninklijke Philips Electronics N.V. Method and apparatus for structured streaming of an XML document

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998037655A1 (en) * 1996-12-20 1998-08-27 Financial Services Technology Consortium Method and system for processing electronic documents
US6966027B1 (en) * 1999-10-04 2005-11-15 Koninklijke Philips Electronics N.V. Method and apparatus for streaming XML content
US7590644B2 (en) * 1999-12-21 2009-09-15 International Business Machine Corporation Method and apparatus of streaming data transformation using code generator and translator
JP3879350B2 (en) * 2000-01-25 2007-02-14 富士ゼロックス株式会社 Structured document processing system and structured document processing method
US20020143823A1 (en) * 2001-01-19 2002-10-03 Stevens Mark A. Conversion system for translating structured documents into multiple target formats
US20020147745A1 (en) * 2001-04-09 2002-10-10 Robert Houben Method and apparatus for document markup language driven server
US6829745B2 (en) * 2001-06-28 2004-12-07 Koninklijke Philips Electronics N.V. Method and system for transforming an XML document to at least one XML document structured according to a subset of a set of XML grammar rules
US20050086584A1 (en) * 2001-07-09 2005-04-21 Microsoft Corporation XSL transform
US7305615B2 (en) * 2001-07-30 2007-12-04 Gigalogix, Inc. Methods and apparatus for accelerating data parsing
US6880125B2 (en) * 2002-02-21 2005-04-12 Bea Systems, Inc. System and method for XML parsing
US7246358B2 (en) * 2002-04-09 2007-07-17 Sun Microsystems, Inc. Methods, system and articles of manufacture for providing an extensible serialization framework for an XML based RPC computing environment
EP1502196A4 (en) * 2002-05-02 2008-04-02 Sarvega Inc System and method for transformation of xml documents using stylesheets
US7065704B1 (en) * 2002-07-18 2006-06-20 Embedded Internet Solutions, Inc. Methods for fast HTML rendering
US7287217B2 (en) * 2004-01-13 2007-10-23 International Business Machines Corporation Method and apparatus for processing markup language information
US8024703B2 (en) * 2004-10-22 2011-09-20 International Business Machines Corporation Building an open model driven architecture pattern based on exemplars

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080109524A1 (en) * 2006-11-07 2008-05-08 International Business Machines Corporation Method and system for dynamically specifying a format for data provided by a web service invocation
US7926065B2 (en) * 2006-11-07 2011-04-12 International Business Machines Corporation Method and system for dynamically specifying a format for data provided by a web service invocation
US20090265380A1 (en) * 2008-03-31 2009-10-22 Justin Wright Systems and methods for tables of contents
US8600942B2 (en) * 2008-03-31 2013-12-03 Thomson Reuters Global Resources Systems and methods for tables of contents
US20140089350A1 (en) * 2008-03-31 2014-03-27 Thomson Reuters Global Resources Systems and methods for tables of contents
US9424295B2 (en) * 2008-03-31 2016-08-23 Thomson Reuters Global Resources Systems and methods for tables of contents
US20130110792A1 (en) * 2011-10-28 2013-05-02 Microsoft Corporation Contextual Gravitation of Datasets and Data Services
US8538934B2 (en) * 2011-10-28 2013-09-17 Microsoft Corporation Contextual gravitation of datasets and data services
US20170103113A1 (en) * 2015-10-09 2017-04-13 Bank Of America Corporation System for inline message detail extraction and transformation
US10489418B2 (en) * 2015-10-09 2019-11-26 Bank Of America Corporation System for inline message detail extraction and transformation
US11061921B2 (en) * 2015-10-09 2021-07-13 Bank Of America Corporation System for inline message detail extraction and transformation
US11900229B1 (en) * 2023-05-17 2024-02-13 Dita Strategies, Inc. Apparatus and method for iterative modification of self-describing data structures

Also Published As

Publication number Publication date
US20070136698A1 (en) 2007-06-14
US20090106775A1 (en) 2009-04-23
US20070012601A1 (en) 2007-01-18
WO2006116651A2 (en) 2006-11-02
WO2006116612A2 (en) 2006-11-02
WO2006116651A3 (en) 2007-08-02
WO2006116650A2 (en) 2006-11-02
WO2006116650A3 (en) 2007-10-11
US8065685B2 (en) 2011-11-22
WO2006116649A9 (en) 2007-04-12
WO2006116649A3 (en) 2009-04-16
WO2006116612A3 (en) 2007-11-15
WO2006116649A2 (en) 2006-11-02

Similar Documents

Publication Publication Date Title
US20070038930A1 (en) Method and system for an architecture for the processing of structured documents
US7366973B2 (en) Item, relation, attribute: the IRA object model
US7458022B2 (en) Hardware/software partition for high performance structured data transformation
US7409400B2 (en) Applications of an appliance in a data center
US7703009B2 (en) Extensible stylesheet designs using meta-tag information
US6487566B1 (en) Transforming documents using pattern matching and a replacement language
US7437666B2 (en) Expression grouping and evaluation
US8286132B2 (en) Comparing and merging structured documents syntactically and semantically
US7328403B2 (en) Device for structured data transformation
US8533693B2 (en) Embedding expressions in XML literals
JP2005018776A (en) Query intermediate language method and system
US20070050707A1 (en) Enablement of multiple schema management and versioning for application-specific xml parsers
JP2006092529A (en) System and method for automatically generating xml schema for verifying xml input document
US20070050760A1 (en) Generation of application specific xml parsers using jar files with package paths that match the xml xpaths
US7539981B2 (en) XML-based preprocessor
US20070050705A1 (en) Method of xml element level comparison and assertion utilizing an application-specific parser
US20070050706A1 (en) Method of xml transformation and presentation utilizing an application-specific parser
US20050177788A1 (en) Text to XML transformer and method
KR20040056298A (en) A data integration system and method using XQuery for defining the integrated schema
JP2004529427A (en) Design of extensible style sheet using meta tag information
Hunt et al. XML and Java

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTEL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CONFORMATIVE SYSTEMS, INC;REEL/FRAME:018141/0727

Effective date: 20060815

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION