WO2004040447A2 - Hardware accelerated validating parser - Google Patents
- Publication number
- WO2004040447A2 (PCT/US2003/031315)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- token
- state table
- recited
- validation
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
Definitions
- the present invention generally relates to validating parser processing for parsing and validating documents, such as XML™ documents, for use in individual data processors interconnected by a network and, more particularly, to hardware validating processors for acceleration of the validation of such documents.
- certain character strings correspond to certain commands or identifications, including special characters and other important data (collectively referred to as control words), which allow data or operations to, in effect, identify themselves so that they may thereafter be treated as "objects" such that associated data and commands can be translated into the appropriate formats and commands of different applications in different languages in order to engender a degree of compatibility of respective connected platforms sufficient to support the desired processing at a given machine.
- the detection of these character strings is performed by an operation known as parsing, similar to the more conventional usage of resolving the syntax of an expression, such as a sentence, into its component parts and describing them grammatically.
- This processing is known as validation and generally proceeds in much the same fashion as processing for finding character strings of interest discussed above but operating on sixteen-bit (or longer) tokens corresponding to sequences of bytes rather than single eight-bit (or longer) bytes representing characters and checking for consistency between tokens and the content or arguments of other tokens to accommodate the self-definition characteristics and properties of languages such as XML™, SGML™ (of which XML™ is a simplified form) and HTML™ (which is essentially a special case of XML™) which support platform independence and interconnectivity.
- Both the parsing for finding tokens and the parsing for validation are generally implemented using a conceptually table-based finite state machine (FSM) or state table to search for these strings of interest or consistency between elements found and represented by tokens.
- the state table resides in memory and is designed to search for the specific patterns of characters or tokens in the document.
- the current state is used as the base address into the state table and the ASCII representation of the input character or the token is an index into the table.
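The lookup described here can be sketched in software as a flat table in which the current state selects a row and the character code selects a column. This is a minimal Python sketch rather than the patented hardware; `ROW_WIDTH`, `build_table` and `run` are hypothetical names, and a 7-bit ASCII input alphabet is assumed.

```python
ROW_WIDTH = 128  # assumption: 7-bit ASCII input alphabet, one column per code

def build_table(num_states, transitions, default=0):
    """Build a flat state table; unlisted (state, char) pairs go to `default`."""
    table = [default] * (num_states * ROW_WIDTH)
    for (state, ch), nxt in transitions.items():
        table[state * ROW_WIDTH + ord(ch)] = nxt
    return table

def run(table, text, state=0):
    """Step through `text`, one table read per input character."""
    for ch in text:
        base = state * ROW_WIDTH        # current state acts as the base address
        state = table[base + ord(ch)]   # character code indexes into that row
    return state

# Toy machine that ends in state 2 when the input ends with "<a".
TABLE = build_table(3, {(0, "<"): 1, (1, "a"): 2, (1, "<"): 1})
```

Each input character costs exactly one memory read, which is why the table size, and not the matching logic, tends to dominate such a design.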
- Character strings of interest may be of any of several types such as an element, an attribute/attribute list or data and elements may be simple elements or aggregates and may be nested.
- the parsing for validation principally looks at the types of character strings presented and the nesting itself to determine which elements or tokens are associated with another specific token(s) and the hierarchical relationship between them.
- the goal of this processing is not only to determine that the document is a valid document that conforms to the language (e.g. XML™) standard and has the correct structure as defined by a DTD or XML™ schema in its entirety but to develop a hierarchical data structure such as a tree structured document object in which the structure will fully represent the informational content of the data. Therefore, while parsing to find character strings of interest is very time consuming and processor intensive, parsing for validation is much more so. That is, since the XML™ data, for example, are textual and not only the data but the data structure, which may be freely specified to express the informational content, must be extracted from such text, it can be readily appreciated that the required processing is particularly time consuming and processor intensive.
- it is an object of the present invention to provide a hardware accelerator for validation processing in which substantial performance gains are derived with limited hardware.
- a method of accelerating validation parsing and a hardware accelerated validation parser for a tokenized text document in a computer language supporting platform independence and interoperability, comprising an arrangement for retrieving data from both a data dictionary and a state table in accordance with a token, an adder for merging the data from the state table and the data dictionary to form a transition control word, an adder for merging part of the transition control word with another token to retrieve further data from the state table, and logic for forming a tree structured data object corresponding to the tokenized text document under control of part of the transition control word.
- Figure 1 is a high level schematic diagram of the hardware validating parser accelerator in accordance with the invention
- Figure 1A is a diagram of an exemplary tree structured document object illustrative of formation thereof
- Figure 2 is a diagram of a preferred logical layout of state tables in accordance with the invention.
- Figure 3 is a diagram illustrating a preferred format of a state table entry
- Figure 4 is a diagram illustrating a preferred logical layout of the element and attribute buffer of Figure 1 in accordance with the invention
- Figure 5 is a diagram of a preferred exemplary data dictionary entry format
- Figure 6 is a diagram illustrating a preferred logical layout of transition control words (TCW) in accordance with the invention
- Figure 7 is a flow chart illustrating an overview of the operation of the validating parser accelerator illustrated schematically in Figure 1
- Figure 8 is a flow chart illustrating operation of the invention to implement TCW update rules
- Figures 9, 10 and 11A - 11E are flow charts illustrating TCW operations.
- This tokenized document is stored in memory/buffer 110 which is accessed in accordance with registers 112, 114 and 116 containing the document base address, the document limit address and the document next address, respectively.
- Tokens are fetched in sequence and stored in token buffer 120. Portions of the token are then used to provide a portion of an address to adder 130
- Stack 190 is basically for the purpose of interchanging order of tokens as they are considered in order to accommodate evaluation of nested tokens and to determine if the associations or relationships between parent and child elements are correct and well-constructed. Control of the stack is derived from a combination of information from the state table 160 and the data dictionary 150 which is derived in parallel based on the current token. Therefore, stack 190 may be considered as accommodating particular features (e.g. aggregates) of XML™ syntax, and further discussion is deferred until the detailed discussion below.
- the gross function of loop A is to derive information from the state tables including a next state, add further data at adder 170, derive data, stack commands, interrupts and control signals for developing the final tree structured data object (TSDO) from the combined signals in the TCW register 180 and add the next state and/or the base address in registers 192 and 194, respectively to the next token.
- the extraction of a control signal for providing an addition of a node to the TSDO corresponds to a validation of the current token and a detected error will be represented by the issuance of an interrupt.
- the TCW is basically a combination of control flags and state transition controls from the state table 160 and data dictionary 150 with certain update and interrupt and stack rules applied.
- the TCW register thus functions to merge and redirect control signals and data retrieved from among the contents of the state table 160 and the data dictionary 150 which represents the permissible syntax of tokens such that the effect of progress through the state table is controlled in a manner to detect errors and, while no error has yet been detected, to properly construct the TSDO 300 with data buffered at element and attribute buffer 200 in accordance with base and limit addresses supplied from registers 302, 304 respectively.
- This architecture supports standard and established types of document definitions being used as a data dictionary by virtue of accommodation by the state table contents which can be arranged at will to do so.
- the accelerator processing unit 100 is responsible for orchestrating the operations of the invention which will be described below and to construct the TSDO.
- a host processor 400 which interfaces with the accelerator processing unit is also provided and used to control initiation of operation of the hardware accelerator in accordance with the invention. It will be appreciated from the following that the role of the host processor 400 in the validation parsing process is reduced, by virtue of the hardware accelerator of the invention, to operations requiring only simple memory accesses and to interrupt servicing. In essence, substantially all validation processing overhead corresponding to the following of document syntax and element nesting and aggregate elements is removed to the validation parsing accelerator and the processor operations are limited to simple responses to processing calls, as needed, from the hardware accelerator 100.
- a support processor can also be provided to handle some or all of the processing calls from the hardware accelerator 100.
- the TSDO is preferably constructed in memory as a doubly linked list data structure having seven elements (e.g. sibling element, child element, attribute list, name length, name pointer, value length and value pointer) per member/node as illustrated in the TSDO member layout.
- the eight rows of elements depicted, each corresponding to the TSDO member layout, thus correspond to eight individual members of the TSDO and are offset horizontally to indicate sibling and parent/child relationships.
- in a doubly linked list, several elements each contain two pointers, p and n, to the previous and next member containing the previous and next element of the specified or same type.
- a "next sibling element pointer" points to the next member equally indented with the current member.
- pointers are provided to and from attribute list elements and the attributes of members.
- the "previous child element" points to the parent member.
- previous and next pointers form chains among sibling members.
- the remaining elements are lengths and locations of actual data. From this information five controls are preferably provided in correspondence with a current member/node (e.g. the parent node, the current element node, the current attribute node, the element parent and the first attribute), as illustrated. These controls basically track whether the current member is an element or an attribute and the nodes on which they exist, the first attribute of the attribute list of an element, the immediate parent member and the root node. Following these controls allows traversing of the tree to locate any desired information while the nature of each piece of information is fully and unambiguously defined in the structure of the TSDO.
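The seven-element member layout can be pictured as a small record with three previous/next pointer pairs and four data fields. The following Python sketch is an illustration under assumptions: the class and field names (`Link`, `Member`, `link_siblings`, `sibling_chain`) are hypothetical, and Python object references stand in for memory addresses.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)
class Link:
    """A previous/next pointer pair (the p and n pointers of the text)."""
    p: Optional["Member"] = None
    n: Optional["Member"] = None

@dataclass(eq=False)
class Member:
    """One TSDO member/node holding the seven elements named in the text."""
    sibling: Link = field(default_factory=Link)
    child: Link = field(default_factory=Link)
    attributes: Link = field(default_factory=Link)
    name_length: int = 0
    name_pointer: int = 0
    value_length: int = 0
    value_pointer: int = 0

def link_siblings(left, right):
    """Chain two members at the same nesting level in both directions."""
    left.sibling.n = right
    right.sibling.p = left

def sibling_chain(first):
    """Follow next-sibling pointers from `first` to the end of the chain."""
    out, node = [], first
    while node is not None:
        out.append(node)
        node = node.sibling.n
    return out

a, b, c = Member(name_pointer=0), Member(name_pointer=8), Member(name_pointer=16)
link_siblings(a, b)
link_siblings(b, c)
```

Walking `sibling_chain(a)` visits the three members in document order, and the p pointers allow the same chain to be walked backwards.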
- the elements and data will be embodied in a file which is essentially text but following rules which can be freely structured for different classes of documents.
- the rules embodied in the text document are, in effect, summarized in a file called "document model" which is used to validate the text document and which may be considered as separate from the text file although it need not be.
- the rules may be freely defined by a developer but some standardized sets of rules exist and which are often utilized as a matter of convenience.
- a document model defines the elements which can appear within the document along with attributes that can be associated with a given element and also defines structural information about the document such as child-parent relationships of elements, the sequence in which the child elements can appear and the number of child elements as well as whether an element is empty or can include text as well as default values for attributes.
- Document type definitions (DTDs) are a well-known example of a description of XML™ document models.
- the DTD language was developed specifically for defining validation rules for SGML™ documents.
- XML™ is a simplified sub-set of SGML™ and DTDs can also be used to define XML™ validation rules. It should be recognized, however, that since the information required for validation of a particular document or class of documents must be the same regardless of the form in which it is transmitted or utilized, conversion between types of expressions of the validation information (e.g. DTDs and XML™ schema) should, in theory, be substantially trivial and discussion in terms of DTDs should be equally applicable to any other form of the same information. By the same token, details of DTD syntax, which is very complex with heavy reliance on typographical symbols, are not of importance to the principles of the invention and need not be discussed in detail.
- XML™ documents and documents in other languages supporting platform independence and interoperability
- a software module capable of reading XML™ documents and providing access to their content and structure is referred to as an XML™ processor or XML™ API, which may also be freely implemented by developers although it is the common and recommended practice to use accepted, commercially available and industry standard APIs, generally as a matter of being able to run under any compliant implementation of the same API.
- the DOM is based on an in-memory tree representation of the XML™ document.
- when an XML™ document is loaded into a processor, the processor must build an in-memory tree structure which properly represents the document.
- the DOM also defines the programmatic interface (including the names of the methods and properties) that should be used to programmatically traverse an XML™ tree and manipulate its elements, values and attributes.
- the TSDO data structure developed in the course of validation supports the DOM APIs or other similar APIs and implementations allowing use of the content of the document.
- processing for validation parsing of an XML™ or other document in a language supporting interoperability using software on a general purpose computer can be extremely processor intensive and is slowed by the need for many memory accesses for multiple comparisons in regard to the content, structure and syntax specified for each element of a DTD or the like.
- the number of elements in a DTD or the like is theoretically unlimited and can run into the thousands in relatively simple data structures, while the number of attributes, child elements and sibling elements may be as large as necessary and the data document may easily contain millions of instances of any given element or attribute.
- the very generality which must be accommodated in software on a general purpose processor imposes major processing complexities and burdens.
- the hardware accelerated validation parser in accordance with the invention handles data in a relatively simple and consistent pipelined manner which can be performed at very high speed with a relatively small amount of hardware since the comparisons and the rules with which the comparisons are performed are embodied in the DTD embodied in the data dictionary and the signals present in the state table entries which can be rapidly merged and redirected to control parallel operations of the processor concurrent with the rapid sequential evaluation of tokens in loop A of Figure 1.
- the logical layout of the state table 160 is schematically shown in Figure 2. The state tables are built in accordance with a particular DTD or the like and accommodate all permissible elements.
- the token effectively contains the next state since the next state is specified in respective state table entries (as shown in Figure 3 and discussed below) which are, in part, indexed by the sixteen bit token values.
- the value in the token buffer 120 (or a portion of the value) is combined at adder 130 with the state table base address and the next state offset address to access an entry in the state table.
- the state table is preferably arranged in sections indexed by the state table base address to reduce high-speed storage requirements, with columns of a state table section indexed using the token (which can be optimized through use of a palette mechanism) and rows indexed by the next state offset.
- the three portions of the address can be simply concatenated to form the full address as can be done, for example, by simply loading different portions of a register from the respective address portion sources.
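Concatenating the three address portions amounts to placing each one in its own bit field, which gives the same result as adding them when the fields do not overlap. The Python sketch below illustrates this; the sixteen-bit token width comes from the text, while the eight-bit offset width, the field order and the function names are assumptions made for the example.

```python
TOKEN_BITS = 16    # token width stated in the text
OFFSET_BITS = 8    # assumption: width of the next state offset field

def entry_address(base, next_state_offset, token):
    """Concatenate section base, row (next state offset) and column (token)."""
    if not 0 <= token < (1 << TOKEN_BITS):
        raise ValueError("token does not fit in its field")
    if not 0 <= next_state_offset < (1 << OFFSET_BITS):
        raise ValueError("next state offset does not fit in its field")
    return ((base << (OFFSET_BITS + TOKEN_BITS))
            | (next_state_offset << TOKEN_BITS)
            | token)

def entry_address_by_adding(base, next_state_offset, token):
    """Same address computed with addition, valid because fields are disjoint."""
    return ((base << (OFFSET_BITS + TOKEN_BITS))
            + (next_state_offset << TOKEN_BITS)
            + token)
```

In hardware the concatenating form needs no carry logic at all, which is presumably why loading separate portions of a register is mentioned as an option.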
- Each state table entry is preferably of sixty-four bits in length and divided into two thirty-two bit words.
- Other formats could be used as will be apparent to those skilled in the art.
- the lower address word is, in turn, divided into a sixteen bit token value and two eight bit flag fields for the token flags and the control flags, respectively.
- the tokens are preferably defined in a manner consistent with a hardware parser accelerator described in concurrently filed U. S. Provisional Patent Application 60/ , (Attorney's docket No FS-00766/02890051PR) , hereby fully incorporated by reference and their form is otherwise not of particular importance to the practice of the invention in accordance with its basic principles.
- the sixteen-bit token value field is somewhat redundant for the present invention since the token value is already being used for indexing the state table columns and thus could be designated as a reserved field.
- the token flags are principally provided to track nesting, whether or not a given element is an aggregate, and the type of element represented by the token. Individual flags represented by each of the eight bits of the field are preferably: Increment nesting, element is an aggregate, new element name, element value, attribute name, attribute value, end of element and decrement nesting, respectively.
- control flags represented by individual bits of the control flag field are preferably: set end interrupt to host/main processor, set special interrupt to host/main processor, (Reserved), stop state table engine processing (these control flags are copied into the TCW as shown in Figures 9 and 10), save element or attribute name, save element or attribute value, character palette skip enable (which is largely redundant but included to correspond to the hardware processor accelerator described in the above-incorporated application and could be a reserved field here), and end current token.
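A sixty-four bit state table entry with a sixteen-bit token value and two eight-bit flag fields in its lower word could be decoded as sketched below. The flag names follow the lists in the text, but the bit positions (first listed flag in bit 0), the placement of the token value in the high half of the lower word, and the treatment of the upper word as an opaque value are all assumptions of this Python sketch.

```python
from enum import IntFlag

class TokenFlag(IntFlag):
    INCREMENT_NESTING = 1 << 0
    AGGREGATE = 1 << 1
    NEW_ELEMENT_NAME = 1 << 2
    ELEMENT_VALUE = 1 << 3
    ATTRIBUTE_NAME = 1 << 4
    ATTRIBUTE_VALUE = 1 << 5
    END_OF_ELEMENT = 1 << 6
    DECREMENT_NESTING = 1 << 7

class ControlFlag(IntFlag):
    END_INTERRUPT = 1 << 0
    SPECIAL_INTERRUPT = 1 << 1
    RESERVED = 1 << 2
    STOP_ENGINE = 1 << 3
    SAVE_NAME = 1 << 4
    SAVE_VALUE = 1 << 5
    PALETTE_SKIP_ENABLE = 1 << 6
    END_CURRENT_TOKEN = 1 << 7

def decode_entry(entry):
    """Split a 64-bit entry into its two 32-bit words and the lower-word fields."""
    lower = entry & 0xFFFFFFFF
    return {
        "upper_word": entry >> 32,                      # contents not modeled here
        "token_value": lower >> 16,                     # 16-bit token value
        "token_flags": TokenFlag((lower >> 8) & 0xFF),  # 8-bit token flags
        "control_flags": ControlFlag(lower & 0xFF),     # 8-bit control flags
    }

example = (5 << 32) | (0x0041 << 16) | (0b00000101 << 8) | 0b00010000
decoded = decode_entry(example)
```

With this assumed layout, `decoded["token_flags"]` carries the increment-nesting and new-element-name bits and `decoded["control_flags"]` carries the save-name bit.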
- This data from the state table is merged with data from the data dictionary (corresponding to a DTD, XML™ schema or the like) at adder 170, preferably as a concatenation thereof, and portions of the combined result are provided to the element and attribute buffer, the preferred logical layout of which is illustrated schematically in Figure 4.
- the preferred format of the data dictionary entry is illustrated in Figure 5.
- the data dictionary is preferably structured as a hash table; the hash key being derived at 125 from the token.
- the preferred format of the data dictionary entry is one hundred twenty-eight bits in length.
- the entries are derived from the DTD, XML™ schema or the like describing the rules which are (to be) followed by the data document in XML™ or other language supporting interoperability.
- type flags are provided for sixteen bits completing a byte. Eight bits are provided for type flags. These bits are used to identify the data type (e.g. Boolean, binary, decimal, etc.) associated with the element. With eight bits, 256 different data types can be identified using the type flags field. It should be noted that the type flags field is not necessary to the operation of the invention in accordance with its basic principle but allows for upgrading to allow validation of value to match against the element data type. These fields are followed by three thirty-two bit words providing the state table base address, and two pointers to the data pattern and range reference and attribute rule reference, respectively.
- the stack command flags, the type flags and the state table base address associated with the current token are directed to the TCW register 180 while the remainder are used for comparisons with the token and/or the associated text of the (e.g. XML™) document, preferably in special purpose logic circuits which may be free-running and which respond very quickly.
- the preferred logical layout of the transition control word (TCW) and the register 180 provided to buffer it is illustrated in Figure 6.
- Use of a buffer register is, in theory, not necessary but is considered desirable and thus preferable as a simple and inexpensive expedient to assure synchronism as data and control signals are redirected, as alluded to above. It is important to an understanding of the invention to observe the sources of signals applied to the TCW and the respective portions of the validating parser architecture of Figure 1 to which they are directed. Details of the source/sink for each field are shown in Figures 8 - 10.
- the preferred logical layout of the TCW as shown in Figure 6 comprises three thirty-two bit words including the state table base address received from the data dictionary and forwarded to register 194 and the next state offset received from the state table and forwarded to register 192.
- the remaining thirty-two bits include a four bit stack command field received from the data dictionary and used to control operation of the stack 190, a four bit aggregate status flag field (of which only two bits are preferably used to indicate whether the current token and/or the previous token are aggregates since aggregates can include elements at different levels of a tree structure), an eight-bit type flag field received from the data dictionary (the derivation being shown in Figure 8 and their usage shown in Figure 10, as noted above), and two eight-bit fields for the token flags and control flags received from the state table and used to control EAB and TSDO operations as shown in Figure 10.
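The ninety-six bit TCW can be modeled as a packed integer built from the field widths given in the text (two thirty-two bit words plus 4 + 4 + 8 + 8 + 8 bits). In the Python sketch below the field order within the word, the field names and the helper functions `pack_tcw` and `unpack_tcw` are assumptions chosen for illustration.

```python
# (name, width in bits), most significant field first; the order is an assumption
TCW_FIELDS = [
    ("state_table_base", 32),
    ("next_state_offset", 32),
    ("stack_command", 4),
    ("aggregate_status", 4),
    ("type_flags", 8),
    ("token_flags", 8),
    ("control_flags", 8),
]

def pack_tcw(fields):
    """Concatenate the named fields into a single 96-bit integer."""
    value = 0
    for name, width in TCW_FIELDS:
        part = fields[name]
        if not 0 <= part < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value = (value << width) | part
    return value

def unpack_tcw(value):
    """Recover the named fields from a packed TCW, least significant first."""
    out = {}
    for name, width in reversed(TCW_FIELDS):
        out[name] = value & ((1 << width) - 1)
        value >>= width
    return out

sample = {
    "state_table_base": 0x1000, "next_state_offset": 3,
    "stack_command": 1, "aggregate_status": 2,
    "type_flags": 7, "token_flags": 0x05, "control_flags": 0x10,
}
packed = pack_tcw(sample)
```

Keeping the layout in one table makes it easy to see that the widths sum to ninety-six bits and to change the assumed field order without touching the packing logic.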
- the data dictionary token value field is not carried over from either the data dictionary or the state table since (assuming these values compare favorably) the token value is readily available in the token buffer.
- FIG. 3A The hash value of a token text string associated with certain tokens (with token flags set to new element names) is used to index the data dictionary.
- the token value is a special numeric value assigned to the token and is used to look up the columns of the state table. Sometimes it is a generic value indicating that the token is representing a character string literal or an integer value. Other times it is an encoded number of an element name or a tag name.
- element and attribute buffer (EAB)
- the EAB collects and holds structure-specifying pointers to point to attributes or data around a specific node (not yet placed in the TSDO) to develop, when complete, the doubly linked structure TSDO as alluded to above with reference to Figure 1A.
- element nesting starting and ending counts may be counted/accumulated directly from the token flags (e.g. first and eighth bits) discussed above in connection with Figure 3 while the name base addresses and lengths for elements and attributes and their values corresponding to each token are readily available from the token buffer 120.
- the parent/child and aggregate relationship and type and control information for the current node of the TSDO 300 is directly available from the TCW as illustrated in Figure 6, allowing a new node to be added to the TSDO as each element/token of the tokenized document is validated.
- the operation of the hardware accelerator is started by initialization through loading the state table 160 and data dictionary 150 with data (705, 710) corresponding to the document to be parsed and validated in a format such as the preferred format discussed above.
- the tokenized document to be validated is then loaded (715) into memory 110, the stack control registers and state table base address are initialized (720,
- Processing of the tokenized document begins with extracting (735) the first (or next) token into the token buffer 120.
- the token is hashed into a hash key which is used for data dictionary look up operations.
- a look-up operation 740 is then performed in memory 140 to update the data dictionary base address with which a look-up operation 745 can be performed in the data dictionary 150 corresponding to the data dictionary base address.
- the state table may be concurrently accessed 750 using the new token and current contents of the state table registers 192 and 194 provided through adder 130.
- the TCW register is updated 755 with the data derived from the data dictionary 150 and the state table 160 in accordance with rules which will be discussed in detail below in connection with Figures 8 and 9.
- the EAB 200 is then updated 760 according to rules based on the flag settings in the TCW.
- An interrupt can concurrently be sent to host/main processor if the interrupt flag is set in the TCW in accordance with a control flag contained in the state table entry. If an interrupt is not issued, the information gathered in the EAB is added 765 to the TSDO in accordance with rules based on token and control flag settings, checked at 770 for issuing an interrupt, in the TCW which are also derived from the accessed state table entry. Concurrently, a push, pop or pass-through operation 775 is performed in stack 190 to support multi-level aggregates (e.g. XML™ elements which are made up of other XML™ elements).
- the stack command field "pop" flag bit is set and the type flags and state table base address fields are cleared 845, as shown in Figure 9. Otherwise, the stack command field "pass-through" flag bit is set and the type flags and state table base address fields are cleared 850.
- the control flag bits derived directly from the state machine output for entry corresponding to the current token are simply followed (as indicated by dashed arrows) or tested in parallel to provide an end interrupt or special interrupt to the host/main processor as indicated at 910 or 920.
- the state table base address and the next state offset are pushed onto the stack 190 and the state table base address register 192 and next state offset register 194 are updated from the corresponding TCW fields as indicated at 930.
- if the stack command flag corresponds to a "pop" command, the state table base address and next state offset values are popped off of the stack and used to update registers 192, 194 respectively, as indicated at 940.
- registers 192, 194 are updated from the respective TCW fields with no operation being performed on the stack 190, as indicated at 950.
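The three stack commands can be summarized in a short software model. This Python sketch assumes that a push saves the current register contents before the registers are loaded from the TCW (the text does not say unambiguously which values are pushed); the `Engine` class and the command strings are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Engine:
    """Holds the state table base address and next state offset registers."""
    base: int = 0
    next_state: int = 0
    stack: list = field(default_factory=list)

    def apply(self, command, tcw_base, tcw_next):
        if command == "push":
            # Assumption: save where we were, then enter the nested table section.
            self.stack.append((self.base, self.next_state))
            self.base, self.next_state = tcw_base, tcw_next
        elif command == "pop":
            # Leave the nested section: restore the saved registers.
            self.base, self.next_state = self.stack.pop()
        elif command == "pass":
            # Pass-through: registers follow the TCW, stack is untouched.
            self.base, self.next_state = tcw_base, tcw_next
        else:
            raise ValueError(f"unknown stack command: {command}")

engine = Engine(base=1, next_state=2)
engine.apply("push", 10, 0)   # start of an aggregate element
engine.apply("pass", 10, 4)   # ordinary transition inside it
inside = (engine.base, engine.next_state)
engine.apply("pop", 0, 0)     # end of the aggregate element
```

After the final pop the registers return to the values they held before the aggregate was entered, which is the behavior needed to resume validation of the parent element.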
- the operations in the EAB and TSDO are also preferably controlled by flags in the TCW as illustrated in Figure 10. If the "end of an element" token flag is set and the "element is an aggregate" token flag is also set, the TSDO operation to complete a child element, as will be discussed below, is triggered, as shown at 1010. Otherwise, the "save element or attribute name" flag is used in operations 1020 and 1030 (and will be assumed to be set in the following discussion) in combination with other flags to trigger appropriate operations based on the element type as may be reflected in those flags.
- if the "new element name" flag is additionally set, the element name base address and element name length fields in the EAB are updated 1022 from the token buffer 120 and the element value base address and length fields are cleared 1023 in the EAB. If (1024) the "attribute name" flag is set, the attribute base address and length fields are reset 1025 in the EAB from the token buffer 120 and the attribute value base address and length fields are cleared 1026 in the EAB. If (1031) the "element value" flag is set, the element value base address and length fields are updated 1032, 1033 from the token buffer. If (1034) the "attribute value" flag is set, the attribute value base address and length fields are updated 1035, 1036 from the token buffer 120. In summary, chosen fields in EAB 200 are updated and cleared based on element type. If the "save element or attribute name and the
- attribute value flags trigger the TSDO to start the "add attribute” operation.
- the five TSDO operations alluded to above may be performed autonomously under control of the accelerator processing unit 400 once initiated, as described above, by the hardware validation parser accelerator in accordance with the invention. Such processing may be performed concurrently with the validation parsing operation, further supporting the acceleration of processing by the invention. All of these operations are very simple, short and straightforward and thus may be executed quickly with little, if any, host processor burden.
- the "add sibling element" operation illustrated in Figure 11A comprises allocation of a new TSDO entry, setting the "next" pointer of the current node to the newly allocated entry address, setting the "previous" pointer of the new entry to the current node address, copying the EAB element name base address and length fields to the corresponding fields (see Figure 1A) in the new entry, and setting the "current element node" and the "element first attribute" of the TSDO controls (Figure 3) to the new entry and null, respectively.
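The steps of the "add sibling element" operation map directly onto a few pointer assignments. The Python sketch below is illustrative only; `Entry`, `Controls` and `add_sibling_element` are hypothetical names, object references stand in for entry addresses, and only the fields needed by this operation are modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Entry:
    name_base: int = 0
    name_length: int = 0
    prev: Optional["Entry"] = None   # sibling "previous" pointer
    next: Optional["Entry"] = None   # sibling "next" pointer

@dataclass
class Controls:
    current_element_node: Optional[Entry] = None
    element_first_attribute: Optional[Entry] = None

def add_sibling_element(controls, eab_name_base, eab_name_length):
    """Append a new element entry after the current element node."""
    new = Entry(name_base=eab_name_base, name_length=eab_name_length)  # allocate and copy
    current = controls.current_element_node
    if current is not None:
        current.next = new       # "next" of current node -> new entry
        new.prev = current       # "previous" of new entry -> current node
    controls.current_element_node = new
    controls.element_first_attribute = None   # new element has no attributes yet
    return new

first = Entry(name_base=100, name_length=4)
controls = Controls(current_element_node=first, element_first_attribute=Entry())
second = add_sibling_element(controls, 200, 6)
```

Because every step is a fixed number of writes, the operation takes constant time regardless of how large the tree has grown.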
- the "complete child element" operation illustrated in Figure 11C is performed by traversing the TSDO structure formed to that point starting with the entry pointed to by "current element node" TSDO control field and using the sibling element "p" pointer to move up the tree hierarchy until a sibling element "p" pointer becomes null. The pointer in the "child entry" "p" pointer of that TSDO entry is then copied to the "current element node" TSDO control field. Then the "nesting entry start count" TSDO control field is copied to the "nesting level end count" TSDO control field and the "nesting entry start count" TSDO control field is decremented.
- the "add attribute" TSDO operation begins with allocation of a new TSDO entry and copying the attribute name base address and length and attribute value base address and length pointers to the new entry. Then, if (1120) the "element first attribute" TSDO control register is null, the "current attribute node" TSDO control is set to point to the new TSDO entry and the "n" and "p" pointers of the current node and the new entry are linked as discussed above while "element first attribute" of the current node is set to the new entry (e.g. before the "p" pointer is added to the new entry). If the element first attribute is not null, the "n" and "p" pointers of the current node and the new entry are linked and the "current attribute node" TSDO control is set to the new TSDO entry.
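The two branches of the "add attribute" operation can be sketched as follows. This Python model rests on an interpretation that the text leaves open: for the first attribute the "current node" is taken to be the owning element, linked through a hypothetical `attr_list` pointer, while for later attributes it is taken to be the previous attribute. All class, field and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Entry:
    name_base: int = 0
    name_length: int = 0
    value_base: int = 0
    value_length: int = 0
    prev: Optional["Entry"] = None        # "p" pointer
    next: Optional["Entry"] = None        # "n" pointer
    attr_list: Optional["Entry"] = None   # element -> first attribute (assumed)

@dataclass
class Controls:
    current_element_node: Optional[Entry] = None
    current_attribute_node: Optional[Entry] = None
    element_first_attribute: Optional[Entry] = None

def add_attribute(controls, eab_fields):
    """Allocate an attribute entry from EAB fields and link it into the list."""
    new = Entry(**eab_fields)
    if controls.element_first_attribute is None:
        owner = controls.current_element_node
        owner.attr_list = new                     # element points at first attribute
        new.prev = owner
        controls.element_first_attribute = new
    else:
        last = controls.current_attribute_node
        last.next = new                           # chain after previous attribute
        new.prev = last
    controls.current_attribute_node = new
    return new

element = Entry()
controls = Controls(current_element_node=element)
attr1 = add_attribute(controls, dict(name_base=1, name_length=2, value_base=3, value_length=4))
attr2 = add_attribute(controls, dict(name_base=5, name_length=6, value_base=7, value_length=8))
```

Tracking the first attribute separately from the current one lets a reader of the tree find the head of the attribute list without walking backwards from the tail.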
- the invention provides an apparatus and method to provide extremely rapid validation parsing of an XML™ or other language document supporting interoperability while removing such processing operations and complex supporting overhead from a host processor, resulting in substantial acceleration of the validation parsing process.
- the acceleration is particularly supported by the potentially and preferably autonomous operation of forming a TSDO in parallel with the parsing for validation and parallel retrieval of data dictionary and state table information.
- the hardware required to provide such acceleration is very simple and very limited in quantity and, hence, inexpensive and highly cost-effective. While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
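By way of illustration only, the "add sibling element", "complete child element" and "add attribute" TSDO operations described above may be modeled in software as follows. This is a minimal sketch under stated assumptions, not the hardware implementation: the `Entry` and `Controls` classes and all field and function names are hypothetical, the "child entry" "p" pointer is read here as a pointer from the first sibling back to its parent entry, and the first attribute of an element is assumed to be reached through a "first attribute" field of the element entry rather than through its sibling "n" pointer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Entry:
    """One TSDO entry (hypothetical layout; names are illustrative)."""
    name_addr: int = 0
    name_len: int = 0
    value_addr: int = 0
    value_len: int = 0
    n: Optional["Entry"] = None                # "next" pointer
    p: Optional["Entry"] = None                # "previous" pointer
    parent: Optional["Entry"] = None           # assumed reading of the "child entry" "p" pointer
    first_attribute: Optional["Entry"] = None  # assumed per-element attribute list head


@dataclass
class Controls:
    """TSDO control fields named in the description."""
    current_element_node: Optional[Entry] = None
    element_first_attribute: Optional[Entry] = None
    current_attribute_node: Optional[Entry] = None
    nesting_entry_start_count: int = 0
    nesting_level_end_count: int = 0


def add_sibling_element(ctl: Controls, name_addr: int, name_len: int) -> Entry:
    # Allocate a new entry and copy the element name base address and length.
    new = Entry(name_addr=name_addr, name_len=name_len)
    cur = ctl.current_element_node
    cur.n = new                         # "next" of current node -> new entry
    new.p = cur                         # "previous" of new entry -> current node
    ctl.current_element_node = new
    ctl.element_first_attribute = None
    return new


def complete_child_element(ctl: Controls) -> None:
    # Follow sibling "p" pointers until one is null (the first sibling).
    node = ctl.current_element_node
    while node.p is not None:
        node = node.p
    # Return to the parent recorded in the first sibling.
    ctl.current_element_node = node.parent
    ctl.nesting_level_end_count = ctl.nesting_entry_start_count
    ctl.nesting_entry_start_count -= 1


def add_attribute(ctl: Controls, name_addr: int, name_len: int,
                  value_addr: int, value_len: int) -> Entry:
    new = Entry(name_addr, name_len, value_addr, value_len)
    if ctl.element_first_attribute is None:
        # First attribute of this element: hang it off the element entry.
        elem = ctl.current_element_node
        elem.first_attribute = new
        new.p = elem
        ctl.element_first_attribute = new
    else:
        # Later attribute: chain it after the current attribute node.
        prev = ctl.current_attribute_node
        prev.n = new
        new.p = prev
    ctl.current_attribute_node = new
    return new


# Small walk-through with made-up addresses and lengths.
root = Entry(name_addr=0x00, name_len=4)
first = Entry(name_addr=0x10, name_len=5, parent=root)
ctl = Controls(current_element_node=first, nesting_entry_start_count=2)
second = add_sibling_element(ctl, name_addr=0x20, name_len=5)
a1 = add_attribute(ctl, 0x30, 2, 0x34, 3)
a2 = add_attribute(ctl, 0x40, 2, 0x44, 3)
complete_child_element(ctl)
```

In this model each operation touches only a handful of pointers and control fields, which is consistent with the statement above that the operations are simple and short. The walk-through adds a second sibling, gives it two attributes, and then closes the nesting level so that the "current element node" control returns to the parent entry.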
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Document Processing Apparatus (AREA)
- Devices For Executing Special Programs (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP03809943A EP1579321A2 (en) | 2002-10-29 | 2003-10-03 | Hardware accelerated validating parser |
CA002504491A CA2504491A1 (en) | 2002-10-29 | 2003-10-03 | Hardware accelerated validating parser |
JP2004548350A JP2006505044A (en) | 2002-10-29 | 2003-10-03 | Validation parser accelerated by hardware |
AU2003277250A AU2003277250A1 (en) | 2002-10-29 | 2003-10-03 | Hardware accelerated validating parser |
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US42177402P | 2002-10-29 | 2002-10-29 | |
US42177302P | 2002-10-29 | 2002-10-29 | |
US42177502P | 2002-10-29 | 2002-10-29 | |
US60/421,775 | 2002-10-29 | ||
US60/421,774 | 2002-10-29 | ||
US60/421,773 | 2002-10-29 | ||
US10/334,086 US7080094B2 (en) | 2002-10-29 | 2002-12-31 | Hardware accelerated validating parser |
US10/334,086 | 2002-12-31 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2004040447A2 (en) | 2004-05-13 |
WO2004040447A3 (en) | 2004-09-30 |
Family
ID=32234360
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2003/031315 WO2004040447A2 (en) | 2002-10-29 | 2003-10-03 | Hardware accelerated validating parser |
Country Status (6)
Country | Link |
---|---|
EP (1) | EP1579321A2 (en) |
JP (1) | JP2006505044A (en) |
KR (1) | KR20050072777A (en) |
AU (1) | AU2003277250A1 (en) |
CA (1) | CA2504491A1 (en) |
WO (1) | WO2004040447A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005064461A1 (en) * | 2003-12-18 | 2005-07-14 | Intel Corporation | Efficient small footprint xml parsing |
US9411853B1 (en) | 2012-08-03 | 2016-08-09 | Healthstudio, LLC | In-memory aggregation system and method of multidimensional data processing for enhancing speed and scalability |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8117347B2 (en) | 2008-02-14 | 2012-02-14 | International Business Machines Corporation | Providing indirect data addressing for a control block at a channel subsystem of an I/O processing system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5995963A (en) * | 1996-06-27 | 1999-11-30 | Fujitsu Limited | Apparatus and method of multi-string matching based on sparse state transition list |
US20020038320A1 (en) * | 2000-06-30 | 2002-03-28 | Brook John Charles | Hash compact XML parser |
US20020099734A1 (en) * | 2000-11-29 | 2002-07-25 | Philips Electronics North America Corp. | Scalable parser for extensible mark-up language |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3368883B2 (en) * | 2000-02-04 | 2003-01-20 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Data compression device, database system, data communication system, data compression method, storage medium, and program transmission device |
-
2003
- 2003-10-03 CA CA002504491A patent/CA2504491A1/en not_active Abandoned
- 2003-10-03 JP JP2004548350A patent/JP2006505044A/en active Pending
- 2003-10-03 AU AU2003277250A patent/AU2003277250A1/en not_active Abandoned
- 2003-10-03 KR KR1020057007621A patent/KR20050072777A/en not_active Application Discontinuation
- 2003-10-03 WO PCT/US2003/031315 patent/WO2004040447A2/en active Application Filing
- 2003-10-03 EP EP03809943A patent/EP1579321A2/en not_active Withdrawn
Non-Patent Citations (2)
Title |
---|
ANDRIVET ET AL: "A SIMPLE XML PARSER" July 1999 (1999-07), C/C++ USERS JOURNAL, R&D PUBLICATIONS, LAWRENCE, KS,, US, PAGE(S) 22,24,26-28,30,32 , XP008015172 ISSN: 1075-2838 the whole document * |
COOPER C: "Using Expat" 1 September 1999 (1999-09-01), , XP002177815 the whole document * |
Also Published As
Publication number | Publication date |
---|---|
EP1579321A2 (en) | 2005-09-28 |
AU2003277250A1 (en) | 2004-05-25 |
CA2504491A1 (en) | 2004-05-13 |
WO2004040447A3 (en) | 2004-09-30 |
JP2006505044A (en) | 2006-02-09 |
KR20050072777A (en) | 2005-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7080094B2 (en) | Hardware accelerated validating parser | |
US7458022B2 (en) | Hardware/software partition for high performance structured data transformation | |
US7437666B2 (en) | Expression grouping and evaluation | |
US7328403B2 (en) | Device for structured data transformation | |
Evert | The CQP query language tutorial | |
US20040172234A1 (en) | Hardware accelerator personality compiler | |
WO2006116649A2 (en) | Parser for structured document | |
KR20050083877A (en) | Intrusion detection accelerator | |
WO2004017166A2 (en) | Xml streaming transformer | |
US7752212B2 (en) | Orthogonal Integration of de-serialization into an interpretive validating XML parser | |
US8397158B1 (en) | System and method for partial parsing of XML documents and modification thereof | |
WO2005111824A2 (en) | Method and system for processing of text content | |
US7143101B2 (en) | Method and apparatus for self-describing externally defined data structures | |
WO2004040447A2 (en) | Hardware accelerated validating parser | |
Cameron | Rex: Xml shallow parsing with regular expressions | |
Møller | Document Structure Description 2.0 | |
US20080313620A1 (en) | System and method for saving and restoring a self-describing data structure in various formats | |
CN100380322C (en) | Hardware accelerated validating parser | |
JP2006505043A (en) | Hardware parser accelerator | |
Ryu | Parsing fortress syntax | |
Zhang | Efficient XML stream processing and searching | |
Bernstein et al. | CIFtbx: Fortran tools for manipulating CIFs | |
Libes | The NIST EXPRESS Toolkit | |
Team et al. | In the name of Allah, the Merciful, the Compassionate… Before indulging into the technical details of our project, we would like to start by thanking our dear supervisor, Prof. Dr. Mohammad Saeed Ghoneimy, for his continuous support, endless trust, and encouraging appreciation of our work. He indeed was a very important factor in the success of this project, as he smoothed away | |
Libes | NIST EXPRESS Toolkit: Using Applications National PDES Testbed Report Series |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase | Ref document number: 2003277250 Country of ref document: AU Ref document number: 2004548350 Country of ref document: JP |
WWE | Wipo information: entry into national phase | Ref document number: 2504491 Country of ref document: CA Ref document number: 1020057007621 Country of ref document: KR |
WWE | Wipo information: entry into national phase | Ref document number: 812/KOLNP/2005 Country of ref document: IN |
WWE | Wipo information: entry into national phase | Ref document number: 2003809943 Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 20038A61661 Country of ref document: CN |
WWP | Wipo information: published in national office | Ref document number: 1020057007621 Country of ref document: KR |
WWP | Wipo information: published in national office | Ref document number: 2003809943 Country of ref document: EP |