CN1726465A - Hardware accelerated validating parser - Google Patents

Publication number
CN1726465A
Authority
CN
China
Prior art keywords
data
token
state table
tsdo
control word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200380106166
Other languages
Chinese (zh)
Other versions
CN100380322C (en)
Inventor
迈克尔·C·达普
埃里克·C·莱特
吴赛伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lockheed Martin Corp
Original Assignee
Lockheed Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lockheed Corp
Publication of CN1726465A
Application granted
Publication of CN100380322C
Anticipated expiration
Expired - Fee Related

Abstract

A hardware accelerated validating parser is provided to remove a large portion, if not all, of the processing and overhead burden of validating parsing from a host processor by parallel access to both a state table and a data dictionary based on a token, and by merging and selective redirection of their respective outputs. A portion of a transition control word (TCW) formed by the merged data is used to advance through the state table, and a portion of the TCW is used to control formation of a tree structured data object (TSDO) corresponding to a text document in a language, such as XML™, that supports interoperability and platform independence. A stack is provided to accommodate nesting of elements and aggregate elements. The formation of the TSDO can be, and preferably is, performed asynchronously and autonomously in parallel with the validating parsing.

Description

Hardware-accelerated validating parser
Description
Background of the invention
Field of the invention
The present invention relates generally to the parsing and validation of documents used by individual data processors interconnected by a network, for example validating parsing of XML™ documents, and more particularly to a hardware validation processor for accelerating the validation of such documents.
Description of the prior art
In recent years, the field of digital communication between computers, and the linking of computers into networks, has developed rapidly, in many respects much like the rapid growth of personal computers some years earlier. This growth in interconnectivity and in the possibilities for remote processing has greatly increased the effective capability and functionality of the individual computers in such networked systems. However, the variety of uses of individual computers and systems, of the preferences of their users, and of the state of the art at the time the computers were placed into service has produced a substantial degree of variation in the capabilities and configurations of individual machines and their operating systems. Machines and their operating systems are collectively referred to as "platforms", and platforms are generally incompatible with one another to some degree, particularly at the level of operating system and programming language.
This incompatibility of platform characteristics, together with the simultaneous demand for communication and remote processing capability and for a degree of compatibility sufficient to support it, has led to the development of object-oriented programming (which accommodates the concept of treating applications and data as a more or less generalized set of modules through a reference system of entities, attributes and relationships) and of a number of programming languages that embody this concept. Extensible Markup Language™ (XML™) is one such language; it has come into widespread use and can be transmitted as a document over networks of arbitrary configuration and architecture.
In such languages, certain character strings correspond to certain commands or identifiers, including special characters and other important data (collectively referred to as control words). These allow data or operations to be, in effect, self-identifying so that they can subsequently be treated as "objects". Associated data and commands can thus be translated into the appropriate formats and commands of different applications in different languages, producing a degree of compatibility among the connected platforms that is sufficient to support the desired processing at a given machine. Detection of these character strings is performed by an operation known as parsing, which is similar to the more conventional usage of the term: resolving an expression, such as a sentence, into its component parts and describing them grammatically.
When an XML™ document is parsed, a large portion, and possibly the majority, of central processing unit (CPU) execution time is spent traversing the document in search of control words, special characters and other important data as defined for the particular XML™ standard being processed. This is typically done by software that interrogates each character and determines whether it belongs to a predefined set of strings of interest, for example a set of strings comprising "<command>", "<data type=dataword>", "</command>" and so on. If any of the target strings is detected, a token is saved together with a pointer to the location in the document where the token starts and the length of the token. These tokens are accumulated until the entire document has been parsed.
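By way of illustration only, the software scan described above can be sketched in Python as follows. The set of target strings is taken from the examples in the text; the `Token` record, the `scan` function and the use of a list index as the token kind are hypothetical names and choices made for this sketch, not part of the described system.

```python
from typing import List, NamedTuple

# Strings of interest, taken from the examples in the text.
STRINGS_OF_INTEREST = ("<command>", "<data type=dataword>", "</command>")


class Token(NamedTuple):
    kind: int    # index of the matched string in STRINGS_OF_INTEREST (assumed encoding)
    start: int   # offset in the document where the token begins
    length: int  # length of the matched string


def scan(document: str) -> List[Token]:
    """Interrogate each character position and save a token for every match."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(document):
        for kind, target in enumerate(STRINGS_OF_INTEREST):
            if document.startswith(target, pos):
                tokens.append(Token(kind, pos, len(target)))
                pos += len(target)
                break
        else:
            pos += 1  # no string of interest starts here; move to the next character
    return tokens
```

Every character position is tested against every target string, which is why this step consumes so much processor time on large documents.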
This process must then be followed by further processing that evaluates the tokens against the rules and definitions contained in a "document model" (for example a document type definition (DTD) or an XML™ schema), to ensure that the collection of tokens, and the strings they represent in the document, are well-formed and constitute an unambiguous and internally consistent document. This processing is generally referred to as validation. It is usually performed in much the same way as the search for strings of interest discussed above, except that it operates on 16-bit (or longer) tokens corresponding to byte sequences rather than on single 8-bit (or longer) bytes representing individual characters, and that it tests the consistency of the contents or arguments of tokens with other tokens against the custom features and capabilities of languages that support platform independence and interconnectivity, such as XML™, SGML™ (of which XML™ is a simplified form) and HTML™ (which is essentially a special case of XML™).
Both parsing to find tokens and parsing for validation are usually implemented with a conceptual table-based finite state machine (FSM), or state table, which searches for the strings of interest or tests the consistency among the elements found and represented by tokens. The state table resides in memory and is designed to search for specific patterns of characters or tokens in the document. When parsing for strings of interest, the current state is used as the base address into the state table, and the ASCII representation of the input character or token is the index into the table. A string of interest may be of any of several types, such as an element, an attribute or attribute list, or data; an element may be simple or an aggregate, and may be nested. Parsing for validation principally examines the types represented by the strings to determine which elements or tokens are nested with themselves or with other particular tokens, and to determine the hierarchical relationships among them.
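The table lookup described above, with the current state selecting a base address and the character code indexing from that base, can be sketched as follows. The four states, the 128-column row width and the pattern "<a>" are assumptions chosen to keep the example small; they are not taken from the document.

```python
# Hypothetical states for a tiny table that recognizes the string "<a>".
START, SAW_LT, SAW_NAME, MATCHED = range(4)
NUM_SYMBOLS = 128  # assumed: one column per 7-bit ASCII code

# The table is a flat list, as it would be in memory. Each state owns a row of
# NUM_SYMBOLS entries; every entry defaults to START (no match in progress).
table = [START] * (4 * NUM_SYMBOLS)


def set_transition(state: int, char: str, next_state: int) -> None:
    table[state * NUM_SYMBOLS + ord(char)] = next_state


set_transition(START, "<", SAW_LT)
set_transition(SAW_LT, "a", SAW_NAME)
set_transition(SAW_NAME, ">", MATCHED)
# Seeing "<" while a match is in progress restarts the candidate match.
set_transition(SAW_LT, "<", SAW_LT)
set_transition(SAW_NAME, "<", SAW_LT)


def count_matches(text: str) -> int:
    """Advance through the table one character at a time and count matches."""
    state, matches = START, 0
    for ch in text:
        code = ord(ch)
        if code >= NUM_SYMBOLS:
            state = START  # outside the table's alphabet
            continue
        # Current state gives the base address; the character code is the index.
        state = table[state * NUM_SYMBOLS + code]
        if state == MATCHED:
            matches += 1
            state = START
    return matches
```

Each input character costs exactly one memory access, which is the property that makes table-driven parsing attractive for both software and hardware implementations.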
The goal of this processing is not only to determine whether the document conforms to the language standard (for example XML™) and, as a whole, is a valid document with the correct structure as defined by the DTD or XML™ schema, but also to develop a hierarchical data structure, such as a tree structured document object, that completely represents the information content of the data. Therefore, while parsing for strings of interest is very time-consuming and processor intensive, parsing for validation is even more so. That is, because XML™ data (for example) is text, and both the data and the data structure (which may be freely specified to express the information content) must be extracted from that text, it is readily understood that the required processing is especially time-consuming and processor intensive.
At the same time, the flexibility required to handle aggregate elements properly, and the potential complexity of processing nesting (which may extend over many levels), complicate the use of a special-purpose or hardware processor to reduce the processing load on the CPU of the local computer. That is, although it is generally recognized that special-purpose or hardware processors can often provide increased processing speed, that gain is reduced by the overhead of controlling the special-purpose processor itself, and as the processing function becomes more complex or requires more flexibility it becomes uncertain whether a special-purpose processor is practical or can provide a significant performance advantage. In general, increased complexity and/or flexibility of function can be provided only by a further increase in hardware requirements, which may be uneconomical for many applications or for the performance gain that is possible. For this reason, validating parsing has been performed on programmed general-purpose computers regardless of how much processing time is required.
Summary of the invention
The invention provides a hardware accelerator for validation processing in which a substantial performance gain is obtained with limited hardware.
In order to accomplish these and other objects of the invention, a method and a hardware accelerated validating parser are provided for accelerated validating parsing of a tokenized text document written in a computer language that supports platform independence and interconnectivity, comprising: means for retrieving data from a data dictionary and a state table in accordance with a token; an adder for merging the data from the state table and the data dictionary to form a transition control word; an adder for merging a portion of the transition control word with a further token to retrieve further data from the state table; and logic for forming, under control of a portion of the transition control word, a tree structured data object corresponding to the tokenized text document.
Brief description of drawings
The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:
Fig. 1 is a high-level diagram of a hardware validating parser accelerator according to the invention;
Fig. 1A is a diagram of a tree structured document object illustrating an example of its formation;
Fig. 2 is a diagram of a preferred logical layout of the state table according to the invention;
Fig. 3 is a diagram of a preferred format of a state table entry;
Fig. 4 is a diagram of a preferred logical layout of the element and attribute buffer of Fig. 1 according to the invention;
Fig. 5 is a diagram of a preferred exemplary data dictionary entry format;
Fig. 6 is a diagram of a preferred logical layout of the transition control word (TCW) according to the invention;
Fig. 7 is a flow chart giving an overview of the operation of the validating parser accelerator shown schematically in Fig. 1;
Fig. 8 is a flow chart of the operation of the invention in carrying out the TCW update rules; and
Figs. 9, 10 and 11A-11E are flow charts of TCW operations.
Detailed description of a preferred embodiment of the invention
Referring now to the drawings, and more particularly to Fig. 1, there is shown, in high-level schematic form, an overview of the hardware validating parser accelerator according to the invention. It is to be understood that the invention operates on a parsed document, or set of tokens, in which the tokens correspond to strings of interest in an original document that is generally in text format (typically an XML™ document). (Although the invention will be described in connection with its preferred application to parsing XML™ documents, it is to be understood that the principles of the invention, as will become more evident from the following description, are also applicable to parsing documents expressed in any programming language and, in particular, object-oriented programming languages, in order to identify objects and their structure relative to one another in accordance with fixed or user-definable rules.) The tokenized document is stored in memory/buffer 110, which is accessed in accordance with registers 112, 114 and 116, containing a document base address, a document limit address and a document next address, respectively.
Tokens are fetched in sequence and stored in token buffer 120. Portions of the token are then used to provide an address to adder 130 (which supplies the address into state table 160), an address into data dictionary 150, and part of an address into namespace map memory 140. (Namespace mapping is an incident of XML™ that avoids the problem of different developers using the same element name in different documents; it will be understood by those skilled in the art and need not be discussed further.) The base address into data dictionary 150 is provided by register 152, and a default address for namespace map memory 140 is provided from register 142.
Considering the remainder of Fig. 1 at a very high level of abstraction, adder 170, transition control word (TCW) register 180, stack 190, adder 130 and state table 160 form a loop, generally indicated by arrow A, which enables the state table to advance from one state to another. Stack 190 serves principally to reorder tokens so that nested tokens can be evaluated as they are considered, and so that it can be determined whether the associations and relationships between parent elements and child elements are correct and well-formed. Control of the stack is derived from a combination of information from state table 160 and information obtained in parallel from data dictionary 150 on the basis of the current token. Stack 190 can therefore be regarded as accommodating particular features of XML™ syntax (for example, aggregates), as will be discussed further in the detailed description below.
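As a software analogy only, the role a stack plays in checking the nesting of parent and child elements can be sketched as follows. The `(kind, name)` token representation and the function name are hypothetical, and the sketch does not attempt to reproduce the stack rules or the handling of aggregates performed by stack 190.

```python
from typing import Iterable, List, Tuple


def nesting_is_well_formed(tokens: Iterable[Tuple[str, str]]) -> bool:
    """Check nesting of (kind, name) tokens, where kind is 'start' or 'end'."""
    open_elements: List[str] = []
    for kind, name in tokens:
        if kind == "start":
            open_elements.append(name)  # push: a new level of nesting opens
        elif kind == "end":
            # pop: the end must match the innermost element still open
            if not open_elements or open_elements.pop() != name:
                return False
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    return not open_elements  # every opened element must have been closed
```

The depth of the stack at any moment equals the current nesting level, so arbitrarily deep nesting is handled without enlarging the state table.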
Thus the overall function of loop A is to obtain information from the state table, including the next state; to perform a further addition on that data at adder 170; to derive from the combined information in TCW register 180 the data, stack commands, interrupts and control signals used to build the final tree structured data object (TSDO); and to add the base address and/or next state held in registers 192 and 194 to the next token. The derivation of the control signals that add a node to the TSDO corresponds to validation of the current token, and a detected error is indicated by issuing an interrupt. The TCW holds a combination of state transition controls and control flags from data dictionary 150 and state table 160 that govern updates, interrupts and application of the stack rules.
The TCW register thus operates to merge and redirect the control signals and data (representing the permitted syntax of tokens) retrieved from the contents of state table 160 and data dictionary 150, so that advancing through the state table is controlled in a manner that detects errors and, if no error is detected, TSDO 300 is properly built from data buffered in element and attribute buffer 200, in accordance with the base address and limit address supplied by registers 302 and 304, respectively. This architecture supports the standards and aggregate types of the document definition embodied in the data dictionary by means of accommodations made in the state table contents, which can be set arbitrarily to provide that support. Accelerator processing unit 100 is responsible for coordinating the operation of the invention described below and for building the TSDO.
A host processor 400, interfaced with the accelerator processing unit, is also provided and is used to control initialization of the operation of the hardware accelerator according to the invention. As will be appreciated from the following, the task of host processor 400 in validating parsing is greatly reduced by use of the hardware accelerator of the invention; the only operations required of it are simple memory accesses and interrupt servicing. In essence, nearly all of the validation processing overhead associated with element nesting and aggregate elements under the document syntax is shifted to the validating parser accelerator, and processor operation is limited to simple responses (when needed) to calls for processing from hardware accelerator 100. A support processor may also be provided to handle some or all of the calls for processing from hardware accelerator 100.
With the foregoing overview as background, and as an aid to understanding the hardware accelerated validating parser of the invention, an exemplary TSDO 300, which is the goal of the validating parsing performed by the invention, will now be discussed with reference to Fig. 1A, which shows its formation. The TSDO is preferably constructed in memory as a doubly linked list data structure in which each member/node has seven elements (for example: sibling element, child element, attribute list, name length, name pointer, value length and value pointer), as shown in the TSDO member layout. Thus the eight rows of elements depicted (each conforming to the TSDO member layout) correspond to eight individual members of the TSDO, horizontally offset to indicate sibling and parent/child relationships.
To form the doubly linked list, several of the elements each contain two pointers, p and n, which point respectively to the previous member and the next member containing an element of the specified or same type. Thus the "next sibling element pointer" points to a member having the same indent as the current member. Similarly, pointers are provided to or from the attribute list element of a member and its attributes. The "previous child element" pointer points to the parent element. As shown in the third to fifth rows and in the second, sixth and seventh rows, the previous and next pointers form the chain between sibling members. The remaining elements are the length and location of the actual data. In accordance with this information, five controls (for example: root node, current element node, current attribute node, parent element and first attribute) are preferably provided for the current member/node, as shown. These controls principally track whether the current member is an element or an attribute, and the root node, the first attribute of the element's attribute list and the immediate parent member relative to which it is located. Because the nature of each piece of information is fully and unambiguously defined in the structure of the TSDO, following these controls allows the tree to be traversed in order to locate desired information.
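One possible reading of the TSDO member layout is sketched below as a linked node structure. The field names, the use of Python object references in place of memory pointers, and the helper functions are assumptions made for illustration; the actual layout is the one shown in Fig. 1A.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(eq=False)
class TsdoNode:
    # Location and length of the actual data in the source text.
    name_ptr: int = 0
    name_len: int = 0
    value_ptr: int = 0
    value_len: int = 0
    # Sibling chain: previous (p) and next (n) members at the same indent.
    prev_sibling: Optional["TsdoNode"] = None
    next_sibling: Optional["TsdoNode"] = None
    # Child chain: the "previous" side leads back to the parent element.
    parent: Optional["TsdoNode"] = None
    first_child: Optional["TsdoNode"] = None
    # Head of the attribute list for this element.
    first_attr: Optional["TsdoNode"] = None


def append_child(parent: TsdoNode, child: TsdoNode) -> None:
    """Link child under parent, after any existing children."""
    child.parent = parent
    if parent.first_child is None:
        parent.first_child = child
        return
    last = parent.first_child
    while last.next_sibling is not None:
        last = last.next_sibling
    last.next_sibling = child
    child.prev_sibling = last


def name_of(node: TsdoNode, text: str) -> str:
    return text[node.name_ptr:node.name_ptr + node.name_len]


def walk(node: TsdoNode, depth: int = 0) -> Iterator[Tuple[int, TsdoNode]]:
    """Depth-first traversal that follows child and sibling pointers."""
    yield depth, node
    child = node.first_child
    while child is not None:
        yield from walk(child, depth + 1)
        child = child.next_sibling
```

Because each node stores only pointers and lengths, the names and values themselves stay in the source buffer and are never copied into the tree.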
Similarly, a review of the characteristics of the data documents on which the invention operates may be helpful to an understanding of the invention. In the exemplary case of XML™, elements and data are embodied in a document that is essentially text but follows rules that may be freely constructed for different types of documents. The rules embodied in the text are in fact summarized in a document referred to as a "document model", which is used to validate the text document and which can be regarded as separate from it, although this is not required. The rules can be freely defined by a developer, but some standardized rule sets exist and are often used for convenience.
At present, the document defining the rules to be followed by an XML™ document, or by a document in any other interoperable language, is referred to as a document model and generally follows one of several forms, although other forms have not yet been fully developed. The document model defines the elements that may appear in a document together with the attributes that may be associated with a given element. It also defines structural information about the document, such as the child-parent relationships of elements, the order in which child elements may appear, the number of child elements, whether an element is empty or may contain text, and the default values of attributes. The document type definition (DTD) is a well-known example of an XML™ document model description.
The DTD language was developed specifically to define validation rules for SGML™ documents. Since, as alluded to above, XML™ is a simplified subset of SGML™, DTDs can also be used to define XML™ validation rules. It should be recognized, however, that because the information required to validate a particular document or class of documents must be the same regardless of the form in which it is conveyed or used, conversion between forms of expressing validation information (for example DTD and XML™ schema) should, in theory, be quite trivial, and a discussion in terms of DTDs should apply equally to any other form of the same information. By the same token, the very intricate details of DTD syntax, which also depend heavily on typographical symbols, are not important to the principles of the invention and need not be discussed in detail.
It should further be understood that XML™ documents (and documents written in other languages that support platform independence and interconnectivity) principally provide a data structure, and that use of such a data structure requires the ability to traverse it programmatically in order to access data selectively. A software module that can read an XML™ document and provide access to its content and structure is referred to as an XML™ processor or XML™ API. Although it is common and recommended practice to use an industry-standard, commercially available API, so that the same application can run with any compliant implementation, an API may also be freely implemented by a developer.
There are currently two principal API specifications that may be regarded as industry standards: the Document Object Model (DOM) and the Simple API for XML™ (SAX). Because DOM is more general, the invention is described below with reference to DOM, from which those skilled in the art will also be able to practice the invention using SAX. DOM is based on an in-memory tree representation of the XML™ document. When an XML™ document is loaded into a processor, the processor must build an in-memory tree structure that properly represents the document. (Conversely, validation is essentially the process of following the tree structure of a properly constructed document.) DOM also defines a programmatic interface (including the names of methods and properties) for traversing the XML™ tree programmatically and manipulating its elements, values and attributes. In other words, the TSDO data structure built during validation supports the DOM API, or other similar APIs and implementations, that allow the document content to be used.
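To illustrate the kind of programmatic traversal a DOM interface provides, the short sketch below uses `xml.dom.minidom` from the Python standard library. The sample document is invented for the example, and `minidom` is a non-validating software parser used here only to show the traversal interface; it is unrelated to the hardware described.

```python
from xml.dom import minidom

# Invented sample document for illustration.
doc = minidom.parseString('<order id="7"><item>pen</item><item>ink</item></order>')
root = doc.documentElement


def element_children(node):
    """Return only the child nodes that are elements (skipping text nodes)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


items = element_children(root)
names = [e.tagName for e in items]           # element names
values = [e.firstChild.data for e in items]  # text content of each element
order_id = root.getAttribute("id")           # attribute value on the root element
```

The application sees only elements, values and attributes reached by following parent, child and sibling links, which is the access pattern the TSDO is built to support.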
With the foregoing as background, it can readily be understood that validating parsing, in software on a general-purpose computer, of a document in XML™ or another language that supports interoperability can be extremely processor intensive and slow, because a large number of memory accesses is needed to perform numerous comparisons of content, structure and syntax (for each element specified in the DTD or the like). The number of elements in a DTD or the like is theoretically unlimited and may reach the thousands even for a relatively simple data structure, while the number of child elements, sibling elements and attributes may be as large as needed, and a data document may easily contain millions of instances of any given element or attribute. In other words, the generality that must be provided by software running on a general-purpose processor is the principal cause of processing complexity and burden. By contrast, as will be seen from the following discussion, the hardware accelerated validating parser according to the invention processes data in a relatively simple and consistent pipelined manner that can be carried out at very high speed and requires a relatively small amount of hardware. This is because the rules to be compared and enforced are embodied in the DTD (contained in the data dictionary) and the signals appear in the state table entries, which can be rapidly merged and redirected to control the operation of a co-processor in parallel with the rapid sequential evaluation of tokens in loop A of Fig. 1.
Referring now to Fig. 2, the logical layout of state table 160 is shown schematically. The state table is built to contain all of the elements permitted by the particular DTD or the like. It should also be appreciated that, because the next state is specified in the respective state table entry (shown in Fig. 3 and discussed below) indexed by the 16-bit token value portion, the token in effect carries the next state. As shown in Fig. 1, at adder 130 the value in token buffer 120 (or a portion of that value) is combined with the state table base address and the next state offset address to access an entry in the state table.
In particular, the state table is preferably provided in segments indexed by the state table base address, to reduce the requirement for high-speed memory; the columns of the state table are then indexed by the token (which can be optimized by use of a palette mechanism), and the rows are indexed by the next state offset. In this case the three-part address can be formed into a complete address by simple concatenation, for example simply by loading different portions of a register from the respective sources of the address parts.
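The formation of a complete address by concatenating the three parts can be sketched as follows. The field widths (8 bits for the token-derived column index and 12 bits for the next-state row offset) and the ordering of the fields are assumptions chosen for the example; the text does not specify them.

```python
TOKEN_BITS = 8    # assumed width of the column index derived from the token
STATE_BITS = 12   # assumed width of the next-state (row) offset


def state_table_address(base: int, next_state: int, token_index: int) -> int:
    """Concatenate segment base, row offset and column index into one address."""
    if not 0 <= token_index < (1 << TOKEN_BITS):
        raise ValueError("token index does not fit in its field")
    if not 0 <= next_state < (1 << STATE_BITS):
        raise ValueError("next-state offset does not fit in its field")
    if base < 0:
        raise ValueError("base must be non-negative")
    return (base << (STATE_BITS + TOKEN_BITS)) | (next_state << TOKEN_BITS) | token_index
```

Because the three fields occupy disjoint bit positions, concatenating them gives the same result as adding the shifted values, which is why loading separate portions of one register can stand in for an arithmetic adder.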
The preferred format of a state table entry is shown in Fig. 3. Each state table entry is preferably 64 bits long and is divided into two 32-bit words. Of course, other formats can also be used, as will be apparent to those skilled in the art.
In turn, the lower-address word is divided into a 16-bit token value and two 8-bit flag fields, for token flags and control flags respectively. (Tokens are preferably defined in a manner consistent with the hardware parser accelerator described in concurrently filed U.S. Provisional Patent Application 60/_, _ (Attorney Docket No. FS-00766/0289005lPR), which is hereby fully incorporated by reference, although their form is of no particular importance to the practice of the invention in accordance with its basic principles. It should be noted here, however, that the 16-bit token value field is somewhat redundant for the present invention, because the token value has already been used to index the columns of the state table, and the field can therefore be designated as reserved.) The token flags are provided principally to track nesting, whether a given element is an aggregate, and the type of element represented by the token. The eight individual flags represented by the bits of this field are preferably, respectively: increment nesting, element is an aggregate, new element name, element value, attribute name, attribute value, element end, and decrement nesting. Similarly, the individual flags represented by the single byte of the control flag field are preferably, respectively: set a completion interrupt to the host processor, set a special interrupt to the host processor, (reserved), halt state table engine processing (these control flags may be copied into the TCW as shown in Figs. 9 and 10), save element or attribute name, save element or attribute value, character palette skip enable, and end of current token. The character palette skip enable flag is largely unnecessary here, but is included for correspondence with the hardware parser accelerator described in the application incorporated above, and can be treated as a reserved field.
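A minimal sketch of unpacking such a 64-bit entry follows. The flag names follow the order in which the text lists them, but the bit positions within each byte, the placement of the token value in the upper half of the lower word, and the treatment of the other 32-bit word as the next-state offset are assumptions made for illustration; Fig. 3 gives the actual layout.

```python
from enum import IntFlag
from typing import NamedTuple


class TokenFlag(IntFlag):
    # Bit assignments are assumed; only the order of the names comes from the text.
    INCREMENT_NESTING = 0x80
    ELEMENT_IS_AGGREGATE = 0x40
    NEW_ELEMENT_NAME = 0x20
    ELEMENT_VALUE = 0x10
    ATTRIBUTE_NAME = 0x08
    ATTRIBUTE_VALUE = 0x04
    ELEMENT_END = 0x02
    DECREMENT_NESTING = 0x01


class ControlFlag(IntFlag):
    COMPLETION_INTERRUPT = 0x80
    SPECIAL_INTERRUPT = 0x40
    RESERVED = 0x20
    HALT_ENGINE = 0x10
    SAVE_NAME = 0x08
    SAVE_VALUE = 0x04
    PALETTE_SKIP_ENABLE = 0x02
    END_OF_TOKEN = 0x01


class StateTableEntry(NamedTuple):
    next_state: int
    token_value: int
    token_flags: TokenFlag
    control_flags: ControlFlag


def unpack_entry(entry: int) -> StateTableEntry:
    """Split a 64-bit entry into its two 32-bit words and their fields."""
    high = (entry >> 32) & 0xFFFFFFFF  # assumed to hold the next-state offset
    low = entry & 0xFFFFFFFF           # token value plus the two flag bytes
    return StateTableEntry(
        next_state=high,
        token_value=(low >> 16) & 0xFFFF,
        token_flags=TokenFlag((low >> 8) & 0xFF),
        control_flags=ControlFlag(low & 0xFF),
    )
```

Decoding the flags is a matter of shifts and masks only, which maps directly onto wiring individual bits of the entry to the logic they control.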
These data from the state table are merged in adder 170 with data from the data dictionary (corresponding to the DTD, XML™ schema or the like), preferably by concatenation, and part of the merged result is supplied to the element and attribute buffer, the preferred logical layout of which is shown schematically in Fig. 4. The preferred format of a data dictionary entry is shown in Fig. 5.
The data dictionary is preferably constructed as a hash table; the hash key is derived from the token at 125. The preferred format of a data dictionary entry is 128 bits long. The entries are derived from the (generic) DTD, XML™ schema or the like that describes the rules followed by data documents written in XML™ or another language that supports interconnectivity. (An external component converts the textual information contained in the DTD, XML™ schema or the like into the data dictionary format, as will be understood by those skilled in the art; the mechanism that actually performs this conversion therefore need not be discussed further.) Sixteen bits hold the token value corresponding to the current token from register 120; this value is compared with the token value information from the state table as a built-in check of proper operation of the validating parser, specifically as a synchronization check between the state table and the data dictionary. (That is, the comparison is optional in normal operation but is valuable for testing and debugging.) Four bits are provided for stack command flags, of which three are used (for example, for push, pop and pass, respectively) and one is reserved. The stack command flags represent, for example, the nesting and aggregation of elements required for a given element. A further four bits, completing a byte, are also reserved. Eight bits are provided for type flags, which identify the data type associated with the element (for example Boolean, binary, decimal and so on). With eight bits, 256 different data types can be identified by the type flag field. It should be noted that the type flag field is optional for operation of the invention in accordance with its basic principles, but it allows for enhancements that validate values against the element data type. These fields are followed by three 32-bit words that provide, respectively, a state table base address, two references to data patterns and ranges, and an attribute rule reference. Of the fields above, the stack command flags, type flags and state table base address associated with the current token are directed to TCW register 180; the remainder are used for token comparison and/or for related tests on the document (for example, XML™), preferably carried out by free-running, very fast dedicated logic.
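The 128-bit entry and its hashed lookup can be sketched as follows. The field widths come from the description (16 + 4 + 4 + 8 bits followed by three 32-bit words); the ordering of the fields within the entry, the flag encodings `PUSH`, `POP` and `PASS`, the use of `zlib.crc32` as the hash function, and the 16-bit key width are assumptions. Hash collisions are ignored to keep the example short.

```python
import zlib
from typing import Dict, NamedTuple

# Assumed encodings for the three stack command flags in use.
PUSH, POP, PASS = 0x8, 0x4, 0x2


class DictEntry(NamedTuple):
    token_value: int         # 16 bits, used for the synchronization check
    stack_cmd: int           # 4 bits (3 used, 1 reserved)
    type_flags: int          # 8 bits, up to 256 data types
    state_table_base: int    # 32-bit word
    pattern_range_refs: int  # 32-bit word holding the two references
    attr_rule_ref: int       # 32-bit word


def pack(e: DictEntry) -> int:
    # Bits 11..8 of the first word are the four reserved bits and stay zero.
    word0 = (e.token_value << 16) | (e.stack_cmd << 12) | e.type_flags
    return (word0 << 96) | (e.state_table_base << 64) | (e.pattern_range_refs << 32) | e.attr_rule_ref


def unpack(raw: int) -> DictEntry:
    word0 = (raw >> 96) & 0xFFFFFFFF
    return DictEntry(
        token_value=(word0 >> 16) & 0xFFFF,
        stack_cmd=(word0 >> 12) & 0xF,
        type_flags=word0 & 0xFF,
        state_table_base=(raw >> 64) & 0xFFFFFFFF,
        pattern_range_refs=(raw >> 32) & 0xFFFFFFFF,
        attr_rule_ref=raw & 0xFFFFFFFF,
    )


def hash_key(name: str) -> int:
    """Stand-in hash of the token text; the real hash function is not specified."""
    return zlib.crc32(name.encode("utf-8")) & 0xFFFF


dictionary: Dict[int, int] = {}


def add_rule(name: str, entry: DictEntry) -> None:
    dictionary[hash_key(name)] = pack(entry)


def lookup(name: str) -> DictEntry:
    return unpack(dictionary[hash_key(name)])
```

A lookup costs one hash and one memory read regardless of how many element names the document model defines, which is what allows the dictionary to be consulted in parallel with the state table.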
The preferred logical layout of the transition control word (TCW), and of register 180 provided to buffer it, is shown in Fig. 6. In principle the buffer register is optional, but it is considered desirable, preferably as a simple and inexpensive expedient for ensuring synchronization when data and control signals are redirected, as alluded to above. To understand the invention, it is important to note the sources of the signals applied to the TCW and the portions of the validating parser architecture of Fig. 1 to which they are directed. Details of the source/sink of each field are shown in Figs. 8-10.
The preferred logical layout of the TCW shown in Fig. 6 comprises three 32-bit words. Two of these hold the state table base address, received from the data dictionary and sent to register 194, and the next state offset, received from the state table and sent to register 192. The remaining 32 bits comprise: a 4-bit stack command field received from the data dictionary and used to control operation of stack 190; a 4-bit aggregate state flags field (of which only 2 bits are preferably used, to indicate whether the current token and/or the previous token is an aggregate, because an aggregate may contain elements at different levels of the tree structure); the 8-bit type flags field received from the data dictionary (its source is shown in Fig. 8 and its use in Fig. 10, as noted above); and two 8-bit fields, token flags and control flags, received from the state table and used to control EAB and TSDO operations as shown in Fig. 10. The data dictionary token value field is not carried over from the data dictionary or the state table because (assuming the values compare successfully) the token value is readily available in the token buffer.
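The TCW layout can be checked with a small table-driven packer. This is a sketch under stated assumptions: the field widths and sources follow the paragraph above, while the field order within the 96 bits, the field names and the helper functions are hypothetical.

```python
# (field name, width in bits, where the text says the field comes from)
TCW_FIELDS = [
    ("state_table_base", 32, "data dictionary"),
    ("next_state_offset", 32, "state table"),
    ("stack_cmd", 4, "data dictionary"),
    ("aggregate_flags", 4, "set by the TCW update logic"),
    ("type_flags", 8, "data dictionary"),
    ("token_flags", 8, "state table"),
    ("control_flags", 8, "state table"),
]


def pack_tcw(values: dict) -> int:
    """Pack named field values into one 96-bit integer, first field most significant."""
    word = 0
    for name, width, _source in TCW_FIELDS:
        value = values.get(name, 0)
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_tcw(word: int) -> dict:
    """Split a 96-bit integer back into named fields."""
    values = {}
    for name, width, _source in reversed(TCW_FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values
```

Summing the widths in `TCW_FIELDS` gives 96 bits, matching the three 32-bit words of the preferred layout.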
Note that the complete definition of a token is shown in Fig. 3A. The hash value of the token text string associated with a particular token (one whose token flags mark it as a new element name) is used to index the data dictionary. The token value is the specific value assigned to the token and is used to look up a row of the state table. In some cases it is a generic value indicating that the token represents a string literal or an integer value; in other cases it is the coded number of an element name or tag name.
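The hashed dictionary lookup and the optional synchronization check between the data dictionary and the state table can be sketched as follows. The text does not specify a hash function, table size or collision handling; CRC-32, 64 slots and chaining are assumptions chosen only to make the example runnable, and the function names are hypothetical.

```python
import zlib

NUM_SLOTS = 64  # assumed table size


def slot_for(token_text: str) -> int:
    """Hash the token text to a dictionary slot (CRC-32 is an assumed stand-in)."""
    return zlib.crc32(token_text.encode("utf-8")) % NUM_SLOTS


def build_dictionary(entries):
    """entries: iterable of (token_text, token_value); collisions are chained."""
    table = [[] for _ in range(NUM_SLOTS)]
    for text, value in entries:
        table[slot_for(text)].append((text, value))
    return table


def lookup(table, token_text):
    """Return the token value stored for token_text, or None if absent."""
    for text, value in table[slot_for(token_text)]:
        if text == token_text:
            return value
    return None


def in_sync(table, token_text, state_table_token_value):
    """Optional check: dictionary and state table agree on the token value."""
    return lookup(table, token_text) == state_table_token_value
```

A mismatch reported by `in_sync` would correspond to the debugging aid described for the 16-bit token value field, not to a validation failure of the document itself.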
Referring again to Fig. 4, the element and attribute buffer (EAB) will now be discussed in more detail. Those skilled in the art will appreciate that the portions of the architecture described above make data available from which all fields of the EAB can be filled by operations that consist mainly of memory accesses and comparisons. These operations are simple and direct and are performed quickly by the accelerator processing unit, while the tracking of syntax, nesting and aggregates is handled entirely by the finite state machine and the associated accelerator registers in accordance with the invention.
In particular, a given parsed item may be an element, an attribute or a value, each of which must be handled differently in TSDO 300. The EAB collects and holds the structure-specifying pointers to the attributes or data surrounding a particular node (not yet placed in the TSDO) so that the doubly linked list structure of the TSDO can be created on completion, as alluded to above with reference to Fig. 1A. As for the specific fields of the EAB, the element nesting begin and end counts can be computed/accumulated directly from the token flags described above in connection with Fig. 3 (e.g. the first and eighth), and the name base addresses and lengths for elements and attributes, together with their values corresponding to each token, are readily available from token buffer 120. Parent/child relationships, aggregates and types, and the control information for the current node of TSDO 300, are obtained directly from the TCW shown in Fig. 6, allowing a new node to be added to the TSDO as each element/token of the tokenized document is validated.
Referring now to Fig. 7, the overall operation and function of the hardware validating parser accelerator 100 according to the invention will be summarized. Operation of the hardware accelerator begins with initialization (705, 710), in which state table 160 and data dictionary 150 are loaded with data corresponding to the document to be parsed and validated, for example a document in the preferred format discussed above. The tokenized document to be validated is then loaded (715) into memory 110, the stack control registers and the state table base address are initialized (720, 725), and the next state offset is set to the initial state (730).
Processing of the tokenized document begins by fetching (735) the first (or next) token into token buffer 120. The token is hashed to produce the hash key used for the data dictionary lookup. A lookup 740 is then performed in memory 140 to update the data dictionary base address, and a lookup 745 can be performed using this address in the data dictionary 150 corresponding to that base address. The state table is accessed concurrently (750), through adder 130, using the new token and the current contents of state table registers 192 and 194. Next, through adder 170, the TCW register is updated (755) with the data obtained from data dictionary 150 and state table 160, according to the rules discussed in detail below in connection with Figs. 8 and 9. EAB 200 is then updated (760) according to rules based on the flag settings in the TCW. If an interrupt flag is set in the TCW, an interrupt can at the same time be sent to the host processor in accordance with the control flags contained in the state table entry. If no interrupt is set, as checked at 770, the information collected in the EAB is added (765) to the TSDO according to rules based on the token and control flag settings in the TCW, which are likewise obtained from the accessed state table entry. At the same time, a push, pop or pass-through operation (775) is performed on stack 190 to support multi-level aggregates (e.g. XML™ elements composed of other XML™ elements). Unless data is being pushed onto the stack, new base address and next state offset data are output from the stack, and registers 192 and 194 are updated (780). Through the above process the hardware accelerator has validated the token and added a node to the TSDO, and the process is repeated for the next token with registers 192 and 194 reset appropriately for the syntax, nesting and aggregate elements.
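The state table part of this loop can be reduced to a toy software model. The sketch assumes that the adder forms the state table address as base address plus next state offset plus token value, which is one plausible reading of the text rather than a stated fact; the data dictionary, TCW, EAB, TSDO and stack steps are omitted, and the table contents are invented for the example.

```python
def validate(tokens, state_table, base=0, start_offset=0):
    """Walk the token values through the state table.

    Returns (True, n) if all n tokens have a valid transition, otherwise
    (False, i) where i is the index of the first token with no table entry
    (the point at which the hardware would notify the host processor).
    """
    offset = start_offset
    for index, token_value in enumerate(tokens):
        # Assumed address formation: base + next state offset + token value.
        entry = state_table.get(base + offset + token_value)
        if entry is None:
            return False, index
        offset = entry["next_state_offset"]
    return True, len(tokens)


# Invented toy table accepting the token value sequence 1, 2, 3.
TABLE = {
    0 + 1: {"next_state_offset": 10},   # start state, token value 1
    10 + 2: {"next_state_offset": 20},  # after token 1, token value 2
    20 + 3: {"next_state_offset": 0},   # after token 2, token value 3
}
```

With this table, `validate([1, 2, 3], TABLE)` succeeds, while `validate([1, 3], TABLE)` stops at the second token because no entry exists for that transition.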
The TCW update rules are very simple and are shown in Fig. 8. First, the token flags, control flags and next state offset fields are copied (805) from the state table entry to TCW register/buffer 180. Then, within the aggregate state flags field, the "current element is aggregate" flag is copied (810) to the "previous element is aggregate" flag. If (815) the "new element name" flag is set in the token flags field, the "stack command", "type flags" and "state table base address" fields are copied (820) to the TCW from the data dictionary output; then, if the "element is aggregate" flag in the token flags field is set (825), the "current element is aggregate" flag bit in the aggregate state flags field is set (830); otherwise the "current element is aggregate" flag is reset (835). If the "new element name" flag in the token flags field is not set, and the "end of element" and "element is aggregate" flag bits in the token flags field are both set, as determined at 840, the "pop" flag bit of the stack command field is set and the type flags and state table base address fields are cleared (845), as shown in Fig. 9. Otherwise the "pass through" flag bit of the stack command field is set and the type flags and state table base address fields are cleared (850).
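The update rules of Fig. 8, as described in the preceding paragraph, translate into a short function. This is a minimal software sketch: the token flag bit positions, the dictionary-based representation of the TCW and table entries, and all names are assumptions; only the order and conditions of the updates come from the text.

```python
from enum import IntFlag


class TokenFlag(IntFlag):
    # Bit positions are assumptions; the real layout is defined in Fig. 3A.
    NEW_ELEMENT_NAME = 0x01
    END_OF_ELEMENT = 0x02
    ELEMENT_IS_AGGREGATE = 0x04


PUSH, POP, PASS_THROUGH = "push", "pop", "pass"


def update_tcw(tcw, state_entry, dict_entry):
    """Return a new TCW dict built from the old TCW and the two table outputs."""
    flags = state_entry["token_flags"]
    new = dict(tcw)
    # 805: fields copied from the state table entry.
    new["token_flags"] = flags
    new["control_flags"] = state_entry["control_flags"]
    new["next_state_offset"] = state_entry["next_state_offset"]
    # 810: "current element is aggregate" becomes "previous element is aggregate".
    new["prev_is_aggregate"] = tcw["cur_is_aggregate"]
    if flags & TokenFlag.NEW_ELEMENT_NAME:
        # 820: fields copied from the data dictionary output.
        new["stack_cmd"] = dict_entry["stack_cmd"]
        new["type_flags"] = dict_entry["type_flags"]
        new["state_table_base"] = dict_entry["state_table_base"]
        # 825-835: set or reset "current element is aggregate".
        new["cur_is_aggregate"] = bool(flags & TokenFlag.ELEMENT_IS_AGGREGATE)
    else:
        closing = TokenFlag.END_OF_ELEMENT | TokenFlag.ELEMENT_IS_AGGREGATE
        # 840-850: pop when an aggregate element ends, otherwise pass through.
        new["stack_cmd"] = POP if (flags & closing) == closing else PASS_THROUGH
        new["type_flags"] = 0
        new["state_table_base"] = 0
    return new
```

Because the function returns a fresh dictionary, the previous TCW stays available for comparison, which mirrors the role of the buffer register only loosely.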
As shown in Fig. 9, in order to issue interrupts, the control flag bits obtained directly from the state machine output for the entry corresponding to the current token are tested, either simply in sequence or in parallel (as indicated by the dashed arrows), to provide an end interrupt or a special interrupt to the host processor, as shown at 910 or 920. Similarly, if the stack command (obtained directly from the output of data dictionary 150) is "push", the state table base address and next state offset are pushed onto stack 190, and state table base address register 192 and next state offset register 194 are updated from the corresponding TCW fields, as shown at 930. If the stack command flags indicate a "pop" command, the state table base address and next state offset values are popped from the stack and then used to update registers 192 and 194, respectively, as shown at 940. If the "pass through" stack command flag is set, registers 192 and 194 are updated from their respective TCW fields without any operation being performed on stack 190, as shown at 950.
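The three stack commands can be modelled with an ordinary list used as stack 190. The sketch assumes that "push" saves the register contents as they stood before the update, which is one reading of the text; the class and method names are hypothetical.

```python
class StateRegisters:
    """Toy model of the base address and next state offset registers plus stack 190."""

    def __init__(self, base=0, offset=0):
        self.base = base      # state table base address register
        self.offset = offset  # next state offset register
        self.stack = []       # stands in for stack 190

    def apply(self, stack_cmd, tcw_base, tcw_offset):
        if stack_cmd == "push":
            # Assumed: save the current register pair, then load the TCW fields.
            self.stack.append((self.base, self.offset))
            self.base, self.offset = tcw_base, tcw_offset
        elif stack_cmd == "pop":
            # Restore the pair saved when the enclosing aggregate was entered.
            self.base, self.offset = self.stack.pop()
        elif stack_cmd == "pass":
            # No stack operation; registers follow the TCW fields.
            self.base, self.offset = tcw_base, tcw_offset
        else:
            raise ValueError(f"unknown stack command: {stack_cmd!r}")
```

Nesting depth is then simply `len(regs.stack)`, so a document with multi-level aggregates pushes once per opened aggregate and pops once per closed one.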
Similarly, operations in the EAB and TSDO are also preferably controlled by flags in the TCW, as shown in Fig. 10. If the "end of element" token flag is set and the "element is aggregate" token flag is also set, the TSDO operation that completes a child element (described below) is triggered, as shown at 1010. Otherwise the "save element or attribute name" flag (assumed to be set in the following discussion) is used in operations 1020 and 1030, in combination with other flags, to trigger the appropriate operation for the item type that those flags reflect. If (1021) the "new element name" flag is also set, the element name base address and element name length fields in the EAB are updated from token buffer 120, and the element value base address and length fields in the EAB are cleared (1023). If (1024) the "attribute name" flag is set, the attribute name base address and length fields in the EAB are set (1025) from token buffer 120, and the attribute value base address and length fields in the EAB are cleared (1026). If (1031) the "element value" flag is set, the element value base address and length fields are updated (1032, 1033) from the token buffer. If (1034) the "attribute value" flag is set, the attribute value base address and length fields are updated (1035, 1036) from token buffer 120. In short, selected fields in EAB 200 are updated and cleared according to the item type.
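The EAB field updates listed above can be written out as a small function. In this sketch each EAB field pair (base address, length) is a tuple, the flags are passed as a set of names, and `token_ref` stands for the (base address, length) pair read from token buffer 120; these representations and names are assumptions made for illustration.

```python
from dataclasses import dataclass

CLEARED = (0, 0)  # assumed representation of a cleared (base address, length) pair


@dataclass
class EAB:
    elem_name: tuple = CLEARED
    elem_value: tuple = CLEARED
    attr_name: tuple = CLEARED
    attr_value: tuple = CLEARED


def update_eab(eab, flags, token_ref):
    """Apply the Fig. 10 field updates; flags is a set of flag names."""
    if "save" not in flags:          # "save element or attribute name" not set
        return
    if "new_element_name" in flags:  # 1021-1023
        eab.elem_name = token_ref
        eab.elem_value = CLEARED
    if "attribute_name" in flags:    # 1024-1026
        eab.attr_name = token_ref
        eab.attr_value = CLEARED
    if "element_value" in flags:     # 1031-1033
        eab.elem_value = token_ref
    if "attribute_value" in flags:   # 1034-1036
        eab.attr_value = token_ref
```

Clearing the value pair whenever a new name arrives keeps a stale value from an earlier element or attribute from being attached to the new one.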
If the "save element or attribute name" and "new element name" flags are both set (as described above) and the "previous element is aggregate" flag in the aggregate state flags field of the TCW is also set, the "add child element" operation (described below) is triggered and the EAB "element nesting begin count" field value is incremented. Otherwise, if the "previous element is aggregate" flag is not set, the "add sibling element" operation is triggered in the TSDO.
On the other hand, if the combination of the "save element or attribute" and "element value" flags is set, the "update element value" operation (described below) is triggered in the TSDO. Similarly, the combination of the "save element or attribute" and "attribute value" flags triggers the TSDO "add attribute" operation.
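Taken together, the flag combinations described in the preceding paragraphs select one of the five TSDO operations. The following sketch expresses that selection as a single function; the flag names, the use of a set of strings and the returned operation names are assumptions, while the conditions follow the text.

```python
def tsdo_operation(token_flags, prev_is_aggregate):
    """Return the name of the TSDO operation triggered by the TCW flags, or None.

    token_flags: set of flag names taken from the TCW token flags field.
    prev_is_aggregate: the "previous element is aggregate" aggregate state flag.
    """
    if {"end_of_element", "element_is_aggregate"} <= token_flags:
        return "complete child element"
    if "save" not in token_flags:
        return None
    if "new_element_name" in token_flags:
        # Child of the previous aggregate, or sibling of the previous element.
        return "add child element" if prev_is_aggregate else "add sibling element"
    if "element_value" in token_flags:
        return "update element value"
    if "attribute_value" in token_flags:
        return "add attribute"
    return None
```

For example, `tsdo_operation({"save", "new_element_name"}, True)` selects "add child element", the case in which the EAB nesting begin count is also incremented.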
Once initialized, the five TSDO operations mentioned above can be carried out automatically by the hardware validating parser accelerator according to the invention under control of the activated accelerator processing unit 400, as described above. This processing can proceed concurrently with the validating parse operations and further supports the acceleration provided by the invention. All of these operations are simple, brief and direct, and can therefore be performed quickly with little, if any, burden on the host processor.
The "add sibling element" operation shown in Fig. 11A comprises: allocating a new TSDO entry; setting the "next" pointer of the current node to the address of the newly allocated entry; setting the "previous" pointer of the new entry to the address of the current node; copying the EAB element name base address and length fields to the corresponding fields of the new entry (see Fig. 1A); and setting the TSDO control fields "current element node" and "element first attribute" to the new entry and to null, respectively. The "add child element" operation shown in Fig. 11B is identical to the "add sibling element" operation except for an additional step 1110 that increments the EAB "nesting level begin count" field.
The "complete child element" operation shown in Fig. 11C is performed by traversing the TSDO structure being formed, starting from the entry pointed to by the "current element node" TSDO control field and using the sibling "p" pointers to move through the tree hierarchy until a sibling "p" pointer is null. The pointer in the child entry "p" pointer of that TSDO entry is then copied to the "current element node" TSDO control field. The "nested entry begin count" TSDO control field is then copied to the "nesting level end count" TSDO control field, and the value of the "nested entry begin count" TSDO control field is decremented. The "update element value" TSDO operation shown in Fig. 11D consists only of copying the element value base address and length EAB fields to the "value pointer" and "value length" fields of the TSDO entry pointed to by the "current element node" TSDO control field.
The "add attribute" TSDO operation first allocates a new TSDO entry and copies the attribute name base address and length and the attribute value base address and length pointers to the new entry. Then, if (1120) the "element first attribute" TSDO control register is null, the "current attribute node" TSDO control is set to point to the new TSDO entry, the "n" and "p" pointers of the current node and the new entry are linked (as described above), and the "element first attribute" of the current node is set to the new entry (e.g. before the "p" pointer is added to the new entry). If the element first attribute is not null, the "n" and "p" pointers of the current node and the new entry are linked and the "current attribute node" TSDO control is set to the new TSDO entry.
In view of the foregoing, it can be seen that the invention provides an apparatus and method for very fast validating parsing of documents in XML or other languages that support interoperability, while removing this processing and its complex supporting overhead from the host processor, thereby greatly increasing the speed of validating parse processing. This acceleration is supported in particular by the potential and preferred autonomous formation of the TSDO in parallel with the validating parse, and by the parallel lookup of data dictionary and state table information. The hardware needed to provide this acceleration is very simple and limited in quantity, and is therefore inexpensive and highly cost-effective.
While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

Claims (11)

  Having thus described my invention, what I claim as new and desire to secure by Letters Patent is as follows:
  1. A hardware accelerated validating parser for tokenized text, said tokenized text being written in a computer language that supports platform independence and interoperability, said validating parser comprising:
    means for retrieving data from a data dictionary and a state table in accordance with a token;
    means for merging said data from said state table and said data dictionary to form a transition control word;
    means for merging a portion of said transition control word with another token to retrieve further data from said state table; and
    means for forming, under control of a portion of said transition control word, a tree structured data object corresponding to said tokenized text.
  2. The hardware accelerated validating parser as claimed in claim 1, further comprising:
    means for controlling operation of a stack using said transition control word, to support, with a next transition state, nested data structures defined by a language that supports nested data structures.
  3. The hardware accelerated validating parser as claimed in claim 2, further comprising:
    means for forming a data structure from said tokenized text.
  4. The hardware accelerated validating parser as claimed in claim 3, wherein said means for forming a data structure comprises an element and attribute buffer.
  5. The hardware accelerated validating parser as claimed in claim 1, further comprising:
    means for forming a data structure from said tokenized text.
  6. The hardware accelerated validating parser as claimed in claim 3, wherein said means for forming a data structure comprises an element and attribute buffer.
  7. A method of accelerated validating parsing of a tokenized document, said method comprising the steps of:
    retrieving data from a data dictionary and a state table in accordance with a token;
    merging said data from said state table and said data dictionary to form a transition control word; and
    merging a portion of said transition control word with another token to retrieve further data from said state table.
  8. The method as claimed in claim 7, further comprising the step of:
    operating a stack using said transition control word to obtain a next transition state.
  9. The method as claimed in claim 7, further comprising the step of:
    verifying that an input stream is consistent with one of a set of valid input sequences.
  10. The method as claimed in claim 9, further comprising the step of:
    generating a notification when the input stream deviates from the set of valid, allowable input sequences.
  11. The method as claimed in claim 7, further comprising the step of:
    generating a notification when the input stream deviates from a set of valid, allowable input sequences.
CNB2003801061661A 2002-10-29 2003-10-03 Hardware accelerated validating parser Expired - Fee Related CN100380322C (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US42177302P 2002-10-29 2002-10-29
US60/421,775 2002-10-29
US60/421,774 2002-10-29
US60/421,773 2002-10-29
US10/334,086 2002-12-31

Publications (2)

Publication Number Publication Date
CN1726465A true CN1726465A (en) 2006-01-25
CN100380322C CN100380322C (en) 2008-04-09

Family

ID=35925173

Family Applications (3)

Application Number Title Priority Date Filing Date
CNB2003801061642A Expired - Fee Related CN100357846C (en) 2002-10-29 2003-10-03 Intrusion detection accelerator
CNB2003801061657A Expired - Fee Related CN100430896C (en) 2002-10-29 2003-10-03 Hardware parser accelerator
CNB2003801061661A Expired - Fee Related CN100380322C (en) 2002-10-29 2003-10-03 Hardware accelerated validating parser

Country Status (1)

Country Link
CN (3) CN100357846C (en)

Also Published As

Publication number Publication date
CN100380322C (en) 2008-04-09
CN1735850A (en) 2006-02-15
CN100357846C (en) 2007-12-26
CN100430896C (en) 2008-11-05
CN1726464A (en) 2006-01-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080409

Termination date: 20101003