XML parsing method for XML security applications
Technical field
The invention belongs to the field of computer data processing, and in particular relates to a fast, storage-efficient XML (eXtensible Markup Language) parsing method based on finite-state automata and offsets.
Background art
XML (eXtensible Markup Language) parsing is a basic problem in computer science. It is the process of converting XML data from its serialized string format into another form, or of adding auxiliary descriptive information to it, so that other modules in the system can process the XML message.
XML is a set of rules for defining semantic markup; the markup divides a document into parts and labels those parts. Unlike the Hypertext Markup Language (HTML) and other formatting languages, XML is a meta-markup language in which users can define the tags they need.
An XML security application system comprises two modules, XML parsing and security processing. The latter signs and encrypts specified parts of an XML message, so XML parsing is the basic technology and key link in this kind of application. An efficient XML parsing technique improves the performance of the parsing module and also lets the security processing module extract the specified parts of the message more efficiently. This reduces the time and space consumed when operating on the XML message (adding the encrypted and signed content), so that the XML security application achieves higher overall performance.
The classical approach to XML parsing is based on a tree structure that corresponds to the hierarchy of the XML message, with each node representing an element or data content in the XML. Vidur Apparao et al. proposed the Document Object Model (DOM) in 1998, and it became a World Wide Web Consortium (W3C) Recommendation. The main advantage of the DOM method is the ease of operating on a tree structure; it can be used in many XML-based scenarios (for example Web services and Web 2.0) and is a convenient XML parsing method.
In an XML security application, the DOM method can be used to parse the XML message. The data or child elements named by the specified security policy are then located, extracted, signed, and encrypted; the results are added to or substituted into the original XML message in the specified format; and the new, securely processed XML message is transmitted to a remote node or an internal service routine. An example follows.
Suppose the XML message in serialized string format is as follows:
<?xml version="1.0"?>
<class>
<number>1</number>
<roster>
<student>
<name>wangwei</name>
<sid>1</sid>
</student>
</roster>
</class>
Here the first line, "<?xml version="1.0"?>", is the XML declaration. In the lines that follow, "<" and ">" enclose the start tag of an element, and "</" and ">" enclose its end tag. Within the tags, "class", "number", "roster", "student", "name", and "sid" are element names, and "class" is the root element; "wangwei" and "1" are data.
In this XML security application, let the security policy be: sign and encrypt the element denoted by the path expression "/class/roster" together with all of its content. This is a two-level path expression. Let the signature mode be an enveloped signature, meaning that the signature content is inserted before the end of the root element, which in this example is before "</class>". The steps for parsing the XML message with the DOM method and performing security processing are as follows:
1) Parsing
Parse the message with the DOM method to obtain the DOM tree structure shown in Fig. 1. Each node of the tree represents an element or data content in the XML, and the tree corresponds to the hierarchy of elements and data in the XML message;
2) Extraction
Following the path expression "/class/roster" specified in the security policy, search the DOM tree top-down to obtain the subtree corresponding to this path, as shown in Fig. 2;
3) Serialization
Serialize the subtree from step 2) to obtain the serialized XML character data to be signed and encrypted;
4) Security processing
According to the configured security policy, obtain the corresponding algorithm information, key, and initialization vector; first sign and then encrypt the character data from step 3); add the resulting signature value and ciphertext to XML signature and encryption templates that conform to the W3C Recommendations; and convert the templates into DOM tree structures;
5) XML message operation
Add the signature and encryption tree structures generated in step 4) to the DOM tree of the original XML message. The encryption tree replaces the subtree denoted by the path expression "/class/roster", and the signature tree, as required for an enveloped signature, is added under the root node as a subtree. The whole tree is then serialized into XML character data.
The final XML message output in serialized string format is as follows:
<?xml version="1.0"?><class><number>1</number><EncryptedData Type="http://www.w3.org/2001/04/xmlenc#Element">
<EncryptionMethod Algorithm="http://www.w3.org/2001/04/xmlenc#tripledes-cbc"/>
<CipherData>
<CipherValue>
D+WNcQfTqsmJJV3tTL9uRCqSQvqdJaBiDXGXkKDbpAlPiAvmgy9L0tTZ9PJKnlmlOqJ4pXtETmUWjCeo0/yrlRiwZAwt2U5X6OXUnluo3FGvvwOXKYBUuw==
</CipherValue>
</CipherData>
</EncryptedData>
<Signature>
<SignedInfo>
<CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315#WithComments"/>
<SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#hmac-sha1"/>
<Reference URI="">
<Transforms>
<Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
<XPath>
/class/roster
</XPath>
</Transform>
</Transforms>
<DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<DigestValue>askxS/A3BaLCjFjZ/ttU9c12kA4=</DigestValue>
</Reference>
</SignedInfo>
<SignatureValue>oYEdQYG1IHzbkR1UcJ9Q5VriRPs=</SignatureValue>
</Signature></class>
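The parsing, extraction, and serialization stages of the DOM approach described above can be sketched in Python with the standard library's xml.dom.minidom. This is only an illustration of the baseline, not the implementation the text refers to; the function name extract_subtree is hypothetical, and the message is written without line breaks or indentation for brevity.

```python
from xml.dom import minidom

# Example message from the text, written compactly (an assumption for brevity).
MESSAGE = ('<?xml version="1.0"?><class><number>1</number><roster><student>'
           '<name>wangwei</name><sid>1</sid></student></roster></class>')

def extract_subtree(xml_text, path):
    """Parse the whole message into a DOM tree, walk the path top-down,
    and serialize the matching subtree (hypothetical helper)."""
    doc = minidom.parseString(xml_text)           # stage 1: full parse
    names = [p for p in path.split("/") if p]
    node = doc.documentElement
    if node.tagName != names[0]:
        return None
    for name in names[1:]:                        # stage 2: top-down search
        matches = [c for c in node.childNodes
                   if c.nodeType == c.ELEMENT_NODE and c.tagName == name]
        if not matches:
            return None
        node = matches[0]
    return node.toxml()                           # stage 3: serialization

subtree = extract_subtree(MESSAGE, "/class/roster")
```

Note that the whole message is turned into a tree even though only the roster subtree is needed, which is the cost the following paragraphs discuss.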
The main drawbacks of using DOM as the parsing method in an XML security application are as follows. First, the entire content of the XML message must be converted into a DOM tree structure, yet the actual security processing does not use all of the data. The system therefore spends unnecessary time parsing data that will never be used, and the memory overhead increases accordingly. For example, when the signed and encrypted content is 20% of the whole XML message, about 80% of the parsing time and memory overhead of DOM-based security processing is unnecessary. Second, although the tree structure makes XML easy to operate on, converting between the XML message format and the tree structure consumes considerable processing time, and storing the tree consumes considerable memory. The DOM parsing method therefore leaves room for improvement in XML security applications.
The method proposed by the present invention is based on finite-state automata and offsets. A finite-state automaton expresses the path expressions of the security policy, only the data needed for security processing is parsed, and the parsing results are stored as offsets, which avoids the large time and space costs of a tree structure.
A finite-state automaton is a data structure with discrete input and output that completes its entire work through state transitions. The automaton has a finite number of internal states. In any state, the system reads one character from the input string and moves to a new state determined by the current state and that character. The specific character that triggers a transition between two states is called the transition condition between them, and transitions are directed. The result of running the automaton is output at the end state. Finite-state automata are the object of study of automata theory and are commonly used in XML research for XML validation, XML filtering, and XML querying. In XML validation, an XML Schema in serialized string format is parsed into an automaton data structure, and the XML message to be validated is then used as input; if it matches the automaton, that is, the automaton stops at a leaf node, the result is valid, and otherwise it is invalid. In XML filtering and XML querying, the path expression of the data to be filtered or queried is parsed into an automaton data structure, and the XML message to be examined is used as input; the output is the filtered or queried data in the XML. Such path expressions are often logically complex regular expressions, so the corresponding automata are also fairly complex, and their output merely reads data out of the XML message; it cannot be used to operate on the XML message.
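As a minimal illustration of the state-transition idea described above, the following Python sketch builds a chain of states for one word and feeds it an input string one character at a time. The names build_chain and find_first are hypothetical. The reset-on-mismatch rule is a simplification that is only correct when the first character of the word (here "<") does not occur again inside the word, which holds for tag prefixes such as "<roster".

```python
def build_chain(word):
    """State i moves to state i + 1 on character word[i];
    each character is the transition condition between two states."""
    return {(i, ch): i + 1 for i, ch in enumerate(word)}

def find_first(word, text):
    """Feed text to the automaton one character at a time and return the
    offset of the character that reaches the end state, or -1."""
    transitions = build_chain(word)
    end_state = len(word)
    state = 0
    for offset, ch in enumerate(text):
        if (state, ch) in transitions:
            state = transitions[(state, ch)]
        else:
            # Mismatch: return to the start state and retry this character.
            state = transitions.get((0, ch), 0)
        if state == end_state:
            return offset
    return -1

hit = find_first("<roster", "<class><roster>")
```

In this run the end state is reached on the final "r" of "<roster", at offset 13 of the input.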
Summary of the invention
The object of the present invention is to overcome the shortcomings of the prior art by proposing an XML parsing method for XML security applications. The invention is based on finite-state automata and offsets and targets the characteristics of XML security applications: only element-level extraction and operations are required, the full content of the XML message need not be parsed, and complex path expressions need not be handled, but the output must support operations on the XML message. The invention also stores information simply and efficiently, thereby improving the overall performance of the XML security application in both time and space.
The XML parsing method for XML security applications proposed by the present invention is characterized in that it comprises the following steps:
(1) according to the content to be signed and encrypted in the known security policy, and the signature mode, determine all path expressions for the relevant position information;
(2) according to the path expressions from step (1), build the data structure of the corresponding finite-state automaton, using the characters of the path expressions as the state transition conditions of the automaton;
(3) feed the character string of the XML message to which the security policy applies, character by character, into the finite-state automaton from step (2) for matching; from the output of the automaton, obtain the position information needed to perform security processing on the XML message, express that position information as offsets counted from the start of the XML message, and output the corresponding sequence of offsets;
(4) take the XML message in its original serialized string format, together with the offset sequence from step (3) that expresses the position information, as the parsing result, and input it to the security processing module of the XML security application for security processing.
Features and technical effects of the present invention compared with the existing DOM method:
Based on the characteristics of XML security applications, and while preserving their function, the present invention remedies the two drawbacks of the DOM method described above. First, the known security policy is preprocessed before the XML message is parsed, so that the subsequent parsing targets only the information needed by the security processing module rather than the full content. Second, the parsing result is expressed as offsets, which stores the information simply and efficiently. The invention thus improves the overall performance of the XML security application in both time and space.
The present invention is targeted at security applications and can greatly increase the overall processing speed of an XML security application. When the path expressions of the security policy do not change frequently, are not very complex, or the XML message is not very small, it can also greatly reduce memory consumption.
In most practical applications the invention is efficient, although its exact performance depends on the security policy and on the size of the XML message. Compared with the DOM method, the overall processing speed is 2 to 3 times higher, while the overall memory consumption varies more widely. For example, under a security policy defined by a two-level path expression of 50 characters and with an XML message of 1K bytes, the proposed method uses about 40% of the memory of the DOM method; under a security policy defined by a three-level path expression of 100 characters and with an XML message of 10K bytes, it uses about 24% of the memory of the DOM method.
Description of drawings
Fig. 1 is a schematic diagram of the DOM tree structure of an XML message.
Fig. 2 is a schematic diagram of the DOM subtree, within the XML message of Fig. 1, that is specified by the security policy path expression.
Fig. 3 is a schematic diagram of an embodiment of the finite-state automaton that the present invention builds from the security policy path expressions.
Detailed description of the embodiments
The XML parsing method for XML security applications proposed by the present invention comprises the following steps:
(1) according to the content to be signed and encrypted in the known security policy, and the signature mode, determine all path expressions for the relevant position information;
(2) according to the path expressions from step (1), build the data structure of the corresponding finite-state automaton, using the characters of the path expressions as the state transition conditions of the automaton;
(3) feed the character string of the XML message to which the security policy applies, character by character, into the finite-state automaton from step (2) for matching; from the output of the automaton, obtain the position information needed to perform security processing on the XML message, express that position information as offsets counted from the start of the XML message, and output the full sequence of offsets;
(4) take the XML message in its original serialized string format, together with the offset sequence from step (3) that expresses the position information, as the parsing result, and input it to the security processing module of the XML security application for security processing.
In step (2) above, building the corresponding finite-state automaton comprises the following steps:
(21) obtain the element names from all of the path expressions, that is, split each path into levels at the '/' separator; for example, the first level of "/class/roster" is class and the second level is roster;
(22) add the character strings of the element names of all path expressions to the finite-state automaton in turn, forming a multi-level state transition structure in which each character of an element name is the transition condition between two states;
(23) set up the auxiliary data structures of the automaton, consisting of an output position adjuster, an exciter, and the like, which apply special handling for the characteristics of XML messages.
The multi-level state transition structure in step (22) is formed as follows. For each path expression, starting from the first level, the character string of each level is inserted into the corresponding level of the automaton. The insertion rule is that a state that already exists is skipped and a state that does not exist is created; after the last state of the last level, the end state "End" is set. The characters of the element name are the state transition conditions and determine the next state; if a character of the input XML message does not satisfy any transition condition of the current state, the automaton returns to the "Start" state of the current level.
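The splitting and insertion rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each level is represented as a nested dictionary keyed by character, all paths share one structure per level, and the names split_levels, insert_path, and END are hypothetical. The second path, "/class/roll", is invented purely to show that existing states are reused.

```python
END = "End"   # marker for the end state of the last level

def split_levels(path):
    """Split a path expression into element names at the '/' separator."""
    return [name for name in path.split("/") if name]

def insert_path(levels, path):
    """Insert one path expression: existing states are skipped (reused),
    missing states are created, and End is set after the last level."""
    names = split_levels(path)
    for depth, name in enumerate(names):
        if depth == len(levels):
            levels.append({})                 # new level with its Start state
        state = levels[depth]
        for ch in name:
            state = state.setdefault(ch, {})  # reuse or create the next state
        if depth == len(names) - 1:
            state[END] = path
    return levels

levels = []
insert_path(levels, "/class/roster")
insert_path(levels, "/class/roll")            # hypothetical second path
```

After both insertions the first level still contains a single chain for "class", while the second level branches after "ro" into "s" (roster) and "l" (roll).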
The auxiliary data structures set up in step (23) consist of an output position adjuster, an exciter, and a transition handler that applies special handling for the particular syntax of XML.
The output position adjuster corrects the current offset and outputs the corrected offset.
To accommodate the particular syntax of XML, the transition handler of the present invention handles the following two special cases:
a. to prevent an XML message such as <xxx yyy="<zzz>"> (where the content inside the quotation marks looks like an element name) from putting the automaton into an abnormal state, the automaton is suspended when a quotation mark is encountered and resumed when the next quotation mark arrives;
b. to prevent an XML message such as <xxx ss=""> (where a legitimate space follows the element name xxx) from putting the automaton into an abnormal state, the space is added to the path expression as a sufficient condition for a state transition.
In addition, to prevent an XML message such as <xxx sss=""/> (where the element does not end with a matching </xxx> but with "/>") from putting the automaton into an abnormal state, a corresponding two-state sub-automaton, called an exciter, is built for the last state of each start element. The exciter is triggered by the input string "/>", which drives the automaton to the end state and outputs the offset information. When the last state of a start element is reached, the exciter is started and runs in parallel with the original automaton, receiving the same XML message input; once the exciter has passed through its two states, it drives the automaton to the end state and outputs the offset information.
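The quotation-mark suspension and the two-state exciter described above can be sketched in Python as follows. This is an illustrative simplification, not the structure of Fig. 3: the function names are hypothetical, only double quotation marks are handled, and the scan is assumed to begin just after the element name, outside any quoted value.

```python
def chars_outside_quotes(text, start=0):
    """Yield (offset, char) pairs, suspending output between a quotation
    mark and the next one, as in special case a."""
    in_quotes = False
    for offset in range(start, len(text)):
        ch = text[offset]
        if ch == '"':
            in_quotes = not in_quotes   # suspend or resume the automaton
            continue
        if not in_quotes:
            yield offset, ch

def scan_start_tag(text, start):
    """Scan from just after an element name to the '>' that closes the tag.
    Returns (offset of '>', True if the exciter fired on '/>')."""
    exciter_state = 0                   # 0: idle, 1: '/' has been read
    for offset, ch in chars_outside_quotes(text, start):
        if ch == ">":
            return offset, exciter_state == 1
        exciter_state = 1 if ch == "/" else 0
    return -1, False

visible = "".join(ch for _, ch in chars_outside_quotes('<xxx yyy="<zzz>">'))
self_closing = scan_start_tag('<xxx sss=""/>', 4)
ordinary = scan_start_tag('<xxx yyy="<zzz>">', 4)
```

With the quoted value hidden, the automaton never sees "<zzz>" as a tag, and the exciter distinguishes the self-closing form from an ordinary start tag.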
In step (3) above, the position information needed to perform security processing on the XML message comprises the signature start position, the signature end position, the encryption start position, the encryption end position, and the signature insertion position. The XML message is input to the automaton as follows: once the finite-state automaton has been built, the XML message to be processed is fed into it sequentially, one character at a time. The automaton starts, the transition handler gives the next state for the input character in the current state, and when an end state is finally reached the position adjuster provides the required position information.
The finite-state automaton is activated only by the parts of the XML message that are to be encrypted and signed, and it skips the parts that are not. When the XML message is input to the automaton, the automaton outputs the position information needed for encryption and signing, stored as offsets; this offset information is sufficient to support the subsequent security processing module. The present invention therefore not only improves the time and space efficiency of XML parsing in the two respects described above, but also speeds up security processing, so that the overall performance of the XML security application is improved.
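A character-by-character scan that produces these five positions can be sketched in Python as follows. This is a much-reduced stand-in for the automaton, under several assumptions: the path has at least two levels, the target element is not self-closing, attribute values and nested elements with the same name are not handled, and the function and key names are hypothetical. The message is written without line breaks, so the offsets differ from those of a message laid out on separate lines.

```python
MESSAGE = ('<?xml version="1.0"?><class><number>1</number><roster><student>'
           '<name>wangwei</name><sid>1</sid></student></roster></class>')

def scan_offsets(message, path):
    names = [n for n in path.split("/") if n]
    starts = ["<" + n for n in names]                     # one per level
    ends = ["</" + names[-1] + ">", "</" + names[0] + ">"]
    patterns = starts + ends
    hits, stage, i = [], 0, 0
    for offset, ch in enumerate(message):
        if stage == len(patterns):
            break
        pattern = patterns[stage]
        is_start = stage < len(starts)
        if is_start and i == len(pattern):
            # Element name fully read: the next character must end the name.
            if ch in (">", " ", "/"):
                hits.append(offset - len(pattern))        # position adjuster
                stage, i = stage + 1, 0
            else:
                i = 1 if ch == "<" else 0
            continue
        if ch == pattern[i]:
            i += 1
        else:
            i = 1 if ch == "<" else 0                     # back to Start
        if not is_start and i == len(pattern):
            hits.append(offset + 1)                       # just past the end tag
            stage, i = stage + 1, 0
    if len(hits) != len(patterns):
        raise ValueError("path not found in message")
    start, end = hits[len(names) - 1], hits[-2]
    return {"sign_start": start, "sign_end": end,
            "enc_start": start, "enc_end": end,
            "sig_insert": hits[-1] - len(patterns[-1])}

offsets = scan_offsets(MESSAGE, "/class/roster")
```

The result is a handful of integers rather than a tree, which is the storage saving the text describes.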
Taking the XML message from the background section as the example again, the complete processing flow of an XML security application that uses the XML parsing method of the present invention is described below with reference to the drawings. The security policy is again to sign and encrypt the element denoted by the path "/class/roster" together with all of its content, and the signature mode is an enveloped signature.
(1) Determine the path expressions. The path expression for the start point of signing and encryption is "/class/roster". The path expression for the end point is "/class ∧ roster"; this form of end-point expression is defined by the present invention and denotes the end tag of the roster element under the class element. To implement the enveloped signature, the end tag of the root element must also be found; its path expression is "∧ class", denoting the end tag of the class element. Each path expression corresponds to one position variable, which stores the position corresponding to that path expression, expressed as an offset from the start of the XML message;
(2) Build the finite-state automaton from the path expressions. The paths are inserted into the automaton one at a time. When a path is inserted, it is first split at the '/' separator; in this embodiment, for example, the first level of "/class/roster" is class and the second level is roster. Then, starting from the first level, the character string of each level is inserted into the corresponding level of the automaton. The insertion rule is that a state that already exists is skipped and a state that does not exist is created; after the last state of the last level, the end state "End" and a position adjuster are set. The position adjuster outputs the offset and includes an output position adjuster that corrects the current offset. For example, the output position adjuster for the start element "roster" subtracts 7 from the current offset, 7 being the length of the string "<roster", to obtain the offset of the start of that element. Next, the other auxiliary data structures are added: the space is added to the path expression as a sufficient condition for a state transition; the automaton is suspended when a quotation mark is encountered and resumed when the next quotation mark arrives; and a corresponding exciter is built for the last state of the start element "roster".
The data structure of the finite-state automaton for this embodiment is shown in Fig. 3. In the figure, each circle represents a state, and the automaton has two levels. A circle labeled "Start" represents the start state of a level, and a double circle (◎) labeled "End" represents the end state of a level. A state labeled "leaf node" marks the end of the state transitions for a path expression and has a corresponding offset output; a state labeled "parent node" has at least one further level of states below it and produces no output. The arrows between states show the direction of the transitions, and each is labeled with one character of an element name from a path expression; for example, the condition for moving from state "1" to state "2" is the input character "c". A shaded circle (●) denotes a state that has a corresponding exciter, whose structure is shown in the upper right corner of the figure. A small square on a transition condition in Fig. 3 denotes a space character;
(3) The characters of the XML message in serialized string format are input to the automaton one after another, starting from the first-level state "Start" on the left. When the string "<class>" is input, each input character matches the transition condition, so the state moves through "1", "2", "3", "4", "5", and "End" to the second-level state "Start" on the right of Fig. 3. The subsequent input of the string "<roster" moves the state to the shaded circle (●) in Fig. 3 and starts the exciter; from then on, characters are input to the automaton and the exciter simultaneously. When the following '>' character is input, the exciter does not change state, the automaton reaches the rightmost leaf node, outputs the offset 54, and closes the exciter. This offset is both the signature start position and the encryption start position. The remaining offsets are obtained in the same way; all offset outputs of this embodiment are shown in Table 3:
Table 3 Offset example
Variable | Role in the security application | Offset (counted from 0) |
Signature start position | Gives the start point of signing | 54 |
Signature end position | Gives the end point of signing | 142 |
Encryption start position | Gives the start point of encryption and marks the start of the content replaced after encryption | 54 |
Encryption end position | Gives the end point of encryption and marks the end of the content replaced after encryption | 142 |
Signature insertion position | Marks the position at which the signature is inserted | 145 |
(4) Perform security processing using the offsets. The content to be signed is obtained from the "signature start position" 54 and the "signature end position" 142, and the content to be encrypted is obtained from the "encryption start position" 54 and the "encryption end position" 142. The corresponding fragment of the XML message, that is, the full content of the "roster" element, is cut out by offset and processed with the specified algorithm, key, and initialization vector. The resulting signature value and ciphertext are added to XML signature and encryption templates that conform to the W3C Recommendations. The encryption template replaces the fragment of the XML message delimited by the "encryption start position" and the "encryption end position", and the signature template is inserted at the "signature insertion position". The final output is identical to that of a security application that uses the DOM parsing method.
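The offset-based slicing and splicing of step (4) can be sketched in Python as follows. This is only an illustration of the string operations: the function name secure and the key are hypothetical, the templates are reduced to a few elements, the HMAC-SHA1 value is computed directly over the fragment rather than following the full XML Signature procedure, and base64 encoding stands in for the cipher and provides no confidentiality. The offsets are computed with str.index here only to keep the block self-contained.

```python
import base64
import hashlib
import hmac

MESSAGE = ('<?xml version="1.0"?><class><number>1</number><roster><student>'
           '<name>wangwei</name><sid>1</sid></student></roster></class>')

def secure(message, start, end, insert_at, key):
    """Cut the fragment [start, end), build simplified templates, and splice
    them into the original string. Assumes insert_at >= end."""
    fragment = message[start:end]
    mac = hmac.new(key, fragment.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.b64encode(mac).decode("ascii")
    # PLACEHOLDER: base64 is NOT encryption; a real system would apply the
    # cipher named in the security policy here.
    cipher = base64.b64encode(fragment.encode("utf-8")).decode("ascii")
    enc = ("<EncryptedData><CipherData><CipherValue>" + cipher +
           "</CipherValue></CipherData></EncryptedData>")
    sig = "<Signature><SignatureValue>" + signature + "</SignatureValue></Signature>"
    # Splice the later position first so the earlier offsets stay valid.
    out = message[:insert_at] + sig + message[insert_at:]
    return out[:start] + enc + out[end:]

start = MESSAGE.index("<roster>")
end = MESSAGE.index("</roster>") + len("</roster>")
insert_at = MESSAGE.index("</class>")
result = secure(MESSAGE, start, end, insert_at, b"demo-key")
```

Because every operation is a slice or concatenation on the original string, no tree is built or re-serialized at any point.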