Background art
In recent years, cross-site scripting (XSS) has consistently ranked first among Web security threats. Through XSS, an attacker can plant Trojans on web pages, hijack HTTP sessions, conduct phishing, steal website information, and pursue other illegal ends. XSS therefore seriously affects the security and normal operation of Web applications.
To detect XSS, existing systems such as IPS, IDS, and WAF products all rely on protection techniques based on regular-expression matching: XSS behavior is described by regular expressions, and any behavior that matches one of them is identified as XSS. For example, the following regular expression may be used: s+ (s*=|height).
However, XSS can take a great many forms: almost every HTML tag and CSS construct can be used in an attack. Avoiding false negatives therefore requires matching against a large number of regular expressions, which directly degrades network performance. At the same time, HTML and CSS can be obfuscated in many ways. For example, an attacker can evade protection through HTTP encoding, HTML encoding, or the insertion of ignorable characters (control characters, newlines, punctuation marks), which causes regular-expression-based protection to miss attacks. Reducing such false negatives requires loosening the precision of the regular expressions, which in turn directly produces false positives. In addition, regular-expression-based protection performs no further processing after a rule is hit, so the security administrator cannot learn the detailed behavior of the attack, which adds to the administrator's analysis burden.
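The evasion problem described above can be illustrated with a minimal sketch. The signature below is a hypothetical pattern invented for this illustration and is not taken from any product; the obfuscated payload uses a CSS comment to split the keyword, as in the example discussed later in this document.

```python
import re

# Hypothetical signature, invented for this illustration only.
SIGNATURE = re.compile(r"expression\s*\(", re.IGNORECASE)

plain = "xss:expression(alert('XSS'))"
# A CSS comment splits the keyword, so the literal pattern no longer appears.
obfuscated = "xss:expr/*XSS*/ession(alert('XSS'))"


def matches_signature(value: str) -> bool:
    """Return True when the signature is found anywhere in the value."""
    return SIGNATURE.search(value) is not None


print(matches_signature(plain))       # True
print(matches_signature(obfuscated))  # False
```

Loosening the pattern enough to catch the second form (for example, matching any text containing "expr") would also flag harmless input, which is the false-positive trade-off described above.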
In summary, because of the limited descriptive power of regular expressions themselves, XSS detection techniques based on regular expressions suffer from both a high false-positive rate and a high false-negative rate.
To overcome the limited expressive power of regular expressions, many existing WAF products adopt detection techniques based on machine learning. Such techniques usually need a learning phase: statistical analysis, data mining, or similar methods are used to learn the characteristics of normal traffic. In the subsequent detection phase, received traffic is classified under the guidance of the learned normal-traffic model, and attacks are detected on the principle that whatever is not normal is anomalous.
However, machine learning is subject to over-learning and under-learning. Over-learning usually means that attack traffic has been mixed into the normal traffic and learned by mistake, so that attack characteristics are absorbed into the final normal-traffic model, which leads to false negatives during detection. Under-learning usually means that the training traffic does not cover all normal traffic patterns, so that some normal traffic is not modeled, which leads to false positives during detection.
In summary, because it is difficult in practice to learn all the characteristics of normal traffic completely, detection techniques based on machine learning still suffer from false positives and false negatives.
No effective solution has yet been proposed for the problem that the XSS detection methods in the related art are prone to false positives and false negatives.
Summary of the invention
The main purpose of the present invention is to provide a method and a device for detecting cross-site scripting (XSS) attacks, and a firewall having the device, so as to solve the problem that detection methods for XSS attacks are prone to false positives and false negatives.
To achieve this purpose, according to one aspect of the present invention, a method for detecting XSS attacks is provided.
The method for detecting XSS attacks according to the present invention comprises: decomposing the data submitted by a user into parameters to obtain decomposed parameters; decoding the HTTP encoding of the decomposed parameters to obtain decoded parameters; performing lexical analysis on the decoded parameters to obtain lexically analyzed parameters; performing syntax analysis on the lexically analyzed parameters and building a syntax tree; and, when the syntax tree is built successfully, determining that the behavior corresponding to the data submitted by the user is an XSS attack.
Further, decomposing the data submitted by the user into parameters comprises any one or more of the following: decomposing the URI in the request line of the data request; decomposing the Cookie header of the data request; decomposing the Cookie2 header of the data request; decomposing the Referer header of the data request; and decomposing the entity of a POST request.
Further, before lexical analysis is performed on the decoded parameters, the method also comprises: judging whether a decoded parameter consists only of digits and/or letters, wherein performing lexical analysis on the decoded parameters comprises: performing lexical analysis on a decoded parameter when it does not consist only of digits and/or letters.
Further, performing lexical analysis on the decoded parameters to obtain lexically analyzed parameters comprises: performing HTML lexical analysis on the decoded parameters to obtain parameters after HTML lexical analysis; and performing CSS lexical analysis on the decoded parameters to obtain parameters after CSS lexical analysis.
Further, performing syntax analysis on the lexically analyzed parameters comprises: performing HTML syntax analysis on the parameters after HTML lexical analysis; and performing CSS syntax analysis on the parameters after CSS lexical analysis.
Further, after it is determined that the behavior corresponding to the data submitted by the user is an XSS attack, the method also comprises: performing semantic analysis on the successfully built syntax tree; and determining the purpose of the XSS attack according to the result of the semantic analysis.
Further, after it is determined that the behavior corresponding to the data submitted by the user is an XSS attack, the method also comprises: executing the data submitted by the user in a virtual machine; and determining the purpose of the XSS attack according to the execution result.
To achieve this purpose, according to another aspect of the present invention, a device for detecting XSS attacks is provided. The detection device according to the present invention is used to carry out any one of the detection methods provided by the present invention.
To achieve this purpose, according to another aspect of the present invention, a device for detecting XSS attacks is provided. The detection device according to the present invention comprises: a protocol analyzer, used to decompose the data submitted by a user into parameters and to decode the HTTP encoding of the decomposed parameters to obtain decoded parameters; a lexical analyzer, used to perform lexical analysis on the decoded parameters to obtain lexically analyzed parameters; and a syntax analyzer, used to perform syntax analysis on the lexically analyzed parameters, to build a syntax tree, and, when the syntax tree is built successfully, to determine that the behavior corresponding to the data submitted by the user is an XSS attack.
Further, the device also comprises: a judging unit, used to judge, before lexical analysis is performed on the decoded parameters, whether a decoded parameter consists only of digits and/or letters, wherein the lexical analyzer is also used to perform lexical analysis on a decoded parameter when it does not consist only of digits and/or letters.
Further, the device also comprises: a semantic analyzer, used to perform semantic analysis on the successfully built syntax tree after it is determined that the behavior corresponding to the data submitted by the user is an XSS attack, and to determine the purpose of the XSS attack according to the result of the semantic analysis.
Further, the device also comprises: a virtual machine, used to execute the data submitted by the user after it is determined that the behavior corresponding to that data is an XSS attack, and to determine the purpose of the XSS attack according to the execution result.
To achieve this purpose, according to another aspect of the present invention, a firewall is provided.
The firewall according to the present invention comprises any one of the devices for detecting XSS attacks provided by the present invention.
The present invention adopts a method for detecting XSS attacks that comprises the following steps: first the data submitted by the user is decomposed into parameters, then the HTTP encoding of the decomposed parameters is decoded, and finally lexical analysis and syntax analysis are performed on the decoded parameters and an attempt is made to build a syntax tree. When the syntax tree can be built successfully, the behavior corresponding to the data submitted by the user is determined to be an XSS attack. This realizes an intelligent detection approach based on syntax analysis: XSS attacks are detected not by describing the outward forms an attack may take, but by describing the essential characteristic of XSS. This solves the problem that detection methods for XSS attacks are prone to false positives and false negatives, and thereby reduces false positives and false negatives at the same time.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in combination with the embodiments.
Fig. 1 is a schematic diagram of the operation of a firewall according to an embodiment of the present invention. As shown in Fig. 1, traffic from the Internet to the Web server passes in turn through a router or switch, the firewall, and a load balancer. The firewall has an XSS detection function and is placed between the Internet and the server; it filters user data that carries XSS attacks and prevents attackers from maliciously accessing the server.
When performing XSS detection, the firewall in this embodiment carries out lexical and syntax analysis on the data submitted by the user and detects XSS attacks by describing the essential characteristic of the attack. It can therefore detect and block XSS attacks more effectively and reduce the probability of false detections and missed detections.
The embodiments of the present invention also provide a device for detecting XSS attacks, which is introduced below. It should be noted that any detection device of the embodiments of the present invention can be applied to the firewall of the present invention.
Fig. 2 is a block diagram of a device for detecting XSS attacks according to a first embodiment of the present invention. As shown in Fig. 2, the detection device comprises protocol analyzer 10, lexical analyzer 30, and syntax analyzer 50.
Protocol analyzer 10 first performs HTTP protocol analysis on the HTTP traffic passing through the detection device: it decomposes the data submitted by the user into parameters, then decodes the HTTP encoding of the decomposed parameters, and finally outputs the decoded parameters.
Preferably, protocol analyzer 10 mainly analyzes the objects in which XSS attacks usually occur and, for the submitted data, analyzes the request line, the request headers, and the request entity in turn. Specifically, the analyzed objects comprise the URI in the request line, the Cookie header of the request, the Cookie2 header of the request, the Referer header of the request, and the entity of a POST request.
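The decomposition of these objects can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `decompose_request` is hypothetical, headers are assumed to arrive as a dictionary keyed by exact header name, the POST entity is assumed to be form-encoded, and the Cookie2 header is omitted for brevity.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def decompose_request(uri, headers, body=""):
    """Return (source, name, decoded_value) triples for the analyzed objects.

    Hypothetical helper: `headers` is assumed to be a dict keyed by exact
    header name, and `body` is assumed to be form-encoded.
    """
    params = []
    # URI in the request line: split the query string and percent-decode it.
    for name, value in parse_qsl(urlsplit(uri).query, keep_blank_values=True):
        params.append(("uri", name, value))
    # Cookie header: "a=1; b=2" pairs, each value percent-decoded.
    for pair in headers.get("Cookie", "").split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            params.append(("cookie", name, unquote(value)))
    # Referer header: analyze the query string of the referring URL.
    referer_query = urlsplit(headers.get("Referer", "")).query
    for name, value in parse_qsl(referer_query, keep_blank_values=True):
        params.append(("referer", name, value))
    # Entity of a POST request.
    for name, value in parse_qsl(body, keep_blank_values=True):
        params.append(("body", name, value))
    return params


example = decompose_request(
    "/a?id=1",
    {"Cookie": "sid=abc; t=%3Cb%3E", "Referer": "http://x.example/p?q=%22"},
    "msg=hi%21",
)
for triple in example:
    print(triple)
```

Keeping the source of each parameter alongside its name and value lets a later alerting stage report where the suspicious data was found.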
After protocol analyzer 10 has produced the decoded parameters, lexical analyzer 30 performs lexical analysis on each decoded parameter and removes comments and meaningless characters. Consider, for example, the following tag:
<script src%$#="xxx">
Browser can be ignored " src " back to "=" character in front, and these characters are exactly meaningless character.Strictly speaking, writing like this is a kind of mistake, but browser has initiatively been selected the ignorance idle character for fault-tolerant and need (because a lot of developers because clerical mistake, mistake has been write idle character) easily.Though this lets browser obtain better fault freedom, also provides convenience to the assailant, can utilize invalid character to carry out attack signature and obscure, obtain the parameter after the morphological analysis.
Syntax analyzer 50 maintains a complete, strict parsing table described by a context-free grammar. It performs syntax analysis on the parameters produced by lexical analyzer 30 and attempts to build a syntax tree according to the language specification.
The principle of XSS is to inject a script that can be executed on the browser side. To carry out an XSS attack, therefore, however the malicious data submitted by the attacker may vary, it must ultimately satisfy the syntax specification; otherwise the victim's browser cannot execute the embedded malicious code. Consequently, once syntax analyzer 50 has parsed the data against the complete specification, if the data submitted by the user passes syntax analysis and a syntax tree is built successfully, the data contains code that conforms to the syntax specification, and the behavior corresponding to the data submitted by the user is determined to be a suspected XSS attack.
The detection device provided by this embodiment realizes an intelligent detection approach based on syntax analysis. It analyzes and checks the data submitted by users in real time, and detects XSS attacks not by describing the outward forms an attack may take but by describing the essential characteristic of XSS. It reduces both the false-positive rate and the false-negative rate of XSS detection, prevents XSS attacks more effectively, and helps maintain network security.
Fig. 3 is a block diagram of a device for detecting XSS attacks according to a second embodiment of the present invention. As shown in Fig. 3, the protocol analyzer first performs HTTP protocol analysis on the HTTP traffic passing through the detection device, that is, it first decomposes the data submitted by the user into parameters. The analysis mainly targets the objects in which XSS attacks usually occur, specifically the URI in the request line, the Cookie header of the request, the Cookie2 header of the request, the Referer header of the request, and the entity of a POST request. The protocol analyzer then decodes the HTTP encoding of each parameter of each object.
For example, the protocol analyzer decomposes and decodes the following URI submitted by the user:
/seach.asp?id=1&&find=%3C%26%23%78%34%39%3B%4D%47%20%53%54%59%4C%45%3D%22%78%73%73%3A%65%78%70%72%2F%2A%58%53%53%2A%2F%65%73%73%69%6F%6E%28%61%6C%65%72%74%28%27%58%53%53%27%29%29%22%3E%0A
After passing through the protocol analyzer, the URI is decomposed and decoded into two parameters:
Id (name="id", value=1);
Find (name="find", value=<&#x49;MG STYLE="xss:expr/*XSS*/ession(alert('XSS'))">).
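The decomposition and decoding of the example URI can be reproduced with a short sketch that uses Python's standard `urllib.parse` module as a stand-in for the protocol analyzer. The function name `decompose_and_decode` is hypothetical. Note that the decoded value of `find` still contains the HTML character reference `&#x49;` and the CSS comment, and ends with the newline encoded as `%0A`.

```python
from urllib.parse import parse_qsl, urlsplit

# The example URI from the text, split across lines only for readability.
URI = (
    "/seach.asp?id=1&&find=%3C%26%23%78%34%39%3B%4D%47%20%53%54%59%4C%45%3D%22"
    "%78%73%73%3A%65%78%70%72%2F%2A%58%53%53%2A%2F%65%73%73%69%6F%6E%28%61%6C"
    "%65%72%74%28%27%58%53%53%27%29%29%22%3E%0A"
)


def decompose_and_decode(uri):
    """Split the query string into (name, value) pairs and percent-decode them."""
    return parse_qsl(urlsplit(uri).query, keep_blank_values=True)


params = decompose_and_decode(URI)
for name, value in params:
    print(name, repr(value))
```

The empty field produced by the doubled `&&` is skipped by `parse_qsl`, so exactly two parameters are returned.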
To improve the processing speed of the detection device, a preliminary judgment is made before lexical and syntax analysis in order to exclude obviously normal traffic. For this purpose the detection device also comprises a judging unit. After the protocol analyzer has decomposed and decoded the parameters, and before lexical analysis is performed on them, the judging unit judges whether each decoded parameter consists only of digits and/or letters. For the two parameters Id and Find above, for example, the value of the first parameter, Id, is a string made up of digits, so this parameter certainly cannot carry an XSS attack; the value of the second parameter, Find, contains punctuation marks and requires further lexical analysis. If the values of both parameters of a URI consisted only of digits, letters, or a combination of the two, the URI would be obviously normal traffic and no XSS detection would be needed.
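The preliminary judgment can be sketched in a few lines. The function name `needs_lexical_analysis` is hypothetical, and treating an empty value as harmless is an assumption of this sketch rather than something stated in the text.

```python
import re

# Values made up only of ASCII letters and digits cannot form markup.
_ALNUM_ONLY = re.compile(r"[A-Za-z0-9]*")


def needs_lexical_analysis(value: str) -> bool:
    """Return True when the value contains anything besides letters and digits.

    Assumption of this sketch: an empty value is treated as normal traffic.
    """
    return _ALNUM_ONLY.fullmatch(value) is None


print(needs_lexical_analysis("1"))                          # False
print(needs_lexical_analysis('<&#x49;MG STYLE="xss:...">'))  # True
```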
If the above judgment finds that a decoded parameter does not consist only of digits and/or letters, lexical analysis must be performed on it. The HTML lexical analyzer restores HTML encoding and removes comments and meaningless characters; the CSS lexical analyzer restores the encoding used in CSS and removes comments and meaningless characters. After passing through the HTML lexical analyzer, the Find parameter above is restored to:
<IMG STYLE="xss:expression(alert('XSS'))">
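A minimal sketch of this normalization is given below. It folds the HTML and CSS steps into one hypothetical function, `lexical_normalize`, using `html.unescape` to restore character references and a regular expression to drop CSS comments; which control characters count as meaningless is an assumption of the sketch.

```python
import html
import re

CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Assumed set of meaningless control characters (tab, LF and CR are kept here
# and only trimmed at the ends of the value).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def lexical_normalize(value: str) -> str:
    """Restore HTML character references, then drop CSS comments and junk."""
    restored = html.unescape(value)             # "&#x49;" becomes "I"
    without_comments = CSS_COMMENT.sub("", restored)
    return CONTROL_CHARS.sub("", without_comments).strip()


decoded_find = "<&#x49;MG STYLE=\"xss:expr/*XSS*/ession(alert('XSS'))\">\n"
print(lexical_normalize(decoded_find))
# <IMG STYLE="xss:expression(alert('XSS'))">
```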
After lexical analysis, the syntax analyzer performs syntax analysis on the parameters produced by the lexical analyzer. The HTML syntax analyzer maintains a complete, strict HTML parsing table described by a context-free grammar; the CSS syntax analyzer maintains a complete, strict CSS parsing table described by a context-free grammar. The role of the syntax analyzer is to take the string decoded by the lexical analyzer and attempt to build a syntax tree according to the language specification. For the Find parameter above, the syntax analyzer can build the syntax tree shown in Fig. 4, and it can thus be determined that the behavior corresponding to the data submitted by the user is a suspected XSS attack.
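The idea of "a syntax tree can be built" can be approximated with Python's standard `html.parser` module. This is only a stand-in: `HTMLParser` is a tolerant tokenizer rather than the strict, grammar-driven parser described above, and the flat list of element nodes returned by the hypothetical `build_syntax_tree` is a simplification of a real tree.

```python
from html.parser import HTMLParser


class _NodeCollector(HTMLParser):
    """Collect (tag, attributes) pairs for every element that parses."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append((tag, dict(attrs)))


def build_syntax_tree(value: str):
    """Return the parsed element nodes; an empty list means no markup was found."""
    collector = _NodeCollector()
    collector.feed(value)
    collector.close()
    return collector.nodes


print(build_syntax_tree("<IMG STYLE=\"xss:expression(alert('XSS'))\">"))
print(build_syntax_tree("hello world 42"))
```

In this sketch a non-empty result plays the role of a successfully built syntax tree, while ordinary text such as the second input yields no nodes.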
Once the syntax tree has been built successfully, it is analyzed by the final HTML semantic analyzer and CSS semantic analyzer. If the code submitted by the user is found to be, semantically, a script operation, or to add a potentially dangerous tag such as IFRAME, OBJECT, LINK, SCRIPT, STYLE, APPLET, META, or EMBED, the attacker's purpose can be further determined. The attack purpose finally obtained from the analysis is stored in the log collector, and an alarm is raised.
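A simplified version of this semantic check is sketched below. The tag list comes from the text; the three script checks (event-handler attributes, CSS `expression(` and `javascript:` URLs) are assumptions chosen for illustration, and the function name `describe_attack` and the node format of (tag, attribute dictionary) pairs are hypothetical.

```python
# Tags named in the text as potentially dangerous.
DANGEROUS_TAGS = {
    "iframe", "object", "link", "script", "style", "applet", "meta", "embed",
}


def describe_attack(nodes):
    """Return human-readable findings for a list of (tag, attrs) nodes."""
    findings = []
    for tag, attrs in nodes:
        if tag in DANGEROUS_TAGS:
            findings.append(f"potentially dangerous tag <{tag}>")
        for name, value in attrs.items():
            text = (value or "").lower()
            # The three checks below are illustrative assumptions.
            if name.startswith("on"):
                findings.append(f"script execution via {name} handler in <{tag}>")
            elif name == "style" and "expression(" in text:
                findings.append(f"script execution via CSS expression in <{tag}>")
            elif text.lstrip().startswith("javascript:"):
                findings.append(f"script execution via javascript: URL in <{tag}>")
    return findings


print(describe_attack([("img", {"style": "xss:expression(alert('XSS'))"})]))
# ['script execution via CSS expression in <img>']
```

The findings are plain strings so that they can be written directly to a log and attached to an alarm.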
This embodiment realizes an intelligent detection approach based on syntax analysis and semantic analysis. XSS attacks are detected by describing the essential characteristic of the attack rather than its outward forms, and the attacks so identified are further subjected to semantic analysis, so that the analysis result can be provided to the security administrator.
The detection device provided by this embodiment overcomes the weak descriptive power of regular expressions. Compared with detection based on regular-expression signatures, it has stronger detection capability, adapts better to previously unseen attacks, and achieves a higher detection rate and a lower false-negative rate. Compared with detection based on machine learning, the device avoids the over-learning and under-learning problems of machine learning, so that both the false-negative rate and the false-positive rate are lower. It can also provide a detailed explanation of the behavior and purpose of an attack, which effectively helps the security administrator with after-the-fact analysis.
To determine the purpose of an XSS attack more accurately, a virtual machine may be used in place of the semantic analyzer of the embodiment shown in Fig. 3. After it is determined that the behavior corresponding to the data submitted by the user is an XSS attack, the data is executed in the virtual machine, and the purpose of the attack is determined accurately from the execution result.
The embodiments of the present invention also provide a method for detecting XSS attacks, which is introduced below. It should be noted that the detection method of the embodiments of the present invention can be carried out by the detection device provided by the embodiments of the present invention, and that the detection device of the embodiments can likewise be used to carry out the detection method provided by the embodiments.
Fig. 5 is a flow chart of a method for detecting XSS attacks according to an embodiment of the present invention. As shown in Fig. 5, the method comprises the following steps S102 to S110:
Step S102: decompose the data submitted by the user into parameters to obtain decomposed parameters.
Step S104: decode the HTTP encoding of the decomposed parameters to obtain decoded parameters.
Steps S102 and S104 can be carried out by protocol analyzer 10 of the embodiment shown in Fig. 2. In these two steps, HTTP protocol analysis is performed on the HTTP traffic passing through the detection device: the data submitted by the user is first decomposed into parameters, the HTTP encoding of the decomposed parameters is then decoded, and the decoded parameters are finally output.
Preferably, these two steps mainly analyze the objects in which XSS attacks usually occur and, for the submitted data, analyze the request line, the request headers, and the request entity in turn. Specifically, the analyzed objects comprise the URI in the request line, the Cookie header of the request, the Cookie2 header of the request, the Referer header of the request, and the entity of a POST request.
Step S106: perform lexical analysis on the decoded parameters to obtain lexically analyzed parameters.
This step can be carried out by lexical analyzer 30 of the embodiment shown in Fig. 2. During lexical analysis, comments and meaningless characters can be removed directly. Preferably, the lexical analysis of the decoded parameters comprises HTML lexical analysis and CSS lexical analysis.
Step S108: perform syntax analysis on the lexically analyzed parameters and build a syntax tree.
Step S110: when the syntax tree is built successfully, determine that the behavior corresponding to the data submitted by the user is an XSS attack.
Steps S108 and S110 can be carried out by syntax analyzer 50 of the embodiment shown in Fig. 2. The principle of XSS is to inject a script that can be executed on the browser side. To carry out an XSS attack, therefore, however the malicious data submitted by the attacker may vary, it must ultimately satisfy the syntax specification; otherwise the victim's browser cannot execute the embedded malicious code. Consequently, once steps S108 and S110 have parsed the data against the complete specification, if the data submitted by the user passes syntax analysis and a syntax tree is built successfully, the data contains code that conforms to the syntax specification, and the behavior corresponding to the data submitted by the user is determined to be a suspected XSS attack. Preferably, HTML syntax analysis is performed on the parameters after HTML lexical analysis, and CSS syntax analysis is performed on the parameters after CSS lexical analysis.
The detection method provided by this embodiment overcomes the weak descriptive power of regular expressions. Compared with detection based on regular-expression signatures, it has stronger detection capability, adapts better to previously unseen attacks, and achieves a higher detection rate and a lower false-negative rate. Compared with detection based on machine learning, the method avoids the over-learning and under-learning problems of machine learning, so that both the false-negative rate and the false-positive rate are lower.
To speed up the detection method, a preliminary judgment is made before lexical and syntax analysis in order to exclude obviously normal traffic. Preferably, before step S106, the method also comprises: judging whether a decoded parameter consists only of digits and/or letters. If it does, the data submitted by the user is normal traffic and the subsequent steps S106 to S110 need not be performed. If it does not, steps S106 to S110 are performed, that is, lexical analysis and then syntax analysis are carried out on the decoded parameter.
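Steps S102 to S110, together with the optional preliminary judgment, can be tied together in one short sketch. It is an approximation under stated assumptions: only the query string of the URI is analyzed, the standard `html.parser` module stands in for the strict grammar-driven syntax analyzer, and the function name `detect_xss` is hypothetical.

```python
import html
import re
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit


class _TagCollector(HTMLParser):
    """Record the name of every element that the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def detect_xss(uri: str):
    """Return the names of query parameters flagged as suspected XSS."""
    suspicious = []
    # S102 + S104: decompose the query string and percent-decode each value.
    for name, value in parse_qsl(urlsplit(uri).query, keep_blank_values=True):
        # Preliminary judgment: letters and digits only means normal traffic.
        if re.fullmatch(r"[A-Za-z0-9]*", value):
            continue
        # S106: restore HTML character references and drop CSS comments.
        text = re.sub(r"/\*.*?\*/", "", html.unescape(value), flags=re.DOTALL)
        # S108: try to parse the value as markup.
        collector = _TagCollector()
        collector.feed(text)
        collector.close()
        # S110: any parsed element counts as a successfully built tree.
        if collector.tags:
            suspicious.append(name)
    return suspicious


print(detect_xss("/s?id=1&q=%3Cscript%3Ealert(1)%3C%2Fscript%3E&note=a%20%3C%20b"))
# ['q']
```

In the usage line, `id` is excluded by the preliminary judgment, `note` decodes to ordinary text that does not parse as markup, and only `q` is reported.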
To provide the security administrator with a detailed account of the behavior and purpose of an XSS attack, and so to help with after-the-fact analysis, preferably, after step S110 has determined that the behavior corresponding to the data submitted by the user is an XSS attack, the method also comprises: performing semantic analysis on the successfully built syntax tree, and determining the purpose of the XSS attack according to the result of the semantic analysis.
To provide the security administrator with a more accurate account of the purpose of an XSS attack, and so to help with after-the-fact analysis more effectively, further preferably, after step S110 the data submitted by the user is executed in a virtual machine, and the purpose of the XSS attack is determined according to the execution result.
From the above description it can be seen that the present invention achieves the following technical effects. It realizes an intelligent detection approach based on syntax analysis, which analyzes and checks the data submitted by users in real time and detects XSS attacks not by describing the outward forms an attack may take but by describing the essential characteristic of XSS. It reduces both the false-positive rate and the false-negative rate of XSS detection, prevents XSS attacks more effectively, and helps maintain network security.
It should be noted that the steps shown in the flow charts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flow charts, in some cases the steps shown or described may be executed in an order different from the one given here.
Obviously, those skilled in the art will understand that each module or step of the present invention described above can be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network formed by several computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.