Summary of the invention
In view of this, the object of the present invention is to provide a method and a device for identifying illegal access requests to a website, which can obtain the legal range of request parameters accurately and easily, so that illegal requests can be accurately identified and intercepted according to the obtained legal range.
To achieve the above object, the present invention provides the following technical solutions:
A method for identifying illegal access requests to a website, comprising:
Obtaining the legal range of the HTTP access request parameters of the website, and loading the legal range as the validity rule of the parameters;
Intercepting an HTTP access request sent by a user's browser to the website;
Matching the intercepted HTTP access request against the validity rule of the parameters, and determining from the matching result whether the intercepted HTTP access request is legal.
Optionally, obtaining the legal range of the HTTP access request parameters of the website specifically comprises:
Obtaining the HTTP access log file of the website;
Filtering out, from the log file, the log records corresponding to legal HTTP requests to obtain a set of legal log records;
Extracting the legal range of the access request parameters from the set of legal log records.
Optionally, the legal range of the access request parameters is one or more of the following:
The allowed parameter names; the type of each parameter; the maximum length of the parameter value; the special characters allowed to appear in the parameter value.
Optionally, extracting the legal range of the access request parameters from the set of legal log records specifically comprises:
Obtaining the request method and the requested resource of each HTTP request from the set of log records;
For each requested resource, obtaining the corresponding parameter name-parameter value list;
Obtaining the legal range of the parameters from the parameter name-parameter value list.
Optionally, matching the intercepted HTTP access request against the validity rule of the parameters, and determining from the matching result whether the intercepted HTTP access request is legal, specifically comprises:
Parsing the HTTP access request to obtain the requested resource, the request method and the request parameters;
Matching the parsing result against the validity rule; if the match succeeds, the HTTP access request is determined to be legal, and if the match fails, the HTTP access request is determined to be illegal.
A device for identifying illegal access requests to a website, comprising:
A loading module, configured to obtain the legal range of the HTTP access request parameters of the website and to load the legal range as the validity rule of the parameters;
An interception module, configured to intercept an HTTP access request sent by a user's browser to the website;
A matching module, configured to match the intercepted HTTP access request against the validity rule of the parameters and to determine from the matching result whether the intercepted HTTP access request is legal.
Optionally, the loading module is specifically configured to:
Obtain the HTTP access log file of the website;
Filter out, from the log file, the log records corresponding to legal HTTP requests to obtain a set of legal log records;
Extract the legal range of the access request parameters from the set of legal log records.
Optionally, the legal range of the access request parameters is one or more of the following:
The allowed parameter names; the type of each parameter; the maximum length of the parameter value; the special characters allowed to appear in the parameter value.
Optionally, extracting the legal range of the access request parameters from the set of legal log records specifically comprises:
Obtaining the request method and the requested resource of each HTTP request from the set of log records;
For each requested resource, obtaining the corresponding parameter name-parameter value list;
Obtaining the legal range of the parameters from the parameter name-parameter value list.
Optionally, the matching module is specifically configured to:
Parse the HTTP access request to obtain the requested resource, the request method and the request parameters;
Match the parsing result against the validity rule; if the match succeeds, the HTTP access request is determined to be legal, and if the match fails, the HTTP access request is determined to be illegal.
According to the above technical solution of the present invention, no manual analysis is required: the legal range of the HTTP request parameters can be obtained automatically from the website's logs to form a validity rule model (similar to whitelist filtering), and illegal requests can then be accurately identified and intercepted according to this validity rule model.
Detailed description of the embodiments
To facilitate a better understanding of the present invention, the access log file of a website is first briefly introduced here.
IIS is the abbreviation of Internet Information Server, meaning Internet information services. The web log of IIS is the running log of a website hosted under IIS: each time a visitor sends an HTTP request to the website, the log file records it, whether or not the access succeeds.
The log contains information such as who visited the website, what content the visitor viewed, and the time at which the information was last viewed. Because IIS faithfully records everything related to access to the web service, making full use of the log allows intrusion detection, traffic statistics and analysis, and the troubleshooting of IIS server failures and page faults.
The default location of IIS 6.0 web log files is %systemroot%\system32\LogFiles\, and by default one log file is produced per day. If the log files are not protected, an intruder can easily find them and remove the traces left in the log. It is therefore advisable not to use the default directory but to change the log path, and at the same time to set access permissions on the log files so that only administrators and SYSTEM have full control.
The log file name has the format: ex + last two digits of the year + month + day; for example, the web log file for August 10, 2002 is ex020810.log. IIS log files are plain text and can be opened with any editor, such as Notepad; the UltraEdit editor is recommended.
The log format is a fixed ASCII format, recorded according to the World Wide Web Consortium (W3C) standard.
The first four lines of a log file are descriptive information, as follows:
#Software: the software that generated the log
#Version: the version
#Date: the date on which the log was created
#Fields: the fields, i.e. the format of the recorded information, which can be customized in IIS.
The body of the log consists of request entries, one per line; the format of each entry is defined by #Fields, and the fields are separated by spaces.
Commonly used fields are explained as follows:
date: the date on which the request occurred;
time: the time at which the request occurred;
s-sitename: the site instance number that served the request;
s-ip: the IP address of the server that generated the log entry;
cs-method: the request method, i.e. the operation the client attempted to perform (for example GET or POST);
cs-uri-stem: the resource accessed, for example Index.htm;
cs-uri-query: the parameters attached to the requested address; if there are none, this is shown as a hyphen "-";
s-port: the port number the client connected to;
cs-username: the name of the authenticated user accessing the server; anonymous users are shown as a hyphen;
c-ip: the IP address of the client accessing the server;
cs-version: the protocol version used by the client;
cs(User-Agent): the type of browser used by the client;
cs(Referer): the referring site (the site the user last visited, which provided the link to the current site);
sc-status: the response status code; common values are 200 (success), 403 (no permission), 404 (page not found) and 500 (program error);
sc-substatus: the sub-status code;
sc-win32-status: the Windows status code.
The format of a log file is illustrated below (each log file begins with the following four lines):
#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2008-03-31 08:00:03
#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status
2008-03-31 08:02:34 W3SVC72812902 192.168.1.133 GET /login.htm - 80 - 192.168.1.127 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.2;+(R1+1.5);+.NET+CLR+1.1.4322) 200 0 0
The lines above clearly record the following about the remote client:
Access time: 2008-03-31 08:02:34
IP address of the server accessed: 192.168.1.133
Operation performed: GET /login.htm
Access port: 80
Client IP address: 192.168.1.127
Browser type: Mozilla/4.0+
HTTP response status code: 200
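As an illustration of how such a log can be read by a program, the following is a minimal sketch in Python that uses the #Fields line to name the space-separated values of each request entry. The function name parse_w3c_log is hypothetical and is not part of the described method; the sample lines are those of the example above.

```python
def parse_w3c_log(lines):
    """Yield one dict per request entry, keyed by the names in the #Fields line."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Descriptive lines; only #Fields is needed to interpret the body.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Skip entries that do not match the declared field list.
        if len(values) == len(fields):
            yield dict(zip(fields, values))


sample = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Version: 1.0",
    "#Date: 2008-03-31 08:00:03",
    "#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query "
    "s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status",
    "2008-03-31 08:02:34 W3SVC72812902 192.168.1.133 GET /login.htm - 80 - "
    "192.168.1.127 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.2;+(R1+1.5);"
    "+.NET+CLR+1.1.4322) 200 0 0",
]
records = list(parse_w3c_log(sample))
```

Because the field list is read from the log itself, the same sketch would continue to work if the fields recorded by IIS were customized.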
The present invention analyzes the legal range of a website's access request parameters on the basis of the website's log files. Once the legal range of the request parameters has been obtained, it is treated as the validity rule of the parameters, and illegal access requests can be identified according to that rule.
The present invention is described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for analyzing the legal range of website access request parameters according to an embodiment of the present invention. Referring to Fig. 1, the method may comprise the following steps:
Step 101: obtain the HTTP access log file of the website;
Step 102: filter out, from the log file, the log records corresponding to legal HTTP requests to obtain a set of legal log records;
Each HTTP request corresponds to one log record in the log file, and the log records whose response status code is 200 may be taken as the log records corresponding to legal HTTP requests.
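Step 102 can be sketched as a simple filter over parsed log records. The sketch below assumes that each record is a dictionary keyed by the W3C field names; the function name select_legal_records and all sample records other than /login.htm are hypothetical.

```python
def select_legal_records(records, legal_status="200"):
    """Keep only the log records whose response status code marks a legal request."""
    return [r for r in records if r.get("sc-status") == legal_status]


# Illustrative records; only the fields used here are shown.
records = [
    {"cs-uri-stem": "/login.htm", "cs-uri-query": "-", "sc-status": "200"},
    {"cs-uri-stem": "/admin.php", "cs-uri-query": "id=1", "sc-status": "403"},
    {"cs-uri-stem": "/missing.htm", "cs-uri-query": "-", "sc-status": "404"},
    {"cs-uri-stem": "/a.php", "cs-uri-query": "userid=10", "sc-status": "200"},
]
legal = select_legal_records(records)
```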
Step 103: extract the legal range of the access request parameters from the set of legal log records.
Further, after the legal range of the parameters has been extracted, it may also be output as an XML file. Compared with plain text, the XML format has the following advantages: it facilitates data exchange; it shows the relationships between data clearly; and it is more strongly structured, which makes it easy for programs to read.
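The XML output can be sketched with the Python standard library as follows. The document does not specify an XML schema, so the element and attribute names used here (rules, resource, param, maxlen, special) and the function name rules_to_xml are assumptions chosen purely for illustration.

```python
import xml.etree.ElementTree as ET


def rules_to_xml(rules):
    """Serialize {resource: {param name: rule}} into an XML string (hypothetical schema)."""
    root = ET.Element("rules")
    for resource, params in sorted(rules.items()):
        res_el = ET.SubElement(root, "resource", url=resource)
        for name, rule in sorted(params.items()):
            ET.SubElement(
                res_el,
                "param",
                name=name,
                type=rule["type"],
                maxlen=str(rule["max_length"]),
                special="".join(sorted(rule["special_chars"])),
            )
    return ET.tostring(root, encoding="unicode")


rules = {
    "a.php": {
        "userid": {"type": "number", "max_length": 3, "special_chars": set()},
        "product_name": {"type": "string", "max_length": 4, "special_chars": {"-"}},
    }
}
xml_text = rules_to_xml(rules)
```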
The legal range of the parameters may be one or more of the following:
The allowed parameter names; the type of each parameter (for example, character type, numeric type, etc.); the maximum length of the parameter value; the special characters allowed to appear in the parameter value (in the present invention, digits and letters are regarded as ordinary characters and everything else is regarded as a special character; for example, the underscore and the dollar sign ($) are both regarded as special characters).
Since each log record in the set of legal log records corresponds to a legal HTTP request, every parameter appearing in it is a legal parameter, and the legal range of the access request parameters can be obtained from these legal parameters. Specifically, this may comprise the following steps:
Step S1: obtain the request method and the requested resource of each HTTP request from the set of log records;
The request method may be GET, POST, etc. In the present invention, every requested resource in the set of log records may be regarded as a legal requested resource, and a requested resource may be identified by its URL.
Step S2: for each requested resource, obtain the corresponding parameter name-parameter value list;
Since each requested resource may correspond to multiple log records, the parameter names appearing for that resource may be regarded as legal parameter names (allowed parameter names). Further, a given parameter name under the resource may correspond to multiple parameter values; collecting these values yields the parameter name-parameter value list.
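Steps S1 and S2 can be sketched as grouping the query parameters of each legal request by requested resource. The sketch below assumes that each record has already been reduced to a host/path?query string; the function name build_param_value_lists is hypothetical, and the sample URLs are illustrative.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def build_param_value_lists(urls):
    """Map each requested resource to {parameter name: [observed values]}."""
    table = defaultdict(lambda: defaultdict(list))
    for url in urls:
        # Prefix with "//" so that urlsplit treats the leading part as the host.
        parts = urlsplit("//" + url)
        resource = parts.netloc + parts.path
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            table[resource][name].append(value)
    return table


urls = [
    "test.com/a.php?userid=10&product_name=good&price=123.4",
    "test.com/a.php?userid=20&product_name=1234&price=100",
    "test.com/a.php?userid=303&product_name=a-b&price=10",
    "test.com/a.php?userid=40&product_name=a-c&price=1",
    "test2.com/a.php?userid=10&product_name=_asd&price=123",
]
table = build_param_value_lists(urls)
```

The keys of the inner dictionary are the allowed parameter names for the resource, and each value list is the input to the range extraction of step S3.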
Step S3: obtain the legal range of the parameters from the parameter name-parameter value list.
For the maximum length allowed for a parameter value, the greatest length among the parameter values present in the parameter name-parameter value list may be taken as the maximum length allowed for the values of the corresponding parameter name.
For the special characters allowed in a parameter value, the special characters occurring in the parameter values, and their frequencies, may be counted for every parameter name; a special character whose frequency of occurrence reaches a preset threshold is regarded as a special character that may appear in the corresponding parameter value.
Suppose the following set of legal log records has been filtered out:
test.com/a.php?userid=10&product_name=good&price=123.4
test.com/a.php?userid=20&product_name=1234&price=100
test.com/a.php?userid=303&product_name=a-b&price=10
test.com/a.php?userid=40&product_name=a-c&price=1
test2.com/a.php?userid=10&product_name=_asd&price=123
Then, for the maximum length allowed for parameter values: the greatest length of userid under a.php is 3 (the third record), and the greatest length of product_name is 4. The rule finally generated is therefore: under a.php, the maximum length of the value of the userid parameter is 3, and the maximum length of the value of the product_name parameter is 4.
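The maximum-length rule of this example can be reproduced with a few lines of Python. This is only a sketch: the function name max_lengths is hypothetical, and the values are those listed for test.com/a.php in the log records above.

```python
def max_lengths(param_values):
    """Return the greatest observed value length for each parameter name."""
    return {
        name: max(len(value) for value in values)
        for name, values in param_values.items()
    }


# Parameter name-parameter value list for a.php, taken from the sample records.
a_php = {
    "userid": ["10", "20", "303", "40"],
    "product_name": ["good", "1234", "a-b", "a-c"],
    "price": ["123.4", "100", "10", "1"],
}
limits = max_lengths(a_php)
```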
For the special characters allowed in parameter values: from the above log, the following statistics can be compiled for the product_name parameter:
Special character    Number of occurrences
- (hyphen)           2
_ (underscore)       1
If it is specified that a special character is allowed once it has occurred 2 or more times, then - (hyphen) is regarded as a special character that is allowed to appear.
Further, the following rule can also be generated: - (hyphen) may appear in every product_name of the website, but _ (underscore) is not allowed to appear.
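The special-character statistics above can be sketched as follows. The sketch assumes that a character is counted once per parameter value in which it appears and that the threshold is inclusive (2 or more), as in the example; the function name allowed_special_chars is hypothetical.

```python
from collections import Counter


def is_special(ch):
    """Digits and ASCII letters are ordinary characters; everything else is special."""
    return not (ch.isascii() and ch.isalnum())


def allowed_special_chars(values, threshold=2):
    """Count special characters across values and keep those reaching the threshold."""
    counts = Counter()
    for value in values:
        # Assumption: each character is counted at most once per value.
        for ch in set(value):
            if is_special(ch):
                counts[ch] += 1
    allowed = {ch for ch, n in counts.items() if n >= threshold}
    return allowed, counts


# All product_name values from the sample log records.
product_names = ["good", "1234", "a-b", "a-c", "_asd"]
allowed, counts = allowed_special_chars(product_names)
```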
In addition, for the maximum length allowed for parameter values, the embodiment of the present invention also provides a supervised learning method: part of the records in the set of legal log records is used as a training set and the remainder as a test set; the parameters of a prediction function are adjusted repeatedly and the accuracy of the function is verified on the test set, so as to obtain the best prediction function.
Suppose there are now 10 records with parameters for accesses to a particular URL. The first 5 may be used as the training set and the last 5 as the test set: the learning function infers the maximum length of the parameter value from the first 5 records, and the last 5 are then used as a test to judge whether this maximum length is reasonable.
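The split into training and test records can be sketched as below. This is a deliberately simplified illustration, not the prediction function of the embodiment: it assumes that the learned function is simply the greatest length in the training half and that reasonableness is measured as the fraction of test values that fit within it. The function names and the ten sample values are hypothetical.

```python
def learn_max_length(train_values):
    """Simplified learning function: the greatest length seen in the training set."""
    return max(len(value) for value in train_values)


def holdout_check(values, train_fraction=0.5):
    """Learn a limit on the first part of the data and score it on the remainder."""
    cut = int(len(values) * train_fraction)
    train, test = values[:cut], values[cut:]
    limit = learn_max_length(train)
    fitting = sum(1 for value in test if len(value) <= limit)
    return limit, fitting / len(test)


# Ten hypothetical values of one parameter for one URL.
observed = ["10", "20", "303", "40", "7", "15", "88", "123", "9", "1000"]
limit, accuracy = holdout_check(observed)
```

A low accuracy on the test half would suggest that the learned limit is too tight and that the learning function or the split should be adjusted before the rule is used.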
For the special characters allowed in parameter values, the embodiment of the present invention also provides an inference method. A feature library is established that stores, for all websites already learned, the special characters allowed in each field (parameter name) and their frequencies of occurrence, with statistics kept per field name. A special character that occurs with high frequency for a given field in the feature library is regarded as an attribute that the field generally has; such a character is therefore taken as a special character allowed for the field of the same name in the website to be learned (provided that the field allows special characters at all).
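The feature library can be sketched as a per-field counter of special characters across learned websites. The sketch assumes that "high frequency" means appearing in the rules of at least min_sites learned websites; that threshold, the function names and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def build_feature_library(learned_sites):
    """Count, per field name, in how many learned sites each special character is allowed."""
    library = defaultdict(Counter)
    for site_rules in learned_sites:
        for field, chars in site_rules.items():
            library[field].update(set(chars))
    return library


def infer_allowed_chars(library, field, min_sites=2):
    """Infer the special characters allowed for a same-named field of a new site."""
    counts = library.get(field, Counter())
    return {ch for ch, n in counts.items() if n >= min_sites}


# Hypothetical rules already learned for three websites.
learned_sites = [
    {"product_name": {"-"}, "email": {"@", "."}},
    {"product_name": {"-", "_"}, "email": {"@", "."}},
    {"product_name": {"-"}},
]
library = build_feature_library(learned_sites)
```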
Further, the embodiment of the present invention also provides an incremental learning method for obtaining the legal range of access parameters. Incremental learning means that, on the basis of the existing rules of a website, information such as the URLs and request parameters is extracted from the website's access log file, the parameter lengths and the special characters that may appear in the parameters of a given URL are obtained using the learning approach described above, and this information is recorded and stored in a database.
A simple example:
Suppose the website www.test.com already has 2 rules, for example that the number of parameters allowed by www.test.com/a.php is 2 and that the maximum length allowed for a parameter value is 5. When www.test.com produces new logs, the existing rules can be supplemented and refined on the basis of those logs; for instance, the refined rules may state that the number of parameters allowed by a.php is 3 and that the maximum length allowed for a parameter value is 10.
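The incremental update of this example can be sketched as a merge of the existing rules with the rules observed in the new logs. The sketch assumes that merging only ever widens a rule (the larger value is kept); the function name merge_rules and the dictionary layout are hypothetical, and storage in a database is omitted.

```python
def merge_rules(existing, observed):
    """Combine stored rules with newly observed ones, keeping the wider limit."""
    merged = {resource: dict(rule) for resource, rule in existing.items()}
    for resource, new in observed.items():
        old = merged.get(resource)
        if old is None:
            # A resource not seen before simply gets the observed rule.
            merged[resource] = dict(new)
        else:
            merged[resource] = {
                "param_count": max(old["param_count"], new["param_count"]),
                "max_length": max(old["max_length"], new["max_length"]),
            }
    return merged


existing = {"www.test.com/a.php": {"param_count": 2, "max_length": 5}}
observed = {"www.test.com/a.php": {"param_count": 3, "max_length": 10}}
updated = merge_rules(existing, observed)
```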
The above method may be implemented in software or in hardware. When implemented in software, the program may run in a Python environment with MySQL as its supporting database, for example on a Linux system.
According to the above technical solution of the embodiment of the present invention, no manual analysis is required, and the legal range of the HTTP request parameters can be obtained automatically from the website's logs.
Further, once the legal range of the HTTP request parameters has been obtained, it can be used as a validity rule model, and illegal requests can be accurately identified and intercepted according to this validity rule model.
Fig. 2 is a flowchart of a method for identifying illegal access requests to a website according to an embodiment of the present invention. Referring to Fig. 2, the method may comprise the following steps:
Step 201: obtain the legal range of the HTTP access request parameters of the website, and load (or store) the legal range as the validity rule of the parameters;
Step 202: intercept an HTTP access request sent by a user's browser to the website;
Step 203: match the intercepted HTTP access request against the validity rule of the parameters, and determine from the matching result whether the intercepted HTTP access request is legal.
The HTTP access request may be parsed to obtain the requested resource, the request method and the parameters; the parsing result is then matched against the validity rule. If the match succeeds, the HTTP access request is determined to be legal; otherwise, it is determined to be illegal.
In step 203, the matching process may specifically comprise:
Judging whether the requested resource is legal: a list of legal resources is obtained when the legal range of the access request parameters is analyzed, and whether the requested resource is legal can be determined by checking whether it appears in that list;
Judging whether the request type (GET/POST) is legal;
Identifying the type of each parameter (number/string) and judging whether the length of the parameter value is legal.
According to the above method, a request to a website is checked against the corresponding validity rule model. If the request is valid, it can be forwarded to the actual site; otherwise it is intercepted directly, and the relevant information of the request is recorded so that the corresponding rules can be refined later.
Further, the relevant information of illegal requests may also be recorded in a log file.
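The matching of step 203 can be sketched as a whitelist check of the resource, the request method and each parameter. The rule layout, the RULES table and the function name is_legal below are assumptions made for illustration; the limits for userid and product_name follow the a.php example given earlier, and forwarding, interception and logging are left out.

```python
from urllib.parse import parse_qsl

# Hypothetical validity rules, keyed by requested resource.
RULES = {
    "/a.php": {
        "methods": {"GET"},
        "params": {
            "userid": {"type": "number", "max_length": 3, "special_chars": set()},
            "product_name": {"type": "string", "max_length": 4, "special_chars": {"-"}},
        },
    }
}


def is_legal(method, resource, query, rules=RULES):
    """Return True only if resource, method and every parameter match the rules."""
    rule = rules.get(resource)
    if rule is None or method.upper() not in rule["methods"]:
        return False
    for name, value in parse_qsl(query, keep_blank_values=True):
        spec = rule["params"].get(name)
        if spec is None or len(value) > spec["max_length"]:
            return False
        if spec["type"] == "number" and not value.isdigit():
            return False
        for ch in value:
            # Digits and ASCII letters are ordinary; anything else must be whitelisted.
            if not (ch.isascii() and ch.isalnum()) and ch not in spec["special_chars"]:
                return False
    return True
```

For example, is_legal("GET", "/a.php", "userid=10&product_name=a-b") accepts the request, while a userid of four digits, an underscore in product_name, an unknown parameter name or an unlisted resource causes it to be rejected.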
As one implementation, the above method for identifying illegal requests may, on the basis of the existing validity rule model, intercept illegal requests in combination with the nginx program. Identifying legal requests according to valid parameter values is more intelligent and efficient than conventional methods. Moreover, using the efficient nginx server to filter requests can greatly reduce costs.
Devices implementing the above acquisition method and identification method are provided below.
Fig. 3 is a structural diagram of a device for analyzing the legal range of website access request parameters according to an embodiment of the present invention. Referring to Fig. 3, the device may comprise an acquisition module 31, a screening module 32 and an analysis module 33, wherein:
The acquisition module 31 is configured to obtain the HTTP access log file of the website;
The screening module 32 is configured to filter out, from the log file, the log records corresponding to legal HTTP requests to obtain a set of legal log records;
The analysis module 33 is configured to extract the legal range of the access request parameters from the set of legal log records.
For the specific working principle of the device, reference may be made to the method shown in Fig. 1; it is not repeated here.
Fig. 4 is a structural diagram of a device for identifying illegal access requests to a website according to an embodiment of the present invention. Referring to Fig. 4, the device may comprise a loading module 41, an interception module 42 and a matching module 43, wherein:
The loading module 41 is configured to obtain the legal range of the HTTP access request parameters of the website and to load the legal range as the validity rule of the parameters;
The interception module 42 is configured to intercept an HTTP access request sent by a user's browser to the website;
The matching module 43 is configured to match the intercepted HTTP access request against the validity rule of the parameters and to determine from the matching result whether the intercepted HTTP access request is legal.
For the specific working principle of the device, reference may be made to the method shown in Fig. 2; it is not repeated here.
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here. In addition, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be made into a separate integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.