Embodiment
Existing methods detect URLs according to security rules that are formulated manually on the server. However, on the one hand, the means by which hackers use URLs to carry out network attacks change constantly, and manually formulated security rules can hardly cover every attack means; on the other hand, manually formulated security rules usually lag behind newly emerging attack means.
For this reason, in one or more embodiments of this specification, a number of URLs are obtained, the parameters in each URL are extracted, the feature vector corresponding to each parameter is determined, and an isolation forest (Isolation Forest) model is built according to the feature vectors corresponding to the parameters. As is well known to those skilled in the art, an isolation forest model is an anomaly detection model. The isolation forest model can be used to detect whether a URL is abnormal; an abnormal URL is often a URL sent by a hacker, and the server can refuse to parse abnormal URLs so as to avoid being attacked.
It should be noted that the isolation forest model can be built from the feature vectors of URL parameters because, in practice, the main means by which hackers attack a server using a URL is to add illegal fields to the parameters of the URL. That is, there is a significant difference between the feature vectors of parameters in normal URLs and the feature vectors of parameters in abnormal URLs: the features of parameters in abnormal URLs are usually rare and clearly different from the features of parameters in normal URLs.
Based on this, the core idea of the technical solution described in this specification is to use the feature vectors of the parameters in a number of known URLs as data samples to build an isolation forest model. The isolation forest model can then judge whether a URL to be detected is abnormal according to the feature vectors of the parameters in that URL.
In order to enable those skilled in the art to better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings of one or more embodiments of this specification. Obviously, the described embodiments are only some, rather than all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this specification without creative effort shall fall within the protection scope of this specification.
The technical solutions provided by the embodiments of this specification are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the model training method provided by an embodiment of this specification, which comprises the following steps:
S100: Obtain a number of URLs.
In the embodiments of this specification, the executing entity may be a server or another device with data processing capability. The following description takes a server as the executing entity.
As is well known, the parameters in a URL may include information input by a user (who may be a hacker).
For example, "http://server/path/document?name1=value1&name2=value2" is a typical URL structure, in which the data after "?" are the parameters. A URL may include more than one parameter; different parameters are usually separated by "&", and each parameter has a parameter name and a parameter value. The parameter value is usually input by the user. In this example, the URL includes two parameters: "name1=value1" indicates that the parameter named name1 has the parameter value value1, and "name2=value2" indicates that the parameter named name2 has the parameter value value2.
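The parameter structure described above can be illustrated with a minimal Python sketch. It assumes the usual "?" separator between the path and the parameters; the function name `extract_params` is hypothetical and is not part of this specification. Note that `parse_qsl` also decodes percent-encoded characters, which a real implementation may or may not want.

```python
from urllib.parse import urlsplit, parse_qsl


def extract_params(url):
    """Return the (parameter name, parameter value) pairs that follow '?' in a URL.

    Hypothetical helper for illustration only.
    """
    query = urlsplit(url).query
    # keep_blank_values=True keeps parameters whose value is empty.
    return parse_qsl(query, keep_blank_values=True)


example_params = extract_params("http://server/path/document?name1=value1&name2=value2")
print(example_params)
```

Each pair holds one parameter name and its parameter value, in the order in which they appear in the URL.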
Hackers sometimes add abnormal illegal fields to the parameters of a URL in order to attack the server. For example, suppose the normal URL sent when a legitimate user logs in to the server is as follows:
"http://server/path/document?name1=user1&name2=password1", where the parameter value of the first parameter is the user name "user1" and the parameter value of the second parameter is the password "password1". The server parses the URL and, after the user name and password are verified, the user is logged in to the server.
When a hacker wants to impersonate the user "user1" to log in to the server, the hacker can use an SQL injection attack and send the following abnormal URL to the server:
"http://server/path/document?name1=user1&name2="' or 1=1", where the parameter value of the first parameter is the user name "user1", but the parameter value of the second parameter is not the password corresponding to the user name; instead, it is the illegal field ""' or 1=1". Owing to the inherent characteristics of SQL syntax, when the server cannot verify the user's password according to the illegal field, the illegal field is parsed by the server into executable code and executed, so that the hacker can log in to the account of the user "user1" without the password and operate on the user's data.
In step S100, the URLs obtained by the server generally include some normal URLs and some abnormal URLs. Because abnormal URLs are relatively rare, they account for a low proportion of the obtained URLs.
S102: For each URL, extract the parameters in the URL.
In the embodiments of this specification, extracting the parameters in a URL may mean extracting both the parameter names and the parameter values included in the URL, or extracting only the parameter values of the parameters in the URL.
In addition, for each URL, the server may extract all of the parameters in the URL or only some of them.
In practical applications, some parameter names occur with low probability, and hackers seldom add illegal fields to the parameter values corresponding to such parameter names. Therefore, the server may skip the parameter values corresponding to parameter names with a low probability of occurrence.
Specifically, for each URL, the server may determine, among the parameters included in the URL, the parameters whose parameter names satisfy a specified condition, and extract the parameter value of each parameter so determined. The specified condition may be that the probability of occurrence of the parameter is greater than a specified probability value. In this way, parameters with a low probability of occurrence are filtered out, which reduces the burden on the server of processing data in subsequent steps.
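A minimal Python sketch of this filtering follows. The specification does not say how the probability of occurrence is computed; the sketch assumes it is the fraction of obtained URLs in which a parameter name appears. The function name `extract_frequent_params`, the threshold 0.5, and the `debug` parameter are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def extract_frequent_params(urls, min_probability):
    """Keep only parameters whose name occurs in more than min_probability of the URLs.

    Assumption: "probability of occurrence" = share of URLs containing the name.
    """
    parsed = [parse_qsl(urlsplit(u).query, keep_blank_values=True) for u in urls]
    # Count, for each parameter name, the number of URLs in which it appears.
    counts = Counter()
    for params in parsed:
        counts.update({name for name, _ in params})
    kept = {name for name, n in counts.items() if n / len(urls) > min_probability}
    return [[(name, value) for name, value in params if name in kept] for params in parsed]


sample_urls = [
    "http://server/path/document?name1=user1&name2=password1",
    "http://server/path/document?name1=user2&name2=password2&debug=1",
    "http://server/path/document?name1=user3&name2=password3",
]
filtered = extract_frequent_params(sample_urls, min_probability=0.5)
print(filtered[1])
```

Here `debug` appears in only one of three URLs, so its value is not extracted, while `name1` and `name2` are kept.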
S104: For each extracted parameter, determine the feature vector corresponding to the parameter.
In the embodiments of this specification, for each extracted parameter, an N-dimensional feature vector corresponding to the parameter may be determined according to the parameter value of the parameter, where N is a natural number greater than 0.
The dimensions of the feature vector corresponding to a parameter may include at least one of the following quantities computed over the parameter value of the parameter: the total number of characters, the total number of letters, the total number of digits, the total number of special symbols, the number of distinct characters, the number of distinct letters, the number of distinct digits, and the number of distinct special symbols.
Take the URL "http://server/path/document?name1=user1&name2=password1" as an example. The parameter value of the parameter name1 in this URL is user1, which contains 5 characters in total, 4 letters, 1 digit, and 0 special symbols, with 5 distinct characters, 4 distinct letters, 1 distinct digit, and 0 distinct special symbols. The feature vector corresponding to the parameter name1 can therefore be (5,4,1,0,5,4,1,0).
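The eight dimensions listed above can be computed as in the following Python sketch. It assumes that a "special symbol" is any character that is neither a letter nor a digit (whitespace included); this definition and the function name `feature_vector` are illustrative assumptions rather than part of this specification.

```python
def feature_vector(value):
    """Return the 8-dimensional feature vector of a parameter value.

    Assumption: a special symbol is any character that is neither a letter nor a digit.
    """
    letters = [c for c in value if c.isalpha()]
    digits = [c for c in value if c.isdigit()]
    specials = [c for c in value if not (c.isalpha() or c.isdigit())]
    return (
        len(value),          # total number of characters
        len(letters),        # total number of letters
        len(digits),         # total number of digits
        len(specials),       # total number of special symbols
        len(set(value)),     # number of distinct characters
        len(set(letters)),   # number of distinct letters
        len(set(digits)),    # number of distinct digits
        len(set(specials)),  # number of distinct special symbols
    )


print(feature_vector("user1"))     # a normal parameter value
print(feature_vector("' or 1=1"))  # a value containing an injected field
```

For the value user1 the sketch yields (5, 4, 1, 0, 5, 4, 1, 0). A value such as `' or 1=1` yields a vector dominated by special symbols, which is the kind of rarity the isolation forest model is meant to pick up.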
Further, the value of each dimension of the feature vector may be normalized. Continuing the example above, the 8 feature vector values corresponding to the parameter name1 may be normalized according to the formula $y = x / z$, where x denotes a feature vector value, z denotes the total number of characters contained in the parameter value of the parameter name1, and y denotes the value obtained after x is normalized. The normalized feature vector of the parameter name1 is then (5/5,4/5,1/5,0/5,5/5,4/5,1/5,0/5), i.e., (1,0.8,0.2,0,1,0.8,0.2,0).
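The normalization can be sketched in Python as follows, dividing every dimension by the total number of characters, which is assumed to be stored in the first dimension. The function name `normalize` and the handling of an empty parameter value are assumptions made for illustration.

```python
def normalize(vector):
    """Divide each feature value x by z, the total number of characters (vector[0])."""
    z = vector[0]
    if z == 0:
        # Assumption: an empty parameter value maps to an all-zero vector.
        return tuple(0.0 for _ in vector)
    return tuple(x / z for x in vector)


normalized = normalize((5, 4, 1, 0, 5, 4, 1, 0))
print(normalized)
```

Applied to the vector (5,4,1,0,5,4,1,0), each of the eight values is divided by 5, so both dimensions with the value 1 (total digits and distinct digits) become 0.2.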
S106: Build an isolation forest model according to the feature vectors corresponding to the parameters.
In the embodiments of this specification, the isolation forest algorithm is used to build an isolation forest model from the feature vectors corresponding to the parameters, and the isolation forest model is used to detect whether a URL is abnormal. The feature vectors corresponding to the parameters do not need to be labeled as normal or abnormal.
The idea of the isolation forest algorithm is briefly introduced here. Referring to Fig. 2a, the 10 points shown in Fig. 2a include hollow points and solid points. The hollow points are more numerous (8) and more concentrated, whereas the solid points are fewer (2) and more scattered. The hollow points can be regarded as normal points and the solid points as abnormal points; that is, abnormal points are points that are few in number and lie apart from the rest. The following operations are then performed:
1st division: A line is generated at random, dividing the points in Fig. 2a into part A and part B, as shown in Fig. 2b.
2nd division: For part A, another line is generated at random, dividing the points in part A into part C and part D; likewise, for part B, a line is generated at random, dividing the points in part B into part E and part F, as shown in Fig. 2c.
A line continues to be generated at random for each newly divided part, and the division continues until the plane shown in Fig. 2a is divided into 10 parts, each containing only 1 point; that is, each point is placed in a part exclusive to it (if a part contains only one point, that part is the exclusive part of that point). Obviously, the solid points are placed in exclusive parts more easily and sooner; as shown in Fig. 2b, the solid point in the upper right corner has already been placed in an exclusive part (part F). In other words, the more easily a point is placed in an exclusive part, the more abnormal the point is.
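The effect of random division can be illustrated with a simplified Python sketch. It works on points along a single axis rather than in the plane of Fig. 2a, so each "line" becomes a random cut value; the coordinates, the function names, and the number of trials are all hypothetical.

```python
import random


def splits_to_isolate(points, target, rng):
    """Count the random cuts needed until `target` is alone in its part."""
    part = list(points)
    splits = 0
    while len(part) > 1:
        lo, hi = min(part), max(part)
        if lo == hi:  # identical values cannot be separated
            break
        cut = rng.uniform(lo, hi)
        # Keep only the side of the cut that still contains the target.
        part = [p for p in part if (p < cut) == (target < cut)]
        splits += 1
    return splits


def average_splits(points, target, trials=2000, seed=0):
    rng = random.Random(seed)
    return sum(splits_to_isolate(points, target, rng) for _ in range(trials)) / trials


# Assumed layout: 8 concentrated (normal) points and 2 scattered (abnormal) points.
normal_points = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
abnormal_points = [6.0, 9.0]
all_points = normal_points + abnormal_points

print(average_splits(all_points, 9.0))  # scattered point
print(average_splits(all_points, 1.3))  # concentrated point
```

On average the scattered point needs fewer cuts before it sits in a part of its own than a point inside the cluster does, which matches the observation that abnormal points are isolated sooner.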
Based on the above idea, the isolation forest algorithm uses S classification trees (specifically, these may be binary trees). For each binary tree, the points shown in Fig. 2a are placed in the root node; starting from the root node, the condition for each branching is random (each branching divides the points with a randomly generated line). The earlier a point falls into a leaf node of the binary tree, the more likely the point is abnormal.
Taking the above isolation forest algorithm as an example, the building of the isolation forest model according to the feature vectors corresponding to the parameters in step S106 is outlined below.
The isolation forest includes S binary trees (iTrees). For each iTree, the process of training the iTree can be described as follows:
First step: randomly select M feature vectors from among the feature vectors and place them in the root node of the iTree.
Second step: among the N dimensions of the feature vector, randomly specify one dimension (the specified dimension), and randomly specify a value in that dimension as the cut value; the cut value lies between the maximum and the minimum of the values of the M feature vectors in the specified dimension.
Third step: divide the M feature vectors into two parts according to the cut value; the feature vectors whose value in the specified dimension is not less than the cut value form one part, and the feature vectors whose value in the specified dimension is less than the cut value form the other part.
Fourth step: recursively perform the second step and the third step until the iTree reaches a specified height or every leaf node of the iTree holds only one feature vector. The specified height can be set as required and is generally log2(M).
Through the above four steps, one iTree can be trained.
It should be noted that, when the next iTree is trained, the M feature vectors in the first step may be randomly selected from all of the feature vectors, or from the feature vectors that have not yet been selected.
By repeating the above four steps, S trained iTrees are obtained, which make up the isolation forest model.
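The four training steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: trees are plain dictionaries, the height limit is taken as the ceiling of log2(M), every tree samples from all feature vectors, and a node stops splitting when the randomly specified dimension cannot separate its vectors. All function names and the sample data are hypothetical.

```python
import math
import random


def build_itree(vectors, height_limit, rng, depth=0):
    """Second to fourth steps: recursively split the vectors with random cuts."""
    if depth >= height_limit or len(vectors) <= 1:
        return {"size": len(vectors)}  # leaf node
    dim = rng.randrange(len(vectors[0]))  # randomly specified dimension
    values = [v[dim] for v in vectors]
    lo, hi = min(values), max(values)
    if lo == hi:
        # Simplification: stop if the specified dimension cannot separate the vectors.
        return {"size": len(vectors)}
    cut = rng.uniform(lo, hi)  # cut value between the minimum and the maximum
    return {
        "dim": dim,
        "cut": cut,
        "left": build_itree([v for v in vectors if v[dim] < cut], height_limit, rng, depth + 1),
        "right": build_itree([v for v in vectors if v[dim] >= cut], height_limit, rng, depth + 1),
    }


def build_forest(vectors, num_trees, sample_size, seed=0):
    """Train S = num_trees iTrees, each on M = sample_size randomly selected vectors."""
    rng = random.Random(seed)
    m = min(sample_size, len(vectors))
    height_limit = math.ceil(math.log2(m)) if m > 1 else 0
    # First step for each tree: randomly select M feature vectors for the root node.
    return [build_itree(rng.sample(vectors, m), height_limit, rng) for _ in range(num_trees)]


def tree_height(node):
    """Height of a tree; a leaf has height 0."""
    if "cut" not in node:
        return 0
    return 1 + max(tree_height(node["left"]), tree_height(node["right"]))


# Hypothetical normalized feature vectors (3 dimensions for brevity).
training_vectors = [
    (1.0, 0.80, 0.20), (1.0, 0.75, 0.25), (1.0, 0.90, 0.10), (1.0, 0.70, 0.30),
    (1.0, 0.85, 0.15), (1.0, 0.60, 0.40), (1.0, 0.95, 0.05), (1.0, 0.65, 0.35),
    (1.0, 0.25, 0.25), (1.0, 0.78, 0.22),
]
forest = build_forest(training_vectors, num_trees=10, sample_size=8)
print(len(forest), max(tree_height(t) for t in forest))
```

With M = 8 the height limit is 3, so no tree in the forest grows deeper than three levels of branching.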
Fig. 3 is a flowchart of a method for detecting a URL provided by an embodiment of this specification, which comprises the following steps:
S300: Obtain a URL.
S302: Extract the parameters in the URL.
S304: For each extracted parameter, determine the feature vector corresponding to the parameter.
S306: Input the feature vector corresponding to each parameter into a pre-built isolation forest model to perform anomaly detection on the URL.
The URL in Fig. 3 is the URL to be detected. For an explanation of steps S300 to S304, refer to steps S100 to S104; details are not repeated here.
In step S306, the feature vector corresponding to each parameter may be input into the isolation forest model to obtain a model output result corresponding to each parameter, and whether an abnormal parameter exists among the parameters is judged according to the model output results corresponding to the parameters.
Further, for each parameter, the feature vector corresponding to the parameter may be input into the isolation forest model so that each classification tree in the isolation forest model classifies the feature vector, and the average height of the leaf nodes into which the feature vector falls in the classification trees is determined as the model output result corresponding to the parameter. Then, for each parameter, if the model output result corresponding to the parameter is less than a specified threshold, the parameter is determined to be abnormal; if the model output result corresponding to the parameter is not less than the specified threshold, the parameter is determined to be normal. When any parameter is determined to be abnormal, it is determined that an abnormal parameter exists among the parameters; when every parameter is determined to be normal, it is determined that no abnormal parameter exists among the parameters.
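The judgment in step S306 can be sketched in Python as follows. The sketch takes the "height" of a leaf node to mean the number of branchings between the root node and that leaf, and uses two small hand-built trees in place of a trained model; the tree layout, the two feature dimensions, the threshold 2.0, and all names are hypothetical.

```python
def leaf_depth(tree, vector):
    """Number of branchings a feature vector passes before reaching a leaf node."""
    node, depth = tree, 0
    while "cut" in node:
        node = node["left"] if vector[node["dim"]] < node["cut"] else node["right"]
        depth += 1
    return depth


def average_depth(forest, vector):
    """Model output result: average leaf depth over all classification trees."""
    return sum(leaf_depth(tree, vector) for tree in forest) / len(forest)


def url_is_abnormal(forest, param_vectors, threshold):
    """A URL is abnormal if any of its parameters scores below the threshold."""
    return any(average_depth(forest, v) < threshold for v in param_vectors)


LEAF = {}
# Hand-built trees over two assumed dimensions:
# dim 0 = share of special symbols, dim 1 = share of letters.
tree_a = {"dim": 0, "cut": 0.3,
          "left": {"dim": 0, "cut": 0.1,
                   "left": {"dim": 0, "cut": 0.05, "left": LEAF, "right": LEAF},
                   "right": LEAF},
          "right": LEAF}
tree_b = {"dim": 1, "cut": 0.4,
          "left": LEAF,
          "right": {"dim": 1, "cut": 0.7,
                    "left": LEAF,
                    "right": {"dim": 1, "cut": 0.9, "left": LEAF, "right": LEAF}}}
demo_forest = [tree_a, tree_b]

normal_vector = (0.0, 0.8)     # e.g. a value such as user1
abnormal_vector = (0.5, 0.25)  # e.g. a value such as ' or 1=1

print(average_depth(demo_forest, normal_vector))
print(average_depth(demo_forest, abnormal_vector))
print(url_is_abnormal(demo_forest, [normal_vector, abnormal_vector], threshold=2.0))
```

The abnormal vector reaches a leaf after one branching in both trees, whereas the normal vector passes three branchings in each, so only the former falls below the threshold. Common isolation forest implementations additionally normalize the path length into an anomaly score; the sketch follows the simpler average-height rule described in the text.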
With the methods shown in Fig. 1 and Fig. 3, an isolation forest model is built from the feature vectors of the parameters in URLs, so that the server can detect received URLs with the isolation forest model. If a received URL is determined to be abnormal, the server can refuse to parse it, thereby avoiding hacker attacks and improving network security.
In addition, the embodiments of this specification also make it possible to discover potential network attack means. Specifically, the isolation forest model can determine whether a URL is abnormal; if the URL is abnormal, the parameter value of a parameter in the URL is abnormal. The abnormal parameter value can prompt staff to analyze the attack means used by the hacker, which helps staff improve the security rules.
Based on the model training method shown in Fig. 1, an embodiment of this specification correspondingly provides a model training apparatus which, as shown in Fig. 4, includes:
an acquisition module 401, which obtains a number of uniform resource locators (URLs);
an extraction module 402, which, for each URL, extracts the parameters in the URL;
a determining module 403, which, for each extracted parameter, determines the feature vector corresponding to the parameter; and
a processing module 404, which builds an isolation forest model according to the feature vectors corresponding to the parameters, the isolation forest model being used to detect whether a URL is abnormal.
The extraction module, for each URL, determines, among the parameters included in the URL, the parameters whose parameter names satisfy a specified condition, and extracts the parameter value of each parameter so determined.
The determining module, for each extracted parameter, determines an N-dimensional feature vector corresponding to the parameter according to the parameter value of the parameter, where N is a natural number greater than 0.
The dimensions of the N-dimensional feature vector specifically include at least one of the following quantities computed over the parameter value of the parameter: the total number of characters, the total number of letters, the total number of digits, the total number of symbols, the number of distinct characters, the number of distinct letters, the number of distinct digits, and the number of distinct symbols.
Based on the method for detecting a URL shown in Fig. 3, an embodiment of this specification correspondingly provides an apparatus for detecting a URL which, as shown in Fig. 5, includes:
an acquisition module 501, which obtains a URL;
an extraction module 502, which extracts the parameters in the URL;
a determining module 503, which, for each extracted parameter, determines the feature vector corresponding to the parameter; and
an anomaly detection module, which inputs the feature vector corresponding to each parameter into a pre-built isolation forest (Isolation Forest) model to perform anomaly detection on the URL, the isolation forest model being built according to the above model training method.
The anomaly detection module inputs the feature vector corresponding to each parameter into the pre-built isolation forest (Isolation Forest) model to obtain the model output result corresponding to each parameter; judges, according to the model output results corresponding to the parameters, whether an abnormal parameter exists among the parameters; if so, determines that the URL is abnormal; otherwise, determines that the URL is normal.
The anomaly detection module, for each parameter, inputs the feature vector corresponding to the parameter into the pre-built isolation forest model so that each classification tree in the isolation forest model classifies the feature vector, and determines the average height of the leaf nodes into which the feature vector falls in the classification trees as the model output result corresponding to the parameter; for each parameter, if the model output result corresponding to the parameter is less than a specified threshold, the parameter is determined to be abnormal, and if the model output result corresponding to the parameter is not less than the specified threshold, the parameter is determined to be normal.
Based on the model training method shown in Fig. 1, an embodiment of this specification correspondingly provides a model training device which, as shown in Fig. 6, includes one or more processors and a memory, the memory storing a program that is configured to be executed by the one or more processors to perform the following steps:
obtaining a number of uniform resource locators (URLs);
for each URL, extracting the parameters in the URL;
for each extracted parameter, determining the feature vector corresponding to the parameter; and
building an isolation forest (Isolation Forest) model according to the feature vectors corresponding to the parameters, the isolation forest model being used to detect whether a URL is abnormal.
Based on the method for detecting a URL shown in Fig. 3, an embodiment of this specification correspondingly provides a device for detecting a URL which, as shown in Fig. 7, includes one or more processors and a memory, the memory storing a program that is configured to be executed by the one or more processors to perform the following steps:
obtaining a URL;
extracting the parameters in the URL;
for each extracted parameter, determining the feature vector corresponding to the parameter; and
inputting the feature vector corresponding to each parameter into a pre-built isolation forest (Isolation Forest) model to perform anomaly detection on the URL, the isolation forest model being built according to the above model training method.
The embodiments in this specification are described in a progressive manner; for identical or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the devices shown in Fig. 6 and Fig. 7 are described relatively briefly because they are substantially similar to the method embodiments; for relevant parts, refer to the description of the method embodiments.
In the 1990s, an improvement to a technology could be clearly distinguished as either an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system onto a single PLD by themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of fabricating integrated circuit chips by hand, this programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Or, the means for implementing various functions can even be regarded both as software modules implementing a method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described by dividing it into various units according to function. Of course, when this specification is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory.
The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Therefore, this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The foregoing is merely embodiments of this specification and is not intended to limit this specification. For those skilled in the art, various modifications and variations of this specification are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of this specification shall fall within the scope of the claims of this specification.