Summary of the Invention
A primary object of the present invention is to provide a method and a device for identifying malicious web crawlers, so as to solve the problem of poor accuracy when identifying malicious web crawlers.
To achieve the above object, according to one aspect of the present invention, a method for identifying malicious web crawlers is provided.
The method for identifying malicious web crawlers according to the present invention includes: obtaining a network address to be detected, wherein the network address to be detected is a network address that meets a first preset condition, and a network address is determined to meet the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold; obtaining user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; calculating a target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; judging whether the target access ratio exceeds a preset ratio threshold; and, if the target access ratio exceeds the preset ratio threshold, determining that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
Further, obtaining the user access information corresponding to the network address to be detected includes: obtaining an access log of the target website; parsing the access log to obtain a parsing result; and obtaining, from the parsing result, the user access information corresponding to the network address to be detected.
Further, the preset ratio threshold is determined as follows: determining a reference network address set, wherein the reference network address set includes multiple network addresses, each of which meets a second preset condition, and a network address is determined to meet the second preset condition if the number of times the target website is accessed through that network address within the preset time period does not exceed the preset count threshold; obtaining user access information corresponding to the reference network address set; and determining the preset ratio threshold from the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of accesses through network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Further, where the target website is accessed through multiple network addresses within the preset time period, determining the reference network address set includes: detecting, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and determining the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold to be the network addresses in the reference network address set.
Further, calculating the target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period includes: counting the number of times the target website is accessed through the network address to be detected within the preset time period; judging whether the user access information corresponding to the network address to be detected contains the target network terminal information; if it does, counting the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and calculating the target access ratio by the following formula: S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
To achieve the above object, according to another aspect of the present invention, a device for identifying malicious web crawlers is provided.
The device for identifying malicious web crawlers according to the present invention includes: a first acquisition unit, configured to obtain a network address to be detected, wherein the network address to be detected is a network address that meets a first preset condition, and a network address is determined to meet the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold; a second acquisition unit, configured to obtain user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; a computing unit, configured to calculate a target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; a judging unit, configured to judge whether the target access ratio exceeds a preset ratio threshold; and a determining unit, configured to determine, when the target access ratio exceeds the preset ratio threshold, that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
Further, the second acquisition unit includes: a first acquisition module, configured to obtain an access log of the target website; a parsing module, configured to parse the access log to obtain a parsing result; and a second acquisition module, configured to obtain, from the parsing result, the user access information corresponding to the network address to be detected.
Further, the preset ratio threshold is determined by the following modules: a first determining module, configured to determine a reference network address set, wherein the reference network address set includes multiple network addresses, each of which meets a second preset condition, and a network address is determined to meet the second preset condition if the number of times the target website is accessed through that network address within the preset time period does not exceed the preset count threshold; a third acquisition module, configured to obtain user access information corresponding to the reference network address set; and a second determining module, configured to determine the preset ratio threshold from the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of accesses through network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Further, where the target website is accessed through multiple network addresses within the preset time period, the first determining module includes: a detection submodule, configured to detect, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and a determination submodule, configured to determine the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold to be the network addresses in the reference network address set.
Further, the computing unit includes: a first statistics module, configured to count the number of times the target website is accessed through the network address to be detected within the preset time period; a judging module, configured to judge whether the user access information corresponding to the network address to be detected contains the target network terminal information; a second statistics module, configured to count, when the user access information corresponding to the network address to be detected contains the target network terminal information, the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and a computing module, configured to calculate the target access ratio by the following formula: S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
The present invention adopts a method comprising the following steps: obtaining a network address to be detected, wherein the network address to be detected is a network address that meets a first preset condition, and a network address is determined to meet the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold; obtaining user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; calculating a target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; judging whether the target access ratio exceeds a preset ratio threshold; and, if the target access ratio exceeds the preset ratio threshold, determining that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior. This solves the problem of poor accuracy when identifying malicious web crawlers: because access through the network address to be detected is determined to be malicious crawler access behavior only when the target access ratio exceeds the preset ratio threshold, the accuracy of malicious web crawler identification is improved.
Detailed Description of the Embodiments
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments can be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the present application, without creative effort, shall fall within the scope of protection of the present application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above accompanying drawings of the present application are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that data so used can be interchanged where appropriate, so that the embodiments of the present application described herein can be implemented. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
According to an embodiment of the present invention, a method for identifying malicious web crawlers is provided.
Fig. 1 is a flow chart of an embodiment of the method for identifying malicious web crawlers according to the present invention. As shown in Fig. 1, the method includes steps S102 to S110:
Step S102: obtain a network address to be detected, wherein the network address to be detected is a network address that meets a first preset condition, and a network address is determined to meet the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold.
In some cases, the number of times the target website is accessed through one fixed network address within the preset time period is very large (beyond the usual volume of visits). The nature of the access through that network address then needs to be identified, that is, whether it is legitimate human access or malicious web crawler access. The preset count threshold here is a reference value that can be, but is not limited to being, set according to the experience of the website analyst.
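As a non-limiting illustration of step S102, the following sketch selects the network addresses to be detected by counting accesses per address. The function name `find_addresses_to_detect`, the sample addresses, and the threshold of 5 are assumptions made for illustration and are not part of the original disclosure.

```python
from collections import Counter


def find_addresses_to_detect(accessed_addresses, count_threshold):
    """Return {address: access_count} for addresses whose access count
    within the preset time period exceeds the preset count threshold."""
    counts = Counter(accessed_addresses)
    return {addr: n for addr, n in counts.items() if n > count_threshold}


# Hypothetical input: one entry per access recorded in the preset time period.
visits = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 2
suspects = find_addresses_to_detect(visits, count_threshold=5)
print(suspects)  # {'10.0.0.1': 6}
```

Only addresses strictly above the threshold are returned, matching the wording "exceeds a preset count threshold"; addresses at or below it remain candidates for the reference set described later.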
Step S104: obtain user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information.
The user access information corresponding to the network address to be detected can be obtained as follows: obtaining an access log of the target website; parsing the access log to obtain a parsing result; and obtaining, from the parsing result, the user access information corresponding to the network address to be detected.
Preferably, the user agent information (UserAgent) corresponding to the network address to be detected is obtained from the parsing result. The UserAgent includes information such as the browser, operating system, and terminal device model used when the user accesses the target website.
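As a non-limiting illustration of parsing the access log, the following sketch extracts the IP address and the UserAgent from one log line. The disclosure does not specify a log format; the sketch assumes the widely used Combined Log Format, and the names `LOG_PATTERN` and `parse_log_line` as well as the sample line are hypothetical.

```python
import re

# Assumption: Combined Log Format, e.g.
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_log_line(line):
    """Return the IP address and UserAgent of one log line, or None if
    the line does not match the assumed format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {"ip": match.group("ip"), "user_agent": match.group("user_agent")}


sample = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
          '200 2326 "-" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"')
record = parse_log_line(sample)
print(record["ip"])          # 203.0.113.7
print(record["user_agent"])
```

Lines that do not match are returned as None rather than raising, so a caller can skip malformed entries while iterating over a log.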
Step S106: calculate a target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period.
Preferably, the target access ratio can be calculated as follows: counting the number of times the target website is accessed through the network address to be detected within the preset time period; judging whether the user access information corresponding to the network address to be detected contains the target network terminal information; if it does, counting the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and calculating the target access ratio by the following formula: S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
For example, suppose the target network terminal information is that the browser used for the access is the IE browser, and that within the preset time period the target website is accessed 1000 times through a first IP address, of which 900 accesses use the IE browser. The target access ratio is then S = 900/1000 = 0.9.
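As a non-limiting illustration, the formula S = A/B and the numbers of the preceding example can be sketched as follows. The function names, the UserAgent strings, and the simple substring test for the IE browser are assumptions made for illustration only.

```python
def target_access_ratio(user_agents, is_target_terminal):
    """Compute S = A/B for one network address to be detected.

    user_agents: one UserAgent string per access in the preset time period.
    is_target_terminal: predicate telling whether an access carries the
    target network terminal information.
    """
    total = len(user_agents)  # B: all accesses through the address
    if total == 0:
        return 0.0
    matching = sum(1 for ua in user_agents if is_target_terminal(ua))  # A
    return matching / total


def uses_ie(user_agent):
    # Hypothetical, simplified test for the IE browser.
    return "MSIE" in user_agent or "Trident" in user_agent


# 1000 accesses through the first IP address, 900 of them with IE.
accesses = (["Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"] * 900
            + ["Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"] * 100)
print(target_access_ratio(accesses, uses_ie))  # 0.9
```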
Step S108: judge whether the target access ratio exceeds a preset ratio threshold.
The preset ratio threshold is a reference value. It can be drawn up according to the experience of the person making the judgment, or it can be set according to the access ratio of legitimate IP addresses.
Preferably, the preset ratio threshold can be determined as follows: determining a reference network address set, wherein the reference network address set includes multiple network addresses, each of which meets a second preset condition, and a network address is determined to meet the second preset condition if the number of times the target website is accessed through that network address within the preset time period does not exceed the preset count threshold; obtaining user access information corresponding to the reference network address set; and determining the preset ratio threshold from the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of accesses through network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Where the target website has been accessed through multiple network addresses within the preset time period, the reference network address set can be determined as follows: detecting, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and determining the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold to be the network addresses in the reference network address set.
For example, suppose the target network terminal information is that the browser used for the access is the IE browser, and the preset count threshold is 500. Suppose that within the preset time period the network address whose number of accesses to the target website exceeds the preset count threshold is a first IP address, and the network addresses that do not exceed it are a second IP address, a third IP address, and a fourth IP address. The target website is accessed 1000 times through the first IP address, of which 800 accesses use the IE browser. The target website is accessed 100, 200, and 300 times through the second, third, and fourth IP addresses respectively, of which 50, 100, and 150 accesses respectively use the IE browser. The second, third, and fourth IP addresses are taken as the reference network address set, and the preset ratio threshold is calculated as (50+100+150)/(100+200+300) = 0.5. The target access ratio is 800/1000 = 0.8. Because 0.8 is greater than 0.5, the behavior of accessing the target website through the first IP address can be considered malicious crawler access behavior.
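As a non-limiting illustration, the numbers of the preceding example can be reproduced with the following sketch. The function name `preset_ratio_threshold`, the keys `ip2` to `ip4`, and the tuple layout are assumptions made for illustration only.

```python
def preset_ratio_threshold(reference_stats):
    """Compute the preset ratio threshold from the reference set.

    reference_stats maps each reference network address to a tuple
    (accesses containing the target terminal information, total accesses).
    """
    target = sum(t for t, _ in reference_stats.values())
    total = sum(n for _, n in reference_stats.values())
    if total == 0:
        raise ValueError("the reference network address set has no accesses")
    return target / total


# Second, third, and fourth IP addresses: (IE accesses, total accesses).
reference = {"ip2": (50, 100), "ip3": (100, 200), "ip4": (150, 300)}
threshold = preset_ratio_threshold(reference)  # (50+100+150)/(100+200+300)

ratio_ip1 = 800 / 1000            # target access ratio of the first IP address
is_crawler = ratio_ip1 > threshold

print(threshold)    # 0.5
print(is_crawler)   # True
```

The threshold is pooled over all accesses in the reference set rather than averaged per address, which follows the calculation (50+100+150)/(100+200+300) given in the example.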
Step S110: if the target access ratio exceeds the preset ratio threshold, determine that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
A web crawler is a program or script that automatically captures web information according to certain rules. Because the preset ratio threshold is a statistic of human access within the preset time period, and the access pattern corresponding to that statistic is the most probable human access pattern, it can serve as a baseline for comparison. When the target access ratio exceeds the preset ratio threshold, the access through that network address can be considered non-human access, that is, malicious crawler access behavior.
This embodiment adopts the following steps: obtaining a network address to be detected, wherein the network address to be detected is a network address that meets a first preset condition, and a network address is determined to meet the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold; obtaining user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; calculating a target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; judging whether the target access ratio exceeds a preset ratio threshold; and, if the target access ratio exceeds the preset ratio threshold, determining that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior. This solves the problem of poor accuracy when identifying malicious web crawlers: because access through the network address to be detected is determined to be malicious crawler access behavior only when the target access ratio exceeds the preset ratio threshold, the accuracy of malicious web crawler identification is improved.
Fig. 2 is a flow chart of a second embodiment of the method for identifying malicious web crawlers according to the present invention. Fig. 2 can serve as a preferred implementation of the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes steps S201 to S208:
Step S201: log user accesses, including the user's IP address and the UserAgent at the time of access.
Step S202: parse the log and judge whether each IP address is a suspect IP or a legitimate IP.
A suspect IP is an IP address whose number of accesses to the target website within the preset time period exceeds the preset count threshold; a legitimate IP is an IP address whose number of accesses to the target website within the preset time period does not exceed the preset count threshold.
Step S203: for the IP addresses judged to be suspect IPs, analyze the UserAgent corresponding to each suspect IP.
Step S204: calculate the UserAgent ratio of each suspect IP.
The UserAgent ratio is the target access ratio. For example, it can be the ratio of the number of accesses to the target website through the suspect IP within the preset time period in which the operating system used is Windows 7 to the total number of accesses to the target website through the suspect IP.
Step S205: for the IP addresses judged to be legitimate IPs, take all legitimate IPs as a legitimate IP group and calculate the UserAgent ratio of the legitimate IP group.
The UserAgent ratio of the legitimate IP group is the preset ratio threshold. For example, it can be the ratio of the number of accesses to the target website through all IP addresses in the legitimate IP group in which the operating system used is Windows 7 to the total number of accesses to the target website through all IP addresses in the legitimate IP group.
Step S206: judge whether the difference between the UserAgent ratio of each suspect IP and the UserAgent ratio of the legitimate IP group is greater than a preset error value.
Step S207: if the difference between the UserAgent ratio of the suspect IP and the UserAgent ratio of the legitimate IP group is not greater than the preset error value, the access through the suspect IP is human access.
Step S208: if the difference between the UserAgent ratio of the suspect IP and the UserAgent ratio of the legitimate IP group is greater than the preset error value, the access through the suspect IP is non-human access and belongs to malicious crawler access behavior.
By analyzing the UserAgent through the above steps when identifying malicious crawlers, this embodiment detects whether an IP address is one shared by multiple users, which reduces the false-positive rate of malicious crawler identification and improves its accuracy.
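As a non-limiting illustration, steps S202 to S208 can be sketched as one function operating on already parsed log records. The name `identify_crawlers`, the record layout of (IP address, UserAgent) pairs, the sample data, and the use of the absolute value of the difference are assumptions made for illustration; the disclosure states only that the difference is compared with a preset error value.

```python
from collections import defaultdict


def identify_crawlers(records, count_threshold, is_target, preset_error):
    """Return {suspect_ip: True if judged a malicious crawler, else False}."""
    by_ip = defaultdict(list)
    for ip, user_agent in records:
        by_ip[ip].append(user_agent)

    # S202: split into suspect IPs and the legitimate IP group.
    suspects = {ip: uas for ip, uas in by_ip.items() if len(uas) > count_threshold}
    legal_uas = [ua for uas in by_ip.values() if len(uas) <= count_threshold
                 for ua in uas]
    if not legal_uas:
        raise ValueError("no legitimate IP accesses to compare against")

    # S205: UserAgent ratio of the legitimate IP group.
    legal_ratio = sum(1 for ua in legal_uas if is_target(ua)) / len(legal_uas)

    result = {}
    for ip, uas in suspects.items():
        # S203-S204: UserAgent ratio of each suspect IP.
        ratio = sum(1 for ua in uas if is_target(ua)) / len(uas)
        # S206-S208: assumed absolute difference against the preset error value.
        result[ip] = abs(ratio - legal_ratio) > preset_error
    return result


# Hypothetical records: A uses a single UserAgent, B is shared by mixed users.
records = ([("A", "Win7")] * 10
           + [("B", "Win7")] * 5 + [("B", "Linux")] * 5
           + [("C", "Win7"), ("C", "Linux"), ("D", "Win7"), ("D", "Linux")])
verdicts = identify_crawlers(records, count_threshold=5,
                             is_target=lambda ua: ua == "Win7",
                             preset_error=0.2)
print(verdicts)  # {'A': True, 'B': False}
```

In this sample, address B exceeds the count threshold but its UserAgent mix matches the legitimate group, so it is treated as a shared address rather than a crawler.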
According to an embodiment of the present invention, a device for identifying malicious web crawlers is provided. It should be noted that the device for identifying malicious web crawlers of the embodiment of the present invention can be used to perform the method for identifying malicious web crawlers provided by the embodiment of the present invention, and that the method can likewise be performed by the device provided by the embodiment of the present invention.
Fig. 3 is a schematic diagram of an embodiment of the device for identifying malicious web crawlers according to the present invention. As shown in Fig. 3, the device includes: a first acquisition unit 10, a second acquisition unit 20, a computing unit 30, a judging unit 40, and a determining unit 50.
The first acquisition unit 10 is configured to obtain a network address to be detected, wherein the network address to be detected is a network address that meets a first preset condition, and a network address is determined to meet the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold.
The second acquisition unit 20 is configured to obtain user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information.
The second acquisition unit includes: a first acquisition module, configured to obtain an access log of the target website; a parsing module, configured to parse the access log to obtain a parsing result; and a second acquisition module, configured to obtain, from the parsing result, the user access information corresponding to the network address to be detected.
The computing unit 30 is configured to calculate a target access ratio from the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period.
Optionally, the computing unit can include: a first statistics module, configured to count the number of times the target website is accessed through the network address to be detected within the preset time period; a judging module, configured to judge whether the user access information corresponding to the network address to be detected contains the target network terminal information; a second statistics module, configured to count, when the user access information corresponding to the network address to be detected contains the target network terminal information, the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information; and a computing module, configured to calculate the target access ratio by the following formula: S = A/B, where S is the target access ratio, A is the number of accesses through the network address to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
The judging unit 40 is configured to judge whether the target access ratio exceeds a preset ratio threshold.
Optionally, the preset ratio threshold can be determined by the following modules: a first determining module, configured to determine a reference network address set, wherein the reference network address set includes multiple network addresses, each of which meets a second preset condition, and a network address is determined to meet the second preset condition if the number of times the target website is accessed through that network address within the preset time period does not exceed the preset count threshold; a third acquisition module, configured to obtain user access information corresponding to the reference network address set; and a second determining module, configured to determine the preset ratio threshold from the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of accesses through network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Optionally, if the target website is accessed through multiple network addresses within the preset time period, the first determining module can include: a detection submodule, configured to detect, for each of the multiple network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and a determination submodule, configured to determine the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold to be the network addresses in the reference network address set.
The determining unit 50 is configured to determine, when the target access ratio exceeds the preset ratio threshold, that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
The device for identifying malicious web crawlers provided by this embodiment includes: the first acquisition unit 10, the second acquisition unit 20, the computing unit 30, the judging unit 40, and the determining unit 50. The device solves the problem of poor accuracy when identifying malicious web crawlers: because the determining unit 50 determines access through the network address to be detected to be malicious crawler access behavior only when the target access ratio exceeds the preset ratio threshold, the accuracy of malicious web crawler identification is improved.
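As a non-limiting illustration, the five units can be sketched in software as the methods of one class. The class name `CrawlerIdentificationDevice`, the method names, the record layout of (network address, UserAgent) pairs, and the sample data are assumptions made for illustration and do not limit how the units are implemented.

```python
from collections import Counter


class CrawlerIdentificationDevice:
    def __init__(self, count_threshold, ratio_threshold, is_target_terminal):
        self.count_threshold = count_threshold
        self.ratio_threshold = ratio_threshold
        self.is_target_terminal = is_target_terminal

    def acquire_addresses(self, records):
        """First acquisition unit 10: addresses exceeding the count threshold."""
        counts = Counter(addr for addr, _ in records)
        return [a for a, n in counts.items() if n > self.count_threshold]

    def acquire_access_info(self, records, address):
        """Second acquisition unit 20: UserAgents recorded for one address."""
        return [ua for addr, ua in records if addr == address]

    def compute_ratio(self, access_info):
        """Computing unit 30: target access ratio S = A/B."""
        matching = sum(1 for ua in access_info if self.is_target_terminal(ua))
        return matching / len(access_info)

    def exceeds_threshold(self, ratio):
        """Judging unit 40: compare with the preset ratio threshold."""
        return ratio > self.ratio_threshold

    def determine(self, records):
        """Determining unit 50: addresses judged to be malicious crawlers."""
        return [a for a in self.acquire_addresses(records)
                if self.exceeds_threshold(
                    self.compute_ratio(self.acquire_access_info(records, a)))]


# Hypothetical records: ip1 makes 10 accesses (8 with IE), ip2 makes 2.
records = ([("ip1", "IE")] * 8 + [("ip1", "Other")] * 2
           + [("ip2", "IE"), ("ip2", "Other")])
device = CrawlerIdentificationDevice(5, 0.5, lambda ua: ua == "IE")
print(device.determine(records))  # ['ip1']
```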
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be fabricated as individual integrated circuit modules, or multiple modules or steps among them can be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention can have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.