CN104391979B - Method and device for identifying malicious web crawlers - Google Patents


Info

Publication number
CN104391979B
Authority
CN
China
Legal status
Active
Application number
CN201410743056.6A
Other languages
Chinese (zh)
Other versions
CN104391979A (en)
Inventor
崔维福
范浩文
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201410743056.6A
Publication of CN104391979A
Application granted
Publication of CN104391979B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Abstract

The invention discloses a method and a device for identifying malicious web crawlers. The method includes: obtaining a network address to be detected; obtaining the user access information corresponding to that address; calculating a target access ratio from the number of accesses from the address whose user access information contains target network terminal information and the number of times the address accessed the target website within a preset time period; judging whether the target access ratio exceeds a preset ratio threshold; and, if it does, determining that the accesses to the target website through that address are malicious crawler access behavior. The invention solves the problem of poor accuracy in identifying malicious web crawlers: because accesses through an address are classified as malicious crawler behavior only when its target access ratio exceeds the preset ratio threshold, identification accuracy is improved.

Description

Method and device for identifying malicious web crawlers
Technical field
The present invention relates to the Internet field, and in particular to a method and a device for identifying malicious web crawlers.
Background art
A web crawler is a program that automatically fetches web page content. For a website, large volumes of requests from malicious crawlers consume server performance, waste resources, and can even bring the server down. It is therefore necessary to keep the website available to normal users while preventing large-scale malicious crawlers from accessing it.
The existing way of identifying malicious crawlers is to parse the logs recorded by the website's server, find the network addresses that access the website frequently, and filter them out so that they can no longer access the website. This method, however, has a high false-positive rate. A company or an office building usually has only one public network address, so an address recorded by the website may belong not to an individual but to a whole company or building. In other words, many users may be accessing the website through that one address, and their traffic should not be treated as malicious crawler access.
No effective solution has yet been proposed for the poor accuracy of malicious web crawler identification in the related art.
Summary of the invention
The primary object of the present invention is to provide a method and a device for identifying malicious web crawlers that solve the problem of poor accuracy in identifying malicious web crawlers.
To achieve this object, according to one aspect of the present invention, a method for identifying malicious web crawlers is provided.
The method according to the present invention includes: obtaining a network address to be detected, where the address to be detected is one that satisfies a first preset condition, and an address satisfies the first preset condition if the number of times it accessed the target website within a preset time period exceeds a preset count threshold; obtaining the user access information corresponding to the address to be detected, where the user access information includes network terminal information of the accesses to the target website, and the network terminal information includes target network terminal information; calculating a target access ratio from the number of accesses from the address to be detected whose user access information contains the target network terminal information and the number of times the address to be detected accessed the target website within the preset time period; judging whether the target access ratio exceeds a preset ratio threshold; and, if it does, determining that the accesses to the target website through the address to be detected are malicious crawler access behavior.
Further, obtaining the user access information corresponding to the address to be detected includes: obtaining the access log of the target website; parsing the access log to obtain a parsing result; and obtaining, from the parsing result, the user access information corresponding to the address to be detected.
Further, the preset ratio threshold is determined as follows: determining a reference network address set that includes multiple network addresses, each satisfying a second preset condition, where an address satisfies the second preset condition if the number of times it accessed the target website within the preset time period does not exceed the preset count threshold; obtaining the user access information corresponding to the reference network address set; and determining the preset ratio threshold from that information, where the preset ratio threshold is the ratio of the number of accesses from addresses in the reference set whose user access information contains the target network terminal information to the total number of times the addresses in the reference set accessed the target website within the preset time period.
Further, where multiple network addresses access the target website within the preset time period, determining the reference network address set includes: checking, for each address, whether the number of times it accessed the target website within the preset time period exceeds the preset count threshold; and taking the addresses whose access count does not exceed the preset count threshold as the addresses in the reference network address set.
Further, calculating the target access ratio includes: counting the number of times the address to be detected accessed the target website within the preset time period; judging whether the user access information corresponding to the address contains the target network terminal information; if it does, counting the number of accesses whose user access information contains the target network terminal information; and calculating the target access ratio as S = A/B, where S is the target access ratio, A is the number of accesses from the address to be detected whose user access information contains the target network terminal information, and B is the number of times the address to be detected accessed the target website within the preset time period.
To achieve this object, according to another aspect of the present invention, a device for identifying malicious web crawlers is provided.
The device according to the present invention includes: a first acquisition unit, configured to obtain a network address to be detected, where the address to be detected is one that satisfies the first preset condition, that is, the number of times it accessed the target website within the preset time period exceeds the preset count threshold; a second acquisition unit, configured to obtain the user access information corresponding to the address to be detected, where the user access information includes network terminal information of the accesses to the target website, and the network terminal information includes the target network terminal information; a calculation unit, configured to calculate the target access ratio from the number of accesses from the address to be detected whose user access information contains the target network terminal information and the number of times the address to be detected accessed the target website within the preset time period; a judging unit, configured to judge whether the target access ratio exceeds the preset ratio threshold; and a determining unit, configured to determine, when the target access ratio exceeds the preset ratio threshold, that the accesses to the target website through the address to be detected are malicious crawler access behavior.
Further, the second acquisition unit includes: a first acquisition module, configured to obtain the access log of the target website; a parsing module, configured to parse the access log to obtain a parsing result; and a second acquisition module, configured to obtain, from the parsing result, the user access information corresponding to the address to be detected.
Further, the preset ratio threshold is determined by the following modules: a first determining module, configured to determine a reference network address set that includes multiple network addresses, each satisfying the second preset condition, that is, the number of times it accessed the target website within the preset time period does not exceed the preset count threshold; a third acquisition module, configured to obtain the user access information corresponding to the reference network address set; and a second determining module, configured to determine the preset ratio threshold from that information, where the preset ratio threshold is the ratio of the number of accesses from addresses in the reference set whose user access information contains the target network terminal information to the total number of times the addresses in the reference set accessed the target website within the preset time period.
Further, where multiple network addresses access the target website within the preset time period, the first determining module includes: a detection submodule, configured to check, for each address, whether the number of times it accessed the target website within the preset time period exceeds the preset count threshold; and a determination submodule, configured to take the addresses whose access count does not exceed the preset count threshold as the addresses in the reference network address set.
Further, the calculation unit includes: a first statistics module, configured to count the number of times the address to be detected accessed the target website within the preset time period; a judging module, configured to judge whether the user access information corresponding to the address to be detected contains the target network terminal information; a second statistics module, configured to count, when it does, the number of accesses whose user access information contains the target network terminal information; and a calculation module, configured to calculate the target access ratio as S = A/B, where S is the target access ratio, A is the number of accesses from the address to be detected whose user access information contains the target network terminal information, and B is the number of times the address to be detected accessed the target website within the preset time period.
The present invention uses a method comprising the following steps: obtaining a network address to be detected that satisfies the first preset condition (its number of accesses to the target website within the preset time period exceeds the preset count threshold); obtaining the corresponding user access information, which includes network terminal information of the accesses to the target website, including the target network terminal information; calculating the target access ratio from the number of accesses whose user access information contains the target network terminal information and the number of times the address accessed the target website within the preset time period; judging whether the target access ratio exceeds the preset ratio threshold; and, if it does, determining that the accesses to the target website through the address are malicious crawler access behavior. This solves the problem of poor accuracy in identifying malicious web crawlers: because crawler behavior is determined only when the target access ratio exceeds the preset ratio threshold, the accuracy of malicious crawler identification is improved.
Brief description of the drawings
The accompanying drawings, which form part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a first embodiment of the method for identifying malicious web crawlers according to the present invention;
Fig. 2 is a flowchart of a second embodiment of the method for identifying malicious web crawlers according to the present invention; and
Fig. 3 is a schematic diagram of an embodiment of the device for identifying malicious web crawlers according to the present invention.
Detailed description of the embodiments
It should be noted that in the case where not conflicting, the feature in embodiment and embodiment in the application can phase Mutually combination.Describe the present invention in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
In order that those skilled in the art more fully understand application scheme, below in conjunction with the embodiment of the present application Accompanying drawing, the technical scheme in the embodiment of the present application is clearly and completely described, it is clear that described embodiment is only The embodiment of the application part, rather than whole embodiments.Based on the embodiment in the application, ordinary skill people The every other embodiment that member is obtained under the premise of creative work is not made, it should all belong to the model of the application protection Enclose.
It should be noted that term " first " in the description and claims of this application and above-mentioned accompanying drawing, " Two " etc. be for distinguishing similar object, without for describing specific order or precedence.It should be appreciated that so use Data can exchange in the appropriate case, so as to embodiments herein described herein.In addition, term " comprising " and " tool Have " and their any deformation, it is intended that cover it is non-exclusive include, for example, containing series of steps or unit Process, method, system, product or equipment are not necessarily limited to those steps clearly listed or unit, but may include without clear It is listing to Chu or for the intrinsic other steps of these processes, method, product or equipment or unit.
According to an embodiment of the present invention, a method for identifying malicious web crawlers is provided.
Fig. 1 is a flowchart of an embodiment of the method for identifying malicious web crawlers according to the present invention. As shown in Fig. 1, the method includes steps S102 to S110:
Step S102: obtain a network address to be detected. The address to be detected is one that satisfies the first preset condition: an address satisfies the first preset condition if the number of times it accessed the target website within the preset time period exceeds the preset count threshold.
In some cases, a single network address accesses the target website a very large number of times within the preset time period (beyond normal traffic volume). The nature of the accesses through that address then needs to be identified, that is, whether they are legitimate human accesses or malicious crawler accesses. The preset count threshold here is a reference value; it can be set, for example, based on the experience of a web analyst, but is not limited to this.
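The first preset condition amounts to a per-address count over the preset time period. The sketch below is a minimal illustration, assuming the log has already been reduced to (IP, timestamp) pairs; the function name `find_addresses_to_detect`, the half-open window [start, end) and the strict "greater than" comparison are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter
from datetime import datetime


def find_addresses_to_detect(records, start, end, count_threshold):
    """Return the addresses whose access count in [start, end) exceeds count_threshold.

    records: iterable of (ip, timestamp) pairs (a hypothetical, pre-parsed log shape).
    """
    # Count accesses per address, keeping only those inside the time period.
    counts = Counter(ip for ip, ts in records if start <= ts < end)
    # First preset condition: strictly more accesses than the preset count threshold.
    return {ip for ip, n in counts.items() if n > count_threshold}


t = datetime(2014, 12, 1, 10, 0)
records = [("10.0.0.1", t)] * 3 + [("10.0.0.2", t)]
print(find_addresses_to_detect(records, datetime(2014, 12, 1), datetime(2014, 12, 2), 2))
# prints {'10.0.0.1'}
```

Only the count is used at this stage; the terminal information is examined later, for the addresses returned here.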
Step S104: obtain the user access information corresponding to the address to be detected. The user access information includes network terminal information of the accesses to the target website, and the network terminal information includes the target network terminal information.
The user access information corresponding to the address to be detected can be obtained as follows: obtain the access log of the target website; parse the access log to obtain a parsing result; and obtain, from the parsing result, the user access information corresponding to the address to be detected.
Preferably, the user agent information (UserAgent) corresponding to the address to be detected is obtained from the parsing result. The UserAgent contains information such as the browser, the operating system, and the terminal device model used when the user accessed the target website.
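Obtaining the UserAgent from the parsed access log could look like the following minimal sketch. The patent does not specify a log format; this assumes the widely used "combined" log format of Apache httpd and Nginx, in which the User-Agent is the last quoted field. The names `parse_access_log` and `user_agents_for` are hypothetical.

```python
import re
from datetime import datetime

# Assumed "combined" log format:
# ip ident user [time] "request" status size "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \S+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def parse_access_log(lines):
    """Yield (ip, timestamp, user_agent) for each line that matches the combined format."""
    for line in lines:
        m = COMBINED_LOG.match(line)
        if m is None:
            continue  # skip malformed or differently formatted lines
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        yield m.group("ip"), ts, m.group("ua")


def user_agents_for(parsed, ip):
    """Collect the UserAgent strings recorded for one address to be detected."""
    return [ua for addr, _, ua in parsed if addr == ip]
```

The `%b` month abbreviation is parsed according to the current locale, so this sketch assumes an English or C locale, which matches how these log formats write dates.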
Step S106: calculate the target access ratio from the number of accesses from the address to be detected whose user access information contains the target network terminal information and the number of times the address to be detected accessed the target website within the preset time period.
Preferably, the target access ratio can be calculated as follows: count the number of times the address to be detected accessed the target website within the preset time period; judge whether the user access information corresponding to the address contains the target network terminal information; if it does, count the number of accesses whose user access information contains the target network terminal information; and calculate the target access ratio as S = A/B, where S is the target access ratio, A is the number of accesses from the address to be detected whose user access information contains the target network terminal information, and B is the number of times the address to be detected accessed the target website within the preset time period.
For example, suppose the target network terminal information is that the browser used for the access is Internet Explorer (IE), and that within the preset time period a first IP address accessed the target website 1000 times, 900 of them using IE. The target access ratio is then S = 900/1000 = 0.9.
Step S108: judge whether the target access ratio exceeds the preset ratio threshold.
The preset ratio threshold is a reference value. It can be drawn up from the experience of the person making the judgment, or set according to the access ratio of legitimate IP addresses.
Preferably, the preset ratio threshold can be determined as follows: determine a reference network address set that includes multiple network addresses, each satisfying the second preset condition, that is, the number of times it accessed the target website within the preset time period does not exceed the preset count threshold; obtain the user access information corresponding to the reference network address set; and determine the preset ratio threshold from that information, where the preset ratio threshold is the ratio of the number of accesses from addresses in the reference set whose user access information contains the target network terminal information to the total number of times the addresses in the reference set accessed the target website within the preset time period.
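The threshold described in the preceding paragraph can be written compactly. The notation is ours, not the patent's: for each address j, B_j is its number of accesses to the target website in the preset time period, A_j is how many of those accesses carry the target network terminal information, N is the preset count threshold, and R is the reference network address set. The sums are taken before dividing (a pooled ratio), which matches the worked example that follows in the text, (50 + 100 + 150)/(100 + 200 + 300); in that example the pooled ratio happens to equal the mean of the per-address ratios, but in general the two differ.

```latex
R = \{\, j \mid B_j \le N \,\}, \qquad
T = \frac{\sum_{j \in R} A_j}{\sum_{j \in R} B_j}, \qquad
\text{address } i \text{ is flagged} \iff B_i > N \ \text{and}\ S_i = \frac{A_i}{B_i} > T
```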
When multiple network addresses accessed the target website within the preset time period, the reference network address set can be determined as follows: check, for each address, whether the number of times it accessed the target website within the preset time period exceeds the preset count threshold; then take the addresses whose access count does not exceed the preset count threshold as the addresses in the reference network address set.
For example, suppose the target network terminal information is that the browser used for the access is IE. Suppose that within the preset time period, the address whose access count exceeds the preset count threshold (500) is a first IP address, and the addresses whose counts do not exceed it are a second, a third, and a fourth IP address. The first IP address accessed the target website 1000 times, 800 of them using IE. The second, third, and fourth IP addresses accessed the target website 100, 200, and 300 times, of which 50, 100, and 150 respectively used IE. Taking the second, third, and fourth IP addresses as the reference network address set, the preset ratio threshold is (50 + 100 + 150)/(100 + 200 + 300) = 0.5, while the target access ratio is 800/1000 = 0.8. Because 0.8 exceeds 0.5, the accesses to the target website through the first IP address can be regarded as malicious crawler access behavior.
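The example can be reproduced with a short sketch that builds the reference set, computes the pooled threshold, and compares each address to be detected against it. The input shape (a dict mapping each IP to its (A, B) counts) and the function names are hypothetical; the counts and the threshold of 500 come from the example.

```python
def reference_threshold(stats, count_threshold):
    """Preset ratio threshold T, pooled over the reference network address set.

    stats: dict mapping ip -> (a, b), where b is the number of accesses in the time period
    and a is how many of them carry the target terminal information (hypothetical shape).
    """
    reference = [(a, b) for a, b in stats.values() if b <= count_threshold]
    total_b = sum(b for _, b in reference)
    if total_b == 0:
        raise ValueError("reference set is empty")
    return sum(a for a, _ in reference) / total_b


def flag_crawlers(stats, count_threshold):
    """Return T and, for each address exceeding the count threshold, whether S > T."""
    threshold = reference_threshold(stats, count_threshold)
    flags = {
        ip: a / b > threshold
        for ip, (a, b) in stats.items()
        if b > count_threshold
    }
    return threshold, flags


# Counts from the example: first IP address vs. second to fourth IP addresses.
stats = {"ip1": (800, 1000), "ip2": (50, 100), "ip3": (100, 200), "ip4": (150, 300)}
print(flag_crawlers(stats, 500))  # prints (0.5, {'ip1': True})
```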
Step S110: if the target access ratio exceeds the preset ratio threshold, determine that the accesses to the target website through the address to be detected are malicious crawler access behavior.
A web crawler is an automated program or script that fetches web information according to certain rules. The preset ratio threshold is a statistic describing human access within the preset time period, and the access pattern it reflects is the most probable pattern of human access, so it can serve as a standard for comparison. When the target access ratio exceeds the preset ratio threshold, the accesses through the address can be considered not to be made by humans and are classified as malicious crawler access behavior.
This embodiment adopts the following steps: obtaining a network address to be detected that satisfies the first preset condition (its number of accesses to the target website within the preset time period exceeds the preset count threshold); obtaining the corresponding user access information, which includes network terminal information of the accesses to the target website, including the target network terminal information; calculating the target access ratio from the number of accesses whose user access information contains the target network terminal information and the number of times the address accessed the target website within the preset time period; judging whether the target access ratio exceeds the preset ratio threshold; and, if it does, determining that the accesses to the target website through the address are malicious crawler access behavior. This solves the problem of poor accuracy in identifying malicious web crawlers: because crawler behavior is determined only when the target access ratio exceeds the preset ratio threshold, the accuracy of malicious crawler identification is improved.
Fig. 2 is a flowchart of a second embodiment of the method for identifying malicious web crawlers according to the present invention; it can be regarded as a preferred implementation of the embodiment shown in Fig. 1. As shown in Fig. 2, the method includes steps S201 to S208:
Step S201: record user accesses in a log, including the user's IP address and the UserAgent at the time of access.
Step S202: parse the log and classify each IP address as a suspect IP or a legitimate IP.
A suspect IP is an IP address whose number of accesses to the target website within the preset time period exceeds the preset count threshold; a legitimate IP is one whose number of accesses within the preset time period does not exceed the preset count threshold.
Step S203: for each IP address classified as a suspect IP, analyze the UserAgent values corresponding to it.
Step S204: calculate the UserAgent ratio of each suspect IP.
The UserAgent ratio is the target access ratio. For example, it can be the proportion of the suspect IP's total accesses to the target website within the preset time period that used the Windows 7 operating system.
Step S205: take all IP addresses classified as legitimate as a legitimate-IP group and calculate the UserAgent ratio of the group.
The UserAgent ratio of the legitimate-IP group is the preset ratio threshold. For example, it can be the proportion of the total accesses to the target website from all IP addresses in the group that used the Windows 7 operating system.
Step S206: judge whether the difference between each suspect IP's UserAgent ratio and the UserAgent ratio of the legitimate-IP group is greater than a preset error value.
Step S207: if the difference between the suspect IP's UserAgent ratio and the UserAgent ratio of the legitimate-IP group is not greater than the preset error value, the accesses through the suspect IP are human accesses.
Step S208: if the difference between the suspect IP's UserAgent ratio and the UserAgent ratio of the legitimate-IP group is greater than the preset error value, the accesses through the suspect IP are not made by humans and are classified as malicious crawler access behavior.
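Steps S202 to S208 can be sketched end to end as follows. Several details are assumptions not stated in the patent: the log is reduced to (IP, UserAgent) pairs for one preset time period, the group ratio is pooled over all accesses of the legitimate IPs, the "difference" is taken as an absolute difference, and Windows 7 is recognized by the "Windows NT 6.1" token that it reports in User-Agent strings. The function names are hypothetical.

```python
from collections import defaultdict


def classify_by_tolerance(log, count_threshold, is_target, tolerance):
    """Sketch of steps S202-S208; returns dict suspect_ip -> "crawler" or "human".

    log: iterable of (ip, user_agent) pairs for one preset time period (assumed shape).
    tolerance: the preset error value.
    """
    agents = defaultdict(list)
    for ip, ua in log:
        agents[ip].append(ua)

    # S202: split addresses into suspect and legitimate IPs by access count.
    suspects = {ip: uas for ip, uas in agents.items() if len(uas) > count_threshold}
    legit = [ua for uas in agents.values() if len(uas) <= count_threshold for ua in uas]
    if not legit:
        raise ValueError("no legitimate IPs to build the group ratio from")

    # S205: UserAgent ratio of the legitimate-IP group (preset ratio threshold).
    group_ratio = sum(1 for ua in legit if is_target(ua)) / len(legit)

    verdicts = {}
    for ip, uas in suspects.items():
        # S203-S204: UserAgent ratio of one suspect IP (target access ratio).
        ratio = sum(1 for ua in uas if is_target(ua)) / len(uas)
        # S206-S208: compare the (absolute, assumed) difference with the preset error value.
        verdicts[ip] = "crawler" if abs(ratio - group_ratio) > tolerance else "human"
    return verdicts


def is_windows7(ua):
    # Windows 7 identifies itself as "Windows NT 6.1" in User-Agent strings.
    return "Windows NT 6.1" in ua
```

Unlike the first embodiment, which flags only ratios above the threshold, a symmetric tolerance also flags suspect IPs whose ratio is far below the group ratio. That reading is one interpretation of "difference" in step S206.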
During malicious crawler identification, this embodiment uses the above steps to analyze UserAgent values and thereby detect whether an IP address is shared by multiple users. This reduces the false-positive rate of malicious crawler identification and improves its accuracy.
According to an embodiment of the present invention, a device for identifying malicious web crawlers is provided. It should be noted that the device of this embodiment can be used to perform the method for identifying malicious web crawlers provided by the embodiments of the present invention, and that method can also be performed by the device provided by the embodiments of the present invention.
Fig. 3 is a schematic diagram of an embodiment of the device for identifying malicious web crawlers according to the present invention. As shown in Fig. 3, the device includes: a first acquisition unit 10, a second acquisition unit 20, a calculation unit 30, a judging unit 40, and a determining unit 50.
First acquisition unit 10, for obtaining network address to be detected, wherein, network address to be detected is pre- for satisfaction first If the network address of condition, if exceeding preset times by the number of network address access target website in preset time period Threshold value, it is determined that network address meets the first preparatory condition.
Second acquisition unit 20, for obtaining user access information corresponding to network address to be detected, wherein, user accesses Information includes the network terminal information of access target website, and network terminal information includes objective network end message.
Second acquisition unit includes:First acquisition module, for obtaining the access log of targeted website;Parsing module, use In parsing access log, analysis result is obtained;Second acquisition module, for obtaining network address pair to be detected in analytically result The user access information answered.
Computing unit 30: configured to calculate a target access ratio from the number of network addresses to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period.
Optionally, the computing unit may include: a first statistics module, configured to count the number of times the target website is accessed through the network address to be detected within the preset time period; a judging module, configured to judge whether the user access information corresponding to the network address to be detected contains the target network terminal information; a second statistics module, configured to, when the user access information corresponding to the network address to be detected contains the target network terminal information, count the number of network addresses to be detected whose corresponding user access information contains the target network terminal information; and a computing module, configured to calculate the target access ratio by the formula S = A/B, where S is the target access ratio, A is the number of network addresses to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
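The text counts A as "network addresses to be detected" whose user access information contains the target network terminal information. For a single address, the Python sketch below reads A as the number of that address's requests in the preset time period whose UserAgent carries the target information, which keeps S between 0 and 1. That reading, the `is_target` predicate standing in for the unspecified target network terminal information, and the function name are all assumptions rather than details from the patent.

```python
def target_access_ratio(user_agents, is_target):
    """Return S = A / B for one network address to be detected.

    user_agents: UserAgent strings of every request the address made to the
        target website in the preset time period, so B = len(user_agents).
    is_target: hypothetical predicate standing in for the target network
        terminal information.
    """
    b = len(user_agents)  # first statistics module
    if b == 0:
        raise ValueError("address made no requests in the preset time period")
    # Judging module and second statistics module: count requests that carry
    # the target information (A is 0 when none do).
    a = sum(1 for ua in user_agents if is_target(ua))
    return a / b  # computing module: S = A / B
```

For example, with the hypothetical predicate `lambda ua: "bot" in ua.lower()`, an address whose four requests include two bot-like UserAgents gets S = 0.5.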
Judging unit 40: configured to judge whether the target access ratio exceeds a preset ratio threshold.
Optionally, the preset ratio threshold may be determined by the following modules: a first determining module, configured to determine a reference network address set, where the reference network address set includes multiple network addresses that satisfy a second preset condition; if the number of times the target website is accessed through a network address within the preset time period does not exceed the preset count threshold, the network address is determined to satisfy the second preset condition; a third acquisition module, configured to obtain the user access information corresponding to the reference network address set; and a second determining module, configured to determine the preset ratio threshold according to the user access information corresponding to the reference network address set, where the preset ratio threshold is the ratio of the number of network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
Optionally, if the target website is accessed through multiple network addresses within the preset time period, the first determining module may include: a detection submodule, configured to detect, for each of the multiple network addresses, whether the number of times it accessed the target website within the preset time period exceeds the preset count threshold; and a determination submodule, configured to determine the network addresses whose access count within the preset time period does not exceed the preset count threshold as the network addresses in the reference network address set.
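The threshold derivation can be sketched in Python as well. Assumptions, none of which come from the patent: requests in the preset time period are given as `(ip, user_agent)` pairs, the second preset condition is read as "access count does not exceed the preset count threshold", `is_target` is a hypothetical predicate for the target network terminal information, and A and B are counted over individual requests from the reference addresses. The function names are illustrative.

```python
from collections import Counter


def reference_address_set(records, max_count):
    """First determining module: addresses whose request count does not exceed max_count."""
    # Detection submodule: per-address request counts in the preset time period.
    counts = Counter(ip for ip, _ in records)
    # Determination submodule: keep the low-frequency addresses.
    return {ip for ip, n in counts.items() if n <= max_count}


def preset_ratio_threshold(records, max_count, is_target):
    """Second determining module: A / B computed over the reference network address set.

    records: (ip, user_agent) pairs for requests to the target website in the
        preset time period.
    """
    reference = reference_address_set(records, max_count)
    # Third acquisition module: user access information of the reference addresses.
    ref_agents = [ua for ip, ua in records if ip in reference]
    if not ref_agents:
        raise ValueError("reference network address set is empty")
    a = sum(1 for ua in ref_agents if is_target(ua))
    return a / len(ref_agents)
```

Deriving the threshold from low-frequency traffic on the same site lets it adapt to that site's normal mix of clients instead of relying on a fixed constant.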
Determining unit 50: configured to, when the target access ratio exceeds the preset ratio threshold, determine that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior.
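Putting the units together, the end-to-end decision might look like the following minimal Python sketch. It assumes a per-request reading of A and B, `(ip, user_agent)` input pairs already restricted to the preset time period, and a hypothetical `is_target` predicate for the target network terminal information; the function name is illustrative and the code is not presented as the patentee's implementation.

```python
from collections import defaultdict


def detect_malicious_crawlers(records, max_count, is_target):
    """Return the addresses whose access behavior is classified as malicious crawling.

    records: (ip, user_agent) pairs for requests in the preset time period.
    max_count: preset count threshold; is_target: hypothetical predicate.
    """
    agents_by_ip = defaultdict(list)
    for ip, ua in records:
        agents_by_ip[ip].append(ua)

    def ratio(agents):
        # S = A / B over a non-empty list of UserAgent strings.
        return sum(1 for ua in agents if is_target(ua)) / len(agents)

    # Preset ratio threshold from the low-frequency reference addresses.
    reference = [ua for agents in agents_by_ip.values() if len(agents) <= max_count
                 for ua in agents]
    if not reference:
        raise ValueError("reference network address set is empty")
    threshold = ratio(reference)

    # First acquisition unit: high-frequency addresses to be detected.
    suspects = {ip: agents for ip, agents in agents_by_ip.items() if len(agents) > max_count}
    # Judging unit and determining unit: flag ratios above the threshold.
    return {ip for ip, agents in suspects.items() if ratio(agents) > threshold}
```

The strict `>` comparisons mirror the text: an address is examined only if its count exceeds the preset count threshold, and it is flagged only if its ratio exceeds the preset ratio threshold.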
The network malicious crawler identification device provided by this embodiment includes: the first acquisition unit 10, the second acquisition unit 20, the computing unit 30, the judging unit 40 and the determining unit 50. The device solves the problem of poor accuracy in identifying network malicious crawlers: the determining unit 50 classifies the behavior of accessing the target website through the network address to be detected as malicious crawler access behavior when the target access ratio exceeds the preset ratio threshold, thereby improving the accuracy of network malicious crawler identification.
Obviously, those skilled in the art should understand that the above modules or steps of the invention can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.
The foregoing are only preferred embodiments of the invention and are not intended to limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (8)

  1. A network malicious crawler identification method, characterized by comprising:
    obtaining a network address to be detected, wherein the network address to be detected is a network address satisfying a first preset condition; if the number of times a target website is accessed through a network address within a preset time period exceeds a preset count threshold, the network address is determined to satisfy the first preset condition;
    obtaining user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of access to the target website, and the network terminal information includes target network terminal information;
    calculating a target access ratio according to the number of network addresses to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period;
    judging whether the target access ratio exceeds a preset ratio threshold; and
    if the target access ratio exceeds the preset ratio threshold, determining that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior,
    wherein calculating the target access ratio according to the number of network addresses to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period comprises:
    counting the number of times the target website is accessed through the network address to be detected within the preset time period;
    judging whether the user access information corresponding to the network address to be detected contains the target network terminal information;
    if the user access information corresponding to the network address to be detected contains the target network terminal information, counting the number of network addresses to be detected whose corresponding user access information contains the target network terminal information; and
    calculating the target access ratio by the following formula:
    S = A/B,
    wherein S is the target access ratio, A is the number of network addresses to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
  2. The method according to claim 1, characterized in that obtaining the user access information corresponding to the network address to be detected comprises:
    obtaining the access log of the target website;
    parsing the access log to obtain a parsing result; and
    obtaining, from the parsing result, the user access information corresponding to the network address to be detected.
  3. The method according to claim 1, characterized in that the preset ratio threshold is determined by:
    determining a reference network address set, wherein the reference network address set includes multiple network addresses satisfying a second preset condition; if the number of times the target website is accessed through a network address within the preset time period does not exceed the preset count threshold, the network address is determined to satisfy the second preset condition;
    obtaining the user access information corresponding to the reference network address set; and
    determining the preset ratio threshold according to the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
  4. The method according to claim 3, characterized in that, where the target website is accessed through multiple network addresses within the preset time period, determining the reference network address set comprises:
    detecting, for each of the multiple network addresses, whether the number of times it accessed the target website within the preset time period exceeds the preset count threshold; and
    determining the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold as the network addresses in the reference network address set.
  5. A network malicious crawler identification device, characterized by comprising:
    a first acquisition unit, configured to obtain a network address to be detected, wherein the network address to be detected is a network address satisfying a first preset condition; if the number of times a target website is accessed through a network address within a preset time period exceeds a preset count threshold, the network address is determined to satisfy the first preset condition;
    a second acquisition unit, configured to obtain user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of access to the target website, and the network terminal information includes target network terminal information;
    a computing unit, configured to calculate a target access ratio according to the number of network addresses to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period;
    a judging unit, configured to judge whether the target access ratio exceeds a preset ratio threshold; and
    a determining unit, configured to, when the target access ratio exceeds the preset ratio threshold, determine that the behavior of accessing the target website through the network address to be detected is malicious crawler access behavior,
    wherein the computing unit comprises:
    a first statistics module, configured to count the number of times the target website is accessed through the network address to be detected within the preset time period;
    a judging module, configured to judge whether the user access information corresponding to the network address to be detected contains the target network terminal information;
    a second statistics module, configured to, when the user access information corresponding to the network address to be detected contains the target network terminal information, count the number of network addresses to be detected whose corresponding user access information contains the target network terminal information; and
    a computing module, configured to calculate the target access ratio by the following formula:
    S = A/B,
    wherein S is the target access ratio, A is the number of network addresses to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
  6. The device according to claim 5, characterized in that the second acquisition unit comprises:
    a first acquisition module, configured to obtain the access log of the target website;
    a parsing module, configured to parse the access log to obtain a parsing result; and
    a second acquisition module, configured to obtain, from the parsing result, the user access information corresponding to the network address to be detected.
  7. The device according to claim 5, characterized in that the preset ratio threshold is determined by the following modules:
    a first determining module, configured to determine a reference network address set, wherein the reference network address set includes multiple network addresses satisfying a second preset condition; if the number of times the target website is accessed through a network address within the preset time period does not exceed the preset count threshold, the network address is determined to satisfy the second preset condition;
    a third acquisition module, configured to obtain the user access information corresponding to the reference network address set; and
    a second determining module, configured to determine the preset ratio threshold according to the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
  8. The device according to claim 7, characterized in that, where the target website is accessed through multiple network addresses within the preset time period, the first determining module comprises:
    a detection submodule, configured to detect, for each of the multiple network addresses, whether the number of times it accessed the target website within the preset time period exceeds the preset count threshold; and
    a determination submodule, configured to determine the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold as the network addresses in the reference network address set.
CN201410743056.6A 2014-12-05 2014-12-05 Network malicious crawler identification method and device Active CN104391979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410743056.6A CN104391979B (en) 2014-12-05 2014-12-05 Network malicious crawler identification method and device

Publications (2)

Publication Number Publication Date
CN104391979A CN104391979A (en) 2015-03-04
CN104391979B true CN104391979B (en) 2017-12-19

Family

ID=52609883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410743056.6A Active CN104391979B (en) 2014-12-05 2014-12-05 Network malicious crawler identification method and device

Country Status (1)

Country Link
CN (1) CN104391979B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202077B (en) * 2015-04-30 2020-01-21 Huawei Technologies Co., Ltd. Task distribution method and device
CN105187396A (en) * 2015-08-11 2015-12-23 Xiaomi Inc. Method and device for identifying web crawler
CN105426415A (en) * 2015-10-30 2016-03-23 TCL Corporation Management method, device and system of website access request
CN107341395B (en) * 2016-05-03 2020-03-03 Beijing Jingdong Shangke Information Technology Co., Ltd. Method for intercepting crawlers
CN106021552A (en) * 2016-05-30 2016-10-12 Shenzhen Huaao Data Technology Co., Ltd. Internet crawler concurrent data collection method and system based on crowd behavior simulation
CN106886906B (en) * 2016-08-15 2020-06-30 Alibaba Group Holding Ltd. Equipment identification method and device
CN108429721B (en) * 2017-02-15 2020-08-04 Tencent Technology (Shenzhen) Co., Ltd. Identification method and device for web crawler
CN108664489B (en) * 2017-03-29 2022-12-23 Tencent Technology (Shenzhen) Co., Ltd. Website content monitoring method and device
CN107392022B (en) * 2017-07-20 2020-12-29 Beijing Xingxuan Technology Co., Ltd. Crawler identification and processing method and related device
CN109510800B (en) * 2017-09-14 2020-11-27 Beijing Kingsoft Cloud Network Technology Co., Ltd. Network request processing method and device, electronic equipment and storage medium
CN107800684B (en) * 2017-09-20 2018-09-18 Guizhou Baishan Cloud Technology Co., Ltd. Low-frequency crawler identification method and device
CN107786542A (en) * 2017-09-26 2018-03-09 Hangzhou Anheng Information Technology Co., Ltd. Scoring method and device for malicious IPs based on big-data intelligent analysis
CN109559245B (en) * 2017-09-26 2022-02-25 Beijing Gridsum Technology Co., Ltd. Method and device for identifying specific user
CN107770171B (en) * 2017-10-18 2020-01-24 Xiamen Jiwei Technology Co., Ltd. Verification method and system for anti-crawler of server
CN107943949B (en) * 2017-11-24 2020-06-26 Xiamen Jiwei Technology Co., Ltd. Method and server for determining web crawler
CN108388794B (en) * 2018-02-01 2020-09-08 Kingdee Software (China) Co., Ltd. Page data protection method and device, computer equipment and storage medium
CN109145185B (en) * 2018-02-02 2019-07-02 Beijing Shuan Xinyun Information Technology Co., Ltd. Method and device for identifying web crawlers and extracting web crawler features
CN108521402B (en) * 2018-03-07 2021-01-22 Advanced New Technologies Co., Ltd. Method, device and equipment for outputting label
CN109474640B (en) * 2018-12-29 2021-01-05 Qi An Xin Technology Group Inc. Malicious crawler detection method and device, electronic equipment and storage medium
CN109862018B (en) * 2019-02-21 2021-07-09 Industrial and Commercial Bank of China Ltd. Anti-crawler method and system based on user access behavior
CN110245280B (en) * 2019-05-06 2021-03-02 Beijing Sankuai Online Technology Co., Ltd. Method and device for identifying web crawler, storage medium and electronic equipment
CN110401639B (en) * 2019-06-28 2021-12-24 Ping An Technology (Shenzhen) Co., Ltd. Method and device for judging abnormality of network access, server and storage medium thereof
CN110460593B (en) * 2019-07-29 2021-12-14 Tencent Technology (Shenzhen) Co., Ltd. Network address identification method, device and medium for mobile traffic gateway
CN111859069B (en) * 2020-07-15 2021-10-15 Beijing Gas Group Co., Ltd. Network malicious crawler identification method, system, terminal and storage medium
KR102595303B1 (en) * 2021-04-20 2023-10-27 주식회사 스크립터스 Method for detecting web scraping, and server for executing the same
CN113612768B (en) * 2021-08-02 2023-10-17 Beijing Knownsec Information Technology Co., Ltd. Network protection method and related device
CN114978674B (en) * 2022-05-18 2023-12-05 China Telecom Corporation Ltd. Crawler recognition enhancement method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101707598A (en) * 2009-11-10 2010-05-12 Chengdu Huawei Symantec Technologies Co., Ltd. Method, device and system for identifying flood attack
CN103905372A (en) * 2012-12-24 2014-07-02 Zhuhai Juntian Electronic Technology Co., Ltd. Method and device for removing false alarm of phishing website
CN104113519A (en) * 2013-04-16 2014-10-22 Alibaba Group Holding Ltd. Network attack detection method and device thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130318609A1 (en) * 2012-05-25 2013-11-28 Electronics And Telecommunications Research Institute Method and apparatus for quantifying threat situations to recognize network threat in advance

Also Published As

Publication number Publication date
CN104391979A (en) 2015-03-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Malicious web crawler recognition method and device

Effective date of registration: 20190531

Granted publication date: 20171219

Pledgee: Shenzhen Black Horse World Investment Consulting Co., Ltd.

Pledgor: Beijing Guoshuang Technology Co.,Ltd.

Registration number: 2019990000503

CP02 Change in the address of a patent holder

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Patentee after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 8th Floor A, Cuigong Hotel, No. 76 Zhichun Road, Shuangyushu Area, Haidian District, Beijing

Patentee before: Beijing Guoshuang Technology Co.,Ltd.
