CN108898009A - A kind of anti-crawler method, terminal and computer-readable medium - Google Patents

A kind of anti-crawler method, terminal and computer-readable medium

Info

Publication number
CN108898009A
Authority
CN
China
Prior art keywords
mine script
terminal
target webpage
digs
script
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810685659.3A
Other languages
Chinese (zh)
Inventor
邵壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810685659.3A priority Critical patent/CN108898009A/en
Priority to PCT/CN2018/108672 priority patent/WO2020000747A1/en
Publication of CN108898009A publication Critical patent/CN108898009A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/54 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose an anti-crawler method, a terminal and a computer-readable medium. The method includes: obtaining a JS mining script library, the JS mining script library including at least one JS mining script; generating a link of the at least one JS mining script and embedding the link in a target webpage; and, when a start instruction for the link of the at least one JS mining script is detected, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script. In this way, the embodiments can consume a large amount of the crawler terminal's CPU resources so that the crawler terminal cannot be used normally, which achieves the anti-crawling purpose; they can also prevent a crawler terminal from bypassing JS rendering to crawl data, which improves the effectiveness of anti-crawling.

Description

Anti-crawler method, terminal and computer-readable medium
Technical field
The present invention relates to the field of communication technology, and in particular to an anti-crawler method, a terminal and a computer-readable medium.
Background
Crawlers on the Internet currently fall into two main classes: static crawlers and dynamic crawlers. Because a static crawler cannot parse JavaScript (JS for short) code, the usual countermeasure against static crawlers is to add JS code to a webpage so that the page is rendered through JS, which achieves the anti-crawling purpose. However, JS rendering of this kind does not stop a dynamic crawler that can parse JS. How to counter crawlers more effectively and improve Internet security has therefore become a hot research topic.
Summary of the invention
Embodiments of the present invention provide an anti-crawler method, a terminal and a computer-readable medium that can improve the effectiveness of anti-crawling and improve Internet security.
In a first aspect, an embodiment of the invention provides an anti-crawler method, the method including:
Obtaining a JS mining script library, the JS mining script library including at least one JS mining script;
Generating a link of the at least one JS mining script, and embedding the link of the at least one JS mining script in a target webpage;
When a start instruction for the link of the at least one JS mining script is detected, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script.
Further, obtaining the JS mining script library includes:
Establishing the JS mining script library, the JS mining script library including a first JS mining script and a second JS mining script;
Wherein the first JS mining script is a script written in the WebAssembly language, and the second JS mining script is a script written in the JS language.
Further, embedding the link of the at least one JS mining script in the target webpage includes:
Determining a first position region according to the probability, recorded in the history, that the data information in each position region of the target webpage is crawled, the first position region being the position region of the target webpage whose data information has the highest probability of being crawled;
Embedding the link of the at least one JS mining script in the first position region of the target webpage.
Further, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script when the start instruction for the link of the at least one JS mining script is detected includes:
When the start instruction for the link of the at least one JS mining script is detected, triggering the terminal that accesses the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses;
Calling the browser that the terminal uses to load and execute the first JS mining script or the second JS mining script chosen by the terminal.
Here, the link of the JS mining script is embedded in the first position region of the target webpage.
Further, triggering the terminal that accesses the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses includes:
Triggering the terminal that accesses the target webpage to judge whether the version number of the browser that the terminal uses is any one of a preset set of version numbers;
If the judgment result is yes, determining that the browser that the terminal uses supports the WebAssembly language, and choosing the first JS mining script from the JS mining script library;
If the judgment result is no, determining that the browser that the terminal uses does not support the WebAssembly language, and choosing the second JS mining script from the JS mining script library.
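The version check described above can be sketched as follows. This is a minimal illustration only: the contents of the preset version set, the `name/major-version` string format and the labels `"first"` and `"second"` are assumptions made for the example and are not specified in the source.

```python
# Hypothetical preset set of browser versions treated as WebAssembly-capable.
# The source only says such a set exists; these entries are illustrative.
PRESET_WASM_VERSIONS = frozenset({"chrome/57", "firefox/52"})


def choose_script(browser_version, preset_versions=PRESET_WASM_VERSIONS):
    """Return which script the terminal would choose from the library.

    "first"  stands for the script written in WebAssembly,
    "second" stands for the script written in plain JS.
    """
    supports_wasm = browser_version.strip().lower() in preset_versions
    return "first" if supports_wasm else "second"
```

Exact membership in a set mirrors the wording "any one of a preset set of version numbers"; a practical check would more likely compare against a minimum version, which the source does not discuss.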
Further, embedding the link of the at least one JS mining script in the target webpage includes:
Generating a link of the first JS mining script and a link of the second JS mining script;
Determining a second position region according to the probability, recorded in the history, that the data information in each position region of the target webpage is crawled by a browser that supports the WebAssembly language, the second position region being the position region of the target webpage whose data information has the highest probability of being crawled by a browser that supports the WebAssembly language;
Determining a third position region according to the probability, recorded in the history, that the data information in each position region of the target webpage is crawled by a browser that does not support the WebAssembly language, the third position region being the position region of the target webpage whose data information has the highest probability of being crawled by a browser that does not support the WebAssembly language;
Embedding the link of the first JS mining script in the second position region of the target webpage, and embedding the link of the second JS mining script in the third position region of the target webpage.
Further, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script when the start instruction for the link of the at least one JS mining script is detected includes:
When a start instruction for the link of the first JS mining script is detected, triggering the terminal that accesses the target webpage to judge whether the version number of the browser that the terminal uses is any one of a preset set of version numbers;
If the judgment result is yes, determining that the browser that the terminal uses supports the WebAssembly language, and triggering the browser used by the terminal that accesses the target webpage to load and execute the first JS mining script;
If the judgment result is no, then, when a start instruction for the link of the second JS mining script is detected, triggering the browser used by the terminal that accesses the target webpage to load and execute the second JS mining script.
In a second aspect, an embodiment of the invention provides a terminal that includes units for executing the method of the first aspect.
In a third aspect, an embodiment of the invention provides another terminal, including a processor, an input device, an output device and a memory that are connected to one another, wherein the memory is used to store a computer program that supports the terminal in executing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium that stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
In the embodiments of the present invention, the link of at least one JS mining script is embedded in the target webpage, and when a start instruction for that link is detected, the terminal that accesses the target webpage is triggered to load and execute the at least one JS mining script. As a result, when a crawler terminal crawls the link, the mining script behind the link is loaded and executed and consumes a large amount of the crawler terminal's CPU resources, so that the crawler terminal cannot be used normally and the anti-crawling purpose is achieved. This also prevents the crawler terminal from bypassing JS rendering to crawl data, which improves the effectiveness of anti-crawling.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an anti-crawler method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another anti-crawler method provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of yet another anti-crawler method provided by an embodiment of the present invention;
Fig. 4 is a schematic block diagram of a terminal provided by an embodiment of the present invention;
Fig. 5 is a schematic block diagram of another terminal provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention.
It should be understood that the terminology used in this specification serves only to describe particular embodiments and is not intended to limit the invention. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
The anti-crawler method provided by the embodiments of the present invention can be executed by a terminal, which can be an intelligent terminal such as a mobile phone, a computer, a tablet or a smartwatch. The anti-crawler method as applied to a terminal is described below.
This solution is an anti-crawler scheme aimed at dynamic crawlers that can parse JS. In the prior art, the common anti-crawler strategy for webpages is to add JS code to the page so that it is rendered through JS, which counters static crawlers. That approach does not achieve the anti-crawling purpose against dynamic crawlers that can parse JS, so this solution improves the anti-crawler strategy for such crawlers.
Before introducing this solution, dynamic crawlers that can parse JS are described first. Such a crawler works in one of two ways. In the first, it parses the HTML source code directly. The second is a special case: if the data of the webpage is loaded asynchronously, for example via ajax, the crawler cannot obtain the data by parsing the source code. In that case the crawler typically uses third-party tools, such as the automated testing tool Selenium driving a browser that supports the WebAssembly language, or Selenium together with the headless browser PhantomJS, to simulate browser behaviour and load the data. Note that WebAssembly is a low-level, assembly-like language that runs in modern web browsers; its file suffix is ".wasm".
For the two crawling approaches described above, this solution therefore writes at least one JS mining script in advance, establishes a JS mining script library, and generates a link of at least one JS mining script from the scripts in that library. The embodiments of the present invention embed the generated link in a position region of the target webpage whose data information has a high probability of being crawled. If it is detected that a terminal accessing the target webpage opens the embedded link of the at least one JS mining script, that terminal is determined to be a crawler terminal and is triggered to automatically load and execute the at least one JS mining script behind the link. The at least one JS mining script contains a large number of computations, so executing it consumes a large amount of the terminal's CPU resources and causes the terminal to stutter or otherwise become unusable, which achieves the anti-crawling purpose. The embodiments of the present invention are described in detail below with reference to the drawings.
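The detection step in the paragraph above (a terminal that opens an embedded link is treated as a crawler terminal) can be sketched as a simple trap-link check. This is a minimal illustration under stated assumptions: the trap paths, the client identifier and the `CrawlerRegistry` class are hypothetical names introduced for the example, and the sketch stops at classification; what is served to a flagged terminal is outside its scope.

```python
# Hypothetical paths of the links embedded in the target webpage.
TRAP_PATHS = frozenset({"/trap/a1b2", "/trap/c3d4"})


class CrawlerRegistry:
    """Remembers which clients have opened an embedded trap link."""

    def __init__(self, trap_paths=TRAP_PATHS):
        self.trap_paths = trap_paths
        self.flagged = set()

    def observe(self, client_id, path):
        """Record one request; return True if the client is considered a crawler."""
        if path in self.trap_paths:
            # Opening the embedded link is the signal the source relies on.
            self.flagged.add(client_id)
        return client_id in self.flagged
```

A client stays flagged after its first hit on a trap path, so later requests from the same client are also classified as crawler traffic.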
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an anti-crawler method provided by an embodiment of the present invention. As shown in Fig. 1, the method can be executed by a terminal; the terminal is as explained above and is not described again here. Specifically, the method of this embodiment includes the following steps.
S101: Obtain a JS mining script library.
In this embodiment, the terminal can establish a JS mining script library, and the library established by the terminal includes one or more JS mining scripts.
In one embodiment, the JS mining script library established by the terminal includes a first JS mining script and a second JS mining script. The first JS mining script is a script written in the WebAssembly language, and the second JS mining script is a script written in the JS language.
The first JS mining script is written in the WebAssembly language. In one embodiment, a dynamic crawler that can parse JS usually uses Selenium together with WebDriver to call a browser and simulate browser operations, where the browser called in this way supports WebAssembly; alternatively, it uses Selenium together with the headless browser PhantomJS to simulate browser operations. This solution therefore writes the first JS mining script in the WebAssembly language, targeting the WebAssembly-capable browsers or PhantomJS that such dynamic crawlers commonly use.
In other embodiments, the second JS mining script is a script written in the JS language. Because a dynamic crawler that can parse JS may also use an ordinary browser that does not support WebAssembly to parse the HTML source code directly, the second JS mining script is written in the JS language.
S102: Generate a link of at least one JS mining script, and embed the link of the at least one JS mining script in the target webpage.
In this embodiment, the terminal can generate a link of at least one JS mining script and embed that link in the target webpage.
In one embodiment, the terminal can generate the link of the at least one JS mining script from the JS mining script library made up of the first JS mining script and the second JS mining script, and embed that link in the first position region of the target webpage.
The first position region is determined from the probability, recorded in the history, that the data information in each position region of the target webpage is crawled: the position region whose data information has the highest probability of being crawled is determined to be the first position region. For example, suppose the target webpage includes a comment region, a question region, an image display region and an information display region, and the history records that the data information of the comment region is crawled with probability 90%, that of the question region with probability 60%, that of the image display region with probability 20%, and that of the information display region with probability 50%. Comparing these probabilities shows that the 90% probability of the comment region is the highest, so the comment region of the target webpage is determined to be the first position region.
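The selection in this example amounts to taking the region with the largest recorded probability. A minimal sketch, assuming the history has already been reduced to a mapping from region name to crawl probability (the region identifiers and the function name are chosen here for illustration):

```python
def first_position_region(crawl_probability):
    """Return the region whose data is most likely to be crawled."""
    if not crawl_probability:
        raise ValueError("no history records available")
    return max(crawl_probability, key=crawl_probability.get)


# Probabilities taken from the worked example in the text.
history = {
    "comment": 0.90,
    "question": 0.60,
    "image_display": 0.20,
    "information_display": 0.50,
}
print(first_position_region(history))  # prints "comment"
```

If two regions tie, `max` returns the first one encountered; the source does not say how ties should be resolved.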
In one embodiment, the first position region can instead be determined from the keywords, recorded in the history, for which the data information in each position region of the target webpage is crawled: the position region whose data information is crawled for the largest number of keywords is determined to be the first position region. For example, suppose the target webpage includes a comment region, a question region, an image display region and an information display region, and the history records that the number of crawled keywords is n for the comment region, m for the question region, x for the image display region and y for the information display region. Comparing these counts, if n > m > x > y, the comment region has the largest number of crawled keywords, so the comment region of the target webpage is determined to be the first position region.
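The keyword-based variant can be sketched in the same spirit. The source does not define the shape of the history records, so the `(region, keyword)` pairs below are an assumption, as is the choice to count each distinct keyword once per region.

```python
from collections import Counter


def region_by_keyword_count(crawl_records):
    """Return the region crawled for the largest number of distinct keywords.

    crawl_records: iterable of (region, keyword) pairs (assumed record format).
    """
    distinct_pairs = set(crawl_records)  # count each keyword once per region
    if not distinct_pairs:
        raise ValueError("no history records available")
    counts = Counter(region for region, _keyword in distinct_pairs)
    region, _count = counts.most_common(1)[0]
    return region
```

With records in which the comment region appears with three distinct keywords and every other region with fewer, the function returns the comment region, matching the n > m > x > y case in the text.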
With this embodiment, when the terminal accessing the target webpage is a crawler terminal, it will most probably crawl the data information in the first position region of the target webpage. This raises the likelihood that the crawler terminal opens the link of the at least one JS mining script in the first position region, and thereby improves the efficiency of anti-crawling.
In one embodiment, the terminal can generate a link of the first JS mining script from the obtained first JS mining script and a link of the second JS mining script from the obtained second JS mining script, embed the link of the first JS mining script in the second position region of the target webpage, and embed the link of the second JS mining script in the third position region of the target webpage.
The second position region is determined from the probability, recorded in the history, that the data information in each position region of the target webpage is crawled by a browser that supports the WebAssembly language: the position region whose data information has the highest probability of being crawled by such a browser is determined to be the second position region.
For example, suppose the target webpage includes a comment region, a question region, an image display region and an information display region, and the history records that the data information of the comment region and of the question region is crawled by browsers that support the WebAssembly language, with probability 80% for the comment region and 50% for the question region. Comparing the two probabilities shows that the 80% probability of the comment region is the highest, so the comment region of the target webpage is determined to be the second position region.
In one embodiment, the second position region can instead be determined from the keywords, recorded in the history, for which the data information in each position region of the target webpage is crawled by a browser that supports the WebAssembly language: the position region whose data information is crawled by such browsers for the largest number of keywords is determined to be the second position region.
For example, suppose the target webpage includes a comment region, a question region, an image display region and an information display region, and the history records that the number of keywords crawled by browsers that support the WebAssembly language is n for the comment region, m for the question region, x for the image display region and y for the information display region. Comparing these counts, if n > m > x > y, the comment region has the largest number of keywords crawled by browsers that support the WebAssembly language, so the comment region of the target webpage is determined to be the second position region.
In one embodiment, the third position region is determined from the probability, recorded in the history, that the data information in each position region of the target webpage is crawled by a browser that does not support the WebAssembly language: the position region whose data information has the highest probability of being crawled by such a browser is determined to be the third position region.
For example, suppose the target webpage includes a comment region, a question region, an image display region and an information display region, and the history records that the data information of the image display region and of the information display region is crawled by browsers that do not support the WebAssembly language, with probability 60% for the image display region and 90% for the information display region. Comparing the two probabilities shows that the 90% probability of the information display region is the highest, so the information display region of the target webpage is determined to be the third position region.
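Putting the two worked examples together, the placement of the two links can be sketched as two independent maximum selections, one per browser class. The mapping format, the function name and the keys `"first_script"` and `"second_script"` are assumptions made for illustration.

```python
def plan_placement(prob_wasm, prob_no_wasm):
    """Choose the second and third position regions.

    prob_wasm:    region -> probability of being crawled by browsers that
                  support WebAssembly (determines the second position region).
    prob_no_wasm: region -> probability of being crawled by browsers that
                  do not support WebAssembly (determines the third).
    """
    if not prob_wasm or not prob_no_wasm:
        raise ValueError("history must cover both browser classes")
    second_region = max(prob_wasm, key=prob_wasm.get)
    third_region = max(prob_no_wasm, key=prob_no_wasm.get)
    # Keyed by script so both entries survive even if the regions coincide.
    return {"first_script": second_region, "second_script": third_region}


# Probabilities taken from the two worked examples in the text.
wasm_history = {"comment": 0.80, "question": 0.50}
no_wasm_history = {"image_display": 0.60, "information_display": 0.90}
placement = plan_placement(wasm_history, no_wasm_history)
```

For the figures in the text, `placement` maps the first script to the comment region and the second script to the information display region.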
In one embodiment, the third place region can also be according to the target network recorded in historical record The keyword that the data information in each position region is not supported the browser of the WebAssembly language to crawl in page determines , if determining that the data information for including in certain band of position is not supported institute from each position region of the target webpage It is most to state the keyword that the browser of WebAssembly language crawls, it is determined that the band of position is the third place region.
For example, it is assumed that including Commentary Position region in the target webpage, puing question to the band of position, Image display position area Domain, information display position region are not supported if getting and having recorded the data information in Commentary Position region in historical record The keyword quantity that the browser of the WebAssembly language crawls is n, and the data information of the band of position is putd question to not supported The keyword quantity that the browser of the WebAssembly language crawls is m, and the data information in Image display position region is not by The keyword quantity for supporting the browser of the WebAssembly language to crawl is x, the data information in information display position region The keyword quantity for not supported the browser of the WebAssembly language to crawl is y, by the data information in each position region It is supported the keyword quantity that the browser of the WebAssembly language crawls to compare, if n<m<x<Y, then can be with Determine the pass that the data information in the information display position region is not supported the browser of the WebAssembly language to crawl Key number of words is most, it is thus determined that the information display position region in the target webpage is the third place region.
With this embodiment, when the terminal accessing the target webpage is detected to be a crawler terminal, two cases arise. If the browser used by that terminal supports WebAssembly, the crawler terminal will most likely first crawl the data information in the second position region of the target webpage. If the browser does not support WebAssembly, the crawler terminal will most likely first crawl the data information in the third position region of the target webpage. Therefore, depending on the language supported by its browser, the terminal accessing the target webpage will most likely load and execute the link of the JS mining script that corresponds to that language, which further improves the efficiency and effectiveness of anti-crawling.
In one embodiment, when embedding the link of the at least one JS mining script in the target webpage, the terminal can also add prompt information to the link, to remind normal users accessing the target webpage not to click it, and then embed the link carrying the prompt information in any position region of the target webpage.
For example, if the terminal embeds the link of the at least one JS mining script in the target webpage, it can add the prompt information "please don't click" to the link and then embed the link carrying the prompt information in the first position region of the target webpage. The first position region is as described above and is not described again here.
As another example, if the terminal embeds the link of the first JS mining script in the second position region of the target webpage and the link of the second JS mining script in the third position region of the target webpage, it can add the prompt information "please don't click" to both links, then embed the link of the first JS mining script carrying the prompt information in the second position region and the link of the second JS mining script carrying the prompt information in the third position region. The second position region and the third position region are as described above and are not described again here.
This embodiment can, to a certain extent, prevent users of normal terminals accessing the target webpage from accidentally clicking the link of the at least one JS mining script embedded in the target webpage, and thus avoid harming normal terminals that access the target webpage.
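Adding the prompt information and embedding the link can be sketched as below. The sketch assumes the page is modelled as a mapping from position region names to HTML fragments and that the prompt is rendered as the visible link text; the URL, region names, and function names are hypothetical, and only the link markup is built (no script is loaded).

```python
from html import escape
from typing import Dict

PROMPT = "please don't click"  # prompt text taken from the example above


def build_prompted_link(url: str, prompt: str = PROMPT) -> str:
    """Build an anchor whose visible text and tooltip both carry the prompt."""
    safe_url = escape(url, quote=True)
    safe_prompt = escape(prompt, quote=True)
    return '<a href="{0}" title="{1}">{1}</a>'.format(safe_url, safe_prompt)


def embed_in_region(page: Dict[str, str], region: str, link_html: str) -> Dict[str, str]:
    """Return a copy of the page with the link appended to the given region."""
    updated = dict(page)
    updated[region] = updated.get(region, "") + link_html
    return updated


# Hypothetical page and link URL, for illustration only.
page = {"comment": "<p>comments</p>", "information_display": "<p>rates</p>"}
link = build_prompted_link("/links/first-script")
new_page = embed_in_region(page, "information_display", link)
```

Escaping the prompt keeps the apostrophe in "don't" from breaking the attribute value, and returning a copy leaves the original page mapping unchanged.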
S103: When a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to load and execute the at least one JS mining script.
In the embodiment of the present invention, when the terminal detects that a terminal is accessing the target webpage and that the link of the at least one JS mining script embedded in the target webpage has been opened, it determines that the terminal accessing the target webpage is a crawler terminal and triggers it to load and execute the at least one JS mining script. This consumes a large amount of the CPU resources of the terminal accessing the target webpage, which achieves the purpose of anti-crawling, and prevents that terminal from bypassing JS rendering to crawl the data information of the target webpage, thereby improving the effectiveness of anti-crawling and enhancing internet security.
In the embodiment of the present invention, the terminal generates the link of at least one JS mining script and embeds the link in the target webpage. When a start instruction for the link is detected, the terminal accessing the target webpage is triggered to load and execute the at least one JS mining script, which consumes a large amount of that terminal's CPU resources so that it cannot be used normally. This achieves the purpose of anti-crawling, prevents the terminal accessing the target webpage from bypassing JS rendering to crawl data, and improves the effectiveness of anti-crawling.
Referring to Fig. 2, Fig. 2 is a schematic flow diagram of another anti-crawler method provided in an embodiment of the present invention. As shown in Fig. 2, the method can be executed by a terminal; the terminal is as described above and is not described again here. This embodiment differs from the embodiment of Fig. 1 as follows: after generating the link of at least one JS mining script, when a start instruction for the link is detected, the terminal accessing the target webpage is triggered to choose the corresponding JS mining script from the JS mining script library according to the language supported by the browser it uses, and that browser is called to load and execute the chosen JS mining script. This improves the efficiency with which the browser used by the terminal accessing the target webpage executes the JS mining script. Specifically, the method of this embodiment includes the following steps.
S201: Obtain the JS mining script library.
In the embodiment of the present invention, the terminal can establish a JS mining script library that includes a first JS mining script and a second JS mining script. The first JS mining script and the second JS mining script are as described above and are not described again here.
S202: Generate the link of at least one JS mining script, and embed the link of the at least one JS mining script in the first position region of the target webpage.
In the embodiment of the present invention, the terminal can generate the link of at least one JS mining script from the obtained JS mining script library and embed the link in the first position region of the target webpage. The first position region is as described above and is not described again here.
S203: When a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser it uses.
In the embodiment of the present invention, when the terminal detects a start instruction for the link of the at least one JS mining script, it can trigger the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that terminal uses.
The terminal accessing the target webpage can judge from the version number of the browser it uses whether that browser supports the WebAssembly language. In one embodiment, when a start instruction for the link of the at least one JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one in a preset version number set. If the browser version number is any one in the preset version number set, the browser is determined to support the WebAssembly language, and the first JS mining script is chosen from the JS mining script library. If the browser version number is not any one in the preset version number set, the browser is determined not to support the WebAssembly language, and the second JS mining script is chosen from the JS mining script library.
For example, assume the preset version number set is {Firefox1.0, Firefox9, Firefox8, Firefox7, Firefox6, Firefox5, Firefox4, Firefox3.6, Firefox3, IE9, IE8, IE7}. When a start instruction for the link of the at least one JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one in the preset version number set. If the browser version number is IE8, the browser is one of the preset version number set, so the browser used by the terminal accessing the target webpage is determined to support the WebAssembly language, and the first JS mining script is chosen from the JS mining script library.
As another example, assume the preset version number set is {Firefox1.0, Firefox9, Firefox8, Firefox7, Firefox6, Firefox5, Firefox4, Firefox3.6, Firefox3, IE9, IE8, IE7}. When a start instruction for the link of the at least one JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one in the preset version number set. If the browser version number is IE5, the browser is not any one in the preset version number set, so the browser used by the terminal accessing the target webpage is determined not to support the WebAssembly language, and the second JS mining script is chosen from the JS mining script library.
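The version check in these two examples amounts to a set membership test. The sketch below reuses the illustrative preset version number set from the examples; the set is the patent's example and is not a statement about which real browsers implement WebAssembly, and the function names and returned labels are hypothetical.

```python
from typing import FrozenSet

# Preset version number set copied from the examples above.
PRESET_VERSIONS: FrozenSet[str] = frozenset({
    "Firefox1.0", "Firefox9", "Firefox8", "Firefox7", "Firefox6", "Firefox5",
    "Firefox4", "Firefox3.6", "Firefox3", "IE9", "IE8", "IE7",
})


def treated_as_wasm_capable(browser_version: str,
                            preset: FrozenSet[str] = PRESET_VERSIONS) -> bool:
    """A browser is treated as WebAssembly-capable iff its version is in the preset set."""
    return browser_version in preset


def choose_script(browser_version: str) -> str:
    """Return a label for the script to choose from the library (labels are illustrative)."""
    if treated_as_wasm_capable(browser_version):
        return "first_js_mining_script"   # WebAssembly-language script
    return "second_js_mining_script"      # JS-language script


choice_ie8 = choose_script("IE8")
choice_ie5 = choose_script("IE5")
```

A frozen set gives constant-time lookup and makes it explicit that the preset list is not modified while requests are being handled.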
S204: Call the browser used by the terminal accessing the target webpage to load and execute the first JS mining script or the second JS mining script.
In the embodiment of the present invention, the terminal can call the browser used by the terminal accessing the target webpage to load and execute the first JS mining script or the second JS mining script chosen by the terminal accessing the target webpage.
In one embodiment, if the script chosen from the JS mining script library by the terminal accessing the target webpage is detected to be the first JS mining script, the terminal can call the browser used by the terminal accessing the target webpage to load and execute the first JS mining script. For example, if the chosen script is detected to be the first JS mining script and the browser used by the terminal accessing the target webpage is determined to be IE8, the browser IE8 can be called to load and execute the first JS mining script.
In one embodiment, if the script chosen from the JS mining script library by the terminal accessing the target webpage is detected to be the second JS mining script, the terminal can call the browser used by the terminal accessing the target webpage, which does not support the WebAssembly language, to load and execute the second JS mining script. For example, if the chosen script is detected to be the second JS mining script and the browser used by the terminal accessing the target webpage is determined to be IE5, the browser IE5 can be called to load and execute the second JS mining script.
In the embodiment of the present invention, the terminal embeds the link of the generated at least one JS mining script in the first position region of the target webpage. When a start instruction for the link is detected, the terminal accessing the target webpage is triggered to choose the first JS mining script or the second JS mining script according to the language supported by the browser it uses, and that browser is called to load and execute the chosen script. In this way, the JS mining script corresponding to the browser used by the terminal accessing the target webpage is chosen, loaded, and executed, which consumes the CPU resources of that terminal, achieves the purpose of anti-crawling, and improves the effectiveness of anti-crawling.
Referring to Fig. 3, Fig. 3 is a schematic flow diagram of yet another anti-crawler method provided in an embodiment of the present invention. As shown in Fig. 3, the method can be executed by a terminal; the terminal is as described above and is not described again here. This embodiment differs from the embodiment of Fig. 2 in that the link of the generated first JS mining script is embedded in the second position region of the target webpage and the link of the generated second JS mining script is embedded in the third position region of the target webpage. The terminal accessing the target webpage then selects the second position region or the third position region according to the language supported by the browser it uses, and calls that browser to load and execute the JS mining script of the selected region, which further improves the efficiency with which the browser executes the JS mining script. Specifically, the method of this embodiment includes the following steps.
S301: Obtain the JS mining script library, which includes a first JS mining script and a second JS mining script.
In the embodiment of the present invention, the terminal can obtain the JS mining script library, which includes the first JS mining script and the second JS mining script. The first JS mining script and the second JS mining script are as described above and are not described again here.
S302: Generate the link of the first JS mining script and the link of the second JS mining script.
In the embodiment of the present invention, the terminal can generate the link of the first JS mining script and the link of the second JS mining script from the first JS mining script and the second JS mining script included in the JS mining script library.
S303: Embed the link of the first JS mining script in the second position region of the target webpage, and embed the link of the second JS mining script in the third position region of the target webpage.
In the embodiment of the present invention, the terminal can embed the link of the first JS mining script in the second position region of the target webpage and the link of the second JS mining script in the third position region of the target webpage. The second position region and the third position region are as described above and are not described again here. For example, assuming the second position region is the comment position region of the target webpage and the third position region is the image display position region of the target webpage, the terminal can embed the link of the first JS mining script in the comment position region and the link of the second JS mining script in the image display position region.
S304: When a start instruction for the link of the first JS mining script is detected, trigger the terminal accessing the target webpage to judge whether the version number of the browser it uses is any one in the preset version number set. If the result is yes, execute step S305; if the result is no, execute step S306.
In the embodiment of the present invention, when a start instruction for the link of the first JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one in the preset version number set. If it is, step S305 is executed; if it is not, step S306 is executed. The specific process of this judgment, and examples of it, are as described above and are not described again here.
For example, when a start instruction is detected for the link of the first JS mining script embedded in the comment position region of the target webpage, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one in the preset version number set. As another example, when a start instruction is detected for the link of the second JS mining script embedded in the image display position region of the target webpage, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one in the preset version number set.
S305: Determine that the browser used by the terminal accessing the target webpage supports the WebAssembly language, and trigger that browser to load and execute the first JS mining script.
In the embodiment of the present invention, if the version number of the browser used by the terminal accessing the target webpage is any one in the preset version number set, the browser is determined to support the WebAssembly language and is triggered to load and execute the first JS mining script. The specific process of loading and executing the first JS mining script, and examples of it, are as described above and are not described again here.
S306: When a start instruction for the link of the second JS mining script is detected, trigger the browser used by the terminal accessing the target webpage to load and execute the second JS mining script.
In the embodiment of the present invention, if the version number of the browser used by the terminal is not any one in the preset version number set, then when a start instruction for the link of the second JS mining script is detected, the browser used by the terminal accessing the target webpage is determined not to support the WebAssembly language and is triggered to load and execute the second JS mining script. The specific process of loading and executing the second JS mining script, and examples of it, are as described above and are not described again here.
In the embodiment of the present invention, the terminal embeds the link of the generated first JS mining script in the second position region of the target webpage and the link of the generated second JS mining script in the third position region of the target webpage. When a start instruction for the link of the first JS mining script is detected, if the version number of the browser used by the terminal accessing the target webpage is any one in the preset version number set, that browser is called to load and execute the first JS mining script; if the version number is not any one in the preset version number set, that browser is called to load and execute the second JS mining script. In this way, a large amount of the CPU resources of the terminal accessing the target webpage is consumed, which achieves the purpose of anti-crawling and further improves the efficiency and effectiveness of anti-crawling, thereby enhancing internet security.
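The branching in steps S304 to S306 can be sketched as a small decision function. This is one possible reading of the flow, assuming the decision depends only on which links have received a start instruction and on the browser version; the shortened preset set, the link labels, and the function name are hypothetical, and the function only returns a label rather than loading anything.

```python
from typing import FrozenSet, Optional, Set

# Shortened, illustrative preset version number set.
PRESET: FrozenSet[str] = frozenset({"IE9", "IE8", "IE7"})


def script_to_run(opened_links: Set[str], browser_version: str,
                  preset: FrozenSet[str] = PRESET) -> Optional[str]:
    """Return "first", "second", or None following one reading of S304 to S306."""
    # S304: the version check is triggered by the link of the first script.
    if "first" not in opened_links:
        return None
    # S305: the version is in the preset set, so the first script is run.
    if browser_version in preset:
        return "first"
    # S306: otherwise the second script is run once its link is also started.
    if "second" in opened_links:
        return "second"
    return None


decision_ie8 = script_to_run({"first"}, "IE8")
decision_ie5 = script_to_run({"first", "second"}, "IE5")
```

Returning `None` when no branch applies keeps the case of a normal visitor, who opens neither link, explicit.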
The embodiment of the invention also provides a terminal, which includes units for executing any of the methods described above. Specifically, referring to Fig. 4, Fig. 4 is a schematic block diagram of a terminal provided in an embodiment of the present invention. The terminal of this embodiment includes an acquiring unit 401, an embedded unit 402, and a trigger unit 403.
The acquiring unit 401 is configured to obtain the JS mining script library, which includes at least one JS mining script.
Further, the acquiring unit 401 is specifically configured to establish the JS mining script library, which includes a first JS mining script and a second JS mining script; the first JS mining script is a script in the WebAssembly language, and the second JS mining script is a script in the JS language.
The embedded unit 402 is configured to generate the link of the at least one JS mining script and embed the link of the at least one JS mining script in the target webpage.
Further, the embedded unit 402 is specifically configured to determine the first position region according to the probability, recorded in the historical record, that the data information of each position region in the target webpage is crawled, the first position region being the position region whose data information has the highest probability of being crawled among the position regions of the target webpage; and to embed the link of the at least one JS mining script in the first position region of the target webpage.
Further, the embedded unit 402 is specifically configured to generate the link of the first JS mining script and the link of the second JS mining script; to determine the second position region according to the probability, recorded in the historical record, that browsers supporting the WebAssembly language crawl the data information of each position region in the target webpage, the second position region being the position region whose data information has the highest probability of being crawled by browsers supporting the WebAssembly language; to determine the third position region according to the probability, recorded in the historical record, that browsers not supporting the WebAssembly language crawl the data information of each position region in the target webpage, the third position region being the position region whose data information has the highest probability of being crawled by browsers not supporting the WebAssembly language; and to embed the link of the first JS mining script in the second position region of the target webpage and the link of the second JS mining script in the third position region of the target webpage.
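The placement performed by the embedded unit 402 could be sketched as follows. The sketch assumes the historical record is a list of crawl events, each tagged with whether the crawling browser supported WebAssembly, and that each probability is estimated as a relative frequency; the patent does not specify how the probabilities are obtained, so the event layout, names, and estimation method are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (position region, whether the crawling browser supported WebAssembly)
CrawlEvent = Tuple[str, bool]


def crawl_frequencies(events: List[CrawlEvent], wasm: bool) -> Dict[str, float]:
    """Estimate per-region crawl probability for one class of browser."""
    hits = Counter(region for region, supported in events if supported is wasm)
    total = sum(hits.values())
    return {region: count / total for region, count in hits.items()}


def plan_embedding(events: List[CrawlEvent]) -> Dict[str, str]:
    """Map each script link to the region most often crawled by its browser class."""
    with_wasm = crawl_frequencies(events, True)
    without_wasm = crawl_frequencies(events, False)
    if not with_wasm or not without_wasm:
        raise ValueError("the record must cover both classes of browser")
    return {
        "first_js_link": max(with_wasm, key=with_wasm.get),        # second position region
        "second_js_link": max(without_wasm, key=without_wasm.get),  # third position region
    }


# Hypothetical crawl events, for illustration only.
events = [
    ("comment", True), ("comment", True), ("image_display", True),
    ("information_display", False), ("information_display", False),
    ("image_display", False),
]
plan = plan_embedding(events)
```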
The trigger unit 403 is configured to, when a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to load and execute the at least one JS mining script.
Further, the trigger unit 403 is configured to, when a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser the terminal uses; and to call the browser the terminal uses to load and execute the first JS mining script or the second JS mining script chosen by the terminal.
Further, the trigger unit 403 is configured to trigger the terminal accessing the target webpage to judge whether the version number of the browser the terminal uses is any one in the preset version number set; if the result is yes, to determine that the browser the terminal uses supports the WebAssembly language and choose the first JS mining script from the JS mining script library; and if the result is no, to determine that the browser the terminal uses does not support the WebAssembly language and choose the second JS mining script from the JS mining script library.
Further, the trigger unit 403 is configured to, when a start instruction for the link of the first JS mining script is detected, trigger the terminal accessing the target webpage to judge whether the version number of the browser the terminal uses is any one in the preset version number set; if the result is yes, to determine that the browser the terminal uses supports the WebAssembly language and trigger that browser to load and execute the first JS mining script; and if the result is no, then when a start instruction for the link of the second JS mining script is detected, to determine that the browser the terminal uses does not support the WebAssembly language and trigger that browser to load and execute the second JS mining script.
In the embodiment of the present invention, the acquiring unit 401 of the terminal obtains the JS mining script library, the embedded unit 402 generates the link of the at least one JS mining script and embeds it in the target webpage, and the trigger unit 403, when a start instruction for the link of the at least one JS mining script is detected, triggers the terminal accessing the target webpage to load and execute the at least one JS mining script. This consumes a large amount of the crawler terminal's CPU resources so that the crawler terminal cannot be used normally, which achieves the purpose of anti-crawling, prevents the crawler terminal from bypassing JS rendering to crawl data, and improves the effectiveness of anti-crawling.
Referring to Fig. 5, Fig. 5 is a schematic block diagram of another terminal provided in an embodiment of the present invention. As shown in the figure, the terminal in this embodiment may include one or more processors 501, one or more input devices 502, one or more output devices 503, and a memory 504. The processor 501, input device 502, output device 503, and memory 504 are connected by a bus 505. The memory 504 is used to store a computer program that includes program instructions, and the processor 501 is used to execute the program instructions stored in the memory 504. The processor 501 is configured to call the program instructions to perform the following:
It obtains JS and digs mine script bank, it includes that at least one JS digs mine script that the JS, which digs mine script bank,;
The link that at least one described JS digs mine script is generated, at least one described JS link for digging mine script is embedded in mesh Mark webpage;
When detecting the enabled instruction of link of at least one JS digging mine script, triggering accesses the target webpage Terminal loads and execute at least one described JS and dig mine script.
Further, the processor 501 is configured to perform the following steps:
establishing the JS mining script library, the JS mining script library including a first JS mining script and a second JS mining script;
wherein the first JS mining script is a script in the WebAssembly language, and the second JS mining script is a script in the JS language.
Further, the processor 501 is configured to perform the following steps:
determining a first position region according to the probability, recorded in the historical record, that the data information of each position region in the target webpage is crawled, the first position region being the position region whose data information has the highest probability of being crawled among the position regions of the target webpage;
embedding the link of the at least one JS mining script in the first position region of the target webpage.
Further, the processor 501 is configured to perform the following steps:
when a start instruction for the link of the at least one JS mining script is detected, triggering the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser the terminal uses;
calling the browser the terminal uses to load and execute the first JS mining script or the second JS mining script chosen by the terminal.
Further, the processor 501 is configured to perform the following steps:
triggering the terminal accessing the target webpage to judge whether the version number of the browser the terminal uses is any one in the preset version number set;
if the result is yes, determining that the browser the terminal uses supports the WebAssembly language, and choosing the first JS mining script from the JS mining script library;
if the result is no, determining that the browser the terminal uses does not support the WebAssembly language, and choosing the second JS mining script from the JS mining script library.
Further, the processor 501 is configured to perform the following steps:
Generating a link to the first JS mining script and a link to the second JS mining script;
Determining a second position region according to the probability, taken from historical records, that browsers supporting WebAssembly crawl the data in each position region of the target webpage, where the second position region is the position region of the target webpage whose data has the highest probability of being crawled by browsers that support WebAssembly;
Determining a third position region according to the probability, taken from historical records, that browsers not supporting WebAssembly crawl the data in each position region of the target webpage, where the third position region is the position region of the target webpage whose data has the highest probability of being crawled by browsers that do not support WebAssembly;
Embedding the link to the first JS mining script in the second position region of the target webpage, and embedding the link to the second JS mining script in the third position region of the target webpage.
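The two per-capability regions can be derived from the same historical record by splitting it on whether the crawling browser supported WebAssembly. The sketch below assumes each record is a `(region, supports_wasm)` pair; that format, the function name `top_region`, and the link labels are hypothetical.

```python
from collections import Counter


def top_region(records, wasm_capable):
    """Return the region crawled most often by browsers of one capability class.

    `records` is assumed to be a list of (region, supports_wasm) pairs.
    Within one class the denominator is constant, so the region with the
    highest count is also the region with the highest crawl probability.
    """
    counts = Counter(region for region, wasm in records if wasm == wasm_capable)
    if not counts:
        return None  # no history for this class of browser
    return counts.most_common(1)[0][0]


# Hypothetical crawl history.
records = [
    ("price_table", True), ("price_table", True), ("header", True),
    ("reviews", False), ("reviews", False), ("price_table", False),
]
second_region = top_region(records, wasm_capable=True)
third_region = top_region(records, wasm_capable=False)

# Map each region to the link that would be embedded there.
placement = {second_region: "first_script_link", third_region: "second_script_link"}
```

If both classes of browser favor the same region, the dictionary above would keep only one link; the text does not say how that case is handled, so the sketch leaves it unresolved.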
Further, the processor 501 is configured to perform the following steps:
When a start instruction for the link to the first JS mining script is detected, triggering the terminal accessing the target webpage to determine whether the version number of the browser that the terminal uses is any one of a preset set of version numbers;
If the result is yes, determining that the browser used by the terminal supports WebAssembly, and triggering the browser used by the terminal accessing the target webpage to load and execute the first JS mining script;
If the result is no, then, when a start instruction for the link to the second JS mining script is detected, determining that the browser used by the terminal does not support WebAssembly, and triggering the browser used by the terminal accessing the target webpage to load and execute the second JS mining script.
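One possible reading of this two-link flow is a small dispatch function that decides, for each start instruction, which script (if any) should be loaded. The function name, the link labels `"first"` and `"second"`, and the behavior for a WebAssembly-capable browser that starts the second link are assumptions; the text does not cover that last case.

```python
def script_to_load(started_link, browser_version, preset_versions):
    """Return "first", "second", or None (nothing to load yet).

    `started_link` is "first" or "second", naming the link whose start
    instruction was detected. All names here are illustrative.
    """
    in_preset = browser_version in preset_versions
    if started_link == "first":
        # Version is in the preset set: browser supports WebAssembly.
        # Otherwise wait until the second link is started.
        return "first" if in_preset else None
    if started_link == "second" and not in_preset:
        # Browser is treated as not supporting WebAssembly.
        return "second"
    # Assumption: a WebAssembly-capable browser starting the second link
    # loads nothing, since the text does not describe this case.
    return None


preset = {"Chrome/57", "Firefox/52"}  # illustrative preset set
decisions = [
    script_to_load("first", "Chrome/57", preset),
    script_to_load("first", "IE/11", preset),
    script_to_load("second", "IE/11", preset),
]
```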
In this embodiment of the present invention, the terminal generates a link to the at least one JS mining script and embeds the link in the target webpage. When a start instruction for the link to the at least one JS mining script is detected, the terminal accessing the target webpage is triggered to load and execute the at least one JS mining script. This consumes a large amount of the crawler terminal's CPU resources so that the crawler terminal cannot operate normally, which achieves the anti-crawling purpose. It also prevents the crawler terminal from crawling data by bypassing JS rendering, which improves the effectiveness of anti-crawling.
It should be understood that, in the embodiments of the present invention, the processor 501 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 502 may include a touchpad, a fingerprint sensor (for collecting the user's fingerprint information and fingerprint direction information), a microphone, and the like. The output device 503 may include a display (an LCD or the like), a speaker, and the like.
The memory 504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 501. A part of the memory 504 may also include a non-volatile random access memory. For example, the memory 504 may also store information about the device type.
In a specific implementation, the processor 501, the input device 502, and the output device 503 described in this embodiment of the present invention can carry out the implementations described in the method embodiments of Fig. 1, Fig. 2, or Fig. 3 of the anti-crawler method provided by the embodiments of the present invention, and can also carry out the implementation of the terminal described in Fig. 4 of the embodiments of the present invention. Details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the anti-crawler method described in the embodiments corresponding to Fig. 1, Fig. 2, or Fig. 3, and can also implement the terminal of the embodiments corresponding to Fig. 4 or Fig. 5 of the present invention. Details are not repeated here.
The computer-readable storage medium may be an internal storage unit of the terminal described in any of the foregoing embodiments, such as a hard disk or memory of the terminal. The computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the terminal. Further, the computer-readable storage medium may include both an internal storage unit of the terminal and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the terminal. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their function. Whether these functions are carried out in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are only some embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall fall within the scope of protection of the present invention.

Claims (10)

1. An anti-crawler method, characterized by comprising:
Obtaining a JS mining script library, the JS mining script library comprising at least one JS mining script;
Generating a link to the at least one JS mining script, and embedding the link to the at least one JS mining script in a target webpage;
When a start instruction for the link to the at least one JS mining script is detected, triggering a terminal accessing the target webpage to load and execute the at least one JS mining script.
2. The method according to claim 1, characterized in that obtaining the JS mining script library comprises:
Establishing the JS mining script library, the JS mining script library comprising a first JS mining script and a second JS mining script;
Wherein the first JS mining script is a script written in WebAssembly, and the second JS mining script is a script written in JavaScript (JS).
3. The method according to claim 1, characterized in that embedding the link to the at least one JS mining script in the target webpage comprises:
Determining a first position region according to the probability, taken from historical records, that the data in each position region of the target webpage is crawled, the first position region being the position region of the target webpage whose data has the highest probability of being crawled;
Embedding the link to the at least one JS mining script in the first position region of the target webpage.
4. The method according to claim 2, characterized in that, when a start instruction for the link to the at least one JS mining script is detected, triggering the terminal accessing the target webpage to load and execute the at least one JS mining script comprises:
When the start instruction for the link to the at least one JS mining script is detected, triggering the terminal accessing the target webpage to select the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses;
Invoking the browser used by the terminal to load and execute the first JS mining script or the second JS mining script selected by the terminal.
5. The method according to claim 4, characterized in that triggering the terminal accessing the target webpage to select the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses comprises:
Triggering the terminal accessing the target webpage to determine whether the version number of the browser that the terminal uses is any one of a preset set of version numbers;
If the result is yes, determining that the browser used by the terminal supports WebAssembly, and selecting the first JS mining script from the JS mining script library;
If the result is no, determining that the browser used by the terminal does not support WebAssembly, and selecting the second JS mining script from the JS mining script library.
6. The method according to claim 2, characterized in that embedding the link to the at least one JS mining script in the target webpage comprises:
Generating a link to the first JS mining script and a link to the second JS mining script;
Determining a second position region according to the probability, taken from historical records, that browsers supporting WebAssembly crawl the data in each position region of the target webpage, the second position region being the position region of the target webpage whose data has the highest probability of being crawled by browsers that support WebAssembly;
Determining a third position region according to the probability, taken from historical records, that browsers not supporting WebAssembly crawl the data in each position region of the target webpage, the third position region being the position region of the target webpage whose data has the highest probability of being crawled by browsers that do not support WebAssembly;
Embedding the link to the first JS mining script in the second position region of the target webpage, and embedding the link to the second JS mining script in the third position region of the target webpage.
7. The method according to claim 6, characterized in that, when a start instruction for the link to the at least one JS mining script is detected, triggering the terminal accessing the target webpage to load and execute the at least one JS mining script comprises:
When a start instruction for the link to the first JS mining script is detected, triggering the terminal accessing the target webpage to determine whether the version number of the browser that the terminal uses is any one of a preset set of version numbers;
If the result is yes, determining that the browser used by the terminal supports WebAssembly, and triggering the browser used by the terminal accessing the target webpage to load and execute the first JS mining script;
If the result is no, then, when a start instruction for the link to the second JS mining script is detected, triggering the browser used by the terminal accessing the target webpage to load and execute the second JS mining script.
8. A terminal, characterized by comprising units for performing the method according to any one of claims 1-7.
9. A terminal, characterized by comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.
CN201810685659.3A 2018-06-27 2018-06-27 A kind of anti-crawler method, terminal and computer-readable medium Pending CN108898009A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810685659.3A CN108898009A (en) 2018-06-27 2018-06-27 A kind of anti-crawler method, terminal and computer-readable medium
PCT/CN2018/108672 WO2020000747A1 (en) 2018-06-27 2018-09-29 Anti-crawler method and terminal and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810685659.3A CN108898009A (en) 2018-06-27 2018-06-27 A kind of anti-crawler method, terminal and computer-readable medium

Publications (1)

Publication Number Publication Date
CN108898009A true CN108898009A (en) 2018-11-27

Family

ID=64346831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810685659.3A Pending CN108898009A (en) 2018-06-27 2018-06-27 A kind of anti-crawler method, terminal and computer-readable medium

Country Status (2)

Country Link
CN (1) CN108898009A (en)
WO (1) WO2020000747A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111585961A (en) * 2020-04-03 2020-08-25 北京大学 Webpage mining attack detection and protection method and device
CN111832024A (en) * 2020-07-27 2020-10-27 广州智云尚大数据科技有限公司 Big data security protection method and system
CN112463526A (en) * 2020-11-13 2021-03-09 苏州浪潮智能科技有限公司 Method and related device for acquiring server state

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488546B (en) * 2020-04-13 2023-09-26 北京小米移动软件有限公司 Page generation method and device and storage medium
CN112804269B (en) * 2021-04-14 2021-07-06 中建电子商务有限责任公司 Method for realizing website interface anti-crawler

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833212B (en) * 2011-06-14 2016-01-06 阿里巴巴集团控股有限公司 Webpage visitor identity identification method and system
US9866545B2 (en) * 2015-06-02 2018-01-09 ALTR Solutions, Inc. Credential-free user login to remotely executed applications
CN105743901B (en) * 2016-03-07 2019-04-09 携程计算机技术(上海)有限公司 Server, anti-crawler system and anti-crawler verification method
CN107908959B (en) * 2017-11-10 2020-02-14 北京知道创宇信息技术股份有限公司 Website information detection method and device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111585961A (en) * 2020-04-03 2020-08-25 北京大学 Webpage mining attack detection and protection method and device
CN111585961B (en) * 2020-04-03 2021-08-20 北京大学 Webpage mining attack detection and protection method and device
CN111832024A (en) * 2020-07-27 2020-10-27 广州智云尚大数据科技有限公司 Big data security protection method and system
CN112463526A (en) * 2020-11-13 2021-03-09 苏州浪潮智能科技有限公司 Method and related device for acquiring server state

Also Published As

Publication number Publication date
WO2020000747A1 (en) 2020-01-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181127