CN108898009A - Anti-crawler method, terminal, and computer-readable medium - Google Patents
- Publication number
- CN108898009A (application CN201810685659.3A)
- Authority
- CN
- China
- Prior art keywords
- mining script
- terminal
- target webpage
- mining
- script
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/52—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
- G06F21/54—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by adding security routines or objects to programs
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present invention disclose an anti-crawler method, a terminal, and a computer-readable medium. The method includes: obtaining a JS mining script library that contains at least one JS mining script; generating a link to the at least one JS mining script and embedding the link in a target webpage; and, when an instruction to open the link is detected, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script. In this way, the embodiments can consume a large share of the crawler terminal's CPU resources so that the crawler terminal cannot operate normally, which achieves the anti-crawling goal. They can also prevent a crawler terminal from bypassing JS rendering to crawl data, which improves the effectiveness of anti-crawling.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to an anti-crawler method, a terminal, and a computer-readable medium.
Background art
Crawlers on the internet today fall broadly into static crawlers and dynamic crawlers. Because a static crawler cannot parse JavaScript (JS) code, the usual countermeasure against static crawlers is to add JS code to a webpage so that the page is JS-rendered. This JS-rendering approach, however, does not stop a dynamic crawler that can parse JS. How to block crawlers more effectively and improve internet security has therefore become an active research topic.
Summary of the invention
The embodiments of the present invention provide an anti-crawler method, a terminal, and a computer-readable medium that can improve the effectiveness of anti-crawling and improve internet security.
In a first aspect, an embodiment of the present invention provides an anti-crawler method, the method including:
obtaining a JS mining script library, the JS mining script library including at least one JS mining script;
generating a link to the at least one JS mining script, and embedding the link to the at least one JS mining script in a target webpage;
when an instruction to open the link to the at least one JS mining script is detected, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script.
Further, obtaining the JS mining script library includes:
establishing the JS mining script library, the JS mining script library including a first JS mining script and a second JS mining script;
wherein the first JS mining script is written in WebAssembly, and the second JS mining script is written in JS.
Further, embedding the link to the at least one JS mining script in the target webpage includes:
determining a first location region according to the probability, recorded in a history record, that the data in each location region of the target webpage is crawled, the first location region being the location region of the target webpage whose data has the highest probability of being crawled;
embedding the link to the at least one JS mining script in the first location region of the target webpage.
Further, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script when an instruction to open the link to the at least one JS mining script is detected includes:
when an instruction to open the link to the at least one JS mining script is detected, triggering the terminal that accesses the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses;
calling the browser that the terminal uses to load and execute the first JS mining script or the second JS mining script chosen by the terminal.
Further, triggering the terminal that accesses the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses includes:
triggering the terminal that accesses the target webpage to judge whether the version number of the browser that the terminal uses is any one of a preset set of version numbers;
if so, determining that the browser that the terminal uses supports WebAssembly, and choosing the first JS mining script from the JS mining script library;
if not, determining that the browser that the terminal uses does not support WebAssembly, and choosing the second JS mining script from the JS mining script library.
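The version-set check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the version numbers in `PRESET_VERSIONS`, the file names, and the function name `chooseScript` are all hypothetical placeholders.

```javascript
// Hypothetical preset set of browser major versions assumed to support WebAssembly.
const PRESET_VERSIONS = new Set([57, 58, 59, 60]);

// Hypothetical library entries; the file names are placeholders only.
const scriptLibrary = {
  first: { language: "WebAssembly", file: "first-script.wasm" },
  second: { language: "JS", file: "second-script.js" },
};

// Choose the first script when the browser version is in the preset set,
// otherwise fall back to the second script.
function chooseScript(browserVersion, presetVersions = PRESET_VERSIONS) {
  return presetVersions.has(browserVersion)
    ? scriptLibrary.first
    : scriptLibrary.second;
}
```

A set lookup keeps the check constant-time and makes the preset list easy to extend as new browser versions appear.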
Further, embedding the link to the at least one JS mining script in the target webpage includes:
generating a link to the first JS mining script and a link to the second JS mining script;
determining a second location region according to the probability, recorded in the history record, that the data in each location region of the target webpage is crawled by a browser that supports WebAssembly, the second location region being the location region of the target webpage whose data has the highest probability of being crawled by a browser that supports WebAssembly;
determining a third location region according to the probability, recorded in the history record, that the data in each location region of the target webpage is crawled by a browser that does not support WebAssembly, the third location region being the location region of the target webpage whose data has the highest probability of being crawled by a browser that does not support WebAssembly;
embedding the link to the first JS mining script in the second location region of the target webpage, and embedding the link to the second JS mining script in the third location region of the target webpage.
Further, triggering the terminal that accesses the target webpage to load and execute the at least one JS mining script when an instruction to open the link to the at least one JS mining script is detected includes:
when an instruction to open the link to the first JS mining script is detected, triggering the terminal that accesses the target webpage to judge whether the version number of the browser that the terminal uses is any one of a preset set of version numbers;
if so, determining that the browser that the terminal uses supports WebAssembly, and triggering the browser used by the terminal that accesses the target webpage to load and execute the first JS mining script;
if not, then when an instruction to open the link to the second JS mining script is detected, triggering the browser used by the terminal that accesses the target webpage to load and execute the second JS mining script.
In a second aspect, an embodiment of the present invention provides a terminal that includes units for executing the method of the first aspect.
In a third aspect, an embodiment of the present invention provides another terminal, including a processor, an input device, an output device, and a memory that are connected to one another, wherein the memory stores a computer program that supports the terminal in executing the above method, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to execute the method of the first aspect.
The embodiments of the present invention embed a link to at least one JS mining script in a target webpage and, when an instruction to open the link is detected, trigger the terminal that accesses the target webpage to load and execute the at least one JS mining script. When a crawler terminal crawls the link, it therefore loads and executes the mining script behind the link, which consumes a large share of the crawler terminal's CPU resources, leaves the crawler terminal unable to operate normally, and achieves the anti-crawling goal. This also prevents the crawler terminal from bypassing JS rendering to crawl data, which improves the effectiveness of anti-crawling.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed to describe the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an anti-crawler method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another anti-crawler method provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of yet another anti-crawler method provided by an embodiment of the present invention;
Fig. 4 is a schematic block diagram of a terminal provided by an embodiment of the present invention;
Fig. 5 is a schematic block diagram of another terminal provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention.
It should be understood that the terms used in this specification are for describing particular embodiments only and are not intended to limit the present invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
The anti-crawler method provided by the embodiments of the present invention may be executed by a terminal, which may be a smart terminal such as a mobile phone, a computer, a tablet, or a smartwatch. The anti-crawler method as applied to a terminal is described below.
This solution is an anti-crawler scheme aimed at dynamic crawlers that can parse JS. In the prior art, the common anti-crawler strategy for webpages is to add JS code to the page for JS rendering, which counters static crawlers. This approach, however, does not stop dynamic crawlers that can parse JS, so this solution improves the anti-crawler strategy specifically for such crawlers.
Before this solution is introduced, the JS-capable dynamic crawler is described first. Such a crawler works in one of two ways. In the first, it parses the HTML source directly. The second is a special case: if the page data is loaded asynchronously, for example via ajax, the crawler cannot obtain the data by parsing and inspecting the source. In that case it typically relies on third-party tools, such as the automated testing tool Selenium used in a browser that supports WebAssembly or together with the headless browser PhantomJS, to simulate browser behavior and load the data. Note that WebAssembly is a low-level, assembly-like language that runs in modern web browsers; its file extension is ".wasm".
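Whether a given JS runtime supports WebAssembly can also be probed directly. The sketch below uses feature detection through the standard `WebAssembly.validate` API; this is an alternative to the version-number check that the patent itself describes, and the function name `supportsWebAssembly` is an assumption made for illustration.

```javascript
// Returns true when the runtime exposes the WebAssembly API and accepts the
// smallest valid module: the 4-byte magic "\0asm" followed by version 1.
function supportsWebAssembly() {
  if (typeof WebAssembly !== "object" || typeof WebAssembly.validate !== "function") {
    return false;
  }
  const emptyModule = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
  return WebAssembly.validate(emptyModule);
}
```

Feature detection reflects what the runtime can actually do, whereas a version list has to be kept up to date by hand.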
Therefore, to counter the two crawling approaches that a JS-capable dynamic crawler uses, this solution writes at least one JS mining script in advance, establishes a JS mining script library, and generates a link to the at least one JS mining script from the scripts in that library. The embodiments of the present invention embed the generated link in a location region of the target webpage whose data has a relatively high probability of being crawled. If it is detected that a terminal accessing the target webpage opens the link embedded in the target webpage, the terminal is determined to be a crawler terminal, and it is triggered to automatically load and execute the at least one JS mining script behind the link. The JS mining script contains a large amount of computation, so executing it consumes a large share of the terminal's CPU resources and causes the terminal to stutter or otherwise become unusable, which achieves the anti-crawling goal. The embodiments of the present invention are described in detail below with reference to the drawings.
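The detection step, treating any terminal that opens an embedded link as a crawler terminal, might look like the following sketch. The paths in `embeddedLinkPaths`, the default base URL, and the function name `isCrawlerRequest` are hypothetical; the source does not specify how link opening is detected.

```javascript
// Hypothetical paths of the links embedded in the target webpage.
const embeddedLinkPaths = new Set(["/assets/link-a", "/assets/link-b"]);

// Classify a request as coming from a crawler terminal when its path matches
// one of the embedded links. Query strings and fragments are ignored.
function isCrawlerRequest(requestUrl, baseUrl = "https://example.com") {
  const { pathname } = new URL(requestUrl, baseUrl);
  return embeddedLinkPaths.has(pathname);
}
```

For instance, `isCrawlerRequest("/assets/link-a?x=1")` matches an embedded link, while `isCrawlerRequest("/index.html")` does not.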
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an anti-crawler method provided by an embodiment of the present invention. As shown in Fig. 1, the method may be executed by a terminal, which is as described above and is not described again here. Specifically, the method of this embodiment includes the following steps.
S101: Obtain a JS mining script library.
In this embodiment, the terminal may establish a JS mining script library that includes one or more JS mining scripts.
In one embodiment, the JS mining script library established by the terminal includes a first JS mining script and a second JS mining script. The first JS mining script is written in WebAssembly, and the second JS mining script is written in JS.
The first JS mining script is written in WebAssembly. In one embodiment, a JS-capable dynamic crawler usually either uses Selenium with WebDriver to call a browser and simulate browser operation, where the browser called in this way supports WebAssembly, or uses Selenium with the headless browser PhantomJS to simulate browser operation. This solution therefore writes the first JS mining script in WebAssembly to target the WebAssembly-capable browser or PhantomJS that such crawlers commonly use.
In other embodiments, the second JS mining script is written in JS. A JS-capable dynamic crawler may also use an ordinary browser that does not support WebAssembly and parse the HTML source directly, so the second JS mining script is written in JS.
S102: Generate a link to at least one JS mining script, and embed the link in the target webpage.
In this embodiment, the terminal may generate a link to at least one JS mining script and embed that link in the target webpage.
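Link generation from a script library can be sketched as below. The library layout, the paths, and the function name `generateLinks` are assumptions for illustration; the source only states that a link is generated for each script in the library.

```javascript
// Hypothetical library: one entry per script, with placeholder paths.
const miningScriptLibrary = [
  { id: "first", language: "WebAssembly", path: "/scripts/first.wasm" },
  { id: "second", language: "JS", path: "/scripts/second.js" },
];

// Build one absolute link per library entry, resolved against the site origin.
function generateLinks(library, origin) {
  return library.map((entry) => ({
    id: entry.id,
    href: new URL(entry.path, origin).href,
  }));
}
```

Calling `generateLinks(miningScriptLibrary, "https://example.com")` yields one `{ id, href }` pair per script, ready to be embedded in the page.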
In one embodiment, the terminal may generate the link to at least one JS mining script from the JS mining script library built from the first JS mining script and the second JS mining script, and embed that link in the first location region of the target webpage.
The first location region is determined from the probability, recorded in the history record, that the data in each location region of the target webpage is crawled: the location region whose data has the highest probability of being crawled is the first location region. For example, suppose the target webpage contains a comment region, a question region, an image display region, and an information display region, and the history record shows that the probability of the data being crawled is 90% for the comment region, 60% for the question region, 20% for the image display region, and 50% for the information display region. Comparing these probabilities shows that the comment region's 90% is the highest, so the comment region of the target webpage is determined to be the first location region.
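The comparison in this example amounts to picking the region with the largest recorded probability. A minimal sketch, using the four probabilities from the example above; the object layout and the function name `mostCrawledRegion` are hypothetical.

```javascript
// Crawl probabilities per region, taken from the example in the text.
const crawlProbability = {
  comment: 0.9,
  question: 0.6,
  imageDisplay: 0.2,
  informationDisplay: 0.5,
};

// Return the region name with the largest score; ties keep the earlier entry.
// Returns null when no regions are recorded.
function mostCrawledRegion(scores) {
  let best = null;
  for (const [region, score] of Object.entries(scores)) {
    if (best === null || score > best.score) {
      best = { region, score };
    }
  }
  return best === null ? null : best.region;
}
```

The same function applies unchanged when the scores are keyword counts instead of probabilities, since only the ordering matters.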
In one embodiment, the first location region may instead be determined from the crawled keywords recorded in the history record for the data in each location region of the target webpage: the location region whose data has the most crawled keywords is the first location region. For example, suppose the target webpage contains a comment region, a question region, an image display region, and an information display region, and the history record shows that the number of crawled keywords is n for the comment region, m for the question region, x for the image display region, and y for the information display region. Comparing these counts, if n > m > x > y, the comment region has the most crawled keywords, so the comment region of the target webpage is determined to be the first location region.
With this embodiment, when the terminal accessing the target webpage is a crawler terminal, the crawler terminal will most likely crawl the data in the first location region of the target webpage. This raises the likelihood that the crawler terminal opens the link to the at least one JS mining script in the first location region, which improves anti-crawling efficiency.
In one embodiment, the terminal may generate a link to the first JS mining script from the obtained first JS mining script and a link to the second JS mining script from the obtained second JS mining script, then embed the link to the first JS mining script in the second location region of the target webpage and the link to the second JS mining script in the third location region of the target webpage.
The second location region is determined from the probability, recorded in the history record, that the data in each location region of the target webpage is crawled by a browser that supports WebAssembly: the location region whose data has the highest probability of being crawled by such a browser is the second location region.
For example, suppose the target webpage contains a comment region, a question region, an image display region, and an information display region, and the history record shows that the data in the comment region and the question region was crawled by browsers that support WebAssembly, with a probability of 80% for the comment region and 50% for the question region. Comparing the two probabilities shows that the comment region's 80% is the highest, so the comment region of the target webpage is determined to be the second location region.
In one embodiment, the second location region may instead be determined from the keywords, recorded in the history record, that browsers supporting WebAssembly crawled from the data in each location region of the target webpage: the location region whose data has the most keywords crawled by such browsers is the second location region.
For example, suppose the target webpage contains a comment region, a question region, an image display region, and an information display region, and the history record shows that the number of keywords crawled by browsers that support WebAssembly is n for the comment region, m for the question region, x for the image display region, and y for the information display region. Comparing these counts, if n > m > x > y, the comment region has the most keywords crawled by browsers that support WebAssembly, so the comment region of the target webpage is determined to be the second location region.
In one embodiment, the third location region is determined from the probability, recorded in the history record, that the data in each location region of the target webpage is crawled by a browser that does not support WebAssembly: the location region whose data has the highest probability of being crawled by such a browser is the third location region.
For example, suppose the target webpage contains a comment region, a question region, an image display region, and an information display region, and the history record shows that the data in the image display region and the information display region was crawled by browsers that do not support WebAssembly, with a probability of 60% for the image display region and 90% for the information display region. Comparing the two probabilities shows that the information display region's 90% is the highest, so the information display region of the target webpage is determined to be the third location region.
In one embodiment, the third location region may also be determined from the keywords, recorded in the history record, that browsers not supporting WebAssembly crawled from the data in each location region of the target webpage: the location region whose data has the most keywords crawled by such browsers is the third location region.
For example, suppose the target webpage contains a comment region, a question region, an image display region, and an information display region, and the history record shows that the number of keywords crawled by browsers that do not support WebAssembly is n for the comment region, m for the question region, x for the image display region, and y for the information display region. Comparing these counts, if n < m < x < y, the information display region has the most keywords crawled by browsers that do not support WebAssembly, so the information display region of the target webpage is determined to be the third location region.
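Deriving the second and third location regions from raw history can be sketched as follows. The record shape (`region`, `wasm`) and the sample data are hypothetical; the source does not say how the history record is stored. Here each record stands for one observed crawl, and the region with the most records per browser class wins.

```javascript
// Hypothetical history records: which region was crawled, and whether the
// crawling browser supported WebAssembly.
const historyRecords = [
  { region: "comment", wasm: true },
  { region: "comment", wasm: true },
  { region: "question", wasm: true },
  { region: "informationDisplay", wasm: false },
  { region: "informationDisplay", wasm: false },
  { region: "imageDisplay", wasm: false },
];

// Count crawls per region for one browser class and return the top region,
// or null when that class has no records.
function topRegionFor(records, wasmSupported) {
  const counts = new Map();
  for (const { region, wasm } of records) {
    if (wasm === wasmSupported) {
      counts.set(region, (counts.get(region) || 0) + 1);
    }
  }
  let top = null;
  for (const [region, count] of counts) {
    if (top === null || count > counts.get(top)) top = region;
  }
  return top;
}

// Region for the first (WebAssembly) script link and for the second (JS) script link.
const secondRegion = topRegionFor(historyRecords, true);
const thirdRegion = topRegionFor(historyRecords, false);
```

With the sample data, the comment region is chosen for WebAssembly-capable browsers and the information display region for the others, mirroring the examples in the text.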
With this embodiment, when the terminal accessing the target webpage is a crawler terminal, the outcome depends on its browser. If the browser supports WebAssembly, the crawler terminal will most likely crawl the data in the second location region of the target webpage first. If the browser does not support WebAssembly, the crawler terminal will most likely crawl the data in the third location region of the target webpage first. The terminal accessing the target webpage will therefore most likely load and execute the link to the JS mining script that matches the language its browser supports, which further improves the efficiency and effectiveness of anti-crawling.
In one embodiment, when embedding the link of the at least one JS mining script in the target webpage, the terminal may add prompt information to the link to remind normal users accessing the target webpage not to click it, and then embed the link with the added prompt information in any position region of the target webpage.
For example, if the terminal embeds the link of at least one JS mining script in the target webpage, it may add the prompt information "please don't click" to that link and then embed the link with the prompt information in the first position region of the target webpage. The first position region is as described above and is not repeated here.
As another example, if the terminal embeds the link of the first JS mining script in the second position region of the target webpage and the link of the second JS mining script in the third position region of the target webpage, it may add the prompt information "please don't click" to both links. It then embeds the link of the first JS mining script, with the prompt information, in the second position region of the target webpage, and the link of the second JS mining script, with the prompt information, in the third position region of the target webpage. The second position region and the third position region are as described above and are not repeated here.
This embodiment can, to a certain extent, prevent users of normal terminals accessing the target webpage from accidentally clicking the link of the at least one JS mining script embedded in the target webpage, and thus avoid harming normal terminals that access the target webpage.
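Attaching the prompt information to a link can be sketched as below. This is only an illustration: the helper name `link_with_prompt` and the placeholder URL are hypothetical, and the text does not say how the prompt is actually rendered. Here it is assumed to be the visible text of an HTML anchor.

```python
from html import escape


def link_with_prompt(script_url: str, prompt: str = "please don't click") -> str:
    """Build an HTML anchor whose visible text is the prompt information."""
    # The URL goes into an attribute, so quotes are escaped there;
    # the prompt is element text, so only &, < and > are escaped.
    href = escape(script_url, quote=True)
    text = escape(prompt, quote=False)
    return f'<a href="{href}">{text}</a>'


# Placeholder URL on the reserved .invalid top-level domain.
link = link_with_prompt("https://example.invalid/first-script")
```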
S103: When a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to load and execute the at least one JS mining script.
In the embodiment of the present invention, the terminal monitors access to the target webpage. If it detects that the link of the at least one JS mining script embedded in the target webpage has been opened, it determines that the terminal accessing the target webpage is a crawler terminal and triggers that terminal to load and execute the at least one JS mining script. This consumes a large amount of the CPU resources of the terminal accessing the target webpage, which achieves the purpose of anti-crawling and prevents that terminal from bypassing JS rendering to crawl the data information of the target webpage, thereby improving the effectiveness of anti-crawling and enhancing the security of the internet.
In the embodiment of the present invention, the terminal generates the link of at least one JS mining script and embeds it in the target webpage. When a start instruction for the link of the at least one JS mining script is detected, the terminal accessing the target webpage is triggered to load and execute the at least one JS mining script. This consumes a large amount of the CPU resources of the terminal accessing the target webpage so that it cannot be used normally, which achieves the purpose of anti-crawling. It also prevents the terminal accessing the target webpage from bypassing JS rendering to crawl data, which improves the effectiveness of anti-crawling.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of another anti-crawler method provided by an embodiment of the present invention. As shown in Fig. 2, the method may be executed by a terminal; the terminal is as described above and is not repeated here. This embodiment differs from the embodiment of Fig. 1 in that, after the link of at least one JS mining script has been generated, when a start instruction for that link is detected, the terminal accessing the target webpage is triggered to choose the corresponding JS mining script from the JS mining script library according to the language supported by the browser it uses, and that browser is called to load and execute the chosen JS mining script. This improves the efficiency with which the browser used by the terminal accessing the target webpage executes the JS mining script. Specifically, the method of this embodiment includes the following steps.
S201: Obtain a JS mining script library.
In the embodiment of the present invention, the terminal may establish a JS mining script library that includes a first JS mining script and a second JS mining script. The first JS mining script and the second JS mining script are as described above and are not repeated here.
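One way to picture the library obtained in S201 is a small record holding two entries. The sketch below is an assumption about representation only: the class names, field names, and entry names are hypothetical, and no script content is modeled. The language labels follow the description of the acquiring unit, where the first script is in the WebAssembly language and the second in the JS language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptEntry:
    name: str      # hypothetical identifier for the entry
    language: str  # language the script is written in


@dataclass(frozen=True)
class ScriptLibrary:
    first: ScriptEntry   # for browsers that support WebAssembly
    second: ScriptEntry  # for browsers that do not


def build_library() -> ScriptLibrary:
    """Create a library with the two entries named in the text."""
    return ScriptLibrary(
        first=ScriptEntry(name="first_script", language="WebAssembly"),
        second=ScriptEntry(name="second_script", language="JavaScript"),
    )


library = build_library()
```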
S202: Generate the link of at least one JS mining script, and embed the link of the at least one JS mining script in the first position region of the target webpage.
In the embodiment of the present invention, the terminal may generate the link of at least one JS mining script from the obtained JS mining script library and embed that link in the first position region of the target webpage. The first position region is as described above and is not repeated here.
S203: When a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that terminal uses.
In the embodiment of the present invention, when the terminal detects a start instruction for the link of the at least one JS mining script, it may trigger the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that the accessing terminal uses.
The terminal accessing the target webpage can judge, from the version number of the browser it uses, whether that browser supports the WebAssembly language. In one embodiment, when a start instruction for the link of the at least one JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one of a preset version number set. If it is, the browser is determined to support the WebAssembly language, and the first JS mining script is chosen from the JS mining script library. If it is not, the browser is determined not to support the WebAssembly language, and the second JS mining script is chosen from the JS mining script library.
For example, assume the preset version number set is {Firefox1.0, Firefox9, Firefox8, Firefox7, Firefox6, Firefox5, Firefox4, Firefox3.6, Firefox3, IE9, IE8, IE7}. When a start instruction for the link of at least one JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one of the preset version number set. If the browser version number is IE8, it is a member of the preset version number set, so the browser used by the terminal accessing the target webpage is determined to support the WebAssembly language, and the first JS mining script is chosen from the JS mining script library.
As another example, assume the preset version number set is {Firefox1.0, Firefox9, Firefox8, Firefox7, Firefox6, Firefox5, Firefox4, Firefox3.6, Firefox3, IE9, IE8, IE7}. When a start instruction for the link of at least one JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one of the preset version number set. If the browser version number is IE5, it is not a member of the preset version number set, so the browser used by the terminal accessing the target webpage is determined not to support the WebAssembly language, and the second JS mining script is chosen from the JS mining script library.
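The selection rule of S203 amounts to a set-membership test. The sketch below assumes the browser version number is already available as a string; the function name `choose_script` and the return labels are hypothetical. The version strings are copied verbatim from the example set and serve only as placeholders, since the text does not explain how the set is compiled.

```python
# Placeholder set, copied from the example in the text.
PRESET_VERSIONS = frozenset({
    "Firefox1.0", "Firefox9", "Firefox8", "Firefox7", "Firefox6", "Firefox5",
    "Firefox4", "Firefox3.6", "Firefox3", "IE9", "IE8", "IE7",
})


def choose_script(browser_version: str,
                  preset_versions: frozenset[str] = PRESET_VERSIONS) -> str:
    """Return "first" if the version is in the preset set, else "second"."""
    if browser_version in preset_versions:
        # Member of the set: treated as supporting WebAssembly.
        return "first"
    # Not a member: treated as not supporting WebAssembly.
    return "second"


choice_for_ie8 = choose_script("IE8")
choice_for_ie5 = choose_script("IE5")
```

A membership test keeps the rule independent of how the set is produced; the same function works unchanged if the set is replaced.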
S204: Call the browser used by the terminal accessing the target webpage to load and execute the first JS mining script or the second JS mining script.
In the embodiment of the present invention, the terminal may call the browser used by the terminal accessing the target webpage to load and execute whichever of the first JS mining script and the second JS mining script that terminal chose.
In one embodiment, if it is detected that the script chosen from the JS mining script library by the terminal accessing the target webpage is the first JS mining script, the terminal may call the browser used by the terminal accessing the target webpage to load and execute the first JS mining script. For example, if the chosen script is the first JS mining script and the browser used by the terminal accessing the target webpage is determined to be IE8, the browser IE8 may be called to load and execute the first JS mining script.
In one embodiment, if it is detected that the script chosen from the JS mining script library by the terminal accessing the target webpage is the second JS mining script, the terminal may call the browser used by the terminal accessing the target webpage, which does not support the WebAssembly language, to load and execute the second JS mining script. For example, if the chosen script is the second JS mining script and the browser used by the terminal accessing the target webpage is determined to be IE5, the browser IE5 may be called to load and execute the second JS mining script.
In the embodiment of the present invention, the terminal embeds the link of the generated at least one JS mining script in the first position region of the target webpage. When a start instruction for that link is detected, the terminal accessing the target webpage is triggered to choose the first JS mining script or the second JS mining script according to the language supported by the browser it uses, and that browser is called to load and execute the chosen script. In this way, the JS mining script that matches the browser used by the terminal accessing the target webpage is loaded and executed, which consumes the CPU resources of that terminal, achieves the purpose of anti-crawling, and improves the effectiveness of anti-crawling.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of yet another anti-crawler method provided by an embodiment of the present invention. As shown in Fig. 3, the method may be executed by a terminal; the terminal is as described above and is not repeated here. This embodiment differs from the embodiment of Fig. 2 in that the link of the generated first JS mining script is embedded in the second position region of the target webpage, and the link of the generated second JS mining script is embedded in the third position region of the target webpage. The terminal accessing the target webpage then chooses the second position region or the third position region according to the language supported by the browser it uses, and calls that browser to load and execute the JS mining script in the chosen region, which further improves the efficiency with which the browser executes the JS mining script. Specifically, the method of this embodiment includes the following steps.
S301: Obtain a JS mining script library, the JS mining script library including a first JS mining script and a second JS mining script.
In the embodiment of the present invention, the terminal may obtain a JS mining script library that includes a first JS mining script and a second JS mining script. The first JS mining script and the second JS mining script are as described above and are not repeated here.
S302: Generate the link of the first JS mining script and the link of the second JS mining script.
In the embodiment of the present invention, the terminal may generate the link of the first JS mining script and the link of the second JS mining script from the first JS mining script and the second JS mining script included in the JS mining script library.
S303: Embed the link of the first JS mining script in the second position region of the target webpage, and embed the link of the second JS mining script in the third position region of the target webpage.
In the embodiment of the present invention, the terminal may embed the link of the first JS mining script in the second position region of the target webpage and the link of the second JS mining script in the third position region of the target webpage. The second position region and the third position region are as described above and are not repeated here. For example, assuming the second position region is the comment region of the target webpage and the third position region is the image display region of the target webpage, the terminal may embed the link of the first JS mining script in the comment region and the link of the second JS mining script in the image display region.
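The placement in S303 can be sketched as a mapping from each link to a region. This illustration assumes, as a stand-in for the crawl probabilities taken from the historical record, that per-region crawl counts are available separately for browsers that support WebAssembly and for browsers that do not. The function name `placement_plan`, the dictionary keys, and the counts are all hypothetical.

```python
def placement_plan(wasm_counts: dict[str, int],
                   non_wasm_counts: dict[str, int]) -> dict[str, str]:
    """Map each script link to the region where it would be embedded."""
    if not wasm_counts or not non_wasm_counts:
        raise ValueError("both count tables must be non-empty")
    # Second position region: crawled most by WebAssembly-capable browsers.
    second_region = max(wasm_counts, key=wasm_counts.__getitem__)
    # Third position region: crawled most by browsers without WebAssembly.
    third_region = max(non_wasm_counts, key=non_wasm_counts.__getitem__)
    return {
        "first_script_link": second_region,
        "second_script_link": third_region,
    }


# Hypothetical counts chosen to reproduce the example in the text.
plan = placement_plan(
    wasm_counts={"comment": 9, "image_display": 4},
    non_wasm_counts={"comment": 2, "image_display": 7},
)
```

With these counts the first link goes to the comment region and the second to the image display region, as in the example above.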
S304: When a start instruction for the link of the first JS mining script is detected, trigger the terminal accessing the target webpage to judge whether the version number of the browser it uses is any one of the preset version number set. If the judgment result is yes, execute step S305; if the judgment result is no, execute step S306.
In the embodiment of the present invention, when a start instruction for the link of the first JS mining script is detected, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one of the preset version number set. If it is, step S305 is executed. If it is not, step S306 is executed. The specific process and examples of this judgment are as described above and are not repeated here.
For example, when a start instruction is detected for the link of the first JS mining script embedded in the comment region of the target webpage, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one of the preset version number set. As another example, when a start instruction is detected for the link of the second JS mining script embedded in the image display region of the target webpage, the terminal accessing the target webpage is triggered to judge whether the version number of the browser it uses is any one of the preset version number set.
S305: Determine that the browser used by the terminal accessing the target webpage supports the WebAssembly language, and trigger that browser to load and execute the first JS mining script.
In the embodiment of the present invention, if the version number of the browser used by the terminal accessing the target webpage is judged to be any one of the preset version number set, the browser is determined to support the WebAssembly language and is triggered to load and execute the first JS mining script. The specific process and examples of triggering the browser to load and execute the first JS mining script are as described above and are not repeated here.
S306: When a start instruction for the link of the second JS mining script is detected, trigger the browser used by the terminal accessing the target webpage to load and execute the second JS mining script.
In the embodiment of the present invention, if the version number of the browser used by the terminal is judged not to be any one of the preset version number set, then, when a start instruction for the link of the second JS mining script is detected, the browser used by the terminal accessing the target webpage is determined not to support the WebAssembly language and is triggered to load and execute the second JS mining script. The specific process and examples of triggering the browser to load and execute the second JS mining script are as described above and are not repeated here.
In the embodiment of the present invention, the terminal embeds the link of the generated first JS mining script in the second position region of the target webpage and the link of the generated second JS mining script in the third position region of the target webpage. When a start instruction for the link of the first JS mining script is detected, if the version number of the browser used by the terminal accessing the target webpage is any one of the preset version number set, that browser is called to load and execute the first JS mining script; if it is not, that browser is called to load and execute the second JS mining script. This consumes a large amount of the CPU resources of the terminal accessing the target webpage, achieves the purpose of anti-crawling, and further improves the efficiency and effectiveness of anti-crawling, thereby enhancing the security of the internet.
An embodiment of the present invention also provides a terminal that includes units for executing the method of any of the foregoing embodiments. Specifically, referring to Fig. 4, Fig. 4 is a schematic block diagram of a terminal provided by an embodiment of the present invention. The terminal of this embodiment includes an acquiring unit 401, an embedding unit 402, and a trigger unit 403.
The acquiring unit 401 is configured to obtain a JS mining script library, the JS mining script library including at least one JS mining script.
Further, the acquiring unit 401 is specifically configured to establish the JS mining script library, the JS mining script library including a first JS mining script and a second JS mining script, where the first JS mining script is a script in the WebAssembly language and the second JS mining script is a script in the JS language.
The embedding unit 402 is configured to generate the link of the at least one JS mining script and embed the link of the at least one JS mining script in the target webpage.
Further, the embedding unit 402 is specifically configured to: determine a first position region according to the probability, given by the historical record, that the data information of each position region in the target webpage is crawled, the first position region being the position region in the target webpage whose data information has the highest probability of being crawled; and embed the link of the at least one JS mining script in the first position region of the target webpage.
Further, the embedding unit 402 is specifically configured to: generate the link of the first JS mining script and the link of the second JS mining script; determine a second position region according to the probability, given by the historical record, that browsers supporting the WebAssembly language crawl the data information of each position region in the target webpage, the second position region being the position region in the target webpage whose data information has the highest probability of being crawled by browsers that support the WebAssembly language; determine a third position region according to the probability, given by the historical record, that browsers not supporting the WebAssembly language crawl the data information of each position region in the target webpage, the third position region being the position region in the target webpage whose data information has the highest probability of being crawled by browsers that do not support the WebAssembly language; and embed the link of the first JS mining script in the second position region of the target webpage and the link of the second JS mining script in the third position region of the target webpage.
The trigger unit 403 is configured to, when a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to load and execute the at least one JS mining script.
Further, the trigger unit 403 is configured to: when a start instruction for the link of the at least one JS mining script is detected, trigger the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser the terminal uses; and call the browser the terminal uses to load and execute the first JS mining script or the second JS mining script chosen by the terminal.
Further, the trigger unit 403 is configured to: trigger the terminal accessing the target webpage to judge whether the version number of the browser the terminal uses is any one of the preset version number set; if the judgment result is yes, determine that the browser the terminal uses supports the WebAssembly language, and choose the first JS mining script from the JS mining script library; if the judgment result is no, determine that the browser the terminal uses does not support the WebAssembly language, and choose the second JS mining script from the JS mining script library.
Further, the trigger unit 403 is configured to: when a start instruction for the link of the first JS mining script is detected, trigger the terminal accessing the target webpage to judge whether the version number of the browser the terminal uses is any one of the preset version number set; if the judgment result is yes, determine that the browser the terminal uses supports the WebAssembly language, and trigger that browser to load and execute the first JS mining script; if the judgment result is no, then, when a start instruction for the link of the second JS mining script is detected, determine that the browser the terminal uses does not support the WebAssembly language, and trigger that browser to load and execute the second JS mining script.
In the embodiment of the present invention, the acquiring unit 401 of the terminal obtains the JS mining script library, the embedding unit 402 generates the link of the at least one JS mining script and embeds it in the target webpage, and the trigger unit 403, when a start instruction for the link of the at least one JS mining script is detected, triggers the terminal accessing the target webpage to load and execute the at least one JS mining script. This consumes a large amount of the crawler terminal's CPU resources so that the crawler terminal cannot be used normally, which achieves the purpose of anti-crawling. It also prevents the crawler terminal from bypassing JS rendering to crawl data, which improves the effectiveness of anti-crawling.
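The division of work among the three units can be pictured with the structural sketch below. It is not the disclosed implementation: the class and method names, the `link:` prefix, and the page model (a dictionary from region name to a list of items) are all assumptions, and loading and executing a script is represented only by a callback that receives the script name.

```python
from typing import Callable


class AcquiringUnit:
    def obtain_library(self) -> list[str]:
        # Placeholder script names; no script content is modeled here.
        return ["first_script", "second_script"]


class EmbeddingUnit:
    def embed(self, page: dict[str, list[str]], region: str,
              scripts: list[str]) -> dict[str, list[str]]:
        """Return a copy of the page with one link per script in the region."""
        links = [f"link:{name}" for name in scripts]
        updated = {r: list(items) for r, items in page.items()}
        updated.setdefault(region, []).extend(links)
        return updated


class TriggerUnit:
    def on_start_instruction(self, link: str,
                             load_and_execute: Callable[[str], None]) -> None:
        """Hand the script named by the link to the supplied callback."""
        load_and_execute(link.removeprefix("link:"))


page = {"comment": ["existing content"]}
scripts = AcquiringUnit().obtain_library()
page = EmbeddingUnit().embed(page, "comment", scripts)

executed: list[str] = []
TriggerUnit().on_start_instruction("link:first_script", executed.append)
```

Passing the load step in as a callback keeps the sketch free of any browser dependency; what a real terminal would do at that point is outside this illustration.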
Referring to Fig. 5, Fig. 5 is a schematic block diagram of another terminal provided by an embodiment of the present invention. As shown in the figure, the terminal in this embodiment may include one or more processors 501, one or more input devices 502, one or more output devices 503, and a memory 504. The processor 501, input device 502, output device 503, and memory 504 are connected by a bus 505. The memory 504 is used to store a computer program that includes program instructions, and the processor 501 is used to execute the program instructions stored in the memory 504. The processor 501 is configured to call the program instructions to execute:
obtaining a JS mining script library, the JS mining script library including at least one JS mining script;
generating the link of the at least one JS mining script, and embedding the link of the at least one JS mining script in a target webpage;
when a start instruction for the link of the at least one JS mining script is detected, triggering the terminal accessing the target webpage to load and execute the at least one JS mining script.
Further, the processor 501 is configured to execute the following steps:
establishing the JS mining script library, the JS mining script library including a first JS mining script and a second JS mining script;
where the first JS mining script is a script in the WebAssembly language and the second JS mining script is a script in the JS language.
Further, the processor 501 is configured to execute the following steps:
determining a first position region according to the probability, given by the historical record, that the data information of each position region in the target webpage is crawled, the first position region being the position region in the target webpage whose data information has the highest probability of being crawled;
embedding the link of the at least one JS mining script in the first position region of the target webpage.
Further, the processor 501 is configured to execute the following steps:
when a start instruction for the link of the at least one JS mining script is detected, triggering the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser the terminal uses;
calling the browser the terminal uses to load and execute the first JS mining script or the second JS mining script chosen by the terminal.
Further, the processor 501 is configured to execute the following steps:
triggering the terminal accessing the target webpage to judge whether the version number of the browser the terminal uses is any one of the preset version number set;
if the judgment result is yes, determining that the browser the terminal uses supports the WebAssembly language, and choosing the first JS mining script from the JS mining script library;
if the judgment result is no, determining that the browser the terminal uses does not support the WebAssembly language, and choosing the second JS mining script from the JS mining script library.
Further, the processor 501 is configured to execute the following steps:
Generate a link to the first JS mining script and a link to the second JS mining script;
Determine a second position region according to the historically recorded probability that the data information of each position region in the target webpage is crawled by browsers that support the WebAssembly language, the second position region being the position region in the target webpage whose data information has the highest probability of being crawled by browsers that support the WebAssembly language;
Determine a third position region according to the historically recorded probability that the data information of each position region in the target webpage is crawled by browsers that do not support the WebAssembly language, the third position region being the position region in the target webpage whose data information has the highest probability of being crawled by browsers that do not support the WebAssembly language;
Embed the link of the first JS mining script in the second position region of the target webpage, and embed the link of the second JS mining script in the third position region of the target webpage.
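The selection of the second and third position regions can be sketched as follows. This is an illustrative Python sketch only: the record layout (a region identifier paired with a flag for WebAssembly support), the function name `most_crawled_regions`, and the sample region names are assumptions, since the embodiment does not specify how the historical record is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One history record: (region_id, browser_supports_wasm). Layout is assumed.
Record = Tuple[str, bool]


def most_crawled_regions(history: Iterable[Record]) -> Dict[bool, str]:
    """Map each browser class (True = supports WebAssembly) to its most crawled region."""
    counts: Dict[bool, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for region, supports_wasm in history:
        counts[supports_wasm][region] += 1

    result: Dict[bool, str] = {}
    for supports_wasm, per_region in counts.items():
        total = sum(per_region.values())
        # Empirical crawl probability of each region within this browser class.
        probabilities = {r: n / total for r, n in per_region.items()}
        result[supports_wasm] = max(probabilities, key=probabilities.get)
    return result


# Hypothetical history of crawled regions.
history = [("header", True), ("price_table", True), ("price_table", True),
           ("comments", False), ("comments", False), ("header", False)]
regions = most_crawled_regions(history)
second_region = regions[True]   # region for the first script's link
third_region = regions[False]   # region for the second script's link
```

Estimating the probabilities separately per browser class is what allows the two links to land in different regions when the two classes of crawler target different parts of the page.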
Further, the processor 501 is configured to execute the following steps:
When a start instruction for the link of the first JS mining script is detected, trigger the terminal accessing the target webpage to judge whether the version number of the browser that the terminal uses is any one of the preset version-number set;
If the judgment result is yes, determine that the browser that the terminal uses supports the WebAssembly language, and trigger the browser used by the terminal accessing the target webpage to load and execute the first JS mining script;
If the judgment result is no, then, when a start instruction for the link of the second JS mining script is detected, determine that the browser that the terminal uses does not support the WebAssembly language, and trigger the browser used by the terminal accessing the target webpage to load and execute the second JS mining script.
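One possible reading of this dispatch is sketched below in Python. The link names, the returned script identifiers, and the function `script_to_load` are hypothetical; the sketch also assumes that an activation that matches neither branch loads nothing, which the embodiment does not state explicitly.

```python
from typing import FrozenSet, Optional

FIRST_LINK = "first_link"    # link of the first (WebAssembly) script, name assumed
SECOND_LINK = "second_link"  # link of the second (plain JS) script, name assumed


def script_to_load(activated_link: str, browser_version: str,
                   preset: FrozenSet[str]) -> Optional[str]:
    """Decide which script, if any, the browser should load for an activated link."""
    supports_wasm = browser_version in preset
    if activated_link == FIRST_LINK and supports_wasm:
        return "first_js_script"
    if activated_link == SECOND_LINK and not supports_wasm:
        return "second_js_script"
    # Assumption: any other combination triggers no load.
    return None
```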
In this embodiment of the present invention, the terminal generates a link to the at least one JS mining script, embeds the link in the target webpage, and, when a start instruction for the link of the at least one JS mining script is detected, triggers the terminal accessing the target webpage to load and execute the at least one JS mining script. This consumes a large amount of the crawler terminal's CPU resources so that the crawler terminal cannot operate normally, which achieves the anti-crawling purpose. It also prevents the crawler terminal from bypassing JS rendering to crawl data, which improves the effectiveness of anti-crawling.
It should be understood that, in the embodiments of the present invention, the processor 501 may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 502 may include a touchpad, a fingerprint sensor (for collecting the user's fingerprint information and fingerprint direction information), a microphone, and the like. The output device 503 may include a display (such as an LCD), a loudspeaker, and the like.
The memory 504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 501. A part of the memory 504 may also include a non-volatile random access memory. For example, the memory 504 may also store information about the device type.
In a specific implementation, the processor 501, the input device 502, and the output device 503 described in this embodiment of the present invention may execute the implementations described in the method embodiments of Fig. 1, Fig. 2 or Fig. 3 of the anti-crawler method provided by the embodiments of the present invention, and may also execute the implementation of the terminal described in Fig. 4 of the embodiments of the present invention. Details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium that stores a computer program. When executed by a processor, the computer program implements the anti-crawler method described in the embodiments corresponding to Fig. 1, Fig. 2 or Fig. 3, and may also implement the terminal of the embodiments corresponding to Fig. 4 or Fig. 5 of the present invention. Details are not repeated here.
The computer-readable storage medium may be an internal storage unit of the terminal described in any of the foregoing embodiments, such as a hard disk or memory of the terminal. The computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card with which the terminal is equipped. Further, the computer-readable storage medium may include both the internal storage unit of the terminal and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the terminal. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such an implementation should not be considered beyond the scope of the present invention.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only some embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall be covered by the protection scope of the present invention.
Claims (10)
1. An anti-crawler method, characterized by comprising:
Obtaining a JS mining script library, the JS mining script library comprising at least one JS mining script;
Generating a link to the at least one JS mining script, and embedding the link of the at least one JS mining script in a target webpage;
When a start instruction for the link of the at least one JS mining script is detected, triggering a terminal accessing the target webpage to load and execute the at least one JS mining script.
2. The method according to claim 1, characterized in that obtaining the JS mining script library comprises:
Establishing the JS mining script library, the JS mining script library comprising a first JS mining script and a second JS mining script;
Wherein the first JS mining script is a script in the WebAssembly language, and the second JS mining script is a script in the JS language.
3. The method according to claim 1, characterized in that embedding the link of the at least one JS mining script in the target webpage comprises:
Determining a first position region according to the historically recorded probability that the data information of each position region in the target webpage is crawled, the first position region being the position region in the target webpage whose data information has the highest probability of being crawled;
Embedding the link of the at least one JS mining script in the first position region of the target webpage.
4. The method according to claim 2, characterized in that, when the start instruction for the link of the at least one JS mining script is detected, triggering the terminal accessing the target webpage to load and execute the at least one JS mining script comprises:
When the start instruction for the link of the at least one JS mining script is detected, triggering the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses;
Calling the browser that the terminal uses to load and execute the first JS mining script or the second JS mining script chosen by the terminal.
5. The method according to claim 4, characterized in that triggering the terminal accessing the target webpage to choose the first JS mining script or the second JS mining script according to the language supported by the browser that the terminal uses comprises:
Triggering the terminal accessing the target webpage to judge whether the version number of the browser that the terminal uses is any one of a preset version-number set;
If the judgment result is yes, determining that the browser that the terminal uses supports the WebAssembly language, and choosing the first JS mining script from the JS mining script library;
If the judgment result is no, determining that the browser that the terminal uses does not support the WebAssembly language, and choosing the second JS mining script from the JS mining script library.
6. The method according to claim 2, characterized in that embedding the link of the at least one JS mining script in the target webpage comprises:
Generating a link to the first JS mining script and a link to the second JS mining script;
Determining a second position region according to the historically recorded probability that the data information of each position region in the target webpage is crawled by browsers that support the WebAssembly language, the second position region being the position region in the target webpage whose data information has the highest probability of being crawled by browsers that support the WebAssembly language;
Determining a third position region according to the historically recorded probability that the data information of each position region in the target webpage is crawled by browsers that do not support the WebAssembly language, the third position region being the position region in the target webpage whose data information has the highest probability of being crawled by browsers that do not support the WebAssembly language;
Embedding the link of the first JS mining script in the second position region of the target webpage, and embedding the link of the second JS mining script in the third position region of the target webpage.
7. The method according to claim 6, characterized in that, when the start instruction for the link of the at least one JS mining script is detected, triggering the terminal accessing the target webpage to load and execute the at least one JS mining script comprises:
When a start instruction for the link of the first JS mining script is detected, triggering the terminal accessing the target webpage to judge whether the version number of the browser that the terminal uses is any one of a preset version-number set;
If the judgment result is yes, determining that the browser that the terminal uses supports the WebAssembly language, and triggering the browser used by the terminal accessing the target webpage to load and execute the first JS mining script;
If the judgment result is no, then, when a start instruction for the link of the second JS mining script is detected, triggering the browser used by the terminal accessing the target webpage to load and execute the second JS mining script.
8. A terminal, characterized by comprising units for executing the method according to any one of claims 1-7.
9. A terminal, characterized by comprising a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being connected to one another, wherein the memory is used to store a computer program, the computer program comprises program instructions, and the processor is configured to call the program instructions to execute the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method according to any one of claims 1-7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810685659.3A CN108898009A (en) | 2018-06-27 | 2018-06-27 | A kind of anti-crawler method, terminal and computer-readable medium |
PCT/CN2018/108672 WO2020000747A1 (en) | 2018-06-27 | 2018-09-29 | Anti-crawler method and terminal and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810685659.3A CN108898009A (en) | 2018-06-27 | 2018-06-27 | A kind of anti-crawler method, terminal and computer-readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108898009A true CN108898009A (en) | 2018-11-27 |
Family
ID=64346831
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810685659.3A Pending CN108898009A (en) | 2018-06-27 | 2018-06-27 | A kind of anti-crawler method, terminal and computer-readable medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108898009A (en) |
WO (1) | WO2020000747A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111585961A (en) * | 2020-04-03 | 2020-08-25 | 北京大学 | Webpage mining attack detection and protection method and device |
CN111832024A (en) * | 2020-07-27 | 2020-10-27 | 广州智云尚大数据科技有限公司 | Big data security protection method and system |
CN112463526A (en) * | 2020-11-13 | 2021-03-09 | 苏州浪潮智能科技有限公司 | Method and related device for acquiring server state |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111488546B (en) * | 2020-04-13 | 2023-09-26 | 北京小米移动软件有限公司 | Page generation method and device and storage medium |
CN112804269B (en) * | 2021-04-14 | 2021-07-06 | 中建电子商务有限责任公司 | Method for realizing website interface anti-crawler |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102833212B (en) * | 2011-06-14 | 2016-01-06 | 阿里巴巴集团控股有限公司 | Webpage visitor identity identification method and system |
US9866545B2 (en) * | 2015-06-02 | 2018-01-09 | ALTR Solutions, Inc. | Credential-free user login to remotely executed applications |
CN105743901B (en) * | 2016-03-07 | 2019-04-09 | 携程计算机技术(上海)有限公司 | Server, anti-crawler system and anti-crawler verification method |
CN107908959B (en) * | 2017-11-10 | 2020-02-14 | 北京知道创宇信息技术股份有限公司 | Website information detection method and device, electronic equipment and storage medium |
2018
- 2018-06-27 CN CN201810685659.3A patent/CN108898009A/en active Pending
- 2018-09-29 WO PCT/CN2018/108672 patent/WO2020000747A1/en active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111585961A (en) * | 2020-04-03 | 2020-08-25 | 北京大学 | Webpage mining attack detection and protection method and device |
CN111585961B (en) * | 2020-04-03 | 2021-08-20 | 北京大学 | Webpage mining attack detection and protection method and device |
CN111832024A (en) * | 2020-07-27 | 2020-10-27 | 广州智云尚大数据科技有限公司 | Big data security protection method and system |
CN112463526A (en) * | 2020-11-13 | 2021-03-09 | 苏州浪潮智能科技有限公司 | Method and related device for acquiring server state |
Also Published As
Publication number | Publication date |
---|---|
WO2020000747A1 (en) | 2020-01-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108898009A (en) | | A kind of anti-crawler method, terminal and computer-readable medium |
US10728274B2 (en) | | Method and system for injecting javascript into a web page |
Heiderich et al. | | Scriptless attacks: stealing the pie without touching the sill |
CN104091125B (en) | | Handle the method and suspended window processing unit of suspended window |
US8843820B1 (en) | | Content script blacklisting for use with browser extensions |
CN105631359B (en) | | A kind of control method and device of web page operation |
US20160065613A1 (en) | | System and method for detecting malicious code based on web |
EP3349137A1 (en) | | Client-side attack detection in web applications |
US7735094B2 (en) | | Ascertaining domain contexts |
CN101356535A (en) | | A method and apparatus for detecting and preventing unsafe behavior of javascript programs |
JP2014510353A (en) | | Risk detection processing method and apparatus for website address |
CN111177727B (en) | | Vulnerability detection method and device |
CN103605924A (en) | | Method and device for preventing malicious program from attacking online payment page |
CN108769070A (en) | | One kind is gone beyond one's commission leak detection method and device |
CN112637185B (en) | | Webpage protection method and device and browser |
CN106487793A (en) | | application installation method and device |
CN110750750A (en) | | Webpage generation method and device, computer equipment and storage medium |
EP3518135A1 (en) | | Protection against third party javascript vulnerabilities |
CN103336693B (en) | | The creation method of refer chain, device and security detection equipment |
CN108509228B (en) | | Page loading method, terminal equipment and computer readable storage medium |
CN116450533B (en) | | Security detection method and device for application program, electronic equipment and medium |
CN108427884A (en) | | Webpage digs the alarming method for power and device of mine script |
CN103581321B (en) | | A kind of creation method of refer chains, device and safety detection method and client |
JP2019194832A (en) | | System and method for detecting changes in web resources |
CN108416214A (en) | | Webpage digs mine means of defence and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181127 |