CN108062459B - Method and device for preventing page information from being captured - Google Patents

Publication number
CN108062459B
Authority
CN
China
Prior art keywords
page
client
threshold
time period
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610984642.9A
Other languages
Chinese (zh)
Other versions
CN108062459A (en)
Inventor
董鹏
赵亚楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201610984642.9A priority Critical patent/CN108062459B/en
Publication of CN108062459A publication Critical patent/CN108062459A/en
Application granted granted Critical
Publication of CN108062459B publication Critical patent/CN108062459B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/121 Restricting unauthorised execution of programs
    • G06F21/128 Restricting unauthorised execution of programs involving web programs, i.e. using technology especially used in internet, generally interacting with a web browser, e.g. hypertext markup language [HTML], applets, java
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiment of the application discloses a method and a device for preventing page information from being captured. The method comprises the following steps: receiving a page access request of a client, and judging whether the page access request is an abnormal page access request or not according to a first access parameter of the client; if the page access request is an abnormal page access request, determining whether a second access parameter of the client exceeds a preset access parameter threshold value; and if the second access parameter of the client does not exceed the preset access parameter threshold, modifying the content of page elements in the page corresponding to the page access request, and returning the modified page to the client.

Description

Method and device for preventing page information from being captured
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for preventing capturing of page information.
Background
In the current network information era, some automatic grabbing tools can grab the content of a webpage according to its web address. These crawling tools may be programmed to obtain data from other websites according to predetermined rules.
In order to prevent the content of the website from being crawled, a condition for determining a crawling action may be generally set, and then it is determined whether the access request is a crawling action by determining whether the access request matches the condition for determining the crawling action.
An access request judged to be grabbing behavior can be blocked by adding the client to a blacklist. For example, the number of access requests allowed in a unit time may be set, and if the access amount of a certain client in the set unit time exceeds that number, the access behavior of the client is determined to be grabbing behavior.
Disclosure of Invention
Some embodiments of the present application provide a method and an apparatus for preventing capturing of page information, so as to effectively prevent the page information from being captured, and punish a capturing behavior.
The method for preventing the page information from being captured comprises the following steps:
receiving a page access request of a client, and judging whether the page access request is an abnormal page access request or not according to a first access parameter of the client;
if the page access request is an abnormal page access request, determining whether a second access parameter of the client exceeds a preset access parameter threshold value;
and if the second access parameter of the client does not exceed the preset access parameter threshold, modifying the content of page elements in the page corresponding to the page access request, and returning the modified page to the client.
The device for preventing capturing page information provided by the embodiment of the application comprises:
the receiving module is used for receiving a page access request of a client;
the first determining module is used for judging whether the page access request is an abnormal page access request according to the first access parameter of the client;
a second determining module, configured to determine whether a second access parameter of the client exceeds a predetermined access parameter threshold when the first determining module determines that the page access request is an abnormal page access request;
a modification module, configured to modify content of a page element in a page corresponding to the page access request when the second determination module determines that a second access parameter of the client does not exceed the predetermined access parameter threshold;
and the sending module is used for returning the modified page to the client.
In the technical scheme provided by the embodiment of the application, whether the page access request of the client is an abnormal page access request is judged through the first access parameter of the client. And then, judging according to a second access parameter of the client under the condition that the abnormal page access request is judged, if the second access parameter of the client does not exceed the preset access parameter threshold, modifying the content of page elements in the page corresponding to the page access request, and returning the modified page to the client. Thus, if a normal page access request of a client is misjudged as an abnormal access request, the client is not directly shielded. On the other hand, by modifying the content of the page element, even if the crawling tool is able to crawl the content of the page, the crawled content is erroneous. Therefore, according to the technical scheme of the embodiment of the application, on one hand, the grabbing behaviors are effectively identified, on the other hand, punishment is performed on the grabbing tool, and the safety of data in the server is guaranteed.
Drawings
FIG. 1 is a schematic illustration of an operating environment in some embodiments of the present application.
Fig. 2 is a flowchart of a method for preventing crawling of page information according to some embodiments of the present disclosure.
Fig. 3 is another flowchart of a method for preventing crawling of page information according to some embodiments of the present disclosure.
Fig. 4(a) and 4(b) are schematic diagrams of normal user access and abnormal user access in some embodiments of the present application.
FIG. 5 is a flowchart of a method for determining whether a page access request is an abnormal page access request in some embodiments of the present application.
FIG. 6 is a block diagram of an apparatus for preventing crawling of page information according to some embodiments of the present disclosure.
FIG. 7 is a schematic diagram of another structure of an apparatus for preventing crawling of page information according to some embodiments of the present application.
Detailed Description
In order to make the technical solutions and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings and examples.
In the existing method for identifying the crawling behavior, the server usually only considers the access amount of the client. And if the access amount of the client in a period of time exceeds a preset threshold value, the server identifies the access behavior of the client as a grabbing behavior, adds the IP address corresponding to the client into a blacklist, and rejects the access of the client. Although the shielding mode is simple, the identification efficiency is low, normal user access behaviors are easily identified as grabbing behaviors, and effective punishment measures are also lacked for the grabbing behaviors of the automatic grabbing tool.
Therefore, the embodiment of the application provides a method for preventing page information from being captured. In the method provided by the embodiment of the application, a server receives a page access request of a client, and judges whether the page access request is an abnormal page access request or not according to a first access parameter of the client; if the page access request is an abnormal page access request, determining whether a second access parameter of the client exceeds a preset access parameter threshold value; and if the second access parameter of the client does not exceed the preset access parameter threshold, modifying the content of page elements in the page corresponding to the page access request, and returning the modified page to the client.
In the embodiment of the present application, the normal page access request refers to a page access request sent to a server in a normal internet access process of a general user. The main features of normal user behavior are: fixed IP addresses and user agents, random access paths, random dwell times, random page access intervals, full page script execution, reaction to directions, and so forth.
In contrast, an abnormal page access request refers to a page access request issued to a server by an abnormal user, such as a machine-simulated user. The main features of abnormal user behavior are: batches of IP addresses, fixed access paths, fixed dwell times, short page access intervals, little execution of page scripts, no response to direction, and so on.
In the technical scheme provided by the embodiment of the application, whether the page access request of the client is an abnormal page access request is judged through the first access parameter of the client. And then, judging according to a second access parameter of the client under the condition that the abnormal page access request is judged, if the second access parameter of the client does not exceed the preset access parameter threshold, modifying the content of page elements in the page corresponding to the page access request, and returning the modified page to the client. Thus, by modifying the content of the page element, even if the crawling tool is able to crawl the content of the page, the crawled content is erroneous. On the other hand, if the normal page access request of the client is judged as the abnormal page access request by mistake, the client cannot be directly shielded. Therefore, according to the technical scheme of the embodiment of the application, on one hand, the grabbing behaviors are effectively identified, on the other hand, punishment is performed on the grabbing tool, and the safety of data in the server is guaranteed.
FIG. 1 is a schematic illustration of an operating environment 100 in some embodiments of the present application. As shown in FIG. 1, a plurality of users' respective terminal devices (e.g., user devices 104-a through 104-c) are each connected to a server 112 via a network 106.
In some embodiments, each user connects to the server 112 through an application 108-a to 108-c executing on the user device 104.
Server 112 maintains a database 114 having website data stored in database 114 for providing website services to user device 104.
In some embodiments, the server 112 may receive a page access request of the user device 104, and determine whether the page access request is an abnormal page access request according to the first access parameter of the user device 104; if the page access request is an abnormal page access request, determining whether a second access parameter of the user equipment 104 exceeds a predetermined access parameter threshold; and if the second access parameter of the user equipment 104 does not exceed the predetermined access parameter threshold, modifying the content of the page element in the page corresponding to the page access request, and returning the modified page to the user equipment 104.
In some embodiments, the server 112 may define different recognition modes corresponding to crawl-recognition conditions of differing strictness, adopt a recognition mode according to actual needs, and switch between recognition modes as circumstances require. The specific setting method is described in detail in the examples below.
In some embodiments, the server 112 inserts a script for restoring the content of the plurality of page elements in the code of the page when returning the modified page to the user device 104, such that the user device 104 displays the content of the plurality of page elements in the page before being modified.
In some embodiments, after the user device 104 acquires the page returned by the server 112, the code of the page is parsed to display the modified page, the script inserted in the page code is executed, and the content of the page element is rewritten, so as to display the content of the page element before being modified.
In the embodiment of the application, by modifying the content of the page element, even if the content of the page can be captured by the capture tool, the captured content is wrong. In addition, by executing the script at the client, the modified page can be restored to the state before modification when the page is displayed, so that the content seen by the user is correct. Therefore, according to the technical scheme of the embodiment of the application, on one hand, the grabbing behaviors are effectively identified, on the other hand, punishment is performed on the grabbing tool, and the safety of data in the server is guaranteed.
Examples of user device 104 include, but are not limited to, a palmtop computer, a wearable computing device, a Personal Digital Assistant (PDA), a tablet computer, a laptop computer, a desktop computer, a smartphone, or a combination of any two or more of these or other data processing devices.
In some embodiments, the network 106 may include a Local Area Network (LAN) and a Wide Area Network (WAN) such as the Internet. Network 106 may be implemented using any well-known network protocol, including various wired or wireless protocols.
In some embodiments, the server 112 may be implemented on one or more stand-alone data processing devices or a distributed computer network.
FIG. 2 is a flow chart of a method for preventing page information from being crawled in some embodiments of the present application. As shown in fig. 2, the method comprises the steps of:
Step 201, receiving a page access request of a client, and determining whether the page access request is an abnormal page access request according to the first access parameter of the client.
Step 202, if the page access request is an abnormal page access request, determining whether the second access parameter of the client exceeds a predetermined access parameter threshold.
Step 203, if the second access parameter of the client does not exceed the predetermined access parameter threshold, modifying the content of the page element in the page corresponding to the page access request, and returning the modified page to the client.
In some embodiments, the modifying the content of the page element in the page corresponding to the page access request includes:
randomly selecting a plurality of page elements in the page elements of the page, and randomly exchanging the contents of the selected plurality of page elements.
In some embodiments, the above method further comprises: when the page after modification is returned to the client, a script for restoring the content of the plurality of page elements is inserted into the code of the page, wherein the script is executed by the client so that the client displays the content before the plurality of page elements in the page are modified.
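The random exchange and its reversal can be illustrated with a minimal sketch. It assumes a page is modeled as a dictionary from element identifiers to text content; the function names, the dictionary model, and the list of swap pairs standing in for the inserted script are all hypothetical and are not taken from the patent.

```python
import random

def scramble_elements(elements, k, rng=None):
    """Randomly pick k page elements and exchange their contents.

    elements: dict mapping a (hypothetical) element id to its text.
    Returns (modified, swaps). Each (target, source) pair in swaps records
    that element `target` now holds the original content of `source`.
    """
    rng = rng or random.Random()
    k = min(k, len(elements))
    chosen = rng.sample(sorted(elements), k)  # randomly selected element ids
    shuffled = chosen[:]
    rng.shuffle(shuffled)                     # random exchange among them
    modified = dict(elements)
    for target, source in zip(chosen, shuffled):
        modified[target] = elements[source]
    return modified, list(zip(chosen, shuffled))

def restore_elements(modified, swaps):
    """Stand-in for the inserted script: move each content back to its source."""
    restored = dict(modified)
    for target, source in swaps:
        restored[source] = modified[target]
    return restored

page = {"price": "3,200,000", "area": "89", "floor": "12", "title": "Sunny flat"}
modified, swaps = scramble_elements(page, 3, random.Random(7))
print(restore_elements(modified, swaps) == page)
```

A shuffle can occasionally leave an element in place, so a real deployment might re-draw until at least one element actually changes; that refinement is left out here.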
In some embodiments, the first access parameter comprises any one or more of:
the number of pages requested to be accessed by the client within a predetermined time period;
the ratio of information list pages to information detail pages requested by the client within the predetermined time period;
the variance of the time intervals between the client's accesses to different pages within the predetermined time period; and
the ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of push-information exposures of the client within the predetermined time period.
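As a rough illustration, the four parameters could be derived from one client's log records for a single time window as sketched below. The record layout (timestamp plus a "list" or "detail" label), the dictionary keys, and the handling of zero denominators are assumptions made for this example. The page-type ratio is computed as detail pages per list page, the direction in which the description of FIG. 4 says abnormal users score high; population variance is likewise an assumption.

```python
from statistics import pvariance

def first_access_parameters(requests, exposures):
    """Compute the four first access parameters for one client and window.

    requests:  list of (timestamp_seconds, page_type) tuples, where
               page_type is "list" or "detail" (hypothetical labels).
    exposures: number of push-information (e.g. advertisement) exposures.
    """
    times = sorted(t for t, _ in requests)
    list_pages = sum(1 for _, kind in requests if kind == "list")
    detail_pages = sum(1 for _, kind in requests if kind == "detail")
    intervals = [b - a for a, b in zip(times, times[1:])]
    return {
        "page_count": len(requests),
        # Assumed direction: detail pages per list page.
        "page_type_ratio": (detail_pages / list_pages if list_pages
                            else (float("inf") if detail_pages else 0.0)),
        # Too few requests give no meaningful variance.
        "interval_variance": pvariance(intervals) if len(intervals) >= 2 else None,
        "pages_per_exposure": (len(requests) / exposures if exposures
                               else float("inf")),
    }

window = [(0, "list"), (10, "detail"), (20, "detail"), (30, "detail")]
print(first_access_parameters(window, exposures=2))
```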
In some embodiments, the determining, according to the first access parameter, whether the page access request is an abnormal page access request includes:
determining one or more of a first threshold, a second threshold, a third threshold and a fourth threshold according to an access log saved by a server;
wherein the first threshold is determined according to the number of pages requested to be accessed by the plurality of clients within the preset time period, which are recorded in an access log saved by the server;
the second threshold value is determined according to the ratio of the request access information list page and the request access information detail page of the plurality of clients recorded in the access log in the preset time period;
the third threshold is determined according to the variance of the time intervals between the plurality of clients accessing different pages within the predetermined time period, which is recorded in the access log;
the fourth threshold is determined according to the ratio of the number of pages requested to be accessed by the plurality of clients within a preset time period to the number of times of advertisement exposure of the clients within the preset time period, which is recorded in the access log;
and judging whether the page access request is an abnormal page access request or not according to the first access parameter and one or more of the first threshold, the second threshold, the third threshold and the fourth threshold.
In some embodiments, the determining, according to the first access parameter and one or more of the first threshold, the second threshold, the third threshold, and the fourth threshold, whether the page access request is an abnormal page access request includes:
if the number of pages requested to be accessed by the client in the preset time period is larger than the first threshold, the proportion of the pages of the request access information list and the pages of the request access information details in the preset time period by the client is larger than the second threshold, the variance of the time interval between the access of different pages by the client in the preset time period is smaller than the third threshold, and the proportion of the number of pages requested to be accessed by the client in the preset time period to the number of times of advertisement exposure of the client in the preset time period is larger than the fourth threshold,
and judging that the page access request is an abnormal page access request.
In some embodiments, the determining, according to the first access parameter and one or more of the first threshold, the second threshold, the third threshold, and the fourth threshold, whether the page access request is an abnormal page access request includes:
if the number of pages requested to be accessed by the client in a predetermined time period is larger than the first threshold, the proportion of the pages of the requested access information list and the pages of the requested access information details in the predetermined time period by the client is larger than the second threshold, and the variance of the time interval between the access of different pages by the client in the predetermined time period is smaller than the third threshold,
and judging that the page access request is an abnormal page access request.
In some embodiments, the determining, according to the first access parameter and one or more of the first threshold, the second threshold, the third threshold, and the fourth threshold, whether the page access request is an abnormal page access request includes:
if the number of pages requested to be accessed by the client within a predetermined time period is greater than the first threshold, or
the ratio of the page of the information list requested to be accessed and the page of the detail information requested to be accessed by the client in the predetermined time period is greater than the second threshold, or
the variance of the time interval between the client accessing different pages within the predetermined time period is less than the third threshold, or
the ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of times the client has been exposed to advertisements within the predetermined time period is greater than the fourth threshold,
and judging that the page access request is an abnormal page access request.
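The three alternative judgments above (all four conditions, the first three conditions, or any one condition) can be sketched as a single function with a selectable mode. The mode names and dictionary keys are hypothetical labels chosen for this example.

```python
def is_abnormal(params, thresholds, mode="all_four"):
    """Judge a request abnormal under one of three recognition modes.

    params / thresholds use hypothetical keys; alpha, beta, gamma and delta
    stand for the first to fourth thresholds.
    """
    checks = [
        params["page_count"] > thresholds["alpha"],
        params["page_type_ratio"] > thresholds["beta"],
        params["interval_variance"] < thresholds["gamma"],
        params["pages_per_exposure"] > thresholds["delta"],
    ]
    if mode == "all_four":      # every condition must hold
        return all(checks)
    if mode == "first_three":   # exposure ratio is ignored
        return all(checks[:3])
    if mode == "any_one":       # a single condition is enough
        return any(checks)
    raise ValueError(f"unknown mode: {mode}")

params = {"page_count": 120, "page_type_ratio": 30.0,
          "interval_variance": 0.2, "pages_per_exposure": 1.0}
thresholds = {"alpha": 100, "beta": 20.0, "gamma": 1.0, "delta": 5.0}
for mode in ("all_four", "first_three", "any_one"):
    print(mode, is_abnormal(params, thresholds, mode))
```

Requiring all four conditions flags the fewest clients, while the any-one mode flags the most, so the choice of mode trades missed crawlers against misjudged normal users.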
In some embodiments, the second access parameter of the client comprises: the number of pages requested to be accessed by the client within the preset time period; the predetermined access parameter threshold is greater than the first threshold.
In some embodiments, the above method further comprises: and if the page access request is an abnormal page access request and the second access parameter of the client exceeds the preset access parameter threshold, rejecting the page access request of the client.
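Taken together with Steps 201 to 203, this gives a three-way outcome for each request, sketched below. The enum and function names are hypothetical, and serving the unmodified page to a request that is not judged abnormal is an assumption about the normal path.

```python
from enum import Enum

class Action(Enum):
    SERVE_ORIGINAL = "serve_original"   # assumed handling of normal requests
    SERVE_MODIFIED = "serve_modified"   # abnormal, but below the hard limit
    REJECT = "reject"                   # abnormal and above the hard limit

def decide(abnormal, second_parameter, parameter_threshold):
    """Map the two judgments onto one of three responses."""
    if not abnormal:
        return Action.SERVE_ORIGINAL
    if second_parameter > parameter_threshold:
        return Action.REJECT
    return Action.SERVE_MODIFIED

print(decide(False, 40, 500))
print(decide(True, 150, 500))
print(decide(True, 800, 500))
```

Because a client judged abnormal is first served a modified page rather than blocked outright, a normal user who is misjudged still receives a usable page as long as the inserted script runs.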
In some embodiments, the above method further comprises: determining the number of times the client is denied access after denying the page access request of the client; determining access prohibition time corresponding to the client according to the times of access refused of the client; and rejecting all page access requests of the client before the access prohibition time is over.
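The text does not specify how the access prohibition time depends on the number of refusals. One possible mapping, shown purely as an assumption, doubles a base duration with each refusal up to a cap; the base of 10 minutes and the cap of one day are illustrative values only.

```python
def ban_seconds(denied_count, base=600, cap=86400):
    """Hypothetical escalation: the ban doubles with each refusal, up to a cap."""
    if denied_count < 1:
        return 0
    return min(base * 2 ** (denied_count - 1), cap)

def is_banned(now, last_denied_at, denied_count):
    """True while the prohibition time that started at last_denied_at is running."""
    return now < last_denied_at + ban_seconds(denied_count)

print([ban_seconds(n) for n in range(0, 5)])
```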
In existing methods for identifying crawling behavior, generally only the access amount of the client over a period of time is considered. Such identification is inefficient and poorly targeted. When browsing the internet, a user often visits real-estate, e-commerce and similar websites. On these types of websites, a user usually finds some house resources or goods through a specific search condition, and then clicks through to the detail page of a particular house resource or item, for example to view details of the house resource (such as residential community name, total price, average price, house type, floor, etc.) or details of the goods (such as price, model, color, brand, etc.).
Therefore, in the embodiment of the present application, in addition to considering the access amount of the client in a period of time, further introducing other parameters, such as a ratio of the requested access information list page to the requested access information detail page of the client in a predetermined period of time; the client accesses the variance of the time interval between different pages within the predetermined time period; and the ratio of the number of pages requested to be accessed by the client in a preset time period to the number of times of information pushing exposure of the client in the preset time period.
For the parameters, the parameters can be flexibly selected according to actual needs, so that the grabbing behaviors can be effectively identified.
The technical solution provided by the embodiment of the present application is described below by taking an example of a user accessing a property web page. Fig. 3 is a flowchart of a method for preventing capturing page information according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes the following steps.
Step 301, receiving a page access request of a client, and determining whether the page access request is an abnormal page access request according to the first access parameter of the client. If the page access request is not an abnormal page access request, executing step 302; otherwise step 303 is performed.
According to the behavior habits of users accessing a real estate website, normal users and abnormal users differ as follows (see Fig. 4(a) and 4(b)):
(1) the normal user has less access within a certain time (e.g. 10 minutes); the access amount of the abnormal users in the same time is large;
(2) the ratio of house source detail pages (see 412-1 to 412-3 in Fig. 4(a)) to house source search pages (see 411 in Fig. 4(a)) visited by a normal user is not high; the ratio of house source detail pages (see 422-1 to 422-n in Fig. 4(b)) to house source search pages (see 421 in Fig. 4(b)) visited by an abnormal user is high;
(3) the time interval between the normal user accessing each page is random; the time interval between the abnormal users accessing the various pages is relatively fixed;
(4) the advertisement exposure times are in a fixed proportion when a normal user accesses a page; while an abnormal user rarely loads advertisements when accessing the page.
With the above differences in mind, in some embodiments, the first access parameter may include one or more of the following parameters: the number of pages requested to be accessed by the client within a predetermined time period; the ratio of house source search pages to house source detail pages requested by the client within the predetermined time period; the variance of the time intervals between the client's accesses to different pages within the predetermined time period; and the ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period.
In some embodiments, the length of the predetermined time period may be determined according to actual needs, for example, the predetermined time period may be 10 minutes. The server may sample its saved access log every 10 minutes to obtain the first access parameter described above.
A process of determining whether a page access request is an abnormal access request in the embodiment of the present application is described below with reference to fig. 5.
Fig. 5 is a flowchart for determining whether a page access request of a client is an abnormal page access request according to an embodiment of the present application. In the embodiment shown in fig. 5, it is assumed that the server makes a determination based on the following parameters: the number of pages requested to be accessed by the client within a predetermined time period; the ratio of house source search pages to house source detail pages requested by the client within the predetermined time period; the variance of the time intervals between the client's accesses to different pages within the predetermined time period; and the ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period.
It should be noted that in practical applications, those skilled in the art may also use other parameters, or some of the above parameters to perform the determination.
As shown in fig. 5, the method comprises the following steps:
Step 501, determining a first threshold α according to the number of pages requested to be accessed by a plurality of clients within a predetermined time period, as recorded in an access log saved by the server.
In some embodiments, the server may count the number of pages requested to be accessed by each client in a predetermined time period, and then rank the number of pages requested to be accessed from high to low, and take the minimum value of the access numbers of the top N clients as the first threshold α, where the value of N may be adjusted according to actual needs.
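The ranking rule for α can be sketched as follows, assuming the per-client page counts for the time period have already been extracted from the access log into a dictionary. The function name and the sample data are hypothetical.

```python
def first_threshold(page_counts, n):
    """Return the smallest page count among the top-n clients.

    page_counts: dict mapping a client id to the number of pages it
    requested in the predetermined time period.
    """
    if not page_counts or n < 1:
        raise ValueError("need at least one client and n >= 1")
    ranked = sorted(page_counts.values(), reverse=True)  # high to low
    return ranked[:n][-1]                                # minimum of the top n

counts = {"a": 5, "b": 300, "c": 12, "d": 150, "e": 40}
print(first_threshold(counts, 2))
```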
Step 502, determining a second threshold β according to the ratio of the requested access information list page to the requested access information detail page of the plurality of clients in the predetermined time period recorded in the access log.
In some embodiments, the ratio of the access information list page to the request access information detail page of each client within a predetermined time period may be counted, and the minimum value of the ratios of the top M clients is taken as the second threshold β.
For example, M may be equal to 1% of the total number of clients having access requests within the predetermined time period; that is, the minimum ratio among the 1% of clients with the highest ratios is taken as the second threshold β.
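A small sketch of the 1% rule follows. How M is rounded when 1% of the client count is not a whole number is not stated in the text, so rounding up with a minimum of one client is an assumption here, as are the function and parameter names.

```python
def second_threshold(ratios, percent=1):
    """Return the smallest ratio among the top `percent` % of clients.

    ratios: dict mapping a client id to its page-type ratio for the period.
    M is rounded up using integer arithmetic and is at least 1 (assumption).
    """
    if not ratios:
        raise ValueError("no clients in this time period")
    m = max(1, (len(ratios) * percent + 99) // 100)
    ranked = sorted(ratios.values(), reverse=True)  # high to low
    return ranked[:m][-1]

ratios = {f"client{i}": float(i) for i in range(1, 201)}  # 200 clients
print(second_threshold(ratios))
```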
Step 503, determining a third threshold γ according to the variance of the time intervals between the multiple clients accessing different pages within the predetermined time period recorded in the access log.
In some embodiments, the server may count the variances of the time intervals between the access of each client to different pages within a predetermined time period, and rank the variances from low to high, and take the maximum value of the variances of the S clients with the smallest variance as the third threshold γ.
In some embodiments, since some clients have few accesses within the predetermined time period, in order to improve the accuracy of the calculated third threshold, clients having accesses greater than a certain threshold (e.g., accesses greater than 5) within the predetermined time period may be pre-selected, and then the variance of the access time intervals of these clients may be sorted.
Similarly, the value of S can also be adjusted according to actual needs. For example, S may be equal to 1% of the total number of clients having access requests within the predetermined time period; that is, the maximum variance among the 1% of clients with the smallest variances is taken as the third threshold γ.
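The pre-selection and ranking described for the third threshold can be sketched as follows. The input format, the function names, the use of population variance, and the choice to compute S from the pre-selected clients rather than from all clients are assumptions for illustration.

```python
from statistics import pvariance

def interval_variance(timestamps):
    """Variance of the gaps between consecutive page accesses (in seconds)."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return pvariance(gaps)

def third_threshold(visits, min_accesses=5, percent=1):
    """Return the largest variance among the `percent`% most regular clients.

    visits: {client_id: [access timestamps]} (hypothetical format),
    limited to the predetermined time period.
    """
    # Pre-select clients with more than `min_accesses` accesses.
    variances = sorted(
        interval_variance(ts) for ts in visits.values() if len(ts) > min_accesses
    )
    if not variances:
        return None
    # Assumption: S is taken as percent% of the pre-selected clients, at least 1.
    s = max(1, len(variances) * percent // 100)
    # Variances are ranked from low to high; take the maximum of the first S.
    return variances[s - 1]
```

A client that requests a page exactly every 10 seconds has an interval variance of 0, which is why a small variance suggests automated access.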
Step 504, determining a fourth threshold δ according to a ratio of the number of pages requested to be accessed by the plurality of clients within a predetermined time period and the number of times of information pushing and exposure of the clients within the predetermined time period, which are recorded in the access log.
In some embodiments, the push information may be an advertisement. Typically, there are a certain number of advertisements on each page. In some embodiments, the server may determine the lowest number of advertisement slots on a page, and record this lowest value as the fourth threshold δ.
The determination method of the four thresholds is explained above through steps 501 to 504. It should be noted that, the execution sequence of steps 501 to 504 is not limited in the embodiment of the present application. In practical applications, the four thresholds may be determined simultaneously or in any order. In addition, other methods for determining the four thresholds may be used by those skilled in the art. The embodiment of the present application does not limit how to determine the values of the four thresholds.
Step 505, determining whether the page access request is an abnormal page access request according to the first access parameter, the first threshold α, the second threshold β, the third threshold γ, and the fourth threshold δ.
After the four thresholds are determined through steps 501 to 504, the first access parameter of the current client may be compared with the four thresholds, so as to determine whether the access request of the current client is an abnormal request.
In the embodiment of the present application, recognition modes of different degrees of strictness, such as a loose mode, a moderate mode, and a strict mode, may be set according to the above four thresholds. In actual use, different recognition modes can be used according to different time stages or market competition and the like.
The following describes the loose mode, the moderate mode, and the strict mode by way of example, respectively.
The loose mode:
In the loose mode, the page access request is determined to be an abnormal page access request only when the number of pages α1 requested to be accessed by the client within the predetermined time period is greater than the first threshold α, and the ratio β1 of requested access information list pages to requested access information detail pages by the client within the predetermined time period is greater than the second threshold β, and the variance γ1 of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold γ, and the ratio δ1 of the number of pages requested to be accessed by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period is greater than the fourth threshold δ.
It can be seen that, in the loose mode, only when the first access parameter of the client meets the above four conditions at the same time, the server determines the access request of the client as the crawling behavior.
Moderate mode:
In the moderate mode, the page access request is determined to be an abnormal page access request if the number of pages α1 requested to be accessed by the client within the predetermined time period is greater than the first threshold α, and the ratio β1 of requested access information list pages to requested access information detail pages by the client within the predetermined time period is greater than the second threshold β, and the variance γ1 of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold γ.
Strict mode:
In the strict mode, the server determines that the page access request is an abnormal page access request as long as the number of pages α1 requested to be accessed by the client within the predetermined time period is greater than the first threshold α, or the ratio β1 of requested access information list pages to requested access information detail pages by the client within the predetermined time period is greater than the second threshold β, or the variance γ1 of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold γ, or the ratio δ1 of the number of pages requested to be accessed by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period is greater than the fourth threshold δ.
In the strict mode, as long as one of the first access parameters of the client reaches the recognition threshold of the grabbing behavior, the server recognizes the access request of the client as the grabbing behavior.
The recognition conditions for the three recognition modes of different strictness are shown in Table 1 below, where α1, β1, γ1 and δ1 are the first access parameters of the client and α, β, γ and δ are the four thresholds.
TABLE 1
Recognition mode    Recognition condition
Loose               α1 > α and β1 > β and γ1 < γ and δ1 > δ
Moderate            α1 > α and β1 > β and γ1 < γ
Strict              α1 > α or β1 > β or γ1 < γ or δ1 > δ
The above description has been made only by way of example of recognition patterns of different degrees of stringency. In practical applications, a person skilled in the art can set different recognition modes according to actual situations.
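The three example modes can be sketched as a single comparison routine, using the conditions stated in the mode descriptions (α1 > α, β1 > β, γ1 < γ, δ1 > δ). The class names, field names, and mode strings below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class AccessStats:
    pages: float              # α1: pages requested in the period
    list_detail_ratio: float  # β1: list pages / detail pages
    interval_variance: float  # γ1: variance of access intervals
    pages_per_ad: float       # δ1: pages requested / advertisement exposures

@dataclass
class Thresholds:
    alpha: float
    beta: float
    gamma: float
    delta: float

def is_abnormal(stats, th, mode):
    """Apply the loose, moderate, or strict recognition mode."""
    checks = [
        stats.pages > th.alpha,
        stats.list_detail_ratio > th.beta,
        stats.interval_variance < th.gamma,  # low variance is suspicious
        stats.pages_per_ad > th.delta,
    ]
    if mode == "loose":      # all four conditions must hold
        return all(checks)
    if mode == "moderate":   # the first three conditions must hold
        return all(checks[:3])
    if mode == "strict":     # any single condition is enough
        return any(checks)
    raise ValueError(f"unknown mode: {mode}")
```

A client that exceeds only the page-count threshold is flagged in the strict mode but not in the moderate or loose mode.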
Step 302, the server returns the page corresponding to the page access request to the client.
In this step, if the page access request is not an abnormal page access request, a page corresponding to the page access request is directly returned to the client.
Step 303, determining whether the second access parameter of the client exceeds a predetermined access parameter threshold. If the second access parameter of the client does not exceed the predetermined access parameter threshold, executing step 304; otherwise, step 307 is executed.
In this step, one of two processing methods is selected, according to the second access parameter, for a client determined to be abnormal in step 301: obfuscated return or direct shielding.
In some embodiments, the second access parameter may be the amount of access of the client within a predetermined time period, for example, the amount of access of the user within 10 minutes.
When the second access parameter is the access amount of the client in the predetermined time period, the predetermined access parameter threshold should be greater than the first threshold α in step 301, which may be expressed as α + Σ, where Σ > 0, and may be adjusted according to actual needs.
Step 304, modifying the content of page elements in the page corresponding to the page access request.
When the second access parameter of the client does not exceed the predetermined access parameter threshold, the server may use the obfuscated return mode. In this mode, the server reorders the contents of the page elements, so that the content of the page elements returned by the server is erroneous.
In some embodiments, for the house source detail page, the server may pre-define some house source core data, and select some page elements from the page elements corresponding to the house source core data for modification. For example, in some embodiments, the house source core data may include one or more of: county, business district, unit price, floor, total price, decoration condition, area, house type, orientation, and building age.
When selecting the page elements for modification, the server can select several page elements from the page elements corresponding to the house source core data, and then randomly exchange the contents of the selected several page elements.
Table 2 shows the page elements and their contents selected by the server in some embodiments of the present application. As shown in table 2, the page elements selected by the server include: unit price, floor, total price and area. Table 3 shows these page elements and their contents after modification.
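[Tables 2 and 3 appear as an image in the original publication (Figure BDA0001148850300000151): the selected page elements and their contents before and after modification.]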
As can be seen from Tables 2 and 3, the contents of the respective page elements are erroneous after modification. Thus, even if the crawling tool captures the page data, the obtained data is wrong. Moreover, since the manner of modification is random, the crawling tool cannot recover the correct data. Therefore, the crawling behavior is penalized and, at the same time, the safety of the data in the server is guaranteed.
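The random exchange of contents can be sketched as follows. The element names and values in the example are invented for illustration and are not the contents of Tables 2 and 3; the function name `swap_contents` and the returned mapping are likewise assumptions.

```python
import random

def swap_contents(elements, k, rng=random):
    """Randomly pick k page elements and exchange their contents.

    elements: {element_name: content} (hypothetical representation of a page).
    Returns (modified, shown_from), where shown_from[t] = s means that
    element t now displays the content that belongs to element s.
    """
    if not 2 <= k <= len(elements):
        raise ValueError("k must be between 2 and the number of elements")
    chosen = rng.sample(sorted(elements), k)
    shuffled = chosen[:]
    # Reshuffle until at least one element actually moves.
    while shuffled == chosen:
        rng.shuffle(shuffled)
    modified = dict(elements)
    shown_from = {}
    for target, source in zip(chosen, shuffled):
        modified[target] = elements[source]
        shown_from[target] = source
    return modified, shown_from

# Hypothetical listing data, for illustration only.
page = {"unit_price": "30000", "floor": "12", "total_price": "270", "area": "90"}
modified, shown_from = swap_contents(page, 3, random.Random(0))
```

The `shown_from` mapping is what the server would need to keep in order to generate a restoring script later.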
Step 305, inserting a script for restoring the contents of the plurality of page elements in the code of the page.
In some embodiments, the server generates a corresponding script according to the rule of modifying the content of the page element in step 304. The client can reversely restore the content of the page element by executing the script.
In some embodiments, the script may be inserted anywhere in the code of the page as long as the client is able to parse and execute the script.
Step 306, returning the modified page carrying the script to the client.
After inserting the script for restoring the page display, the server sends the page code carrying the script to the client.
Then, the client may analyze the page code, and display the page corresponding to the page code, i.e., the modified page. Then, through analyzing and executing the script, the client rewrites the page elements of the modified content in the page, so as to correctly display the page.
In the embodiment of the application, the script is inserted into the page code, so that the client can display the correct page to the user. However, since the script adjusts the content of the page elements only when the page is displayed, the content captured by the crawling tool is incorrect. Therefore, the obfuscated return scheme avoids shielding normal clients by mistake while still penalizing the crawling tool.
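One possible way to derive such a script from the modification rule is sketched below. It assumes that each modified page element has a plain identifier usable with `document.getElementById`, and that the server kept a mapping of which element currently displays which content; the function names and the form of the generated JavaScript are hypothetical. The `restore` function is only a Python model of what the client-side script does.

```python
import json

def invert_swap(shown_from):
    """shown_from[t] = s means element t currently displays s's content.

    Returns holder_of[s] = t: the element that currently holds the
    correct content of element s.
    """
    return {source: target for target, source in shown_from.items()}

def restore(modified, holder_of):
    """Python model of the client-side restoration."""
    restored = dict(modified)
    for element_id, holder in holder_of.items():
        restored[element_id] = modified[holder]
    return restored

def build_restore_script(holder_of):
    """Build an inline script that moves each content back to its element."""
    mapping = json.dumps(holder_of)
    return (
        "<script>(function(){"
        "var m=" + mapping + ",cur={},id;"
        # First read every current value, then write, so no value is lost.
        "for(id in m){cur[id]=document.getElementById(id).textContent;}"
        "for(id in m){document.getElementById(id).textContent=cur[m[id]];}"
        "})();</script>"
    )
```

Only the exchange rule is embedded in the script, not the correct values themselves, so a tool that reads the raw page code without executing the script still obtains the exchanged contents.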
Step 307, rejecting the page access request of the client.
If the second access parameter of the client exceeds the predetermined access parameter threshold, the access of the client is determined to be a crawling behavior, and the page access request of the client is rejected.
In some embodiments, the server may maintain a blacklist. After the access of a certain client is judged to be the grabbing behavior, the IP address of the client can be added into a blacklist.
Those skilled in the art may also shield the client in other manners, and the embodiment of the present application does not limit what shielding manner is specifically adopted.
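The branching among steps 302, 304 to 306, and 307 can be summarized in a short sketch, assuming the second access parameter is the access amount within the predetermined time period and the threshold is α + Σ as described above. The function name and the returned action labels are hypothetical.

```python
def choose_action(abnormal, access_count, alpha, sigma):
    """Decide how the server responds to a page access request.

    abnormal: result of the first-stage check (step 301).
    access_count: second access parameter, the access amount in the period.
    alpha + sigma: predetermined access parameter threshold, with sigma > 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be greater than 0")
    if not abnormal:
        return "return_page"   # step 302: return the page unchanged
    if access_count <= alpha + sigma:
        return "obfuscate"     # steps 304 to 306: obfuscated return
    return "reject"            # step 307: reject the request
```

Because the second threshold is strictly higher than α, a client that was misjudged as abnormal in the first stage still receives a page that displays correctly.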
After step 307, the embodiment of the present application may further include:
Step 308, determining the number of times the client has been denied access; determining an access prohibition time corresponding to the client according to the number of times the client has been denied access; and rejecting all page access requests of the client before the access prohibition time is over.
In some embodiments, the number of times a client is denied access for a certain period of time, such as a day, may be determined periodically. Then, the time length for shielding the client is adjusted according to the times of refusing the access of the client in the period of time.
For example, the shielding duration = the judgment period × the number of times the client has been shielded.
The judgment period may be set according to actual conditions, for example, 10 minutes. In this way, the length of time for which the client is shielded is dynamically adjusted according to the number of times the client has been shielded.
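The duration rule above amounts to a single multiplication, as the following sketch shows. The function name `block_until` and the example date are hypothetical; the 10-minute default follows the example in the text.

```python
from datetime import datetime, timedelta

def block_until(now, times_blocked, period=timedelta(minutes=10)):
    """Return the time at which the access prohibition ends.

    Duration = judgment period x number of times the client has been blocked.
    """
    return now + period * times_blocked

# A client blocked for the third time is shut out for 30 minutes.
end = block_until(datetime(2016, 11, 9, 12, 0), times_blocked=3)
```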
The scheme provided by the embodiment of the present application is described above by taking a house website as an example. It should be noted that the embodiments of the present application are also applicable to other types of websites, such as e-commerce websites.
FIG. 6 is a schematic diagram of an apparatus for preventing crawling of page information in some embodiments of the present application. The device for preventing the page information from being grabbed may be the server 112 shown in fig. 1, or may be a component integrated in the server 112.
As shown in FIG. 6, the apparatus 600 for preventing crawling of page information includes one or more processors (CPUs) 602, a network interface 604, a memory 606, and a communication bus 608 interconnecting these components.
In some embodiments, the network interface 604 is used to implement a network connection between the apparatus 600 for preventing crawling page information and an external device, for example, receive a page access request of a client, return a page corresponding to the page access request to the client, and the like.
The apparatus 600 for preventing crawling of page information may further comprise one or more output devices 612 (e.g., one or more visual displays), and/or include one or more input devices 614 (e.g., a keyboard, mouse, or other input controls, etc.).
Memory 606 may be a high-speed random access memory such as DRAM, SRAM, DDR RAM, or other random access solid state memory device; or non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The memory 606 includes:
an operating system 616, including programs for handling various basic system services and for performing hardware related tasks;
the application 618 for preventing capturing page information is configured to receive a page access request of a client through the network interface 604, and determine whether the page access request is an abnormal page access request according to the first access parameter of the client;
if the page access request is an abnormal page access request, determining whether a second access parameter of the client exceeds a preset access parameter threshold value;
and if the second access parameter of the client does not exceed the predetermined access parameter threshold, modifying the content of the page element in the page corresponding to the page access request, and returning the modified page to the client through the network interface 604.
Fig. 7 is a schematic structural diagram of an apparatus for preventing crawling of page information according to some embodiments of the present application. As shown in fig. 7, the apparatus includes:
a receiving module 701, configured to receive a page access request of a client;
a first determining module 702, configured to determine whether the page access request is an abnormal page access request according to the first access parameter of the client;
a second determining module 703, configured to determine whether the client second access parameter exceeds a predetermined access parameter threshold when the first determining module 702 determines that the page access request is an abnormal page access request;
a modifying module 704, configured to modify the content of a page element in a page corresponding to the page access request when the second determining module 703 determines that the second access parameter of the client does not exceed the predetermined access parameter threshold;
a sending module 705, configured to return the modified page to the client.
In some embodiments, the modifying module 704 is further configured to randomly select a plurality of page elements among the page elements of the page, and randomly swap contents of the selected plurality of page elements.
In some embodiments, the sending module 705 is further configured to insert a script for restoring the content of the plurality of page elements in the code of the page when the page after modification is returned to the client, wherein the script is executed by the client to cause the client to display the content of the plurality of page elements in the page before modification.
In some embodiments, the first access parameter comprises one or more of:
the number of pages requested to be accessed by the client within a predetermined time period;
the ratio of information list pages requested to be accessed to information detail pages requested to be accessed by the client within the predetermined time period;
the variance of the time intervals between the client's accesses to different pages within the predetermined time period; and
the ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period.
In some embodiments, as shown in fig. 7, the first determining module 702 further includes one or more of: a first determining unit 7021, a second determining unit 7022, a third determining unit 7023, and a fourth determining unit 7024; wherein
A first determining unit 7021, configured to determine a first threshold according to the number of pages requested to be accessed by the multiple clients within the predetermined time period, which is recorded in an access log stored by the server;
a second determining unit 7022, configured to determine a second threshold according to a ratio of a page of the access information list requested by the multiple clients in the predetermined time period to a page of the access information detail requested by the multiple clients in the access log;
a third determining unit 7023, configured to determine a third threshold according to a variance of time intervals between the multiple clients accessing different pages within the predetermined time period, where the variance is recorded in the access log;
a fourth determining unit 7024, configured to determine a fourth threshold according to a ratio between the number of pages requested to be accessed by the multiple clients within a predetermined time period and the number of times of advertisement exposure of the clients within the predetermined time period, where the number is recorded in the access log;
the first determining module 702 further comprises: a determining unit 7025, configured to determine whether the page access request is an abnormal page access request according to the first access parameter and one or more of the first threshold, the second threshold, the third threshold, and the fourth threshold.
In some embodiments, the determining unit 7025 is further configured to determine that the page access request is an abnormal page access request if the first determining unit 7021 determines that the number of pages requested to be accessed by the client within the predetermined time period is greater than the first threshold, and the second determining unit 7022 determines that the ratio of information list pages requested to be accessed to information detail pages requested to be accessed by the client within the predetermined time period is greater than the second threshold, and the third determining unit 7023 determines that the variance of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold, and the fourth determining unit 7024 determines that the ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period is greater than the fourth threshold.
In some embodiments, the determining unit 7025 is further configured to determine that the page access request is an abnormal page access request if the first determining unit 7021 determines that the number of pages requested to be accessed by the client within the predetermined time period is greater than the first threshold, and the second determining unit 7022 determines that the ratio of information list pages requested to be accessed to information detail pages requested to be accessed by the client within the predetermined time period is greater than the second threshold, and the third determining unit 7023 determines that the variance of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold.
In some embodiments, the determining unit 7025 is further configured to determine that the page access request is an abnormal page access request if the first determining unit 7021 determines that the number of pages requested to be accessed by the client within the predetermined time period is greater than the first threshold, or the second determining unit 7022 determines that the ratio of information list pages requested to be accessed to information detail pages requested to be accessed by the client within the predetermined time period is greater than the second threshold, or the third determining unit 7023 determines that the variance of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold, or the fourth determining unit 7024 determines that the ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period is greater than the fourth threshold.
In some embodiments, the second access parameter of the client comprises: the number of pages requested to be accessed by the client within the preset time period; the predefined access parameter threshold is greater than the first threshold.
In some embodiments, the apparatus further comprises: a rejecting module 706, configured to reject the page access request of the client when the first determining module 702 determines that the page access request is an abnormal page access request and the second determining module 703 determines that the second access parameter of the client exceeds the predetermined access parameter threshold.
In some embodiments, the denial module 706 is further configured to, after denying the page access request of the client, determine a number of times the client is denied access; determining access prohibition time corresponding to the client according to the times of access refused of the client; and rejecting all page access requests of the client before the access prohibition time is over.
In the technical scheme provided by the embodiment of the application, whether the page access request of the client is an abnormal page access request is judged through the first access parameter of the client. And then, judging according to a second access parameter of the client under the condition that the abnormal page access request is judged, if the second access parameter of the client does not exceed the preset access parameter threshold, modifying the content of page elements in the page corresponding to the page access request, and returning the modified page to the client. Thus, by modifying the content of the page element, even if the crawling tool is able to crawl the content of the page, the crawled content is erroneous. On the other hand, if the normal page access request of the client is misjudged as the abnormal access request, the client is not directly shielded. Therefore, according to the technical scheme of the embodiment of the application, on one hand, the grabbing behaviors are effectively identified, on the other hand, punishment is performed on the grabbing tool, and the safety of data in the server is guaranteed.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A method for preventing page information from being captured is characterized by comprising the following steps:
receiving a page access request of a client, and judging whether the page access request is an abnormal page access request or not according to a first access parameter of the client; wherein the first access parameter comprises: the number of pages requested to be accessed by the client within a preset time period; the proportion of the information list page requested to be accessed and the information detail page requested to be accessed by the client in the preset time period; the variance of the time intervals between the client's accesses to different pages within the predetermined time period; and the ratio of the number of pages requested to be accessed by the client in the preset time period to the number of times of information pushing exposure of the client in the preset time period;
if the page access request is an abnormal page access request, determining whether a second access parameter of the client exceeds a preset access parameter threshold value;
if the second access parameter of the client does not exceed the preset access parameter threshold, modifying the content of a page element in a page corresponding to the page access request, returning the modified page to the client, and inserting a script for recovering the content of the page element into a code of the page; and the client analyzes the code of the page, displays the modified page, and rewrites the page elements with modified content in the page by analyzing the script, so as to correctly display the page.
2. The method of claim 1, wherein modifying the content of a page element in a page corresponding to the page access request comprises:
randomly selecting a plurality of page elements in the page elements of the page, and randomly exchanging the contents of the selected plurality of page elements.
3. The method of claim 1, wherein determining whether the page access request is an abnormal page access request according to the first access parameter comprises:
determining a first threshold, a second threshold, a third threshold and a fourth threshold according to an access log stored by a server;
wherein the first threshold is determined according to the number of pages requested to be accessed by the plurality of clients within the preset time period, which are recorded in an access log saved by the server;
the second threshold value is determined according to the ratio of the request access information list page and the request access information detail page of the plurality of clients recorded in the access log in the preset time period;
the third threshold is determined according to the variance of the time intervals between the plurality of clients accessing different pages within the predetermined time period, which is recorded in the access log;
the fourth threshold is determined according to the ratio of the number of pages requested to be accessed by the plurality of clients within a preset time period to the number of times of advertisement exposure of the clients within the preset time period, which is recorded in the access log;
and judging whether the page access request is an abnormal page access request or not according to the first access parameter, the first threshold, the second threshold, the third threshold and the fourth threshold.
4. The method of claim 3, wherein the determining whether the page access request is an abnormal page access request according to the first access parameter and the first threshold, the second threshold, the third threshold, and the fourth threshold comprises:
if the number of pages requested to be accessed by the client in the preset time period is larger than the first threshold, the proportion of the pages of the request access information list and the pages of the request access information details in the preset time period by the client is larger than the second threshold, the variance of the time interval between the access of different pages by the client in the preset time period is smaller than the third threshold, and the proportion of the number of pages requested to be accessed by the client in the preset time period to the number of times of advertisement exposure of the client in the preset time period is larger than the fourth threshold,
and judging that the page access request is an abnormal page access request.
5. The method of claim 3, wherein the determining whether the page access request is an abnormal page access request according to the first access parameter and the first threshold, the second threshold, the third threshold, and the fourth threshold comprises:
if the number of pages requested to be accessed by the client in a predetermined time period is larger than the first threshold, the proportion of the pages of the requested access information list and the pages of the requested access information details in the predetermined time period by the client is larger than the second threshold, and the variance of the time interval between the access of different pages by the client in the predetermined time period is smaller than the third threshold,
and judging that the page access request is an abnormal page access request.
6. The method of claim 3, wherein the determining whether the page access request is an abnormal page access request according to the first access parameter and the first threshold, the second threshold, the third threshold, and the fourth threshold comprises:
if the number of pages requested to be accessed by the client within a predetermined time period is greater than the first threshold value, or
The ratio of the page of the information list requested to be accessed and the page of the detail information requested to be accessed by the client in the preset time period is larger than the second threshold value, or
The variance of the time interval between the client accessing different pages within the predetermined time period is less than the third threshold, or
The ratio of the number of pages requested to be accessed by the client within the predetermined time period to the number of times the client has been exposed to advertisements within the predetermined time period is greater than the fourth threshold,
and judging that the page access request is an abnormal page access request.
7. The method of claim 1, wherein the second access parameter of the client comprises: the number of pages requested to be accessed by the client within the preset time period; the predefined access parameter threshold is greater than a first threshold; wherein the first threshold is determined according to the number of pages requested to be accessed by the plurality of clients within the predetermined time period, which are recorded in an access log saved by the server.
8. An apparatus for preventing crawling of page information, comprising:
the receiving module is used for receiving a page access request of a client;
the first determining module is used for judging whether the page access request is an abnormal page access request according to the first access parameter of the client; wherein the first access parameter comprises the number of pages requested to be accessed by the client within a predetermined time period; the proportion of the information list page requested to be accessed and the information detail page requested to be accessed by the client in the preset time period; the variance of the time intervals between the client's accesses to different pages within the predetermined time period; and the ratio of the number of pages requested to be accessed by the client within a predetermined time period to the number of times of advertising exposure of the client within the predetermined time period;
a second determining module, configured to determine whether a second access parameter of the client exceeds a predetermined access parameter threshold when the first determining module determines that the page access request is an abnormal page access request;
a modification module, configured to modify content of a page element in a page corresponding to the page access request when the second determination module determines that a second access parameter of the client does not exceed the predetermined access parameter threshold;
a sending module, configured to return the modified page to the client and to insert, into the code of the page, a script for restoring the content of the page elements; wherein the client parses the code of the page, displays the modified page, and, by parsing the script, rewrites the page elements whose content was modified, so that the page is displayed correctly.
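The sending step can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the helper name `attach_restore_script`, the convention that modified elements are addressed by `id`, and the JSON payload format are all hypothetical. The script relies only on the standard DOM calls `document.getElementById` and `textContent`.

```python
import json

def attach_restore_script(modified_html, original_contents):
    """Insert a script that rewrites modified elements back to their original text.

    `original_contents` maps an element id (hypothetical convention) to the
    text that element should finally display in the browser.
    """
    # Escape "<" so that page text can never close the <script> tag early.
    payload = json.dumps(original_contents).replace("<", "\\u003c")
    script = (
        "<script>(function(){var o=" + payload + ";"
        "for(var id in o){var e=document.getElementById(id);"
        "if(e){e.textContent=o[id];}}})();</script>"
    )
    # Place the script just before </body> so the elements exist when it runs.
    marker = "</body>"
    if marker in modified_html:
        head, tail = modified_html.rsplit(marker, 1)
        return head + script + marker + tail
    return modified_html + script

# Hypothetical page whose two price fields were exchanged by the server.
page = '<html><body><span id="p1">200</span><span id="p2">100</span></body></html>'
restored_page = attach_restore_script(page, {"p1": "100", "p2": "200"})
```

A client that parses and runs the script displays the original values, while a program that only reads the static markup sees the exchanged ones.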
9. The apparatus of claim 8, wherein the modification module is further configured to randomly select a plurality of page elements among the page elements of the page, and randomly swap contents of the selected plurality of page elements.
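The random selection and swapping described in claim 9 can be sketched in Python as below. The function name, the dictionary representation of page elements, and the returned `restore` mapping are assumptions made for illustration.

```python
import random

def swap_random_contents(contents, k, rng=None):
    """Randomly pick k elements and permute their contents among themselves.

    `contents` maps element id -> text. Returns (modified, restore), where
    `restore` holds the original text of every element that was picked.
    """
    if rng is None:
        rng = random.Random()
    picked = rng.sample(sorted(contents), min(k, len(contents)))
    shuffled = picked[:]
    rng.shuffle(shuffled)
    modified = dict(contents)
    # Each picked element receives the content of another picked element.
    for target, source in zip(picked, shuffled):
        modified[target] = contents[source]
    restore = {element_id: contents[element_id] for element_id in picked}
    return modified, restore

# Hypothetical element contents, with a seeded generator for repeatability.
contents = {"a": "1", "b": "2", "c": "3", "d": "4"}
modified, restore = swap_random_contents(contents, 3, random.Random(7))
```

A plain shuffle may leave some picked elements in place. An implementation that needs every picked element to change could reshuffle until none keeps its own content.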
10. The apparatus of claim 8, wherein the first determining module further comprises a first determining unit, a second determining unit, a third determining unit, and a fourth determining unit; wherein:
the first determining unit is configured to determine a first threshold according to the numbers of pages requested by the multiple clients within the predetermined time period, as recorded in an access log stored by the server;
the second determining unit is configured to determine a second threshold according to the ratios of information list pages to information detail pages requested by the multiple clients within the predetermined time period, as recorded in the access log;
the third determining unit is configured to determine a third threshold according to the variances of the time intervals between the multiple clients' accesses to different pages within the predetermined time period, as recorded in the access log;
the fourth determining unit is configured to determine a fourth threshold according to the ratios of the number of pages requested by the multiple clients within the predetermined time period, as recorded in the access log, to the number of advertisement exposures of the clients within the predetermined time period;
and the first determining module further comprises a judging unit, configured to determine whether the page access request is an abnormal page access request according to the first access parameter, the first threshold, the second threshold, the third threshold and the fourth threshold.
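The claim states only that each threshold is determined from the access log; it does not say how. As one possible reading, the sketch below derives the first threshold as a nearest-rank percentile of per-client page counts. The percentile rule, the function names, and the `(client_id, url)` log format are all assumptions.

```python
from collections import Counter
import math

def percentile(values, fraction):
    """Nearest-rank percentile: the smallest value with at least
    `fraction` of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]

def first_threshold(log_entries, fraction=0.95):
    """Derive the page-count threshold from the (client_id, url) entries
    logged during one predetermined time period."""
    pages_per_client = Counter(client_id for client_id, _url in log_entries)
    return percentile(list(pages_per_client.values()), fraction)

# Hypothetical log: 18 ordinary clients with 3 pages each, 2 heavy clients.
log = [(f"user{i}", f"/detail/{j}") for i in range(18) for j in range(3)]
log += [(f"bot{i}", f"/detail/{j}") for i in range(2) for j in range(500)]
```

The other three thresholds could be derived the same way from the per-client ratios and variances, taking a low percentile for the variance because that criterion tests for values below the threshold.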
11. The apparatus of claim 10, wherein the judging unit is further configured to:
if the number of pages requested by the client within the predetermined time period is greater than the first threshold determined by the first determining unit, the ratio of information list pages to information detail pages requested by the client within the predetermined time period is greater than the second threshold determined by the second determining unit, the variance of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold determined by the third determining unit, and the ratio of the number of pages requested by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period is greater than the fourth threshold determined by the fourth determining unit,
determine that the page access request is an abnormal page access request.
12. The apparatus of claim 10, wherein the judging unit is further configured to:
if the number of pages requested by the client within the predetermined time period is greater than the first threshold determined by the first determining unit, the ratio of information list pages to information detail pages requested by the client within the predetermined time period is greater than the second threshold determined by the second determining unit, and the variance of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold determined by the third determining unit,
determine that the page access request is an abnormal page access request.
13. The apparatus of claim 10, wherein the judging unit is further configured to:
if the number of pages requested by the client within the predetermined time period is greater than the first threshold determined by the first determining unit, or the ratio of information list pages to information detail pages requested by the client within the predetermined time period is greater than the second threshold determined by the second determining unit, or the variance of the time intervals between the client's accesses to different pages within the predetermined time period is less than the third threshold determined by the third determining unit, or the ratio of the number of pages requested by the client within the predetermined time period to the number of advertisement exposures of the client within the predetermined time period is greater than the fourth threshold determined by the fourth determining unit,
determine that the page access request is an abnormal page access request.
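Claims 11 to 13 differ only in how the four comparisons are combined. The Python sketch below places the three policies side by side. The class names, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AccessParams:
    page_count: int           # pages requested in the time period
    list_detail_ratio: float  # list pages / detail pages
    interval_variance: float  # variance of gaps between accesses
    pages_per_ad: float       # pages requested / advertisement exposures

@dataclass
class Thresholds:
    first: float
    second: float
    third: float
    fourth: float

def criteria(p, t):
    """Return the four comparisons in claim order."""
    return (
        p.page_count > t.first,
        p.list_detail_ratio > t.second,
        p.interval_variance < t.third,   # low variance is the suspicious case
        p.pages_per_ad > t.fourth,
    )

def abnormal_all_four(p, t):
    """Claim 11: every criterion must hold."""
    return all(criteria(p, t))

def abnormal_first_three(p, t):
    """Claim 12: the first three criteria must hold."""
    return all(criteria(p, t)[:3])

def abnormal_any(p, t):
    """Claim 13: any single criterion suffices."""
    return any(criteria(p, t))

# Hypothetical values for illustration only.
limits = Thresholds(first=100, second=5.0, third=0.5, fourth=20.0)
crawler = AccessParams(500, 9.0, 0.1, 250.0)
partial = AccessParams(500, 9.0, 0.1, 2.0)
human = AccessParams(12, 0.5, 40.0, 1.5)
```

The all-four policy is the strictest and flags the fewest clients, while the any-criterion policy is the most sensitive and is the most likely to flag ordinary users.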
14. The apparatus of claim 8, wherein the second access parameter of the client comprises the number of pages requested by the client within the predetermined time period; the predetermined access parameter threshold is greater than a first threshold; and the first threshold is determined according to the numbers of pages requested by the plurality of clients within the predetermined time period, as recorded in an access log saved by the server.
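The relationship between the two thresholds can be sketched as a two-tier decision. This Python sketch assumes the abnormality judgment has already been made. The function name and the returned labels are hypothetical, and the handling of a client that exceeds the higher threshold is not stated in this section, so it is reported only as a label.

```python
def decide(page_count, first_threshold, access_limit, is_abnormal):
    """Choose how to answer a request under the two-tier scheme.

    `access_limit` stands for the predetermined access parameter threshold,
    which claim 14 requires to be greater than the first threshold.
    """
    if access_limit <= first_threshold:
        raise ValueError("access_limit must be greater than first_threshold")
    if not is_abnormal:
        return "serve_original"
    if page_count <= access_limit:
        # Abnormal, but still under the higher limit: modify the page
        # and attach the restoring script.
        return "serve_modified_with_restore_script"
    # Abnormal and over the higher limit: handling is outside this section.
    return "limit_exceeded"
```

For example, with a first threshold of 100 and a limit of 1000, an abnormal client that requested 300 pages would receive the modified page.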
15. A server, comprising:
a processor; and
a memory coupled to the processor, the memory storing machine-readable instructions executable by the processor to perform the method of any one of claims 1 to 7.
16. A non-transitory machine-readable storage medium having stored thereon machine-readable instructions executable by a processor to perform the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610984642.9A CN108062459B (en) 2016-11-09 2016-11-09 Method and device for preventing page information from being captured

Publications (2)

Publication Number Publication Date
CN108062459A (en) 2018-05-22
CN108062459B (en) 2020-06-05

Family

ID=62137937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610984642.9A Active CN108062459B (en) 2016-11-09 2016-11-09 Method and device for preventing page information from being captured

Country Status (1)

Country Link
CN (1) CN108062459B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339548B (en) * 2018-12-18 2023-11-03 北京京东尚科信息技术有限公司 Data processing method and device for anti-crawling, computer equipment and storage medium
CN110944007B (en) * 2019-12-10 2020-11-10 北京北龙云海网络数据科技有限责任公司 Network access management method, system, device and storage medium
CN116150542B (en) * 2023-04-21 2023-07-14 河北网新数字技术股份有限公司 Dynamic page generation method and device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8285716B1 (en) * 2009-12-21 2012-10-09 Google Inc. Identifying and ranking digital resources relating to places
CN105138907A (en) * 2015-07-22 2015-12-09 国家计算机网络与信息安全管理中心 Method and system for actively detecting attacked website
CN105187396A (en) * 2015-08-11 2015-12-23 小米科技有限责任公司 Method and device for identifying web crawler

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
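Design and Implementation of a Web Page Anti-Crawling System (网页防抓取系统的设计与实现); Tang Huadong (唐华栋); China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2016-02-15; pp. 12-25 *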

Also Published As

Publication number Publication date
CN108062459A (en) 2018-05-22

Similar Documents

Publication Publication Date Title
CN104601601B (en) The detection method and device of web crawlers
US7860971B2 (en) Anti-spam tool for browser
CN112738102B (en) Asset identification method, device, equipment and storage medium
CN107547495B (en) System and method for protecting a computer from unauthorized remote management
CN111008348A (en) Anti-crawler method, terminal, server and computer readable storage medium
CN102436564A (en) Method and device for identifying falsified webpage
EP2984616A1 (en) Method and device for testing multiple versions
CN108062459B (en) Method and device for preventing page information from being captured
CN107784205B (en) User product auditing method, device, server and storage medium
CN114417197A (en) Access record processing method and device and storage medium
CN106874253A (en) Recognize the method and device of sensitive information
CN102077201A (en) System and method for dynamic and real-time categorization of webpages
CN111163072B (en) Method and device for determining characteristic value in machine learning model and electronic equipment
CN110826006A (en) Abnormal collection behavior identification method and device based on privacy data protection
CN107239701B (en) Method and device for identifying malicious website
CN109241733A (en) Crawler Activity recognition method and device based on web access log
CN106569860A (en) Application management method and terminal
CN103491101A (en) Phishing website detecting method and device and client-side
JPWO2012132296A1 (en) Information leakage prevention apparatus, method and program
CN108769157B (en) Message popup display method and device, computing equipment and computer storage medium
CN102880698B (en) A kind of crawl website defining method and device
CN116015842A (en) Network attack detection method based on user access behaviors
CN109981533B (en) DDoS attack detection method, device, electronic equipment and storage medium
TWI617939B (en) Attacking node detection apparatus, method, and computer program product thereof
CN109492149B (en) Crawler task processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant