CN113660277B - Crawler-resisting method based on multiplexing embedded point information and processing terminal - Google Patents

Info

Publication number
CN113660277B
Authority
CN
China
Prior art keywords
buried point
request
events
service
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110951654.2A
Other languages
Chinese (zh)
Other versions
CN113660277A (en)
Inventor
朱骢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Tvcbook Technology Co ltd
Original Assignee
Guangzhou Tvcbook Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Tvcbook Technology Co ltd
Priority to CN202110951654.2A
Publication of CN113660277A
Application granted
Publication of CN113660277B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/30 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L63/308 Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information retaining data, e.g. retaining successful, unsuccessful communication attempts, internet access, or e-mail, internet telephony, intercept related information or call content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Technology Law (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an anti-crawler method based on multiplexing buried point information, and a processing terminal. The method comprises the following steps: acquiring buried point data, wherein the buried point data comprises buried point events, each buried point event representing a resource type accessed by a request ip or a service end id, and obtaining from the buried point data the number of each single buried point event and the accumulated number of buried point events; taking each single buried point event whose number exceeds its corresponding preset threshold, and the accumulated buried point event when its number exceeds a first preset threshold, as non-compliant buried point events, thereby obtaining a set of non-compliant buried point events; receiving a service request comprising an access resource type; and judging whether a non-compliant buried point event exists for the target request ip or service end id: if not, processing the service request; if so, deciding whether to process the service request using one or more of the non-compliant buried point events as the anti-crawler decision basis. The invention achieves a high degree of decoupling between the service and the anti-crawler logic, offers good flexibility, and avoids repeated development and wasted resources.

Description

Crawler-resisting method based on multiplexing embedded point information and processing terminal
Technical Field
The invention relates to the technical field of anti-crawler technology, and in particular to an anti-crawler method based on multiplexing buried point information and a processing terminal.
Background
In existing systems, a service client or service server usually reports information to a buried point server, while each service end implements its own anti-crawler logic. The anti-crawler logic is therefore tightly coupled to the service, and different service ends (corresponding to different services) do not share data, so the data used for anti-crawler decisions depends on each service end's own service data, and having every service end implement anti-crawling on its own leads to repeated development and wasted resources. In addition, existing anti-crawler technology usually does not distinguish between specific user access behaviors but applies the same set of standards to all of them, which causes misjudgment and the mistaken blocking of user IPs. For example, the access behavior of a user claiming an invitation code, a voucher and the like differs from that of a user browsing web page information; if the same anti-crawler judgment logic is applied to both, it is often difficult to counter crawlers that abuse the claiming of invitation codes, vouchers and the like. A further defect of existing anti-crawler technology is that it usually blocks an entire IP segment, which easily causes misjudgment and collateral blocking. For example, if a company has multiple employees and the user IP of one employee is judged to fall within the anti-crawler range, all other users in the same IP segment are also blocked and cannot access the service, so legitimate users are blocked by mistake.
Therefore, there is a need for a system that can analyze user access behavior more accurately, extract a user's global access behavior at low cost, and make an accurate anti-crawler decision by comprehensively referring to the characteristics of that access behavior, so as to avoid blocking legitimate users while still applying precise anti-crawler measures to certain special access behaviors.
Disclosure of Invention
In view of the defects of the prior art, a first object of the invention is to provide an anti-crawler method based on multiplexing buried point information, which achieves accurate anti-crawling while avoiding the blocking of legitimate users;
a second object of the invention is to provide a processing terminal, which likewise achieves accurate anti-crawling while avoiding the blocking of legitimate users.
The technical scheme for achieving the first object of the invention is as follows: an anti-crawler method based on multiplexing buried point information, comprising the following steps:
step 1: acquiring buried point data, which comprises a request ip or a service end id, and buried point events representing the resource types accessed by the request ip or the service end id,
counting the number of accesses of each single buried point event of each request ip or service end id to obtain the number of that single buried point event, and summing the numbers of all single buried point events of the same request ip or service end id to obtain the accumulated number of buried point events of the accumulated buried point event;
step 2: comparing the number of each single buried point event with the preset threshold corresponding to that single buried point event, and comparing the accumulated number of buried point events with a first preset threshold; regarding each single buried point event that exceeds its corresponding preset threshold as a non-compliant buried point event, and regarding the accumulated buried point event as a non-compliant buried point event if the accumulated number exceeds the first preset threshold, thereby obtaining a set of non-compliant buried point events;
step 3: receiving a service request, wherein the service request comprises a target request ip or service end id and an access resource type,
traversing the set of non-compliant buried point events and judging whether a non-compliant buried point event exists for the target request ip or service end id; if not, processing the service request; if so, using one or more of the non-compliant buried point events of the target request ip or service end id as the anti-crawler decision basis to determine whether to process the service request.
Further, in step 2, after the set of non-compliant buried point events is obtained, it is written into a database to obtain a non-compliant buried point event database;
in step 3, after the service request is received, the non-compliant buried point event database is accessed first, and the set of non-compliant buried point events stored in it is traversed.
Further, the number of accesses of each single buried point event of each request ip or service end id, and the sum of the numbers of all single buried point events of the same request ip or service end id, are counted within a preset period.
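Counting within a preset period can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify a record layout, so the `timestamp` field, the `BuriedPoint` type and the `count_in_period` function are hypothetical names, and a simple fixed window is assumed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BuriedPoint:
    key: str          # request ip or service end id
    event: str        # resource type accessed, e.g. "picture"
    timestamp: float  # seconds since epoch (assumed field, not from the patent)


def count_in_period(records, period_start, period_seconds):
    """Count single and accumulated buried point events inside one fixed period."""
    period_end = period_start + period_seconds
    single = Counter()      # (key, event) -> number of that single buried point event
    cumulative = Counter()  # key -> accumulated number of buried point events
    for record in records:
        if period_start <= record.timestamp < period_end:
            single[(record.key, record.event)] += 1
            cumulative[record.key] += 1
    return single, cumulative
```

Records outside the window are simply ignored, so the same function can be re-run for each new period.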
Further, the first preset threshold is larger than the preset threshold corresponding to each single buried point event.
Further, the buried point event comprises one or more of accessing a picture, a video, a search, text, an invitation code, and a voucher on a web page.
Further, using one or more of the non-compliant buried point events of the target request ip or service end id as the anti-crawler decision basis to determine whether to process the service request specifically comprises:
refusing to process a service request that satisfies condition one, and processing a service request that does not satisfy condition one;
condition one: the access resource type in the service request corresponds to the resource type of a non-compliant buried point event.
Further, using one or more of the non-compliant buried point events of the target request ip or service end id as the anti-crawler decision basis to determine whether to process the service request specifically comprises:
if any service request satisfies condition one, refusing to process all service requests of the target request ip or service end id;
condition one: the access resource type in the service request corresponds to the resource type of a non-compliant buried point event.
The technical scheme for achieving the second object of the invention is as follows: a processing terminal, comprising:
a memory for storing program instructions;
and a processor for running the program instructions to execute the steps of the anti-crawler method based on multiplexing buried point information.
The beneficial effects of the invention are as follows. The invention does not need to add a request ip to a blacklist, nor rely on a list system, to implement anti-crawling. Instead, the service end is unbound from the non-compliant buried point events produced by statistics, which achieves a high degree of decoupling and good flexibility. This differs from the existing approach, in which each service end implements its own anti-crawler logic that is tightly coupled to its own service, and it effectively avoids the repeated development and wasted resources caused by each service implementing anti-crawling separately. One or more thresholds can be set according to each service's own requirements, so that a service end can decide for itself whether to intercept requests of different service types based on the non-compliant buried point events it obtains from the non-compliant buried point event database. For example, for services such as invitation codes and vouchers, each service end can decide for itself whether to intercept, which effectively decouples the service from the anti-crawler logic and gives very high flexibility. Taking a search service as an example, the download frequency of a user can be used as the interception basis, so that a normal user who searches frequently without finding the correct target data can continue to search normally, while a crawler that keeps changing keywords and maliciously downloads content is intercepted. This is difficult to achieve with existing anti-crawler technology because the service and the anti-crawler logic are tightly coupled.
Drawings
FIG. 1 is a schematic flow chart of a preferred embodiment;
FIG. 2 is a schematic diagram of a processing terminal.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Referring to FIG. 1, an anti-crawler method based on multiplexing buried point information includes the following steps:
Step 1: acquiring buried point data, wherein the buried point data comprises a request ip, buried point events and a service end id. The buried point events represent the resource types accessed by the request ip; each buried point event corresponds to access to one resource type, and different buried point events correspond to access to different resource types. For example, the buried point events include accessing a picture, a video, a search, text, an invitation code, a voucher and the like on a web page. A picture is one resource and text is another; search is also treated as a resource, meaning that a search operation is performed in a search box of the web page, for example entering a keyword in the search box to search for target resources such as videos, files and pictures to download. Accessing any one of these resources therefore corresponds to one buried point event, and the number of times the same resource is accessed is the number of that buried point event. The request ip is a public network ip; several service ends may sit behind one request ip, and each of them accesses resources through it. For example, a company has multiple computers, each computer acts as a service end, and all of them use the same request ip, namely the company's public network ip.
Each request ip or service end id has its own buried point data. The number of each buried point event of each request ip or service end id is counted, and the numbers of all buried point events of the same request ip or service end id are summed; this sum is the accumulated number of buried point events. Specifically, the number of each single buried point event and the accumulated number of all buried point events of a request ip are counted within a preset period. The number of a single buried point event is the number of times the request ip accesses the same resource type within the preset period, that is, the number of occurrences of the same buried point event, and the accumulated number of buried point events is the sum of the numbers of all single buried point events of the request ip within the preset period.
For example, the buried point events of a request ip within a preset period (e.g. 24 hours) include accessing pictures (buried point event 1), accessing videos (buried point event 2), accessing files (buried point event 3), and performing searches (buried point event 4), with 5, 10, 20 and 30 occurrences respectively. The numbers of the single buried point events 1 to 4 are therefore 5, 10, 20 and 30, and the accumulated number of buried point events is 5 + 10 + 20 + 30 = 65.
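The arithmetic of this example can be reproduced with a few lines of Python. The event labels and variable names are illustrative choices, not part of the patent; the counts are those given in the example.

```python
from collections import Counter

# Buried point events of one request ip within the preset period,
# using the counts from the example (5 pictures, 10 videos, 20 files, 30 searches).
events = ["picture"] * 5 + ["video"] * 10 + ["file"] * 20 + ["search"] * 30

# Number of each single buried point event.
single_counts = Counter(events)

# Accumulated number of buried point events: the sum of all single counts.
cumulative_count = sum(single_counts.values())
```

Here `single_counts` maps each resource type to its access count and `cumulative_count` is their sum.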
Step 2: comparing the number of each single buried point event with the preset threshold corresponding to that resource type, and comparing the accumulated number of buried point events with a first preset threshold. A single buried point event that exceeds its corresponding preset threshold is taken as a non-compliant buried point event and written into a database; if the accumulated number of buried point events exceeds the first preset threshold, the accumulated buried point event is likewise taken as a non-compliant buried point event and written into the database. This yields a non-compliant buried point event database, which stores, for each request ip or service end id, the buried point events that exceed their preset thresholds and, where the sum of all buried point events of that request ip or service end id exceeds the first preset threshold, the accumulated buried point event as a single record. Typically, the first preset threshold is greater than the preset threshold corresponding to each single buried point event. For example, the buried point events of a certain request ip include accessing pictures and accessing videos, denoted buried point event A and buried point event B respectively. If the number of buried point event A exceeds its corresponding preset threshold, that is, the number of picture accesses exceeds the threshold set for picture access, buried point event A is taken as a non-compliant buried point event and written into the database.
Similarly, if the number of buried point event B does not exceed its corresponding preset threshold, that is, the number of video accesses does not exceed the threshold set for video access, buried point event B is not regarded as a non-compliant buried point event and is not written into the database. If the sum of the numbers of buried point event A and buried point event B exceeds the first preset threshold, the accumulated event of the request ip (namely the total number of accesses covering both A and B) is taken as a non-compliant buried point event and stored in the non-compliant buried point event database.
In this case, the non-compliant buried point event database finally stores, for the request ip, buried point event A and the accumulated buried point event covering the total number of accesses of buried point events A and B.
The preset threshold for each resource type and the first preset threshold can be determined empirically. For example, the preset threshold for picture access is set to 50, the preset threshold for video access to 100, and the first preset threshold for all resource accesses to 1000.
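The threshold comparison of step 2 can be sketched as below, using the example values 50, 100 and 1000. The function name, the `CUMULATIVE` marker, and the choice to leave resource types without a configured threshold unflagged are assumptions made for this illustration.

```python
SINGLE_THRESHOLDS = {"picture": 50, "video": 100}  # per-event preset thresholds (example values)
FIRST_THRESHOLD = 1000                             # first preset threshold for the accumulated count
CUMULATIVE = "__cumulative__"                      # hypothetical marker for the accumulated event


def non_compliant_events(single_counts):
    """Return the set of non-compliant buried point events for one request ip."""
    flagged = {
        event
        for event, count in single_counts.items()
        # Assumption: an event type with no configured threshold is never flagged on its own.
        if event in SINGLE_THRESHOLDS and count > SINGLE_THRESHOLDS[event]
    }
    # The accumulated event is flagged when the sum of all single counts exceeds the first threshold.
    if sum(single_counts.values()) > FIRST_THRESHOLD:
        flagged.add(CUMULATIVE)
    return flagged
```

With 60 picture accesses and 80 video accesses, only the picture event is flagged, matching the A/B example in the text.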
Step 3: receiving a service request, wherein the service request comprises a target request ip, a service end id and an access resource type; querying the non-compliant buried point event database for non-compliant buried point events of the target request ip or service end id; if none exists, processing the service request; if one or more exist, using one or more of them as the anti-crawler decision basis to determine whether to process the service request. Processing the service request means allowing the target request ip or service end id to access one or more resources on the web page; not processing it means prohibiting the target request ip or service end id from accessing one or more resources on the web page.
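The database write in step 2 and the lookup in step 3 might look like the following sketch. The patent does not name a database or schema; an in-memory SQLite database, the table `non_compliant`, its columns, and both function names are assumptions chosen only to keep the example self-contained.

```python
import sqlite3

# In-memory database standing in for the non-compliant buried point event database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE non_compliant ("
    "requester TEXT, event TEXT, PRIMARY KEY (requester, event))"
)


def record_non_compliant(requester, event):
    """Step 2: store one non-compliant event for a request ip or service end id."""
    conn.execute(
        "INSERT OR IGNORE INTO non_compliant VALUES (?, ?)", (requester, event)
    )


def lookup_non_compliant(requester):
    """Step 3: return the non-compliant events recorded for the target requester."""
    rows = conn.execute(
        "SELECT event FROM non_compliant WHERE requester = ?", (requester,)
    )
    return {row[0] for row in rows}
```

An empty result means the service request is processed directly; a non-empty result is handed to the service end as its decision basis.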
For example, suppose the target request ip accesses four resource types on the web page, and the corresponding four buried point events are denoted buried point events a, b, c and d, each an independent buried point event. The total number of accesses across the four events is the accumulated number of buried point events. Within the preset period, buried point event a occurs 20 times, b 40 times, c 120 times and d 200 times. The corresponding preset thresholds are 60 for a, 70 for b, 50 for c and 300 for d. The numbers of buried point events a, b and d, corresponding to the target request ip's access to resources a, b and d, do not exceed their respective preset thresholds; only the number of buried point event c, corresponding to access to resource c, exceeds its preset threshold. Buried point event c of the target request ip is therefore a non-compliant buried point event, and a record is stored in the non-compliant buried point event database indicating that the number of times the target request ip accessed resource c exceeded the preset threshold.
The service end can then decide, based on this non-compliant buried point event, whether subsequent requests from the service end behind that request ip may continue to access resource c.
That is, after a new service request is received from the service end behind the target request ip and the query shows that the target request ip has a non-compliant buried point event, access by the target request ip to resource c may be prohibited while access to resources a, b and d is permitted; alternatively, access to all four resources may still be permitted.
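Two of the decision policies a service end might apply can be sketched as follows. The function names are hypothetical, and `allow_strict` is a simplification of the "refuse all requests" variant: it refuses as soon as any non-compliant event is on record rather than waiting for a request that matches one.

```python
def allow_selective(requested_type, flagged):
    """Refuse only requests whose resource type matches a non-compliant event."""
    return requested_type not in flagged


def allow_strict(requested_type, flagged):
    """Refuse every request from the requester once any non-compliant event exists.

    The requested type is ignored here; this is a simplified reading of the
    'refuse all service requests' variant.
    """
    return not flagged


# Example from the text: only buried point event c exceeded its threshold.
flagged_events = {"c"}
decisions = {r: allow_selective(r, flagged_events) for r in ["a", "b", "c", "d"]}
```

Under the selective policy, `decisions` allows a, b and d and refuses c. A service end that prefers to keep serving all four resources can simply ignore the flag.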
Similarly, if the target request ip accesses resource c only 30 times, none of the buried point events (a, b, c, d) exceeds its preset threshold. The total number of accesses across all resource type events (a, b, c, d) is also counted: 20 + 40 + 30 + 200 = 290. If the first preset threshold is 500, this sum does not exceed it, so the accumulated buried point event is not a non-compliant buried point event, that is, it is compliant. If the first preset threshold is instead 220, the sum exceeds it, the accumulated buried point event becomes a non-compliant buried point event, and the record is likewise stored in the non-compliant buried point event database. When a new service request is received, the service end can decide, based on the accumulated buried point event, whether to prohibit the request ip from accessing any resource on the web page, prohibiting or allowing access according to its service requirements.
The crawler-oriented method based on the multiplexing embedded point information provided by this embodiment does not need to add a request ip to a blacklist, does not need to rely on a list establishment system to realize crawler-oriented actions, but the service end is unbind from the irregular embedded point events formed by statistics, realizes high decoupling, has good flexibility, is different from the existing mode that the service end relies on the crawler-oriented actions of the service end to be highly coupled with the service end, and effectively avoids the defects of repeated development and resource waste caused by crawler-oriented actions of each service. For example, taking a search service as an example, the downloading frequency of a user can be used as an interception basis, so that a normal user who frequently searches but does not find correct target data can continuously and normally search, and a crawler which continuously changes a keyword and maliciously downloads content can be intercepted, which is difficult to achieve by the existing anti-crawler technology because the service and the anti-crawler are highly coupled. Specifically, the search frequency and the download frequency are respectively used as independent buried point events, and the corresponding preset thresholds in step 2 are respectively 100 and 200, that is, the search with the search frequency of greater than or equal to 100 is used as an unconventional buried point event, and the download with the download frequency of greater than or equal to 200 is used as an unconventional buried point event. 
When the step 3 is executed, it is assumed that a preset threshold a corresponding to the download frequency of a user requesting an ip is set to 1000, where the preset threshold a is one of the judgment criteria used as a basis for a back-crawl decision, and the preset threshold a is the download frequency in a unit time, for example, 1000 downloads per hour; similarly, the preset threshold b corresponding to the search frequency is set to 300, that is, the search frequency per unit time, for example, 300 searches per hour; the ratio of the download frequency to the search frequency is 1000/300. Although the previous access request of the request ip is written into the non-compliant buried point database because the search frequency and the download frequency exceed the corresponding preset threshold values in the step 2, the access request needs to be judged according to a back-climbing decision basis when the step 3 is executed, so that the service is effectively decoupled from the back-crawlers. 
If the number of downloads of a user in a unit time (5 minutes, 10 minutes or one hour and the like) is large but the number of searches is very small, this often means that the user does not really query the required resources, but downloads a large number of searched resources without destination, that is, continuously downloads the resources in the current search result interface, and whether the resources in the current search result interface are the resources required by the user or not, it can be determined that the accumulated buried point event is a non-compliance event and needs to be intercepted, that is, the user needs to be intercepted; however, once the frequent search is identified but the frequent download is not performed or the frequent search and the frequent download are performed, it can be considered that the user is a user who does not search the real resource, and still judges the user to be a compliance event without intercepting, so that the user can not only ensure the continuous normal search and download of the user, but also intercept the malicious downloaded content which is continuously changed by the keyword and is downloaded in large quantity. 
That is, in step 3 the service request is first received, and it is determined whether the download frequency associated with the service request exceeds 1000. If it does not, the service request is processed; in other words, the request is processed as long as the download frequency does not exceed 1000, regardless of whether the search frequency exceeds 300. Otherwise, if the ratio of the actual download frequency to the actual search frequency of the user's request ip is less than or equal to 1000/300, the request ip is not intercepted, even if the download frequency alone (a single buried point event) exceeds the preset threshold a or the search frequency alone (also a single buried point event) exceeds the preset threshold b. However, if the ratio of the download frequency to the search frequency (the accumulated buried point event) is greater than 1000/300, the request ip is intercepted. Normal users who search frequently without finding the correct target data can therefore keep searching normally, while crawlers that keep changing keywords to download content maliciously are intercepted.
All other cases are intercepted. These include the case where the search frequency exceeds 300 (beyond the search limit of an ordinary person) and the case where a user only downloads without any search behavior, which does not match the download behavior of a normal user.
Many users perform searches, and each search may return many results. A user may need to keep downloading and inspecting results to check whether they are the required target resource, until the one actually needed is found. This process involves a large number of searches and a large number of downloads carried out together; it is normal user behavior and should not be intercepted.
Of course, a crawler that both searches and downloads frequently can still be identified by adjusting the thresholds. For example, if the thresholds are set very high, normal users will generally not be classified as crawlers and will not be intercepted, because the number of searches and downloads made by such a crawler is far higher than the search frequency and download frequency of a normal user. The two can therefore be distinguished by the threshold setting.
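The decision rule described above for step 3 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `should_intercept` and its parameters are hypothetical, the default thresholds are the example values a = 1000 and b = 300, and where the description is ambiguous the sketch follows the ratio rule (a request ip over threshold a is intercepted only when its download-to-search ratio exceeds a/b, or when it has no searches at all).

```python
def should_intercept(downloads: int, searches: int,
                     threshold_a: int = 1000, threshold_b: int = 300) -> bool:
    """Return True if the request ip should be intercepted (hypothetical helper).

    downloads / searches are the counts for one request ip in one unit of time.
    """
    # Download frequency within threshold a: process the request,
    # regardless of the search frequency.
    if downloads <= threshold_a:
        return False
    # Many downloads with no search behaviour at all: assumed abnormal.
    if searches == 0:
        return True
    # Intercept when downloads / searches > a / b.
    # Cross-multiplying keeps the comparison in exact integer arithmetic.
    return downloads * threshold_b > threshold_a * searches


if __name__ == "__main__":
    print(should_intercept(800, 10))     # under threshold a
    print(should_intercept(3000, 50))    # many downloads, few searches
    print(should_intercept(3000, 1200))  # frequent search and download
```

Cross-multiplication is used instead of floating-point division so that a ratio exactly equal to 1000/300 is reliably treated as "less than or equal to" and is not intercepted.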
Referring to fig. 2, the present embodiment further provides a processing terminal, which includes:
a memory 101 for storing program instructions;
a processor 102, configured to execute the program instructions to perform the steps of the anti-crawler method based on multiplexing buried point information.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. A crawler-resisting method based on multiplexing buried point information is characterized by comprising the following steps:
step 1: acquiring buried point data which comprises a request ip or a service end id and a buried point event, wherein the buried point event represents the resource type accessed by the request ip or the service end id,
counting the number of accesses of each single buried point event of each request ip or service end id to obtain the number of single buried point events corresponding to each single buried point event, and summing the numbers of all single buried point events of the same request ip or service end id to obtain the number of accumulated buried point events;
step 2: comparing the number of each single buried point event with the preset threshold corresponding to that single buried point event, comparing the number of accumulated buried point events with a first preset threshold, regarding each single buried point event exceeding its corresponding preset threshold as a non-compliant buried point event, and regarding the accumulated buried point event as a non-compliant buried point event if the number of accumulated buried point events exceeds the first preset threshold, thereby obtaining a non-compliant buried point event set,
step 3: receiving a service request, wherein the service request comprises a target request ip or a service end id and also comprises an access resource type,
traversing the non-compliant buried point event set and judging whether a non-compliant buried point event of the target request ip or the service end id exists; if not, processing the service request; if so, using one or more of the non-compliant buried point events of the target request ip or the service end id as an anti-crawler decision basis to determine whether to process the service request,
wherein using one or more of the non-compliant buried point events of the target request ip or the service end id as the anti-crawler decision basis to determine whether to process the service request specifically comprises:
forbidding processing of the service request meeting the first condition, and processing the service request not meeting the first condition:
the first condition is as follows: the access resource type in the service request corresponds to the resource type of the non-compliant buried point event.
2. The anti-crawler method based on multiplexing buried point information as claimed in claim 1, wherein in the step 2, after obtaining the non-compliant buried point event set, the non-compliant buried point event set is written into the database to obtain a non-compliant buried point event database,
in step 3, after receiving the service request, first accessing the non-compliant buried point event database, and traversing the non-compliant buried point event set in the non-compliant buried point event database.
3. The anti-crawler method based on multiplexing buried point information according to claim 1, wherein the number of accesses of each single buried point event of each request ip or service end id, and the sum of the numbers of all single buried point events of the same request ip or service end id, are counted within a preset period.
4. The anti-crawler method based on multiplexing buried point information of claim 1, wherein the first preset threshold is greater than the preset threshold corresponding to each single buried point event.
5. The anti-crawler method based on multiplexing buried point information of claim 1, wherein the buried point event comprises one or more of accessing pictures, videos, searches, texts, invitation codes and vouchers on a webpage.
6. The anti-crawler method based on multiplexing buried point information of claim 1, wherein using one or more of the non-compliant buried point events of the target request ip or the service end id as the anti-crawler decision basis to decide whether to process the service request may alternatively be:
if any service in the service requests meets the first condition, all service requests of the target request ip or the service end id are forbidden to be processed:
the first condition is as follows: the access resource type in the service request corresponds to the resource type of the non-compliant buried point event.
7. A processing terminal, characterized in that it comprises:
a memory for storing program instructions;
a processor for executing the program instructions to perform the steps of the anti-crawler method based on multiplexing buried point information according to any one of claims 1 to 6.
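
As an illustration only, steps 1 to 3 of claim 1 might be sketched in Python as follows. The function names, the record format of (requester, event type) pairs, and the threshold values are assumptions made for this example and do not appear in the patent. The claims do not specify how an accumulated buried point event maps to a resource type, so this sketch records it under a hypothetical marker that matches no resource type. The variant of claim 6 (blocking all requests of a flagged requester) is not covered.

```python
from collections import Counter, defaultdict

# Hypothetical marker for a non-compliant accumulated buried point event.
CUMULATIVE = "__cumulative__"


def build_non_compliant_set(records, single_thresholds, first_threshold):
    """Steps 1 and 2: count events per requester and flag those over threshold.

    records: iterable of (requester, event_type) pairs, where requester is a
             request ip or a service end id (assumed record format).
    single_thresholds: dict mapping event_type -> preset threshold.
    first_threshold: threshold for the accumulated event count.
    Returns a dict mapping requester -> set of non-compliant event types.
    """
    # Step 1: count each single buried point event per requester.
    counts = defaultdict(Counter)
    for requester, event_type in records:
        counts[requester][event_type] += 1

    # Step 2: compare single and accumulated counts with their thresholds.
    non_compliant = {}
    for requester, per_event in counts.items():
        flagged = {
            event for event, n in per_event.items()
            if event in single_thresholds and n > single_thresholds[event]
        }
        if sum(per_event.values()) > first_threshold:
            flagged.add(CUMULATIVE)
        if flagged:
            non_compliant[requester] = flagged
    return non_compliant


def allow_request(non_compliant, requester, resource_type):
    """Step 3: process the request unless it meets the first condition.

    First condition: the access resource type of the request corresponds to
    the resource type of a non-compliant buried point event of the requester.
    """
    flagged = non_compliant.get(requester)
    if not flagged:
        # No non-compliant event recorded for this requester: process.
        return True
    return resource_type not in flagged


if __name__ == "__main__":
    # Small, made-up thresholds so the example is easy to follow.
    records = (
        [("1.2.3.4", "download")] * 3
        + [("1.2.3.4", "search")]
        + [("5.6.7.8", "search")] * 2
    )
    flagged = build_non_compliant_set(
        records, {"download": 2, "search": 5}, first_threshold=10
    )
    print(allow_request(flagged, "1.2.3.4", "download"))
    print(allow_request(flagged, "1.2.3.4", "search"))
```

Keeping the non-compliant set as a separate lookup mirrors the decoupling described in the embodiment: the counting in steps 1 and 2 can run offline on buried point data, while step 3 only needs a read of the resulting set.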
CN202110951654.2A 2021-08-18 2021-08-18 Crawler-resisting method based on multiplexing embedded point information and processing terminal Active CN113660277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110951654.2A CN113660277B (en) 2021-08-18 2021-08-18 Crawler-resisting method based on multiplexing embedded point information and processing terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110951654.2A CN113660277B (en) 2021-08-18 2021-08-18 Crawler-resisting method based on multiplexing embedded point information and processing terminal

Publications (2)

Publication Number Publication Date
CN113660277A CN113660277A (en) 2021-11-16
CN113660277B CN113660277B (en) 2023-01-06

Family

ID=78481148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110951654.2A Active CN113660277B (en) 2021-08-18 2021-08-18 Crawler-resisting method based on multiplexing embedded point information and processing terminal

Country Status (1)

Country Link
CN (1) CN113660277B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017084508A1 (en) * 2015-11-17 2017-05-26 阿里巴巴集团控股有限公司 Method and device for automatically burying points
CN113014623A (en) * 2021-02-05 2021-06-22 招联消费金融有限公司 Method and device for processing real-time streaming data of embedded point, computer equipment and storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9043919B2 (en) * 2008-10-21 2015-05-26 Lookout, Inc. Crawling multiple markets and correlating
CN102833668B (en) * 2012-08-20 2015-04-08 中国联合网络通信集团有限公司 Data traffic reminding method and data traffic reminding device
CN104917643B (en) * 2014-03-11 2019-02-01 腾讯科技(深圳)有限公司 Abnormal account detection method and device
CN104869155B (en) * 2015-04-27 2018-09-18 腾讯科技(深圳)有限公司 Data Audit method and device
CN105808639B (en) * 2016-02-24 2021-02-09 平安科技(深圳)有限公司 Network access behavior identification method and device
CN105912934B (en) * 2016-04-20 2018-10-30 迅鳐成都科技有限公司 A kind of data-oriented property right protection it is anti-in climb and visit prosecutor method
CN106021552A (en) * 2016-05-30 2016-10-12 深圳市华傲数据技术有限公司 Internet creeper concurrency data collection method and system based on crowd behavior simulation
CN106060048A (en) * 2016-05-31 2016-10-26 杭州华三通信技术有限公司 Network resource access method and network resource access device
CN110334307A (en) * 2019-07-11 2019-10-15 税友软件集团股份有限公司 A kind of business event method for pushing, device and equipment
CN110958228A (en) * 2019-11-19 2020-04-03 用友网络科技股份有限公司 Crawler access interception method and device, server and computer readable storage medium
CN111556109B (en) * 2020-04-17 2021-05-18 北京达佳互联信息技术有限公司 Request processing method and device, electronic equipment and storage medium
CN111625700B (en) * 2020-05-25 2023-04-07 北京世纪家天下科技发展有限公司 Anti-grabbing method, device, equipment and computer storage medium
CN111930719B (en) * 2020-08-13 2023-09-19 中国工商银行股份有限公司 Database access method, device and system
CN112291263A (en) * 2020-11-17 2021-01-29 珠海大横琴科技发展有限公司 Data blocking method and device
CN113179266A (en) * 2021-04-26 2021-07-27 口碑(上海)信息技术有限公司 Service request processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113660277A (en) 2021-11-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant