CN115098757A - Method, device, system and equipment for identifying web crawler - Google Patents

Publication number
CN115098757A
Authority
CN
China
Prior art keywords
client
characteristic information
browser
parameters
web crawler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210743690.4A
Other languages
Chinese (zh)
Inventor
易旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Bank Co Ltd
Priority to CN202210743690.4A
Publication of CN115098757A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

In the method, a blockchain platform acquires test characteristic information uploaded by a server and browser characteristic information uploaded by a client. The test characteristic information holds the values that distinguishable factors (factors that tell a person from a machine) take during human access, and the browser characteristic information holds the actual values of the same factors. Web crawlers can therefore be identified effectively, and blockchain technology prevents the data from being tampered with, which further improves data security.

Description

Method, device, system and equipment for identifying web crawler
Technical Field
The present application relates to the field of Internet technology, and in particular to a web crawler identification method, device, system and equipment.
Background
At present, with the rapid development of network technology, more and more information is carried on the network. Web crawlers were introduced to capture and make use of this information. However, abuse of web crawler technology consumes large amounts of bandwidth and leads to harms such as the illegal acquisition of private information or intellectual property.
Disclosure of Invention
An object of the embodiments of the present application is to provide a web crawler identification method, apparatus, system, and device that address the low security of web page data caused by abuse of web crawler technology.
In a first aspect, an embodiment of the present application provides a web crawler identification method, applied to a blockchain platform, including: acquiring test characteristic information uploaded by a server, where the test characteristic information includes common characteristic parameters observed when a plurality of browsers open windows to access a website, together with preset verification parameters, the verification parameters being set based on the difference between manual and automated website access; acquiring browser characteristic information uploaded by a client, where the browser characteristic information includes the actual characteristic parameters and actual verification parameters of the browser currently accessing the client webpage; and comparing the test characteristic information with the browser characteristic information to identify whether the operation object of the client is a web crawler.
In the implementation process, the blockchain platform acquires the test characteristic information uploaded by the server and the browser characteristic information uploaded by the client. The test characteristic information holds the values that distinguishable factors (factors that tell a human from a machine) take during human access, and the browser characteristic information holds the actual values of the same factors. Web crawlers can therefore be identified effectively, and blockchain technology prevents the data from being tampered with, which further improves data security.
Further, in some embodiments, the preset verification parameters include the page dwell time during human access, and the actual verification parameters include the actual page dwell time of the browser currently accessing the client webpage.
In the implementation process, the page dwell time serves as a comparison factor for judging whether the operation object at the client is a human or a web crawler, which improves identification accuracy.
Further, in some embodiments, the test characteristic information further includes an IP number segment and a domain name corresponding to the server.
In the implementation process, using the IP number segment and the domain name corresponding to the server as part of the comparison factors strengthens the subsequent verification comparison by the blockchain platform.
Further, in some embodiments, comparing the test characteristic information with the browser characteristic information to identify whether the current operation object of the client is a web crawler includes: detecting the matching degree between the test characteristic information and the browser characteristic information; if the matching degree is higher than or equal to a preset value, determining that the current operation object of the client is not a web crawler; and if the matching degree is lower than the preset value, determining that the current operation object of the client is a web crawler.
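The threshold rule above can be illustrated with a minimal sketch. The application does not specify how the matching degree is computed; the sketch assumes, purely for illustration, that it is the fraction of test-side fields whose value the browser reproduces. The function names, the field names, and the preset value of 0.8 are hypothetical.

```python
from typing import Mapping


def matching_degree(test_info: Mapping[str, object],
                    browser_info: Mapping[str, object]) -> float:
    """Fraction of test-side fields reproduced exactly by the browser (assumed metric)."""
    if not test_info:
        raise ValueError("test_info must contain at least one field")
    matched = sum(
        1 for key, expected in test_info.items()
        if key in browser_info and browser_info[key] == expected
    )
    return matched / len(test_info)


def is_web_crawler(test_info: Mapping[str, object],
                   browser_info: Mapping[str, object],
                   preset_value: float = 0.8) -> bool:
    """Crawler if the matching degree is below the preset value; otherwise not."""
    return matching_degree(test_info, browser_info) < preset_value


# Hypothetical field names, used only to exercise the rule.
test_info = {"accept_language": "zh-CN", "cookies_enabled": True,
             "webdriver": False, "plugin_count": 3}
human_like = dict(test_info)
script_like = {"accept_language": "zh-CN", "cookies_enabled": False,
               "webdriver": True}
```

With these inputs, `human_like` reproduces every field and is treated as human, while `script_like` reproduces one field out of four and is treated as a web crawler. A matching degree exactly equal to the preset value counts as human, as stated above.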
In the implementation process, a specific implementation means of verification comparison is provided.
Further, in some embodiments, the method further includes: transmitting the comparison result to the client, so that the client can decide, according to the comparison result, whether to intercept the access request.
In the implementation process, transmitting the comparison result to the client enables access by web crawlers to be blocked.
Further, in some embodiments, transmitting the comparison result to the client includes: storing the comparison result in a target block, so that the client can obtain the comparison result from the target block.
In the implementation process, storing the comparison result in a block prevents it from being tampered with.
In a second aspect, an embodiment of the present application provides a web crawler recognition apparatus, applied to a blockchain platform, including: a first acquisition module, configured to acquire test characteristic information uploaded by a server, where the test characteristic information includes common characteristic parameters observed when a plurality of browsers open windows to access a website, together with preset verification parameters, the verification parameters being set based on the difference between manual and automated website access; a second acquisition module, configured to acquire browser characteristic information uploaded by a client, where the browser characteristic information includes the actual characteristic parameters and actual verification parameters of the browser currently accessing the client webpage; and a verification comparison module, configured to compare the test characteristic information with the browser characteristic information to identify whether the operation object of the client is a web crawler.
In a third aspect, an embodiment of the present application provides a web crawler identification system, including a blockchain platform, a server, and a client, where: the server is configured to upload test characteristic information to the blockchain platform, the test characteristic information including common characteristic parameters observed when a plurality of browsers open windows to access a website, together with preset verification parameters, the verification parameters being set based on the difference between manual and automated website access; the client is configured to upload browser characteristic information to the blockchain platform, the browser characteristic information including the actual characteristic parameters and actual verification parameters of the browser currently accessing the client webpage; and the blockchain platform is configured to compare the test characteristic information with the browser characteristic information to identify whether the operation object of the client is a web crawler.
Further, in some embodiments, the blockchain platform is further configured to transmit the comparison result to the client, and the client is further configured to decide, according to the comparison result, whether to intercept the access request.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any of the first aspect when executing the computer program.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium having instructions stored thereon, which, when executed on a computer, cause the computer to perform the method according to any one of the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product, which when run on a computer, causes the computer to perform the method according to any one of the first aspect.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the above-described techniques.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of a web crawler identification method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a web crawler recognition system according to an embodiment of the present application;
Fig. 3 is a block diagram of a web crawler recognition apparatus according to an embodiment of the present application;
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
As described in the background, the abuse of web crawler technology currently results in low security of web page data. Based on this, the embodiments of the present application provide a web crawler identification scheme to solve this problem.
Fig. 1 is a flowchart of a web crawler identification method provided in an embodiment of the present application. The method is applied to a blockchain platform. Blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. The blockchain platform in this embodiment uses blockchain technology to secure the data uploaded by the server and the client against tampering. The blockchain platform acts as an intermediate common node that all server and client nodes can access and exchange data with.
The method comprises the following steps:
In step 101, test characteristic information uploaded by a server is acquired, where the test characteristic information includes common characteristic parameters observed when a plurality of browsers open windows to access a website, together with preset verification parameters; the verification parameters are set based on the difference between manual and automated website access;
In this embodiment, the server uploads the test characteristic information to the blockchain platform. The test characteristic information includes the common characteristic parameters observed when various browsers open windows to access the website. A browser exhibits specific characteristic parameters when accessing a website, such as environment information, Hypertext Transfer Protocol (HTTP) parameter information, request messages, and response information. Based on these, the server can monitor the network interaction data of various browsers during the system test stage and obtain the common characteristic parameters. The common characteristic parameters are characteristic values that can distinguish a human from a machine and serve as one comparison factor. The multiple browsers mentioned in this step may include different types of browsers as well as different versions of the same type of browser.
The test characteristic information further includes preset verification parameters. A web crawler is a program or script that automatically crawls web information according to certain rules. The difference between manual and automated website access shows up in some hidden information of the browser. The verification parameters are preset based on this difference and serve as a further factor in the subsequent verification comparison by the blockchain platform, which improves its accuracy. Experiments show that when a human visits a website, each opened page is kept open for a certain dwell time. The page dwell time, that is, the interval between opening a page and jumping away from it, can therefore serve as a comparison factor for judging whether the operation object at the client is a human or a web crawler. Thus, in some embodiments, the verification parameter may be a page dwell time, and the preset verification parameters may include the page dwell time during human access. Specifically, in the system test stage, the dwell time of each front-end website page can be collected while the page is visited by a plurality of test users, and the preset dwell time for that page is then set to the average of the collected values. Of course, in other embodiments, the verification parameter may also be set based on other hidden information, such as the range of the mouse movement track, which is not limited in this application.
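The averaging step described above can be sketched as follows. The sketch assumes that the test-stage observations are available as (page URL, dwell time in seconds) pairs; the function name and the sample data are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def preset_dwell_times(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the dwell times observed for each page URL during the test stage."""
    per_page: Dict[str, List[float]] = {}
    for url, seconds in samples:
        per_page.setdefault(url, []).append(seconds)
    return {url: mean(values) for url, values in per_page.items()}


# Hypothetical observations from several test users.
observations = [("/home", 10.0), ("/home", 14.0), ("/loan", 30.0)]
presets = preset_dwell_times(observations)
```

Here `presets` maps "/home" to 12.0 seconds and "/loan" to 30.0 seconds. A plain mean is sensitive to outliers, so a real deployment might prefer a median or a trimmed mean; that choice is outside what the application specifies.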
In terms of presentation, the common characteristic parameters and the verification parameters may be regarded as a specific, implicit ID string that the server allocates for the URL (Uniform Resource Locator) of an HTML (HyperText Markup Language) page of the front-end website.
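One possible way to pack these parameters into an ID string is sketched below. The application does not define the encoding; canonical JSON followed by URL-safe Base64 is an assumption chosen for illustration, and the function and field names are hypothetical. Base64 is only an encoding, not encryption.

```python
import base64
import json
from typing import Any, Dict


def build_id_string(url: str, common_features: Dict[str, Any],
                    dwell_seconds: float) -> str:
    """Pack the per-URL parameters into an opaque-looking string (assumed format)."""
    payload = {"url": url, "features": common_features,
               "dwell_seconds": dwell_seconds}
    # sort_keys gives a canonical form, so equal inputs yield equal ID strings.
    raw = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def parse_id_string(id_string: str) -> Dict[str, Any]:
    """Recover the parameters from an ID string built by build_id_string."""
    return json.loads(base64.urlsafe_b64decode(id_string.encode("ascii")))


id_string = build_id_string("/loan", {"accept_language": "zh-CN"}, 30.0)
recovered = parse_id_string(id_string)
```

Because the string is merely encoded, it would still need the encryption step described elsewhere in this description before being uploaded.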
In addition, in some examples, the test characteristic information may further include the IP number segment and the domain name corresponding to the server. Generally, when a company website server is deployed, the website page URLs are bound to the IP number segment and the domain name of the server. Requiring access through the fixed IP number segment and domain name helps prevent illegal users from developing web crawlers and repeatedly deploying services that access the site through its URLs. Using the IP number segment and the domain name corresponding to the server as part of the anti-crawler measures therefore strengthens the subsequent verification comparison by the blockchain platform.
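How the IP number segment and the domain name enter the comparison is not detailed in the application. One plausible reading, sketched here as an assumption, is a check that a request was addressed to the bound domain and to an IP address inside the bound segment. The function name and the example addresses (taken from documentation-reserved ranges) are hypothetical.

```python
import ipaddress


def served_from_bound_origin(served_ip: str, served_domain: str,
                             bound_segment: str, bound_domain: str) -> bool:
    """True if the request hit an IP inside the bound segment and the bound domain."""
    in_segment = ipaddress.ip_address(served_ip) in ipaddress.ip_network(bound_segment)
    # Domain names are case-insensitive; a trailing dot denotes the same name.
    same_domain = (served_domain.lower().rstrip(".")
                   == bound_domain.lower().rstrip("."))
    return in_segment and same_domain


ok = served_from_bound_origin("203.0.113.7", "WWW.Example.com",
                              "203.0.113.0/24", "www.example.com")
outside = served_from_bound_origin("198.51.100.1", "www.example.com",
                                   "203.0.113.0/24", "www.example.com")
```

In this sketch `ok` is true and `outside` is false, because the second address lies outside the bound segment.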
To ensure data security, in some examples, the server may encrypt the test characteristic information using the encryption technology of the blockchain before uploading it to the blockchain platform. Optionally, the encryption may be symmetric. Specifically, the server processes the plaintext (i.e., the test characteristic information) and an encryption key with a symmetric encryption algorithm, turning the plaintext into ciphertext, and uploads the ciphertext to the blockchain platform. Once uploaded, the ciphertext is protected by the blockchain technology and cannot be tampered with by others. To read the data, the blockchain platform decrypts the ciphertext with the same key and the inverse of the same algorithm, restoring the readable plaintext. The symmetric encryption algorithm may be, for example, DES (Data Encryption Standard) or IDEA (International Data Encryption Algorithm). Of course, in other embodiments, the encryption may also be asymmetric. The specific encryption process and principle are described in the related art and are not detailed here.
In step 102, browser characteristic information uploaded by a client is acquired, where the browser characteristic information includes the actual characteristic parameters and actual verification parameters of the browser currently accessing the client webpage;
In an actual scenario, the server-side system is developed and deployed on a company website server and is presented as a website. An operation object can then access the client webpage through a browser, and the scheme of this embodiment identifies whether that operation object is a web crawler.
In this embodiment, the client uploads the browser characteristic information to the blockchain platform. Corresponding to the common characteristic parameters, the actual characteristic parameters may include the environment information, HTTP parameter information, request messages, response information, and the like of the browser currently accessing the client webpage. Likewise, corresponding to the preset verification parameters, the actual verification parameters are the actual values that the browser currently accessing the client webpage exhibits for those verification parameters; for example, in some embodiments, the actual verification parameter may be the actual page dwell time of that browser. Of course, where the test characteristic information includes other factors that distinguish automated from human access, the browser characteristic information may also include the corresponding actual values.
In some embodiments, the browser characteristic information may be captured by a JS (JavaScript) program embedded in the client webpage. The JS program can be regarded as an automation module embedded in the client webpage. It captures hidden information of the browser currently accessing the page from outside, such as the dwell time on the currently accessed page, and uploads this information to the blockchain, so that the blockchain platform obtains the browser characteristic information for subsequent verification comparison.
In step 103, the test characteristic information is compared with the browser characteristic information to identify whether the current operation object of the client is a web crawler.
The blockchain platform compares the test characteristic information uploaded by the server with the browser characteristic information uploaded by the client; the comparison may consist of detecting the matching degree between the two. The test characteristic information can be regarded as the characteristic information a browser presents when the operation object at the client is a human. Therefore, if the comparison result is that the matching degree is higher than or equal to a preset value, the browser characteristic information uploaded by the client is close to the test characteristic information uploaded by the server, and the operation object at the client can be determined to be a human. Conversely, if the matching degree is lower than the preset value, the two differ substantially, and the operation object at the client can be determined to be a web crawler. Optionally, the blockchain platform may perform the comparison using a similarity measure, such as the Euclidean distance, the Manhattan distance, or the Pearson correlation coefficient, or may obtain the comparison result from a trained machine learning model, which is not limited in this application.
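The similarity measures named above can be sketched for numeric feature vectors as follows. How characteristic information is turned into numbers, and how a distance is mapped to a matching degree, are not specified in the application; the mapping 1 / (1 + distance) used here is an assumption for illustration, and all function names are hypothetical.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    _check(a, b)
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)


def matching_degree_from_distance(distance: float) -> float:
    """Map a distance in [0, inf) to a matching degree in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + distance)
```

For example, `euclidean_distance([0, 0], [3, 4])` is 5.0 and `manhattan_distance([0, 0], [3, 4])` is 7. In practice the features would usually be scaled to comparable ranges first, since a dwell time in seconds would otherwise dominate smaller-valued features.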
After the comparison result is obtained, the blockchain platform can transmit it to the client, so that the client can decide, according to the comparison result, whether to intercept the access request. Specifically, where a JS program is embedded in the client webpage, the blockchain platform can transmit the comparison result to the JS program, which then decides whether to intercept the access request to the website. Accordingly, when the comparison result is that the matching degree is higher than or equal to the preset value, the JS program can decide that no interception is needed, so that normal access is not blocked; when the matching degree is lower than the preset value, the JS program can decide to intercept the access request, so as to block access by the web crawler.
Further, to prevent the comparison result from being tampered with, in some embodiments the blockchain platform may store the comparison result in a target block, so that the client obtains the comparison result from the target block. A blockchain is a chain of blocks linked one after another, each storing certain information. Generally, modifying the information in a block requires the approval of more than half of the nodes and modification of the information held by all nodes, so the information in a block is difficult to tamper with. The data security of the comparison result is thereby guaranteed.
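Why a stored comparison result is hard to alter can be illustrated with a minimal, single-node hash chain. This sketch covers only the hash linking between blocks; it does not model the multi-node consensus mentioned above, and the block layout and function names are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64


def block_hash(index: int, prev_hash: str, data: Any) -> str:
    """SHA-256 over a canonical JSON form of the block contents."""
    body = json.dumps({"index": index, "prev_hash": prev_hash, "data": data},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], data: Any) -> Dict[str, Any]:
    """Append a block whose hash covers its data and the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {"index": index, "prev_hash": prev_hash, "data": data,
             "hash": block_hash(index, prev_hash, data)}
    chain.append(block)
    return block


def chain_is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and link; any edited block breaks the check."""
    prev_hash = GENESIS_PREV_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["data"]):
            return False
        prev_hash = block["hash"]
    return True


chain: List[Dict[str, Any]] = []
append_block(chain, {"url": "/loan", "is_crawler": False})
append_block(chain, {"url": "/home", "is_crawler": True})
valid_before = chain_is_valid(chain)
chain[0]["data"]["is_crawler"] = True  # simulate tampering with a stored result
valid_after = chain_is_valid(chain)
```

After the simulated tampering, the recomputed hash of the first block no longer matches the stored one, so `valid_after` is false while `valid_before` is true.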
In addition, the test characteristic information used for comparison is characteristic data acquired by the server in the system test stage and the deployment stage and may be incomplete. In the subsequent application stage, the characteristic data used for comparison can therefore be refined through real-time network monitoring, improving and strengthening the blockchain platform's ability to identify web crawlers.
In the embodiments of the present application, the blockchain platform acquires the test characteristic information uploaded by the server and the browser characteristic information uploaded by the client. The test characteristic information holds the values that distinguishable factors (factors that tell a human from a machine) take during human access, and the browser characteristic information holds the actual values of the same factors. Web crawlers can therefore be identified effectively, and blockchain technology prevents the data from being tampered with, which further improves data security.
To explain the web crawler identification scheme of the present application in more detail, a specific embodiment is described as follows:
This embodiment provides a web crawler recognition system. Fig. 2 is a schematic diagram of the system, which includes a blockchain platform 21, a server 22, and a client 23, where: the server 22 is configured to upload test characteristic information to the blockchain platform 21, the test characteristic information including common characteristic parameters observed when a plurality of browsers open windows to access a website, together with preset verification parameters, the verification parameters being set based on the difference between manual and automated website access; the client 23 is configured to upload browser characteristic information to the blockchain platform 21, the browser characteristic information including the actual characteristic parameters and actual verification parameters of the browser currently accessing the client webpage; and the blockchain platform 21 is configured to compare the test characteristic information with the browser characteristic information to identify whether the operation object of the client 23 is a web crawler. In addition, the blockchain platform 21 is further configured to transmit the comparison result to the client 23, and the client 23 is further configured to decide, according to the comparison result, whether to intercept the access request.
Specifically, the workflow of the web crawler recognition system includes:
S201: During the system test stage, the server 22 monitors network interaction data for various browsers and obtains the common characteristic data values observed when the browsers open windows to access the website;
A browser exhibits specific characteristic parameters when accessing a website, such as environment information, HTTP parameter information, request messages, and response information. In this embodiment, the server 22 may obtain the common characteristic data values by intelligent data extraction;
S202: The server 22 allocates a specific, implicit ID string for the URL of each HTML page of the front-end website, where the ID string includes the common characteristic data values acquired in advance during the test stage and a preset page dwell time;
When a human accesses a website, each opened page has a characteristic dwell time that differs from the dwell time when a machine accesses the website automatically. The interval before a page jump is therefore preset, bound to the allocated ID value, and encrypted, which facilitates the subsequent verification comparison by the blockchain platform 21;
S203: The server 22 combines the ID string with the IP number segment and the domain name of the server to obtain the test characteristic information, encrypts it using the encryption technology of the blockchain, and uploads the encrypted test characteristic information to the blockchain platform 21;
Generally, when a company website server is deployed, the website page URLs are bound to the IP number segment and the domain name of the server. Requiring access through the fixed IP number segment and domain name helps prevent illegal users from developing web crawlers and repeatedly deploying services that access the site through its URLs. The IP number segment and the domain name of the server are therefore used as part of the subsequent comparison factors, which improves accuracy;
S204: A JS program embedded in the client 23 captures the characteristic information of the browser currently accessing from outside and uploads it to the blockchain;
Corresponding to the test characteristic information, the browser characteristic information includes the actual characteristic parameters of the browser currently accessing from outside, namely the current actual environment information, HTTP parameter information, request messages, response information, and the like, and also includes the actual page dwell time;
S205: The blockchain platform 21 compares the test characteristic information with the browser characteristic information and transmits the comparison result to the client 23;
The blockchain platform 21 acts as an intermediate agent and compares the information sent by the JS program based on the URL of the page. If the test characteristic information is inconsistent with the browser characteristic information, the operation object at the client is a web crawler;
S206: The client 23 decides, according to the comparison result from the blockchain platform 21, whether to intercept the access request to the website, so as to block access by the web crawler.
In addition, in subsequent application, the system can refine the characteristic data used for comparison through real-time network monitoring.
As the above process shows, the system of this embodiment accurately identifies whether the operation object at the client is a web crawler based on the values of distinguishable factors that tell humans from machines. The blockchain platform acts as the intermediate agent, and blockchain technology secures the data and effectively prevents tampering. Compared with a third-party platform, the blockchain platform, owing to its decentralized nature, need not fear concentrated attacks by attackers. Protecting a company's web page data with the system of this embodiment can therefore improve data security and avoid data loss for the company.
Corresponding to the above method embodiments, this specification also provides embodiments of a web crawler recognition apparatus and of a terminal to which the apparatus is applied.
Fig. 3 is a block diagram of a web crawler recognition apparatus provided in an embodiment of the present application. As shown in fig. 3, the web crawler recognition apparatus is applied to a blockchain platform and includes:
a first obtaining module 31, configured to obtain test feature information uploaded by a server, where the test feature information includes public feature parameters when multiple browsers open windows to access, and preset verification parameters; the verification parameters are set based on the differences between manual access to a website and automated access to a website;
a second obtaining module 32, configured to obtain browser feature information uploaded by a client, where the browser feature information includes an actual feature parameter and an actual verification parameter of a browser currently accessing a client webpage;
and a verification comparison module 33, configured to compare the test feature information with the browser feature information to identify whether an operation object of the client is a web crawler.
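The division into modules 31, 32 and 33 can be pictured with the following sketch. The class name, method names, the keying by page URL and the pluggable `comparator` are hypothetical choices made for illustration; the application does not prescribe any particular code structure.

```python
from typing import Callable, Dict, Mapping

Features = Mapping[str, object]


class CrawlerRecognitionApparatus:
    """Illustrative grouping of modules 31, 32 and 33 (all names are hypothetical)."""

    def __init__(self, comparator: Callable[[Features, Features], bool]):
        # comparator returns True when the two feature sets are consistent.
        self._comparator = comparator
        self._test_info: Dict[str, Features] = {}
        self._browser_info: Dict[str, Features] = {}

    def obtain_test_features(self, page_url: str, info: Features) -> None:
        """First obtaining module 31: store what the server uploaded."""
        self._test_info[page_url] = dict(info)

    def obtain_browser_features(self, page_url: str, info: Features) -> None:
        """Second obtaining module 32: store what the client uploaded."""
        self._browser_info[page_url] = dict(info)

    def is_web_crawler(self, page_url: str) -> bool:
        """Verification comparison module 33: inconsistent features indicate a crawler.

        Raises KeyError if either side has not uploaded data for page_url.
        """
        test = self._test_info[page_url]
        actual = self._browser_info[page_url]
        return not self._comparator(test, actual)
```

Passing the comparison rule in as a function keeps the sketch neutral about how consistency is judged, which the embodiments describe separately.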
In some embodiments, the preset verification parameters include a page dwell time during manual access; the actual verification parameters include the actual page dwell time of the browser currently accessing the client webpage.
In some embodiments, the test feature information further includes an IP address range and a domain name corresponding to the server.
In some embodiments, the verification comparison module is specifically configured to: detect the matching degree between the test feature information and the browser feature information; if the matching degree is higher than or equal to a preset value, determine that the current operation object of the client is not a web crawler; and if the matching degree is lower than the preset value, determine that the current operation object of the client is a web crawler.
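The threshold rule above can be sketched as follows. The application does not define how the matching degree is computed; the fraction of reference parameters that the browser reports with an identical value is used here purely as an assumed example, and the function names are hypothetical.

```python
def matching_degree(test_info, browser_info):
    """Assumed metric: share of reference parameters reported with the same value."""
    if not test_info:
        raise ValueError("test_info must contain at least one parameter")
    matches = sum(
        1 for key, expected in test_info.items()
        if browser_info.get(key) == expected
    )
    return matches / len(test_info)


def is_web_crawler(degree, preset_value):
    """A degree at or above the preset value means 'not a crawler'; below means crawler."""
    return degree < preset_value
```

Note that the boundary case follows the text: a matching degree exactly equal to the preset value is treated as a non-crawler.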
In some embodiments, the above apparatus further comprises: a transmission module, configured to transmit the comparison result to the client so that the client can determine, according to the comparison result, whether to intercept the access request.
In some embodiments, the transmission module is specifically configured to: store the comparison result in a target block so that the client can obtain the comparison result from the target block.
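Storing the comparison result in a target block and reading it back can be sketched with a toy hash-linked list. This is not a real blockchain and not the platform of this application: the block layout, the use of SHA-256, the in-memory storage and all names (`ResultLedger`, `store_result`, `read_result`, `is_intact`) are assumptions for illustration.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def _block_hash(index, prev_hash, payload):
    """Hash the block contents in a deterministic (sorted-key) JSON form."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ResultLedger:
    """Toy append-only ledger holding one comparison result per block."""

    def __init__(self):
        self._blocks = []

    def store_result(self, client_id, is_crawler):
        """Append a block with the result and return its index (the 'target block')."""
        index = len(self._blocks)
        prev_hash = self._blocks[-1]["hash"] if self._blocks else GENESIS_HASH
        payload = {"client_id": client_id, "is_crawler": is_crawler}
        self._blocks.append({
            "index": index,
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": _block_hash(index, prev_hash, payload),
        })
        return index

    def read_result(self, index):
        """Client side: fetch the comparison result from the target block."""
        return self._blocks[index]["payload"]["is_crawler"]

    def is_intact(self):
        """Recompute every hash and link; any edited block makes this False."""
        prev_hash = GENESIS_HASH
        for block in self._blocks:
            expected = _block_hash(block["index"], prev_hash, block["payload"])
            if block["prev_hash"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True
```

The hash chaining shows why a stored result is hard to alter unnoticed: changing a payload invalidates that block's hash and every link after it.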
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application. The electronic device may include a processor 410, a communication interface 420, a memory 430, and at least one communication bus 440. The communication bus 440 is used to enable direct connection and communication between these components. In this embodiment, the communication interface 420 of the electronic device is used for signaling or data communication with other node devices. The processor 410 may be an integrated circuit chip having signal processing capabilities.
The processor 410 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor 410 may be any conventional processor or the like.
The memory 430 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 430 stores computer-readable instructions that, when executed by the processor 410, enable the electronic device to perform the steps involved in the method embodiment of fig. 1 described above.
Optionally, the electronic device may further include a memory controller and an input/output unit.
The memory 430, the memory controller, the processor 410, the peripheral interface, and the input/output unit are electrically connected to each other directly or indirectly to implement data transmission or interaction. For example, these components may be electrically connected to each other via one or more communication buses 440. The processor 410 is used to execute executable modules stored in the memory 430, such as software functional modules or computer programs included in the electronic device.
The input/output unit is used to let a user create a task and set an optional start time period or a preset execution time for the created task, so as to realize interaction between the user and the server. The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
It will be appreciated that the configuration shown in fig. 4 is merely illustrative and that the electronic device may include more or fewer components than shown in fig. 4 or may have a different configuration than shown in fig. 4. The components shown in fig. 4 may be implemented in hardware, software, or a combination thereof.
The embodiment of the present application further provides a storage medium storing instructions; when the instructions are run on a computer, that is, when the computer program is executed by a processor, the method of the method embodiments is implemented. To avoid repetition, details are not repeated here.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the method of the method embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems which perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other media capable of storing program code.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A web crawler identification method, applied to a blockchain platform, comprising the following steps:
acquiring test characteristic information uploaded by a server, wherein the test characteristic information comprises public characteristic parameters when a plurality of browsers open windows to access and preset verification parameters; the verification parameters are set based on the difference between manual access to a website and automated access to a website;
acquiring browser characteristic information uploaded by a client, wherein the browser characteristic information comprises actual characteristic parameters and actual verification parameters of a browser currently accessing a client webpage;
and comparing the test characteristic information with the browser characteristic information to identify whether the operation object of the client is a web crawler.
2. The method of claim 1, wherein the preset verification parameters include a page dwell time during manual access; the actual verification parameters include the actual page dwell time of the browser currently accessing the client webpage.
3. The method according to claim 1, wherein the test characteristic information further includes an IP address range and a domain name corresponding to the server.
4. The method according to claim 1, wherein the comparing the test characteristic information with the browser characteristic information to identify whether the operation object of the client is a web crawler includes:
detecting the matching degree between the test characteristic information and the browser characteristic information; if the matching degree is higher than or equal to a preset value, determining that the current operation object of the client is not a web crawler; and if the matching degree is lower than the preset value, determining that the current operation object of the client is a web crawler.
5. The method of claim 1, further comprising:
and transmitting the comparison result to the client so that the client can judge whether to carry out request access interception according to the comparison result.
6. The method according to claim 5, wherein said transmitting the comparison result to the client comprises:
and storing the comparison result into a target block so that the client side can obtain the comparison result from the target block.
7. A web crawler recognition apparatus, characterized in that it is applied to a blockchain platform and comprises:
a first acquisition module, configured to acquire test characteristic information uploaded by a server, wherein the test characteristic information comprises public characteristic parameters when a plurality of browsers open windows to access and preset verification parameters; the verification parameters are set based on the difference between manual access to a website and automated access to a website;
a second acquisition module, configured to acquire browser characteristic information uploaded by a client, wherein the browser characteristic information comprises actual characteristic parameters and actual verification parameters of a browser currently accessing a client webpage;
and a verification comparison module, configured to compare the test characteristic information with the browser characteristic information so as to identify whether the operation object of the client is a web crawler.
8. A web crawler identification system, characterized by comprising a blockchain platform, a server and a client, wherein:
the server is configured to: upload test characteristic information to the blockchain platform, wherein the test characteristic information comprises public characteristic parameters when a plurality of browsers open windows to access and preset verification parameters; the verification parameters are set based on the difference between manual access to a website and automated access to a website;
the client is configured to: upload browser characteristic information to the blockchain platform, wherein the browser characteristic information comprises actual characteristic parameters and actual verification parameters of a browser currently accessing a client webpage;
the blockchain platform is configured to: compare the test characteristic information with the browser characteristic information to identify whether the operation object of the client is a web crawler.
9. The system of claim 8, wherein the blockchain platform is further configured to: transmitting the comparison result to the client;
the client is further configured to: determine, according to the comparison result, whether to intercept the access request.
10. A computer device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 6 when executing the computer program.
CN202210743690.4A 2022-06-27 2022-06-27 Method, device, system and equipment for identifying web crawler Pending CN115098757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210743690.4A CN115098757A (en) 2022-06-27 2022-06-27 Method, device, system and equipment for identifying web crawler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210743690.4A CN115098757A (en) 2022-06-27 2022-06-27 Method, device, system and equipment for identifying web crawler

Publications (1)

Publication Number Publication Date
CN115098757A true CN115098757A (en) 2022-09-23

Family

ID=83294286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210743690.4A Pending CN115098757A (en) 2022-06-27 2022-06-27 Method, device, system and equipment for identifying web crawler

Country Status (1)

Country Link
CN (1) CN115098757A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150542A (en) * 2023-04-21 2023-05-23 河北网新数字技术股份有限公司 Dynamic page generation method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination