WO2022067157A1 - Method and system of detecting a data-center bot interacting with a web page or other source of content
- Publication number
- WO2022067157A1 (PCT/US2021/052144)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- machine
- request
- gpu
- code
- content
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0248—Avoiding fraud
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/101—Access control lists [ACL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2133—Verifying human interaction, e.g., Captcha
Definitions
- This application relates generally to web page management, and more specifically to a system, article of manufacture and method of detecting a data-center bot interacting with a web page.
- Web traffic originating from data centers could be bot traffic programmed to masquerade as humans.
- data-center bots can be used to commit false impression counts for a web page. Advertisers may receive false impression counts and thus be defrauded for advertising payments to a website. Accordingly, improvements to detecting a data-center bot interacting with a web page can be implemented.
- a computerized method useful for detecting a data-center bot interacting with a content source includes the step of inserting a code within an API (application programming interface) or content from the content source, the step of detecting that an API request or request for the content is received from a machine, and the step of with the code and in response to the API request or request for the content, executing instructions in the code to request graphic processing unit (GPU) information of the machine, and detecting, upon return by the machine from the execution of the instructions in the code, that the machine is in a GPU not-present state, and labeling the machine as not a visually operated device.
- a computerized method useful for detecting a data-center bot interacting with a content source includes the step of inserting a code within an API (application programming interface) or content from the content source, the step of detecting that an API request or request for the content is received from a machine, and the step of, with the code, executing a function to request graphic processing unit (GPU) information of the machine, detecting, based on an output of the function, that the GPU information is missing or false, and labeling the machine as not a visually operated device.
- a computerized method useful for detecting a data-center bot interacting with a content source includes the step of inserting a code within an API (application programming interface) or content from the content source, the step of detecting that an API request or request for the content is received from a machine, and the step of, with the code, executing a function to request graphic processing unit (GPU) information of the machine, and utilizing the code, (a) when the function does not throw an error or an exception, to determine that the machine has a GPU capability set as a binary true state of the machine, or (b) when the function throws an error or an exception, to determine that the machine has a GPU capability set as a binary false state.
- the GPU capability is represented as a binary true state of the machine, the machine may be labeled as a visually operated device, and when the GPU capability is represented as a binary false state of the machine, the machine may be labeled as a not visually operated device.
- a computerized method useful for detecting a data-center bot interacting with a web page includes the step of inserting a code within a web page source.
- the computerized method includes the step of detecting that the web page is visited by a machine, wherein the machine is running a web browser to access the web page.
- the computerized method includes the step of rendering and loading the web page with the code in the web browser of the machine.
- the computerized method includes the step of, with the code, utilizing an application programming interface (API) to perform an operation on a Graphics Processing Unit (GPU) of the machine.
- Figure 1 illustrates an example system detecting a bot accessing a web page, according to some embodiments.
- Figure 2 depicts an exemplary computing system that can be configured to perform any one of the processes provided herein.
- Figure 3 is a block diagram of a sample computing environment that can be utilized to implement various embodiments.
- Figure 4 illustrates an example process for labelling a visit to a web page, according to some embodiments.
- Figure 5 illustrates an example process for script tag generation via generation server, according to some embodiments.
- Figure 6 illustrates script generation for a client side, according to some embodiments.
- Figure 7 illustrates a graphical/symbolic representation of the various steps of process 600, according to some embodiments.
- Figure 8 illustrates an example process, according to some embodiments.
- Figure 9 illustrates an example process, according to some embodiments.
- Figure 10 illustrates a graphical/symbolic representation of the various steps of process 900, according to some embodiments.
- Figure 11 illustrates an example of a snippet of code that can be inserted in an API employing WebGL or OpenGL, according to some embodiments.
- Figure 12 illustrates a computerized method useful for detecting a data-center bot interacting with a web page, according to some embodiments.
- an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
- Application programming interface (API) can specify how software components of various systems interact with each other.
- Bot can be a software agent that visits web pages or other content, via a content distribution network, such as, inter alia: a social bot, a web crawler, an Internet bot, etc.
- Graphics processing unit can be a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.
- HTML5 can be a markup language used for structuring and presenting content on the World Wide Web. It is the fifth and current version of the Hypertext Markup Language (HTML) standard.
- iframe can allow a visual HTML browser window to be split into segments, each of which can show a different document.
- RGBA stands for red green blue alpha.
- Script tag (a <script> tag) can be used to define a client-side script (e.g. with JavaScript).
- a <script> element can contain scripting statements and/or point to an external script file through the SRC attribute (used to identify the location of a resource which relates to an element).
- Example uses can be image manipulation, form validation, and dynamic changes of content.
- Web browser can be a software application for retrieving, presenting, and traversing information resources on the World Wide Web.
- WebGPU is a web standard and JavaScript API for accelerated graphics and computing that can provide various 3D graphics and computation capabilities. WebGPU exposes an API for performing operations, such as rendering and computation, on a Graphics Processing Unit.
- FIG. 1 illustrates an example system detecting a bot accessing a web page, according to some embodiments.
- System 100 can include various processes, such as processes 300-1000. These processes can be implemented by systems 200 and 300 infra.
- system 100 can detect bots accessing any web document/application running a web technology such as HTML5, running web documents, executing JavaScript code, etc.
- System 100 can paste a tag into a web document.
- the tag can be code.
- the code can analyze a machine accessing the web document and determine if it is a bot.
- System 100 can flag the machine. Other entities can utilize the flag to prevent further access to web documents.
- System 100 can look for a device marker that indicates that the machine has graphic capability (e.g. see infra).
- System 100 can use a web-based API to make a call to determine if the machine requesting access to the web document includes a graphic processing system. Based on this a value is returned. This value can be based on the type of graphics processing system and/or whether a graphics processing system is extant in the machine. If not, then system 100 can determine that the machine is not operated by a human user but a bot.
- FIG. 2 depicts an exemplary computing system 200 that can be configured to perform any one of the processes provided herein.
- computing system 200 may include, for example, a processor, memory, storage, and I/O devices (e.g., monitor, keyboard, disk drive, Internet connection, etc.).
- computing system 200 may include circuitry or other specialized hardware for carrying out some or all aspects of the processes.
- computing system 200 may be configured as a system that includes one or more units, each of which is configured to carry out some aspects of the processes either in software, hardware, or some combination thereof.
- FIG. 2 depicts computing system 200 with a number of components that may be used to perform any of the processes described herein.
- the main system 202 includes a motherboard 204 having an I/O section 206, one or more central processing units (CPU) 208, and a memory section 210, which may have a flash memory card 212 related to it.
- the I/O section 206 can be connected to a display 214, a keyboard and/or other user input (not shown), a disk storage unit 216, and a media drive unit 218.
- the media drive unit 218 can read/write a computer-readable medium 220, which can contain programs 222 and/or data.
- Computing system 200 can include a web browser.
- computing system 200 can be configured to include additional systems in order to fulfill various functionalities.
- Computing system 200 can communicate with other computing devices based on various computer communication protocols such as Wi-Fi, Bluetooth® (and/or other standards for exchanging data over short distances, including those using short-wavelength radio transmissions), USB, Ethernet, cellular, an ultrasonic local area communication protocol, etc.
- FIG. 3 is a block diagram of a sample computing environment 300 that can be utilized to implement various embodiments.
- the system 300 further illustrates a system that includes one or more client(s) 302.
- the client(s) 302 can be hardware and/or software (e.g., threads, processes, computing devices).
- the system 300 also includes one or more server(s) 304.
- the server(s) 304 can also be hardware and/or software (e.g., threads, processes, computing devices).
- One possible communication between a client 302 and a server 304 may be in the form of a data packet adapted to be transmitted between two or more computer processes.
- the system 300 includes a communication framework 310 that can be employed to facilitate communications between the client(s) 302 and the server(s) 304.
- the client(s) 302 are connected to one or more client data store(s) 306 that can be employed to store information local to the client(s).
- system 300 can instead be a collection of remote computing services constituting a cloud-computing platform.
- Figure 4 illustrates, as an example of a computerized method useful for detecting a data-center bot interacting with a content source, process 400 for labelling a visit to a web page, according to some embodiments.
- the code is inserted within the web page source.
- the web page is visited by a machine. The machine can run a web browser environment.
- the web page, together with the code from step 402, is loaded by the device.
- the code creates a hidden canvas element and executes a function to obtain GPU information of the machine.
- the code can implement the following steps.
- an HTML <canvas> element can be used to draw graphics, on the fly, via JavaScript.
- a hidden canvas element is used for the purpose of checking low level properties/capabilities. It is hidden from the user so as to not affect the user experience, or be detected by the user.
- the code can set a flag.
- the code can publish an event to other code/libraries to execute further actions.
- the visit can be labeled as invalid bot traffic.
- if the GPU information is missing, false, undefined, etc., then the code labels the visit as invalid bot traffic.
- the code labels the visit as not data-center bot traffic (e.g. web traffic originating from a data center programmed to masquerade as a human, etc.).
- the code can be a JavaScript code.
- the web page source can be an HTML5 web page document.
- the GPU information can include, inter alia: the GPU vendor, type, engine, etc.
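As a non-limiting illustration, the steps of process 400 can be sketched in JavaScript roughly as follows. The function names readGpuInfo and labelVisit are hypothetical, and the use of the WebGL WEBGL_debug_renderer_info extension to read the vendor and renderer strings is an assumption about how the GPU information might be obtained. The sketch also treats a missing extension as missing GPU information, which is a simplification.

```javascript
// Sketch of process 400 under stated assumptions. `doc` is a DOM Document
// (e.g. window.document); it is passed in so a stub can be used outside a browser.
function readGpuInfo(doc) {
  // The canvas is never attached to the page, so the user does not see it.
  const canvas = doc.createElement('canvas');
  const gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
  if (!gl) return null; // no WebGL context: GPU information is missing
  const ext = gl.getExtension('WEBGL_debug_renderer_info');
  if (!ext) return null; // assumption: no extension is treated as missing information
  const vendor = gl.getParameter(ext.UNMASKED_VENDOR_WEBGL);
  const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
  return vendor && renderer ? { vendor, renderer } : null;
}

// Labels the visit based on whether GPU information could be obtained.
function labelVisit(doc) {
  let info = null;
  try {
    info = readGpuInfo(doc);
  } catch (err) {
    info = null; // an error is treated the same as missing GPU information
  }
  return info
    ? { label: 'not data-center bot traffic', gpu: info }
    : { label: 'invalid bot traffic', gpu: null };
}
```

In a browser, labelVisit(document) would return the label together with any GPU vendor and renderer strings; the result could then be used to set a flag or publish an event to other code.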
- FIG. 5 illustrates an example process 500 for script tag generation via generation server which can be one of the servers 304, according to some embodiments. This further augments the GPU detection methodology by issuing a 'drawing challenge' to the device.
- the device receives values and must "draw a square" with a specific number of pixels. It is worth noting that only devices with GPUs may be able to do this sufficiently quickly.
- an API request received from the device is forwarded to the generation server.
- the generation server, in response to receiving the request, generates drawing challenge code. For example, the generation server generates random values for: R(ed), G(reen), B(lue), A(lpha), and Width and Height.
- the Alpha value can be the alpha compositing value.
- a generation server can be a server environment that can generate specific snippets of 'drawing challenge' code. It is noted that process 500 is optional and can be used in the case a GPU is reported.
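The random value generation of process 500 might be sketched as below. This is a minimal illustration only: the names randomInt and generateChallenge are hypothetical, the 8 to 64 pixel size bounds are assumed for the example, and Math.random stands in for whatever random source a generation server would actually use.

```javascript
// Returns an integer in [min, max]. `rand` yields a float in [0, 1).
function randomInt(min, max, rand) {
  return min + Math.floor(rand() * (max - min + 1));
}

// Generates the drawing challenge values: R, G, B, A and the box size.
// The 0..255 channel range and the 8..64 size range are illustrative assumptions.
function generateChallenge(rand = Math.random) {
  return {
    r: randomInt(0, 255, rand),
    g: randomInt(0, 255, rand),
    b: randomInt(0, 255, rand),
    a: randomInt(0, 255, rand),
    width: randomInt(8, 64, rand),
    height: randomInt(8, 64, rand),
  };
}
```

Passing the random source as a parameter keeps the sketch deterministic under test; a deployment would likely prefer a source that a bot cannot predict.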
- various countermeasures may be taken. For example, any one or more of the following counter actions may be taken: disabling the content on the machine (e.g. assuming the content has already been provided); inhibiting access by the machine to the API or content source; blacklisting a network address of the machine, etc.
- inventive methods of this disclosure have been discussed supra in the context of a web page, as an example.
- bots also access mobile applications and other content sources, particularly those that employ server-side execution or cloud execution. It should be appreciated that the aforementioned methodologies and processes can be adapted for applications other than web pages.
- Figure 6 illustrates script generation for a client side, according to some embodiments.
- the generation server creates colored boxes with values and retrieves raw pixel data.
- the generation server calculates a hash of the pixels, associates the RGBA and width/height values with the hash, and stores them.
- the generation server outputs a script with RGBA and width values for client side.
- Process 600 can include the 'server side' part of the 'drawing challenge' (e.g. the association of the RGBA + width + height values with a hash to be checked, etc.).
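The server side of the drawing challenge could look roughly like the following sketch. Several points are assumptions rather than part of the disclosure: the pixel data is computed directly instead of being rendered, the 32-bit FNV-1a hash stands in for an unspecified hash function, and the names hashPixels, solidBoxPixels, and registerChallenge are hypothetical. The sketch also ignores premultiplied-alpha rounding, which a real canvas readback can introduce when alpha is below 255.

```javascript
// FNV-1a (32-bit) over a byte sequence; a stand-in for an unspecified hash.
function hashPixels(bytes) {
  let h = 0x811c9dc5;
  for (const byte of bytes) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// Raw RGBA pixel data for a solid box: width * height pixels, 4 bytes each.
function solidBoxPixels({ r, g, b, a, width, height }) {
  const pixels = new Uint8Array(width * height * 4);
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] = r;
    pixels[i + 1] = g;
    pixels[i + 2] = b;
    pixels[i + 3] = a;
  }
  return pixels;
}

// Associates the RGBA and width/height values with the expected hash and stores them.
// `store` is a Map here; a real server would likely use persistent storage.
function registerChallenge(store, challenge) {
  const hash = hashPixels(solidBoxPixels(challenge));
  store.set(hash, { ...challenge });
  return hash;
}
```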
- Figure 7 illustrates a graphical/symbolic representation of the various steps of process 600, according to some embodiments.
- FIG. 8 illustrates an example process 800, according to some embodiments.
- a generated script is added to any HTML Page.
- This can be a publisher page or embedded (e.g. an iframe) advertisement creative HTML.
- the code is executed when the web browser and/or application loads the HTML content.
- the code has the relevant RGBA values and then generates a square with a width plus height value.
- Process 800 can include the 'client side' part of the 'drawing challenge'. The device, if it really does have a GPU, must draw the associated square, get all the pixels, and calculate a hash of the pixels.
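A client-side counterpart of the drawing challenge might be sketched as follows. The use of a 2D canvas context, the FNV-1a hash, and the names fnv1a and drawAndHash are all assumptions made for illustration; the disclosure does not specify which drawing API or hash function the generated script uses.

```javascript
// FNV-1a (32-bit) over a byte sequence; a stand-in for an unspecified hash.
function fnv1a(bytes) {
  let h = 0x811c9dc5;
  for (const byte of bytes) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, '0');
}

// Draws the square, reads back all pixels, and hashes them.
// `doc` is a DOM Document; `challenge` holds the values embedded in the generated script.
function drawAndHash(doc, { r, g, b, a, width, height }) {
  const canvas = doc.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext('2d');
  // Canvas colors take alpha in the range 0..1, so the 0..255 value is scaled.
  ctx.fillStyle = `rgba(${r}, ${g}, ${b}, ${a / 255})`;
  ctx.fillRect(0, 0, width, height);
  const pixels = ctx.getImageData(0, 0, width, height).data;
  // The hash is returned with the challenge values so the server can compare them.
  return { hash: fnv1a(pixels), r, g, b, a, width, height };
}
```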
- FIG. 9 illustrates an example process 900, according to some embodiments.
- in step 902, pixel values are derived from the generated square and hashed.
- in step 904, the hash, RGBA, and width values are sent to a generation server.
- in step 906, if there is a match, the request is flagged as "not data center bot traffic". If there is no match, the request is flagged as "data center bot traffic".
- Process 900 can be where the client and server come together. The calculated hash and the RGBA + width + height values on the client side are sent to the server and the server must determine if these values all match. If they do match, the device does have a valid GPU. If they do not match, the device is deemed to be attempting to spoof a GPU and is invalid (e.g. labeled as data center bot).
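The matching step of process 900 can be illustrated with the short sketch below. It assumes the server keeps a Map from expected hash to the associated challenge values; the name verifyChallenge and the field names are hypothetical.

```javascript
// `store` maps an expected hash to the RGBA and width/height values it was generated from.
// `submitted` is the object sent by the client: { hash, r, g, b, a, width, height }.
function verifyChallenge(store, submitted) {
  const expected = store.get(submitted.hash);
  const fields = ['r', 'g', 'b', 'a', 'width', 'height'];
  // All values must match the record stored under the submitted hash.
  const match = Boolean(expected) && fields.every((f) => expected[f] === submitted[f]);
  return match ? 'not data center bot traffic' : 'data center bot traffic';
}
```

A request whose hash is unknown, or whose values differ from the stored record, falls through to the "data center bot traffic" flag, which corresponds to the spoofed-GPU case described above.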
- Figure 10 illustrates a graphical/symbolic representation of the various steps of process 900, according to some embodiments.
- Figure 11 illustrates an example of a snippet of code 1100 that can be inserted in an API employing WebGL and/or OpenGL, according to some embodiments.
- the function can be used to obtain GPU information provided in the API (e.g., WebGL, OpenGL, etc.).
- FIG. 12 illustrates a computerized method useful for detecting a data-center bot interacting with a web page, according to some embodiments.
- process 1200 inserts a code within a web page source.
- process 1200 detects that the web page is visited by a machine. The machine is running a web browser to access the web page.
- process 1200 renders and loads the web page with the code in the web browser of the machine.
- process 1200 utilizes an application programming interface (API) to perform an operation on a Graphics Processing Unit (GPU) of the machine.
- process 1200 executes the operation to obtain GPU information of the machine.
- the API for the operation on the GPU is a WebGPU API.
- the operation can be a rendering operation on the GPU. Alternatively, the operation can be a computation operation on the GPU.
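A minimal WebGPU-based probe in the spirit of process 1200 might look like the sketch below. It is an illustration under assumptions: the name probeWebGpu is hypothetical, the "operation" is reduced to requesting an adapter rather than performing a full rendering or computation pass, and the adapter's info attribute is read defensively because not every browser exposes it.

```javascript
// `nav` is the browser's `navigator` object; it is passed in so a stub can be used.
async function probeWebGpu(nav) {
  const none = { hasGpu: false, vendor: null };
  if (!nav || !nav.gpu) return none; // WebGPU is not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    if (!adapter) return none; // no suitable GPU adapter was found
    const vendor = adapter.info && adapter.info.vendor ? adapter.info.vendor : null;
    return { hasGpu: true, vendor };
  } catch (err) {
    return none; // an error is treated like missing GPU information
  }
}
```

In a browser this would be called as probeWebGpu(navigator), and the resolved value could feed the same labelling logic used for the other embodiments.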
- the various devices, modules, etc. described herein can be enabled and operated using hardware circuitry, firmware, software or any combination of hardware, firmware, and software (e.g., embodied in a machine-readable medium).
- the various operations, processes, and methods disclosed herein can be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and can be performed in any order (e.g., including using means for achieving the various operations). Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
- the machine-readable medium can be a non-transitory form of machine-readable medium.
Abstract
In one aspect, a computerized method useful for detecting a data-center bot interacting with a web page includes the step of inserting a code within a web page source. The computerized method includes the step of detecting that the web page is visited by a machine, wherein the machine is running a web browser to access the web page. The computerized method includes the step of rendering and loading the web page with the code in the web browser of the machine. The code utilizes an API to perform an operation on a GPU of the machine.
Description
METHOD AND SYSTEM OF DETECTING A DATA-CENTER BOT INTERACTING WITH A WEB PAGE OR OTHER SOURCE OF CONTENT
BACKGROUND
Field of the Invention:
[0001] This application relates generally to web page management, and more specifically to a system, article of manufacture and method of detecting a data-center bot interacting with a web page.
Description of the Related Art:
[0002] Web traffic originating from data centers could be bot traffic programmed to masquerade as humans. For example, data-center bots can be used to commit false impression counts for a web page. Advertisers may receive false impression counts and thus be defrauded for advertising payments to a website. Accordingly, improvements to detecting a data-center bot interacting with a web page can be implemented.
BRIEF SUMMARY OF THE INVENTION
[0003] In an inventive aspect, a computerized method useful for detecting a data-center bot interacting with a content source includes the step of inserting a code within an API (application programming interface) or content from the content source, the step of detecting that an API request or request for the content is received from a machine, and the step of with the code and in response to the API request or request for the content, executing instructions in the code to request graphic processing unit (GPU) information of the machine, and detecting, upon return by the machine from the execution of the instructions in the code, that the machine is in a GPU not-present state, and labeling the machine as not a visually operated device.
[0004] In another inventive aspect, a computerized method useful for detecting a data-center bot interacting with a content source includes the step of inserting a code within an API (application programming interface) or content from the content source, the step of detecting that an API request or request for the content is received from a machine, and the step of, with the code, executing a function to request graphic processing unit (GPU) information of the machine, detecting, based on an output of the function, that the GPU information is missing or false, and labeling the machine as not a visually operated device.
[0005] In another inventive aspect, a computerized method useful for detecting a data-center bot interacting with a content source includes the step of inserting a code within an API (application programming interface) or content from the content source, the step of detecting that an API request or request for the content is received from a machine, and the step of, with the code, executing a function to request graphic processing unit (GPU) information of the machine, and utilizing the code, (a) when the function does not throw an error or an exception, to determine that the machine has a GPU capability set as a binary true state of the machine, or (b) when the function throws an error or an exception, to determine that the machine has a GPU capability set as a binary false state. When the GPU capability is represented as a binary true state of the machine, the machine may be labeled as a visually operated device, and when the GPU capability is represented as a binary false state of the machine, the machine may be labeled as a not visually operated device.
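By way of a non-limiting illustration, the binary determination of paragraph [0005] can be sketched in JavaScript as follows. The names gpuCapability, labelMachine, and probe are hypothetical and do not appear in the disclosure; probe stands for any function that requests GPU information.

```javascript
// `probe` is any function that requests GPU information of the machine.
// No error or exception -> binary true state; error or exception -> binary false state.
function gpuCapability(probe) {
  try {
    probe();
    return true;
  } catch (err) {
    return false;
  }
}

// Maps the binary GPU capability state to a label for the machine.
function labelMachine(hasGpu) {
  return hasGpu ? 'visually operated device' : 'not a visually operated device';
}
```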
[0006] In still yet another inventive aspect, a computerized method useful for detecting a data-center bot interacting with a web page includes the step of inserting a code within a web page source. The computerized method includes the step of detecting that the web page is visited by a machine, wherein the machine is running a web browser to access the web page. The computerized method includes the step of rendering and loading the web page with the code in the web browser of the machine. The computerized method includes the step of, with the code, utilizing an application programming interface (API) to perform an operation on a Graphics Processing Unit (GPU) of the machine.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Figure 1 illustrates an example system detecting a bot accessing a web page, according to some embodiments.
[0008] Figure 2 depicts an exemplary computing system that can be configured to perform any one of the processes provided herein.
[0009] Figure 3 is a block diagram of a sample computing environment that can be utilized to implement various embodiments.
[0010] Figure 4 illustrates an example process for labelling a visit to a web page, according to some embodiments.
[0011] Figure 5 illustrates an example process for script tag generation via generation server, according to some embodiments.
[0012] Figure 6 illustrates script generation for a client side, according to some embodiments.
[0013] Figure 7 illustrates a graphical/symbolic representation of the various steps of process 600, according to some embodiments.
[0014] Figure 8 illustrates an example process, according to some embodiments.
[0015] Figure 9 illustrates an example process, according to some embodiments.
[0016] Figure 10 illustrates a graphical/symbolic representation of the various steps of process 900, according to some embodiments.
[0017] Figure 11 illustrates an example of a snippet of code that can be inserted in an API employing WebGL or OpenGL, according to some embodiments.
[0018] Figure 12 illustrates a computerized method useful for detecting a data-center bot interacting with a web page, according to some embodiments.
[0019] The Figures described above are a representative set, and are not exhaustive with respect to embodying the invention.
DESCRIPTION
[0020] Disclosed are a system, method, and article of manufacture for detecting a data-center bot interacting with a web page or other source of content. The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein can be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the various embodiments.
[0021] Reference throughout this specification to 'one embodiment,' 'an embodiment,' 'one example,' or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases 'in one embodiment,' 'in an embodiment,' and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
[0022] Furthermore, the described features, structures, or characteristics of the invention may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided, such as examples of programming, software modules, user selections, network transactions, database queries, database structures, hardware modules, hardware circuits, hardware chips, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art can recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, materials, and so forth. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
[0023] The schematic flow chart diagrams included herein are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of one embodiment of the presented method. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated method. Additionally, the format and symbols employed are provided to explain the logical steps of the method and are understood not to limit the scope of the method. Although various arrow types and line types may be employed in the flow chart diagrams, they are understood not to limit the scope of the corresponding method. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the method. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted method. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
[0024] DEFINITIONS
[0025] Example definitions for some embodiments are now provided.
[0026] Application programming interface (API) can specify how software components of various systems interact with each other.
[0027] Bot can be a software agent that visits web pages or other content, via a content distribution network, such as, inter alia: a social bot, a web crawler, an Internet bot, etc.
[0028] Graphics processing unit (GPU) can be a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.
[0029] HTML5 can be a markup language used for structuring and presenting content on the World Wide Web. It is the fifth and current version of the Hypertext Markup Language (HTML) standard.
[0030] iframe can allow a visual HTML browser window to be split into segments, each of which can show a different document.
[0031] RGBA stands for red green blue alpha.
[0032] Script tag (a <script> tag) can be used to define a client-side script (e.g. with JavaScript). A <script> element can contain scripting statements and/or point to an external script file through the SRC attribute (used to identify the location of a resource which relates to an element). Example uses can be image manipulation, form validation, and dynamic changes of content.
[0033] Web browser can be a software application for retrieving, presenting, and traversing information resources on the World Wide Web.
[0034] WebGPU is a web standard and JavaScript API for accelerated graphics and computing that can provide various 3D graphics and computation capabilities. WebGPU exposes an API for performing operations, such as rendering and computation, on a Graphics Processing Unit.
[0035] EXAMPLE SYSTEMS
[0036] Figure 1 illustrates an example system for detecting a bot accessing a web page, according to some embodiments. System 100 can include various processes, such as processes 300-1000. These processes can be implemented by systems 200 and 300 infra. In addition to bot detection with a web page, system 100 can detect bots accessing any web document/application running a web technology such as HTML5, running web documents, executing JavaScript code, etc. System 100 can paste a tag into a web document. The tag can be code. The code can analyze a machine accessing the web document and determine if it is a bot. System 100 can flag the machine. Other entities can utilize the flag to prevent further access to web documents. System 100 can look for a device marker that indicates that the machine has graphics capability (e.g. see infra). System 100 can use a web-based API to make a call to determine if the machine requesting access to the web document includes a graphics processing system. Based on this call, a value is returned. This value can be based on the type of graphics processing system and/or whether a graphics processing system is present in the machine. If no graphics processing system is present, then system 100 can determine that the machine is operated not by a human user but by a bot.
[0037] Figure 2 depicts an exemplary computing system 200 that can be configured to perform any one of the processes provided herein. In this context, computing system 200 may include, for example, a processor, memory, storage, and I/O devices (e.g., monitor, keyboard, disk drive, Internet connection, etc.). However, computing system 200 may include circuitry or other specialized hardware for carrying out some or all aspects of the processes. In some operational settings, computing system 200 may be configured as a system that includes one or more units, each of which is configured to carry out some aspects of the processes either in software, hardware, or some combination thereof.
[0038] Figure 2 depicts computing system 200 with a number of components that may be used to perform any of the processes described herein. The main system 202 includes a motherboard 204 having an I/O section 206, one or more central processing units (CPU) 208, and a memory section 210, which may have a flash memory card 212 related to it. The I/O section 206 can be connected to a display 214, a keyboard and/or other user input (not shown), a disk storage unit 216, and a media drive unit 218. The media drive unit 218 can read/write a computer-readable medium 220, which can contain programs 222 and/or data. Computing system 200 can include a web browser. Moreover, it is noted that computing system 200 can be configured to include additional systems in order to fulfill various functionalities. Computing system 200 can communicate with other computing devices based on various computer communication protocols such as Wi-Fi, Bluetooth® (and/or other standards for exchanging data over short distances, including those using short-wavelength radio transmissions), USB, Ethernet, cellular, an ultrasonic local area communication protocol, etc.
[0039] Figure 3 is a block diagram of a sample computing environment 300 that can be utilized to implement various embodiments. The system 300 further illustrates a system that includes one or more client(s) 302. The client(s) 302 can be hardware and/or software (e.g., threads, processes, computing devices). The system 300 also includes one or more server(s) 304. The server(s) 304 can also be hardware and/or software (e.g., threads, processes, computing devices). One possible communication between a client 302 and a server 304 may be in the form of a data packet adapted to be transmitted between two or more computer processes. The system 300 includes a communication framework 310 that can be employed to facilitate communications between the client(s) 302 and the server(s) 304. The client(s) 302 are connected to one or more client data store(s) 306 that can be employed to store information local to the client(s) 302. Similarly, the server(s) 304 are connected to one or more server data store(s) 308 that can be employed to store information local to the server(s) 304. In some embodiments, system 300 can instead be a collection of remote computing services constituting a cloud-computing platform.
[0040] EXAMPLE METHODS AND PROCESSES
[0041] Figure 4 illustrates, as an example of a computerized method useful for detecting a data-center bot interacting with a content source, process 400 for labelling a visit to a web page, according to some embodiments. In step 402, the code is inserted within the web page source. In step 404, the web page is visited by a machine. The machine can be any machine that can run a web browser environment. In step 406, the web page, with the code from step 402, is loaded by the device. In step 408, the code creates a hidden canvas element and executes a function to obtain GPU information of the machine. It is noted that an HTML <canvas> element can be used to draw graphics, on the fly, via JavaScript. A hidden canvas element is used for the purpose of checking low-level properties/capabilities. It is hidden from the user so as to not affect the user experience or be detected by the user. In step 410, if the function throws an error/exception, the code can implement the following steps. The code can set a flag. The code can publish an event to other code/libraries to execute further actions. The visit can be labeled as invalid bot traffic. In step 412, if the GPU information is missing, false, undefined, etc., then the code labels the visit as invalid bot traffic. In step 414, if the GPU information is present, the code labels the visit as not data-center bot traffic (data-center bot traffic being, e.g., web traffic originating from a data center programmed to masquerade as a human, etc.). The code can be a JavaScript code. The web page source can be an HTML5 web page document. The GPU information can include, inter alia: the GPU vendor, type, engine, etc.
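By way of a non-limiting illustration, the labelling logic of steps 408-414 might be sketched as follows. The sketch assumes the GPU information is read through the WebGL `WEBGL_debug_renderer_info` extension; the function names `readGpuInfo` and `labelVisit`, the `doc` parameter (any object that behaves like the browser `document`), and the shape of the returned object are hypothetical and are not taken from the specification or its figures.

```javascript
// Illustrative sketch only. `doc` is any document-like object exposing
// createElement(); in a browser this would be the global `document`.
function readGpuInfo(doc) {
  // Step 408: create a canvas that is never attached to the page, so the
  // user never sees it.
  const canvas = doc.createElement("canvas");
  const gl = canvas.getContext("webgl") || canvas.getContext("experimental-webgl");
  if (!gl) return null; // no WebGL context: treat GPU information as missing

  // The extension exposes the unmasked vendor and renderer strings.
  const ext = gl.getExtension("WEBGL_debug_renderer_info");
  if (!ext) return null;
  const vendor = gl.getParameter(ext.UNMASKED_VENDOR_WEBGL);
  const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
  return vendor && renderer ? { vendor, renderer } : null;
}

function labelVisit(doc) {
  let info;
  try {
    info = readGpuInfo(doc);
  } catch (err) {
    // Step 410: the function threw an error/exception.
    return { label: "invalid bot traffic", reason: "exception", gpu: null };
  }
  if (!info) {
    // Step 412: GPU information is missing, false, undefined, etc.
    return { label: "invalid bot traffic", reason: "missing", gpu: null };
  }
  // Step 414: GPU information is present.
  return { label: "not data-center bot traffic", reason: "present", gpu: info };
}
```

In a browser the sketch could be invoked as `labelVisit(document)`. Some browsers restrict or mask this extension for privacy reasons, so a deployment would likely need to treat a missing value with care.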
[0042] Figure 5 illustrates an example process 500 for script tag generation via a generation server, which can be one of the servers 304, according to some embodiments. This further augments the GPU detection methodology by issuing a 'drawing challenge' to the device. The device receives values and must "draw a square" with a specific number of pixels. It is worth noting that only devices with GPUs may be able to do this sufficiently quickly. In step 502, an API request received from the device is forwarded to the generation server. In step 504, the generation server, in response to receiving the request, generates drawing challenge code. For example, the generation server then generates random values for: R(ed), G(reen), B(lue), A(lpha), and (Width and Height). The Alpha value can be the alpha compositing value. A generation server can be a server environment that can generate specific snippets of 'drawing challenge' code. It is noted that process 500 is optional and can be used in the case where a GPU is reported.
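As one hedged example of step 504, the random challenge values might be produced as sketched below. The function name `generateChallenge`, the injectable `randomInt` helper, the byte range used for alpha, and the 1-64 pixel bounds on width and height are all assumptions made for illustration; the specification does not state ranges or a random source.

```javascript
// Illustrative sketch only. Returns an integer in [min, max]; Math.random is
// not cryptographically secure and is used here purely for demonstration.
function defaultRandomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Step 504: generate random R, G, B, A, width, and height values.
// `randomInt` can be replaced to make the output deterministic for testing.
function generateChallenge(randomInt = defaultRandomInt) {
  return {
    r: randomInt(0, 255),
    g: randomInt(0, 255),
    b: randomInt(0, 255),
    a: randomInt(0, 255), // alpha expressed as a byte (assumed representation)
    width: randomInt(1, 64), // assumed bounds, in pixels
    height: randomInt(1, 64),
  };
}
```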
[0043] Once it is determined that the machine seeking access to the web page or other content is a data-center bot, or some other type of bot, various countermeasures may be taken. For example, any one or more of the following counter actions may be taken: disabling the content on the machine (e.g. assuming the content has already been provided); inhibiting access by the machine to the API or content source; blacklisting a network address of the machine, etc.
[0044] Further, the inventive methods of this disclosure have been discussed supra in the context of a web page, as an example. However, bots also access mobile applications and other content sources, particularly those that employ server-side execution or cloud execution. It should be appreciated that the aforementioned methodologies and processes can be adapted for applications other than web pages.
[0045] Figure 6 illustrates script generation for a client side, according to some embodiments. In step 602, the generation server creates colored boxes with the values and retrieves raw pixel data. In step 604, the generation server calculates a hash from the pixels, associates the RGBA and width/height values with the hash, and stores the association. In step 608, the generation server outputs a script with the RGBA and width values for the client side. Process 600 can include the 'server side' part of the 'drawing challenge' (e.g. the association of the RGBA + width + height values with a hash to be checked, etc.).
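The server-side association described for process 600 might, under stated assumptions, look like the following sketch. The specification does not name a hash function, so 32-bit FNV-1a is used here purely as a stand-in; `boxPixels`, `registerChallenge`, and the `Map`-based store are hypothetical names, and the pixel buffer is built directly in memory rather than through a real rendering surface.

```javascript
// Illustrative sketch only. 32-bit FNV-1a over a byte sequence, returned as
// an 8-character hex string (an assumed stand-in for the unspecified hash).
function fnv1a(bytes) {
  let h = 0x811c9dc5;
  for (const byte of bytes) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Step 602: raw RGBA pixel data for a solid box of width x height pixels.
function boxPixels({ r, g, b, a, width, height }) {
  const pixels = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] = r;
    pixels[i + 1] = g;
    pixels[i + 2] = b;
    pixels[i + 3] = a;
  }
  return pixels;
}

// Steps 604-608: hash the pixels, store the hash with its values, and return
// the values that would be embedded in the client-side script.
function registerChallenge(store, challenge) {
  const hash = fnv1a(boxPixels(challenge));
  store.set(hash, { ...challenge });
  return { hash, clientParams: { ...challenge } };
}
```

A real canvas may round partially transparent colors during compositing, so a deployment that compares client pixels against this buffer would likely need to account for that, for instance by using fully opaque colors.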
[0046] Figure 7 illustrates a graphical/symbolic representation of the various steps of process 600, according to some embodiments.
[0047] Figure 8 illustrates an example process 800, according to some embodiments. In step 802, a generated script is added to any HTML Page. This can be a publisher page or embedded (e.g. an iframe) advertisement creative HTML. In step 804, the code is executed when the web browser and/or application loads the HTML content. In step 806, the code has the relevant RGBA values and then generates a square with a width plus height value. Process 800 can include the 'client side' part of the 'drawing challenge'. The device, if it really does have a GPU, must draw the associated square, get all the pixels, and calculate a hash of the pixels.
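A minimal sketch of the client-side part of the drawing challenge follows, assuming the square is drawn with the HTML Canvas 2D API and read back with `getImageData`. The name `solveChallenge`, the `doc` parameter (a document-like object), and the use of 32-bit FNV-1a as the hash are assumptions for illustration; the specification does not identify the drawing API or hash it uses, and whether a given browser performs this drawing on the GPU is implementation dependent.

```javascript
// Illustrative sketch only. 32-bit FNV-1a over a byte sequence (assumed hash).
function fnv1a(bytes) {
  let h = 0x811c9dc5;
  for (const byte of bytes) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Step 806: draw the square, read back every pixel, and hash the pixels.
// `doc` is any document-like object; in a browser, the global `document`.
function solveChallenge(doc, { r, g, b, a, width, height }) {
  const canvas = doc.createElement("canvas");
  canvas.width = width;
  canvas.height = height;
  const ctx = canvas.getContext("2d");
  // CSS rgba() takes alpha in [0, 1]; the challenge carries it as a byte.
  ctx.fillStyle = `rgba(${r}, ${g}, ${b}, ${a / 255})`;
  ctx.fillRect(0, 0, width, height);
  const pixels = ctx.getImageData(0, 0, width, height).data;
  return { hash: fnv1a(pixels), r, g, b, a, width, height };
}
```

The returned object carries both the hash and the original values, which matches the description that the hash and the RGBA and size values are sent back together for checking.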
[0048] Figure 9 illustrates an example process 900, according to some embodiments. In step 902, pixel values are derived from the generated square and hashed. In step 904, the hash, RGBA, and width values are sent to a generation server. In step 906, if there is a match, the request is flagged as "not data center bot traffic". If there is no match, the request is flagged as "data center bot traffic". Process 900 can be where the client and server come together. The calculated hash and the RGBA + width + height values on the client side are sent to the server and the server must determine if these values all match. If they do match, the device does have a valid GPU. If they do not match, the device is deemed to be attempting to spoof a GPU and is invalid (e.g. labeled as data center bot). Figure 10 illustrates a graphical/symbolic representation of the various steps of process 900, according to some embodiments.
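The matching of step 906 might be sketched as follows, assuming the server keeps a `Map` from each stored hash to the values it was generated from. The name `verifySubmission` and the layout of the `submission` object are hypothetical.

```javascript
// Illustrative sketch only. `store` maps a stored hash to the values
// { r, g, b, a, width, height } it was generated from (assumed layout).
function verifySubmission(store, submission) {
  const expected = store.get(submission.hash);
  const fields = ["r", "g", "b", "a", "width", "height"];
  // Every value must match the record associated with the submitted hash.
  const match =
    expected !== undefined &&
    fields.every((field) => expected[field] === submission[field]);
  return match ? "not data center bot traffic" : "data center bot traffic";
}
```

An unknown hash and a known hash paired with altered values both produce the "data center bot traffic" label, which corresponds to the no-match branch described above.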
[0049] Figure 11 illustrates an example of a snippet of code 1100 that can be inserted in an API employing WebGL and/or OpenGL, according to some embodiments. The function can be used to obtain GPU information provided in the API (e.g., WebGL, OpenGL, etc.).
[0050] Figure 12 illustrates a computerized method useful for detecting a data-center bot interacting with a web page, according to some embodiments. In step 1202, process 1200 inserts a code within a web page source. In step 1204, process 1200 detects that the web page is visited by a machine. The machine is running a web browser to access the web page. In step 1206, process 1200 renders and loads the web page with the code in the web browser of the machine. In step 1208, with the code, process 1200 utilizes an application programming interface (API) to perform an operation on a Graphics Processing Unit (GPU) of the machine. In step 1210, with the code, process 1200 executes the operation to obtain GPU information of the machine. The API for the operation on the GPU is a WebGPU API. The operation can be a rendering operation on the GPU. Alternatively, the operation can be a computation operation on the GPU.
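As a hedged illustration of steps 1208-1210, the sketch below requests a WebGPU adapter and reads its descriptive information. It covers only the information-gathering step, not a rendering or computation pass. The function name `getWebGpuInfo` and the `nav` parameter (a navigator-like object) are hypothetical, and the sketch assumes a browser that exposes the `info` attribute on the adapter.

```javascript
// Illustrative sketch only. `nav` is any navigator-like object; in a browser
// this would be the global `navigator`.
async function getWebGpuInfo(nav) {
  // WebGPU is not exposed at all: treat GPU information as missing.
  if (!nav || !nav.gpu) return null;

  // requestAdapter() resolves to null when no suitable adapter is available.
  const adapter = await nav.gpu.requestAdapter();
  if (!adapter) return null;

  // Adapter information carries vendor and architecture strings.
  const info = adapter.info;
  if (!info) return null;
  return { vendor: info.vendor, architecture: info.architecture };
}
```

In a browser the sketch could be called as `getWebGpuInfo(navigator)`, with a `null` result handled in the same way as missing GPU information in process 400.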
[0051] CONCLUSION
[0052] Although the present embodiments have been described with reference to specific example embodiments, various modifications and changes can be made to these embodiments without departing from the broader spirit and scope of the various embodiments. For example, the various devices, modules, etc. described herein can be enabled and operated using hardware circuitry, firmware, software or any combination of hardware, firmware, and software (e.g., embodied in a machine-readable medium).
[0053] In addition, it can be appreciated that the various operations, processes, and methods disclosed herein can be embodied in a machine-readable medium and/or a machine accessible medium compatible with a data processing system (e.g., a computer system), and can be performed in any order (e.g., including using means for achieving the various operations). Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. In some embodiments, the machine-readable medium can be a non-transitory form of machine-readable medium.
Claims
1. A computerized method useful for detecting a data-center bot interacting with a content source, the method comprising:
(a) inserting a code within an API (application programming interface) or content from the content source;
(b) detecting that an API request, or a request for the content, has been received from a machine; and
(c) with the code and in response to the API request or request for the content, executing instructions in the code to request graphic processing unit (GPU) information of the machine, and detecting, upon return by the machine from the execution of the instructions in the code, that the machine is in a GPU not- present state, and labeling the machine as not a visually operated device.
2. The computerized method of claim 1, further comprising: determining that the API request or request for the content came from a bot, when the GPU information is missing upon return by the machine from the execution of the instructions in the code that requests the GPU information.
3. The computerized method of claim 1, further comprising: determining that the API request or request for the content came from a bot, when the GPU information returned by the machine from the execution of the instructions in the code that requests the GPU information is false.
4. The computerized method of claim 1, further comprising: determining that the API request or request for the content came from a bot, when the GPU information returned by the machine from the execution of the instructions does not include one or more pre-defined information that constitutes an acceptable answer to the request for the GPU information.
5. The computerized method of claim 1, further comprising: determining that the API request or request for the content came from a bot when an exception or error is returned by the machine from the execution of the instructions.
6. The computerized method of claim 1, wherein the instructions in the code for requesting GPU information of the machine corresponds to an OpenGL function provided by the API.
7. A computerized method useful for detecting a data-center bot interacting with a content source, the method comprising:
(a) inserting a code within an API (application programming interface) or content from the content source;
(b) detecting that an API request or request for the content is received from a machine; and
(c) with the code, executing a function to request graphic processing unit (GPU) information of the machine, detecting, based on an output of the function, that the GPU information is missing or false, and labeling the machine as not a visually operated device.
8. The computerized method of claim 7, wherein the content into which the code is inserted in (a) comprises an HTML5 web page document, and the code inserted in (a) comprises an HTML <canvas> element used by the code to draw graphics via JavaScript, and wherein in (c) and with the code, a JavaScript code is executed to create a hidden canvas element, prior to requesting graphic processing unit (GPU) information of the machine.
9. The computerized method of claim 7, wherein the function executed in (c) to request graphic processing unit (GPU) information of the machine is an OpenGL function provided by the API.
10. The computerized method of claim 7, further comprising: determining that the API request or request for the content came from a bot when the GPU information is missing from the output of the function.
11. The computerized method of claim 7, further comprising: determining that the API request or request for the content came from a bot, when the GPU information returned by the machine is false.
12. The computerized method of claim 7, further comprising: determining that the API request or request for the content came from a bot, when the GPU information returned by the machine does not include one or more predefined information that constitutes an acceptable answer to the request for the GPU information.
13. The computerized method of claim 7, further comprising: determining that the API request or request for the content came from a bot when an exception or error is returned by the function.
14. The computerized method of claim 7, further comprising: disabling the content on the machine, if it is determined in (c) based on the output of the function that the GPU information is missing or false.
15. The computerized method of claim 7, further comprising: inhibiting access by the machine to the API or content source, if it is determined in (c) based on the output of the function that the GPU information is missing or false.
16. The computerized method of claim 7, further comprising: blacklisting a network address of the machine, if it is determined in (c) based on the output of the function that the GPU information is missing or false.
17. A computerized method useful for detecting a data-center bot interacting with a web page comprising: inserting a code within a web page source; detecting that the web page is visited by a machine, wherein the machine is running a web browser to access the web page; rendering and loading the web page with the code in the web browser of the machine; with the code, utilizing an application programming interface (API) to perform an operation on a Graphics Processing Unit (GPU) of the machine; and with the code, executing the operation to obtain a GPU information of the machine.
18. The computer method of claim 17, wherein the API for the operation on the GPU comprises a WebGPU API.
19. The computer method of claim 17, wherein the operation comprises a rendering operation on the GPU.
20. The computer method of claim 17, wherein the operation comprises a computation operation on the GPU.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/033,906 | 2020-09-27 | ||
US17/033,906 US20210144156A1 (en) | 2017-07-07 | 2020-09-27 | Method and system of detecting a data-center bot interacting with a web page or other source of content |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022067157A1 (en) | 2022-03-31 |
Family
ID=80846918
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/052144 WO2022067157A1 (en) | 2020-09-27 | 2021-09-27 | Method and system of detecting a data-center bot interacting with a web page or other source of content |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022067157A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100262457A1 (en) * | 2009-04-09 | 2010-10-14 | William Jeffrey House | Computer-Implemented Systems And Methods For Behavioral Identification Of Non-Human Web Sessions |
US20150112892A1 (en) * | 2012-10-18 | 2015-04-23 | Daniel Kaminsky | System and method for detecting classes of automated browser agents |
US20180322270A1 (en) * | 2017-05-05 | 2018-11-08 | Mastercard Technologies Canada ULC | Systems and methods for distinguishing among human users and software robots |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 04/09/2023) |