CN113609411A - Method for crawling page information through web crawler - Google Patents

Info

Publication number
CN113609411A
Authority
CN
China
Prior art keywords
page
webpage
information
crawling
url
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110710973.4A
Other languages
Chinese (zh)
Other versions
CN113609411B (en)
Inventor
王嵩森
杜邦豪
刘加勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huayuan Information Technology Co Ltd
Original Assignee
Beijing Huayuan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huayuan Information Technology Co Ltd
Priority to CN202110710973.4A
Publication of CN113609411A
Application granted
Publication of CN113609411B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the present disclosure provide a method, apparatus, device, and computer-readable storage medium for crawling page information by a web crawler. The method comprises: obtaining corresponding webpage information according to a webpage loading request; analyzing the webpage information and determining the url information contained in the webpage; and crawling the page information corresponding to the urls contained in the webpage. In this way, the method is suitable for crawling information from current websites in general, and anti-crawling mechanisms are effectively evaded.

Description

Method for crawling page information through web crawler
Technical Field
Embodiments of the present disclosure relate generally to the field of data processing, and more particularly, to a method, apparatus, device, and computer-readable storage medium for crawling page information by a web crawler.
Background
Many kinds of crawlers are based on the Python language; the most widely used at present are the following:
requests + Beautiful Soup (synchronous);
using current.
using aiohttp + asyncio + requests + Beautiful Soup (asynchronous);
using the Scrapy framework.
The above methods have been able to acquire webpage information effectively in the past. In recent years, however, with the rapid development of the network, webpage formats have changed: various web frameworks (such as Vue) have emerged and continue to mature, and anti-crawler and bot-detection measures have been steadily strengthened, which makes it harder to acquire the relevant information quickly and accurately. The above methods have serious shortcomings in information acquisition, deep crawling, and evading anti-crawling measures, and cannot keep up with the subsequent development of crawlers.
Disclosure of Invention
According to an embodiment of the present disclosure, a scheme for crawling page information by a web crawler is provided.
In a first aspect of the disclosure, a method of crawling page information by a web crawler is provided.
The method comprises the following steps:
acquiring corresponding webpage information according to the webpage loading request;
analyzing the webpage information and determining url information contained in the webpage;
and crawling page information corresponding to url contained in the webpage.
Further, the acquiring the corresponding webpage information according to the webpage loading request includes:
capturing a webpage loading request through Hook, and acquiring a url corresponding to the webpage loading request;
and acquiring webpage information through the url.
Further, the analyzing the web page information and determining url information included in the web page includes:
if the webpage comprises click events, triggering all the click events one by one, refreshing the webpage, capturing the webpage loading request through Hook, and acquiring the url information corresponding to the click events.
Further, the crawling of the page information corresponding to the url contained in the webpage includes:
crawling the page information through Ajax rendering;
the page information comprises the requests and tags of the page and/or the requests and tags of deeper-level pages.
Further, the method also includes:
if the click event is triggered, monitoring the time for crawling page information;
if the crawling time exceeds a preset threshold, judging that the current page is a special page;
analyzing the special page to determine the type of the special page; the types of the special pages comprise verifiable pages and non-verifiable pages;
if the special page is a verifiable page, simulating manual input to complete verification and/or login, and crawling the page information; the verifiable page comprises account password verification, verification code verification and/or character verification;
and if the special page is a non-verifiable page, performing bug processing.
Further, the method also includes:
if the urls contained in the current page trigger jumps to a plurality of pages, crawling the page information of the plurality of pages concurrently according to a preset concurrency number and crawling depth.
In a second aspect of the disclosure, an apparatus for crawling page information by a web crawler is provided.
The device includes:
the acquisition module is used for acquiring corresponding webpage information according to the webpage loading request;
the analysis module is used for analyzing the webpage information and determining url information contained in the webpage;
and the crawling module is used for crawling the page information corresponding to the url contained in the webpage.
In a third aspect of the disclosure, an electronic device is provided. The electronic device includes: a memory having a computer program stored thereon and a processor implementing the method as described above when executing the program.
In a fourth aspect of the present disclosure, a computer readable storage medium is provided, having stored thereon a computer program, which when executed by a processor, implements a method as in accordance with the first aspect of the present disclosure.
According to the method for crawling page information by a web crawler provided by the embodiments of the present disclosure, corresponding webpage information is acquired according to the webpage loading request; the webpage information is analyzed and the urls contained in the webpage are determined; and the page information corresponding to the urls contained in the webpage is crawled. Anti-crawling mechanisms can be effectively evaded, the required information can be crawled quickly and accurately, information can be crawled from any website, and the need to support multiple web frameworks is met. Meanwhile, while information is being crawled continuously, the request frequency can be analyzed and adjusted continuously according to how the urls are processed, which avoids risks such as stalls, crashes, and IP blocking caused by monitoring on the website side.
It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference characters designate like or similar elements, and wherein:
FIG. 1 illustrates a schematic diagram of an exemplary operating environment in which embodiments of the present disclosure can be implemented;
FIG. 2 illustrates a flow diagram of a method of crawling page information by a web crawler according to an embodiment of the present disclosure;
FIG. 3 illustrates a block diagram of an apparatus for crawling page information by a web crawler according to an embodiment of the present disclosure;
FIG. 4 illustrates a block diagram of an exemplary electronic device capable of implementing embodiments of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
In addition, the term "and/or" herein merely describes an association relationship between associated objects and means that three kinds of relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
FIG. 1 illustrates a schematic diagram of an exemplary operating environment 100 in which embodiments of the present disclosure can be implemented. Included in the operating environment 100 are a client 101, a network 102, and a server 103.
It should be understood that the numbers of clients, networks, and servers in FIG. 1 are merely illustrative. There may be any number of clients, networks, and servers, as required by the implementation. In particular, where the target data does not need to be acquired remotely, the above system architecture may include no network but only a client or a server.
FIG. 2 shows a flowchart 200 of a method for crawling page information by a web crawler according to an embodiment of the present disclosure. As shown in FIG. 2, the method for crawling page information by a web crawler includes:
S210, acquiring corresponding webpage information according to the webpage loading request.
In this embodiment, an execution subject (for example, a server shown in fig. 1) of the method for crawling page information by a web crawler may acquire the page information in a wired manner or a wireless connection manner.
Optionally, because a conventional crawler cannot accurately and effectively acquire complete data when a web page is rendered by a newer front-end framework (such as Vue), in the present disclosure a headless browser is used to crawl the required page information.
Optionally, in the present disclosure, a plurality of new URLs may be formed based on the initial URL in the crawler crawling task, that is, the present disclosure may decompose the initial URL into a plurality of URLs, and may form one crawler crawling task object for each new URL.
Optionally, the maximum number of new URLs decomposed from the initial URL may be limited, that is, the maximum number of crawler crawling task objects corresponding to one crawler crawling task may be limited (a concurrency limit). For example, when it is preset that one crawler crawling task can form 50 crawler crawling task objects at a time, the initial URL in the crawler crawling task is decomposed into 50 new URLs, namely URL_0, URL_1, ..., URL_49, and one crawler crawling task object is formed for each of them, giving 50 crawler crawling task objects: crawler crawling task object 0, crawler crawling task object 1, ..., crawler crawling task object 49.
Alternatively, the new URL formed based on the initial URL may be a URL of a primary page or a URL of a secondary page. Namely, the current page can be crawled, and the sub-links of the current page can also be deeply crawled.
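The decomposition described above can be sketched as follows. This is only an illustration under stated assumptions: the names `CrawlTaskObject` and `build_task_objects` are hypothetical and do not come from the disclosure, and how the new URLs are derived from the initial URL is left to the caller.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_TASK_OBJECTS = 50  # the per-task cap used in the example in the text


@dataclass(frozen=True)
class CrawlTaskObject:
    """One unit of work derived from the initial URL (hypothetical name)."""
    index: int
    url: str
    depth: int  # 0 = primary page, 1 = secondary page


def build_task_objects(new_urls: Iterable[str], depth: int = 0,
                       limit: int = MAX_TASK_OBJECTS) -> List[CrawlTaskObject]:
    """Form one task object per new URL, never more than `limit` of them."""
    tasks: List[CrawlTaskObject] = []
    for index, url in enumerate(new_urls):
        if index >= limit:
            break  # cap reached; further URLs are not turned into task objects here
        tasks.append(CrawlTaskObject(index=index, url=url, depth=depth))
    return tasks
```

Calling `build_task_objects` with 60 candidate URLs yields task objects 0 through 49 only, which mirrors the concurrency limit in the example.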
Optionally, acquiring the webpage information through the URL includes: when the webpage information is acquired by entering a target URL, capturing the webpage loading request through Hook, acquiring the URL (uniform resource locator) corresponding to the webpage, and acquiring the webpage information through that URL.
Optionally, when the webpage information is obtained through the URL, the URL is preprocessed through big data analysis, for example, the URL or the URL queue (URLs) is processed by deduplication, wrong URL filtering, website deduplication filtering, and/or abnormal URL filtering.
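A minimal sketch of such URL preprocessing is shown below, using only the Python standard library. The normalization rules (lower-casing scheme and host, dropping fragments, accepting only http and https) are assumptions chosen for illustration; the disclosure does not specify them, and it does not define what "big data analysis" involves.

```python
from typing import Iterable, List
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host and drop the fragment so equivalent URLs compare equal."""
    url, _fragment = urldefrag(url.strip())
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def preprocess(urls: Iterable[str]) -> List[str]:
    """Filter out wrong or abnormal URLs and remove duplicates, keeping first-seen order."""
    seen = set()
    cleaned: List[str] = []
    for raw in urls:
        parts = urlsplit(raw.strip())
        if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
            continue  # not a crawlable web URL (assumed filter rule)
        key = normalize(raw)
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned
```

For example, `HTTP://Example.com` and `http://example.com/#top` collapse to a single entry, while `javascript:void(0)` is dropped.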
S220, analyzing the webpage information and determining url information contained in the webpage.
Optionally, the acquired web page information is analyzed, an event type included in the current web page is determined, and url information is acquired according to the event type.
Specifically, if the webpage comprises click events, manual clicks are simulated to trigger all the click events one by one, clicking as many of the clickable elements on the interface as possible; the webpage is refreshed, the webpage loading request is captured through Hook, and the url corresponding to each click event is acquired.
S230, crawling page information corresponding to url contained in the webpage.
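As a rough illustration of the analysis step, the sketch below statically lists candidate click targets and plain links in a page's HTML. It is not the Hook-based capture described in the disclosure: it only finds inline `onclick` attributes and `href` values, and it would miss listeners attached from scripts, which is why the disclosure triggers clicks in a headless browser. The class name `ClickTargetParser` is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin


class ClickTargetParser(HTMLParser):
    """Collect absolute link URLs and elements that carry an inline onclick handler."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.click_targets: List[Tuple[str, str]] = []  # (tag name, onclick source)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        # Plain navigational links can be resolved without triggering any event.
        if tag == "a" and href and not href.startswith(("javascript:", "#")):
            self.links.append(urljoin(self.base_url, href))
        # Elements with an inline handler are candidates for simulated clicks.
        if attributes.get("onclick"):
            self.click_targets.append((tag, attributes["onclick"]))
```

The `click_targets` list would then be the work list for the simulated clicks, with the resulting requests observed by the request hook.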
Optionally, the page information is crawled through Ajax rendering according to a preset concurrency number (for example, 50), a crawling depth, and the acquired URL tags; the page information includes the requests and tags of the page and/or the requests and tags of deeper-level pages (primary and secondary pages).
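One way to combine a concurrency number with a crawling depth is a breadth-first crawl guarded by a semaphore, sketched below with `asyncio`. The `fetch_links` callable is a hypothetical stand-in for whatever renders a page and returns the URLs found on it (in the disclosure, a headless browser); the function and parameter names are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List

FetchLinks = Callable[[str], Awaitable[List[str]]]


async def crawl(start_urls: Iterable[str], fetch_links: FetchLinks,
                max_depth: int = 1, concurrency: int = 50) -> Dict[str, int]:
    """Breadth-first crawl; returns each visited URL with the depth at which it was reached."""
    semaphore = asyncio.Semaphore(concurrency)
    visited: Dict[str, int] = {}

    async def visit(url: str) -> List[str]:
        async with semaphore:  # at most `concurrency` fetches in flight at once
            return await fetch_links(url)

    frontier = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    for depth in range(max_depth + 1):
        frontier = [url for url in frontier if url not in visited]
        if not frontier:
            break
        for url in frontier:
            visited[url] = depth
        results = await asyncio.gather(*(visit(url) for url in frontier))
        frontier = list(dict.fromkeys(link for links in results for link in links))
    return visited
```

With `max_depth=1` this covers the primary page and its secondary pages; raising `max_depth` extends the crawl to deeper sub-links.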
Optionally, the crawling task is monitored by a monitoring program such as a task manager, and the concurrency number can be modified and controlled in real time according to the monitoring information returned (displayed) by the monitoring program, so as to prevent the network from being paralyzed by excessive requests.
Optionally, data such as the url of the current request is analyzed by methods such as big data analysis to identify the anti-crawling response to the current crawling task and any special handling the current task requires (advertisements and the like), and the concurrent acquisition speed is updated and adjusted in real time, so as to prevent the service IP from being banned because of an excessive request rate.
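The disclosure does not say how the concurrency is adjusted. As one possible policy, the sketch below halves the limit when a response looks like throttling or blocking and raises it by one after each successful response. The class name `AdaptiveConcurrency` and the choice of status codes 403, 429, and 503 as blocking signals are assumptions made for illustration.

```python
class AdaptiveConcurrency:
    """Halve the limit on a blocking signal; add one per successful response."""

    def __init__(self, initial: int = 50, minimum: int = 1, maximum: int = 50) -> None:
        self.minimum = minimum
        self.maximum = maximum
        self.limit = max(minimum, min(initial, maximum))

    def record(self, status_code: int) -> int:
        """Update the limit from one HTTP status code and return the new limit."""
        if status_code in (403, 429, 503):  # assumed signs of throttling or blocking
            self.limit = max(self.minimum, self.limit // 2)
        elif 200 <= status_code < 300:
            self.limit = min(self.maximum, self.limit + 1)
        # Any other status leaves the limit unchanged.
        return self.limit
```

A crawler loop would call `record` after each response and size its next batch of requests by the returned limit.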
Optionally, the time for crawling the page related information is monitored while the click event is triggered. If the crawling time exceeds a preset threshold (for example, 5 minutes), judging that the current page is a special page, analyzing the special page, and determining the type of the special page; the types of the special pages comprise verifiable pages and non-verifiable pages;
if the special page is a verifiable page, simulating manual input to complete verification and/or login, and crawling the page information; the verifiable page comprises account password verification, verification code verification and/or character verification;
for example, logging in with a user name and password pre-stored in the system;
logging in by means such as brute-forcing the account password;
recognizing characters (images) through image recognition technology to complete character (picture) verification, and the like.
If the special page is a non-verifiable page, bug processing is performed.
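The timeout-based classification above can be sketched as follows. The 5-minute threshold comes from the example in the text; the heuristics used to recognize a verifiable page (a password input, or any attribute value containing "captcha") are assumptions, since the disclosure does not state how the page type is determined.

```python
from html.parser import HTMLParser

CRAWL_TIMEOUT_SECONDS = 5 * 60  # example threshold given in the text


class _FormScanner(HTMLParser):
    """Look for signs that a page asks for a login or a verification code."""

    def __init__(self) -> None:
        super().__init__()
        self.has_password_field = False
        self.has_captcha_hint = False

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "").lower() for name, value in attrs}
        if tag == "input" and attributes.get("type") == "password":
            self.has_password_field = True
        if any("captcha" in value for value in attributes.values()):
            self.has_captcha_hint = True


def classify_page(elapsed_seconds: float, html: str,
                  threshold: float = CRAWL_TIMEOUT_SECONDS) -> str:
    """Return 'normal', 'verifiable', or 'non-verifiable' for a crawled page."""
    if elapsed_seconds <= threshold:
        return "normal"  # crawl finished in time, so the page is not special
    scanner = _FormScanner()
    scanner.feed(html)
    if scanner.has_password_field or scanner.has_captcha_hint:
        return "verifiable"
    return "non-verifiable"
```

A page classified as `verifiable` would go on to the simulated input step, and a `non-verifiable` one to bug processing.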
Optionally, the crawling and/or bug processing procedure is analyzed by a big data analysis method; if the analysis result is good, that is, the preset target information has been crawled, the crawling procedure and the framework of the crawled site are stored. In this way, the need to support multiple web frameworks is gradually met.
According to the embodiments of the present disclosure, effective information is acquired by a headless browser that automatically recognizes verification codes and simulates login. Detection by anti-crawling mechanisms can be avoided to the greatest extent, and the corresponding effective information can be acquired in a timely, effective, and accurate manner. Compared with the conventional crawler approaches, the headless browser has the following significant advantages:
1. The code is simple and convenient, fully supports concurrency, and acquires information quickly.
2. Operations are performed by simulating manual input, so anti-crawler measures can be effectively evaded and most manual verification and detection steps can be completed.
3. Machine learning: data is accumulated as crawling continues, the crawl records for each framework are counted, and the optimal crawling strategy can be identified quickly.
4. Information such as verification codes, login codes, and scan codes is effectively recognized and quickly entered and processed to complete brute-force login.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.
FIG. 3 illustrates a block diagram of an apparatus 300 for crawling page information by a web crawler according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus 300 includes:
an obtaining module 310, configured to obtain corresponding web page information according to the web page loading request;
the analysis module 320 is configured to analyze the webpage information and determine url information included in the webpage;
and the crawling module 330 is configured to crawl page information corresponding to urls included in the web page.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the described module may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
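The three modules can be pictured as three callables wired together, as in the minimal sketch below. The class name `CrawlerApparatus` and the callable signatures are assumptions for illustration only; the disclosure describes the modules functionally and does not prescribe an interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CrawlerApparatus:
    acquire: Callable[[str], str]        # obtaining module 310: request URL -> webpage information
    analyze: Callable[[str], List[str]]  # analysis module 320: webpage information -> contained urls
    crawl: Callable[[str], str]          # crawling module 330: url -> page information

    def run(self, request_url: str) -> Dict[str, str]:
        """Run the three modules in order and return page information keyed by url."""
        page = self.acquire(request_url)
        return {url: self.crawl(url) for url in self.analyze(page)}
```

Keeping the modules as injected callables means a headless-browser implementation and a simple test double can be swapped without changing the pipeline.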
FIG. 4 shows a schematic block diagram of an electronic device 400 that may be used to implement embodiments of the present disclosure. As shown, the device 400 includes a Central Processing Unit (CPU) 401 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 402 or loaded from a storage unit 408 into a Random Access Memory (RAM) 403. The RAM 403 can also store various programs and data required for the operation of the device 400. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
A number of components in the device 400 are connected to the I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processing unit 401 executes the various methods and processes described above. For example, in some embodiments, the method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by CPU 401, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the CPU 401 may be configured to perform the method by any other suitable means (e.g., by way of firmware).
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims, and the scope of the invention is not limited thereto, as modifications and substitutions may be readily made by those skilled in the art without departing from the spirit and scope of the invention as disclosed herein.

Claims (9)

1. A method for crawling page information by a web crawler, comprising:
acquiring corresponding webpage information according to the webpage loading request;
analyzing the webpage information and determining url information contained in the webpage;
and crawling page information corresponding to url contained in the webpage.
2. The method of claim 1, wherein the obtaining the corresponding web page information according to the web page loading request comprises:
capturing a webpage loading request through Hook, and acquiring a url corresponding to the webpage loading request;
and acquiring webpage information through the url.
3. The method of claim 2, wherein analyzing the web page information and determining url information contained in the web page comprises:
if the webpage comprises click events, triggering all the click events one by one, refreshing the webpage, capturing the webpage loading request through Hook, and acquiring the url information corresponding to the click events.
4. The method of claim 3, wherein crawling page information corresponding to urls contained in the web page comprises:
crawling the page information through Ajax rendering;
the page information comprises the requests and tags of the page and/or the requests and tags of deeper-level pages.
5. The method of claim 4, further comprising:
if the click event is triggered, monitoring the time for crawling page information;
if the crawling time exceeds a preset threshold, judging that the current page is a special page;
analyzing the special page to determine the type of the special page; the types of the special pages comprise verifiable pages and non-verifiable pages;
if the special page is a verifiable page, simulating manual input to complete verification and/or login, and crawling the page information; the verifiable page comprises account password verification, verification code verification and/or character verification;
and if the special page is a non-verifiable page, performing bug processing.
6. The method of claim 1, further comprising:
if the urls contained in the current page trigger jumps to a plurality of pages, crawling the page information of the plurality of pages concurrently according to a preset concurrency number and crawling depth.
7. An apparatus for crawling page information by a web crawler, comprising:
the acquisition module is used for acquiring corresponding webpage information according to the webpage loading request;
the analysis module is used for analyzing the webpage information and determining url information contained in the webpage;
and the crawling module is used for crawling the page information corresponding to the url contained in the webpage.
8. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, wherein the processor, when executing the program, implements the method of any of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method of any one of claims 1 to 6.
CN202110710973.4A 2021-06-25 2021-06-25 Method for crawling page information through web crawlers Active CN113609411B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110710973.4A CN113609411B (en) 2021-06-25 2021-06-25 Method for crawling page information through web crawlers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110710973.4A CN113609411B (en) 2021-06-25 2021-06-25 Method for crawling page information through web crawlers

Publications (2)

Publication Number Publication Date
CN113609411A true CN113609411A (en) 2021-11-05
CN113609411B CN113609411B (en) 2024-06-14

Family

ID=78336799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110710973.4A Active CN113609411B (en) 2021-06-25 2021-06-25 Method for crawling page information through web crawlers

Country Status (1)

Country Link
CN (1) CN113609411B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268361A (en) * 2013-06-07 2013-08-28 百度在线网络技术(北京)有限公司 Extracting method, device and system of hidden URL (Uniform Resource Locator) in webpage
CN106649567A (en) * 2016-11-15 2017-05-10 杭州安恒信息技术有限公司 Web crawler system based on browser kernel
CN106844522A (en) * 2016-12-29 2017-06-13 北京市天元网络技术股份有限公司 A kind of network data crawling method and device
CN109902220A (en) * 2019-02-27 2019-06-18 腾讯科技(深圳)有限公司 Webpage information acquisition methods, device and computer readable storage medium

Also Published As

Publication number Publication date
CN113609411B (en) 2024-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant