CN111159517A - Information processing method, device, system and computer storage medium - Google Patents


Publication number
CN111159517A
Authority
CN
China
Prior art keywords
data information
information
webpage
website
address information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911274626.0A
Other languages
Chinese (zh)
Inventor
唐荦彦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN201911274626.0A
Publication of CN111159517A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Abstract

An embodiment of the invention discloses an information processing method, comprising: acquiring address information of a website; presenting first data information based on the address information of the website to obtain a first webpage, wherein the first data information represents the static data information in the first page data information corresponding to the address information of the website, and the first page data information comprises the static data information and dynamic data information; acquiring the dynamic data information in the first page data information; and rendering the dynamic data information based on the first webpage to obtain a second webpage. The invention also provides an information processing apparatus, a system and a computer-readable storage medium. The method addresses the problems, common in the related art, that pages are crawled to an insufficient hierarchical depth and in insufficient number when the web pages of a website are crawled.

Description

Information processing method, device, system and computer storage medium
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to an information processing method, device, system, and computer-readable storage medium.
Background
The security of the web page data in a website is critical both to the stability of the website and to the safety of the users who access it, so monitoring the security of that data is very important. In the related art, web page data in a website tends to be monitored in the cloud in combination with crawler technology. However, when the cloud monitors a website, pages are not rendered promptly enough after they are crawled, and not all of the data in the home page and the other pages of the website can be rendered. As a result, not all of the address data in those pages can be crawled, so the pages of the website are crawled to an insufficient hierarchical depth and in insufficient number.
Disclosure of Invention
In view of this, the present invention provides an information processing method, device, system and computer readable storage medium, which can solve the problems of insufficient hierarchical depth of a crawled page and insufficient data amount of the crawled page generated when crawling a web page in a website in the related art.
The information processing method provided by the invention is realized as follows:
an information processing method, the method comprising:
acquiring address information of a website;
presenting first data information based on the address information of the website to obtain a first webpage; the first data information is used for representing static data information in first page data information corresponding to the address information of the website; the first page data information comprises the static data information and the dynamic data information;
acquiring the dynamic data information in the first page data information;
and rendering the dynamic data information based on the first webpage to obtain a second webpage.
Optionally, the rendering the dynamic data information based on the first webpage to obtain a second webpage includes:
acquiring each dynamic object in the dynamic data information;
and rendering each dynamic object based on the first webpage to obtain the second webpage.
Optionally, the method further includes:
acquiring webpage data information of the second webpage;
acquiring address information of the second webpage based on the webpage data information;
storing the address information into a message queue;
the acquiring address information of the website includes:
and acquiring the address information of the website from the message queue.
Optionally, the method further includes:
acquiring webpage data information of the second webpage;
determining second page data information based on the webpage data information; and the second page data information comprises static data information and dynamic data information of the second webpage.
Optionally, the method further includes:
and executing page information detection on the second page data information, and determining the security level of the second page data information.
Optionally, the presenting first data information based on the address information of the website to obtain a first webpage includes:
loading the address information of the website based on the matching relation between the address information of the website and the historical address information, and presenting first data information to obtain a first webpage; the historical address information is used for representing the loaded address information.
Optionally, the loading the address information of the website based on the matching relationship between the address information of the website and the historical address information, and presenting the first data information to obtain the first webpage includes:
and if the address information of the website is not matched with the historical address information, presenting first data information based on the address information of the website to obtain a first webpage.
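The optional steps above (storing discovered addresses in a message queue, and loading an address only when it does not match the historical address information) can be illustrated with a minimal sketch. The class name `Frontier`, the in-memory `deque` standing in for the message queue, and the example URLs are all assumptions made for illustration; the source does not specify any of them.

```python
from collections import deque


class Frontier:
    """FIFO queue of addresses plus a record of addresses already loaded."""

    def __init__(self):
        self._queue = deque()   # stands in for the message queue (assumption)
        self._history = set()   # historical, i.e. already loaded, addresses

    def put(self, url):
        """Store an address discovered on a rendered page."""
        self._queue.append(url)

    def next_unloaded(self):
        """Return the next address that does not match the history, or None."""
        while self._queue:
            url = self._queue.popleft()
            if url not in self._history:
                self._history.add(url)
                return url
        return None


frontier = Frontier()
for url in ["https://example.com/", "https://example.com/a", "https://example.com/"]:
    frontier.put(url)

loaded = []
while (url := frontier.next_unloaded()) is not None:
    loaded.append(url)  # a real crawler would present and render the page here
```

In this sketch the repeated address is skipped because it matches the history, so each address is loaded at most once.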
An information processing apparatus, the information processing apparatus comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the information retrieval program stored in the memory to implement the steps of:
acquiring address information of a website;
presenting first data information based on the first address information to obtain a first webpage; the first data information is used for representing static data information in first page data information corresponding to the first address information; the first page data information comprises the static data information and the dynamic data information;
acquiring the dynamic data information in the first page data information;
and rendering the dynamic data information based on the first webpage to obtain a second webpage.
An information processing system, the information processing system comprising an acquisition module and a processing module, wherein:
the acquisition module is used for acquiring address information of the website;
the processing module is used for presenting first data information based on the first address information to obtain a first webpage; the first data information is used for representing static data information in first page data information corresponding to the first address information; the first page data information comprises the static data information and the dynamic data information;
the acquiring module is further configured to acquire the dynamic data information in the first page data information;
the processing module is further configured to render the dynamic data information based on the first webpage to obtain a second webpage.
A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of any of the information processing methods as described above.
In the information processing method provided by the embodiment of the invention, address information of a website is first obtained; first data information is then presented based on the address information of the website to obtain a first webpage; the dynamic data information in the first page data information is obtained; and finally the dynamic data information is rendered based on the first webpage to obtain a second webpage. The method can therefore obtain any address information in a website and present both the static data information and the dynamic data information corresponding to that address information. This allows any address in the website to be crawled and the information in each crawled webpage to be presented in full, which in turn ensures that any next-level page linked from a crawled page can be crawled normally, and thus solves the problems in the related art of insufficient hierarchical depth and insufficient data amount of the crawled pages.
Drawings
FIG. 1 is a schematic flowchart of a first information processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a second information processing method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a third information processing method according to an embodiment of the present invention;
FIG. 4 is an architecture diagram of an information processing method according to an embodiment of the present invention;
FIG. 5 is a block diagram of an information processing apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of an information processing system according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention relates to the technical field of information processing, and in particular provides an information processing method, an information processing system, an information processing device and a computer-readable storage medium.
In the related art, if the page data of any page in a website is insecure, the data security of users who access that page may be threatened. Likewise, if the security level of any page in the website is insufficient, the security of related data in the website is threatened, the page data may fail to be presented, and the website may even redirect to a malicious website. Therefore, monitoring the security of the webpage data in a website is extremely important.
At present, security threats to the webpage data in a website mainly take the form of webpage tampering. Webpage tampering means that a hacker forges the content of a website, or embeds illegal content, malicious content or black links in the website. Black links are a fairly common technique in Search Engine Optimization (SEO). Generally speaking, a black link is a backlink from another website obtained by illegitimate means. Most commonly, an attacker exploits vulnerabilities in website programs to obtain a webshell on a website with a high search engine weight or PageRank (PR), and then plants links to the attacker's own website on the compromised website. In nature a black link is the same as an ordinary visible link; it is a cheating technique used to raise search ranking quickly.
Therefore, once a web page tampering phenomenon occurs on a website, a user may be switched to another website or even a malicious website during a process of accessing the web page of the website.
At present, to prevent the pages of a website from being tampered with, a third-party application is generally purchased and installed on the server side and used in combination with crawler technology. The operating systems used on the server side fall mainly into two families, Windows and Linux/Unix, so purchased third-party applications usually deploy page tamper-proofing for these two families. The operating principles on the two families are in fact the same: modifications to the web page data in the website are monitored by calling the Application Programming Interface (API) of the operating system through its underlying driver.
A crawler, also known as a web spider, is a program or script that automatically crawls web information according to certain rules. Other less commonly used names are ants, automatic indexing, simulation programs, or worms.
In addition, because the core data in the website is generally stored in the database, the security of the core data of the website can be protected by enhancing the access and modification operations of the database.
However, both of the above approaches, installing a third-party application on the server side and protecting the database, require the enterprise or individual that owns the website to purchase third-party software and the associated technical services and to protect the website on the server side. The website administrator must therefore log in to the server to perform the relevant network configuration, and these configuration operations may increase the load on, and the constraints of, the whole server system. In addition, both approaches require a network administrator with specialist knowledge to monitor the system in real time and to debug it when necessary. Therefore, neither approach to preventing webpage tampering and protecting the key data in a website is flexible enough.
With the rapid development of cloud computing, cloud data storage is widely applied, and therefore, more and more website owners tend to use a crawler technology in a cloud to monitor website tampering prevention, so as to reduce the above-mentioned additional configuration of servers and the personnel cost for server operation and maintenance, which are generated in monitoring website page tampering and protecting core data at a server side.
However, the monitoring scheme for performing tamper-proofing on the website in the cloud has the following disadvantages:
First, pages in the website are not rendered promptly enough. Currently, when website pages are crawled in the cloud, they are usually downloaded either by simulating a browser with Selenium (a browser automation testing framework) or directly through Requests or another request library. In the former case, because Selenium itself executes serially, downloads tend to block when a large number of pages are downloaded, so efficiency is low. In the latter case, Requests cannot render and present dynamic page data, that is, JavaScript objects, and cannot detect dynamic page content or dynamic pages, so address information related to the dynamic page data, or even the entire dynamic page, is lost during crawling. Requests is a commonly used module for HyperText Transfer Protocol (HTTP) requests; it initiates a request to the back-end server of a specified target website and receives the response content returned by the server.
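The loss of address information described above can be illustrated with a minimal sketch that parses an unrendered response body using only the Python standard library. The HTML in `RAW_HTML`, the paths in it, and the class name `LinkCollector` are hypothetical; the point is only that a link created by a script does not exist in the markup until the script is executed.

```python
from html.parser import HTMLParser

# Hypothetical response body: one link is present in the markup, a second
# link is only created when the script runs in a browser engine.
RAW_HTML = """
<html><body>
  <a href="/about">About</a>
  <script>
    var a = document.createElement("a");
    a.href = "/news/latest";
    document.body.appendChild(a);
  </script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags found in the markup itself."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(RAW_HTML)
collector.close()
static_links = collector.links  # links visible without executing any script
```

Here `static_links` contains only `/about`; the address `/news/latest` would be found only after the script has been rendered, which is the gap the method sets out to close.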
Second, pages in the website are not monitored to a sufficient depth. Most websites present their main services on the home page or on second-level pages, and therefore add a large amount of dynamic data to those pages to display the website's data, functions and services in full. As described above, Requests cannot render and present dynamic pages, so tamper-proof monitoring executed in the cloud cannot monitor the dynamic data in the home page or second-level pages of the website, and tampering with deeper-level pages goes undetected.
Based on this, an embodiment of the present invention provides an information processing method, which may be implemented by a processor in an information processing apparatus, as shown in fig. 1, and which may be implemented by the following steps:
Step 101, address information of a website is obtained.
In step 101, the address information of the website may be address information of the home page of the website.
In one embodiment, the address information of the website may be any address information carried in the home page of the website.
In one embodiment, the address information of the website may also be any address information carried in any page of the website.
In one embodiment, the address information of the website may be obtained by using a crawler technology.
In one embodiment, the address information of the website may be opened by using other software for automatically acquiring the address of the webpage, and acquired by using a crawler technology.
Step 102, presenting first data information based on the address information of the website to obtain a first webpage.
The first data information is used for representing static data information in the first page data information corresponding to the address information of the website; and the first page data information comprises static data information and dynamic data information.
In step 102, the static data information may be data information that, however it is accessed in the first webpage at the user's request, is transmitted from the server to the first webpage without any script computation. It therefore loads quickly and can be used across platforms and across servers.
In step 102, the static data information may be data information in the first webpage data that is directly loaded to the client browser for display without being compiled by the server.
In one embodiment, the static data information may be data information of a static data object in the first webpage. In the idea of object-oriented programming, any kind of data or any kind of data that needs to be operated can be packaged and presented in the form of an object, and thus, the static data information may also be data information corresponding to the static data object.
In one embodiment, the static data information may be data in which data in the first webpage remains unchanged, for example, text data information or animation data information that only prompts a user to log in the welcome interface.
In one embodiment, the static data information may be data information that is directly loaded on a specified template. Such as a list of messages displayed in the first web page.
In one embodiment, the static data information may be loaded through a static web page with suffixes html and htm.
In step 102, the dynamic data information may be data information presented through a dynamic web page. A dynamic web page combines the basic HyperText Markup Language (HTML) syntax with high-level programming languages such as Java, database programming and other technologies, so as to manage website content and style efficiently, dynamically and interactively. In this sense, any web page generated by web programming techniques that combine HTML with a high-level programming language and database technology is a dynamic web page.
In one embodiment, the dynamic data information may be data information of a dynamic data object in the first webpage, and is typically data information of a JavaScript object.
In one embodiment, the dynamic data information may be data displayed in a program or a web page running in the server.
In one embodiment, the dynamic data information may be data information corresponding to the static data information and transferred by interacting with a background database of the server.
In one embodiment, the dynamic data information may be data information loaded from a dynamic web page with a suffix such as aspx, asp, jsp, php, perl or cgi.
A static web page is usually a pre-designed page that does not need to access the server's database. This reduces system consumption and the load on the server, reduces exposure to malicious code, and can improve the security of the website. A dynamic web page is generally backed by a database and can implement more website functions, such as user registration, login and information management, so the data stored in its back-end database is updated quickly. However, as the amount of data stored in the database keeps growing, the load of accessing the server data increases and pages load more slowly.
Further, the static data information loaded in a static web page may include scrolling captions, animation and the like. That is, the distinction between the dynamic data information loaded by a dynamic web page and the static data information loaded by a static web page must not be confused with whether the visual effect is static or dynamic.
In one embodiment, the first page data information is used to represent all data information presented by the page of the first webpage when displayed, including dynamic data information and static data information.
In one embodiment, the first page data information is used to represent all data information presented when the page of the first webpage is displayed, including text information, picture information, animation information, and the like.
In one embodiment, the first page data information is used to represent all data information, together with its attribute information, presented when the page of the first webpage is displayed, for example: text information and its attributes, such as the height and width of the text display area and its position in the whole page; picture information and its attributes, such as the pixel and grayscale information of a picture and whether it alternates with other pictures in a scrolling display; and animation information and its attributes, such as the size of the animation resources, the play start time and the duration.
In step 102, presenting the first data information based on the address information of the website to obtain the first webpage may be implemented by sending the address information of the website to the server where the website is located, receiving the data information returned by the server, that is, the first page data information, and then presenting the static data information in the first page data information.
In one embodiment, step 102 may be implemented by: and receiving first page data information returned by the website server, classifying the data information in the first page data information to obtain static data information, and then loading the static data information in advance to obtain a first webpage.
In one embodiment, step 102 may be implemented by: receiving first page data information returned by the website server, setting the presentation mode of the first page data information to a mode in which only static data information is loaded, and recording the static data information in the first page data information in that mode, so as to obtain the first webpage.
In one embodiment, step 102 may be implemented by: and receiving first page data information returned by the website server, trying to present all data in the first page data information based on the presentation form of the first page, and obtaining the first page after presentation.
Step 103, acquiring dynamic data information in the first page data information.
In step 103, the dynamic data information in the first page data information is obtained, which may be implemented by removing the static data information loaded by the first web page from the first page data information.
In one embodiment, the static data information category and the dynamic data information category are firstly divided, and then the first page data information is divided according to the categories to respectively obtain the static data information and the dynamic data information. And then obtaining the dynamic data information based on the dynamic data information category.
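The division of page data into a static category and a dynamic category can be sketched as follows. This is an illustration under a strong simplifying assumption that is not stated in the source: inline `<script>` bodies are treated as the dynamic data information and all other markup as the static data information. The class name `PageSplitter` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Separates inline script bodies from the remaining markup."""

    def __init__(self):
        super().__init__()
        self.static_parts = []    # markup and text outside <script>
        self.dynamic_parts = []   # script source, to be rendered later
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        else:
            self.static_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False
        else:
            self.static_parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_script:
            if data.strip():
                self.dynamic_parts.append(data.strip())
        else:
            self.static_parts.append(data)


splitter = PageSplitter()
splitter.feed("<p>Hello</p><script>render();</script>")
splitter.close()
static_data = "".join(splitter.static_parts)
dynamic_data = splitter.dynamic_parts
```

A real classifier would also need to handle external scripts referenced by `src`, event-handler attributes and similar cases; those are left out of this sketch.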
Step 104, rendering the dynamic data information based on the first webpage to obtain a second webpage.
In step 104, the dynamic data information is rendered based on the first webpage to obtain a second webpage, which may be obtained by loading and rendering the dynamic data information based on the first webpage by using a dynamic data loading technique.
In the information processing method provided by the embodiment of the invention, address information of a website is first obtained; first data information is then presented based on the address information of the website to obtain a first webpage; the dynamic data information in the first page data information is obtained; and finally the dynamic data information is rendered based on the first webpage to obtain a second webpage. The method can therefore obtain any address information in a website and present both the static data information and the dynamic data information corresponding to that address information. This allows any address in the website to be crawled and the information in each crawled webpage to be presented in full, which in turn ensures that any next-level page linked from a crawled page can be crawled normally, and thus solves the problems in the related art of insufficient hierarchical depth and insufficient data amount of the crawled pages.
Based on the foregoing embodiments, an embodiment of the present invention provides an information processing method, as shown in fig. 2, which may be implemented by the following steps:
Step 201, address information of the website is obtained.
Step 202, presenting the first data information based on the address information of the website to obtain a first webpage.
The first data information is used for representing static data information in the first page data information corresponding to the address information of the website; and the first page data information comprises static data information and dynamic data information.
In one embodiment, step 202 may be implemented by:
obtaining the address information of the website, that is, the Uniform Resource Locator (URL) of the website, performing a certain conversion on the URL, requesting the converted address information, and presenting the first data information returned by the server of the website to obtain the first webpage.
In one embodiment, step 202 may also obtain the first webpage by converting the URL into a request using Requests, sending the request to the server of the website, and then presenting the first data information returned by the server.
Although Requests cannot render dynamic objects, the first data information returned by the web server in response to the Requests includes dynamic data information of all dynamic objects in the first web page.
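The source does not say what conversion is applied to the URL before it is requested. Purely as an assumption, the sketch below uses URL normalisation (lower-casing the scheme and host, defaulting an empty path to `/`, and dropping the fragment) and then builds, without sending, an HTTP GET request using the standard library rather than Requests. The function names, the `User-Agent` value and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request


def normalize(url):
    """Lower-case scheme and host, default the path to '/', drop the fragment.

    Assumes the address carries no user name or password in its host part.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def build_request(url):
    """Build (but do not send) a GET request for the normalised address."""
    return Request(
        normalize(url),
        headers={"User-Agent": "example-crawler/0.1"},  # hypothetical value
        method="GET",
    )


req = build_request("HTTP://Example.COM/index.html?id=7#top")
```

Normalising before requesting also makes the later comparison with historical address information more reliable, since two spellings of the same address compare equal.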
Step 203, acquiring dynamic data information in the first page data information.
Step 204, acquiring each dynamic object in the dynamic data information.
In step 204, a dynamic object may be an object dynamically encapsulated from the data returned from the database of the website server. For example, after the user enters a user name and password, the server of the website matches them against the user name and password stored in the database, returns the resulting authentication result, and encapsulates that result into an object such as an animation prompt, a text prompt or a webpage jump prompt.
In step 204, each dynamic object in the dynamic data information may be the single dynamic object contained in the dynamic data information.
In one embodiment, each dynamic object in the dynamic data information may be a plurality of dynamic objects included in the dynamic data information.
Step 205, rendering each dynamic object based on the first webpage to obtain a second webpage.
In step 205, rendering each dynamic object based on the first webpage to obtain the second webpage may be implemented by means of a dynamic object rendering tool.
In one embodiment, step 205 may be implemented by calling WebKit to render each dynamic object on the basis of the first webpage, thereby obtaining the second webpage.
WebKit is an open-source browser engine that is used by a number of different browsers. After the browser obtains the content to be rendered, the content is rendered through the following process: parsing the webpage content data to obtain a parsing result; constructing document object model (DOM) tree nodes according to the parsing result, and continuing to download webpage content data, parse it and construct DOM tree nodes until all nodes of the DOM tree have been constructed; creating the DOM tree from the DOM tree nodes; generating a corresponding rendering tree from the DOM tree, the creation of the rendering tree being kept separate from the parsing module; and displaying the webpage according to the rendering tree.
In the embodiment of the invention, after each dynamic object to be rendered has been acquired on the basis of the first webpage, a control-level WebKit component parses each dynamic object, constructs DOM tree nodes according to the parsing result until all DOM tree nodes have been constructed, creates the DOM tree from those nodes, generates the corresponding rendering tree from the DOM tree, and displays the webpage according to the rendering tree to obtain the second webpage.
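The parse, DOM tree, rendering tree sequence described above can be illustrated with a minimal sketch. It is not WebKit code: it uses Python's standard html.parser module, and the names Node, DomBuilder, build_dom, build_render_tree, NON_VISUAL and VOID_TAGS are hypothetical names chosen only for this illustration. The sketch assumes that a rendering tree is simply the DOM tree with non-visual subtrees removed, which is a strong simplification of what a real engine does.

```python
from html.parser import HTMLParser

# Tags treated as producing no visible output; an illustrative choice only.
NON_VISUAL = {"head", "script", "style", "meta", "title"}
# Tags that never have a closing tag, so they must not stay on the open stack.
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class Node:
    def __init__(self, tag, text=""):
        self.tag = tag
        self.text = text
        self.children = []


class DomBuilder(HTMLParser):
    """Constructs DOM tree nodes incrementally while the markup is parsed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; the root is never popped.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", data.strip()))


def build_dom(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def build_render_tree(node):
    """Derives a rendering tree from the DOM tree by dropping non-visual subtrees."""
    if node.tag in NON_VISUAL:
        return None
    copy = Node(node.tag, node.text)
    for child in node.children:
        rendered = build_render_tree(child)
        if rendered is not None:
            copy.children.append(rendered)
    return copy


def visible_text(node):
    """Collects the text nodes of a tree in document order."""
    parts = [node.text] if node.tag == "#text" else []
    for child in node.children:
        parts.extend(visible_text(child))
    return parts
```

Keeping build_render_tree separate from DomBuilder mirrors the statement that the creation of the rendering tree is kept separate from the parsing module.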
In one embodiment, after step 205, steps A1 to A3 may also be performed:
Step A1, acquiring webpage data information of the second webpage.
In step A1, the webpage data information of the second webpage may include data information related to the content displayed on the second webpage, such as a text list, pictures and animation information displayed on the second webpage.
In one embodiment, the webpage data information of the second webpage may further include webpage address information contained in the second webpage; for example, by selecting a first link address in the second webpage, a jump may be made from the second webpage to a third webpage.
In one embodiment, the webpage data information of the second webpage may further include attribute information of the content displayed in the second webpage, for example the width, height, display position and display effect of the display area of a first address link displayed in the second webpage, or whether first text information displayed in the second webpage is hidden, changes color or scrolls.
In step A1, the webpage data information of the second webpage is obtained by traversing the webpage data loaded in the second webpage.
Step A2, acquiring the address information of the second webpage based on the webpage data information.
In step A2, the address information of the second webpage may be used to indicate information of at least one first address link displayed in the second webpage.
In one embodiment, the address information of the second web page may be used to represent a set of information for each first address link displayed in the second web page.
In one embodiment, the address information of the second web page may be used to represent a set of information of a certain type of first address link in the second web page.
In one embodiment, the address information of the second web page may be used to represent a set of information of at least two types of first address links in the second web page.
In one embodiment, acquiring the address information of the second webpage based on the webpage data information may be performed by applying a crawler technique to the webpage data information.
In one embodiment, the address information of the second webpage obtained based on the webpage data information may be URL information of the second webpage.
Step A3, storing the address information in the message queue.
In step A3, the message queue denotes a queue for storing address information.
In one embodiment, the message queue may be used to indicate a container for storing address information during transmission of the address information.
In one embodiment, the message queue may store the address information on a First-In First-Out (FIFO) basis. The length of the message queue may be set according to the webpage hierarchy of the website, the number of webpages, the total number of address links in each webpage, and the processing capability of the processor in the information processing apparatus.
In one embodiment, the message queue may be composed of a first message queue and a second message queue, where the first message queue is used to store address information obtained from the second webpage, and the second message queue is used to store other information corresponding to the address information stored in the first message queue.
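Steps A2 and A3 can be sketched as follows, under stated assumptions. The sketch extracts the address links from the webpage data with Python's standard html.parser and stores them in a bounded FIFO queue from the standard queue module. LinkExtractor and enqueue_links are hypothetical names, only anchor ("a") elements are treated as address links, and dropping links once the queue is full is an assumed policy that the embodiment does not specify.

```python
from html.parser import HTMLParser
from queue import Full, Queue
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every anchor element as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the address of the page itself.
                self.links.append(urljoin(self.base_url, href))


def enqueue_links(page_html, base_url, message_queue):
    """Stores the address information of a page in the FIFO message queue.

    Returns the number of links stored; links that no longer fit are dropped
    (an assumption made for this sketch).
    """
    extractor = LinkExtractor(base_url)
    extractor.feed(page_html)
    stored = 0
    for url in extractor.links:
        try:
            message_queue.put_nowait(url)
            stored += 1
        except Full:
            break
    return stored
```

A queue created with Queue(maxsize=n) corresponds to a message queue of length n; get_nowait() then returns the links in the order in which they were stored.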
In one embodiment, step 101 may also be implemented by:
Acquiring the address information of the website from the message queue.
In one embodiment, acquiring the address information of the website from the message queue may start when the processor of the information processing apparatus detects that the message queue is not empty, that is, that at least one piece of address information is stored in the message queue.
In one embodiment, acquiring the address information of the website from the message queue may start when the processor of the information processing apparatus detects that the number of pieces of address information in the message queue is not less than a preset threshold. The preset threshold is an integer greater than or equal to 2 and smaller than the length of the message queue, and may be set according to the length of the message queue and the processing capability of the processor in the information processing apparatus.
In one embodiment, acquiring the address information of the website from the message queue may start when the processor of the information processing apparatus detects that address information stored in the message queue matches preset address information. The preset address information may be the address information of the home page of the website, the address information of any second-level page of the website, or the address information of any webpage set according to the monitoring requirements of the website.
In one embodiment, acquiring the address information of the website from the message queue may start when the number of pieces of address information stored in the message queue and the length of the message queue satisfy a preset proportional relationship. The preset proportional relationship may be any preset ratio between 0 and 1, or a ratio set according to the length of the message queue and the processing capability of the processor in the information processing apparatus, such as 80%.
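The four start conditions described above can be summarized in a short sketch. The function name should_start, its parameters and the mode strings are hypothetical. The sketch also assumes that the "preset proportional relationship" means the fill ratio of the queue has reached the preset ratio; the embodiment does not fix this interpretation.

```python
from collections import deque


def should_start(queue, capacity, mode, threshold=2, preset_urls=(), ratio=0.8):
    """Decides whether reading address information from the queue may begin.

    queue       -- the pending address information (any sized iterable)
    capacity    -- the length of the message queue
    mode        -- which of the four start conditions to apply
    """
    if mode == "non_empty":
        return len(queue) > 0
    if mode == "threshold":
        # The threshold is an integer >= 2 and smaller than the queue length.
        if not 2 <= threshold < capacity:
            raise ValueError("threshold must satisfy 2 <= threshold < capacity")
        return len(queue) >= threshold
    if mode == "preset_match":
        return any(url in preset_urls for url in queue)
    if mode == "ratio":
        # Assumed reading: start once the queue is at least `ratio` full.
        return len(queue) / capacity >= ratio
    raise ValueError(f"unknown mode: {mode}")
```

For example, with a queue of length 10 and a ratio of 80%, reading would begin once eight pieces of address information are stored.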
In one embodiment, after step 205, steps B1-B2 may also be performed:
Step B1, acquiring the webpage data information of the second webpage.
Step B2, determining second page data information based on the webpage data information.
The second page data information comprises static data information and dynamic data information of the second webpage.
In step B2, the static data information of the second webpage included in the second page data information may be data information that is displayed in the second webpage and loaded directly into the browser without being calculated or compiled by a server script.
In one embodiment, the static data information of the second webpage included in the second page data may be data information of a static data object.
In step B2, the dynamic data information included in the second page data information may be data information of a dynamic data object in the second web page.
In one embodiment, the dynamic data information included in the second page data information may be data information related to the second webpage that is obtained by a program running on the server side and interacting with the back-end database of the server.
In one embodiment, the second page data information is used to represent all data information presented by the page of the second webpage when displayed, including dynamic data information and static data information.
In one embodiment, the second page data information is used to represent all data information presented when the page of the second webpage is displayed, including text information, picture information, animation information, and the like.
In one embodiment, the second page data information is used to represent all data information, together with its attribute information, presented when the page of the second webpage is displayed, for example: text information and its attribute information, such as the height and width of the text display area and the position of that area within the whole page; picture information and its attribute information, such as the pixel and grayscale information of a picture and whether the picture is displayed in a scrolling rotation with other pictures; and animation information and its attribute information, such as the size of the animation resources, the playback start time and the duration.
In one embodiment, after step B2, the following operations may also be performed:
Performing page information detection on the second page data information, and determining the security level of the second page data information.
In the above operation, performing page information detection on the second page data information may be detecting whether the second page data information matches preset second page data information.
In one embodiment, performing page information detection on the second page data information may be detecting whether some of the page data information in the second page data information has been modified.
In one embodiment, performing page information detection on the second page data information may be detecting whether the page data of the second webpage has been tampered with.
In the above operation, the security level of the second page data information may be used to indicate whether the second page data information matches the preset second page data information or the original second page data information; that is, if it matches, the security level of the second page data information is high, and otherwise the security level is low.
In an embodiment, the security level of the second page data information may be represented by the ratio of the tampered data to the entire second page data information; for example, if 20% of the second page data information has been tampered with, the security level of the second page data information is the security level corresponding to 20% tampered content.
In one embodiment, the security level of the second page data information may be used to indicate whether the key data information in the second page data information is tampered, if the key data information is tampered, the security level is lower, and if the key data information is not tampered, the security level is relatively higher; if no data information is tampered, the security level is the highest level.
In one embodiment, the security level of the second page data information may be used to indicate the proportion of the key data information in the second page data information that has been tampered with: if all of the key data information has been tampered with, the security level is the lowest level, that is, the second webpage is a very dangerous webpage; if only part of the key data information has been tampered with, the security level is a relatively low level.
In one embodiment, the security level of the second page data information may be used to indicate whether the general data information in the second page data information is tampered with, if the general data information is not tampered with and the key data information is not tampered with, the security level is high, and if the key data information is not tampered with and the general data information is tampered with, the security level is relatively low.
Illustratively, performing page information detection on the second page data information to determine the security level of the second page data information may be implemented in any one of the following ways:
determining the security level of the second page data information by performing webpage tampering detection on all or part of the page data in the second page data information;
determining the security level of the second page by performing webpage tampering detection on all key data information in the second page data information;
determining the security level of the second page by performing webpage tampering detection on all general data information in the second page data information;
determining the security level of the second page by performing webpage tampering detection on all key data information and general data information in the second page data information.
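One possible way to derive a security level from tampering detection is sketched below, under stated assumptions. Page data items are compared with a stored baseline by SHA-256 digest, and the level is chosen according to whether key data information or only general data information has changed. The function names, the level labels ("highest", "relatively low", "low", "lowest") and the representation of page data as a dictionary of named items are hypothetical; the embodiments describe the levels only qualitatively.

```python
import hashlib


def digest(value):
    """SHA-256 digest of a page data item, used as its stored fingerprint."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def tampered_items(baseline, current):
    """Names of items whose current content no longer matches the baseline.

    baseline maps item name -> expected digest; current maps item name ->
    content. A missing item is treated as tampered.
    """
    return {
        name
        for name, expected in baseline.items()
        if digest(current.get(name, "")) != expected
    }


def security_level(baseline, current, key_items):
    """Returns (level, tampered ratio) for the second page data information."""
    changed = tampered_items(baseline, current)
    ratio = len(changed) / len(baseline) if baseline else 0.0
    if not changed:
        level = "highest"          # nothing was tampered with
    elif changed & key_items:
        # Key data tampered: lowest if all key items changed, else low.
        level = "lowest" if key_items <= changed else "low"
    else:
        level = "relatively low"   # only general data was tampered with
    return level, ratio
```

The returned ratio corresponds to the embodiment in which the security level is expressed as the proportion of tampered data, while the label corresponds to the embodiments that distinguish key data information from general data information.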
The information processing method provided by the embodiment of the invention comprises: acquiring address information of a website; presenting first data information based on the address information of the website to obtain a first webpage; acquiring dynamic data information in the first page data information and, from it, each dynamic object in the dynamic data information; and finally rendering the dynamic data information based on the first webpage to obtain a second webpage. Therefore, the information processing method provided by the embodiment of the invention can acquire any address information in the website and render and present both the dynamic data information and the static data information corresponding to that address information to obtain the second webpage, so that any address information in the second webpage can be acquired by using a crawler technique.
Based on the foregoing embodiments, an embodiment of the present invention provides an information processing method, which may be implemented by the following steps as shown in fig. 3:
Step 301, acquiring address information of the website.
Step 302, loading the address information of the website based on the matching relationship between the address information of the website and the historical address information, and presenting the first data information to obtain a first webpage.
The historical address information is used for indicating the loaded address information.
In step 302, the historical address information may be address information that has been loaded within a preset time.
In one embodiment, the historical address information may be address information that has been loaded within the length of the message queue. For example, if the length of the message queue is 100, the historical address information covers the 100 pieces of address information loaded immediately before the address information of the website was acquired. If address information identical to the address information of the website was loaded among those 100 pieces, the address information of the website is one piece of the historical address information; if no identical address information was loaded among those 100 pieces, the address information of the website is not part of the historical address information.
In one embodiment, the historical address information may be a set of at least two pieces of address information.
In one embodiment, the historical address information may be stored using a database.
Correspondingly, based on the matching relationship between the address information of the website and the historical address information, the address information of the website is loaded, the first data information is presented, and the first webpage is obtained, which can be realized in the following way:
Searching the historical address information stored in the database based on the address information of the website, loading the address information of the website according to the search result, and presenting the first data information to obtain the first webpage.
In one embodiment, the historical address information may be stored in the form of an array of address information objects.
Correspondingly, based on the matching relationship between the address information of the website and the historical address information, the address information of the website is loaded, the first data information is presented, and the first webpage is obtained, which can be realized in the following way:
Traversing the address information object array based on the address information of the website, loading the address information of the website according to the traversal result, and presenting the first data information to obtain the first webpage.
In one embodiment, step 302 may also be implemented by:
If the address information of the website does not match the historical address information, presenting the first data information based on the address information of the website to obtain the first webpage.
Based on the description of the foregoing embodiment, where the historical address information is stored in a database, this may be implemented as follows:
If no address information matching the address information of the website is found in the historical address information stored in the database, this indicates that the address information of the website has not been loaded within the preset time or within the length of the message queue, and the first data information is presented based on the address information of the website to obtain the first webpage.
If address information matching the address information of the website is found in the historical address information stored in the database, this indicates that the address information of the website has already been loaded within the preset time or within the length of the message queue; the address information of the website is discarded and no loading operation is performed.
Based on the description of the foregoing embodiment, where the historical address information is stored as an address information object array, this may be implemented as follows:
If no address information matching the address information of the website is found in the address information object array, this indicates that the address information of the website has not been loaded within the preset time or within the length of the message queue, and the first data information is presented based on the address information of the website to obtain the first webpage.
If address information matching the address information of the website is found in the address information object array, this indicates that the address information of the website has already been loaded within the preset time or within the length of the message queue; the address information of the website is discarded and no loading operation is performed.
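The matching against historical address information can be sketched as a sliding window over the most recently loaded addresses. HistoryFilter and should_load are hypothetical names, and the window size stands in for the length of the message queue; a time-based window (the "preset time" variant) is not shown.

```python
from collections import deque


class HistoryFilter:
    """Remembers the most recently loaded addresses and rejects repeats."""

    def __init__(self, window):
        # A deque with maxlen discards the oldest entry once the window is full.
        self.history = deque(maxlen=window)

    def should_load(self, url):
        """True if url is not in the history; the url is then recorded."""
        if url in self.history:
            return False  # already loaded within the window: discard
        self.history.append(url)
        return True
```

Membership testing on a deque is linear in the window size, which is acceptable for a window of around 100 entries; for much larger windows a set kept alongside the deque would be a natural refinement.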
Step 303, acquiring dynamic data information in the first page data information.
Step 304, rendering the dynamic data information based on the first webpage to obtain a second webpage.
The information processing method provided by the embodiment of the invention comprises: acquiring address information of a website; loading the address information of the website based on the matching relationship between the address information of the website and historical address information, and presenting first data information to obtain a first webpage; acquiring dynamic data information in the first page data information; and rendering the dynamic data information based on the first webpage to obtain a second webpage. Therefore, the information processing method provided by the embodiment of the invention can ensure that all the dynamic data information and static data information in the second webpage corresponding to any address information of the website are loaded, and can avoid loading the same address information of the website repeatedly.
Based on the foregoing embodiments, an information processing architecture diagram of an information processing method is provided in an embodiment of the present invention, as shown in fig. 4, and each processing unit in the information processing architecture in fig. 4 will be described in detail below.
In fig. 4, the scheduler interacts with the message queue: it continuously extracts URLs from the message queue, encapsulates each URL to obtain a Requests object, serializes the Requests object, and then places the serialized Requests object back into the message queue.
Serialization is the process of converting the state information of an object into a byte form which can be stored or transmitted. During serialization, the object writes its current state to a temporary or persistent store. The object may later be recreated by reading or deserializing the state of the object from storage. The core role of the serialization mechanism is the storage and reconstruction of the object state. Deserialization is the process of restoring the byte form after serialization to the object.
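A minimal sketch of this round trip is shown below. The CrawlRequest class and its fields are hypothetical stand-ins for the Requests object, whose actual structure is not described, and JSON encoded as UTF-8 bytes is only one possible byte form.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CrawlRequest:
    """Hypothetical request object wrapping a URL taken from the message queue."""
    url: str
    method: str = "GET"
    headers: dict = field(default_factory=dict)


def serialize(request):
    """Converts the state of the object into bytes that can be stored or sent."""
    return json.dumps(asdict(request), sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Restores an equivalent object from its serialized byte form."""
    return CrawlRequest(**json.loads(payload.decode("utf-8")))
```

Because the serialized form is plain bytes, it can be placed in a message queue by one component and reconstructed by another without the two sharing memory.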
The message queue is used to store URLs so that the scheduler can continuously acquire new URLs, and is also used to store the serialized Requests objects so that the downloader can continuously acquire new Requests objects.
The downloader is used to continuously acquire new Requests objects from the message queue, deserialize them, and then perform duplicate filtering on each deserialized Requests object, namely determining whether the deserialized Requests object has already been requested within a preset time or within the length of the message queue. If it has not, the downloader loads the deserialized Requests object and sends the first webpage obtained after loading to the renderer.
The renderer is used to asynchronously load and render the dynamic data information for the received deserialized Requests object so as to obtain the second webpage, in which all the data information that needs to be loaded for the address information of the website, namely the URL, has been loaded.
The parser works together with a crawler technique to crawl all address information, namely URL information, from the second webpage and send it to the message queue; the parser can also acquire page data information from the second webpage and send it to the processing pipeline.
The processing pipeline is used to perform further processing, such as page tampering detection, on the received page data information of the second webpage.
Illustratively, when the scheduler detects that the message queue stores URL information meeting the condition, the scheduler acquires the URL from the message queue, encapsulates it to obtain a corresponding Requests object, serializes the Requests object, and stores the serialized Requests object in the message queue. When the downloader detects that the message queue stores Requests objects meeting certain conditions, the downloader acquires a Requests object from the message queue, deserializes it, loads the deserialized Requests object to obtain a first webpage, and sends the first webpage to the renderer; the renderer renders the dynamic data on the basis of the first webpage to obtain a second webpage and sends the second webpage to the parser. The parser acquires all address information, namely new URLs, from the second webpage and sends the new URLs to the message queue, and at the same time sends the page data information of the second webpage to the processing pipeline so that the processing pipeline can perform operations such as tampering detection on it.
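The cooperation of these units can be sketched end to end with an in-memory stand-in for a website. Everything in this sketch is an assumption made for illustration: the SITE dictionary and its site.test URLs, the use of two separate deques for URLs and serialized requests, JSON as the serialized form, a regular expression as the link extractor, and string concatenation in place of real rendering.

```python
import json
import re
from collections import deque

# Hypothetical website: URL -> (static HTML, dynamically generated HTML).
SITE = {
    "http://site.test/": ('<a href="http://site.test/a">A</a>',
                          '<a href="http://site.test/b">B</a>'),
    "http://site.test/a": ("<p>page a</p>", '<a href="http://site.test/">home</a>'),
    "http://site.test/b": ("<p>page b</p>", ""),
}


def scheduler(url_queue, request_queue):
    """Wraps each queued URL in a request and stores it in serialized form."""
    while url_queue:
        request = {"url": url_queue.popleft(), "method": "GET"}
        request_queue.append(json.dumps(request).encode("utf-8"))


def downloader(request_queue, seen):
    """Deserializes requests, filters duplicates, yields (url, first page)."""
    while request_queue:
        request = json.loads(request_queue.popleft().decode("utf-8"))
        url = request["url"]
        if url in seen or url not in SITE:
            continue  # already requested, or not part of the stand-in site
        seen.add(url)
        yield url, SITE[url][0]  # static data only


def renderer(url, first_page):
    """Adds the dynamic part to the first page to obtain the second page."""
    return first_page + SITE[url][1]


def parser(second_page):
    """Returns the links found in the second page and its page data."""
    return re.findall(r'href="([^"]+)"', second_page), second_page


def crawl(start_url):
    url_queue, request_queue = deque([start_url]), deque()
    seen, pipeline_items = set(), []
    while url_queue:
        scheduler(url_queue, request_queue)
        for url, first_page in downloader(request_queue, seen):
            links, page_data = parser(renderer(url, first_page))
            url_queue.extend(links)                  # new URLs go back to the queue
            pipeline_items.append((url, page_data))  # e.g. for tamper detection
    return pipeline_items
```

In this stand-in site the link to http://site.test/b appears only in the dynamic part of the home page, so it is discovered only because the parser works on the second (rendered) page rather than on the first page.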
According to the information processing method provided by the embodiment of the invention, the scheduler, the message queue, the downloader, the renderer, the parser and the processing pipeline cooperate to ensure that all webpage data information corresponding to any address information of a website is loaded, so that the address information in the webpage data information corresponding to any address information of the website is crawled comprehensively, and the problems in the related art of insufficient crawl depth in the page hierarchy and an insufficient amount of crawled page data when crawling the webpages of a website can be solved.
Based on the foregoing embodiments, an embodiment of the present invention provides an information processing apparatus 5. As shown in fig. 5, the information processing apparatus 5 includes a processor 51, a memory 52, and a communication bus 53, wherein:
the communication bus 53 is used for realizing communication connection between the processor and the memory;
the processor 51 is configured to execute the stored information acquisition program in the memory 52 to implement the following steps:
acquiring address information of a website;
presenting first data information based on the first address information to obtain a first webpage; the first data information is used for representing static data information in the first page data information corresponding to the first address information; first page data information including static data information and dynamic data information;
acquiring dynamic data information in the first page data information;
and rendering the dynamic data information based on the first webpage to obtain a second webpage.
In other embodiments of the present invention, the processor 51 is configured to execute an information acquisition program stored in the memory 52 to implement the steps of:
rendering the dynamic data information based on the first webpage to obtain a second webpage, comprising:
acquiring each dynamic object in the dynamic data information;
and rendering each dynamic object based on the first webpage to obtain a second webpage.
In other embodiments of the present invention, the processor 51 is configured to execute an information acquisition program stored in the memory 52 to implement the steps of:
acquiring webpage data information of a second webpage;
acquiring address information of a second webpage based on the webpage data information;
storing the address information into a message queue;
acquiring address information of a website, comprising:
and acquiring address information of the website from the message queue.
In other embodiments of the present invention, the processor 51 is configured to execute an information acquisition program stored in the memory 52 to implement the steps of:
acquiring webpage data information of a second webpage;
determining second page data information based on the webpage data information; and the second page data information comprises static data information and dynamic data information of the second webpage.
In other embodiments of the present invention, the processor 51 is configured to execute an information acquisition program stored in the memory 52 to implement the steps of:
and performing page information detection on the second page data information, and determining the security level of the second page data information.
In other embodiments of the present invention, the processor 51 is configured to execute an information acquisition program stored in the memory 52 to implement the steps of:
based on the address information of the website, presenting first data information to obtain a first webpage, comprising:
loading the address information of the website based on the matching relation between the address information of the website and the historical address information, and presenting first data information to obtain a first webpage; the historical address information is used for indicating the loaded address information.
In other embodiments of the present invention, the processor 51 is configured to execute an information acquisition program stored in the memory 52 to implement the steps of:
loading the address information of the website based on the matching relationship between the address information of the website and the historical address information, presenting first data information, and obtaining a first webpage, wherein the method comprises the following steps:
and if the address information of the website is not matched with the historical address information, presenting first data information based on the address information of the website to obtain a first webpage.
As can be seen from the above, the information processing apparatus provided in the embodiment of the present invention presents the first data information based on the first address information, and obtains the first web page; acquiring dynamic data information in the first page data information; and rendering the dynamic data information based on the first webpage to obtain a second webpage. Therefore, the information processing device provided by the embodiment of the invention can acquire any address information in a website and present static data information and dynamic data information corresponding to the address information, so that crawling of any address information in the website is realized, information in a crawled webpage can be comprehensively presented, normal crawling of any next-level page in the crawled page is further ensured, and the problems of insufficient crawling page level depth and insufficient crawling page data amount generated when the webpage in the website is crawled in the related technology can be solved.
Based on the foregoing embodiments, an embodiment of the present invention provides an information processing system 6. As shown in fig. 6, the information processing system 6 includes an obtaining module 61 and a processing module 62, wherein:
an obtaining module 61, configured to obtain address information of a website;
the processing module 62 is configured to present the first data information based on the first address information to obtain a first webpage; the first data information is used for representing static data information in the first page data information corresponding to the first address information; first page data information including static data information and dynamic data information;
the obtaining module 61 is further configured to obtain dynamic data information in the first page data information;
the processing module 62 is further configured to render the dynamic data information based on the first webpage to obtain a second webpage.
In other embodiments of the present invention, the processing module 62, based on the first webpage, renders the dynamic data information to obtain a second webpage, including:
acquiring each dynamic object in the dynamic data information;
and rendering each dynamic object based on the first webpage to obtain a second webpage.
In other embodiments of the present invention, the obtaining module 61 is further configured to obtain web page data information of a second web page;
acquiring address information of a second webpage based on the webpage data information;
the processing module 62 is further configured to store the address information in a message queue.
In other embodiments of the present invention, the obtaining module 61 obtaining address information of a website includes:
and acquiring address information of the website from the message queue.
In other embodiments of the present invention, the obtaining module 61 is further configured to obtain web page data information of a second web page;
the processing module 62 is further configured to determine second page data information based on the web page data information; and the second page data information comprises static data information and dynamic data information of the second webpage.
In other embodiments of the present invention, the processing module 62 is further configured to perform page information detection on the second page data information, and determine the security level of the second page data information.
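The embodiment does not state how page information detection is performed or which security levels exist. Purely as an illustration, the sketch below assumes a rule-based detector that scans both the static and the dynamic part of the second page data information; the `SecondPageData` type, the patterns, and the three level names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SecondPageData:
    """Hypothetical model of the second page data information."""
    static_html: str            # static data information of the second webpage
    dynamic_scripts: List[str]  # dynamic data information of the second webpage


# Assumed example rules; a real detector would use a maintained rule set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"eval\s*\("),
    re.compile(r"document\.write\s*\("),
    re.compile(r"<iframe[^>]+display\s*:\s*none", re.IGNORECASE),
]


def security_level(page: SecondPageData) -> str:
    """Return an assumed security level based on how many rules match."""
    text = page.static_html + "\n" + "\n".join(page.dynamic_scripts)
    hits = sum(1 for pattern in SUSPICIOUS_PATTERNS if pattern.search(text))
    if hits == 0:
        return "safe"
    if hits == 1:
        return "suspicious"
    return "dangerous"
```

The point of the sketch is only that detection runs over the complete page data, including the dynamically rendered part, rather than over the static markup alone.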
In other embodiments of the present invention, the processing module 62 is further configured to load address information of a website based on a matching relationship between the address information of the website and historical address information, and present first data information to obtain a first webpage; the historical address information is used for indicating the loaded address information.
In other embodiments of the present invention, the processing module 62 loading the address information of the website based on the matching relationship between the address information of the website and the historical address information, and presenting the first data information to obtain the first webpage, includes:
and if the address information of the website is not matched with the historical address information, presenting first data information based on the address information of the website to obtain a first webpage.
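The history check can be sketched as a simple set lookup. The sketch assumes that "matching" means exact equality after the URL fragment is removed; the embodiment does not define the matching rule, and `load_if_new` and `present` are hypothetical names.

```python
from typing import Callable, Optional, Set
from urllib.parse import urldefrag


def load_if_new(address: str, history: Set[str], present: Callable[[str], str]) -> Optional[str]:
    """Present the page only if the address does not match any loaded address.

    `history` holds the historical address information (addresses already loaded).
    Returns the first webpage, or None when the address was loaded before.
    """
    # Assumption: addresses that differ only in their #fragment are the same page.
    key, _fragment = urldefrag(address)
    if key in history:
        return None
    history.add(key)
    return present(key)
```

Skipping addresses that match the history prevents the crawler from loading the same page repeatedly when pages link back to one another.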
In the embodiment of the present invention, the processing module may be a processor in a cloud server, and the obtaining module may be a communication module in the cloud server, which communicates with a website server.
As can be seen from the above, the information processing system provided in the embodiment of the present invention presents the first data information based on the first address information to obtain the first webpage, acquires the dynamic data information in the first page data information, and renders the dynamic data information based on the first webpage to obtain a second webpage. The system can therefore acquire any address information in a website and present both the static data information and the dynamic data information corresponding to that address information. As a result, any address information in the website can be crawled, the information in a crawled webpage can be presented comprehensively, and any next-level page of a crawled page can be crawled normally. This solves the problems in the related art of insufficient crawl depth and insufficient crawled page data when the webpages of a website are crawled.
Based on the foregoing embodiments, the present invention further provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs are executable by one or more processors to implement the steps of any one of the foregoing information processing methods.
The foregoing description of the various embodiments emphasizes the differences between the embodiments. For parts that are the same or similar, the embodiments may be referred to one another; for brevity, these parts are not described again here.
The methods disclosed in the method embodiments provided by the present application can be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in various product embodiments provided by the application can be combined arbitrarily to obtain new product embodiments without conflict.
The features disclosed in the various method or apparatus embodiments provided herein may be combined in any combination to arrive at new method or apparatus embodiments without conflict.
The computer-readable storage medium may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM). It may also be any of various electronic devices, such as a mobile phone, a computer, a tablet device, or a personal digital assistant, that includes one or any combination of the above-mentioned memories.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method described in the embodiments of the present invention.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An information processing method, characterized in that the method comprises:
acquiring address information of a website;
presenting first data information based on the address information of the website to obtain a first webpage; the first data information is used for representing static data information in first page data information corresponding to the address information of the website; the first page data information comprises the static data information and the dynamic data information;
acquiring the dynamic data information in the first page data information;
and rendering the dynamic data information based on the first webpage to obtain a second webpage.
2. The method of claim 1, wherein rendering the dynamic data information based on the first webpage to obtain a second webpage comprises:
acquiring each dynamic object in the dynamic data information;
and rendering each dynamic object based on the first webpage to obtain the second webpage.
3. The method of claim 2, further comprising:
acquiring webpage data information of the second webpage;
acquiring address information of the second webpage based on the webpage data information;
storing the address information into a message queue;
the acquiring address information of the website includes:
and acquiring the address information of the website from the message queue.
4. The method of claim 2, further comprising:
acquiring webpage data information of the second webpage;
determining second page data information based on the webpage data information; and the second page data information comprises static data information and dynamic data information of the second webpage.
5. The method of claim 4, further comprising:
and executing page information detection on the second page data information, and determining the security level of the second page data information.
6. The method of claim 1, wherein the presenting first data information based on the address information of the website to obtain a first webpage comprises:
loading the address information of the website based on the matching relation between the address information of the website and the historical address information, and presenting first data information to obtain a first webpage; the historical address information is used for representing the loaded address information.
7. The method according to claim 6, wherein the loading the address information of the website based on the matching relationship between the address information of the website and the historical address information, and presenting the first data information to obtain the first webpage comprises:
and if the address information of the website is not matched with the historical address information, presenting first data information based on the address information of the website to obtain a first webpage.
8. An information processing apparatus characterized by comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute the information retrieval program stored in the memory to implement the steps of:
acquiring address information of a website;
presenting first data information based on the address information to obtain a first webpage; the first data information is used for representing static data information in first page data information corresponding to the first address information; the first page data information comprises the static data information and the dynamic data information;
acquiring the dynamic data information in the first page data information;
and rendering the dynamic data information based on the first webpage to obtain a second webpage.
9. An information processing system, characterized by comprising an acquisition module and a processing module, wherein:
the acquisition module is used for acquiring address information of the website;
the processing module is used for presenting first data information based on the address information to obtain a first webpage; the first data information is used for representing static data information in first page data information corresponding to the first address information; the first page data information comprises the static data information and the dynamic data information;
the acquiring module is further configured to acquire the dynamic data information in the first page data information;
the processing module is further configured to render the dynamic data information based on the first webpage to obtain a second webpage.
10. A computer-readable storage medium characterized by storing one or more programs, which are executable by one or more processors, to implement the steps of the information processing method according to any one of claims 1 to 7.
CN201911274626.0A 2019-12-12 2019-12-12 Information processing method, device, system and computer storage medium Pending CN111159517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911274626.0A CN111159517A (en) 2019-12-12 2019-12-12 Information processing method, device, system and computer storage medium

Publications (1)

Publication Number Publication Date
CN111159517A true CN111159517A (en) 2020-05-15

Family

ID=70556817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911274626.0A Pending CN111159517A (en) 2019-12-12 2019-12-12 Information processing method, device, system and computer storage medium

Country Status (1)

Country Link
CN (1) CN111159517A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931024A (en) * 2020-07-10 2020-11-13 北京邮电大学 Crawling method and device for dynamic webpage and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040168122A1 (en) * 2003-02-21 2004-08-26 Kobipalayam Murugaiyan Senthil Nathan System, method and computer readable medium for transferring and rendering a web page
CN102624713A (en) * 2012-02-29 2012-08-01 深信服网络科技(深圳)有限公司 Website tampering identification method and website tampering identification device
CN102938776A (en) * 2012-09-28 2013-02-20 方正国际软件有限公司 Dynamic page processing system based on Asynchronous JavaScript and XML (ajax) technique
CN103778236A (en) * 2014-01-26 2014-05-07 网宿科技股份有限公司 Webpage data distribution processing method and device as well as webpage generation processing method and device
CN104852883A (en) * 2014-02-14 2015-08-19 腾讯科技(深圳)有限公司 Method and system for protecting safety of account information
CN105069132A (en) * 2015-08-17 2015-11-18 中国海洋大学 Webpage implementation method based on static shell
CN105320851A (en) * 2014-08-05 2016-02-10 腾讯科技(深圳)有限公司 Safety detection method and device for webpage
CN109542436A (en) * 2018-11-14 2019-03-29 泰康保险集团股份有限公司 Data processing method, device, medium and electronic equipment
CN109902220A (en) * 2019-02-27 2019-06-18 腾讯科技(深圳)有限公司 Webpage information acquisition methods, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination