CN114065088A - Webpage analyzing method, system, equipment and computer readable storage medium - Google Patents
- Publication number
- CN114065088A CN202111261572.1A
- Authority
- CN
- China
- Prior art keywords
- target
- webpage
- bytecode
- server
- web page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9027—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/43—Checking; Contextual analysis
- G06F8/436—Semantic checking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a webpage parsing method, system, device, and computer-readable storage medium. The method comprises: acquiring a target script code of a target webpage, and parsing the target script code to obtain a target bytecode; and sending the target bytecode to a local browser so that the local browser executes the target bytecode to open the target webpage. In this method, parsing the webpage script code, which is the most time-consuming operation when the local browser loads a webpage, is moved to the server. Using the server's high performance, the script code is parsed on the server before it is transmitted to the local browser, so the local browser receives already-parsed code from the server and executes it directly without parsing it. Because the parsing capability of the server is stronger than that of a local browser, the webpage opening speed is greatly improved.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, a system, a device, and a computer-readable storage medium for webpage parsing.
Background
With the rapid development of digital communication technology, digital televisions are increasingly widely used. Current overseas digital TV applications, such as HbbTV (Hybrid Broadcast Broadband TV) and FVP (Freeview Play), are presented as web pages: a browser parses the JavaScript code corresponding to the application page. With the arrival of the 5G era, network speed is no longer the bottleneck for webpage opening speed. Measurements with Google Chrome DevTools show that, in the current webpage opening process, the most time-consuming stage is the browser parsing JavaScript into bytecode; for a complex application, this stage can take about 4 seconds. This is why application pages of overseas digital televisions take a long time to open.
Disclosure of Invention
The main object of the invention is to provide a webpage parsing method, system, device, and computer-readable storage medium that solve the technical problem that application pages of overseas digital televisions take a long time to open.
In order to achieve the above object, a first aspect of the present invention provides a web page parsing method, where the web page parsing method is applied to a server, and the web page parsing method includes:
acquiring a target script code of a target webpage, and analyzing the target script code to obtain a target byte code;
and sending the target bytecode to a local browser so that the local browser can execute the target bytecode to open the target webpage.
Optionally, the step of parsing the target script code to obtain a target bytecode includes:
performing semantic and syntactic analysis on the target script code based on a script parser built into the server, and outputting an abstract syntax tree;
converting the abstract syntax tree into the target bytecode using an interpreter in the script parser.
Optionally, the step of sending the target bytecode to a local browser includes:
and sending the target bytecode to a local browser based on a bytecode receiving interface preset in the local browser.
Optionally, after the step of obtaining the target script code of the target webpage and analyzing the target script code to obtain the target bytecode, the method further includes:
and carrying out backup processing on the target bytecode.
In order to achieve the above object, a second aspect of the present invention further provides a web page parsing method, where the web page parsing method is applied to a local browser, and the web page parsing method includes:
acquiring a main resource of a target webpage, and analyzing the main resource into a DOM tree;
receiving and executing a target bytecode of the target webpage sent by a server to generate a CSSOM tree based on the target bytecode and the DOM tree;
and drawing and displaying the target webpage based on the CSSOM tree.
Optionally, the step of receiving and executing the target bytecode of the target webpage sent by the server to generate the CSSOM tree based on the target bytecode and the DOM tree includes:
receiving a target bytecode and a cascading style sheet corresponding to the target webpage, wherein the target bytecode is sent by the server;
executing the target bytecode, adding styles to the DOM tree based on the cascading style sheet, and generating the CSSOM tree.
Optionally, the step of rendering and displaying the target webpage based on the CSSOM tree includes:
and traversing each visible node in the CSSOM tree starting from the root node of the CSSOM tree, and laying out and drawing each visible node to display the target webpage.
In addition, to achieve the above object, a third aspect of the present invention provides a web page parsing system, where the web page parsing system includes a server and a local browser, and the web page parsing system performs the following operations:
the local browser acquires a main resource of a target webpage and analyzes the main resource into a DOM tree;
the server acquires a target script code of the target webpage and analyzes the target script code to obtain a target byte code;
the server sends the target byte code to the local browser;
the local browser receives and executes a target bytecode of the target webpage sent by the server to generate a CSSOM tree based on the target bytecode and the DOM tree;
and the local browser draws and displays the target webpage based on the CSSOM tree.
In order to achieve the above object, a fourth aspect of the present invention provides a web page parsing apparatus provided in a server, the web page parsing apparatus including:
the script code analysis module is used for acquiring a target script code of a target webpage and analyzing the target script code to obtain a target byte code;
and the target code sending module is used for sending the target bytecode to a local browser so that the local browser can execute the target bytecode to open the target webpage.
Optionally, the script code parsing module includes:
the syntax parsing unit is used for performing semantic and syntactic analysis on the target script code based on a script parser built into the server, and outputting an abstract syntax tree;
and the code conversion unit is used for converting the abstract syntax tree into the target byte codes by utilizing an interpreter in the script parser.
Optionally, the object code sending module includes:
and the interface sending unit is used for sending the target bytecode to the local browser based on a bytecode receiving interface preset in the local browser.
Optionally, the web page parsing apparatus further includes:
and the code backup module is used for carrying out backup processing on the target byte codes.
In addition, to achieve the above object, a fifth aspect of the present invention provides a web page parsing apparatus, where the web page parsing apparatus is disposed in a local browser, and the web page parsing apparatus includes:
the webpage resource analysis module is used for acquiring the main resource of the target webpage and analyzing the main resource into a DOM tree;
the target code execution module is used for receiving and executing the target bytecode of the target webpage sent by the server so as to generate a CSSOM tree based on the target bytecode and the DOM tree;
and the target webpage display module is used for drawing and displaying the target webpage based on the CSSOM tree.
Optionally, the object code execution module includes:
the webpage information receiving unit is used for receiving the target byte codes sent by the server and the cascading style sheet corresponding to the target webpage;
and the webpage style adding unit is used for executing the target byte codes, adding styles to the DOM tree based on the cascading style sheet and generating the CSSOM tree.
Optionally, the target webpage display module includes:
and the target webpage drawing unit is used for traversing each visible node in the CSSOM tree starting from the root node of the CSSOM tree, and laying out and drawing each visible node so as to display the target webpage.
Further, to achieve the above object, a sixth aspect of the present invention provides a computer device comprising a server and/or a local browser, the computer device comprising: a memory, a processor, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the web page parsing method according to any one of the first aspect, or the steps of the web page parsing method according to any one of the second aspect.
Furthermore, to achieve the above object, a seventh aspect of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the steps of the web page parsing method according to any one of the above first aspects, or which, when executed by the processor, implements the steps of the web page parsing method according to any one of the above second aspects.
The invention provides a webpage parsing method, system, device, and computer-readable storage medium. In this webpage parsing method, parsing the webpage script code, which is the most time-consuming operation when the local browser loads a webpage, is moved to the server. Using the server's high performance, the script code is parsed on the server before it is transmitted to the local browser, so the local browser receives already-parsed code from the server and executes it directly without parsing it. Because the parsing capability of the server is stronger than that of a local browser, the webpage opening speed can be improved, which solves the technical problem that application pages of overseas digital televisions take a long time to open. In addition, with this method the bytecode is smaller in data size than the original script code, so network overhead can be reduced.
Drawings
FIG. 1 is a schematic diagram of a web page parsing device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a web page parsing method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of the web page parsing flow on a server and a local browser before and after the improvement, according to a second embodiment of the web page parsing method of the present invention;
FIG. 4 is a flowchart illustrating a webpage parsing method according to a third embodiment of the present invention;
FIG. 5 is a flowchart illustrating a webpage loading process according to a fourth embodiment of the webpage parsing method of the present invention;
FIG. 6 is a schematic diagram of a functional module of the web page parsing apparatus (installed in a server) according to the present invention;
FIG. 7 is a schematic diagram of functional modules of the web page parsing apparatus (installed in a local browser) according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a web page parsing device in a hardware operating environment according to an embodiment of the present invention.
As shown in FIG. 1, the web page parsing device may include: a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may optionally include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory. Alternatively, the memory 1005 may be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the web page parsing device architecture shown in FIG. 1 does not constitute a limitation of web page parsing devices and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a web page parsing program.
In the web page parsing apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; the processor 1001 may be configured to call a web page parsing program stored in the memory 1005 and execute the web page parsing method according to the embodiment of the present invention.
Based on the hardware structure, the invention provides various embodiments of the webpage analysis method.
With the rapid development of digital communication technology, digital televisions are increasingly widely used. Current overseas digital TV applications, such as HbbTV (Hybrid Broadcast Broadband TV) and FVP (Freeview Play), are presented as web pages: a browser parses the JavaScript code corresponding to the application page. With the arrival of the 5G era, network speed is no longer the bottleneck for webpage opening speed. Measurements with Google Chrome DevTools show that, in the current webpage opening process, the most time-consuming stage is the browser parsing JavaScript into bytecode; for a complex application, this stage can take about 4 seconds. This is why application pages of overseas digital televisions take a long time to open.
To solve this technical problem, the invention provides a webpage parsing method in which parsing the webpage script code, the most time-consuming operation when a local browser loads a webpage, is moved to the server. Using the server's high performance, the script code is parsed on the server before it is transmitted to the local browser, so the local browser receives already-parsed code from the server and executes it directly without parsing it. Because the parsing capability of the server is stronger than that of a local browser, the webpage opening speed can be improved, which solves the technical problem that application pages of overseas digital televisions take a long time to open. In addition, with this method the bytecode is smaller in data size than the original script code, so network overhead can be reduced.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a web page parsing method.
A first embodiment of the present invention provides a web page parsing method, where the web page parsing method is applied to a server, and the web page parsing method includes:
Step S10, acquiring a target script code of a target webpage, and parsing the target script code to obtain a target bytecode;
In this embodiment, the target webpage is a webpage whose script code needs to be parsed in the server's current webpage parsing task. For example, in the overseas digital television scenario, the target webpage is an application page of an overseas digital television application. A single webpage parsing task may involve one or more target webpages; this embodiment does not limit the number. The target script code is the script code corresponding to the target webpage. For example, in the overseas digital television scenario, the script code corresponding to the application page is JavaScript source code. The target bytecode is the bytecode obtained by parsing the target script code. Bytecode is a binary file containing an executable program that consists of a sequence of opcode/data pairs.
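The description of bytecode as a sequence of opcode/data pairs can be illustrated with a toy stack machine. The following is a minimal sketch only: the opcode names (`PUSH`, `ADD`, `MUL`, `PRINT`) and the `run` function are hypothetical and are not the bytecode format of V8 or of any real browser engine.

```python
# Hypothetical opcodes for illustration only; they are not real engine bytecodes.
PUSH, ADD, MUL, PRINT = range(4)


def run(program):
    """Execute a list of (opcode, operand) pairs on a small stack machine."""
    stack, output = [], []
    for opcode, operand in program:
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == PRINT:
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return output


# Opcode/data pairs for the expression (2 + 3) * 4.
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (PRINT, None)]
result = run(program)
```

The executor never looks at source text; it only dispatches on opcodes, which is why a client that receives bytecode can skip the parsing stage.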
A traditional server is merely a database with data storage and an update mechanism. After the local browser obtains the script code of a webpage from the server, the script code can run only after a complex parsing step converts it into machine-recognizable bytecode. At present, the parsing capability of the local browser is determined entirely by its hardware performance. The server in the present application, by contrast, has a script source code parsing function.
When receiving a script code acquisition instruction, the server determines the acquisition path of the target script code based on the instruction and then acquires the target script code of the target webpage along that path. The server then uses its high performance to parse the target script code quickly and obtain the target bytecode.
Step S20, sending the target bytecode to a local browser, so that the local browser executes the target bytecode to open the target webpage.
In this embodiment, the local browser refers to a browser that needs to load and display a target web page.
The server parses the target script code into the target bytecode using its own parsing function and then transmits the target bytecode to the local browser. On receiving the target bytecode, the local browser can execute it directly to load the target page, without performing any parsing of script code.
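The division of work between steps S10 and S20 can be sketched with Python's own bytecode as a stand-in for JavaScript bytecode. This is an analogy under stated assumptions, not the patent's implementation: the function names `server_parse` and `browser_execute` are hypothetical, and Python's `compile` and `marshal` take the place of a JavaScript engine and its serialization format.

```python
import marshal


def server_parse(source: str) -> bytes:
    """Server side: parse and compile the script once, return serialized bytecode."""
    code = compile(source, "<target-script>", "exec")
    return marshal.dumps(code)


def browser_execute(payload: bytes) -> dict:
    """Client side: load the bytecode and run it; no source text is parsed here."""
    code = marshal.loads(payload)
    namespace = {}
    exec(code, namespace)
    return namespace


# Stand-in for the target script code of the target webpage.
script = "title = 'home page'\nitems = [n * n for n in range(4)]\n"
payload = server_parse(script)         # step S10 on the server
page_state = browser_execute(payload)  # step S20 result on the client
```

Only `payload` crosses the network in this sketch; the client-side function never sees `script`.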
As another embodiment, after the target bytecode is parsed, the server may further compile the target bytecode into a target machine code that can be directly recognized by the machine, so as to directly transmit the target machine code to the local browser, and after receiving the target machine code, the local browser may directly recognize and execute the target machine code, so as to load the target webpage and display the target webpage on a screen for a user to browse.
In this embodiment, a target script code of a target webpage is acquired and parsed to obtain a target bytecode, and the target bytecode is sent to a local browser so that the local browser executes it to open the target webpage. In this way, parsing the webpage script code, the most time-consuming operation when the local browser loads a webpage, is moved to the server. Using the server's high performance, the script code is parsed on the server before it is transmitted to the local browser, so the local browser receives already-parsed code from the server and executes it directly without parsing it. Because the parsing capability of the server is stronger than that of a local browser, the webpage opening speed can be improved, which solves the technical problem that application pages of overseas digital televisions take a long time to open. In addition, with this method the bytecode is smaller in data size than the original script code, so network overhead can be reduced.
Further, based on the first embodiment shown in fig. 2, a second embodiment of the web page parsing method of the present invention is proposed, in this embodiment, the step of parsing the target script code to obtain the target bytecode includes:
performing semantic and syntactic analysis on the target script code based on a script parser built into the server, and outputting an abstract syntax tree;
converting the abstract syntax tree into the target bytecode using an interpreter in the script parser.
In the present embodiment, the following description will be given taking a script code as a JavaScript code as an example.
To break through the limited parsing capability of the local browser, a JavaScript parser (for example, the widely used V8 engine on the Android platform or JavaScriptCore on the iOS platform) is built into the server, which gives the server the ability to parse JavaScript source code.
As shown in FIG. 3, FIG. 3 is a schematic diagram of the web page parsing flow on the local browser and the server before and after the improvement. The left side shows the existing parsing mode on a server and a local browser: the upper cloud-shaped frame represents the server, and the lower rectangular frame represents the local browser. The right side shows the improved parsing mode, again with the upper cloud-shaped frame representing the server and the lower rectangular frame representing the local browser. In the existing parsing mode, the server transmits the online JavaScript source code ("JavaScript online" in the figure) to the local browser without parsing it; after the browser downloads the JavaScript source code ("JavaScript local"), it parses the source code locally ("parse") to obtain the corresponding bytecode ("Byte code"). In the improved parsing mode of the present application, after acquiring "JavaScript online", the server does not transmit it directly to the local browser. Instead, the server parses "JavaScript online" to obtain the parsing result ("Byte code online" in the figure) and transmits "Byte code online" to the local browser, which downloads it to obtain "Byte code local".
After the server acquires the JavaScript source code of the target webpage, it parses the source code with a preset JavaScript parser. When the JavaScript parser is the V8 engine, the parsing process is as follows. First, semantic and syntactic analysis is performed on the JavaScript source code. Syntactic analysis is a logical stage of compilation: on the basis of lexical analysis, it combines the word sequence into syntactic phrases such as "program", "statement", and "expression", and determines whether the source program is structurally correct; the structure of the source program is described by a context-free grammar. Semantic analysis is also a logical stage of compilation: it checks the context-dependent properties of a structurally correct source program, including type checking, to determine whether the program contains semantic errors, and collects type information for the code generation stage. After this analysis, the corresponding abstract syntax tree (AST) is output. The AST is then passed to the Ignition interpreter, which generates the target bytecode. In addition, in the V8 engine, the TurboFan optimizing compiler can further compile the target bytecode into target machine code that the machine can recognize directly.
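The two stages described above, source code to abstract syntax tree and abstract syntax tree to bytecode, can be observed in a small sketch. It uses Python's `ast`, `compile`, and `dis` modules purely as an analogy; it does not reproduce V8, Ignition, or TurboFan, and the sample statement and variable names are made up for illustration.

```python
import ast
import dis

# Stand-in for a line of target script code.
source = "total = price * quantity + shipping"

# Stage 1: syntactic analysis produces an abstract syntax tree.
tree = ast.parse(source, filename="<target-script>", mode="exec")
node_types = [type(node).__name__ for node in ast.walk(tree)]

# Stage 2: the AST is compiled to bytecode instructions.
code = compile(tree, "<target-script>", "exec")
instructions = [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]

# The compiled bytecode can then be executed without touching the source again.
namespace = {"price": 2, "quantity": 3, "shipping": 1}
exec(code, namespace)
```

Inspecting `node_types` shows tree nodes such as `Assign` and `BinOp`, while `instructions` lists the flat opcode sequence generated from that tree; the exact opcode names vary between Python versions.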
Further, the step of sending the target bytecode to a local browser includes:
and sending the target bytecode to a local browser based on a bytecode receiving interface preset in the local browser.
In this embodiment, moving the parsing function to the server requires not only building a JavaScript parser into the server but also providing the local browser with an interface that can receive bytecode directly, so that the local browser can run the JavaScript bytecode sent by the server directly, without parsing it.
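The patent does not specify what the bytecode receiving interface looks like. As one possible sketch, the server could wrap the bytecode in a small envelope carrying a version tag and a length, and the receiving side could validate both before executing. Everything here is an assumption made for illustration: the envelope layout, the names `pack_bytecode` and `receive_bytecode`, and the use of Python bytecode (tagged with `importlib.util.MAGIC_NUMBER`) in place of JavaScript bytecode.

```python
import importlib.util
import marshal
import struct

# Hypothetical envelope: 4-byte version tag followed by a 4-byte payload length.
HEADER = struct.Struct(">4sI")


def pack_bytecode(source: str) -> bytes:
    """Server side: compile the script and wrap the bytecode in the envelope."""
    body = marshal.dumps(compile(source, "<target-script>", "exec"))
    return HEADER.pack(importlib.util.MAGIC_NUMBER, len(body)) + body


def receive_bytecode(message: bytes) -> dict:
    """Hypothetical receiving interface: validate the envelope, then execute."""
    tag, length = HEADER.unpack_from(message)
    body = message[HEADER.size:]
    if tag != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode was produced for a different interpreter version")
    if len(body) != length:
        raise ValueError("truncated bytecode payload")
    namespace = {}
    exec(marshal.loads(body), namespace)
    return namespace


message = pack_bytecode("x = 1 + 2")
state = receive_bytecode(message)
```

The version check reflects a practical constraint of this analogy: serialized Python bytecode is specific to the interpreter version that produced it, so the producer and the consumer have to agree on a format. Executing received bytecode also presumes that the client trusts the server, since Python's `marshal` is not safe against malicious input.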
Further, after step S10, the method further includes:
and carrying out backup processing on the target bytecode.
In this embodiment, the server may also back up the target bytecode after obtaining it by parsing. If the bytecode of the same target webpage later needs to be transmitted to a local browser again, the server only needs to retrieve the backup, which avoids repeated parsing.
In this embodiment, a JavaScript parser is built into the server, so that JavaScript can be converted into bytecode on the server using the server's high performance. An interface for directly receiving bytecode is provided for the local browser, so that the JavaScript bytecode acquired by the local browser no longer needs to be parsed and can be run directly. This not only reduces the parsing time on the local browser; the bytecode is also smaller than the original JavaScript, which reduces network overhead. By backing up the bytecode on the server, the backup can be retrieved directly, without repeated parsing, when the same bytecode is transmitted again.
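The backup step can be sketched as a simple cache on the server. The patent does not say how backups are indexed, so keying the store by a SHA-256 hash of the script source is an assumption, as are the class name `BytecodeBackup` and the `parse_count` counter; Python bytecode again stands in for JavaScript bytecode.

```python
import hashlib
import marshal


class BytecodeBackup:
    """Keeps parsed bytecode keyed by a hash of the script source (assumed key)."""

    def __init__(self):
        self._store = {}
        self.parse_count = 0  # how many times a real parse was needed

    def get(self, source: str) -> bytes:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._store:
            # First request for this script: parse it and keep a backup.
            self.parse_count += 1
            code = compile(source, "<target-script>", "exec")
            self._store[key] = marshal.dumps(code)
        # Later requests for the same script reuse the backup.
        return self._store[key]


backup = BytecodeBackup()
first = backup.get("x = 1")
second = backup.get("x = 1")   # served from the backup, no second parse
other = backup.get("x = 2")    # different script, parsed once
```

Hashing the source means that an updated script automatically misses the cache and is parsed again, which is one way to keep backups consistent with the server's update mechanism.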
In addition, referring to fig. 4, fig. 4 is a flowchart illustrating a webpage parsing method according to a third embodiment. A third embodiment of the present invention provides a web page parsing method, where the web page parsing method is applied to a local browser, and the web page parsing method includes:
Step S100, acquiring a main resource of a target webpage, and parsing the main resource into a DOM tree;
In this embodiment, the local browser does not parse the script source code of the web page. Instead, it directly receives the bytecode parsed by the server through a bytecode receiving interface preset in the local browser. The target web page is the web page that the local browser currently needs to display. The resource refers to the data resources related to the target web page. As for the DOM (Document Object Model), the HTML DOM defines the standard methods for accessing and manipulating HTML documents and expresses an HTML document as a tree structure.
When receiving a target webpage access request sent by a user, the local browser determines a target webpage based on the request, then acquires a main resource of the target webpage and analyzes the main resource into a DOM tree.
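Parsing a main resource into a DOM tree can be sketched with Python's standard `html.parser` module. This is a simplified illustration, assuming the main resource is an HTML document in which every element is explicitly closed; the `Node`, `DomBuilder`, and `parse_main_resource` names are hypothetical, and a real browser's HTML parser handles far more cases.

```python
from html.parser import HTMLParser


class Node:
    """A minimal DOM-like node: tag, attributes, text, parent, and children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs or []), parent
        self.children, self.text = [], ""


class DomBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # Each opening tag becomes a child of the node currently being built.
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # A closing tag returns to the parent (assumes well-nested markup).
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse_main_resource(html: str) -> Node:
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


root = parse_main_resource(
    "<html><body><h1>News</h1><p class='lead'>Hello</p></body></html>"
)
body = root.children[0].children[0]
```

After parsing, `body.children` holds the `h1` and `p` nodes in document order, which is the tree structure that later steps attach styles to.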
Step S200, receiving and executing a target byte code of the target webpage sent by a server to generate a CSSOM tree based on the target byte code and the DOM tree;
and step S300, drawing and displaying the target webpage based on the CSSOM tree.
In this embodiment, the server is a server that directly provides the bytecode parsed by the script source code to the local browser, and the server is inserted with a corresponding parser and can directly parse the script source code of the web page. The target bytecode refers to a bytecode obtained after parsing the target script code of the target web page. Bytecode refers to a binary file containing an executive consisting of a sequence of op code/data pairs. With respect to the (CSSOM, CSS Object Model) tree, CSSOM refers to the CSS Object Model, describing the mapping of all CSS selectors and the associated attributes of each selector.
The local browser receives the target bytecode sent by the server based on the preset interface, acquires the cascading style sheet corresponding to the target webpage from the server, and then directly executes the target bytecode. And adding a style to the DOM tree by the local browser based on the cascading style sheet, and generating and continuously updating the CSSOM tree in the execution process of the target bytecode. And finally, the local browser traverses each visible node in the CSSOM from the root node of the CSSOM, and lays out and draws each visible node to display the target webpage on a screen.
In this embodiment, the main resource of the target webpage is acquired and parsed into a DOM tree; the target bytecode of the target webpage sent by the server is received and executed to generate a CSSOM tree based on the target bytecode and the DOM tree; and the target webpage is drawn and displayed based on the CSSOM tree. In this way, parsing the webpage script code, which is the most time-consuming operation when the local browser loads a webpage, is moved to the server. Using the server's high performance, the script code is parsed before it is transmitted to the local browser, so the local browser can receive the parsed code from the server and execute it directly, then carry out the remaining page-loading operations until the target webpage is rendered and displayed, without parsing the script itself. Because the parsing capability of the server is stronger than that of the local browser, the webpage opening speed can be improved, which solves the technical problem that application pages on overseas digital televisions take a long time to open.
Further, based on the third embodiment shown in fig. 3, a fourth embodiment of the web page parsing method of the present invention is provided. In this embodiment, step S200 includes:
receiving a target bytecode and a cascading style sheet corresponding to the target webpage, wherein the target bytecode is sent by the server;
executing the target bytecode, adding styles to the DOM tree based on the cascading style sheet, and generating the CSSOM tree.
Further, step S300 includes:
traversing each visible node of the CSSOM tree starting from its root node, and laying out and drawing each visible node to display the target webpage.
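The traversal of visible nodes from the root can be sketched as follows. This is an illustrative simplification: it assumes a node is invisible exactly when its style has `display: none`, and the "layout" merely stacks visible nodes at a fixed row height. The names `StyledNode`, `visible_nodes`, and `layout` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class StyledNode:
    """A tree node that already carries its resolved style."""
    name: str
    style: Dict[str, str] = field(default_factory=dict)
    children: List["StyledNode"] = field(default_factory=list)


def visible_nodes(root: StyledNode) -> Iterator[StyledNode]:
    """Yield visible nodes depth-first from the root; a hidden node hides its subtree."""
    if root.style.get("display") == "none":
        return
    yield root
    for child in root.children:
        yield from visible_nodes(child)


def layout(root: StyledNode, row_height: int = 20) -> Dict[str, int]:
    """Toy layout: give each visible node a vertical offset in traversal order."""
    return {
        node.name: index * row_height
        for index, node in enumerate(visible_nodes(root))
    }


tree = StyledNode("html", children=[
    StyledNode("head", {"display": "none"}, [StyledNode("title")]),
    StyledNode("body", children=[StyledNode("h1"), StyledNode("p")]),
])
positions = layout(tree)
```

Here `head` and its child `title` are skipped entirely, so only `html`, `body`, `h1`, and `p` receive positions and would go on to be drawn.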
In this embodiment, fig. 5 is a flowchart of loading a web page on the local browser.
Step one, the Page module. The local browser first acquires the main resource of the target webpage;
Step two, the DOM module. The local browser parses the main resource of the target webpage into a DOM tree;
Step three, the CSSOM module. The local browser downloads the JavaScript bytecode of the target webpage, which the server sends after parsing, together with the Cascading Style Sheets (CSS) of the target webpage. It then executes the JavaScript bytecode and applies the CSS, so that CSS styles are added to the DOM tree during execution and the DOM tree and the CSSOM tree are continuously updated;
Step four, the Render Tree module. The local browser traverses each visible node starting from the root node of the CSSOM tree and computes its style to generate a Render Tree;
Step five, the Layout module. The local browser lays out each visible node at its corresponding position according to the style;
Step six, the Paint module. The local browser draws each visible node to render and display the target webpage.
In addition, the JavaScript module in the figure corresponds to the process, performed on the server, of converting JavaScript source code into bytecode that the machine can recognize.
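The conversion from source code to bytecode can be illustrated with an analogy. The sketch below uses Python's own toolchain (`ast.parse`, `compile`, `dis`) in place of a JavaScript engine, which is an assumption made only so the example is runnable; the patent itself concerns JavaScript, and the function name `parse_to_bytecode` is hypothetical.

```python
import ast
import dis
from types import CodeType


def parse_to_bytecode(source: str, filename: str = "<target-script>") -> CodeType:
    """Two-stage analysis: source text -> abstract syntax tree -> bytecode."""
    tree = ast.parse(source, filename=filename, mode="exec")  # syntax analysis
    return compile(tree, filename, "exec")                    # AST -> code object


script = "total = sum(n * n for n in range(4))"
code = parse_to_bytecode(script)

# Each bytecode instruction pairs an opcode name with its argument.
instructions = [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]

# Executing the code object needs no further parsing of the source text.
namespace: dict = {}
exec(code, namespace)
```

The parsing cost is paid once in `parse_to_bytecode`; whoever holds the resulting code object only pays the execution cost, which is the division of labour the embodiment relies on.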
In this embodiment, parsing the webpage script code, which is the most time-consuming operation when the local browser loads a webpage, is moved to the server. Using the server's high performance, the script code is parsed before it is transmitted to the local browser, so the local browser can receive the parsed code from the server and execute it directly, and only needs to perform style-level construction and node rendering for the page, which greatly reduces the page loading time on the local browser.
The invention also provides a webpage analysis system, which comprises a server and a local browser, and executes the following operations:
the local browser acquires a main resource of a target webpage and analyzes the main resource into a DOM tree;
the server acquires a target script code of the target webpage and parses the target script code to obtain a target bytecode;
the server sends the target bytecode to the local browser;
the local browser receives and executes a target bytecode of the target webpage sent by the server to generate a CSSOM tree based on the target bytecode and the DOM tree;
and the local browser draws and displays the target webpage based on the CSSOM tree.
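The exchange between the server and the local browser can be sketched end to end under some loud assumptions: Python bytecode stands in for JavaScript bytecode, `marshal` stands in for whatever wire format the preset bytecode interface uses, and the DOM is a plain dictionary. The function names `server_prepare` and `browser_receive_and_execute` are hypothetical.

```python
import marshal
from typing import Any, Dict


def server_prepare(script_source: str) -> bytes:
    """Server side: parse the script once and serialize the resulting bytecode."""
    code = compile(script_source, "<target-script>", "exec")
    return marshal.dumps(code)


def browser_receive_and_execute(payload: bytes, dom: Dict[str, Any]) -> Dict[str, Any]:
    """Browser side: deserialize the bytecode and run it against the DOM, no parsing."""
    code = marshal.loads(payload)
    exec(code, {"dom": dom})  # the script sees the DOM under the name `dom`
    return dom


# Stand-in DOM tree produced from the main resource.
dom_tree: Dict[str, Any] = {"tag": "body", "children": []}

payload = server_prepare("dom['children'].append({'tag': 'p', 'text': 'hello'})")
updated = browser_receive_and_execute(payload, dom_tree)
```

Two caveats apply to this analogy: the `marshal` format is specific to the Python version that produced it, and loading or executing bytecode from an untrusted sender is unsafe, so a real deployment would presumably need matching engine versions and a trusted channel between server and browser.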
As shown in fig. 6, the present invention further provides a web page parsing apparatus, where the web page parsing apparatus is disposed in a server, and the web page parsing apparatus includes:
the script code analyzing module S10 is used for acquiring a target script code of a target webpage and analyzing the target script code to obtain a target bytecode;
and the target code sending module S20 is configured to send the target bytecode to a local browser, so that the local browser executes the target bytecode to open the target webpage.
As shown in fig. 7, the present invention further provides a web page parsing apparatus, where the web page parsing apparatus is disposed in a local browser, and the web page parsing apparatus includes:
the webpage resource analysis module S100 is used for acquiring a main resource of a target webpage and analyzing the main resource into a DOM tree;
a target code execution module S200, configured to receive and execute a target bytecode of the target webpage sent by a server, so as to generate a CSSOM tree based on the target bytecode and the DOM tree;
and the target webpage display module S300 is used for drawing and displaying the target webpage based on the CSSOM tree.
The invention also provides computer equipment (such as webpage analysis equipment).
The computer device may include a server and/or a local browser, and comprises: a memory, a processor, and a program (such as a web page parsing program) stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the web page parsing method described in any one of the above embodiments.
The method implemented when the program is executed may refer to various embodiments of the web page parsing method of the present invention, and details thereof are not repeated herein.
The invention also provides a computer readable storage medium.
The computer-readable storage medium of the present invention stores thereon a program (e.g., a web page parsing program) that, when executed by a processor, implements the steps of the web page parsing method as described above.
The method implemented when the program is executed may refer to various embodiments of the web page parsing method of the present invention, and details thereof are not repeated herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a web page parsing apparatus to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A webpage analyzing method is characterized in that the webpage analyzing method is applied to a server, and comprises the following steps:
acquiring a target script code of a target webpage, and analyzing the target script code to obtain a target byte code;
and sending the target bytecode to a local browser so that the local browser can execute the target bytecode to open the target webpage.
2. The method for parsing web pages according to claim 1, wherein the step of parsing the target script code to obtain target bytecode includes:
performing semantic and syntax analysis on the target script code using a script parser installed in the server, and outputting an abstract syntax tree;
converting the abstract syntax tree into the target bytecode using an interpreter in the script parser.
3. The web page parsing method of claim 1, wherein the step of sending the target bytecode to a local browser comprises:
and sending the target bytecode to a local browser based on a bytecode receiving interface preset in the local browser.
4. A method for parsing a web page according to any one of claims 1-3, wherein after the step of obtaining the target script code of the target web page and parsing the target script code to obtain the target bytecode, the method further comprises:
and carrying out backup processing on the target bytecode.
5. A webpage analyzing method is applied to a local browser and comprises the following steps:
acquiring a main resource of a target webpage, and analyzing the main resource into a DOM tree;
receiving and executing a target bytecode of the target webpage sent by a server to generate a CSSOM tree based on the target bytecode and the DOM tree;
and drawing and displaying the target webpage based on the CSSOM tree.
6. The web page parsing method of claim 5, wherein the step of receiving and executing the target bytecode of the target web page sent by the server to generate the CSSOM tree based on the target bytecode and the DOM tree comprises:
receiving a target bytecode and a cascading style sheet corresponding to the target webpage, wherein the target bytecode is sent by the server;
executing the target bytecode, adding styles to the DOM tree based on the cascading style sheet, and generating the CSSOM tree.
7. The web page parsing method of claim 5, wherein the step of rendering and displaying the target web page based on the CSSOM tree comprises:
traversing each visible node of the CSSOM tree starting from its root node, and laying out and drawing each visible node to display the target webpage.
8. A web page parsing system, wherein the web page parsing system comprises a server and a local browser, and wherein the web page parsing system performs the following operations:
the local browser acquires a main resource of a target webpage and analyzes the main resource into a DOM tree;
the server acquires a target script code of the target webpage and analyzes the target script code to obtain a target byte code;
the server sends the target byte code to the local browser;
the local browser receives and executes a target bytecode of the target webpage sent by the server to generate a CSSOM tree based on the target bytecode and the DOM tree;
and the local browser draws and displays the target webpage based on the CSSOM tree.
9. A computer device, characterized in that the computer device comprises: memory, processor and program stored on the memory and executable on the processor, the program implementing the steps of the web page parsing method according to any one of claims 1 to 4 when executed by the processor or implementing the steps of the web page parsing method according to any one of claims 4 to 7 when executed by the processor.
10. A computer-readable storage medium, characterized in that a program is stored thereon, which program, when being executed by a processor, carries out the steps of the web page parsing method according to any one of claims 1 to 4, or which program, when being executed by the processor, carries out the steps of the web page parsing method according to any one of claims 4 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111261572.1A CN114065088A (en) | 2021-10-27 | 2021-10-27 | Webpage analyzing method, system, equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111261572.1A CN114065088A (en) | 2021-10-27 | 2021-10-27 | Webpage analyzing method, system, equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114065088A (en) | 2022-02-18 |
Family
ID=80235683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111261572.1A Pending CN114065088A (en) | 2021-10-27 | 2021-10-27 | Webpage analyzing method, system, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114065088A (en) |
2021-10-27: CN application CN202111261572.1A (publication CN114065088A), status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110990731B (en) | Rendering method, device and equipment of static webpage and computer storage medium | |
US10776567B2 (en) | Method for compiling page data, method, device and storage medium for page rendering | |
CN109522018B (en) | Page processing method and device and storage medium | |
US8775926B2 (en) | Stylesheet conversion engine | |
US9805009B2 (en) | Method and device for cascading style sheet (CSS) selector matching | |
EP3143497B1 (en) | Interactive viewer of intermediate representations of client side code | |
US20130159839A1 (en) | Semantic compression of cascading style sheets | |
US20140282379A1 (en) | Computer-implemented method, system and computer program product for displaying a user interface component | |
US20090313613A1 (en) | Methods and Apparatus for Automatic Translation of a Computer Program Language Code | |
CN102693323B (en) | Cascading style sheet resolving method, resolver, webpage presentation method and server | |
CN109144567B (en) | Cross-platform webpage rendering method and device, server and storage medium | |
US11474796B1 (en) | Build system for distributed applications | |
CN111950239B (en) | Schema document generation method, device, computer equipment and medium | |
CN111831384A (en) | Language switching method and device, equipment and storage medium | |
CN112799663A (en) | Page display method and device, computer readable storage medium and electronic equipment | |
CN116126347B (en) | File compiling system and method for low-code application program | |
CN111459537A (en) | Redundant code removing method, device, equipment and computer readable storage medium | |
CN114153459A (en) | Interface document generation method and device | |
CN111310005B (en) | Processing method and device of network request, server and storage medium | |
CN113836469A (en) | Website front-end development method and equipment | |
CN115599386A (en) | Code generation method, device, equipment and storage medium | |
CN111507074A (en) | Data processing method and device, processor, electronic equipment and storage medium | |
CN112000416B (en) | Card view generation method, device and computer readable storage medium | |
KR101092019B1 (en) | Web browsing system using the mobile web browser and method thereof and mobile terminal in the used the same | |
CN114065088A (en) | Webpage analyzing method, system, equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||