CN101763432A - Method for constructing lightweight webpage dynamic view - Google Patents

Info

Publication number
CN101763432A
Authority
CN
China
Prior art keywords
script
embedded
page
document
link
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201010033724A
Other languages
Chinese (zh)
Inventor
张慧琳
诸葛建伟
宋程昱
韩心慧
龚晓锐
邹维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2010-01-05
Filing date
2010-01-05
Publication date
2010-06-30
Application filed by Peking University
Priority to CN201010033724A
Publication of CN101763432A
Legal status: Pending

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for constructing a lightweight webpage dynamic view and belongs to the technical field of computer applications. The method comprises the steps of: 1) extracting the local scripts and static embedded links of a webpage to be analyzed; 2) extracting static embedded scripts from the static embedded links; 3) dynamically executing the local scripts and the static embedded scripts with a script execution engine to identify the dynamic embedded links of the webpage; 4) forming an embedded document set from the documents pointed to by the dynamic embedded links and the documents pointed to by the static embedded links; 5) extracting the referencing-referenced relations between the webpage and the documents in the embedded document set; and 6) taking each identified static embedded webpage and dynamic embedded webpage as the webpage to be analyzed, repeating steps 1) to 5), and establishing the webpage dynamic view from the embedded document sets and referencing-referenced relations obtained each time. The method constructs the webpage dynamic view in less time and at lower system cost.

Description

Method for constructing a lightweight webpage dynamic view
Technical field
The invention belongs to the field of computer application technology, and specifically relates to a low-interaction method for quickly constructing a webpage dynamic view by combining static page parsing with dynamic script execution.
Background
Early web pages contained only simple content such as text and images. With the development of web technology in recent years, page authors commonly use embedded links together with script execution to present more active elements and richer content in a page without requiring any user interaction.
An embedded link is a special kind of hyperlink in an HTML page that carries a src attribute. Its defining feature is that when a browser visits a page containing an embedded link, the content the link points to is loaded automatically, without a user click. For example, an <iframe> or <frame> with a src attribute points to an embedded page, and a <script> with a src attribute points to an embedded script file. Scripts in a page are client-side code executed on the fly. They greatly improve the browsing experience and interactivity of web pages and establish a real-time relation between page and user. Embedded links are often generated by dynamic script execution and then written into the page.
The invention abstracts the concept of a webpage dynamic view: the hierarchy of embedded documents, pointed to by embedded links, that the browser loads automatically when a user visits a page. As shown in Fig. 1, when a user visits page X, all embedded documents in the figure are loaded or executed automatically by the browser even without further interaction. The dynamic view gives users a richer experience, but it also poses challenges for some tasks. For example, the exploit code of a web page Trojan may reside anywhere in the webpage dynamic view and be loaded and executed automatically by the browser, so comprehensive detection of web page Trojans must be based on the webpage dynamic view. Likewise, some public opinion analysis must rely on the webpage dynamic view to analyze page content comprehensively.
When a user visits a page, the page's dynamic view is loaded automatically in the browser. A high-interaction method can therefore be used to obtain the webpage dynamic view: monitor the browser's behavior while it browses the page and capture every document accessed without user interaction. This approach, however, usually incurs substantial time and system cost.
Summary of the invention
Based on an in-depth analysis of why page elements are loaded automatically, the invention aims to provide a method for constructing a lightweight webpage dynamic view. The method combines static page parsing with dynamic script execution to simulate a browser, so that the webpage dynamic view can be constructed quickly while consuming few system resources.
For convenience of explanation, the invention introduces the following concepts:
1. Embedded link: a special kind of hyperlink that carries a src attribute. It points to embedded documents of different types, such as <iframe>/<frame> embedded pages and <script> embedded scripts. The browser loads embedded documents automatically, without a user click;
2. Static embedded link: an embedded link that appears as a static tag in the page source file;
3. Dynamic embedded link: an embedded link generated automatically by dynamic script execution;
4. Local script: the script content between <script> and </script>, as opposed to an embedded script referenced through src;
5. Webpage dynamic view: the hierarchy of embedded links loaded automatically when the page is visited (see Fig. 1). It has the following three features:
A) the webpage dynamic view consists of the visited page and the documents, such as embedded pages and embedded scripts, that the page's embedded links point to and that the browser loads automatically when the page is visited;
B) an embedded link in the webpage dynamic view either appears as a static tag in the page source file or is generated by dynamic script execution;
C) the webpage dynamic view has a hierarchical structure, because an embedded page may itself contain embedded links.
Taking features A), B), and C) of item 5 together: the webpage dynamic view is the hierarchy of embedded documents that the browser loads automatically when a user visits a page. The webpage dynamic view is a tree: the document to be analyzed and the embedded documents form the node set, the referencing-referenced relations between documents form the edges, and the document to be analyzed is the root. When a user visits the page, the browser automatically loads every embedded document in the page's dynamic view, in order.
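The tree described above can be illustrated with a minimal JavaScript sketch. It is not taken from the patent: the function names `buildView` and `renderView`, the edge field names, the "Root" label, and the example URLs are all assumptions made for illustration. The cycle guard is also an assumption, added because real pages may reference each other.

```javascript
// Build a nested tree from parent -> child reference edges.
// `edges` is an array of { parent, child, type } records (hypothetical field names).
function buildView(rootURL, edges) {
  const childrenOf = new Map();
  for (const { parent, child, type } of edges) {
    if (!childrenOf.has(parent)) childrenOf.set(parent, []);
    childrenOf.get(parent).push({ url: child, type });
  }
  // Recursively expand a node; `path` guards against reference cycles.
  function expand(url, type, path) {
    const node = { url, type, children: [] };
    if (path.has(url)) return node; // already on this branch: do not expand again
    const next = new Set(path).add(url);
    for (const c of childrenOf.get(url) || []) {
      node.children.push(expand(c.url, c.type, next));
    }
    return node;
  }
  return expand(rootURL, "Root", new Set());
}

// Render the tree as indented text, one document per line.
function renderView(node, depth = 0) {
  const line = "  ".repeat(depth) + node.url + " [" + node.type + "]";
  return [line, ...node.children.flatMap((c) => renderView(c, depth + 1))];
}

const exampleEdges = [
  { parent: "http://example.test/x.html", child: "http://example.test/a.js", type: "InlineScript" },
  { parent: "http://example.test/x.html", child: "http://example.test/b.html", type: "InlinePage" },
  { parent: "http://example.test/b.html", child: "http://example.test/c.js", type: "InlineScript" },
];
const view = buildView("http://example.test/x.html", exampleEdges);
console.log(renderView(view).join("\n"));
```

The page to be analyzed sits at the root, and each level of indentation corresponds to one level of embedding.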
The invention simulates a browser in a low-interaction manner and analyzes all embedded documents loaded when a browser visits the page, together with the referencing-referenced relations among them, thereby constructing the page's dynamic view. The main idea is to obtain the embedded links in the current page by combining static page parsing with dynamic script execution, to analyze the embedded pages recursively under a chosen strategy, and finally to construct the webpage dynamic view that corresponds to a browser visit.
The low-interaction method for quickly constructing a webpage dynamic view proposed by the invention comprises the following steps:
(1) Designate the page to be analyzed as the current page.
(2) Statically parse the current page to extract local scripts and identify static embedded links, as follows:
A) Local script parsing: extract the local scripts and pass them to step (3);
B) Static embedded link identification: identify all static embedded links, then:
I. pass each identified static embedded script to step (3);
II. pass each identified static embedded page to step (5).
(3) With the script execution engine at the core, execute the scripts dynamically to identify the dynamic embedded links of the current page. This involves two tasks:
A) DOM simulation, which provides the context for script execution and has two parts:
I. use scripts to simulate the application programming interfaces (APIs) that a browser offers to scripts;
II. use scripts to simulate the DOM (Document Object Model) tree that a browser constructs automatically for the page;
B) Dynamic embedded link identification: capture dynamically generated embedded links inside the key functions involved in the automatic loading of embedded links, then:
I. execute each identified script immediately within step (3);
II. pass each identified dynamic embedded page to step (5).
(4) Steps (2) and (3) yield the static and dynamic embedded links of the current page. The documents these links point to form an embedded document set, and the current page stands in a referencing-referenced relation with each document in that set.
(5) Analyze the embedded pages of the current page under a chosen recursive analysis strategy: designate, one by one, each static embedded page found in (2)-B)-II and each dynamic embedded page found in (3)-B)-II as the current page, and return to step (2).
(6) Construct the page's dynamic view from the embedded document sets and referencing-referenced relations obtained in each pass of step (4).
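The control flow of steps (1) to (6) can be sketched as follows. This is only an outline under stated assumptions: `constructDynamicView`, the `analyzePage` callback (standing in for steps (2) and (3)), the `maxDepth` limit, and the visited set are hypothetical. The text says only that embedded pages are analyzed "under a chosen recursive analysis strategy", so the breadth-first order used here is one possible choice.

```javascript
// Sketch of steps (1)-(6): breadth-first analysis of embedded pages.
// `analyzePage(url)` is a hypothetical callback standing in for steps (2)-(3);
// it returns { scripts: [...], pages: [...] } with the URLs found for that page.
function constructDynamicView(rootURL, analyzePage, maxDepth = 5) {
  const relations = [];              // referencing-referenced pairs, step (4)
  const visited = new Set([rootURL]);
  let frontier = [rootURL];          // step (1): the page to be analyzed
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const nextFrontier = [];
    for (const current of frontier) {
      const { scripts, pages } = analyzePage(current);
      for (const s of scripts) {
        relations.push({ parent: current, child: s, type: "InlineScript" });
      }
      for (const p of pages) {
        relations.push({ parent: current, child: p, type: "InlinePage" });
        // step (5): each embedded page becomes a current page in turn,
        // but a page that was already analyzed is not analyzed again
        if (!visited.has(p)) {
          visited.add(p);
          nextFrontier.push(p);
        }
      }
    }
    frontier = nextFrontier;
  }
  return relations;                  // step (6): the data behind the dynamic view
}

// Usage with a canned, in-memory "web" (illustrative data only).
const fakeWeb = {
  "x.html": { scripts: ["a.js"], pages: ["b.html"] },
  "b.html": { scripts: ["c.js"], pages: ["x.html"] }, // refers back to the root
};
const relations = constructDynamicView(
  "x.html",
  (url) => fakeWeb[url] || { scripts: [], pages: [] }
);
console.log(relations);
```

Without the visited set and depth limit, two pages that embed each other would keep the analysis running indefinitely.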
The advantages and positive effects of the invention are as follows:
1. By combining static page parsing with dynamic script execution, it comprehensively identifies all embedded documents loaded when a browser visits the page, together with the referencing-referenced relations between the documents.
2. It simulates a browser in a lightweight, low-interaction manner and constructs the page's dynamic view at lower time and system cost.
Description of drawings
Fig. 1: example of the dynamic view of a page
Fig. 2: structural diagram of the method for constructing a lightweight webpage dynamic view according to the invention
Embodiment
The method of the invention is described in detail below with reference to Fig. 2. In one concrete implementation, the webpage dynamic view is represented as a set of instances of data structure 1:
struct node_link {
    Node_Type child_type;
    string parentURL;
    string childURL;
};
Data structure 1
where enum Node_Type {InlineScript, InlinePage, ...};
The steps of this embodiment are as follows:
(1) Simulate, in JavaScript, the DOM APIs that a browser offers to scripts, and put the code in the file defineDOMAPI.js, where:
A) window.open() is defined to output the URL in its argument via print;
B) document.write() is defined to invoke a callback that further parses its argument; script content found there is executed immediately in the current context via eval, and embedded links found there are output via print;
C) the onmouseover, onmouseout, and onload events are defined to fire automatically;
D) all other attributes in the DOM API are assigned default values, and all other functions are defined with empty bodies;
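A minimal sketch of what items A) and B) might look like is given below. It is an illustration, not the contents of the actual defineDOMAPI.js: the `print` emulation, the helper `parseFragment`, the "LINK ..." output format, and the example URLs are assumptions, and a regular expression replaces a real HTML parser for brevity. Items C) and D) are not covered.

```javascript
// Stand-in for the engine's print function: collect output lines in an array.
const printed = [];
function print(line) {
  printed.push(String(line));
}

// Hypothetical helper: pull embedded links and inline script bodies out of
// an HTML fragment. Regular expressions are used for brevity only.
function parseFragment(html) {
  const links = [];
  const scripts = [];
  // Opening tags that carry a src attribute are embedded links.
  const linkRe = /<(script|iframe|frame)\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+)/gi;
  for (const m of html.matchAll(linkRe)) {
    links.push({ tag: m[1].toLowerCase(), url: m[2] });
  }
  // <script> elements without a src attribute carry inline code.
  const inlineRe = /<script\b((?:(?!\bsrc\s*=)[^>])*)>([\s\S]*?)<\/script\s*>/gi;
  for (const m of html.matchAll(inlineRe)) {
    scripts.push(m[2]);
  }
  return { links, scripts };
}

const window = {
  // A) window.open: report the target URL instead of opening a window.
  open(url) {
    print("LINK window.open " + url);
  },
};

const document = {
  // B) document.write: parse what was written, print embedded links,
  // and run inline scripts immediately.
  write(html) {
    const { links, scripts } = parseFragment(String(html));
    for (const l of links) print("LINK " + l.tag + " " + l.url);
    // Direct eval runs the code in the scope of write(), so the simulated
    // window and document objects are visible to it.
    for (const code of scripts) eval(code);
  },
};

// Usage: a page script that generates an iframe and a nested inline script.
document.write('<iframe src="http://example.test/b.html"></iframe>');
document.write('<script>window.open("http://example.test/pop.html")</script>');
console.log(printed);
```

Because the nested script is evaluated as soon as it is written, links that only appear after several layers of document.write are still reported.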
(2) Set the URL of the document to be analyzed as the URL of the current page, and initialize set S to empty.
(3) Create a new folder named after the URL of the current page and download the page. Parse the page statically with SAX (Simple API for XML). During parsing:
A) for each tag parsed, write a JavaScript statement, based on the APIs defined in (1), that constructs the DOM object corresponding to the tag, and append the statement to the current page's buildDOMTree variable (string type);
B) for each static embedded link parsed, put the URL that src points to into the current page's inline_linking variable (list type).
In addition:
C) put each local script parsed into the current page's jscript variable (string type);
D) for each <script> with a src attribute, download its content and put it into the current page's jscript variable;
E) for each <iframe> or <frame> with a src attribute, put the URL that src points to into the current page's inlinepage variable (list type).
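The single pass described in step (3) can be sketched as below. Several things here are assumptions rather than the patent's implementation: a regular-expression tokenizer stands in for the SAX parser, `document.createElement` is assumed to be among the simulated APIs, the extra `scriptURLs` list records embedded scripts instead of downloading them, and only script, iframe, and frame tags are treated as embedded links.

```javascript
// Sketch of step (3): one pass over the page source.
function staticParse(html) {
  const result = {
    buildDOMTree: "",   // JavaScript statements that rebuild the DOM objects
    inline_linking: [], // URLs of all static embedded links
    inlinepage: [],     // URLs of embedded pages (<iframe>/<frame> with src)
    scriptURLs: [],     // URLs of embedded scripts, to be downloaded into jscript
    jscript: "",        // concatenated local script code
  };
  const tagRe = /<([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>/g;
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const tag = m[1].toLowerCase();
    const src = /\bsrc\s*=\s*["']?([^"'\s>]+)/i.exec(m[2]);
    // A) one statement per start tag
    result.buildDOMTree += "document.createElement(" + JSON.stringify(tag) + ");\n";
    if (tag === "script" && !src) {
      // C) local script: take everything up to the closing tag
      const closeRe = /<\/script\s*>/gi;
      closeRe.lastIndex = tagRe.lastIndex;
      const close = closeRe.exec(html);
      const stop = close ? close.index : html.length;
      result.jscript += html.slice(tagRe.lastIndex, stop) + "\n";
      tagRe.lastIndex = stop; // do not tokenize script text as markup
    } else if (src && tag === "script") {
      result.inline_linking.push(src[1]); // B)
      result.scriptURLs.push(src[1]);     // D) download step omitted here
    } else if (src && (tag === "iframe" || tag === "frame")) {
      result.inline_linking.push(src[1]); // B)
      result.inlinepage.push(src[1]);     // E)
    }
  }
  return result;
}

// Usage on a small illustrative page.
const page = [
  "<html><body>",
  '<script src="http://example.test/a.js"></script>',
  '<iframe src="http://example.test/b.html"></iframe>',
  '<script>document.write("<p>hi</p>");</script>',
  "</body></html>",
].join("\n");
const parsed = staticParse(page);
console.log(parsed.inline_linking, parsed.inlinepage, parsed.jscript);
```

Skipping over the body of a local script matters: markup inside string literals, such as the `<p>` above, would otherwise be mistaken for page tags.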
(4) Execute the scripts dynamically. First create an empty temporary file tmp.js in the current directory, then:
A) write the contents of defineDOMAPI.js from (1) into tmp.js;
B) write the current page's buildDOMTree from (3)-A) into tmp.js;
C) write the current page's jscript from (3) into tmp.js;
D) execute tmp.js with the script engine and put the result into the file output.js in the current directory.
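Step (4) can be approximated in Node.js as sketched below. The patent does not name its script engine, so using Node's built-in vm module is an assumption, as are the function name `runScripts`, the in-memory array that replaces output.js, the one-second timeout, and the deliberately tiny DOM API used in the example.

```javascript
const vm = require("vm");

// Sketch of step (4): concatenate the three parts and run them in an
// isolated context. Node's vm module stands in for the script engine.
function runScripts(defineDOMAPI, buildDOMTree, jscript) {
  const output = []; // plays the role of the output file
  const sandbox = { print: (s) => output.push(String(s)) };
  const tmp = [defineDOMAPI, buildDOMTree, jscript].join("\n"); // tmp.js content
  try {
    vm.runInNewContext(tmp, sandbox, { timeout: 1000 });
  } catch (err) {
    // A failing page script should not abort the whole analysis.
    output.push("ERROR " + err.message);
  }
  return output;
}

// Tiny illustrative DOM API; a fuller one would define many more members.
const api = `
  var window = { open: function (url) { print("LINK " + url); } };
  var document = {
    createElement: function () { return {}; },
    write: function (html) {
      var m = /src\\s*=\\s*["']?([^"'\\s>]+)/i.exec(String(html));
      if (m) print("LINK " + m[1]);
    }
  };
`;

const out = runScripts(
  api,
  'document.createElement("body");',
  'document.write("<iframe src=\'http://example.test/d.html\'></iframe>");'
);
console.log(out);
```

Note that the vm module isolates global variables but is not a security boundary, so an analyzer that runs untrusted page scripts would need stronger isolation, such as a separate process or a dedicated sandbox.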
(5) The merge module of the main program extracts the embedded links from output.js and puts them into the current page's inline_linking variable; the URLs that the src of an <iframe> or <frame> points to are also put into the current page's inlinepage variable;
(6) The inline_linking variable obtained from steps (3)-B) and (5) is the set of static and dynamic embedded links in the current page. Next, from the URL of the current page and the URLs in inline_linking, construct instances of data structure 1 and put them into set S.
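Steps (5) and (6) can be sketched as follows. The "LINK <tag> <url>" line format, the function names `mergeOutput` and `toNodeLinks`, the choice to store the tag next to each URL, and the resolution of relative URLs against the current page are assumptions of this sketch. The record fields follow data structure 1.

```javascript
// Step (5): merge dynamically found links with the static ones.
function mergeOutput(page, outputLines) {
  for (const line of outputLines) {
    const m = /^LINK (\S+) (\S+)$/.exec(line);
    if (!m) continue; // ignore unrelated output
    page.inline_linking.push({ tag: m[1], url: m[2] });
    if (m[1] === "iframe" || m[1] === "frame") page.inlinepage.push(m[2]);
  }
  return page;
}

// Step (6): emit one node_link-like record per embedded document.
function toNodeLinks(currentURL, page) {
  return page.inline_linking.map((l) => ({
    child_type: l.tag === "script" ? "InlineScript" : "InlinePage",
    parentURL: currentURL,
    // resolve relative links against the page that references them
    childURL: new URL(l.url, currentURL).href,
  }));
}

// Usage with illustrative data.
const S = [];
const current = "http://example.test/dir/x.html";
const pageState = {
  inline_linking: [{ tag: "script", url: "http://other.test/a.js" }],
  inlinepage: [],
};
mergeOutput(pageState, ["LINK iframe b.html", "some other output"]);
S.push(...toNodeLinks(current, pageState));
console.log(S);
```

Storing absolute URLs in S keeps parent and child entries comparable when the same document is referenced from pages in different directories.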
(7) Recursively analyze the embedded pages of the current page: set each URL in the current page's inlinepage variable in turn as the URL of the current page and return to step (3);
(8) Set S is the data representation of the webpage dynamic view of the page to be analyzed.
Although specific embodiments and drawings of the invention are disclosed for purposes of illustration, to help readers understand the invention and implement it accordingly, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should not be limited to what is disclosed in the preferred embodiment and drawings of this specification; the scope of protection is defined by the claims.

Claims (6)

1. A method for constructing a lightweight webpage dynamic view, comprising the steps of:
1) extracting the local scripts and static embedded links of the page to be analyzed;
2) extracting static embedded scripts from said static embedded links;
3) dynamically executing said local scripts and said static embedded scripts with a script execution engine to identify the dynamic embedded links of the page;
4) forming an embedded document set from the documents pointed to by said dynamic embedded links and the documents pointed to by said static embedded links;
5) extracting the referencing-referenced relations between the page and the documents in said embedded document set;
6) taking each static embedded page identified from said static embedded links and each dynamic embedded page identified from said dynamic embedded links as the page to be analyzed, repeating steps 1) to 5), and establishing the page dynamic view from the embedded document sets and referencing-referenced relations obtained each time.
2. The method of claim 1, wherein said page dynamic view is a tree structure in which the document to be analyzed and the embedded documents form the node set, the referencing-referenced relations between documents are the edges, and the document to be analyzed is the root node of the tree.
3. The method of claim 1 or 2, wherein, when said script execution engine executes said local scripts and said static embedded scripts, captured dynamically generated embedded links are handled as follows: the scripts pointed to by said dynamically generated embedded links are identified and executed by said script execution engine; and the dynamic embedded pages pointed to by said dynamically generated embedded links are identified and taken as pages to be analyzed.
4. The method of claim 1 or 2, wherein said script execution engine uses a Document Object Model as the script execution environment, by using scripts to simulate the application programming interfaces that a browser offers to scripts and then using scripts to simulate the browser's construction of the document object model tree corresponding to the page to be analyzed.
5. The method of claim 4, wherein JavaScript is used to simulate the Document Object Model application programming interfaces that a browser offers to scripts.
6. The method of claim 1, wherein the page to be analyzed is statically parsed based on SAX to extract the local scripts and static embedded links of the page to be analyzed.
CN201010033724A 2010-01-05 2010-01-05 Method for constructing lightweight webpage dynamic view Pending CN101763432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010033724A CN101763432A (en) 2010-01-05 2010-01-05 Method for constructing lightweight webpage dynamic view

Publications (1)

Publication Number Publication Date
CN101763432A true CN101763432A (en) 2010-06-30

Family

ID=42494596

Country Status (1)

Country Link
CN (1) CN101763432A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100630