CN105630843A - Webpage change monitoring method and device - Google Patents

Publication number
CN105630843A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410652444.3A
Other languages
Chinese (zh)
Other versions
CN105630843B (en)
Inventor
梁捷
张云龙
钟国英
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Guangzhou Dongjing Computer Technology Co Ltd
Application filed by Guangzhou Dongjing Computer Technology Co Ltd
Priority to CN201410652444.3A (CN105630843B)
Priority to PCT/CN2015/090969 (WO2016078479A1)
Publication of CN105630843A
Application granted
Publication of CN105630843B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a webpage change monitoring method and device. The method comprises: recording the page data of the same webpage after it is loaded at different moments, and saving a screenshot of the page after each load, wherein the page data recorded at each moment is stored as a corresponding specific data structure; determining the differences between the page data loaded at different moments by comparing the specific data structures recorded at those moments; and marking the differences on the page screenshots taken at the respective moments. With the method and device provided by the invention, changes to the same webpage at different moments can be monitored and compared accurately.

Description

Webpage change monitoring method and device
Technical field
The present invention relates to the technical field of the mobile Internet, and more specifically to a webpage change monitoring method and device.
Background art
The Internet is known for rapid iteration: a web application may go through several product releases and operational content updates every week. Monitoring product webpages has therefore become one of the priorities of enterprise webpage management.
At present, most enterprises monitor and compare product pages by comparing the pixels of page screenshots. This approach has a high false-alarm rate, cannot exclude regions of a page whose content is random, and is inflexible. Taking snapshots of a webpage's modification history, comparing two historical snapshots and marking where they differ has therefore become an urgent need for enterprise product monitoring.
How to accurately monitor and compare the changes of the same webpage is thus the main problem in enterprise webpage monitoring today.
Summary of the invention
In view of the above problems, the object of the present invention is to provide a webpage change monitoring method and device that record the page data of a webpage at different moments in a structured form and compare the records to find how the webpage differs between those moments. The differences found are marked on screenshots of the webpage, which improves the accuracy of webpage comparison and makes webpage monitoring more convenient.
The webpage change monitoring method provided by the invention includes:
Recording the page data of the same webpage after it is loaded at different moments, and saving a screenshot of the page after each load; wherein the page data of the same webpage loaded at different moments is recorded as a corresponding specific data structure;
Determining the differences between the page data of the same webpage loaded at different moments by comparing the specific data structures recorded at those moments;
Marking the differences on the page screenshots taken at the respective moments.
The webpage change monitoring device provided by the invention includes:
A page data recording unit, for recording the page data of the same webpage after it is loaded at different moments; wherein the page data of the same webpage loaded at different moments is recorded as a corresponding specific data structure;
A page screenshot unit, for saving a screenshot of the page after the same webpage is loaded at different moments;
A difference determining unit, for comparing the specific data structures recorded at different moments and determining the differences between the page data of the same webpage loaded at those moments;
A difference marking unit, for marking the differences on the page screenshots taken at the respective moments.
With the webpage change monitoring method and device provided by the invention, a screenshot is taken of the page after the same webpage is loaded at different moments, the page data loaded at each moment is recorded as a specific data structure, and the specific data structures of any two moments are compared to find the parts that differ. The differing parts are marked on the screenshots of the two moments, so that the changes of the same webpage between different moments can be compared accurately, which facilitates webpage monitoring.
To achieve the above and related objects, one or more aspects of the invention include the features described in detail below and particularly pointed out in the claims. The following description and the accompanying drawings set out certain illustrative aspects of the invention in detail. These aspects indicate, however, only some of the various ways in which the principles of the invention may be employed. Furthermore, the invention is intended to include all such aspects and their equivalents.
Brief description of the drawings
Other objects and results of the invention will become clearer and easier to understand from the following description taken in conjunction with the accompanying drawings and the claims, and as the invention is more fully understood. In the drawings:
Fig. 1 is a schematic flowchart of the webpage change monitoring method according to an embodiment of the invention;
Fig. 2 is a schematic flowchart of webpage snapshot storage according to an embodiment of the invention;
Fig. 3 is a schematic flowchart of snapshot comparison according to an embodiment of the invention;
Figs. 4a to 4d are difference display results according to an embodiment of the invention;
Fig. 5 is a logical structure diagram of the webpage change monitoring device according to an embodiment of the invention;
Fig. 6 is a logical structure diagram of a specific implementation of the webpage change monitoring device according to an embodiment of the invention;
Fig. 7 is a logical structure diagram of a device terminal according to an embodiment of the invention.
The same reference numerals indicate similar or corresponding features or functions throughout the drawings.
Detailed description of the embodiments
In the following description, many specific details are set out for purposes of illustration, in order to provide a thorough understanding of one or more embodiments. It will be evident, however, that these embodiments can also be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate describing one or more embodiments.
Existing methods of comparing changes to the same webpage are based on comparing the pixels of page screenshots, and their false-alarm rate is high. To address this problem, the invention records the page data of a webpage as a specific data structure and identifies which page data has been modified by comparing the differences between the specific data structures. The modified page data is the content of the webpage change, so the false-alarm rate of webpage comparison can be reduced.
Here, page data means webpage elements, that is, the elements that make up the content of a webpage. Webpage elements include text, pictures, audio, animation, video and so on.
Specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows the flow of the webpage change monitoring method according to an embodiment of the invention.
As shown in Fig. 1, the webpage change monitoring method provided by the embodiment of the invention includes:
Step S110: record the page data of the same webpage after it is loaded at different moments, and save a screenshot of the page after each load; wherein the page data of the same webpage loaded at different moments is recorded as a corresponding specific data structure.
Here, the same webpage means a webpage with the same URL, and page data means webpage elements. The data structure of webpage elements is the DOM (Document Object Model) structure. Recording the page data of the same webpage loaded at different moments as a corresponding specific data structure therefore means recording the DOM structure of the webpage elements as a specific data structure. The recording flow and the page screenshot flow may be performed in either order.
The moments at which the page data is recorded and the moments at which the page screenshots are taken correspond one to one. For example, the page data of the webpage is recorded at a first moment and at a second moment, and a screenshot of the webpage is saved at the first moment and at the second moment as well.
In addition, a webpage element includes the element style, element attribute information, element content, element tag and element position information.
Because the DOM structure carries a large amount of element data, comparing elements directly on it is computationally too expensive. The invention therefore records the DOM structure of the webpage elements as a specific data structure to reduce the computation needed for element comparison. In this embodiment the specific data structure is JSON (JavaScript Object Notation, a lightweight data interchange format), although the DOM structure of the webpage elements may also be recorded as another specific data structure.
Because elements in the JSON structure cannot be stored on a hard disk directly, they must be serialized into a form that the hard disk can store. This embodiment calls the process of recording the DOM structure of the webpage elements as a JSON structure and serializing it for storage "webpage snapshot storage". The elements stored on the hard disk are the snapshot data, which comprise the hash value of the element style, the element attribute information, the element content, the element tag and the element position information.
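The serialization step described above can be illustrated with a minimal sketch. It assumes Python's standard json module as the serializer; the record layout and the field names (tag, attrs, rect, style_hash, children) are hypothetical and are not taken from the patent.

```python
import json

# Hypothetical snapshot record for one element; the field names are illustrative.
element = {
    "tag": "div",
    "attrs": {"id": "header", "class": "nav"},
    "rect": {"x": 0, "y": 0, "width": 320, "height": 48},
    "style_hash": "0" * 32,  # placeholder standing in for a 32-character MD5 digest
    "children": [],
}


def serialize(record: dict) -> str:
    """Turn the in-memory structure into text that can be written to disk."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def deserialize(text: str) -> dict:
    """Rebuild the in-memory structure from the stored text."""
    return json.loads(text)


snapshot_text = serialize(element)
restored = deserialize(snapshot_text)
```

The round trip returns a structure equal to the original record, which is the property the later comparison step relies on.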
Step S120: determine the differences between the page data of the same webpage loaded at different moments by comparing the specific data structures recorded at those moments.
Comparing the specific data structures recorded at different moments means finding the parts that differ between the JSON-structured webpage elements of those moments, that is, comparing the snapshot data of different moments, and thereby determining the differences between the page data of the same webpage loaded at those moments.
Because snapshot data stored on the hard disk cannot be compared directly, the snapshot data of the different moments must be deserialized back into the specific data structure before the differences between them are compared. This embodiment calls the process of comparing the snapshot data of different moments "snapshot comparison".
The differences between snapshot data of different moments fall into four types: added element, deleted element, style modification and text content change. They describe the differences between the elements of the same webpage at different moments, as follows:
An added element means that, compared across the two moments, the webpage has gained an element;
A deleted element means that, compared across the two moments, the webpage has lost an element;
A style modification means that no element was added or deleted between the two moments, but the style of an element changed;
A text content change means that only the text content of an element changed between the two moments.
Step S130: mark the differences on the page screenshots taken at the respective moments.
Comparing the webpage data structures recorded at different moments yields the differences of the webpage between those moments. The page screenshots are used to display these differences intuitively. Specifically, the type of each difference and the position at which it occurs on the page can be marked on the page screenshots.
To make the page screenshots of different moments easier to compare, they are stitched together, and the differences between the page data of the different moments are then marked on the stitched screenshot. In other words, the parts that differ between the elements of the same webpage at different moments are marked on the stitched screenshot. Many marking styles are possible; specifically, each type of difference may be marked in a different color. The position of a mark is the position on the page screenshot at which the corresponding difference occurs. For example, if the page at the second moment has one more webpage element than the page at the first moment, the position of the added element is marked in color on the page screenshot of the second moment.
Marking the differences on the page screenshots presents the differing content on the screenshots, which this embodiment calls "difference display".
The steps above are the data processing steps of the webpage change monitoring method provided by the embodiment of the invention. The main implementation details lie in three aspects: webpage snapshot storage (storing the element information), snapshot comparison and difference display. Each is described in detail below.
I. Webpage snapshot storage
Fig. 2 shows the flow of webpage snapshot storage according to an embodiment of the invention. As shown in Fig. 2, the flow of webpage snapshot storage provided by the embodiment of the invention includes the following steps:
Step S210: access the webpage with a command-line browser.
Because elements in the webpage must be operated on while the webpage is being accessed, the invention browses the webpage with a command-line browser and controls its access to the webpage by injecting a script into it. The embodiment of the invention preferably uses the PhantomJS browser, although another command-line browser may also be used.
Step S220: inject a script into the command-line browser.
After the webpage has finished loading, a script is injected into the command-line browser to operate on the elements in the webpage.
Step S230: access the specified DOM nodes and record the element attribute information, element style, element tag, element content and element position information.
The element attribute information includes the element's attributes (the HTML attributes of the element, such as id and class), with their attribute names and attribute values;
The element style includes the background color, border, shadow and so on;
The element position information includes the X coordinate, Y coordinate, width and height of the element;
The element tag is the HTML tag name, such as body, div, h1 or h2;
The element content is the set of child elements.
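The fields listed above can be pictured as one record per DOM node. The following is a sketch only: the class names Rect and ElementRecord and all field names are hypothetical, chosen here to mirror the five kinds of information named in step S230.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Rect:
    """Element position information: coordinates plus width and height."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class ElementRecord:
    """One recorded DOM node (hypothetical layout)."""
    tag: str      # HTML tag name, e.g. "div"
    attrs: dict   # attribute names mapped to attribute values
    style: dict   # style properties such as background color or border
    rect: Rect    # position information
    children: list = field(default_factory=list)  # element content


record = ElementRecord(
    tag="h1",
    attrs={"id": "title"},
    style={"background-color": "#fff", "border": "none"},
    rect=Rect(10, 20, 300, 40),
)
as_dict = asdict(record)  # plain dict, ready to be serialized as JSON
```

Using dataclasses.asdict keeps the nested position record and the child list as plain dictionaries and lists, so the result can be handed directly to a JSON serializer.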
The specified nodes are either all the DOM nodes in the page or the DOM nodes that remain after all the DOM nodes have been filtered. Because the elements of DOM nodes are ultimately rendered on the page, content on the page can be masked by filtering out the corresponding DOM nodes. The script controls which DOM nodes the command-line browser accesses: the DOM nodes that the command-line browser accesses are the specified DOM nodes, and the DOM nodes that it does not access are the filtered-out DOM nodes.
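The filtering idea can be sketched on a DOM-like tree of plain dictionaries. This is an assumption-laden illustration: the patent does not say how nodes are selected for exclusion, so filtering by the id attribute, the function name filter_nodes and the sample ids are all hypothetical.

```python
def filter_nodes(node, excluded_ids):
    """Return a copy of the tree that omits nodes whose id is excluded."""
    if node.get("attrs", {}).get("id") in excluded_ids:
        return None  # this node and its whole subtree are not visited
    children = []
    for child in node.get("children", []):
        kept = filter_nodes(child, excluded_ids)
        if kept is not None:
            children.append(kept)
    return {**node, "children": children}


# Hypothetical page with a banner whose content changes on every load.
page = {"tag": "body", "attrs": {}, "children": [
    {"tag": "div", "attrs": {"id": "banner-ad"}, "children": []},
    {"tag": "div", "attrs": {"id": "content"}, "children": []},
]}
specified = filter_nodes(page, {"banner-ad"})
```

Only the nodes that survive the filter would then be recorded, so a region with random content never reaches the comparison step.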
Step S240: concatenate the element style into one string and compute the hash value of this string with the MD5 algorithm.
A page contains a large number of elements, and recording the complete information of every element would take too much storage. When storing element information, the invention therefore concatenates the element style into one string and computes a digest (that is, a hash value) of this string with the MD5 message-digest algorithm, obtaining a 32-character string that can be stored in the JSON structure. This saves storage space. If the element style changes, this string will almost certainly change as well, so the element can be marked as having a changed style during comparison.
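Step S240 can be sketched with Python's standard hashlib module. The way the style properties are joined (sorted "name:value" pairs separated by semicolons) is an assumption made for this sketch; the patent only states that the style is concatenated into one string.

```python
import hashlib


def style_hash(style: dict) -> str:
    """Join the style properties into one string and return its MD5 hex digest."""
    # Sorting makes the result independent of the order in which properties were read.
    joined = ";".join(f"{name}:{value}" for name, value in sorted(style.items()))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


before = style_hash({"font-size": "12px", "color": "black"})
after = style_hash({"font-size": "16px", "color": "red"})
same = style_hash({"color": "black", "font-size": "12px"})
```

The hex digest is always 32 characters long, and here before equals same while before differs from after, which is all the comparison step needs to know.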
Step S250: serialize the tag, position information, attributes and attribute values of the element together with the hash value, and store them as a JSON structure.
That is, the element information is stored as a JSON structure.
Step S260: judge whether the element has child elements; if it has, perform step S230; if it has not, perform step S270.
If the element has child elements, the DOM node of each child element is accessed and its element attribute information, element style, element tag and element position information are recorded. The element style is concatenated into one string, a digest of this string is computed with the MD5 algorithm, and the element attribute information, hash value, element tag and element position information are then serialized and stored as a JSON structure.
Step S270: store the JSON data in the file system.
After all nodes have been traversed, the JSON-structured element information obtained is stored in the file system. The element information in the file system is in JSON structure.
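Steps S230 to S270 amount to a recursive traversal followed by one write to disk. The sketch below works on a DOM-like tree of plain dictionaries rather than a real browser DOM; the function names snapshot and save_snapshot, the field names and the sample page are hypothetical.

```python
import hashlib
import json
import os
import tempfile


def snapshot(node: dict) -> dict:
    """Record one node and, recursively, its children (steps S230 to S260)."""
    style = ";".join(f"{k}:{v}" for k, v in sorted(node.get("style", {}).items()))
    return {
        "tag": node["tag"],
        "attrs": node.get("attrs", {}),
        "rect": node.get("rect", {}),
        "text": node.get("text", ""),
        "style_hash": hashlib.md5(style.encode("utf-8")).hexdigest(),
        "children": [snapshot(child) for child in node.get("children", [])],
    }


def save_snapshot(root: dict, path: str) -> None:
    """Serialize the whole tree and store it in the file system (step S270)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot(root), handle, ensure_ascii=False)


page = {"tag": "body", "style": {"background-color": "#fff"}, "children": [
    {"tag": "h1", "text": "hello", "style": {"font-size": "24px"},
     "rect": {"x": 0, "y": 0, "width": 200, "height": 30}},
]}
path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
save_snapshot(page, path)
```

Only the style digest is kept per node, not the full style, which is the storage saving described in step S240.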
In addition, the file system means the file system of the user's operating system.
Page screenshots can be produced in many ways, which the invention does not describe in detail.
Steps S210 to S260 above are the data processing steps taken to implement webpage snapshot storage. They allow snapshots of webpage elements to be stored, which makes it possible to monitor and compare the historical changes of the same webpage and also to mask specified content on a webpage.
II. Snapshot comparison
After webpage snapshots have been stored, when the webpage content changes, the content of the webpage at different moments needs to be compared, that is, the webpage snapshots are compared. Fig. 3 shows the flow of snapshot comparison according to an embodiment of the invention. As shown in Fig. 3, the flow of snapshot comparison provided by the embodiment of the invention includes the following steps:
S310: input two historical time points and read the two sets of snapshot data corresponding to them.
In the following, moments t1 and t2 are taken as the historical time points, and the differences between the snapshot data of moment t1 and the snapshot data of moment t2 are compared, where t1 is farther from the current time and t2 is nearer to it.
Reading the snapshot data means reading the JSON-structured element information. Because the JSON-structured element information was serialized before being stored as snapshot data, the snapshot data must first be deserialized to recover the JSON-structured element information, after which it can be read.
S320: judge whether the styles of the two elements are consistent; if they are, perform step S340; if they are not, perform step S330.
First, judge whether the element styles in the element information are identical, that is, compare the 32-character strings obtained earlier. If the strings in the two JSON structures are the same, the element style has not been modified; if they differ, the element style has been modified.
Step S330: record the element style modification as a difference.
S340: judge whether the two elements have child elements; if they have, perform step S350; if they have not, perform step S370.
S350: use the LCS algorithm to obtain the longest common subsequence of the child elements of the two elements whose tags and element attributes are consistent.
The LCS algorithm, that is, the longest common subsequence algorithm, is prior art and is not explained by the invention.
The longest common subsequence of child elements with consistent tags and element attributes is the set of unchanged child elements shared by the two snapshots of moments t1 and t2, that is, the part that is unchanged between the page screenshots of moments t1 and t2.
The longest common subsequence of child elements with consistent tags and element attributes is computed in order to determine whether the webpage has deleted some elements, added some elements or modified some elements.
Step S360: mark the differing content of the two sets of child elements according to the longest common subsequence of child elements with consistent tags and element attributes.
If a child element in the longest common subsequence is a text child element, its text content is checked: if it differs between the two snapshots, the text content has changed; otherwise it has not. For the other child elements, those in the snapshot data of moment t1 that are not part of the common subsequence are marked as deleted elements, and those in the snapshot data of moment t2 that are not part of the common subsequence are marked as added elements.
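Steps S350 and S360 can be sketched with the textbook dynamic-programming form of the longest common subsequence. The sketch treats two child elements as matching when their tag and attributes are equal, as the text describes; the function names, the record layout and the difference labels ("deleted", "added", "text_changed") are hypothetical.

```python
def key(element):
    """Two child elements match when tag and attributes are both equal."""
    return (element["tag"], tuple(sorted(element.get("attrs", {}).items())))


def lcs_pairs(old, new):
    """Return index pairs (i, j) of one longest common subsequence."""
    n, m = len(old), len(new)
    # table[i][j] holds the LCS length of old[i:] and new[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if key(old[i]) == key(new[j]):
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if key(old[i]) == key(new[j]):
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def diff_children(old, new):
    """Label children outside the LCS as deleted or added; check text inside it."""
    pairs = lcs_pairs(old, new)
    matched_old = {i for i, _ in pairs}
    matched_new = {j for _, j in pairs}
    diffs = [("deleted", e) for i, e in enumerate(old) if i not in matched_old]
    diffs += [("added", e) for j, e in enumerate(new) if j not in matched_new]
    diffs += [("text_changed", new[j]) for i, j in pairs
              if old[i].get("text") != new[j].get("text")]
    return diffs


t1_children = [{"tag": "h1", "attrs": {}, "text": "hello"},
               {"tag": "p", "attrs": {"class": "a"}, "text": "Baidu"}]
t2_children = [{"tag": "p", "attrs": {"class": "a"}, "text": "Sina"},
               {"tag": "div", "attrs": {}, "text": ""}]
diffs = diff_children(t1_children, t2_children)
```

In this example the p element is the only member of the common subsequence, so the h1 element is reported as deleted, the div element as added, and the p element as having changed text.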
Element modification includes modification of the element content and modification of the element style. The element content is the string in a text child element.
The changes of a webpage fall into three categories, each corresponding to one situation. In other words, if the visible content of a webpage changes, the change must be one of these three situations:
1. Some element has been deleted, corresponding to a deleted element;
2. Some element has been added, corresponding to an added element;
3. Among the elements that were neither deleted nor added, some have changed, either in element content or in element style.
Of the three, added elements, deleted elements and modified elements (a modification may affect content and style at the same time, in no fixed order) are mutually exclusive categories. In other words, an element that is added cannot also be deleted or modified, and an element that is deleted cannot also be added or modified. Deleted elements are marked on the page screenshot of moment t1; added elements and modified elements are marked on the page screenshot of moment t2.
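The rule for which screenshot carries which mark can be written as a small lookup. This is a sketch under stated assumptions: the labels "deleted", "added" and "modified", the keys "t1" and "t2", and the function name are hypothetical; only the mapping itself comes from the text.

```python
# Deleted elements exist only at t1, so they can only be marked there;
# added and modified elements are marked on the t2 screenshot.
MARK_ON = {"deleted": "t1", "added": "t2", "modified": "t2"}


def marks_by_screenshot(diffs):
    """Group (kind, element) pairs by the screenshot they are marked on."""
    grouped = {"t1": [], "t2": []}
    for kind, element in diffs:
        grouped[MARK_ON[kind]].append((kind, element))
    return grouped


grouped = marks_by_screenshot([
    ("deleted", {"tag": "h1"}),
    ("added", {"tag": "div"}),
    ("modified", {"tag": "p"}),
])
```

Because the three categories are mutually exclusive, each element appears in exactly one of the two groups.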
Step S370: output the set of all differing elements.
That is, return the set of all elements that have been changed.
III. Difference display
The webpage snapshot storage and snapshot comparison stages above yield the differences in structure, style and content between two historical time points of a webpage. Because the stored snapshot data record the position information (coordinates, width and height) of every element, and the page screenshot of the time was saved, the page screenshots of the two time points can be stitched together and the three kinds of difference in the webpage (added elements, deleted elements and modified elements) marked on the stitched screenshot. Specifically, the three kinds of difference may be marked on the screenshot in different colors, or in some other way.
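The coordinate arithmetic behind marking a stitched screenshot can be sketched as follows. It assumes the two screenshots are placed side by side with the t1 screenshot on the left, that deleted elements are marked on the t1 half and added or modified elements on the t2 half, and that each mark is a colored rectangle. The color table, the labels and the function name are hypothetical, and no imaging library is used: the sketch only computes where the rectangles would go.

```python
# Hypothetical color choice per difference type.
COLORS = {"added": "green", "deleted": "gray", "modified": "red"}


def stitched_marks(diffs, left_width):
    """Translate element rectangles into coordinates on the stitched image.

    diffs is a list of (kind, rect) pairs, where rect holds the x, y, width
    and height recorded in the snapshot. Rectangles drawn on the right-hand
    (t2) screenshot are shifted by the width of the left-hand (t1) screenshot.
    """
    marks = []
    for kind, rect in diffs:
        offset = 0 if kind == "deleted" else left_width
        marks.append({
            "x": rect["x"] + offset,
            "y": rect["y"],
            "width": rect["width"],
            "height": rect["height"],
            "color": COLORS[kind],
        })
    return marks


rect = {"x": 10, "y": 50, "width": 100, "height": 20}
marks = stitched_marks([("deleted", rect), ("added", rect)], left_width=800)
```

The resulting rectangles could then be drawn onto the stitched image with any imaging tool.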
Figs. 4a to 4d show the results of difference display according to an embodiment of the invention. In each figure the left side corresponds to the page screenshot of moment t1 and the right side to the page screenshot of moment t2. Fig. 4a shows the difference display result for an element added to the webpage: the added element, "iamnewhere", is marked in green on the left side of the figure. Fig. 4b shows the result for an element deleted from the webpage: the deleted element, "hello", is marked in gray on the right side of the figure. Fig. 4c shows the result for a modified element style: the element whose style was modified is marked in red on the right side of the figure, the modification being a change to the font size and color of "helloworld". Fig. 4d shows the result for modified element text: the elements whose text was modified are marked in yellow on the right side of the figure, the text being originally "Baidu" and "Sina" and becoming "Bai-du" and "Si-na" after modification.
The webpage change monitoring method provided by the embodiment of the invention has been described in detail above. When a webpage snapshot is stored at each moment, a page screenshot image file (png or jpeg) and a JSON file (the data structure recording the element information) are obtained, and both files are stored on the computer's hard disk. When snapshots are compared, the JSON files saved at the different moments are compared, and any differences found are marked on the two screenshots.
Corresponding to the webpage change monitoring method above, the invention provides a webpage change monitoring device. Fig. 5 shows the logical structure of the webpage change monitoring device according to an embodiment of the invention.
As shown in Fig. 5, the webpage change monitoring device 500 provided by the embodiment of the invention includes a page screenshot unit 510, a page data recording unit 520, a difference determining unit 530 and a difference marking unit 540.
The page screenshot unit 510 is used to save a screenshot of the page after the same webpage is loaded at different moments.
The page data recording unit 520 is used to record the page data of the same webpage after it is loaded at different moments; wherein the page data of the same webpage loaded at different moments is recorded as a corresponding specific data structure.
The difference determining unit 530 is used to compare the specific data structures recorded at different moments and determine the differences between the page data of the same webpage loaded at those moments.
The difference marking unit 540 is used to mark the differences on the page screenshots taken at the respective moments.
Fig. 6 shows the logical structure of a specific implementation of the webpage change monitoring device according to an embodiment of the invention. As shown in Fig. 6, the page data recording unit 520 includes a DOM node access module 521, an element information recording module 522, an element style concatenation module 523 and an element information storage module 524.
The DOM node access module 521 is used to access the specified DOM nodes in the webpage loaded at different moments. The element information recording module 522 is used to record the element style, element attribute information, element content and element tag in the DOM nodes. The element style concatenation module 523 is used to concatenate the recorded element style into a string and compute the hash value of the string. The element information storage module 524 is used to serialize the tag, position information, attributes and attribute values of the element together with the hash value, and store them as the specific data structure.
In addition, the specified nodes accessed by the DOM node access module are either all the DOM nodes in the page or the DOM nodes that remain after all the DOM nodes have been filtered.
In addition, the difference determining unit 530 compares the specific data structures recorded at different moments using the LCS algorithm and determines the differences between the page data of the same webpage loaded at those moments.
Furthermore, the difference marking unit 540 marks the differences on the page screenshots of the different moments according to the type of each difference and its position on the page.
The present invention correspondingly provides a terminal device. Referring to Fig. 7, the terminal device 700 includes a file system 710 for storing page screenshots and snapshot data, and the webpage change monitoring device 500, which includes:
A page screenshot unit, configured to take and save screenshots of the page after the same webpage is loaded at different times;
A page data recording unit, configured to record the page data of the same webpage after loading at different times; wherein the page data after the same webpage is loaded at different times is recorded as a corresponding specific data structure;
A difference determining unit, configured to compare the specific data structures recorded at different times and determine the differences between the page data of the same webpage after loading at different times;
A difference marking unit, configured to mark the differences on the page screenshots taken at the respective times.
The webpage change monitoring device has the structure shown in Fig. 6; see the description above, which is not repeated here.
The foregoing describes in detail the webpage change monitoring method and device provided by the present invention. Screenshots are taken of the webpage content at different points in time, and the element information of the webpage at each point in time is recorded as a specific data structure, so that snapshot data of the same webpage is kept for different points in time. The snapshot data from any two points in time are compared to find the parts that differ, and those parts are marked on the screenshots from the two points in time. In this way, changes to the same webpage can be monitored and compared accurately.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of the claims.

Claims (10)

1. A webpage change monitoring method, comprising:
Recording the page data of the same webpage after loading at different times, and taking and saving screenshots of the page after the same webpage is loaded at different times; wherein the page data after the same webpage is loaded at different times is recorded as a corresponding specific data structure;
Determining, by comparing the specific data structures recorded at different times, the differences between the page data of the same webpage after loading at different times;
Marking the differences on the page screenshots taken at the respective times.
2. The webpage change monitoring method as claimed in claim 1, wherein,
In the process of recording the page data after the same webpage is loaded at different times as the corresponding specific data structure,
The specified DOM nodes and their child nodes in the page after loading at different times are accessed; the tag, position information, element style, attributes and attribute values of the element in each node are recorded; the element style is spliced into a character string, and a hash value of the character string is computed;
The tag, position information, attributes and attribute values of the element, together with the hash value, are serialized and stored as the specific data structure.
3. The webpage change monitoring method as claimed in claim 2, wherein the specified nodes are all DOM nodes in the page, or the DOM nodes remaining after all DOM nodes have been filtered.
4. The webpage change monitoring method as claimed in claim 1, wherein, in the process of determining, by comparing the specific data structures recorded at different times, the differences between the page data of the same webpage after loading at different times,
The specific data structures recorded at different times are compared according to the LCS algorithm to determine the differences between the page data of the same webpage after loading at different times.
5. The webpage change monitoring method as claimed in claim 1, wherein, in the process of marking the differences on the page screenshots taken at the respective times,
The differences are marked on the page screenshots taken at different times according to the type of each difference and the position of the difference on the page.
6. A webpage change monitoring device, comprising:
A page data recording unit, configured to record the page data of the same webpage after loading at different times; wherein the page data after the same webpage is loaded at different times is recorded as a corresponding specific data structure;
A page screenshot unit, configured to take and save screenshots of the page after the same webpage is loaded at different times;
A difference determining unit, configured to compare the specific data structures recorded at different times and determine the differences between the page data of the same webpage after loading at different times;
A difference marking unit, configured to mark the differences on the page screenshots taken at the respective times.
7. The webpage change monitoring device as claimed in claim 6, wherein,
The page data recording unit includes:
A DOM node access module, configured to access the specified DOM nodes and their child nodes in the page after loading at different times;
An element information recording module, configured to record the element style, element attribute information, element content, and element tag in the specified DOM nodes;
An element style concatenation module, configured to splice the element style into a character string and compute a hash value of the character string;
An element information storage module, configured to serialize the tag, position information, attributes and attribute values of the element, together with the hash value, and store the result as the specific data structure.
8. The webpage change monitoring device as claimed in claim 7, wherein the specified nodes accessed by the DOM node access module are all DOM nodes in the page, or the DOM nodes remaining after all DOM nodes have been filtered.
9. The webpage change monitoring device as claimed in claim 6, wherein the difference determining unit compares the specific data structures recorded at different times according to the LCS algorithm to determine the differences between the page data of the same webpage after loading at different times.
10. The webpage change monitoring device as claimed in claim 6, wherein the difference marking unit marks the differences on the page screenshots taken at different times according to the type of each difference and the position of the difference on the page.
CN201410652444.3A 2014-11-17 2014-11-17 Webpage change monitoring method and device Active CN105630843B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410652444.3A CN105630843B (en) 2014-11-17 2014-11-17 Webpage change monitoring method and device
PCT/CN2015/090969 WO2016078479A1 (en) 2014-11-17 2015-09-28 Method and device for monitoring web page changes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410652444.3A CN105630843B (en) 2014-11-17 2014-11-17 Webpage change monitoring method and device

Publications (2)

Publication Number Publication Date
CN105630843A CN105630843A (en) 2016-06-01
CN105630843B CN105630843B (en) 2019-04-12

Family

ID=56013260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410652444.3A Active CN105630843B (en) 2014-11-17 2014-11-17 Webpage change monitoring method and device

Country Status (2)

Country Link
CN (1) CN105630843B (en)
WO (1) WO2016078479A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309461B (en) * 2019-07-04 2023-10-27 郑州悉知信息科技股份有限公司 Page display method and device
CN110795676A (en) * 2019-10-31 2020-02-14 北京知道创宇信息技术股份有限公司 Website monitoring method and device, electronic equipment and storage medium
CN111061633B (en) * 2019-12-05 2024-04-30 北京达佳互联信息技术有限公司 Webpage first screen time detection method, device, terminal and medium
CN111538658A (en) * 2020-04-20 2020-08-14 卓望数码技术(深圳)有限公司 Automatic testing method for interface loading duration
US11561962B2 (en) 2020-07-22 2023-01-24 Content Square SAS System and method for detecting changes in webpages and generating metric correlations therefrom
WO2022018492A1 (en) * 2020-07-22 2022-01-27 Content Square SAS System and method for detecting changes in webpages and generating metric correlations therefrom
CN113778429B (en) * 2020-09-28 2024-10-18 北京沃东天骏信息技术有限公司 Walk-checking method, walk-checking device and storage medium
CN115544969B (en) * 2022-11-29 2023-03-21 明度智云(浙江)科技有限公司 Page comparison method, equipment and medium based on hypertext markup language

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1435782A (en) * 2002-01-31 2003-08-13 百度在线网络技术(北京)有限公司 Method for recording and analysis of information over network by snap shot mode
CN101207524A (en) * 2006-12-22 2008-06-25 上海亿动信息技术有限公司 Method and system for supervising broadcast of web advertisement
CN101782914A (en) * 2009-06-23 2010-07-21 北京搜狗科技发展有限公司 Method and system for prompting web page information
CN103246678A (en) * 2012-02-13 2013-08-14 腾讯科技(深圳)有限公司 Method and device for previewing web page contents
CN103544213A (en) * 2013-09-16 2014-01-29 青岛英网资讯股份有限公司 Network content upgrading detection assessment method and system
CN103885960A (en) * 2012-12-20 2014-06-25 上海明想电子科技有限公司 Method for monitoring webpage change

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8868533B2 (en) * 2006-06-30 2014-10-21 International Business Machines Corporation Method and apparatus for intelligent capture of document object model events

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446118A (en) * 2016-09-19 2017-02-22 中国南方电网有限责任公司信息中心 Method for automatically generating page change template
CN107870914A (en) * 2016-09-23 2018-04-03 北京京东尚科信息技术有限公司 A kind of method and apparatus for preventing that the page is tampered
CN107870914B (en) * 2016-09-23 2020-07-31 北京京东尚科信息技术有限公司 Method and device for preventing page from being tampered
CN108073828A (en) * 2016-11-16 2018-05-25 阿里巴巴集团控股有限公司 A kind of webpage integrity assurance, apparatus and system
CN108335164A (en) * 2017-01-20 2018-07-27 阿里巴巴集团控股有限公司 A kind of method, apparatus and electronic equipment for realizing shopping at network
CN106960058A (en) * 2017-04-05 2017-07-18 金电联行(北京)信息技术有限公司 A kind of structure of web page alteration detection method and system
CN106960058B (en) * 2017-04-05 2021-01-12 金电联行(北京)信息技术有限公司 Webpage structure change detection method and system
CN108880921A (en) * 2017-05-11 2018-11-23 腾讯科技(北京)有限公司 Webpage monitoring method
CN108595304B (en) * 2018-04-19 2022-12-27 腾讯科技(深圳)有限公司 Webpage monitoring method and device
CN108595304A (en) * 2018-04-19 2018-09-28 腾讯科技(深圳)有限公司 Web monitor method and device
CN110865843A (en) * 2018-08-09 2020-03-06 阿里巴巴集团控股有限公司 Page backtracking, information backup and problem solving method, system and equipment
CN110865843B (en) * 2018-08-09 2024-03-26 阿里巴巴集团控股有限公司 Page backtracking, information backup and problem solving method, system and equipment
CN109408780A (en) * 2018-09-07 2019-03-01 山东中磁视讯股份有限公司 A kind of method that Excel file is converted to JSON file
CN111898047A (en) * 2018-10-31 2020-11-06 创新先进技术有限公司 Method and device for carrying out block link evidence storage on webpage through webpage monitoring
CN111898047B (en) * 2018-10-31 2024-03-29 创新先进技术有限公司 Method and device for conducting blockchain certification on webpage through webpage monitoring
CN109582885A (en) * 2018-10-31 2019-04-05 阿里巴巴集团控股有限公司 It is a kind of that the method and device that block chain deposits card is carried out to webpage by webpage monitoring
TWI705342B (en) * 2018-10-31 2020-09-21 香港商阿里巴巴集團服務有限公司 Method and device for performing blockchain certificate deposit on webpage through webpage monitoring
CN109299352A (en) * 2018-11-14 2019-02-01 百度在线网络技术(北京)有限公司 The update method of website data, device and search engine in search engine
CN109299352B (en) * 2018-11-14 2022-02-01 百度在线网络技术(北京)有限公司 Method and device for updating website data in search engine and search engine
CN110046072A (en) * 2019-03-13 2019-07-23 平安城市建设科技(深圳)有限公司 Monitoring method, device, terminal and the readable storage medium storing program for executing of the page
CN109978626A (en) * 2019-03-29 2019-07-05 上海幻电信息科技有限公司 Web advertisement change monitoring method, apparatus and storage medium
CN111443969A (en) * 2020-03-24 2020-07-24 深圳前海微众银行股份有限公司 Method and device for recording webpage
CN111581672A (en) * 2020-05-14 2020-08-25 杭州安恒信息技术股份有限公司 Method, system, computer device and readable storage medium for webpage tampering detection
CN112035315A (en) * 2020-07-31 2020-12-04 重庆锐云科技有限公司 Webpage data monitoring method and device, computer equipment and storage medium
CN112307384A (en) * 2020-10-21 2021-02-02 深圳市欢太科技有限公司 Page snapshot display method and device, electronic equipment and storage medium
CN112307384B (en) * 2020-10-21 2024-05-03 深圳市欢太科技有限公司 Page snapshot display method and device, electronic equipment and storage medium
CN113987318A (en) * 2021-11-01 2022-01-28 盐城金堤科技有限公司 Page monitoring method, device, equipment and computer storage medium
CN113987318B (en) * 2021-11-01 2024-03-12 盐城天眼察微科技有限公司 Page monitoring method, device, equipment and computer storage medium

Also Published As

Publication number Publication date
WO2016078479A1 (en) 2016-05-26
CN105630843B (en) 2019-04-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200526

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 14th Floor, Tower B, Guangdian Pingyun Plaza, No. 163 Pingyun Road, Huangpu Avenue West, Tianhe District, Guangzhou, Guangdong Province, 510627

Patentee before: GUANGZHOU UCWEB COMPUTER TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right