CN110046295A - Web page structure change detection method, apparatus, and computer-readable storage medium

Web page structure change detection method, apparatus, and computer-readable storage medium

Info

Publication number
CN110046295A
Authority
CN
China
Prior art keywords
data
web page
web
check value
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910185344.7A
Other languages
Chinese (zh)
Inventor
檀传华
冉梦龙
孟文斌
李祖光
陈锦韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Financial Assets Exchange LLC
Original Assignee
Chongqing Financial Assets Exchange LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Financial Assets Exchange LLC
Priority to CN201910185344.7A
Publication of CN110046295A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Abstract

The present invention relates to the field of UI design and discloses a web page structure change detection method. The method comprises: layering the web page structure of a target website in a layered-configuration manner, and configuring each layer of web page structure obtained by the layering accordingly; extracting, at a predetermined period, the web page data after layered configuration, and performing data processing on the extracted web page data; using a sampled-data comparison method, comparing the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction; and judging, according to the comparison result between the current web page data and the previous web page data, whether the web page structure has changed. The present invention also proposes a web page structure change detection apparatus and a computer-readable storage medium. The present invention thereby provides an active detection technique that uses sampled-data comparison to detect whether a web page structure has changed.

Description

Web page structure change detection method, apparatus, and computer-readable storage medium
Technical field
The present invention relates to the field of computer technology, and in particular to a web page structure change detection method, apparatus, and computer-readable storage medium.
Background
With the rapid development of Internet technology, obtaining information through web pages has become commonplace. The layout of web page content directly affects the user experience and relevance of the page, and to some extent affects the overall structure of the website and the number of pages it contains. Web page structure is, in effect, the organization and layout of the three basic components of a page: the navigation bar, the columns, and the body content.
Under normal conditions, the web page structure is adjusted according to the content of the page, and different page content calls for a different structure; when the page content of a target website changes, the web page structure is generally adjusted along with it. If the web page structure of the target website changes, a data capture system (i.e., a crawler system) will at run time either fail to capture correct data or fail outright, and only in this way does it passively learn that the structure has changed. Taking measures only after the change has been perceived passively usually causes a long delay. How to actively detect whether a web page structure has changed, so that countermeasures can be taken in advance, has therefore become one of the problems in urgent need of a solution.
Summary of the invention
The present invention provides a web page structure change detection method, apparatus, and computer-readable storage medium, which aim to actively detect whether a web page structure has changed by means of sampled-data comparison.
To achieve the above object, the present invention provides a web page structure change detection method, the method comprising:
Layering the web page structure of the target website in a layered-configuration manner, and configuring each layer of web page structure obtained by the layering accordingly;
Extracting, at a predetermined period, the web page data after layered configuration, and performing data processing on the extracted web page data;
Using a sampled-data comparison method, comparing the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judging, according to the comparison result between the current web page data and the previous web page data, whether the web page structure has changed.
Optionally, layering the web page structure of the target website in a layered-configuration manner, and configuring each layer of web page structure obtained by the layering accordingly, comprises:
For the web page structure of the target website to be detected, dividing the web page structure into two layers, obtaining the modules corresponding to the first-layer web page structure and the web page samples within each module corresponding to the second-layer web page structure;
Configuring each module of the first-layer web page structure with the XML Path Language (XPath) expression to be detected, and configuring the second-layer web page structure with the actual web page URL address of each module, based on the web page sample corresponding to that module.
Optionally, extracting, at a predetermined period, the web page data after layered configuration, and performing data processing on the extracted web page data, comprises:
According to the predetermined period and the configured web page URL addresses, extracting the web page fragment content corresponding to the web page sample contained in each module after layering;
According to a preset algorithm, performing data processing on the obtained web page fragment content to obtain the check value corresponding to the web page fragment content.
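The extraction and check-value steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `fragment_check_value` and the sample page are hypothetical, MD5 is used because it is the algorithm the description later names as one option, and the standard-library `xml.etree.ElementTree` parser (which supports only a subset of XPath and requires well-formed markup) stands in for whatever HTML parser a real system would use. Returning an empty string for a missing node is also an assumption of this sketch.

```python
import hashlib
import xml.etree.ElementTree as ET


def fragment_check_value(page_source: str, path: str) -> str:
    """Return the MD5 hex digest of the fragment selected by `path`.

    Assumption: `page_source` is well-formed markup and `path` uses only
    the XPath subset that ElementTree understands.
    """
    root = ET.fromstring(page_source)
    node = root.find(path)
    if node is None:
        # Hypothetical convention: an empty check value marks a missing fragment.
        return ""
    fragment = ET.tostring(node, encoding="unicode")
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()


# Hypothetical sample page with two modules.
PAGE = (
    "<html><body>"
    "<div id='nav'><a>Home</a></div>"
    "<div id='main'><p>Hello</p></div>"
    "</body></html>"
)
value = fragment_check_value(PAGE, ".//div[@id='main']")
```

The same page and path always yield the same check value, which is what makes the later comparison between two extraction runs meaningful.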
Optionally, using a sampled-data comparison method to compare the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction, comprises:
According to a preset algorithm, calculating the check value M11 corresponding to the current web page data, and then, using the same preset algorithm, calculating the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
Calculating the check values Mn1 corresponding to the current web page data at n different positions, and the check values Mn2 corresponding to the previous web page data at the same positions, obtaining n groups of check values for the current web page data and the previous web page data;
Comparing the n groups of check values one by one, identifying whether Mn1 and Mn2 in each group are identical, and recording the identification result for each group.
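The group-by-group comparison described above can be sketched as below. The function name `compare_groups` and the record layout are hypothetical choices for illustration; the source only requires that each pair (Mn1, Mn2) be compared and the result recorded.

```python
def compare_groups(current, previous):
    """Compare check values position by position and record each result.

    `current[i]` plays the role of M(i+1)1 and `previous[i]` the role of
    M(i+1)2 in the description; the dict layout is an assumption.
    """
    if len(current) != len(previous):
        raise ValueError("both runs must cover the same n positions")
    return [
        {"position": i, "current": c, "previous": p, "identical": c == p}
        for i, (c, p) in enumerate(zip(current, previous), start=1)
    ]


records = compare_groups(["aa", "bb", "cc"], ["aa", "xx", "cc"])
identical_flags = [r["identical"] for r in records]
```

Keeping the per-group records, rather than a single yes/no answer, leaves the later judgment step free to count how many positions differ.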
Optionally, judging, according to the comparison result between the current web page data and the previous web page data, whether the web page structure has changed, comprises:
If, for one or more web page fragments, the check value of the current web page data is consistent with the check value of the previous web page data, judging that the web page structure has not changed;
If the check values of the current web page data at all n extracted positions are inconsistent with the check values of the previous web page data, judging that the web page structure has changed.
In addition, to achieve the above object, the present invention also provides a web page structure change detection apparatus. The apparatus includes a memory and a processor; the memory stores a web page structure change detection program that can run on the processor, and the program implements the following steps when executed by the processor:
Layering the web page structure of the target website in a layered-configuration manner, and configuring each layer of web page structure obtained by the layering accordingly;
Extracting, at a predetermined period, the web page data after layered configuration, and performing data processing on the extracted web page data;
Using a sampled-data comparison method, comparing the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judging, according to the comparison result between the current web page data and the previous web page data, whether the web page structure has changed.
Optionally, the web page structure change detection program can also be executed by the processor to layer the web page structure of the target website in a layered-configuration manner and to configure each layer of web page structure obtained by the layering accordingly, comprising:
For the web page structure of the target website to be detected, dividing the web page structure into two layers, obtaining the modules corresponding to the first-layer web page structure and the web page samples within each module corresponding to the second-layer web page structure;
Configuring each module of the first-layer web page structure with the XML Path Language (XPath) expression to be detected, and configuring the second-layer web page structure with the actual web page URL address of each module, based on the web page sample corresponding to that module.
Optionally, the web page structure change detection program can also be executed by the processor to extract, at a predetermined period, the web page data after layered configuration and to perform data processing on the extracted web page data, comprising:
According to the predetermined period and the configured web page URL addresses, extracting the web page fragment content corresponding to the web page sample contained in each module after layering;
According to a preset algorithm, performing data processing on the obtained web page fragment content to obtain the check value corresponding to the web page fragment content.
Optionally, the web page structure change detection program can also be executed by the processor to compare, using a sampled-data comparison method, the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction, comprising:
According to a preset algorithm, calculating the check value M11 corresponding to the current web page data, and then, using the same preset algorithm, calculating the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
Calculating the check values Mn1 corresponding to the current web page data at n different positions, and the check values Mn2 corresponding to the previous web page data at the same positions, obtaining n groups of check values for the current web page data and the previous web page data;
Comparing the n groups of check values one by one, identifying whether Mn1 and Mn2 in each group are identical, and recording the identification result for each group.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium. A web page structure change detection program is stored on the computer-readable storage medium, and the program can be executed by one or more processors to implement the steps of the web page structure change detection method described above.
The web page structure change detection method, apparatus, and computer-readable storage medium proposed by the present invention layer the web page structure of the target website in a layered-configuration manner and configure each layer obtained by the layering accordingly; extract, at a predetermined period, the web page data after layered configuration and perform data processing on the extracted web page data; compare, using a sampled-data comparison method, the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction; and judge, according to the comparison result, whether the web page structure has changed. This achieves the purpose of actively detecting web page structure changes by sampled-data comparison, so that a change can be discovered and handled as early as possible. Web page structure changes are checked quickly, with a wide range of application and high accuracy.
Brief description of the drawings
Fig. 1 is a flow diagram of the web page structure change detection method provided by one embodiment of the present invention;
Fig. 2 is a schematic diagram of the internal structure of the web page structure change detection apparatus provided by one embodiment of the present invention;
Fig. 3 is a module diagram of the web page structure change detection program in the web page structure change detection apparatus provided by one embodiment of the present invention.
The realization of the object, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
The present invention provides a web page structure change detection method. Fig. 1 is a flow diagram of the web page structure change detection method provided by one embodiment of the present invention. The method may be executed by an apparatus, and the apparatus may be implemented in software and/or hardware.
In the present embodiment, the web page structure change detection method may be implemented as steps S10 to S40 shown in Fig. 1:
Step S10: layer the web page structure of the target website in a layered-configuration manner, and configure each layer of web page structure obtained by the layering accordingly.
When actively detecting whether a web page structure has changed, the web page structure of the target website to be detected is layered; for example, the web page structure of a given target website is divided into two layers. Each layer obtained by the layering is then configured in the layered-configuration manner. For example, each layer is configured specifically according to its own web page structure content, such as configuring a given layer with the path language expression to be detected.
The number of layers and the basis for layering can be determined according to the specific content of the target website, its specific web page structure, and the detection requirements; the embodiments of the present invention do not limit the number of layers or the basis for layering of the web page structure to be detected. When the layered web page structure is configured, the configuration can follow the number of layers and the content corresponding to each layer; the embodiments of the present invention do not specifically limit the manner of configuring the layered web page structure.
Step S20: extract, at a predetermined period, the web page data after layered configuration, and perform data processing on the extracted web page data.
When data is extracted from the web page data after layered configuration, the length of the predetermined period can be determined by how strongly the consumer of the target website depends on its web page structure: the stronger the dependence on the web page structure, the more frequent the detection and the shorter the predetermined period. For example, a scheduled task may run the detection once a day, extracting the web page data of the target website after layered configuration.
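As a small illustration of the predetermined period, the sketch below checks whether a detection pass is due. The function name `is_due` and the two example periods are assumptions; the source states only that stronger dependence on the web page structure calls for a shorter period, and gives daily detection as one example.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_due(last_run: Optional[datetime], now: datetime, period: timedelta) -> bool:
    """Return True when no pass has run yet or a full period has elapsed."""
    return last_run is None or now - last_run >= period


DAILY = timedelta(days=1)    # the once-a-day example from the description
HOURLY = timedelta(hours=1)  # hypothetical shorter period for stronger dependence

due_first_time = is_due(None, datetime(2019, 3, 12), DAILY)
due_half_day = is_due(datetime(2019, 3, 11, 12), datetime(2019, 3, 12), DAILY)
due_full_day = is_due(datetime(2019, 3, 11), datetime(2019, 3, 12), DAILY)
```

In a deployed system this check would more likely be delegated to an existing scheduler; the function only makes the period rule explicit.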
To make it easier to judge accurately whether the web page structure has changed, in one embodiment the data processing converts the extracted web page data into a check value that is easy to compare.
Step S30: using a sampled-data comparison method, compare the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction.
Step S40: judge, according to the comparison result between the current web page data and the previous web page data, whether the web page structure has changed.
Whether the web page structure of the target website has changed is determined by the sampled-data comparison method. For the web page data at a given position, the processed web page data extracted this time is compared with the web page data extracted at that position in the immediately preceding extraction. Because step S20 produces a check value for each piece of extracted web page data, the comparison can be made between the check value of the current web page data and the check value of the previous web page data at the same position, and the comparison result is used to judge whether the web page structure has changed.
The data compared is the web page data of the target website after layering, that is, web page fragments of the target website. Therefore, when the check value of the current web page data differs from the check value of the previous web page data, the cause may be a change in the web page structure of the target website, or it may be a change in the page content of the target website. Web page data is extracted at multiple different positions: if the current web page data and the previous web page data are inconsistent at every position, the web page structure is judged to have changed; if they are inconsistent at only one or some of the positions, only the page content data of the target website is judged to have changed.
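The judgment rule above distinguishes three outcomes, which can be sketched as follows. The `Verdict` enum, the function name `classify`, and the use of dictionaries keyed by position are hypothetical; the rule itself (every sampled position differs means a structure change, only some differ means a content change) follows the description.

```python
from enum import Enum


class Verdict(Enum):
    UNCHANGED = "unchanged"
    CONTENT_CHANGED = "content changed"
    STRUCTURE_CHANGED = "structure changed"


def classify(current: dict, previous: dict) -> Verdict:
    """Classify a run from check values keyed by sampled position."""
    positions = current.keys() & previous.keys()
    if not positions:
        raise ValueError("no common positions to compare")
    mismatches = sum(1 for pos in positions if current[pos] != previous[pos])
    if mismatches == 0:
        return Verdict.UNCHANGED
    if mismatches == len(positions):
        # Every sampled position differs: judged a structure change.
        return Verdict.STRUCTURE_CHANGED
    # Only some positions differ: judged a content change only.
    return Verdict.CONTENT_CHANGED


previous_run = {"news": "a1", "list": "b1", "detail": "c1"}
some_changed = classify({"news": "a2", "list": "b1", "detail": "c1"}, previous_run)
all_changed = classify({"news": "a2", "list": "b2", "detail": "c2"}, previous_run)
```

Note that with a single sampled position any mismatch counts as "all positions differ", so the rule is only informative when several positions are sampled.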
Further, in one embodiment, when the web page structure is judged to have changed, a prompt message is sent to the technician's monitoring client to remind the technician to decide whether manual intervention is needed.
The web page structure change detection method proposed in the present embodiment layers the web page structure of the target website in a layered-configuration manner and configures each layer obtained by the layering accordingly; extracts, at a predetermined period, the web page data after layered configuration and performs data processing on the extracted web page data; compares, using a sampled-data comparison method, the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction; and judges, according to the comparison result, whether the web page structure has changed. This achieves the purpose of actively detecting web page structure changes by sampled-data comparison, so that a change can be discovered and handled as early as possible. Web page structure changes are checked quickly, with a wide range of application and high accuracy.
Further, in one embodiment of the method of the present invention, "step S10: layer the web page structure of the target website in a layered-configuration manner, and configure each layer of web page structure obtained by the layering accordingly" in the embodiment of Fig. 1 can be implemented as follows:
For the web page structure of the target website to be detected, divide the web page structure into two layers, obtaining the modules corresponding to the first-layer web page structure and the web page samples within each module corresponding to the second-layer web page structure;
Configure each module of the first-layer web page structure with the XML Path Language (XPath) expression to be detected, and configure the second-layer web page structure with the actual web page URL address, based on the web page sample corresponding to each module.
For example, when the web page structure of the target website is detected, the configuration is divided directly into two layers. The first layer consists of modules, and each first-layer module is configured with the XML Path Language (XPath) expression to be detected; for example, the first layer of the target website is divided into 20 modules. The second layer consists of web page samples, for example the actual web page URL (Uniform Resource Locator) addresses of the target website under those 20 modules.
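One way to represent the two-layer configuration is sketched below. Everything named here is illustrative: `ModuleConfig`, `build_config`, the module names, the XPath expressions, and the `example.com` URLs are assumptions, and only two of the 20 modules from the example are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleConfig:
    name: str        # first layer: the module
    xpath: str       # first layer: XPath expression to detect in this module
    sample_url: str  # second layer: actual URL of a sample page in the module


def build_config(entries):
    """Build the two-layer configuration, rejecting duplicate module names."""
    config = {}
    for name, xpath, url in entries:
        if name in config:
            raise ValueError(f"duplicate module: {name}")
        config[name] = ModuleConfig(name, xpath, url)
    return config


# Hypothetical modules, expressions, and URLs.
SITE_CONFIG = build_config([
    ("announcements", ".//div[@id='notice-list']", "https://example.com/notice/1.html"),
    ("listings", ".//table[@class='listing']", "https://example.com/listing/1.html"),
])
```

Keeping the XPath expression and the sample URL together per module means the detector itself needs no site-specific code, only a different configuration.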
The embodiment of the present invention layers the web page structure of the target website and configures the layered web page structure. This approach is simple to use and accurate; moreover, the layered web page structure is not tied to the structural features of any particular target website, so it is broadly applicable. In practice, therefore, by writing one unified web page structure detector, the detection function can be implemented for all source websites and source web pages, giving the method a wide range of application.
Further, in one embodiment of the method of the present invention, "step S20: extract, at a predetermined period, the web page data after layered configuration, and perform data processing on the extracted web page data" in the embodiment of Fig. 1 can be implemented as follows:
According to the predetermined period and the configured web page URL addresses, extract the content of the web page fragment corresponding to the web page sample under each module after layering;
According to a preset algorithm, perform data processing on the content of the obtained web page fragments to obtain the check value corresponding to each web page fragment.
The preset algorithm described in the embodiments of the present invention includes, but is not limited to, MD5. For example, the MD5 algorithm is applied to the extracted web page data to obtain the check value corresponding to each extracted web page fragment.
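Because the preset algorithm is not limited to MD5, the check-value step can be written so that the algorithm is a parameter. The sketch below is an assumption about how that might look, using Python's standard `hashlib`; the function name `check_value` and the UTF-8 encoding choice are hypothetical.

```python
import hashlib


def check_value(fragment: str, algorithm: str = "md5") -> str:
    """Return the hex digest of a web page fragment.

    `algorithm` is any name accepted by hashlib.new, e.g. "md5" or "sha256".
    The same algorithm must be used for both runs being compared.
    """
    return hashlib.new(algorithm, fragment.encode("utf-8")).hexdigest()


md5_value = check_value("<div id='main'><p>Hello</p></div>")
sha_value = check_value("<div id='main'><p>Hello</p></div>", "sha256")
```

Since the check value is used here only to detect differences between two runs, not for security, any stable digest serves the purpose.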
In the embodiments of the present invention, "step S30: using a sampled-data comparison method, compare the current web page data, extracted this time and processed, with the previous web page data extracted at the same position in the immediately preceding extraction" in the embodiment of Fig. 1 can be implemented as follows:
According to a preset algorithm, calculate the check value M11 corresponding to the current web page data, and then, using the same preset algorithm, calculate the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
Calculate the check values Mn1 corresponding to the current web page data at n different positions, and the check values Mn2 corresponding to the previous web page data at the same positions, obtaining n groups of check values for the current web page data and the previous web page data;
Compare the n groups of check values one by one and identify whether Mn1 and Mn2 in each group are identical. In other words, the n groups of check values are examined in turn, starting with whether M11 and M12 are identical and continuing until whether Mn1 and Mn2 are identical, and the identification result for each group is recorded.
In the embodiments of the present invention, when the web page structure of the target website is configured in layers, each module in the first-layer web page structure is configured with the XML Path Language (XPath) expression to be detected. The configured XPath expression therefore ensures that the current web page data and the previously extracted web page data correspond to the same position in the target website.
When judging whether the web page structure of the target website has changed, if the current check value of one or more web page fragments is consistent with the previous check value, the web page structure is considered unchanged; if the current check values at all n extracted positions are inconsistent with the previous check values, the web page structure is judged to have changed.
In a specific application scenario, for example, when the web page structure change detection method of the present invention is used to detect the web page structure of a target website, the target website is first divided into two layers. The first layer consists of modules, and each module in the first layer is configured with the XPath (i.e., XML Path Language) expression that needs to be detected for it. The second layer consists of web page samples, for example the actual web page URL addresses under 20 first-layer modules. When the detection task runs, the system executes one detection pass at a fixed time each day: it fetches the above 20 web pages according to the configured URL addresses, extracts the web page fragment content of those 20 web pages according to the XPath expressions, and computes MD5 over the extracted fragment content. A differing MD5 value may be caused by a change in the data, and may of course also be caused by a change in the web page structure. Therefore, when the MD5 value of one or more web page fragments is consistent with the MD5 value at the same position from the previous pass, the web page structure is considered unchanged; if the MD5 values of all web page fragments are inconsistent, the web page structure of the target website is considered to have changed and, when necessary, a prompt message is sent to the technician's monitoring client so that the technician can decide whether manual intervention is needed.
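The daily pass in this scenario can be sketched end to end as follows. This is a hedged illustration, not the patented implementation: `run_detection` is a hypothetical name, the fetch-and-extract step is injected as a callable so that no network access or parser is assumed, the `example.com` URLs and fragment strings are made up, and an empty fragment is used to simulate an XPath expression that no longer matches after a redesign.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def run_detection(
    urls: List[str],
    fetch_fragment: Callable[[str], str],
    previous: Dict[str, str],
) -> Tuple[bool, Dict[str, str]]:
    """Run one detection pass.

    Returns (structure_changed, current_check_values). The structure is
    judged changed only if every comparable position differs from the
    previous pass; with no previous values nothing is reported.
    """
    current = {
        url: hashlib.md5(fetch_fragment(url).encode("utf-8")).hexdigest()
        for url in urls
    }
    comparable = [url for url in urls if url in previous]
    changed = bool(comparable) and all(current[u] != previous[u] for u in comparable)
    return changed, current


# Simulated site with 20 modules; URLs and fragments are hypothetical.
pages = {f"https://example.com/m{i}": f"<ul><li>item {i}</li></ul>" for i in range(20)}
urls = list(pages)

first_pass, baseline = run_detection(urls, pages.__getitem__, {})

pages[urls[0]] = "<ul><li>fresh item</li></ul>"  # content update in one module
content_only, _ = run_detection(urls, pages.__getitem__, baseline)

for url in urls:                                 # simulated redesign
    pages[url] = ""
redesign, _ = run_detection(urls, pages.__getitem__, baseline)
```

In this sketch the caller would persist the returned check values after each pass and send the prompt message to the monitoring client when the first element of the result is true.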
With the web page structure change detection method described in the embodiments of the present invention, a change in the web page structure of the target website can be discovered and handled as early as possible. Web page structure changes are checked quickly, with a wide range of application and high accuracy.
The present invention also provides a kind of structure of web page modification detection devices.Referring to shown in Fig. 2, provided for one embodiment of the invention Structure of web page modification detection device schematic diagram of internal structure.
In the present embodiment, structure of web page modification detection device 1 can be PC (PersonalComputer, personal electricity Brain), it is also possible to the terminal devices such as smart phone, tablet computer, portable computer.The structure of web page modification detection device 1 to It less include memory 11, processor 12, communication bus 13 and network interface 14.
Wherein, memory 11 include at least a type of readable storage medium storing program for executing, the readable storage medium storing program for executing include flash memory, Hard disk, multimedia card, card-type memory (for example, SD or DX memory etc.), magnetic storage, disk, CD etc..Memory 11 It can be the internal storage unit of structure of web page modification detection device 1, such as structure of web page change inspection in some embodiments Survey the hard disk of device 1.Memory 11 is also possible to the external storage of structure of web page modification detection device 1 in further embodiments The plug-in type hard disk being equipped in equipment, such as structure of web page modification detection device 1, intelligent memory card (Smart Media Card, SMC), secure digital (Secure Digital, SD) blocks, flash card (Flash Card) etc..Further, memory 11 may be used also With the internal storage unit both including structure of web page modification detection device 1 or including External memory equipment.Memory 11 not only may be used It is installed on the application software and Various types of data of structure of web page modification detection device 1, such as structure of web page change inspection for storage The code etc. of ranging sequence 01 can be also used for temporarily storing the data that has exported or will export.
In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run program code stored in the memory 11 or to process data, for example to execute the web page structure change detection program 01.
The communication bus 13 is used to implement connection and communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is typically used to establish a communication connection between the device 1 and other electronic equipment.
Optionally, the device 1 may further include a user interface. The user interface may include a display and an input unit such as a keyboard; the optional user interface may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display information processed in the web page structure change detection device 1 and to display a visual user interface.
Fig. 2 shows only the web page structure change detection device 1 with components 11-14 and the web page structure change detection program 01. Those skilled in the art will understand that the structure shown in Fig. 2 does not limit the web page structure change detection device 1, which may include fewer or more components than illustrated, combine certain components, or use a different component layout.
In the embodiment of the device 1 shown in Fig. 2, the web page structure change detection program 01 is stored in the memory 11; when the processor 12 executes the web page structure change detection program 01 stored in the memory 11, the following steps are implemented:
The web page structure of a target website is layered according to a layered configuration approach, and each layer of web page structure obtained by the layering is configured accordingly.
When actively detecting whether a web page structure has changed, the web page structure of the target website to be detected is layered; for example, the web page structure of one target website is divided into two layers. Each layer obtained by the layering is then configured accordingly. For example, each layer is configured in a targeted manner according to its specific web page structure content; for example, a path language expression to be detected is configured for a certain layer of the layered web page structure.
The specific number of layers and the basis for layering may be determined according to the specific content of the target website, its specific web page structure, and the detection requirements; the embodiments of the present invention do not limit the number of layers or the layering basis for the web page structure of the target website to be detected. When configuring the layered web page structure, the configuration may follow the specific number of layers and the content corresponding to each layer; the embodiments of the present invention do not specifically limit the configuration mode of the layered web page structure.
According to a preset period, web page data is extracted after the layered configuration, and data processing is performed on the extracted web page data.
When extracting the web page data after layered configuration, the specific length of the preset period may be determined by the degree of dependence on the web page structure of the target website: the higher the dependence on the web page structure, the more frequent the detection and the shorter the corresponding preset period. For example, the web page data of the target website after layered configuration is extracted by a scheduled task that runs once a day.
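The relationship between dependence and detection period can be sketched as follows. This is a minimal illustration only: the three dependence levels, the period lengths, and the names `PERIOD_BY_DEPENDENCE` and `is_due` are assumptions made for the example and are not specified by the embodiment, which only gives once-a-day detection as an example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mapping from dependence level to preset period (assumed values).
PERIOD_BY_DEPENDENCE = {
    "high": timedelta(hours=6),    # heavy dependence: detect more often
    "medium": timedelta(days=1),   # the once-a-day example from the text
    "low": timedelta(days=7),
}


def is_due(last_run: Optional[datetime], now: datetime, dependence: str) -> bool:
    """Return True when a new extraction should run for a target website."""
    if last_run is None:
        # No earlier extraction exists, so run immediately.
        return True
    return now - last_run >= PERIOD_BY_DEPENDENCE[dependence]
```

A scheduler could call `is_due` for each configured target website and trigger the extraction only for those that return True.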
To make it easier to judge accurately whether the web page structure has changed, in one embodiment the data processing converts the extracted web page data into a check value that is easy to compare.
Using a sampled-data comparison method, the current web page data, extracted and processed this time, is compared with the previous web page data extracted at the same position in the immediately preceding extraction.
Whether the web page structure has changed is judged according to the result of comparing the current web page data with the previous web page data.
When judging whether the web page structure of the target website has changed, the sampled-data comparison method is used. For web page data at the same position, the processed current web page data is compared with the previous web page data extracted in the immediately preceding extraction. Because in step S20 a corresponding check value is obtained by processing the extracted web page data, the check value of the current web page data extracted at a given position can be compared with the check value of the previous web page data extracted at that position, and whether the web page structure has changed is judged from the comparison result.
Because the objects compared are web page data of the layered target website, that is, extracted web page fragments of the target website, a difference between the check value of the current web page data and the check value of the previous web page data may mean either that the web page structure of the target website has changed or that the web page content of the target website has changed. Web page data is therefore extracted at multiple different positions: if the current web page data and the previous web page data are inconsistent at every position, the web page structure is judged to have changed; if they are inconsistent at only one or some of the positions, it is judged that only the web page content data of the target website has changed.
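The judgment rule described above can be sketched as a small function. This is an illustrative sketch, not the claimed implementation; the function name `classify_change` and the three result labels are assumptions introduced for the example.

```python
from typing import Sequence


def classify_change(matches: Sequence[bool]) -> str:
    """Classify a detection run from per-position comparison results.

    matches[i] is True when the current and previous check values
    at sampled position i are identical.
    """
    if not matches:
        raise ValueError("at least one sampled position is required")
    if all(matches):
        return "unchanged"
    if not any(matches):
        # Every sampled position differs: treated as a structure change.
        return "structure_changed"
    # Only some positions differ: treated as a content change.
    return "content_changed"
```

Note that with a single sampled position a content edit and a structure change cannot be told apart, which is why the rule relies on sampling several positions.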
Further, in one embodiment, when the web page structure is judged to have changed, prompt information is sent to a technician's monitoring client to remind the technician to determine whether manual intervention is needed.
In the web page structure change detection method proposed in this embodiment, the web page structure of the target website is layered according to a layered configuration approach, and each layer obtained is configured accordingly; web page data is extracted according to a preset period after the layered configuration, and data processing is performed on the extracted web page data; using a sampled-data comparison method, the current web page data, extracted and processed this time, is compared with the previous web page data extracted at the same position in the immediately preceding extraction; and whether the web page structure has changed is judged according to the comparison result. This achieves the purpose of actively detecting, by sampled-data comparison, whether the web page structure has changed, so that a change can be discovered and handled promptly. Web page structure changes are checked quickly, with a wide range of use and high accuracy.
Further, in an embodiment of the present invention, the web page structure change detection program 01 may also be executed by the processor 12 so that layering the web page structure of the target website according to the layered configuration approach, and configuring each layer obtained accordingly, includes:
For the web page structure of the target website to be detected, dividing the web page structure into two layers to obtain the modules corresponding to the first-layer web page structure and the web page samples, in the second-layer web page structure, corresponding to the modules;
Configuring, for each module of the first-layer web page structure, the XML Path Language expression to be detected, and configuring, for the second-layer web page structure, the actual web page URL addresses based on the web page samples corresponding to the modules.
For example, when the web page structure of a target website is detected, the configuration is divided directly into two layers. The first layer consists of modules, and for each first-layer module the XML Path Language (XPath) expression to be detected is configured; for example, the first layer of the target website is divided into 20 modules. The second layer consists of web page samples, for example the actual web page URL (Uniform Resource Locator) addresses of the target website under these 20 modules.
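One possible in-memory representation of this two-layer configuration is sketched below. The class names `ModuleConfig` and `SiteConfig`, the field names, and the `positions` helper are hypothetical and chosen only for illustration; the embodiment does not prescribe a data format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class ModuleConfig:
    """First layer: one module and the XPath expression to be detected."""
    name: str
    xpath: str
    # Second layer: actual page URLs used as web page samples for this module.
    sample_urls: List[str] = field(default_factory=list)


@dataclass
class SiteConfig:
    """Layered configuration for one target website."""
    site: str
    modules: List[ModuleConfig] = field(default_factory=list)

    def positions(self) -> Iterator[Tuple[str, str, str]]:
        """Yield every (module name, sample URL, XPath) position to sample."""
        for module in self.modules:
            for url in module.sample_urls:
                yield (module.name, url, module.xpath)
```

Each tuple yielded by `positions` identifies one sampling position, so the same configuration can be replayed on every detection run to extract data from the same places.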
The embodiment of the present invention layers the web page structure of the target website and configures the layered web page structure; this approach is simple to use and accurate. The layered web page structure is not tied to the structural features of any particular existing target website, so the approach is widely applicable. In practical applications, therefore, detection can be implemented for all source websites and source web pages by writing a unified web page structure locator, giving a wide range of application.
Further, in an embodiment of the present invention, the web page structure change detection program 01 may also be executed by the processor 12 so that extracting web page data according to the preset period after the layered configuration, and performing data processing on the extracted web page data, includes:
According to the preset period and the configured web page URL addresses, extracting the content of the web page fragments corresponding to the web page samples under each module after layering;
According to a preset algorithm, performing data processing on the content of the obtained web page fragments to obtain the check value corresponding to the web page fragments after data processing.
The preset algorithm in the embodiments of the present invention includes, but is not limited to, MD5. For example, the MD5 algorithm is used to process the extracted web page data, yielding the check value corresponding to the extracted web page fragments.
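As a minimal sketch of this step, the check value of a fragment can be computed with Python's standard `hashlib` module. The function name `check_value` and the choice of UTF-8 encoding are assumptions for the example.

```python
import hashlib


def check_value(fragment: str) -> str:
    """Return the MD5 hex digest of a web page fragment (UTF-8 encoded)."""
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()
```

MD5 serves here only as a compact fingerprint for detecting differences between two extractions; it is not collision-resistant and should not be relied on for security purposes.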
In the embodiments of the present invention, the web page structure change detection program 01 may also be executed by the processor 12 so that comparing, using the sampled-data comparison method, the current web page data, extracted and processed this time, with the previous web page data extracted at the same position in the immediately preceding extraction, includes:
According to a preset algorithm, calculating the check value M11 of the current web page data, and then, according to the same preset algorithm, calculating the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
Calculating, for n different positions, the check value Mn1 of the current web page data at each position and the check value Mn2 of the previous web page data at the same position, to obtain n groups of check values for the current web page data and the previous web page data;
Comparing the n groups of check values respectively and identifying whether Mn1 and Mn2 in each group are identical. This can also be understood as identifying the n groups of check values one by one: for example, identifying whether M11 and M12 are identical, and so on until identifying whether Mn1 and Mn2 are identical, and recording the identification result for each group.
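The group-by-group comparison can be sketched as below. The function name `compare_groups` and the use of dictionaries keyed by a position identifier are assumptions made for the example; the embodiment only requires that each group pair the check values taken at the same position.

```python
from typing import Dict


def compare_groups(current: Dict[str, str], previous: Dict[str, str]) -> Dict[str, bool]:
    """Record, for each position sampled in both runs, whether Mn1 equals Mn2.

    current maps a position identifier to this run's check value (Mn1);
    previous maps it to the preceding run's check value (Mn2).
    """
    return {
        position: current[position] == previous[position]
        for position in current
        if position in previous  # skip positions with no earlier value
    }
```

For instance, `compare_groups({"p1": "aa", "p2": "bb"}, {"p1": "aa", "p2": "xx"})` records a match for `p1` and a mismatch for `p2`.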
In the embodiments of the present invention, when the web page structure of the target website is configured in layers, an XML Path Language expression to be detected is configured for each module in the first-layer web page structure. The configured XML Path Language expression therefore ensures that the current web page data and the previous web page data are extracted from the same position in the target website.
When judging whether the web page structure of the target website has changed: if the current check value of one or more web page fragments is consistent with the previous check value, the web page structure is considered not to have changed; if the current check values at all n extracted positions are inconsistent with the previous check values, the web page structure is judged to have changed.
In a specific application scenario, for example, when the web page structure change detection program 01 described in the present invention is used to detect the web page structure of a target website, the target website is first divided into two layers. The first layer consists of modules, and for each module in the first layer the XPath (that is, XML Path Language) expression that the module needs to detect is configured. The second layer consists of web page samples, for example the actual web page URL addresses under 20 first-layer modules. When the detection task runs, the system executes it once a day on a schedule: it fetches the above 20 web pages according to the configured URL addresses, extracts the web page fragment content of the 20 fetched pages according to the XPath, and performs an MD5 calculation on the fragment content. A differing MD5 value may be caused by a change in data as well as by a change in web page structure. Therefore, when the MD5 value of one or more web page fragments is consistent with the MD5 value previously obtained at the same position, the web page structure is considered not to have changed; if the MD5 values of all web page fragments are inconsistent, the web page structure of the target website is considered to have changed, and, where necessary, prompt information is sent to the technician's monitoring client so that the technician can determine whether manual intervention is needed.
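The fetch, extract, and hash steps of this scenario can be sketched with the Python standard library. The sketch rests on several assumptions: pages are well-formed XML, because `xml.etree.ElementTree` supports only a limited XPath subset and cannot parse arbitrary HTML (real pages would need an HTML-aware parser); the `fetch` callable is a hypothetical stand-in for the HTTP download; and the name `snapshot` is invented for the example.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Tuple

Position = Tuple[str, str]  # (page URL, XPath expression)


def snapshot(positions: Iterable[Position], fetch: Callable[[str], str]) -> Dict[Position, str]:
    """Fetch each page, select its fragment by XPath, and return MD5 check values."""
    result: Dict[Position, str] = {}
    for url, xpath in positions:
        root = ET.fromstring(fetch(url))
        node = root.find(xpath)
        # A fragment that no longer exists is hashed as empty bytes,
        # so a missing node still produces a (different) check value.
        fragment = b"" if node is None else ET.tostring(node)
        result[(url, xpath)] = hashlib.md5(fragment).hexdigest()
    return result
```

Two snapshots taken on consecutive runs with the same `positions` can then be compared key by key. Passing `fetch` as a parameter keeps the sketch testable without network access.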
With the web page structure change detection method described in the embodiments of the present invention, a change in the web page structure of the target website can be discovered and handled promptly. Web page structure changes are checked quickly, with a wide range of use and high accuracy.
Optionally, in other embodiments, the web page structure change detection program 01 may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. A module in the sense of the present invention is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution of the web page structure change detection program 01 in the web page structure change detection device 1.
For example, Fig. 3 is a schematic diagram of the program modules of the web page structure change detection program in an embodiment of the web page structure change detection device of the present invention. In this embodiment, the web page structure change detection program 01 may be divided into a layered configuration module 10, a data processing module 20, and a sampling comparison module 30. Illustratively:
The layered configuration module 10 is used to: layer the web page structure of the target website according to a layered configuration approach, and configure each layer of web page structure obtained by the layering accordingly;
The data processing module 20 is used to: extract web page data according to a preset period after the layered configuration, and perform data processing on the extracted web page data;
The sampling comparison module 30 is used to:
Compare, using a sampled-data comparison method, the current web page data, extracted and processed this time, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judge, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
The functions or operation steps implemented when program modules such as the layered configuration module 10, the data processing module 20, and the sampling comparison module 30 are executed are substantially the same as in the above embodiments and are not repeated here.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium on which a web page structure change detection program is stored. The web page structure change detection program can be executed by one or more processors to implement the following operations:
Layering the web page structure of a target website according to a layered configuration approach, and configuring each layer of web page structure obtained by the layering accordingly;
Extracting web page data according to a preset period after the layered configuration, and performing data processing on the extracted web page data;
Comparing, using a sampled-data comparison method, the current web page data, extracted and processed this time, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judging, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the above web page structure change detection device and method, and is not repeated here.
It should be noted that the serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments. The terms "include", "comprise", or any other variant thereof herein are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A web page structure change detection method, characterized in that the method comprises:
layering the web page structure of a target website according to a layered configuration approach, and configuring each layer of web page structure obtained by the layering accordingly;
extracting web page data according to a preset period after the layered configuration, and performing data processing on the extracted web page data;
comparing, using a sampled-data comparison method, the current web page data, extracted and processed this time, with the previous web page data extracted at the same position in the immediately preceding extraction;
judging, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
2. The web page structure change detection method according to claim 1, characterized in that layering the web page structure of the target website according to the layered configuration approach, and configuring each layer of web page structure obtained by the layering accordingly, comprises:
for the web page structure of the target website to be detected, dividing the web page structure into two layers to obtain the modules corresponding to the first-layer web page structure and the web page samples, in the second-layer web page structure, corresponding to the modules;
configuring, for each module of the first-layer web page structure, the XML Path Language expression to be detected, and configuring, for the second-layer web page structure, the actual web page URL addresses corresponding to the module based on the web page samples corresponding to the module.
3. The web page structure change detection method according to claim 1, characterized in that extracting web page data according to the preset period after the layered configuration, and performing data processing on the extracted web page data, comprises:
extracting, according to the preset period and the configured web page URL addresses, the web page fragment content corresponding to the web page samples contained in each module after layering;
performing, according to a preset algorithm, data processing on the obtained web page fragment content to obtain the check value corresponding to the web page fragment content after data processing.
4. The web page structure change detection method according to claim 1, characterized in that comparing, using the sampled-data comparison method, the current web page data, extracted and processed this time, with the previous web page data extracted at the same position in the immediately preceding extraction, comprises:
calculating, according to a preset algorithm, the check value M11 of the current web page data, and then calculating, according to the same preset algorithm, the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
calculating, for n different positions, the check value Mn1 of the current web page data at each position and the check value Mn2 of the previous web page data at the same position, to obtain n groups of check values for the current web page data and the previous web page data;
comparing the n groups of check values respectively, identifying whether Mn1 and Mn2 in each group are identical, and recording the identification result for each group.
5. The web page structure change detection method according to any one of claims 1 to 4, characterized in that judging, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed, comprises:
if the check value of the current web page data of one or more web page fragments is consistent with the check value of the previous web page data, judging that the web page structure has not changed;
if the check values of the current web page data at all n extracted positions are inconsistent with the check values of the previous web page data, judging that the web page structure has changed.
6. A web page structure change detection device, characterized in that the device comprises a memory and a processor, the memory storing a web page structure change detection program that can run on the processor, and the web page structure change detection program implementing the following steps when executed by the processor:
layering the web page structure of a target website according to a layered configuration approach, and configuring each layer of web page structure obtained by the layering accordingly;
extracting web page data according to a preset period after the layered configuration, and performing data processing on the extracted web page data;
comparing, using a sampled-data comparison method, the current web page data, extracted and processed this time, with the previous web page data extracted at the same position in the immediately preceding extraction;
judging, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
7. The web page structure change detection device according to claim 6, characterized in that the web page structure change detection program can also be executed by the processor so that layering the web page structure of the target website according to the layered configuration approach, and configuring each layer of web page structure obtained by the layering accordingly, comprises:
for the web page structure of the target website to be detected, dividing the web page structure into two layers to obtain the modules corresponding to the first-layer web page structure and the web page samples, in the second-layer web page structure, corresponding to the modules;
configuring, for each module of the first-layer web page structure, the XML Path Language expression to be detected, and configuring, for the second-layer web page structure, the actual web page URL addresses corresponding to the module based on the web page samples corresponding to the module.
8. The web page structure change detection device according to claim 6, characterized in that the web page structure change detection program can also be executed by the processor so that extracting web page data according to the preset period after the layered configuration, and performing data processing on the extracted web page data, comprises:
extracting, according to the preset period and the configured web page URL addresses, the web page fragment content corresponding to the web page samples contained in each module after layering;
performing, according to a preset algorithm, data processing on the obtained web page fragment content to obtain the check value corresponding to the web page fragment content after data processing.
9. The web page structure change detection device according to claim 6, characterized in that the web page structure change detection program can also be executed by the processor so that comparing, using the sampled-data comparison method, the current web page data, extracted and processed this time, with the previous web page data extracted at the same position in the immediately preceding extraction, comprises:
calculating, according to a preset algorithm, the check value M11 of the current web page data, and then calculating, according to the same preset algorithm, the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
calculating, for n different positions, the check value Mn1 of the current web page data at each position and the check value Mn2 of the previous web page data at the same position, to obtain n groups of check values for the current web page data and the previous web page data;
comparing the n groups of check values respectively, identifying whether Mn1 and Mn2 in each group are identical, and recording the identification result for each group.
10. A computer-readable storage medium, characterized in that a web page structure change detection program is stored on the computer-readable storage medium, and the web page structure change detection program can be executed by one or more processors to implement the steps of the web page structure change detection method according to any one of claims 1 to 5.
CN201910185344.7A 2019-03-12 2019-03-12 Structure of web page alteration detection method, apparatus and computer readable storage medium Pending CN110046295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910185344.7A CN110046295A (en) 2019-03-12 2019-03-12 Structure of web page alteration detection method, apparatus and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910185344.7A CN110046295A (en) 2019-03-12 2019-03-12 Structure of web page alteration detection method, apparatus and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110046295A true CN110046295A (en) 2019-07-23

Family

ID=67274652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910185344.7A Pending CN110046295A (en) 2019-03-12 2019-03-12 Structure of web page alteration detection method, apparatus and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110046295A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682098A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting web page content changes
CN103761330A (en) * 2014-02-10 2014-04-30 赛特斯信息科技股份有限公司 System and method for achieving automatic Internet information extraction based on template configuration
CN106960058A (en) * 2017-04-05 2017-07-18 金电联行(北京)信息技术有限公司 A kind of structure of web page alteration detection method and system
CN108304498A (en) * 2018-01-12 2018-07-20 深圳壹账通智能科技有限公司 Webpage data acquiring method, device, computer equipment and storage medium
CN109450844A (en) * 2018-09-18 2019-03-08 华为技术有限公司 Method and device for triggering vulnerability detection

Similar Documents

Publication Publication Date Title
US9846634B2 (en) Visual graphical user interface verification
CN107783898B (en) Test method and test equipment for mobile application
US11137909B2 (en) Secure data entry via a virtual keyboard
CN107870976A (en) Resume identification device, method and computer-readable recording medium
CN111459495B (en) Unit test code file generation method, electronic device and storage medium
CN104252531B (en) File type identification method and device
CN110704304B (en) Application program testing method and device, storage medium and server
US20150370688A1 (en) Automatic updating of graphical user interface element locators based on dimension comparison
CN103095681A (en) Vulnerability detection method and device
CN103617213B (en) Method and system for identifying newspage attributive characters
CN106161133B (en) Method and device for testing webpage loading time
US11080373B1 (en) Cyclically dependent checks for software tamper-proofing
CN109783351A (en) Interface detection method, apparatus and computer readable storage medium
US20170371888A1 (en) Method for advertisement interception in dual-kernel browser and browser apparatus
CN107480068A (en) Code integrity detection method, device, electronic terminal and readable storage medium
CN110750750A (en) Webpage generation method and device, computer equipment and storage medium
CN104468459B (en) Vulnerability detection method and device
CN113506045A (en) Risk user identification method, device, equipment and medium based on mobile equipment
CN110929110B (en) Electronic document detection method, device, equipment and storage medium
CN111783159A (en) Webpage tampering verification method and device, computer equipment and storage medium
CN113705691B (en) Image annotation verification method, device, equipment and medium based on artificial intelligence
JP5441043B2 (en) Program, information processing apparatus, and information processing method
US20200034217A1 (en) Method and device for acquiring application information
CN113886204A (en) User behavior data collection method and device, electronic equipment and readable storage medium
CN110874475A (en) Vulnerability mining method, vulnerability mining platform and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination