CN110046295A - Web page structure change detection method, apparatus, and computer-readable storage medium - Google Patents
- Publication number: CN110046295A
- Application number: CN201910185344.7A
- Authority: CN (China)
- Prior art keywords: data, web page, web, check value, extracted
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
Abstract
The present invention relates to the field of UI design and discloses a web page structure change detection method. The method comprises: layering the web page structure of a target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly; extracting the web page data after layered configuration at a preset period, and performing data processing on the extracted web page data; using a sampled-data comparison method, comparing the processed web page data of the current extraction with the previous web page data, that is, the data extracted at the same position in the immediately preceding extraction; and judging, according to the result of comparing the current and previous web page data, whether the web page structure has changed. The present invention also proposes a web page structure change detection apparatus and a computer-readable storage medium. The present invention thereby provides a technique that actively detects whether a web page structure has changed by comparing sampled data.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a web page structure change detection method, apparatus, and computer-readable storage medium.
Background
With the rapid development of Internet technology, obtaining information through web pages has become commonplace. The layout of web page content directly affects the user experience and relevance of a page and, to some extent, the overall structure of the website and the number of pages it contains. Web page structure is, in essence, the organization and layout of the three basic components of a page: the navigation bar, the columns, and the body content.
Web page structure is normally adjusted to suit the content of the page, and different content calls for different structures; when the content of a target website changes, its web page structure is usually adjusted as well. If the web page structure of the target website changes, a data crawling system (i.e., a crawler) will, at run time, either fail to capture the correct data or fail outright, and only then is the change in web page structure noticed passively. Taking action only after passively noticing a change usually causes a long delay. How to actively detect whether a web page structure has changed, so that countermeasures can be taken in advance, has therefore become a problem in urgent need of a solution.
Summary of the invention
The present invention provides a web page structure change detection method, apparatus, and computer-readable storage medium, with the aim of actively detecting whether a web page structure has changed by comparing sampled data.
To achieve the above object, the present invention provides a web page structure change detection method, the method comprising:
layering the web page structure of a target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly;
extracting the web page data after layered configuration at a preset period, and performing data processing on the extracted web page data;
using a sampled-data comparison method, comparing the processed web page data of the current extraction with the previous web page data, that is, the data extracted at the same position in the immediately preceding extraction;
judging, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
Optionally, layering the web page structure of the target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly, comprises:
dividing the web page structure of the target website to be detected into two layers, obtaining the modules that form the first layer and, under each module, the web page samples that form the second layer;
configuring, for each module of the first layer, the XML Path Language expression to be detected, and configuring, for the second layer, the actual web page URL of each module based on that module's web page sample.
Optionally, extracting the web page data after layered configuration at a preset period, and performing data processing on the extracted web page data, comprises:
extracting, at the preset period and according to the configured web page URLs, the content of the web page fragment corresponding to the web page sample under each module after layering;
performing data processing on the obtained web page fragment content according to a preset algorithm, to obtain the check value corresponding to the web page fragment content.
Optionally, using the sampled-data comparison method to compare the processed web page data of the current extraction with the previous web page data extracted at the same position in the immediately preceding extraction comprises:
calculating, according to the preset algorithm, the check value M11 of the current web page data, and calculating, according to the same preset algorithm, the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
calculating in the same way, for each of n different positions, the check value Mn1 of the current web page data and the check value Mn2 of the previous web page data at the same position, obtaining n groups of check values for the current and previous web page data;
comparing each of the n groups of check values, identifying whether Mn1 and Mn2 in each group are identical, and recording the result for each group.
Optionally, judging whether the web page structure has changed according to the result of comparing the current web page data with the previous web page data comprises:
if, for one or more web page fragments, the check value of the current web page data is consistent with the check value of the previous web page data, judging that the web page structure has not changed;
if the check values of the current web page data at all n extracted positions are inconsistent with the check values of the previous web page data, judging that the web page structure has changed.
In addition, to achieve the above object, the present invention also provides a web page structure change detection apparatus. The apparatus comprises a memory and a processor; the memory stores a web page structure change detection program that can run on the processor, and the program, when executed by the processor, implements the following steps:
layering the web page structure of a target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly;
extracting the web page data after layered configuration at a preset period, and performing data processing on the extracted web page data;
using a sampled-data comparison method, comparing the processed web page data of the current extraction with the previous web page data, that is, the data extracted at the same position in the immediately preceding extraction;
judging, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
Optionally, the web page structure change detection program can also be executed by the processor so that layering the web page structure of the target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly, comprises:
dividing the web page structure of the target website to be detected into two layers, obtaining the modules that form the first layer and, under each module, the web page samples that form the second layer;
configuring, for each module of the first layer, the XML Path Language expression to be detected, and configuring, for the second layer, the actual web page URL of each module based on that module's web page sample.
Optionally, the web page structure change detection program can also be executed by the processor so that extracting the web page data after layered configuration at a preset period, and performing data processing on the extracted web page data, comprises:
extracting, at the preset period and according to the configured web page URLs, the content of the web page fragment corresponding to the web page sample under each module after layering;
performing data processing on the obtained web page fragment content according to a preset algorithm, to obtain the check value corresponding to the web page fragment content.
Optionally, the web page structure change detection program can also be executed by the processor so that using the sampled-data comparison method to compare the processed web page data of the current extraction with the previous web page data extracted at the same position in the immediately preceding extraction comprises:
calculating, according to the preset algorithm, the check value M11 of the current web page data, and calculating, according to the same preset algorithm, the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
calculating in the same way, for each of n different positions, the check value Mn1 of the current web page data and the check value Mn2 of the previous web page data at the same position, obtaining n groups of check values for the current and previous web page data;
comparing each of the n groups of check values, identifying whether Mn1 and Mn2 in each group are identical, and recording the result for each group.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium. The computer-readable storage medium stores a web page structure change detection program, and the program can be executed by one or more processors to implement the steps of the web page structure change detection method described above.
The web page structure change detection method, apparatus, and computer-readable storage medium proposed by the present invention layer the web page structure of a target website by means of layered configuration and configure each layer of web page structure obtained by the layering accordingly; extract the web page data after layered configuration at a preset period and perform data processing on the extracted web page data; use a sampled-data comparison method to compare the processed web page data of the current extraction with the previous web page data extracted at the same position in the immediately preceding extraction; and judge, according to the result of comparing the current and previous web page data, whether the web page structure has changed. This achieves the aim of actively detecting whether a web page structure has changed by comparing sampled data, so that a change can be discovered and handled promptly. The invention enables quick checking of web page structure changes, and is widely applicable and highly accurate.
Brief description of the drawings
Fig. 1 is a flow diagram of the web page structure change detection method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the internal structure of the web page structure change detection apparatus provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the modules of the web page structure change detection program in the web page structure change detection apparatus provided by an embodiment of the present invention.
The realization of the object, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
The present invention provides a web page structure change detection method. Fig. 1 is a flow diagram of the web page structure change detection method provided by an embodiment of the present invention. The method can be executed by an apparatus, and the apparatus can be implemented in software and/or hardware.
In this embodiment, the web page structure change detection method can be implemented as steps S10 to S40 shown in Fig. 1:
Step S10: layer the web page structure of the target website by means of layered configuration, and configure each layer of web page structure obtained by the layering accordingly.
To actively detect whether the web page structure has changed, the web page structure of the target website to be detected is first layered; for example, the web page structure of a target website is divided into two layers. Each layer obtained is then configured accordingly, in a targeted way, based on the specific content of that layer; for example, a path language expression to be detected is configured for one of the layers.
The number of layers and the basis for the layering can be determined from the specific content of the target website, its specific web page structure, and the detection requirements; the embodiments of the present invention do not limit either. Likewise, the layers can be configured according to the number of layers and the content of the web page structure obtained by the layering; the embodiments of the present invention do not specifically limit how the layered web page structure is configured.
Step S20: extract the web page data after layered configuration at a preset period, and perform data processing on the extracted web page data.
When extracting the web page data after layered configuration, the length of the preset period can be determined by the degree of dependence on the target website's web page structure: the higher the dependence on the web page structure, the more frequent the detection and the shorter the preset period. For example, a timed task can run once a day to extract the web page data of the target website after layered configuration.
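As a rough illustration of the period selection described above, the following Python sketch maps a degree of dependence on the web page structure to a detection interval and works out when the next extraction is due. The dependence levels, every interval other than the daily one, and the function names are assumptions made for this example; the embodiment only states that higher dependence calls for a shorter period and gives a daily timed task as an example.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from dependence level to detection period.
# Only the daily period comes from the text; the others are illustrative.
PERIODS = {
    "high": timedelta(hours=6),
    "medium": timedelta(days=1),
    "low": timedelta(days=7),
}


def next_extraction(last_run: datetime, dependence: str) -> datetime:
    """Return the time at which the next extraction is due."""
    return last_run + PERIODS[dependence]


def is_due(last_run: datetime, dependence: str, now: datetime) -> bool:
    """Report whether a new extraction should start at time `now`."""
    return now >= next_extraction(last_run, dependence)
```

In practice the same effect is often obtained with an external scheduler; the sketch only makes the relationship between dependence and period explicit.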
To make it easier to judge accurately whether the web page structure has changed, in one embodiment the data processing converts the extracted web page data into a check value that is easy to compare.
Step S30: using a sampled-data comparison method, compare the processed web page data of the current extraction with the previous web page data, that is, the data extracted at the same position in the immediately preceding extraction.
Step S40: judge, according to the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
Whether the web page structure of the target website has changed is determined with the sampled-data comparison method. For the web page data at a given position, the processed data of the current extraction is compared with the data from the immediately preceding extraction. Because step S20 produces a check value for the extracted web page data, the comparison can be made between the check value of the current web page data and the check value of the previous web page data extracted at the same position, and the result of that comparison is used to judge whether the web page structure has changed.
Because the objects compared are web page fragments of the target website extracted after layering, a difference between the check value of the current web page data and that of the previous web page data may mean that the web page structure of the target website has changed, but it may also mean that only the web page content has changed. Web page data is therefore extracted at multiple different positions. If the current and previous web page data are inconsistent at every position, the web page structure is judged to have changed; if they are inconsistent at only one or some of the positions, only the web page content of the target website is judged to have changed.
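The judgment rule above can be sketched in Python as follows. The function name and the three verdict strings are assumptions made for illustration; the rule itself (all positions differ means a structure change, only some differ means a content change) follows the text.

```python
from typing import Sequence


def judge(matches: Sequence[bool]) -> str:
    """Classify one detection run from per-position comparison results.

    `matches[i]` is True when the current and previous check values at
    position i are identical.
    """
    if not matches:
        raise ValueError("at least one sampled position is required")
    if all(matches):
        return "unchanged"
    if not any(matches):
        # Every sampled position differs: treated as a structure change.
        return "structure_changed"
    # Only some positions differ: treated as a content change.
    return "content_changed"
```

With a single sampled position, a content change cannot be told apart from a structure change under this rule, which is presumably why the method samples several positions.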
Further, in one embodiment, when the web page structure is judged to have changed, prompt information is sent to the monitoring client of the technical staff to remind them to decide whether manual intervention is needed.
The web page structure change detection method proposed in this embodiment layers the web page structure of a target website by means of layered configuration and configures each layer of web page structure obtained by the layering accordingly; extracts the web page data after layered configuration at a preset period and performs data processing on the extracted web page data; uses a sampled-data comparison method to compare the processed web page data of the current extraction with the previous web page data extracted at the same position in the immediately preceding extraction; and judges, according to the result of comparing the current and previous web page data, whether the web page structure has changed. This achieves the aim of actively detecting whether a web page structure has changed by comparing sampled data, so that a change can be discovered and handled promptly. The method enables quick checking of web page structure changes, and is widely applicable and highly accurate.
Further, in an embodiment of the method of the present invention, step S10 of the Fig. 1 embodiment (layering the web page structure of the target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly) can be implemented as follows:
dividing the web page structure of the target website to be detected into two layers, obtaining the modules that form the first layer and, under each module, the web page samples that form the second layer;
configuring, for each module of the first layer, the XML Path Language expression to be detected, and configuring, for the second layer, the actual web page URL based on each module's web page sample.
For example, when detecting the web page structure of a target website, the configuration is divided directly into two layers. The first layer consists of modules, and each first-layer module is configured with the XML Path Language (XPath) expression to be detected; for example, the first layer of the target website is divided into 20 modules. The second layer consists of web page samples, for example the actual web page URL (Uniform Resource Locator) addresses of the target website under those 20 modules.
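One possible way to represent this two-layer configuration is sketched below in Python. The class name, field names, module names, XPath expressions, and example.com URLs are all hypothetical; the text only requires that each first-layer module carry an XPath expression and that the second layer supply an actual sample URL per module.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ModuleConfig:
    """A first-layer module with its XPath and second-layer sample page."""
    name: str        # hypothetical module label
    xpath: str       # XPath expression to be detected for this module
    sample_url: str  # actual URL of the module's web page sample


# Hypothetical two-module configuration; the example in the text uses 20.
SITE_CONFIG = [
    ModuleConfig("news", ".//div[@id='article']", "https://example.com/news/1"),
    ModuleConfig("forum", ".//div[@id='post']", "https://example.com/forum/1"),
]


def validate(config: Sequence[ModuleConfig]) -> None:
    """Reject configurations with empty fields or duplicate module names."""
    names = [m.name for m in config]
    if len(names) != len(set(names)):
        raise ValueError("module names must be unique")
    for m in config:
        if not (m.xpath and m.sample_url):
            raise ValueError(f"module {m.name!r} is missing an xpath or URL")
```

Keeping the configuration as plain data, separate from the detection logic, matches the text's point that one unified detector can serve many source websites.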
By layering the web page structure of the target website and configuring the layered structure, the embodiment of the present invention provides a processing approach that is simple to use and accurate. The layered structure is also not tied to the structural features of any particular target website, so it is broadly applicable: in practice, a single unified web page structure detector can be written to provide detection for all source websites and source pages.
Further, in an embodiment of the method of the present invention, step S20 of the Fig. 1 embodiment (extracting the web page data after layered configuration at a preset period, and performing data processing on the extracted web page data) can be implemented as follows:
extracting, at the preset period and according to the configured web page URLs, the content of the web page fragment corresponding to the web page sample under each module after layering;
performing data processing on the content of the obtained web page fragments according to a preset algorithm, to obtain the check value corresponding to each web page fragment.
The preset algorithm in the embodiment of the present invention includes, but is not limited to, MD5. For example, the MD5 algorithm is applied to the extracted web page data to obtain the check value of each extracted web page fragment.
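A minimal Python sketch of the check value computation, using MD5 from the standard library. The function name and the choice of UTF-8 encoding are assumptions; the text specifies only that a preset algorithm such as MD5 turns the fragment content into a check value.

```python
import hashlib


def check_value(fragment: str) -> str:
    """Return the MD5 hex digest of a web page fragment."""
    # The fragment is encoded as UTF-8 (an assumption) before hashing.
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()
```

MD5 serves here only as a compact fingerprint for equality comparison, not as a security measure, so any stable hash function could be substituted.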
In the embodiment of the present invention, step S30 of the Fig. 1 embodiment (using a sampled-data comparison method, comparing the processed web page data of the current extraction with the previous web page data extracted at the same position in the immediately preceding extraction) can be implemented as follows:
calculating, according to the preset algorithm, the check value M11 of the current web page data, and calculating, according to the same preset algorithm, the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
calculating in the same way, for each of n different positions, the check value Mn1 of the current web page data and the check value Mn2 of the previous web page data at the same position, obtaining n groups of check values for the current and previous web page data;
comparing each of the n groups of check values to identify whether the two values in each group are identical. In other words, the n groups are checked one by one, starting with whether M11 and M12 are identical and continuing until Mn1 and Mn2 have been checked, and the result for each group is recorded.
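The group-by-group comparison can be sketched as follows. Representing the n groups as two dictionaries keyed by position, and treating a position with no previous value as not identical, are assumptions of this example rather than details given in the text.

```python
from typing import Dict


def compare_groups(current: Dict[str, str],
                   previous: Dict[str, str]) -> Dict[str, bool]:
    """Compare current (Mn1) and previous (Mn2) check values per position.

    Keys identify positions; values are check values. A position missing
    from `previous` is recorded as not identical.
    """
    return {pos: previous.get(pos) == value for pos, value in current.items()}
```

The returned mapping is the per-group record the text calls for; its values can then be passed to whatever judgment step follows.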
In the embodiment of the present invention, when the web page structure of the target website is configured in layers, each module in the first layer is configured with the XML Path Language expression to be detected. The configured XML Path Language expression therefore ensures that the current web page data and the previously extracted web page data come from the same position in the target website's pages.
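To show how a configured XPath expression pins extraction to the same position on every run, here is a hedged Python sketch using the standard library's `xml.etree.ElementTree`. That module accepts only well-formed XML and supports a limited subset of XPath, so real-world HTML would typically need a more tolerant third-party parser; the function name and error handling are assumptions.

```python
import xml.etree.ElementTree as ET


def extract_fragment(page: str, xpath: str) -> str:
    """Return the serialized first element matching `xpath`.

    Raises LookupError when nothing matches, which is itself a hint
    that the page no longer has the expected structure.
    """
    root = ET.fromstring(page)
    node = root.find(xpath)
    if node is None:
        raise LookupError(f"no element matches {xpath!r}")
    node.tail = None  # exclude text that follows the element's closing tag
    return ET.tostring(node, encoding="unicode")
```

Because the same expression is applied to both the current and the previous page, the two fragments being compared always come from the same location in the document tree.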
When judging whether the web page structure of the target website has changed: if the current check value of one or more web page fragments is consistent with the previous check value, the web page structure is considered unchanged; if the current check values at all n extracted positions are inconsistent with the previous check values, the web page structure is judged to have changed.
In a specific application scenario, for example, when the web page structure change detection method of the present invention is used to detect the web page structure of a target website, the target website is first divided into two layers. The first layer consists of modules, and each module in the first layer is configured with the XPath (i.e., XML Path Language) expression that it needs to detect. The second layer consists of web page samples, for example the actual web page URLs under 20 first-layer modules. During detection, the system runs a detection subtask on a daily schedule: it fetches the 20 web pages according to the configured URLs, extracts the web page fragment content of those 20 pages according to the XPath expressions, and computes an MD5 value for each extracted fragment. A differing MD5 value may be caused by a change in the data as well as by a change in the web page structure. Therefore, when the MD5 value of one or more web page fragments is consistent with the MD5 value at the same position in the previous run, the web page structure is considered unchanged; if the MD5 values of all web page fragments are inconsistent, the web page structure of the target website is considered to have changed and, where necessary, prompt information is sent to the monitoring client of the technical staff so that they can decide whether manual intervention is needed.
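The scenario above can be tied together in a short end-to-end Python sketch. Everything named here is hypothetical: the two-module `CONFIG`, the example.com URLs, the injected `fetch` callable that stands in for real HTTP requests, and the verdict strings. The standard-library XML parser is used for brevity and assumes well-formed pages. The first run has no baseline, so its verdict is not meaningful and it serves only to record check values.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional, Tuple

# Hypothetical configuration: module name -> (sample URL, XPath).
CONFIG = {
    "news": ("https://example.com/news/1", ".//div[@id='article']"),
    "forum": ("https://example.com/forum/1", ".//div[@id='post']"),
}

CheckValues = Dict[str, Optional[str]]


def fragment_md5(page: str, xpath: str) -> Optional[str]:
    """MD5 of the first element matching `xpath`, or None if none matches."""
    node = ET.fromstring(page).find(xpath)
    if node is None:
        return None
    node.tail = None  # ignore text after the element's closing tag
    return hashlib.md5(ET.tostring(node)).hexdigest()


def run_detection(fetch: Callable[[str], str],
                  previous: CheckValues) -> Tuple[str, CheckValues]:
    """Run one detection task; return (verdict, current check values)."""
    current = {name: fragment_md5(fetch(url), xpath)
               for name, (url, xpath) in CONFIG.items()}
    # A fragment counts as unchanged only if it was found and its MD5
    # equals the value recorded for the same module in the previous run.
    same = [current[name] is not None and current[name] == previous.get(name)
            for name in CONFIG]
    if all(same):
        verdict = "unchanged"
    elif any(same):
        verdict = "content_changed"
    else:
        verdict = "structure_changed"  # point at which staff would be prompted
    return verdict, current
```

Passing `fetch` as a parameter keeps the sketch runnable without network access and lets a real HTTP client be substituted later. Note that if the content of every sampled fragment happens to change at once, this rule reports a structure change, which is a limitation of the heuristic as described.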
With the web page structure change detection method of the embodiment of the present invention, a change in the web page structure of the target website can be discovered and handled promptly, enabling quick checking of web page structure changes; the method is widely applicable and highly accurate.
The present invention also provides a web page structure change detection apparatus. Fig. 2 is a schematic diagram of the internal structure of the web page structure change detection apparatus provided by an embodiment of the present invention.
In this embodiment, the web page structure change detection apparatus 1 can be a PC (Personal Computer) or a terminal device such as a smartphone, tablet computer, or portable computer. The web page structure change detection apparatus 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 11 can be an internal storage unit of the web page structure change detection apparatus 1, such as the hard disk of the apparatus 1. In other embodiments, the memory 11 can be an external storage device of the web page structure change detection apparatus 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the apparatus 1. Further, the memory 11 can include both an internal storage unit of the web page structure change detection apparatus 1 and an external storage device. The memory 11 can be used not only to store the application software installed on the web page structure change detection apparatus 1 and various kinds of data, such as the code of the web page structure change detection program 01, but also to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 can be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the web page structure change detection program 01.
The communication bus 13 provides the connection and communication between these components.
The network interface 14 can optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the apparatus 1 and other electronic devices.
Optionally, which can also include user interface, and user interface may include display
Device (Display), input unit such as keyboard (Keyboard), optional user interface can also include that the wired of standard connects
Mouth, wireless interface.Optionally, in some embodiments, display can be light-emitting diode display, liquid crystal display, touch control type LCD
Display and OLED (Organic Light-Emitting Diode, Organic Light Emitting Diode) touch device etc..Wherein, it shows
Device appropriate can also be known as display screen or display unit, for being shown in the letter handled in structure of web page modification detection device 1
It ceases and for showing visual user interface.
Fig. 2 shows only the web page structure change detection device 1 with components 11-14 and the web page structure change detection program 01. Those skilled in the art will understand that the structure shown in Fig. 2 does not limit the web page structure change detection device 1, which may include fewer or more components than illustrated, combine certain components, or use a different arrangement of components.
In the embodiment of the device 1 shown in Fig. 2, the web page structure change detection program 01 is stored in the memory 11. When the processor 12 executes the web page structure change detection program 01 stored in the memory 11, the following steps are implemented:
The web page structure of the target website is layered by means of layered configuration, and each layer of web page structure obtained by the layering is configured accordingly.
When actively detecting whether a web page structure has changed, the web page structure of the target website to be detected is first layered, for example by dividing the web page structure of one target website into two layers. Each layer obtained is then configured accordingly by means of layered configuration. For example, each layer obtained after layering is configured in a targeted way according to its specific web page structure content; a given layer might be configured with the path language expressions to be detected.
The specific number of layers and the basis for layering can be determined from the specific content of the target website, its specific web page structure, and the detection requirements. The embodiments of the present invention do not limit the number of layers or the basis for layering of the web page structure of the target website to be detected. Likewise, the layered web page structure can be configured according to the specific number of layers and the content of each layer obtained, and the embodiments of the present invention do not specifically limit how the layered web page structure is configured.
According to a predetermined period, the web page data is extracted after the layered configuration, and data processing is performed on the extracted web page data.
When extracting the web page data after the layered configuration, the specific length of the predetermined period can be determined by the degree of dependence on the web page structure of the target website: the higher the dependence on the web page structure, the more frequent the detection and the shorter the predetermined period. For example, the web page data of the target website is extracted after the layered configuration at a frequency of one detection per day, run as a daily scheduled task.
To make it easier to judge accurately whether the web page structure has changed, in one embodiment the data processing converts the extracted web page data into a check value that is easy to compare directly.
Using a sampled data comparison method, the web page data extracted this time (the current web page data), after data processing, is compared with the web page data extracted at the same position in the immediately preceding extraction (the previous web page data).
Whether the web page structure has changed is judged from the result of comparing the current web page data with the previous web page data.
Whether the web page structure of the target website has changed is determined using the sampled data comparison method. For web page data at the same position, the current web page data, after processing, is compared with the previous web page data extracted in the immediately preceding extraction. Because a check value is obtained in step S20 by processing the extracted web page data, the check value of the current web page data extracted at a given position can be compared with the check value of the previous web page data extracted at the same position, and the comparison result is used to judge whether the web page structure has changed.
Because the objects compared are the web page data of the target website after layering, and what is extracted are web page fragments of the target website, a difference between the check value of the current web page data and the check value of the previous web page data may mean that the web page structure of the target website has changed, or may mean only that the web page content of the target website has changed. Web page data is therefore extracted at multiple different positions. If the current web page data and the previous web page data are inconsistent at every position, the web page structure is judged to have changed; if they are inconsistent at only one or some of the positions, it is judged that only the data of the web page content in the target website has changed.
Further, in one embodiment, when the web page structure is judged to have changed, prompt information is sent to the monitoring client of the technical staff, reminding them to decide whether manual intervention is needed.
The web page structure change detection method proposed in this embodiment layers the web page structure of the target website by means of layered configuration and configures each resulting layer accordingly; extracts the web page data after the layered configuration according to a predetermined period and performs data processing on the extracted web page data; uses the sampled data comparison method to compare the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction; and judges from the comparison result whether the web page structure has changed. This achieves active detection of web page structure changes through sampled data comparison, so that a change can be found and handled promptly. The method detects web page structure changes quickly, applies widely, and has a high accuracy rate.
Further, in an embodiment of the present invention, the web page structure change detection program 01 may also be executed by the processor 12 so that layering the web page structure of the target website by means of layered configuration, and configuring each layer obtained accordingly, comprises:
For the web page structure of the target website to be detected, dividing the web page structure into two layers, obtaining the modules corresponding to the first-layer web page structure and the web page samples, under those modules, corresponding to the second-layer web page structure;
Configuring, for each module of the first-layer web page structure, the XML Path Language expression to be detected, and configuring, for the second-layer web page structure, the actual web page URL addresses of the web page samples corresponding to the modules.
For example, when detecting the web page structure of a target website, the configuration is divided directly into two layers. The first layer consists of modules, and each first-layer module is configured with the XML Path Language (XPath) expression to be detected; for example, the first layer of the target website is divided into 20 modules. The second layer consists of web page samples, for example the actual web page URL (Uniform Resource Locator) addresses of the target website under those 20 modules.
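The two-layer configuration described above can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, module labels, XPath expressions, and URLs are all hypothetical and are not taken from the patent, which leaves the configuration format open.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleConfig:
    """First layer: one module of the target website (hypothetical shape)."""
    name: str    # illustrative module label
    xpath: str   # XPath expression of the fragment to watch
    sample_urls: list[str] = field(default_factory=list)  # second layer


@dataclass
class SiteConfig:
    """Layered configuration for one target website."""
    site: str
    modules: list[ModuleConfig]

    def positions(self):
        """Yield every (module name, URL, XPath) triple to be sampled."""
        for module in self.modules:
            for url in module.sample_urls:
                yield module.name, url, module.xpath


# Example values are placeholders, not real endpoints.
config = SiteConfig(
    site="example.com",
    modules=[
        ModuleConfig("news_list", ".//div[@id='news']",
                     ["https://example.com/news/1"]),
        ModuleConfig("product", ".//div[@id='product']",
                     ["https://example.com/p/1", "https://example.com/p/2"]),
    ],
)
```

Each triple produced by `positions()` corresponds to one sampling position, so the same XPath expression is applied to the same URL on every detection run.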
The embodiments of the present invention layer the web page structure of the target website and configure the layered web page structure. This approach is simple to use and has a high accuracy rate. The layered web page structure is also not tied to the structural features of any particular target website, so it is widely applicable. In practice, a single unified web page structure detector can therefore be written to provide the detection function for all source websites and source web pages.
Further, in an embodiment of the present invention, the web page structure change detection program 01 may also be executed by the processor 12 so that extracting the web page data after the layered configuration according to the predetermined period, and performing data processing on the extracted web page data, comprises:
According to the predetermined period and the configured web page URL addresses, extracting the content of the web page fragment corresponding to each web page sample under each module after layering;
According to a preset algorithm, performing data processing on the content of the web page fragments obtained, to obtain the check value corresponding to each web page fragment.
The preset algorithm in the embodiments of the present invention includes, but is not limited to, MD5. For example, the MD5 algorithm is applied to the extracted web page data to obtain the check value of each extracted web page fragment.
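As an illustration of turning a web page fragment into a check value, the sketch below selects a fragment with an XPath expression and hashes it with MD5 using only the Python standard library. It assumes the page source is well-formed XML/XHTML, which real pages often are not (an HTML-tolerant parser would be needed in practice); the function name and the marker used for a missing fragment are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def fragment_check_value(page_source: str, xpath: str) -> str:
    """Return the MD5 hex digest of the fragment selected by `xpath`.

    Assumes `page_source` is well-formed XML and `xpath` uses the subset
    of XPath that ElementTree supports.
    """
    root = ET.fromstring(page_source)
    node = root.find(xpath)
    if node is None:
        # Fragment not found: hash a fixed marker so a vanished fragment
        # still yields a stable, comparable check value.
        return hashlib.md5(b"<missing>").hexdigest()
    fragment = ET.tostring(node, encoding="unicode")
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()
```

Hashing the serialized fragment rather than the whole page keeps the check value tied to one configured position, which is what later makes position-by-position comparison possible.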
In the embodiments of the present invention, the web page structure change detection program 01 may also be executed by the processor 12 so that using the sampled data comparison method to compare the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction, comprises:
According to the preset algorithm, calculating the check value M11 of the current web page data, and, using the same preset algorithm, calculating the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
Calculating in the same way the check value Mn1 of the current web page data at each of n different positions and the check value Mn2 of the previous web page data at the same position, obtaining n groups of check values for the current web page data and the previous web page data;
Comparing the n groups of check values one group at a time to identify whether Mn1 and Mn2 are the same in each group. In other words, the n groups of check values are checked one by one, from whether M11 and M12 are the same through to whether Mn1 and Mn2 are the same, and the identification result for each group is recorded.
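The group-by-group comparison of check values can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the function name is hypothetical, positions are represented by arbitrary string keys, and a position with no previous check value is treated as not matching, which is an assumption of this example.

```python
def compare_check_values(current: dict[str, str],
                         previous: dict[str, str]) -> dict[str, bool]:
    """Record, for each sampled position, whether the current check value
    (Mi1) equals the previous check value (Mi2) at the same position."""
    results = {}
    for position, m_current in current.items():
        # A position missing from `previous` yields None, which never
        # equals a digest string, so it is recorded as a mismatch.
        results[position] = m_current == previous.get(position)
    return results
```

Keeping the per-position result, rather than a single boolean, preserves the record of each group that the later structure-versus-content judgment relies on.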
In the embodiments of the present invention, each module in the first-layer web page structure is configured with the XML Path Language expression to be detected during the layered configuration of the web page structure of the target website. The configured XML Path Language expressions therefore ensure that the current web page data and the previous web page data are extracted from the same position in the pages of the target website.
When judging whether the web page structure of the target website has changed: if the current check value of one or more web page fragments is consistent with the previous check value, the web page structure is considered not to have changed; if the current check values at all n extracted positions are inconsistent with the previous check values, the web page structure is judged to have changed.
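The judgment rule can be written down as a small function over the recorded per-position results. The sketch below is one reading of the rule under stated assumptions: the names are hypothetical, and the separate "unchanged" outcome for the case where every position matches is an inference of this example rather than something the text spells out.

```python
from enum import Enum


class Verdict(Enum):
    UNCHANGED = "unchanged"
    CONTENT_CHANGED = "content_changed"
    STRUCTURE_CHANGED = "structure_changed"


def judge(matches: list[bool]) -> Verdict:
    """Classify one detection run from per-position match results.

    matches[i] is True when the current and previous check values at
    position i are identical.
    """
    if not matches:
        raise ValueError("at least one sampled position is required")
    if all(matches):
        return Verdict.UNCHANGED
    if not any(matches):
        # Every sampled position differs: treated as a structure change.
        return Verdict.STRUCTURE_CHANGED
    # Some positions still match: treated as a content-only change.
    return Verdict.CONTENT_CHANGED
```

Sampling several positions is what lets the rule separate the two causes: a content update usually affects only some fragments, while a structural change is expected to break every configured XPath at once.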
In a specific application scenario, when the web page structure change detection program 01 described in the present invention detects the web page structure of a target website, the target website is first divided into two layers. The first layer consists of modules, and each module in the first layer is configured with the XPath (i.e. XML Path Language) expression it needs to detect. The second layer consists of web page samples, such as the actual web page URL addresses under 20 first-layer modules. When the detection task runs, the system executes a detection subtask on a daily schedule: it fetches the 20 web pages from the configured URL addresses, extracts the web page fragment content from the fetched pages according to the XPath expressions, and computes the MD5 value of each fragment. A differing MD5 value may be caused by a change in the data or by a change in the web page structure. Therefore, when the MD5 value of one or more web page fragments is consistent with the MD5 value at the same position from the previous run, the web page structure is considered unchanged; if the MD5 values of all web page fragments are inconsistent, the web page structure of the target website is considered to have changed. When necessary, prompt information is sent to the monitoring client of the technical staff so that they can decide whether manual intervention is needed.
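One run of the scenario above can be sketched end to end as follows. This is a simplified illustration under several assumptions: page retrieval is passed in as a `fetch` callable so that no particular HTTP library is implied, pages are assumed to be well-formed XML, a missing fragment is hashed as an empty string, and all names are hypothetical. Scheduling and the prompt to the monitoring client are left out.

```python
import hashlib
import xml.etree.ElementTree as ET


def run_detection(samples, fetch, previous):
    """Run one detection pass.

    samples:  list of (url, xpath) pairs from the layered configuration
    fetch:    callable mapping a URL to its page source (assumed, injected)
    previous: dict mapping URL to the MD5 value from the previous run
    Returns (current MD5 values, whether a structure change is suspected).
    """
    current = {}
    for url, xpath in samples:
        node = ET.fromstring(fetch(url)).find(xpath)
        text = "" if node is None else ET.tostring(node, encoding="unicode")
        current[url] = hashlib.md5(text.encode("utf-8")).hexdigest()
    # Only positions that also have a previous value can be compared.
    compared = [url for url in current if url in previous]
    all_differ = bool(compared) and all(
        current[url] != previous[url] for url in compared
    )
    return current, all_differ
```

The returned `current` mapping would be stored and passed back as `previous` on the next scheduled run, so each run compares against the immediately preceding extraction.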
With the web page structure change detection method described in the embodiments of the present invention, a change in the web page structure of the target website can be found and handled promptly. The method detects web page structure changes quickly, applies widely, and has a high accuracy rate.
Optionally, in other embodiments, the web page structure change detection program 01 may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. A module in the sense of the present invention is a series of computer program instruction segments capable of performing a specific function, used to describe the execution of the web page structure change detection program 01 in the web page structure change detection device 1.
For example, Fig. 3 is a schematic diagram of the program modules of the web page structure change detection program in an embodiment of the web page structure change detection device of the present invention. In this embodiment, the web page structure change detection program 01 may be divided into a layered configuration module 10, a data processing module 20, and a sampling comparison module 30. By way of example:
The layered configuration module 10 is used to: layer the web page structure of the target website by means of layered configuration, and configure each layer of web page structure obtained by the layering accordingly;
The data processing module 20 is used to: extract the web page data after the layered configuration according to a predetermined period, and perform data processing on the extracted web page data;
The sampling comparison module 30 is used to:
Using the sampled data comparison method, compare the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judge, from the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
The functions or operation steps implemented when the above program modules (the layered configuration module 10, the data processing module 20, and the sampling comparison module 30) are executed are substantially the same as in the above embodiments and are not repeated here.
In addition, the embodiments of the present invention also propose a computer readable storage medium on which a web page structure change detection program is stored. The web page structure change detection program can be executed by one or more processors to implement the following operations:
Layering the web page structure of the target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly;
Extracting the web page data after the layered configuration according to a predetermined period, and performing data processing on the extracted web page data;
Using the sampled data comparison method, comparing the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judging, from the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
The specific embodiments of the computer readable storage medium of the present invention are essentially the same as the embodiments of the web page structure change detection device and method described above and are not repeated here.
It should be noted that the serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments. The terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, device, article, or method. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article, or method that includes it.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.
Claims (10)
1. A web page structure change detection method, characterized in that the method comprises:
Layering the web page structure of a target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly;
Extracting the web page data after the layered configuration according to a predetermined period, and performing data processing on the extracted web page data;
Using a sampled data comparison method, comparing the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judging, from the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
2. The web page structure change detection method of claim 1, characterized in that layering the web page structure of the target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly, comprises:
For the web page structure of the target website to be detected, dividing the web page structure into two layers, obtaining the modules corresponding to the first-layer web page structure and the web page samples, in those modules, corresponding to the second-layer web page structure;
Configuring, for each module of the first-layer web page structure, the XML Path Language expression to be detected, and configuring, for the second-layer web page structure, the actual web page URL addresses corresponding to the modules based on the web page samples corresponding to the modules.
3. The web page structure change detection method of claim 1, characterized in that extracting the web page data after the layered configuration according to the predetermined period, and performing data processing on the extracted web page data, comprises:
According to the predetermined period and the configured web page URL addresses, extracting the web page fragment content corresponding to the web page samples that each module contains after layering;
According to a preset algorithm, performing data processing on the web page fragment content obtained, to obtain the check value corresponding to the web page fragment content.
4. The web page structure change detection method of claim 1, characterized in that using the sampled data comparison method to compare the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction, comprises:
According to a preset algorithm, calculating the check value M11 of the current web page data, and, using the same preset algorithm, calculating the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
Calculating the check value Mn1 of the current web page data at each of n different positions and the check value Mn2 of the previous web page data at the same position, obtaining n groups of check values for the current web page data and the previous web page data;
Comparing the n groups of check values respectively, identifying whether Mn1 and Mn2 are the same in each of the n groups, and recording the identification result for each group.
5. The web page structure change detection method of any one of claims 1 to 4, characterized in that judging, from the result of comparing the current web page data with the previous web page data, whether the web page structure has changed, comprises:
If the check value of the current web page data of one or more web page fragments is consistent with the check value of the previous web page data, judging that the web page structure has not changed;
If the check values of the current web page data at all n extracted positions are inconsistent with the check values of the previous web page data, judging that the web page structure has changed.
6. A web page structure change detection device, characterized in that the device comprises a memory and a processor, the memory storing a web page structure change detection program that can run on the processor, and the web page structure change detection program implementing the following steps when executed by the processor:
Layering the web page structure of a target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly;
Extracting the web page data after the layered configuration according to a predetermined period, and performing data processing on the extracted web page data;
Using a sampled data comparison method, comparing the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction;
Judging, from the result of comparing the current web page data with the previous web page data, whether the web page structure has changed.
7. The web page structure change detection device of claim 6, characterized in that the web page structure change detection program can also be executed by the processor so that layering the web page structure of the target website by means of layered configuration, and configuring each layer of web page structure obtained by the layering accordingly, comprises:
For the web page structure of the target website to be detected, dividing the web page structure into two layers, obtaining the modules corresponding to the first-layer web page structure and the web page samples, in those modules, corresponding to the second-layer web page structure;
Configuring, for each module of the first-layer web page structure, the XML Path Language expression to be detected, and configuring, for the second-layer web page structure, the actual web page URL addresses corresponding to the modules based on the web page samples corresponding to the modules.
8. The web page structure change detection device of claim 6, characterized in that the web page structure change detection program can also be executed by the processor so that extracting the web page data after the layered configuration according to the predetermined period, and performing data processing on the extracted web page data, comprises:
According to the predetermined period and the configured web page URL addresses, extracting the web page fragment content corresponding to the web page samples that each module contains after layering;
According to a preset algorithm, performing data processing on the web page fragment content obtained, to obtain the check value corresponding to the web page fragment content.
9. The web page structure change detection device of claim 6, characterized in that the web page structure change detection program can also be executed by the processor so that using the sampled data comparison method to compare the current web page data, after data processing, with the previous web page data extracted at the same position in the immediately preceding extraction, comprises:
According to a preset algorithm, calculating the check value M11 of the current web page data, and, using the same preset algorithm, calculating the check value M12 of the previous web page data extracted at the same position in the immediately preceding extraction;
Calculating the check value Mn1 of the current web page data at each of n different positions and the check value Mn2 of the previous web page data at the same position, obtaining n groups of check values for the current web page data and the previous web page data;
Comparing the n groups of check values respectively, identifying whether Mn1 and Mn2 are the same in each of the n groups, and recording the identification result for each group.
10. A computer readable storage medium, characterized in that a web page structure change detection program is stored on the computer readable storage medium, and the web page structure change detection program can be executed by one or more processors to implement the steps of the web page structure change detection method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185344.7A CN110046295A (en) | 2019-03-12 | 2019-03-12 | Structure of web page alteration detection method, apparatus and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185344.7A CN110046295A (en) | 2019-03-12 | 2019-03-12 | Structure of web page alteration detection method, apparatus and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110046295A true CN110046295A (en) | 2019-07-23 |
Family
ID=67274652
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910185344.7A Pending CN110046295A (en) | 2019-03-12 | 2019-03-12 | Structure of web page alteration detection method, apparatus and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110046295A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682098A (en) * | 2012-04-27 | 2012-09-19 | 北京神州绿盟信息安全科技股份有限公司 | Method and device for detecting web page content changes |
CN103761330A (en) * | 2014-02-10 | 2014-04-30 | 赛特斯信息科技股份有限公司 | System and method for achieving automatic Internet information extraction based on template configuration |
CN106960058A (en) * | 2017-04-05 | 2017-07-18 | 金电联行(北京)信息技术有限公司 | A kind of structure of web page alteration detection method and system |
CN108304498A (en) * | 2018-01-12 | 2018-07-20 | 深圳壹账通智能科技有限公司 | Webpage data acquiring method, device, computer equipment and storage medium |
CN109450844A (en) * | 2018-09-18 | 2019-03-08 | 华为技术有限公司 | Trigger the method and device of Hole Detection |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9846634B2 (en) | Visual graphical user interface verification | |
CN107783898B (en) | Test method and test equipment for mobile application | |
US11137909B2 (en) | Secure data entry via a virtual keyboard | |
CN107870976A (en) | Resume identification device, method and computer-readable recording medium | |
CN111459495B (en) | Unit test code file generation method, electronic device and storage medium | |
CN104252531B (en) | File type identification method and device | |
CN110704304B (en) | Application program testing method and device, storage medium and server | |
US20150370688A1 (en) | Automatic updating of graphical user interface element locators based on dimension comparison | |
CN103095681A (en) | Loophole detection method and device | |
CN103617213B (en) | Method and system for identifying newspage attributive characters | |
CN106161133B (en) | Method and device for testing webpage loading time | |
US11080373B1 (en) | Cyclically dependent checks for software tamper-proofing | |
CN109783351A (en) | Interface detection method, apparatus and computer readable storage medium | |
US20170371888A1 (en) | Method for advertisement interception in dual-kernel browser and browser apparatus | |
CN107480068A (en) | Code integrity detection method, device, electronic terminal and readable storage medium | |
CN110750750A (en) | Webpage generation method and device, computer equipment and storage medium | |
CN104468459B (en) | Leak detection method and device | |
CN113506045A (en) | Risk user identification method, device, equipment and medium based on mobile equipment | |
CN110929110B (en) | Electronic document detection method, device, equipment and storage medium | |
CN111783159A (en) | Webpage tampering verification method and device, computer equipment and storage medium | |
CN113705691B (en) | Image annotation verification method, device, equipment and medium based on artificial intelligence | |
JP5441043B2 (en) | Program, information processing apparatus, and information processing method | |
US20200034217A1 (en) | Method and device for acquiring application information | |
CN113886204A (en) | User behavior data collection method and device, electronic equipment and readable storage medium | |
CN110874475A (en) | Vulnerability mining method, vulnerability mining platform and computer readable storage medium | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||