CN106599242A - Webpage change monitoring method and system based on similarity calculation - Google Patents
- Publication number
- CN106599242A (application CN201611182671.XA)
- Authority
- CN
- China
- Prior art keywords
- web page
- page contents
- webpage
- module
- judge
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Virology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a webpage change monitoring method and system based on similarity calculation. The method stores webpage content locally using web crawler technology, obtains the webpage content again after a set time interval, and compares it with the locally stored content for similarity using a fuzzy hash algorithm. The attribute of the webpage content can be customized: for webpages whose content does not change, the monitoring steps are simpler and the monitoring efficiency is high. For webpages whose content can change, the method further performs a difference analysis to recognize tampering with characters or images, so that it can accurately and promptly identify whether the webpage content has been tampered with or updated normally, which improves the security of webpage content.
Description
Technical field
The present invention relates to web page information monitoring technology, and in particular to a webpage change monitoring method and system based on similarity calculation.
Background art
A key requirement for ensuring that users can browse web pages normally is preventing the pages published by a website from being tampered with by hackers. Tampering, as distinct from legitimate modification (refreshing) of web page content, refers to a change in web page content that does not match what the website administrator or the user requesting the page expects. With the explosive growth of Internet information, web pages on the Internet face the risk of being tampered with every day. If tampering is not discovered in time, it can cause immeasurable losses to the website and its users.
The main way hackers tamper with a web page is to break into the website and directly modify the published page content. The prior-art scheme for detecting tampering is periodic monitoring of the website with a scanner. Specifically: scanner software is installed and periodically obtains the URL (Uniform Resource Locator) used to access the monitored web page; a reference page is set according to some algorithm; the monitored page is compared with the reference page to obtain the proportion of changed page elements among all page elements of the page; and this proportion is compared with a preset threshold. If the proportion is below the threshold, the monitored page is considered not tampered with; otherwise it is considered tampered with. Alternatively, some sensitive words are preset, and if the monitored page contains such a word, the page is considered to have been tampered with by a hacker. Because existing websites use many dynamic web page techniques, the existing schemes have difficulty accurately distinguishing tampering from normal content refreshes, so false positives and missed detections are inevitable.
Summary of the invention
The technical problem to be solved by the present invention is therefore that real-time web page monitoring in the prior art cannot accurately identify whether a web page has been tampered with or normally updated.
To solve the above technical problem, the present invention adopts the following technical solution:
A webpage change monitoring method based on similarity calculation, comprising the following steps:
S1: Store web page content from the network in a local storage device using a web crawler, and calculate the fuzzy hash value of the web page content;
S2: Judge whether the web page content belongs to the first web page type or the second web page type, and mark it accordingly; the first web page type is a web page whose content does not change, and the second web page type is a web page whose content can change;
S3: Crawl the web page content from the network again after a set time interval, and calculate the fuzzy hash value of the web page content at that moment;
S4: Calculate the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1; the similarity ranges from 0 to 100;
S5: Determine the web page type of the web page content; if it belongs to the first web page type, go to step S6; if it belongs to the second web page type, go to step S7;
S6: Judge whether the similarity is 100; if yes, go to step S61; if no, go to step S62;
S61: End the monitoring of the web page content;
S62: Issue a warning and end the monitoring of the web page content;
S7: Judge whether the similarity is 100; if yes, end the monitoring of the web page content; if no, go to step S71;
S71: Use a DIFF tool to find the differences between the web page content and its original state;
S72: Judge whether the difference is caused by a picture change; if yes, go to step S8; if no, go to step S9;
S8: Match the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; if no, go to step S82;
S81: Issue a warning and end the monitoring of the web page content;
S82: End the monitoring of the web page content;
S9: Match against a sensitive-word dictionary, and issue a warning if a sensitive word is matched.
Step S9 further comprises matching against a Trojan feature library, and issuing a warning if a Trojan feature is matched.
In step S8, a picture recognition program is called to recognize the picture content, and the picture content is matched against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; otherwise go to step S82.
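The branching in steps S5 to S9 can be sketched as a single decision function. This is only an illustrative sketch: the names `decide`, `FIRST_TYPE`, and `SECOND_TYPE`, the `"warn"`/`"end"` return values, and the use of callables for the later checks are assumptions, not part of the patent text. The text does not state what happens in S9 when nothing matches; the sketch assumes monitoring simply ends.

```python
FIRST_TYPE = "first"    # content is not expected to change
SECOND_TYPE = "second"  # content may change


def decide(page_type, similarity, picture_changed=None,
           picture_abnormal=None, text_suspicious=None):
    """Return "warn" or "end" following steps S5 to S9.

    The three optional arguments are zero-argument callables, so the later
    checks run only when the flow actually reaches them.
    """
    if page_type == FIRST_TYPE:                          # S5 -> S6
        return "end" if similarity == 100 else "warn"    # S61 / S62
    if similarity == 100:                                # S7
        return "end"
    if picture_changed():                                # S71 + S72
        return "warn" if picture_abnormal() else "end"   # S8: S81 / S82
    # S9 (assumption: no match means monitoring ends without a warning)
    return "warn" if text_suspicious() else "end"
```

Passing the checks as callables keeps the cheap similarity test first, so the diff, picture, and dictionary checks are evaluated only for second-type pages that actually changed.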
A webpage change monitoring system based on similarity calculation, comprising the following modules:
Initial acquisition module: stores web page content from the network in a local storage device using a web crawler, and calculates the fuzzy hash value of the web page content;
Judgment module: judges whether the web page content belongs to the first web page type or the second web page type, and marks it accordingly; the first web page type is a web page whose content does not change, and the second web page type is a web page whose content can change;
Real-time acquisition module: crawls the web page content from the network again after a set time interval, and calculates the fuzzy hash value of the web page content at that moment;
Calculation module: calculates the similarity between the fuzzy hash value obtained in the real-time acquisition module and the fuzzy hash value obtained in the initial acquisition module; the similarity ranges from 0 to 100;
Web page judgment module: determines the web page type of the web page content; if it belongs to the first web page type, proceeds to the first judgment module; if it belongs to the second web page type, proceeds to the second judgment module;
First judgment module: judges whether the similarity is 100; if yes, ends the monitoring of the web page content; if no, proceeds to the first warning module;
First warning module: issues a warning and ends the monitoring of the web page content;
Second judgment module: judges whether the similarity is 100; if yes, proceeds to the first termination module; if no, proceeds to the difference analysis module;
First termination module: ends the monitoring of the web page content;
Difference analysis module: uses a DIFF tool to find the differences between the web page content and its original state;
Third judgment module: judges whether the difference is caused by a picture change; if yes, proceeds to the first matching module; if no, proceeds to the second matching module;
First matching module: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, proceeds to the second warning module; if no, proceeds to the second termination module;
Second warning module: issues a warning and ends the monitoring of the web page content;
Second termination module: ends the monitoring of the web page content;
Second matching module: matches against a sensitive-word dictionary, and issues a warning if a sensitive word is matched.
The second matching module further matches against a Trojan feature library, and issues a warning if a Trojan feature is matched.
In the third judgment module, a picture recognition program is called to recognize the picture content and judge whether the difference is caused by a picture change; if yes, it proceeds to the first matching module; if no, it proceeds to the second matching module.
Compared with the prior art, the above technical solution of the present invention has the following advantages.
The webpage change monitoring method and system based on similarity calculation of the present invention store web page content locally using web crawler technology, obtain the web page content again after a set time interval, and compare it with the locally stored page content for similarity using a fuzzy hash algorithm. The attribute of the web page content can be customized: for web pages whose content does not change, the monitoring steps are simpler and the monitoring efficiency is high. For web pages whose content can change, a difference analysis is further performed to recognize tampering with characters or pictures, so that it can be accurately and promptly identified whether the web page content has been tampered with or normally updated, which improves the security of web page content.
Brief description of the drawings
To make the content of the present invention easier to understand clearly, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which:
Fig. 1 is a flow chart of a webpage change monitoring method based on similarity calculation according to the present invention;
Fig. 2 is a structural block diagram of a webpage change monitoring system based on similarity calculation according to the present invention.
Reference numerals in the figures: 1 - initial acquisition module; 2 - judgment module; 3 - real-time acquisition module; 4 - calculation module; 5 - web page judgment module; 6 - first judgment module; 61 - first warning module; 7 - second judgment module; 71 - first termination module; 72 - difference analysis module; 8 - third judgment module; 81 - first matching module; 82 - second matching module; 811 - second warning module; 812 - second termination module.
Detailed description
A webpage change monitoring method based on similarity calculation, as shown in Fig. 1, comprises the following steps:
S1: Store web page content from the network in a local storage device using a web crawler, and calculate the fuzzy hash value of the web page content. The fuzzy hash value is computed with a fuzzy hash algorithm; the ssdeep tool can be called for this. The fuzzy hash algorithm, also known as context triggered piecewise hashing (CTPH), is mainly used to compare files for similarity. In 2006, Jesse Kornblum proposed CTPH and gave an example algorithm named spamsum. Subsequently, Jason Sherman developed the ssdeep tool (http://ssdeep.sourceforge.net/). In the present invention the algorithm is used for malicious code detection; it can also be used for vulnerability discovery and similar tasks. The main principle of fuzzy hashing is to use a weak hash to compute over local file content and split the file into pieces when a specific condition is met, then use a strong hash to compute a hash value for each piece, take part of each of these values and concatenate them, and combine the result with the fragmentation condition to form the fuzzy hash result. A string similarity comparison algorithm is then used to judge how similar two fuzzy hash values are, and thereby how similar the two files are. When part of a file is changed (including modifications, insertions, or deletions at multiple places), fuzzy hashing can still find the similarity to the source file, which makes it one of the better current methods for judging similarity.
S2: Judge whether the web page content belongs to the first web page type or the second web page type, and mark it accordingly; the first web page type is a web page whose content does not change, and the second web page type is a web page whose content can change. The classification can be done manually, or by using prior-art web page content recognition and classification techniques (such as those recorded in Chinese patent documents 201210299843.7 and 201210376933.1).
S3: Crawl the web page content from the network again after a set time interval, and calculate the fuzzy hash value of the web page content at that moment.
The fuzzy hash value in steps S1 and S3 is calculated as follows:
The file containing the web page content is split into pieces using a weak hash algorithm. Specifically:
A portion of the file content is read and hashed with the weak hash algorithm Adler-32, computed as a rolling hash, to obtain a 4-byte hash value. A rolling hash means that if, for example, the hash value h1 of abcdef has already been calculated and the hash value of bcdefg is needed next, it does not have to be recomputed from scratch; it is enough to compute h1 - X(a) + Y(g), where X and Y are two functions. That is, only the effect of the removed and added bytes on the hash value has to be accounted for. Such a hash greatly speeds up the fragmentation decision.
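The rolling update described above can be illustrated with Adler-32 itself. The following is a minimal sketch, not the patent's or ssdeep's actual code: the function name `adler32_windows` is hypothetical, and the update formulas are derived from the standard Adler-32 definition (A is 1 plus the byte sum, B is the sum of the running A values, both modulo 65521). The first window is hashed with the standard library's `zlib.adler32`; every later window is obtained by removing the contribution of the outgoing byte and adding that of the incoming byte.

```python
import zlib

MOD = 65521  # Adler-32 modulus (largest prime below 2**16)


def adler32_windows(data: bytes, window: int):
    """Yield the Adler-32 value of every `window`-byte slice of `data`,
    updating incrementally instead of rehashing each slice."""
    if len(data) < window:
        return
    first = zlib.adler32(data[:window])
    a, b = first & 0xFFFF, first >> 16
    yield (b << 16) | a
    for i in range(window, len(data)):
        out_byte, in_byte = data[i - window], data[i]
        # A only needs the outgoing byte removed and the incoming byte added.
        a = (a - out_byte + in_byte) % MOD
        # B loses `window` copies of the outgoing byte and gains the new
        # byte sum (the new A without its constant 1).
        b = (b - window * out_byte + a - 1) % MOD
        yield (b << 16) | a
```

Each step costs a constant number of operations regardless of the window size, which is what makes testing a split condition at every byte offset affordable.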
A fragmentation value n is set to control the fragmentation condition. The value of n is determined by the file size, the file content, and so on. The principle and method are as follows:
The value of n is always an integer power of 2, so that the remainder of the Adler-32 hash value divided by n is close to uniformly distributed. A split is made only when the remainder equals n-1, which means a split occurs in roughly 1/n of cases. In other words, each time the window moves over the file, there is a 1/n chance of a split. If a pass produces too few pieces, n is reduced, so that the probability of each split increases and the number of pieces goes up. If it produces too many pieces, n is increased, so that the probability of each split decreases and the number of pieces goes down. Each adjustment divides or multiplies n by 2, aiming for a final piece count between 32 and 64. Because the split probability is roughly 1/n, the first value of n tried on each ssdeep run is a value close to file length / 64.
When the remainder of the Adler-32 hash value divided by n is exactly n-1, a split is made at the current position; otherwise no split is made, the window rolls forward one byte, and the Adler-32 hash value is calculated and tested again, and so on.
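The choice of n and the split condition can be sketched as follows. This is a simplified illustration under stated assumptions: the 7-byte window, the function names, and the rule "smallest power of two not below length / 64" for the first guess are not given in the text, and the window hash is recomputed with `zlib.adler32` at each position for brevity rather than rolled. A set of already tried values guards against bouncing between two values of n.

```python
import zlib

WINDOW = 7  # assumed window size; the text does not specify one


def initial_block_size(length: int) -> int:
    """Smallest power of two that is >= length / 64 (at least 1)."""
    n = 1
    while n * 64 < length:
        n *= 2
    return n


def split_pieces(data: bytes, n: int) -> list:
    """Cut after every position where the window hash mod n equals n - 1."""
    pieces, start = [], 0
    for end in range(1, len(data) + 1):
        window = data[max(0, end - WINDOW):end]
        if zlib.adler32(window) % n == n - 1:
            pieces.append(data[start:end])
            start = end
    if start < len(data):
        pieces.append(data[start:])  # trailing bytes form the last piece
    return pieces


def fragment(data: bytes, low: int = 32, high: int = 64):
    """Halve or double n until the piece count lies in [low, high], if possible."""
    n = initial_block_size(len(data))
    tried = set()
    while True:
        pieces = split_pieces(data, n)
        tried.add(n)
        if len(pieces) < low and n > 1:
            candidate = n // 2   # too few pieces: make splits more likely
        elif len(pieces) > high:
            candidate = n * 2    # too many pieces: make splits less likely
        else:
            return n, pieces
        if candidate in tried:   # would oscillate: accept the current split
            return n, pieces
        n = candidate
```

Calling `fragment(page_bytes)` returns the chosen n together with the list of pieces, which is the input the per-piece strong hash needs.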
A strong hash algorithm is used to calculate a hash value for each piece obtained in S101. The Fowler-Noll-Vo (FNV) hash algorithm can be used.
Compressing the hash values. After a hash value has been calculated for each piece, the result can optionally be shortened. Specifically: take the lowest 6 bits of the hash result and represent them with one ASCII character, which serves as the final hash value of that piece.
Concatenating the hash values. The compressed hash values of all pieces are concatenated to obtain the fuzzy hash value of the file. If the fragmentation value n differs between files, n should also be included in the fuzzy hash value; specifically, n is appended directly to the end of the hash value as part of it.
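The per-piece hashing, 6-bit compression, and concatenation can be sketched as below. Several details are assumptions made for illustration: the text names Fowler-Noll-Vo without fixing a variant, so the 32-bit FNV-1a variant is used here; the base64 alphabet is one possible way to show 6 bits as one ASCII character; and the `:` separator before n is hypothetical (the text only says n is appended at the end).

```python
# 64 printable characters, one per possible 6-bit value (assumed alphabet).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = 0x811C9DC5                       # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, kept to 32 bits
    return h


def piece_char(piece: bytes) -> str:
    """Keep the lowest 6 bits of the piece hash and show them as one character."""
    return B64[fnv1a_32(piece) & 0x3F]


def fuzzy_signature(pieces, n: int) -> str:
    """Concatenate one character per piece and append the fragmentation value n."""
    return "".join(piece_char(p) for p in pieces) + ":" + str(n)
```

Because each piece contributes exactly one character, a local edit to the file changes only the characters of the pieces it touches, and the rest of the signature stays comparable.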
S4: Calculate the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1; the similarity ranges from 0 to 100. The similarity in step S4 is calculated as follows: the fuzzy hash values of the web page content are character strings, denoted s1 and s2. The weighted edit distance from s1 to s2 is used as the basis for evaluating their similarity. Weighted edit distance means first determining the minimum number of operations (insertions, deletions, and modifications) needed to change s1 into s2, and then assigning a weight to each kind of operation. The weights for insertion, deletion, and modification are set to 0.2, 0.3, and 0.5 respectively. Finally, the weighted results are summed to obtain the weighted edit distance.
This distance is divided by the sum of the lengths of s1 and s2 to turn the absolute result into a relative one, which is then mapped to an integer from 0 to 100, where 100 means the two strings are identical and 0 means they are completely dissimilar. The result can be used to judge how similar the two versions of the web page content are.
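A weighted edit distance with these weights, and one possible mapping to 0-100, can be sketched as follows. Two points are assumptions: the sketch minimizes the total weighted cost directly with dynamic programming, and, since the text does not say how the relative distance is mapped to 0-100, it uses a linear map scaled by the worst case (delete all of s1, insert all of s2). Weights are kept in tenths as integers to avoid floating-point rounding in the table.

```python
INSERT, DELETE, CHANGE = 2, 3, 5  # the weights 0.2, 0.3, 0.5, in tenths


def weighted_distance(s1: str, s2: str) -> int:
    """Cheapest weighted cost (in tenths) of turning s1 into s2."""
    prev = [j * INSERT for j in range(len(s2) + 1)]   # from "" to s2[:j]
    for i, c1 in enumerate(s1, start=1):
        cur = [i * DELETE]                            # from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cur.append(min(
                prev[j] + DELETE,                     # drop c1
                cur[j - 1] + INSERT,                  # add c2
                prev[j - 1] + (0 if c1 == c2 else CHANGE),
            ))
        prev = cur
    return prev[-1]


def similarity(s1: str, s2: str) -> int:
    """Map the relative distance to 0-100 (100 = identical strings)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100
    relative = weighted_distance(s1, s2) / total
    # Assumed scaling: the largest possible relative distance is reached by
    # deleting all of s1 and inserting all of s2.
    worst = (DELETE * len(s1) + INSERT * len(s2)) / total
    return round(100 * (1 - relative / worst))
```

For example, two four-character signatures that differ in one character score 75 under this mapping, while signatures with no characters in common score 0.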
S5: Determine the web page type of the web page content; if it belongs to the first web page type, go to step S6; if it belongs to the second web page type, go to step S7.
S6: Judge whether the similarity is 100; if yes, go to step S61; if no, go to step S62.
S61: End the monitoring of the web page content.
S62: Issue a warning and end the monitoring of the web page content.
S7: Judge whether the similarity is 100; if yes, end the monitoring of the web page content; if no, go to step S71.
S71: Use a DIFF tool to find the differences between the web page content and its original state.
S72: Judge whether the difference is caused by a picture change; if yes, go to step S8; if no, go to step S9.
S8: Match the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; if no, go to step S82.
S81: Issue a warning and end the monitoring of the web page content.
S82: End the monitoring of the web page content.
S9: Match against a sensitive-word dictionary, and issue a warning if a sensitive word is matched. If the changed part is a character string, it is matched against the preset sensitive-word dictionary using regular expressions, and an alert is raised if a sensitive word is matched.
Step S9 further comprises matching against a Trojan feature library, and issuing a warning if a Trojan feature is matched. This matching against the preset Trojan feature library also uses regular expressions.
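The regular-expression matching in step S9 might look like the sketch below. The dictionary entries, the function names, and the warning format are hypothetical; the patent does not list the contents of the sensitive-word dictionary or the Trojan feature library.

```python
import re

# Hypothetical entries; a real deployment would load its own dictionaries.
SENSITIVE_WORDS = ["hacked by", "defaced"]
TROJAN_FEATURES = ["<iframe", "eval(unescape("]


def compile_terms(terms):
    """Build one case-insensitive alternation; re.escape keeps each term literal."""
    if not terms:
        return None  # an empty alternation would match at every position
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)


SENSITIVE_RE = compile_terms(SENSITIVE_WORDS)
TROJAN_RE = compile_terms(TROJAN_FEATURES)


def check_changed_text(text: str) -> list:
    """Return (kind, matched_text) warnings found in the changed text."""
    warnings = []
    for kind, pattern in (("sensitive word", SENSITIVE_RE),
                          ("trojan feature", TROJAN_RE)):
        if pattern is not None:
            warnings.extend((kind, m.group(0).lower())
                            for m in pattern.finditer(text))
    return warnings
```

An empty result means neither dictionary matched; any entry in the list would correspond to issuing a warning.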
In step S8, a picture recognition program is called to recognize the picture content, and the picture content is matched against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; otherwise go to step S82.
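Steps S71 and S72 above can be approximated with Python's standard `difflib` module. This is a rough sketch: the patent only says "a DIFF tool", and treating any changed line that contains an `<img` tag as a picture change is an assumed heuristic (it would not notice an image file replaced under an unchanged URL).

```python
import difflib
import re

IMG_TAG = re.compile(r"<img\b", re.IGNORECASE)


def changed_lines(old_html: str, new_html: str) -> list:
    """Return the lines removed from or added to the stored snapshot."""
    diff = difflib.ndiff(old_html.splitlines(), new_html.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are skipped.
    return [line[2:] for line in diff if line.startswith(("+ ", "- "))]


def is_picture_change(lines) -> bool:
    """Assumed heuristic: the change involves a picture if an <img> tag changed."""
    return any(IMG_TAG.search(line) for line in lines)
```

The changed lines would then be routed either to the picture check of step S8 or to the dictionary matching of step S9.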
A webpage change monitoring system based on similarity calculation, as shown in Fig. 2, comprises the following modules:
Initial acquisition module 1: stores web page content from the network in a local storage device using a web crawler, and calculates the fuzzy hash value of the web page content. The fuzzy hash value is computed with a fuzzy hash algorithm; the ssdeep tool can be called for this.
Judgment module 2: judges whether the web page content belongs to the first web page type or the second web page type, and marks it accordingly; the first web page type is a web page whose content does not change, and the second web page type is a web page whose content can change. The classification can be done manually, or by using prior-art web page content recognition and classification techniques (such as those recorded in Chinese patent documents 201210299843.7 and 201210376933.1).
Real-time acquisition module 3: crawls the web page content from the network again after a set time interval, and calculates the fuzzy hash value of the web page content at that moment.
The fuzzy hash value in initial acquisition module 1 and real-time acquisition module 3 is calculated as follows:
The file containing the web page content is split into pieces using a weak hash algorithm. Specifically:
A portion of the file content is read and hashed with the weak hash algorithm Adler-32, computed as a rolling hash, to obtain a 4-byte hash value. A rolling hash means that if, for example, the hash value h1 of abcdef has already been calculated and the hash value of bcdefg is needed next, it does not have to be recomputed from scratch; it is enough to compute h1 - X(a) + Y(g), where X and Y are two functions. That is, only the effect of the removed and added bytes on the hash value has to be accounted for. Such a hash greatly speeds up the fragmentation decision.
A fragmentation value n is set to control the fragmentation condition. The value of n is determined by the file size, the file content, and so on. The principle and method are as follows:
The value of n is always an integer power of 2, so that the remainder of the Adler-32 hash value divided by n is close to uniformly distributed. A split is made only when the remainder equals n-1, which means a split occurs in roughly 1/n of cases. In other words, each time the window moves over the file, there is a 1/n chance of a split. If a pass produces too few pieces, n is reduced, so that the probability of each split increases and the number of pieces goes up. If it produces too many pieces, n is increased, so that the probability of each split decreases and the number of pieces goes down. Each adjustment divides or multiplies n by 2, aiming for a final piece count between 32 and 64. Because the split probability is roughly 1/n, the first value of n tried on each ssdeep run is a value close to file length / 64.
When the remainder of the Adler-32 hash value divided by n is exactly n-1, a split is made at the current position; otherwise no split is made, the window rolls forward one byte, and the Adler-32 hash value is calculated and tested again, and so on.
A strong hash algorithm is used to calculate a hash value for each piece obtained in S101. The Fowler-Noll-Vo (FNV) hash algorithm can be used.
Compressing the hash values. After a hash value has been calculated for each piece, the result can optionally be shortened. Specifically: take the lowest 6 bits of the hash result and represent them with one ASCII character, which serves as the final hash value of that piece.
Concatenating the hash values. The compressed hash values of all pieces are concatenated to obtain the fuzzy hash value of the file. If the fragmentation value n differs between files, n should also be included in the fuzzy hash value; specifically, n is appended directly to the end of the hash value as part of it.
Calculation module 4: calculates the similarity between the fuzzy hash value obtained in the real-time acquisition module and the fuzzy hash value obtained in the initial acquisition module; the similarity ranges from 0 to 100. The similarity in calculation module 4 is calculated as follows: the fuzzy hash values of the web page content are character strings, denoted s1 and s2. The weighted edit distance from s1 to s2 is used as the basis for evaluating their similarity. Weighted edit distance means first determining the minimum number of operations (insertions, deletions, and modifications) needed to change s1 into s2, and then assigning a weight to each kind of operation. The weights for insertion, deletion, and modification are set to 0.2, 0.3, and 0.5 respectively. Finally, the weighted results are summed to obtain the weighted edit distance.
Web page judgment module 5: determines the web page type of the web page content; if it belongs to the first web page type, proceeds to first judgment module 6; if it belongs to the second web page type, proceeds to second judgment module 7.
First judgment module 6: judges whether the similarity is 100; if yes, ends the monitoring of the web page content; if no, proceeds to first warning module 61.
First warning module 61: issues a warning and ends the monitoring of the web page content.
Second judgment module 7: judges whether the similarity is 100; if yes, proceeds to first termination module 71; if no, proceeds to difference analysis module 72.
First termination module 71: ends the monitoring of the web page content.
Difference analysis module 72: uses a DIFF tool to find the differences between the web page content and its original state.
Third judgment module 8: judges whether the difference is caused by a picture change; if yes, proceeds to first matching module 81; if no, proceeds to second matching module 82.
First matching module 81: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, proceeds to second warning module 811; if no, proceeds to second termination module 812.
Second warning module 811: issues a warning and ends the monitoring of the web page content.
Second termination module 812: ends the monitoring of the web page content.
Second matching module 82: matches against a sensitive-word dictionary, and issues a warning if a sensitive word is matched.
Second matching module 82 further matches against a Trojan feature library, and issues a warning if a Trojan feature is matched.
In third judgment module 8, a picture recognition program is called to recognize the picture content and judge whether the difference is caused by a picture change; if yes, it proceeds to first matching module 81; if no, it proceeds to second matching module 82.
The webpage change monitoring method and system based on similarity calculation of the present invention store web page content locally using web crawler technology, obtain the web page content again after a set time interval, and compare it with the locally stored page content for similarity using a fuzzy hash algorithm. The attribute of the web page content can be customized: for web pages whose content does not change, the monitoring steps are simpler and the monitoring efficiency is high. For web pages whose content can change, a difference analysis is further performed to recognize tampering with characters or pictures, so that it can be accurately and promptly identified whether the web page content has been tampered with or normally updated, which improves the security of web page content.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the possible implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or variations derived from the above description remain within the scope of protection of the present invention.
Claims (6)
1. A webpage change monitoring method based on similarity calculation, characterized by comprising the following steps:
S1: Use a web crawler to store webpage content from the network on a local storage device, and calculate the fuzzy hash value of the webpage content;
S2: Determine whether the webpage content belongs to a first webpage type or a second webpage type, and mark it accordingly, where the first webpage type is a webpage whose content does not change and the second webpage type is a webpage whose content may change;
S3: After a set time interval, crawl the webpage content from the network again, and calculate the fuzzy hash value of the webpage content at that moment;
S4: Calculate the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1, where the similarity ranges from 0 to 100;
S5: Determine the webpage type of the webpage content; if the webpage content belongs to the first webpage type, proceed to step S6; if it belongs to the second webpage type, proceed to step S7;
S6: Determine whether the similarity is 100; if yes, proceed to step S61; if no, proceed to step S62;
S61: End the monitoring of the webpage content;
S62: Issue a warning and end the monitoring of the webpage content;
S7: Determine whether the similarity is 100; if yes, end the monitoring of the webpage content; if no, proceed to step S71;
S71: Use a DIFF tool to find the differences between the webpage content and its original state;
S72: Determine whether the difference is caused by a picture change; if yes, proceed to step S8; if no, proceed to step S9;
S8: Match the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, proceed to step S81; if no, proceed to step S82;
S81: Issue a warning and end the monitoring of the webpage content;
S82: End the monitoring of the webpage content;
S9: Match the changed content against a sensitive-word dictionary, and issue a warning if a sensitive word is matched.
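The branching in steps S4 to S9 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `check_page`, the `"ok"`/`"warn"` return values, and the `picture_is_abnormal` callback are hypothetical names; `difflib` stands in for both the fuzzy-hash comparison and the DIFF tool; a picture change is approximated by an added line containing `<img`; and returning `"ok"` when no sensitive word matches is an assumption, since step S9 does not state what happens in that case.

```python
import difflib
from typing import Callable, Iterable

STATIC, DYNAMIC = "first", "second"  # the two webpage types marked in step S2


def similarity(old: str, new: str) -> int:
    """Stand-in for the fuzzy-hash comparison: an integer score from 0 to 100."""
    if old == new:
        return 100
    score = int(difflib.SequenceMatcher(None, old, new).ratio() * 100)
    return min(score, 99)  # in this sketch only identical content scores 100


def changed_lines(old: str, new: str) -> list[str]:
    """S71: lines that were added or modified relative to the original state."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def check_page(
    old: str,
    new: str,
    page_type: str,
    sensitive_words: Iterable[str],
    picture_is_abnormal: Callable[[str], bool],  # hypothetical picture check (S8)
) -> str:
    score = similarity(old, new)                    # S4
    if page_type == STATIC:                         # S5 -> S6
        return "ok" if score == 100 else "warn"     # S61 / S62
    if score == 100:                                # S7
        return "ok"
    added = changed_lines(old, new)                 # S71
    pictures = [line for line in added if "<img" in line.lower()]
    if pictures:                                    # S72 -> S8
        abnormal = any(picture_is_abnormal(line) for line in pictures)
        return "warn" if abnormal else "ok"         # S81 / S82
    text = "\n".join(added)                         # S9
    return "warn" if any(word in text for word in sensitive_words) else "ok"
```

A page marked as the first type triggers a warning on any change at all, while a page marked as the second type triggers a warning only when the changed content itself looks suspicious.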
2. The webpage change monitoring method based on similarity calculation according to claim 1, characterized in that step S9 further comprises matching the changed content against a Trojan feature library, and issuing a warning if a Trojan feature is matched.
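As a rough illustration of the combined matching in step S9, the sketch below scans changed text against a sensitive-word dictionary and a small Trojan feature library. Everything named here is hypothetical: the function `scan_changed_text`, the two regular-expression signatures, and the two sample words are placeholders chosen for the example, and a real feature library would be far larger and maintained separately.

```python
import re

# Hypothetical signatures; placeholders for a real Trojan feature library.
TROJAN_SIGNATURES = {
    "hidden_iframe": re.compile(r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?0", re.I),
    "eval_unescape": re.compile(r"eval\s*\(\s*unescape\s*\(", re.I),
}

# Hypothetical sensitive-word dictionary.
SENSITIVE_WORDS = {"casino", "lottery"}


def scan_changed_text(text: str) -> list[str]:
    """Return the name of every rule that the changed text matches."""
    lowered = text.lower()
    hits = [f"word:{word}" for word in sorted(SENSITIVE_WORDS) if word in lowered]
    hits += [
        f"trojan:{name}"
        for name, pattern in TROJAN_SIGNATURES.items()
        if pattern.search(text)
    ]
    return hits
```

A non-empty result would correspond to issuing a warning; returning the rule names rather than a bare boolean lets the warning say what was matched.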
3. The webpage change monitoring method based on similarity calculation according to claim 2, characterized in that in step S8 a picture recognition program is called to identify the picture content, the picture content is matched against malicious content features, and whether the picture contains abnormal content is detected; if yes, proceed to step S81; otherwise, proceed to step S82.
4. A webpage change monitoring system based on similarity calculation, characterized by comprising the following modules:
Initial acquisition module: uses a web crawler to store webpage content from the network on a local storage device, and calculates the fuzzy hash value of the webpage content;
Judgment module: determines whether the webpage content belongs to a first webpage type or a second webpage type, and marks it accordingly, where the first webpage type is a webpage whose content does not change and the second webpage type is a webpage whose content may change;
Real-time collection module: after a set time interval, crawls the webpage content from the network again, and calculates the fuzzy hash value of the webpage content at that moment;
Computing module: calculates the similarity between the fuzzy hash value obtained by the real-time collection module and the fuzzy hash value obtained by the initial acquisition module, where the similarity ranges from 0 to 100;
Webpage judgment module: determines the webpage type of the webpage content; if the webpage content belongs to the first webpage type, proceeds to the first judgment module; if it belongs to the second webpage type, proceeds to the second judgment module;
First judgment module: determines whether the similarity is 100; if yes, ends the monitoring of the webpage content; if no, proceeds to the first warning module;
First warning module: issues a warning and ends the monitoring of the webpage content;
Second judgment module: determines whether the similarity is 100; if yes, proceeds to the first termination module; if no, proceeds to the difference analysis module;
First termination module: ends the monitoring of the webpage content;
Difference analysis module: uses a DIFF tool to find the differences between the webpage content and its original state;
Third judgment module: determines whether the difference is caused by a picture change; if yes, proceeds to the first matching module; if no, proceeds to the second matching module;
First matching module: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, proceeds to the second warning module; if no, proceeds to the second termination module;
Second warning module: issues a warning and ends the monitoring of the webpage content;
Second termination module: ends the monitoring of the webpage content;
Second matching module: matches the changed content against a sensitive-word dictionary, and issues a warning if a sensitive word is matched.
5. The webpage change monitoring system based on similarity calculation according to claim 4, characterized in that the second matching module further matches the changed content against a Trojan feature library, and issues a warning if a Trojan feature is matched.
6. The webpage change monitoring system based on similarity calculation according to claim 5, characterized in that the third judgment module calls a picture recognition program to identify the picture content and determines whether the difference is caused by a picture change; if yes, proceeds to the first matching module; if no, proceeds to the second matching module.
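The claims rely on a fuzzy hash value and a similarity score from 0 to 100 without fixing a particular algorithm. The toy sketch below shows the general idea of piecewise (fuzzy) hashing so that the comparison step is easier to picture. It is an assumption-laden simplification and not the algorithm used in the patent: `toy_fuzzy_hash`, `compare`, the window size, and the modulus are all invented for this example, and a production system would more likely use an established implementation such as ssdeep, whose match score also runs from 0 to 100.

```python
import difflib
import hashlib

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _piece_char(piece: bytes) -> str:
    """Map one chunk of content to a single signature character."""
    return ALPHABET[hashlib.sha256(piece).digest()[0] % len(ALPHABET)]


def toy_fuzzy_hash(data: bytes, window: int = 4, modulus: int = 8) -> str:
    """Toy piecewise hash: one signature character per content-defined chunk."""
    signature = []
    start = 0
    for i in range(len(data)):
        # A boundary depends only on the last `window` bytes, so an edit
        # disturbs the chunks near it and leaves distant chunks unchanged.
        rolling = sum(data[max(0, i - window + 1): i + 1])
        if rolling % modulus == modulus - 1:
            signature.append(_piece_char(data[start: i + 1]))
            start = i + 1
    if start < len(data):  # trailing chunk without a closing boundary
        signature.append(_piece_char(data[start:]))
    return "".join(signature)


def compare(sig_a: str, sig_b: str) -> int:
    """Similarity of two signatures as an integer from 0 to 100."""
    if sig_a == sig_b:
        return 100
    matcher = difflib.SequenceMatcher(None, sig_a, sig_b, autojunk=False)
    return min(99, int(matcher.ratio() * 100))
```

Unlike a cryptographic digest, where any edit changes the whole value, a piecewise signature changes only around the edited region, which is what makes a graded similarity score possible.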
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611182671.XA CN106599242B (en) | 2016-12-20 | 2016-12-20 | Webpage change monitoring method and system based on similarity calculation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611182671.XA CN106599242B (en) | 2016-12-20 | 2016-12-20 | Webpage change monitoring method and system based on similarity calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599242A (en) | 2017-04-26 |
CN106599242B (en) | 2019-03-26 |
Family
ID=58600081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611182671.XA (Active, CN106599242B (en)) | Webpage change monitoring method and system based on similarity calculation | 2016-12-20 | 2016-12-20 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599242B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102779245A (en) * | 2011-05-12 | 2012-11-14 | 李朝荣 | Webpage abnormality detection method based on image processing technology |
CN102571791A (en) * | 2011-12-31 | 2012-07-11 | 奇智软件(北京)有限公司 | Method and system for analyzing tampering of Web page contents |
CN102682098A (en) * | 2012-04-27 | 2012-09-19 | 北京神州绿盟信息安全科技股份有限公司 | Method and device for detecting web page content changes |
CN103279475A (en) * | 2013-04-11 | 2013-09-04 | 广东电网公司信息中心 | Detection method and system for WEB application system content change |
CN105678193A (en) * | 2016-01-06 | 2016-06-15 | 杭州数梦工场科技有限公司 | Tamper-proof processing method and device |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301355A (en) * | 2017-06-20 | 2017-10-27 | 深信服科技股份有限公司 | Webpage tampering monitoring method and device |
CN107301355B (en) * | 2017-06-20 | 2021-07-02 | 深信服科技股份有限公司 | Webpage tampering monitoring method and device |
CN107612908B (en) * | 2017-09-15 | 2020-06-05 | 杭州安恒信息技术股份有限公司 | Webpage tampering monitoring method and device |
CN107612908A (en) * | 2017-09-15 | 2018-01-19 | 杭州安恒信息技术有限公司 | Webpage tampering monitoring method and device |
CN108021692A (en) * | 2017-12-18 | 2018-05-11 | 北京天融信网络安全技术有限公司 | Method for monitoring webpage, server and computer readable storage medium |
CN108021692B (en) * | 2017-12-18 | 2022-03-11 | 北京天融信网络安全技术有限公司 | Method for monitoring webpage, server and computer readable storage medium |
CN108540466A (en) * | 2018-03-31 | 2018-09-14 | 甘肃万维信息技术有限责任公司 | Webpage tampering monitoring and alarm system |
CN108595583A (en) * | 2018-04-18 | 2018-09-28 | 平安科技(深圳)有限公司 | Dynamic chart class page data crawling method, device, terminal and storage medium |
CN108809943B (en) * | 2018-05-14 | 2021-05-14 | 苏州闻道网络科技股份有限公司 | Website monitoring method and device |
CN108809943A (en) * | 2018-05-14 | 2018-11-13 | 苏州闻道网络科技股份有限公司 | Website monitoring method and device |
CN109241779A (en) * | 2018-08-27 | 2019-01-18 | 浙江每日互动网络科技股份有限公司 | Method for detecting page tampering |
CN109495471A (en) * | 2018-11-15 | 2019-03-19 | 东信和平科技股份有限公司 | Method, device and equipment for judging WEB attack result and readable storage medium |
CN109495471B (en) * | 2018-11-15 | 2021-07-02 | 东信和平科技股份有限公司 | Method, device and equipment for judging WEB attack result and readable storage medium |
CN109740094A (en) * | 2018-12-27 | 2019-05-10 | 上海掌门科技有限公司 | Page monitoring method, equipment and computer storage medium |
CN110034921A (en) * | 2019-04-18 | 2019-07-19 | 成都信息工程大学 | Webshell detection method based on weighted fuzzy hash |
CN110034921B (en) * | 2019-04-18 | 2022-04-15 | 成都信息工程大学 | Webshell detection method based on weighted fuzzy hash |
CN110598478A (en) * | 2019-09-19 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Block chain based evidence verification method, device, equipment and storage medium |
CN110598478B (en) * | 2019-09-19 | 2024-06-07 | 腾讯科技(深圳)有限公司 | Block chain-based evidence verification method, device, equipment and storage medium |
CN110659439A (en) * | 2019-09-23 | 2020-01-07 | 杭州迪普科技股份有限公司 | Black chain protection method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106599242B (en) | 2019-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106599242A (en) | Webpage change monitoring method and system based on similarity calculation | |
US9215246B2 (en) | Website scanning device and method | |
CN108985057B (en) | Webshell detection method and related equipment | |
US9519718B2 (en) | Webpage information detection method and system | |
CN108650260B (en) | Malicious website identification method and device | |
CN107038173B (en) | Application query method and device and similar application detection method and device | |
CN110798488B (en) | Web application attack detection method | |
JP5254443B2 (en) | Surveillance method used for communication system images or multimedia video images | |
CN103634593B (en) | Video camera movement detection method and system | |
CN112532624A (en) | Black chain detection method and device, electronic equipment and readable storage medium | |
CN109302383B (en) | URL monitoring method and device | |
CN107426136B (en) | Network attack identification method and device | |
CN116956080A (en) | Data processing method, device and storage medium | |
CN109670153B (en) | Method and device for determining similar posts, storage medium and terminal | |
CN112257546B (en) | Event early warning method and device, electronic equipment and storage medium | |
CN111488621A (en) | Method and system for detecting falsified webpage, electronic equipment and storage medium | |
CN111382432A (en) | Malicious software detection and classification model generation method and device | |
CN113378161A (en) | Security detection method, device, equipment and storage medium | |
CN108881154A (en) | Webpage is tampered detection method, apparatus and system | |
KR102423784B1 (en) | Apparatus and method for correcting transportation data | |
CN107241342A (en) | A kind of network attack crosstalk detecting method and device | |
CN111460448A (en) | Malicious software family detection method and device | |
CN111083705A (en) | Group-sending fraud short message detection method, device, server and storage medium | |
CN116028112A (en) | Small program clone detection method based on complex network analysis | |
CN108536713B (en) | Character string auditing method and device and electronic equipment | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||