CN106599242B - Webpage change monitoring method and system based on similarity calculation
- Publication number
- CN106599242B (application number CN201611182671.XA)
- Authority
- CN
- China
- Prior art keywords
- web page
- page contents
- webpage
- module
- monitoring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
Abstract
The invention provides a webpage change monitoring method and system based on similarity calculation. Web page content is saved locally using web crawler technology and fetched again after a set time interval, and a fuzzy hash algorithm is used to compare the newly fetched content with the locally saved content for similarity. Web page content attributes can be user-defined; for web page content that does not change, the monitoring steps are simpler and monitoring efficiency is high. For web page content that may change, a further difference analysis identifies tampering with characters or pictures, so that it can be determined promptly and accurately whether the web page content has been tampered with or normally updated, which improves the security of web page content.
Description
Technical field
The present invention relates to web page information monitoring technology, and in particular to a webpage change monitoring method and system based on similarity calculation.
Background art
A key part of ensuring that users can browse web pages normally is preventing the pages published by a website from being tampered with by hackers. Tampering, as distinct from legitimate modification (refreshing) of web page content, means a change in web page content that does not match what the website administrator or the user requesting the page expects. With the explosive growth of information on the Internet, web pages face the risk of tampering every day. If tampering is not discovered in time, it can cause immeasurable losses to the website and its users.
The main way a hacker tampers with a web page is to break into the website and directly modify the published web page content. One prior-art scheme for detecting tampering is to monitor the website periodically with a scanner. Specifically, scanner software is installed and periodically collects the URLs (Uniform Resource Locators) used to access the monitored web pages; a baseline page is set according to some algorithm; the monitored page is compared with the baseline page to obtain the proportion of modified page elements among all page elements of that page; and this proportion is compared with a preset proportion threshold. If the proportion is below the threshold, the monitored page is considered not to have been tampered with; otherwise it is considered to have been tampered with. Alternatively, certain sensitive words are preset, and the page is considered to have been tampered with by a hacker when the monitored page is found to contain such a word. Because many existing websites use dynamic web page techniques, these existing schemes have difficulty distinguishing accurately between tampering and a normal content refresh, so false positives and missed detections are unavoidable.
Summary of the invention
The technical problem to be solved by the present invention is therefore that real-time web page monitoring in the prior art cannot accurately distinguish whether a web page has been tampered with or has had its content normally updated.
To solve the above technical problem, the present invention adopts the following technical solution:
A webpage change monitoring method based on similarity calculation, comprising the following steps:
S1: store web page content from the network on a local storage device using a web crawler, and calculate the fuzzy hash value of the web page content;
S2: determine whether the web page content belongs to a first webpage type or a second webpage type and mark it accordingly, the first webpage type being web pages whose content does not change and the second webpage type being web pages whose content may change;
S3: after a set time interval, crawl the web page content from the network again and calculate the fuzzy hash value of the web page content at that moment;
S4: calculate the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1, the similarity ranging from 0 to 100;
S5: determine the webpage type of the web page content; if the web page content belongs to the first webpage type, go to step S6; if it belongs to the second webpage type, go to step S7;
S6: determine whether the similarity is 100; if so, go to step S61; if not, go to step S62;
S61: end monitoring of the web page content;
S62: issue a warning and end monitoring of the web page content;
S7: determine whether the similarity is 100; if so, end monitoring of the web page content; if not, go to step S71;
S71: use a DIFF tool to find the differences between the web page content and its original state;
S72: determine whether the difference is caused by a change in a picture; if so, go to step S8; if not, go to step S9;
S8: match the picture content against malicious content features to detect whether the picture contains abnormal content; if so, go to step S81; if not, go to step S82;
S81: issue a warning and end monitoring of the web page content;
S82: end monitoring of the web page content;
S9: match against a sensitive-word dictionary, and issue a warning if a sensitive word is matched.
Step S9 also includes matching against a Trojan signature library, and a warning is issued if a Trojan signature is matched.
In step S8, a picture recognition program is called to recognize the picture content, the picture content is matched against malicious content features, and it is detected whether the picture contains abnormal content; if so, go to step S81; otherwise go to step S82.
A webpage change monitoring system based on similarity calculation, comprising the following modules:
Initial acquisition module: stores web page content from the network on a local storage device using a web crawler, and calculates the fuzzy hash value of the web page content;
Judgment module: determines whether the web page content belongs to a first webpage type or a second webpage type and marks it accordingly, the first webpage type being web pages whose content does not change and the second webpage type being web pages whose content may change;
Real-time acquisition module: after a set time interval, crawls the web page content from the network again and calculates the fuzzy hash value of the web page content at that moment;
Computing module: calculates the similarity between the fuzzy hash value obtained in the real-time acquisition module and the fuzzy hash value obtained in the initial acquisition module, the similarity ranging from 0 to 100;
Webpage judgment module: determines the webpage type of the web page content; if the web page content belongs to the first webpage type, control passes to the first judgment module; if it belongs to the second webpage type, control passes to the second judgment module;
First judgment module: determines whether the similarity is 100; if so, monitoring of the web page content ends; if not, control passes to the first alert module;
First alert module: issues a warning and ends monitoring of the web page content;
Second judgment module: determines whether the similarity is 100; if so, control passes to the first termination module; if not, control passes to the difference analysis module;
First termination module: ends monitoring of the web page content;
Difference analysis module: uses a DIFF tool to find the differences between the web page content and its original state;
Third judgment module: determines whether the difference is caused by a change in a picture; if so, control passes to the first matching module; if not, control passes to the second matching module;
First matching module: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if so, control passes to the second alert module; if not, control passes to the second termination module;
Second alert module: issues a warning and ends monitoring of the web page content;
Second termination module: ends monitoring of the web page content;
Second matching module: matches against a sensitive-word dictionary, and issues a warning if a sensitive word is matched.
The second matching module also matches against a Trojan signature library, and issues a warning if a Trojan signature is matched.
In the third judgment module, a picture recognition program is called to recognize the picture content and to determine whether the difference is caused by a change in a picture; if so, control passes to the first matching module; if not, control passes to the second matching module.
Compared with the prior art, the above technical solution of the present invention has the following advantages.
In the webpage change monitoring method and system based on similarity calculation of the present invention, web page content is saved locally using web crawler technology and fetched again after a set time interval, and a fuzzy hash algorithm is used to compare the newly fetched content with the locally saved content for similarity. Web page content attributes can be user-defined; for web page content that does not change, the monitoring steps are simpler and monitoring efficiency is high. For web page content that may change, a further difference analysis identifies tampering with characters or pictures, so that it can be determined promptly and accurately whether the web page content has been tampered with or normally updated, which improves the security of web page content.
Brief description of the drawings
To make the content of the present invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which:
Fig. 1 is a flow chart of the webpage change monitoring method based on similarity calculation of the present invention;
Fig. 2 is a structural block diagram of the webpage change monitoring system based on similarity calculation of the present invention.
Reference numerals in the drawings: 1 - initial acquisition module; 2 - judgment module; 3 - real-time acquisition module; 4 - computing module; 5 - webpage judgment module; 6 - first judgment module; 61 - first alert module; 7 - second judgment module; 71 - first termination module; 72 - difference analysis module; 8 - third judgment module; 81 - first matching module; 82 - second matching module; 811 - second alert module; 812 - second termination module.
Detailed description of the embodiments
A webpage change monitoring method based on similarity calculation, as shown in Fig. 1, comprises the following steps:
S1: store web page content from the network on a local storage device using a web crawler, and calculate the fuzzy hash value of the web page content. The fuzzy hash value is calculated with a fuzzy hash algorithm, for which the ssdeep tool can be called. The fuzzy hash algorithm, also known as context triggered piecewise hashing (CTPH), is mainly used to compare files for similarity. In 2006, Jesse Kornblum proposed CTPH and provided an example algorithm named spamsum. Jason Sherman subsequently developed the ssdeep tool (http://ssdeep.sourceforge.net/). In the present invention the algorithm can be used for malicious code detection, and it can also be used for vulnerability discovery and the like. The main principle of fuzzy hashing is as follows: a weak hash is computed over local content of the file, and the file is split into pieces where specific conditions are met; a strong hash is then computed for each piece; a portion of each of these values is taken and concatenated, and together with the fragmentation condition this forms the fuzzy hash result. A string similarity comparison algorithm is then used to determine how similar two fuzzy hash values are, and thereby how similar the two files are. For local changes to a file (including modifications, additions and deletions in several places), fuzzy hashing can still find the similarity relationship to the source file, and it is currently one of the better methods for judging similarity.
S2: determine whether the web page content belongs to a first webpage type or a second webpage type and mark it accordingly, the first webpage type being web pages whose content does not change and the second webpage type being web pages whose content may change. The classification can be done manually, or the web page content can be classified using prior-art web page content identification and classification techniques (as recorded in, for example, Chinese patent documents 201210299843.7 and 201210376933.1).
S3: after a set time interval, crawl the web page content from the network again and calculate the fuzzy hash value of the web page content at that moment.
The fuzzy hash value in steps S1 and S3 is calculated as follows:
The file of the web page content is split into pieces using a weak hash algorithm. Specifically:
A portion of the file content is read and hashed with the weak hash algorithm Adler-32 in a rolling-hash manner, producing a 4-byte hash value. A rolling hash means that if, for example, the hash value h1 of abcdef has already been computed and the hash value of bcdefg is needed next, no complete recomputation is required; only h1 - X(a) + Y(g) is needed, where X and Y are two functions. In other words, only the effect of the removed and added bytes on the hash value has to be accounted for. A hash of this kind greatly speeds up the fragmentation decision.
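By way of illustration only, the rolling update h1 - X(a) + Y(g) can be sketched for an Adler-32 style checksum over a fixed-width window. The function names `window_hash` and `roll` and the window width are assumptions introduced for this sketch; the patent does not specify how X and Y are implemented.

```python
MOD = 65521  # largest prime below 2**16, the modulus used by Adler-32


def window_hash(window: bytes) -> int:
    """Adler-32 style hash of a whole window, computed from scratch."""
    a, b = 1, 0
    for byte in window:
        a = (a + byte) % MOD
        b = (b + a) % MOD
    return (b << 16) | a


def roll(h: int, width: int, outgoing: int, incoming: int) -> int:
    """Slide a window of `width` bytes by one position.

    `outgoing` is the byte leaving the window and `incoming` the byte
    entering it; only their contributions to the two sums are adjusted.
    """
    a, b = h & 0xFFFF, h >> 16
    new_a = (a - outgoing + incoming) % MOD
    new_b = (b - width * outgoing - 1 + new_a) % MOD
    return (new_b << 16) | new_a


h1 = window_hash(b"abcdef")
h2 = roll(h1, 6, ord("a"), ord("g"))  # hash of b"bcdefg" without rescanning
```

Each slide costs a constant number of operations regardless of the window width, which is why the fragmentation decision can be made at every byte position.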
A fragment value n is set, which controls the fragmentation condition. The value of n is determined according to the file size, the file content and so on. The principle and method for determining it are as follows:
The value of n is always an integer power of 2, so that the remainder of the Adler-32 hash value divided by n is approximately uniformly distributed. A piece boundary is placed only when the remainder equals n-1, which means a boundary occurs in only about 1/n of cases. That is, each time the window moves over the file, there is a probability of about 1/n that a boundary is placed. If a given pass produces too few pieces, n is reduced so that a boundary becomes more likely at each position and the number of pieces increases. If it produces too many pieces, n is increased so that a boundary becomes less likely at each position and the number of pieces decreases. Each adjustment divides or multiplies n by 2, so that the final number of pieces falls as far as possible between 32 and 64.
Because the probability of a boundary is approximately 1/n, the first value of n tried on each run of ssdeep is a value close to file size/64.
When the remainder of the Adler-32 hash value divided by n is exactly n-1, a boundary is placed at the current position; otherwise no boundary is placed, the window rolls forward by one byte, and the Adler-32 hash value is computed and tested again, and so on.
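The choice and adjustment of n described above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names, the lower bound of 2 for n, and the guard against oscillating between two values are all introduced here and are not taken from the patent or from ssdeep.

```python
def initial_block_size(file_size: int, target_pieces: int = 64, min_n: int = 2) -> int:
    """Largest power of two not exceeding file_size / target_pieces (at least min_n)."""
    n = min_n
    while n * 2 * target_pieces <= file_size:
        n *= 2
    return n


def count_pieces(weak_hashes, n: int) -> int:
    """Number of pieces when a boundary is placed wherever hash % n == n - 1."""
    boundaries = sum(1 for h in weak_hashes if h % n == n - 1)
    return boundaries + 1  # the trailing piece after the last boundary


def tune_block_size(weak_hashes, n: int, low: int = 32, high: int = 64, min_n: int = 2) -> int:
    """Halve or double n until the piece count falls between low and high."""
    seen = set()
    while n not in seen:  # stop if the search starts revisiting values
        seen.add(n)
        pieces = count_pieces(weak_hashes, n)
        if pieces < low and n > min_n:
            n //= 2   # too few pieces: make boundaries more likely
        elif pieces > high:
            n *= 2    # too many pieces: make boundaries less likely
        else:
            break
    return n
```

Here `weak_hashes` stands for the sequence of rolling hash values, one per window position; in a complete implementation the file would be rescanned for each candidate n.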
A strong hash algorithm is used to calculate a hash value for each piece obtained in S101 (the fragmentation step above). The Fowler-Noll-Vo hash algorithm can be used.
The hash values are compressed. After a hash value has been calculated for each file piece, the result can optionally be shortened. Specifically, the lowest 6 bits of the hash result are taken and represented by one ASCII character, which serves as the final hash value of that piece.
The hash values are concatenated. The compressed hash values of all pieces are joined together to give the fuzzy hash value of the file. If the fragment value n differs between files, n should also be included in the fuzzy hash value; in practice n is simply appended to the end of the hash value as part of it.
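The fragmentation, per-piece hashing, compression and concatenation described above can be put together in a short sketch. It is an illustration under assumptions rather than the ssdeep implementation: the window width of 7 bytes, the base64 alphabet used to print 6 bits, the `:` separator before n, and the function names are all choices made here. For brevity the weak hash is recomputed for every window with `zlib.adler32` instead of being rolled.

```python
import zlib

# 64 printable characters, one per possible 6-bit value (assumed alphabet)
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # bytes covered by the weak hash (assumed value)


def fnv1_32(data: bytes) -> int:
    """32-bit FNV-1 hash, used here as the strong per-piece hash."""
    h = 0x811C9DC5
    for byte in data:
        h = (h * 0x01000193) & 0xFFFFFFFF
        h ^= byte
    return h


def fuzzy_hash(data: bytes, n: int) -> str:
    """Split `data` where adler32(window) % n == n - 1 and hash each piece."""
    chars = []
    start = 0
    for end in range(1, len(data) + 1):
        window = data[max(0, end - WINDOW):end]
        if zlib.adler32(window) % n == n - 1:
            # close the current piece: keep the lowest 6 bits as one character
            chars.append(B64[fnv1_32(data[start:end]) & 0x3F])
            start = end
    if start < len(data):
        chars.append(B64[fnv1_32(data[start:]) & 0x3F])  # trailing piece
    return "".join(chars) + ":" + str(n)  # n appended to the end of the hash
```

Because boundaries depend only on the bytes inside the window, an edit near the end of the file leaves the characters produced by earlier pieces unchanged, which is what makes the resulting strings comparable.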
S4: calculate the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1, the similarity ranging from 0 to 100. The similarity in step S4 is calculated as follows: the fuzzy hash values of the web page content are character strings, denoted s1 and s2. The weighted edit distance from s1 to s2 is used as the basis for evaluating their similarity. The weighted edit distance is obtained by first determining the minimum number of operations (insertions, deletions and modifications) needed to turn s1 into s2, and then assigning a weight to each kind of operation. The weights for insertion, deletion and modification are set to 0.2, 0.3 and 0.5 respectively. Finally, the weighted results are summed to give the weighted edit distance.
This distance is divided by the sum of the lengths of s1 and s2, which turns the absolute result into a relative one, and the relative result is then mapped to an integer from 0 to 100, where 100 means the two strings are identical and 0 means they are completely dissimilar. The result can be used to judge how similar the two versions of the web page content are.
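A sketch of the similarity calculation, assuming a standard dynamic-programming edit distance that minimizes the total weighted cost directly. The description does not give the formula that maps the relative distance onto 0-100, so the linear mapping below, scaled by `MAX_RATIO`, is an assumption made for this example, as are the function names.

```python
W_INSERT, W_DELETE, W_MODIFY = 0.2, 0.3, 0.5  # weights given in the description
MAX_RATIO = max(W_INSERT, W_DELETE)  # assumed upper bound of distance / (len1 + len2)


def weighted_edit_distance(s1: str, s2: str) -> float:
    """Minimum total weight of insertions, deletions and modifications turning s1 into s2."""
    prev = [j * W_INSERT for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, start=1):
        curr = [i * W_DELETE]
        for j, c2 in enumerate(s2, start=1):
            modify = prev[j - 1] + (0.0 if c1 == c2 else W_MODIFY)
            curr.append(min(prev[j] + W_DELETE, curr[j - 1] + W_INSERT, modify))
        prev = curr
    return prev[-1]


def similarity(s1: str, s2: str) -> int:
    """Map the relative weighted distance to an integer in 0..100 (100 = identical)."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100
    ratio = weighted_edit_distance(s1, s2) / total
    return max(0, min(100, round(100 * (1 - ratio / MAX_RATIO))))
```

With these weights a modification costs the same as a deletion followed by an insertion, so the relative distance never exceeds 0.3; this is why the sketch scales by `MAX_RATIO` before mapping to 0-100.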
S5: determine the webpage type of the web page content; if the web page content belongs to the first webpage type, go to step S6; if it belongs to the second webpage type, go to step S7.
S6: determine whether the similarity is 100; if so, go to step S61; if not, go to step S62.
S61: end monitoring of the web page content.
S62: issue a warning and end monitoring of the web page content.
S7: determine whether the similarity is 100; if so, end monitoring of the web page content; if not, go to step S71.
S71: use a DIFF tool to find the differences between the web page content and its original state.
S72: determine whether the difference is caused by a change in a picture; if so, go to step S8; if not, go to step S9.
S8: match the picture content against malicious content features to detect whether the picture contains abnormal content; if so, go to step S81; if not, go to step S82.
S81: issue a warning and end monitoring of the web page content.
S82: end monitoring of the web page content.
S9: match against a sensitive-word dictionary, and issue a warning if a sensitive word is matched. If the changed part is a character string, it is matched against a preset sensitive-word dictionary using regular expressions, and an alert is raised if a sensitive word is matched.
Step S9 also includes matching against a Trojan signature library, and a warning is issued if a Trojan signature is matched. This matching is likewise performed against a preset Trojan signature library using regular expressions.
In step S8, a picture recognition program is called to recognize the picture content, the picture content is matched against malicious content features, and it is detected whether the picture contains abnormal content; if so, go to step S81; otherwise go to step S82.
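The decision flow of steps S5 to S9 can be sketched as a single function. Everything specific in this sketch is hypothetical: the two example patterns stand in for the preset sensitive-word dictionary and Trojan signature library, the `<img` test stands in for deciding whether a difference is caused by a picture, and `image_is_malicious` is a placeholder callback for the picture recognition program. Python's `difflib` is used in place of an external DIFF tool, and only added or replaced lines are inspected.

```python
import difflib
import re

# Hypothetical example patterns; a deployment would load its own libraries.
SENSITIVE_PATTERNS = [re.compile(r"hacked\s+by", re.IGNORECASE)]
TROJAN_PATTERNS = [re.compile(r"<iframe[^>]+width=[\"']?0", re.IGNORECASE)]
IMG_TAG = re.compile(r"<img\b", re.IGNORECASE)


def changed_lines(old: str, new: str) -> list:
    """Lines of `new` that were inserted or replaced relative to `old` (S71)."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            changed.extend(new_lines[j1:j2])
    return changed


def check_page(page_type, score, old, new, image_is_malicious=lambda line: False):
    """Return "warn" or "ok" for one monitoring round.

    page_type is "static" (first webpage type) or "dynamic" (second type).
    """
    if score == 100:
        return "ok"                      # S61 / S7: nothing changed
    if page_type == "static":
        return "warn"                    # S62: static content must not change
    for line in changed_lines(old, new):
        if IMG_TAG.search(line):         # S72 -> S8: picture change
            if image_is_malicious(line):
                return "warn"            # S81
        elif any(p.search(line) for p in SENSITIVE_PATTERNS + TROJAN_PATTERNS):
            return "warn"                # S9: sensitive word or Trojan signature
    return "ok"                          # S82: treated as a normal update
```

For example, `check_page("dynamic", 90, "<p>news</p>", "<p>Hacked by X</p>")` returns "warn" because the changed line matches the example sensitive-word pattern, while an ordinary text update on a dynamic page returns "ok".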
A webpage change monitoring system based on similarity calculation, comprising the following modules:
Initial acquisition module 1: stores web page content from the network on a local storage device using a web crawler, and calculates the fuzzy hash value of the web page content. The fuzzy hash value is calculated with a fuzzy hash algorithm, for which the ssdeep tool can be called.
Judgment module 2: determines whether the web page content belongs to a first webpage type or a second webpage type and marks it accordingly, the first webpage type being web pages whose content does not change and the second webpage type being web pages whose content may change. The classification can be done manually, or the web page content can be classified using prior-art web page content identification and classification techniques (as recorded in, for example, Chinese patent documents 201210299843.7 and 201210376933.1).
Real-time acquisition module 3: after a set time interval, crawls the web page content from the network again and calculates the fuzzy hash value of the web page content at that moment.
The fuzzy hash value in initial acquisition module 1 and real-time acquisition module 3 is calculated as follows:
The file of the web page content is split into pieces using a weak hash algorithm. Specifically:
A portion of the file content is read and hashed with the weak hash algorithm Adler-32 in a rolling-hash manner, producing a 4-byte hash value. A rolling hash means that if, for example, the hash value h1 of abcdef has already been computed and the hash value of bcdefg is needed next, no complete recomputation is required; only h1 - X(a) + Y(g) is needed, where X and Y are two functions. In other words, only the effect of the removed and added bytes on the hash value has to be accounted for. A hash of this kind greatly speeds up the fragmentation decision.
A fragment value n is set, which controls the fragmentation condition. The value of n is determined according to the file size, the file content and so on. The principle and method for determining it are as follows:
The value of n is always an integer power of 2, so that the remainder of the Adler-32 hash value divided by n is approximately uniformly distributed. A piece boundary is placed only when the remainder equals n-1, which means a boundary occurs in only about 1/n of cases. That is, each time the window moves over the file, there is a probability of about 1/n that a boundary is placed. If a given pass produces too few pieces, n is reduced so that a boundary becomes more likely at each position and the number of pieces increases. If it produces too many pieces, n is increased so that a boundary becomes less likely at each position and the number of pieces decreases. Each adjustment divides or multiplies n by 2, so that the final number of pieces falls as far as possible between 32 and 64.
Because the probability of a boundary is approximately 1/n, the first value of n tried on each run of ssdeep is a value close to file size/64.
When the remainder of the Adler-32 hash value divided by n is exactly n-1, a boundary is placed at the current position; otherwise no boundary is placed, the window rolls forward by one byte, and the Adler-32 hash value is computed and tested again, and so on.
A strong hash algorithm is used to calculate a hash value for each piece obtained in S101 (the fragmentation step above). The Fowler-Noll-Vo hash algorithm can be used.
The hash values are compressed. After a hash value has been calculated for each file piece, the result can optionally be shortened. Specifically, the lowest 6 bits of the hash result are taken and represented by one ASCII character, which serves as the final hash value of that piece.
The hash values are concatenated. The compressed hash values of all pieces are joined together to give the fuzzy hash value of the file. If the fragment value n differs between files, n should also be included in the fuzzy hash value; in practice n is simply appended to the end of the hash value as part of it.
Computing module 4: calculates the similarity between the fuzzy hash value obtained in the real-time acquisition module and the fuzzy hash value obtained in the initial acquisition module, the similarity ranging from 0 to 100. The similarity in computing module 4 is calculated as follows: the fuzzy hash values of the web page content are character strings, denoted s1 and s2. The weighted edit distance from s1 to s2 is used as the basis for evaluating their similarity. The weighted edit distance is obtained by first determining the minimum number of operations (insertions, deletions and modifications) needed to turn s1 into s2, and then assigning a weight to each kind of operation. The weights for insertion, deletion and modification are set to 0.2, 0.3 and 0.5 respectively. Finally, the weighted results are summed to give the weighted edit distance.
Webpage judgment module 5: determines the webpage type of the web page content; if the web page content belongs to the first webpage type, control passes to first judgment module 6; if it belongs to the second webpage type, control passes to second judgment module 7.
First judgment module 6: determines whether the similarity is 100; if so, monitoring of the web page content ends; if not, control passes to first alert module 61.
First alert module 61: issues a warning and ends monitoring of the web page content.
Second judgment module 7: determines whether the similarity is 100; if so, control passes to first termination module 71; if not, control passes to difference analysis module 72.
First termination module 71: ends monitoring of the web page content.
Difference analysis module 72: uses a DIFF tool to find the differences between the web page content and its original state.
Third judgment module 8: determines whether the difference is caused by a change in a picture; if so, control passes to first matching module 81; if not, control passes to second matching module 82.
First matching module 81: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if so, control passes to second alert module 811; if not, control passes to second termination module 812.
Second alert module 811: issues a warning and ends monitoring of the web page content.
Second termination module 812: ends monitoring of the web page content.
Second matching module 82: matches against a sensitive-word dictionary, and issues a warning if a sensitive word is matched.
Second matching module 82 also matches against a Trojan signature library, and issues a warning if a Trojan signature is matched.
In third judgment module 8, a picture recognition program is called to recognize the picture content and to determine whether the difference is caused by a change in a picture; if so, control passes to first matching module 81; if not, control passes to second matching module 82.
In the webpage change monitoring method and system based on similarity calculation of the present invention, web page content is saved locally using web crawler technology and fetched again after a set time interval, and a fuzzy hash algorithm is used to compare the newly fetched content with the locally saved content for similarity. Web page content attributes can be user-defined; for web page content that does not change, the monitoring steps are simpler and monitoring efficiency is high. For web page content that may change, a further difference analysis identifies tampering with characters or pictures, so that it can be determined promptly and accurately whether the web page content has been tampered with or normally updated, which improves the security of web page content.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the possible implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or variations derived from this description remain within the scope of protection of the invention.
Claims (4)
1. A webpage change monitoring method based on similarity calculation, characterized in that it comprises the following steps:
S1: storing webpage content from the network on a local storage device by using a web crawler, and calculating the fuzzy hash value of the webpage content;
S2: determining whether the webpage content belongs to a first webpage type or a second webpage type, and marking it accordingly, wherein the first webpage type is a webpage whose content does not change and the second webpage type is a webpage whose content may change;
S3: crawling the webpage content from the network again after a set time interval, and calculating the fuzzy hash value of the webpage content at that moment;
S4: calculating the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1, the similarity taking a value in the range 0-100;
S5: determining the webpage type to which the webpage content belongs; if the webpage content belongs to the first webpage type, proceeding to step S6; if the webpage content belongs to the second webpage type, proceeding to step S7;
S6: determining whether the similarity value is 100; if yes, proceeding to step S61; if no, proceeding to step S62;
S61: ending the monitoring of the webpage content;
S62: issuing a warning, and ending the monitoring of the webpage content;
S7: determining whether the similarity value is 100; if yes, ending the monitoring of the webpage content; if no, proceeding to step S71;
S71: finding the differences between the webpage content and its original state by using a DIFF tool;
S72: determining whether the difference is caused by a picture change; if yes, proceeding to step S8; if no, proceeding to step S9;
S8: matching the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, proceeding to step S81; if no, proceeding to step S82;
S81: issuing a warning, and ending the monitoring of the webpage content;
S82: ending the monitoring of the webpage content;
S9: matching against a sensitive-word dictionary, and issuing a warning if a sensitive word is matched;
wherein, in step S8, a picture recognizer is called to identify the picture content, and the picture content is matched against the malicious content features to detect whether the picture contains abnormal content; if yes, proceeding to step S81; otherwise, proceeding to step S82.
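The branching in steps S5 to S9 can be summarized as a small decision function. The following Python sketch is illustrative only and is not part of the claims: the names `PageType` and `decide`, the boolean inputs, and the returned reason strings are all assumptions. It also assumes that no warning is issued in step S9 when no sensitive word is matched, which the claim does not state explicitly.

```python
from enum import Enum


class PageType(Enum):
    STATIC = 1   # "first webpage type": content is not expected to change
    DYNAMIC = 2  # "second webpage type": content may change


def decide(page_type, similarity, diff_from_image=False,
           image_abnormal=False, sensitive_hit=False):
    """Return (warn, reason) following steps S5 to S9 of claim 1.

    The three boolean flags stand for the outcomes of S72, S8 and S9;
    how they are computed is outside the scope of this sketch.
    """
    if not 0 <= similarity <= 100:
        raise ValueError("similarity must be in the range 0-100")
    if page_type is PageType.STATIC:
        # S6: a static page must be fully identical, otherwise warn (S62).
        if similarity == 100:
            return (False, "S61: unchanged")
        return (True, "S62: static page changed")
    # S7: an identical dynamic page needs no further analysis.
    if similarity == 100:
        return (False, "S7: unchanged")
    # S72: the difference found in S71 is attributed to a picture or to text.
    if diff_from_image:
        # S8: picture content is matched against malicious-content features.
        if image_abnormal:
            return (True, "S81: abnormal picture content")
        return (False, "S82: picture change is benign")
    # S9: text differences are matched against the sensitive-word dictionary.
    if sensitive_hit:
        return (True, "S9: sensitive word matched")
    return (False, "S9: no sensitive word matched")  # assumed outcome
```

For example, `decide(PageType.STATIC, 97)` yields a warning, while `decide(PageType.DYNAMIC, 97)` yields a warning only if one of the content checks flags the change.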
2. The webpage change monitoring method based on similarity calculation according to claim 1, characterized in that step S9 further comprises matching against a Trojan signature library, and issuing a warning if a Trojan signature is matched.
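The text path of steps S71 and S9, including the Trojan signature matching of claim 2, might look roughly like the sketch below. It is a minimal illustration under stated assumptions: Python's standard `difflib` stands in for the unspecified DIFF tool, plain substring matching stands in for the unspecified matching method, and the word and signature lists are invented placeholders rather than anything given in the patent.

```python
import difflib

# Hypothetical feature libraries; a real deployment would load curated lists.
SENSITIVE_WORDS = ["casino", "lottery"]
TROJAN_SIGNATURES = ["<iframe src=", "eval(unescape("]


def added_lines(old_text, new_text):
    """Return lines present in the new snapshot but not in the old one (S71)."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes added lines with "+ "; strip that two-character prefix.
    return [line[2:] for line in diff if line.startswith("+ ")]


def match_features(lines, features):
    """Return the features that occur, case-insensitively, in any line."""
    text = "\n".join(lines).lower()
    return [f for f in features if f.lower() in text]


def check_text_change(old_text, new_text):
    """Return a list of (kind, hits) warnings for newly added text (S9)."""
    changed = added_lines(old_text, new_text)
    warnings = []
    hits = match_features(changed, SENSITIVE_WORDS)
    if hits:
        warnings.append(("sensitive word", hits))
    hits = match_features(changed, TROJAN_SIGNATURES)
    if hits:
        warnings.append(("Trojan signature", hits))
    return warnings
```

Only added lines are inspected here, on the assumption that tampering shows up as inserted content; a stricter variant could also examine removed lines.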
3. A webpage change monitoring system based on similarity calculation, characterized in that it comprises the following modules:
Initial acquisition module: stores webpage content from the network on a local storage device by using a web crawler, and calculates the fuzzy hash value of the webpage content;
Judgment module: determines whether the webpage content belongs to a first webpage type or a second webpage type, and marks it accordingly, wherein the first webpage type is a webpage whose content does not change and the second webpage type is a webpage whose content may change;
Real-time acquisition module: crawls the webpage content from the network again after a set time interval, and calculates the fuzzy hash value of the webpage content at that moment;
Computing module: calculates the similarity between the fuzzy hash value obtained by the real-time acquisition module and the fuzzy hash value obtained by the initial acquisition module, the similarity taking a value in the range 0-100;
Webpage judgment module: determines the webpage type to which the webpage content belongs; if the webpage content belongs to the first webpage type, control passes to the first judgment module; if the webpage content belongs to the second webpage type, control passes to the second judgment module;
First judgment module: determines whether the similarity value is 100; if yes, ends the monitoring of the webpage content; if no, control passes to the first alert module;
First alert module: issues a warning, and ends the monitoring of the webpage content;
Second judgment module: determines whether the similarity value is 100; if yes, control passes to the first termination module; if no, control passes to the difference analysis module;
First termination module: ends the monitoring of the webpage content;
Difference analysis module: finds the differences between the webpage content and its original state by using a DIFF tool, then passes control to the third judgment module;
Third judgment module: determines whether the difference is caused by a picture change; if yes, control passes to the first matching module; if no, control passes to the second matching module;
First matching module: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, control passes to the second alert module; if no, control passes to the second termination module;
Second alert module: issues a warning, and ends the monitoring of the webpage content;
Second termination module: ends the monitoring of the webpage content;
Second matching module: matches against a sensitive-word dictionary, and issues a warning if a sensitive word is matched;
wherein the third judgment module calls a picture recognizer to identify the picture content and determines whether the difference is caused by a picture change; if yes, control passes to the first matching module; if no, control passes to the second matching module.
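The initial acquisition, real-time acquisition, and computing modules of claim 3 can be sketched as follows. This is a hedged illustration, not the patented implementation: the claims compute and compare fuzzy hash values, whereas this sketch substitutes a direct text comparison from Python's standard `difflib` so that it runs without third-party libraries. The class `PageMonitor`, the injected `fetch` callable, and the rule of reserving 100 for identical content are assumptions.

```python
from difflib import SequenceMatcher


def similarity_score(old_content, new_content):
    """Return an integer similarity in the range 0-100.

    Stand-in only: the claims compare fuzzy hashes of the two snapshots,
    whereas this function compares the raw text directly.
    """
    if old_content == new_content:
        return 100
    matcher = SequenceMatcher(None, old_content, new_content, autojunk=False)
    # Cap at 99 so that 100 is reserved for identical content.
    return min(99, int(matcher.ratio() * 100))


class PageMonitor:
    """Keeps a baseline snapshot per URL and scores later snapshots."""

    def __init__(self, fetch):
        # fetch is any callable mapping a URL to page text (a crawler stand-in).
        self.fetch = fetch
        self.baseline = {}

    def acquire_initial(self, url):
        """Initial acquisition module: store the first snapshot locally."""
        self.baseline[url] = self.fetch(url)

    def acquire_again(self, url):
        """Real-time acquisition plus computing module: re-fetch and score."""
        return similarity_score(self.baseline[url], self.fetch(url))
```

Injecting `fetch` keeps the sketch independent of any particular crawler and makes the scoring step easy to exercise with fixed strings; scheduling the re-crawl after the set time interval is left to the caller.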
4. The webpage change monitoring system based on similarity calculation according to claim 3, characterized in that the second matching module further matches against a Trojan signature library, and issues a warning if a Trojan signature is matched.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611182671.XA CN106599242B (en) | 2016-12-20 | 2016-12-20 | A kind of webpage change monitoring method and system based on similarity calculation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611182671.XA CN106599242B (en) | 2016-12-20 | 2016-12-20 | A kind of webpage change monitoring method and system based on similarity calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599242A (en) | 2017-04-26 |
CN106599242B (en) | 2019-03-26 |
Family
ID=58600081
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611182671.XA Active CN106599242B (en) | 2016-12-20 | 2016-12-20 | A kind of webpage change monitoring method and system based on similarity calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599242B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301355B (en) * | 2017-06-20 | 2021-07-02 | 深信服科技股份有限公司 | Webpage tampering monitoring method and device |
CN107612908B (en) * | 2017-09-15 | 2020-06-05 | 杭州安恒信息技术股份有限公司 | Webpage tampering monitoring method and device |
CN108021692B (en) * | 2017-12-18 | 2022-03-11 | 北京天融信网络安全技术有限公司 | Method for monitoring webpage, server and computer readable storage medium |
CN108540466A (en) * | 2018-03-31 | 2018-09-14 | 甘肃万维信息技术有限责任公司 | Based on webpage tamper monitoring and alarming system |
CN108595583B (en) * | 2018-04-18 | 2022-12-02 | 平安科技(深圳)有限公司 | Dynamic graph page data crawling method, device, terminal and storage medium |
CN108809943B (en) * | 2018-05-14 | 2021-05-14 | 苏州闻道网络科技股份有限公司 | Website monitoring method and device |
CN109241779A (en) * | 2018-08-27 | 2019-01-18 | 浙江每日互动网络科技股份有限公司 | A method of the detection page is distorted |
CN109495471B (en) * | 2018-11-15 | 2021-07-02 | 东信和平科技股份有限公司 | Method, device and equipment for judging WEB attack result and readable storage medium |
CN109740094A (en) * | 2018-12-27 | 2019-05-10 | 上海掌门科技有限公司 | Page monitoring method, equipment and computer storage medium |
CN110034921B (en) * | 2019-04-18 | 2022-04-15 | 成都信息工程大学 | Webshell detection method based on weighted fuzzy hash |
CN110598478A (en) * | 2019-09-19 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Block chain based evidence verification method, device, equipment and storage medium |
CN110659439A (en) * | 2019-09-23 | 2020-01-07 | 杭州迪普科技股份有限公司 | Black chain protection method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682098A (en) * | 2012-04-27 | 2012-09-19 | 北京神州绿盟信息安全科技股份有限公司 | Method and device for detecting web page content changes |
CN102779245A (en) * | 2011-05-12 | 2012-11-14 | 李朝荣 | Webpage abnormality detection method based on image processing technology |
CN103279475A (en) * | 2013-04-11 | 2013-09-04 | 广东电网公司信息中心 | Detection method and system for WEB application system content change |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102571791B (en) * | 2011-12-31 | 2015-03-25 | 奇智软件(北京)有限公司 | Method and system for analyzing tampering of Web page contents |
CN105678193B (en) * | 2016-01-06 | 2018-08-14 | 杭州数梦工场科技有限公司 | A kind of anti-tamper treating method and apparatus |
2016-12-20: CN application CN201611182671.XA filed; patent CN106599242B, status Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102779245A (en) * | 2011-05-12 | 2012-11-14 | 李朝荣 | Webpage abnormality detection method based on image processing technology |
CN102682098A (en) * | 2012-04-27 | 2012-09-19 | 北京神州绿盟信息安全科技股份有限公司 | Method and device for detecting web page content changes |
CN103279475A (en) * | 2013-04-11 | 2013-09-04 | 广东电网公司信息中心 | Detection method and system for WEB application system content change |
Also Published As
Publication number | Publication date |
---|---|
CN106599242A (en) | 2017-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106599242B (en) | A kind of webpage change monitoring method and system based on similarity calculation | |
US9215246B2 (en) | Website scanning device and method | |
KR101162051B1 (en) | Using string comparison malicious code detection and classification system and method | |
US10474818B1 (en) | Methods and devices for detection of malware | |
CN108985057B (en) | Webshell detection method and related equipment | |
CN103077250B (en) | A kind of capturing webpage contents method and device | |
JP7120350B2 (en) | SECURITY INFORMATION ANALYSIS METHOD, SECURITY INFORMATION ANALYSIS SYSTEM AND PROGRAM | |
US9355250B2 (en) | Method and system for rapidly scanning files | |
JP5254443B2 (en) | Surveillance method used for communication system images or multimedia video images | |
CN102779245A (en) | Webpage abnormality detection method based on image processing technology | |
CN112148305A (en) | Application detection method and device, computer equipment and readable storage medium | |
CN112532624A (en) | Black chain detection method and device, electronic equipment and readable storage medium | |
US9613271B2 (en) | Determining severity of a geomagnetic disturbance on a power grid using similarity measures | |
CN104036190A (en) | Method and device for detecting page tampering | |
CN104036189A (en) | Page distortion detecting method and black link database generating method | |
CN108363711B (en) | Method and device for detecting dark chain in webpage | |
CN113535813A (en) | Data mining method and device, electronic equipment and storage medium | |
CN111460448A (en) | Malicious software family detection method and device | |
CN111104674A (en) | Power firmware homologous binary file association method and system | |
CN113850297B (en) | Road data monitoring method and device, electronic equipment and storage medium | |
JP7140268B2 (en) | WARNING DEVICE, CONTROL METHOD AND PROGRAM | |
CN114722806A (en) | Text processing method, device and equipment | |
JP6749865B2 (en) | INFORMATION COLLECTION DEVICE AND INFORMATION COLLECTION METHOD | |
CN111339453A (en) | Navigation page distinguishing method and device | |
CN111488621A (en) | Method and system for detecting falsified webpage, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||