CN106599242B - A kind of webpage change monitoring method and system based on similarity calculation - Google Patents

Publication number
CN106599242B
Authority
CN
China
Prior art keywords
web page
page contents
webpage
module
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611182671.XA
Other languages
Chinese (zh)
Other versions
CN106599242A (en)
Inventor
刘坤朋
郑杭
练军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FUJIAN LIUREN NETWORK SECURITY Co Ltd
Original Assignee
FUJIAN LIUREN NETWORK SECURITY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIAN LIUREN NETWORK SECURITY Co Ltd
Priority to CN201611182671.XA
Publication of CN106599242A
Application granted
Publication of CN106599242B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements

Abstract

The invention provides a webpage change monitoring method and system based on similarity calculation. Web page content is saved locally using web crawler technology and fetched again after a set time interval; a fuzzy hash algorithm is then used to compare the newly fetched content with the locally saved content for similarity. Web page content attributes can be customized: for pages whose content will not change, the monitoring steps are simpler and monitoring efficiency is high. For pages whose content may change, a further difference analysis identifies tampering with text or pictures, so that it can be determined promptly and accurately whether the web page content has been tampered with or normally updated, which improves the security of web page content.

Description

A kind of webpage change monitoring method and system based on similarity calculation
Technical field
The present invention relates to web page information monitoring technology, and in particular to a webpage change monitoring method and system based on similarity calculation.
Background technique
A key requirement for ensuring that users can browse web pages normally is to prevent the web pages (pages) published by a website from being tampered with by hackers. Tampering, as distinct from legitimate modification (refreshing) of web page content, refers to a change in web page content that does not meet the expectations of the website administrator or of the user requesting the page. With the explosive growth of information on the Internet, web pages face the risk of being tampered with every day. If tampering is not discovered in time, it can bring immeasurable losses to the website and its users.
Web pages are mainly tampered with in the following way: a hacker may break into a website and directly modify the published web page content. Prior-art schemes for detecting tampering are as follows. A scanner periodically monitors the website: scanner software is installed and periodically collects and accesses the URLs (Uniform Resource Locators) of the monitored pages; a baseline page is set according to some algorithm; each monitored page is compared with the baseline page to obtain the ratio of modified page elements to all page elements of that page; and this ratio is compared with a preset ratio threshold. If the ratio is below the threshold, the monitored page is considered not tampered with; otherwise it is considered tampered with. Alternatively, certain sensitive words are preset, and a page is considered tampered with by a hacker when it is found to contain such a word. Because existing websites make heavy use of dynamic web page technology, these schemes have difficulty distinguishing accurately between tampering and a normal content refresh, so false positives and missed detections are unavoidable.
Summary of the invention
To this end, the technical problem to be solved by the invention is that prior-art real-time web page monitoring cannot accurately identify whether a web page has been tampered with or normally updated.
To solve the above technical problem, the invention adopts the following technical solution:
A webpage change monitoring method based on similarity calculation, comprising the following steps:
S1: store web page content from the network on a local storage device using a web crawler, and calculate the fuzzy hash value of the web page content;
S2: determine whether the web page content belongs to a first web page type or a second web page type, and mark it accordingly, where the first web page type is a web page whose content will not change and the second web page type is a web page whose content may change;
S3: after a set time interval, crawl the web page content from the network again and calculate the fuzzy hash value of the web page content at that moment;
S4: calculate the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1, the similarity ranging from 0 to 100;
S5: determine the web page type of the web page content; if it belongs to the first web page type, go to step S6; if it belongs to the second web page type, go to step S7;
S6: determine whether the similarity is 100; if yes, go to step S61; if no, go to step S62;
S61: end the monitoring of the web page content;
S62: issue a warning and end the monitoring of the web page content;
S7: determine whether the similarity is 100; if yes, end the monitoring of the web page content; if no, go to step S71;
S71: use a DIFF tool to find the differences between the web page content and its original state;
S72: determine whether the difference is caused by a picture change; if yes, go to step S8; if no, go to step S9;
S8: match the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; if no, go to step S82;
S81: issue a warning and end the monitoring of the web page content;
S82: end the monitoring of the web page content;
S9: match against a sensitive word dictionary; if a sensitive word is matched, issue a warning.
Step S9 also includes matching against a Trojan signature library; if a Trojan signature is matched, a warning is issued.
In step S8, a picture recognition program is called to recognize the picture content, and the picture content is matched against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; otherwise go to step S82.
A webpage change monitoring system based on similarity calculation, comprising the following modules:
Initial acquisition module: stores web page content from the network on a local storage device using a web crawler, and calculates the fuzzy hash value of the web page content;
Judgment module: determines whether the web page content belongs to a first web page type or a second web page type, and marks it accordingly, where the first web page type is a web page whose content will not change and the second web page type is a web page whose content may change;
Real-time acquisition module: after a set time interval, crawls the web page content from the network again and calculates the fuzzy hash value of the web page content at that moment;
Computing module: calculates the similarity between the fuzzy hash value obtained in the real-time acquisition module and the fuzzy hash value obtained in the initial acquisition module, the similarity ranging from 0 to 100;
Web page judgment module: determines the web page type of the web page content; if it belongs to the first web page type, control passes to the first judgment module; if it belongs to the second web page type, control passes to the second judgment module;
First judgment module: determines whether the similarity is 100; if yes, ends the monitoring of the web page content; if no, control passes to the first warning module;
First warning module: issues a warning and ends the monitoring of the web page content;
Second judgment module: determines whether the similarity is 100; if yes, control passes to the first termination module; if no, control passes to the difference analysis module;
First termination module: ends the monitoring of the web page content;
Difference analysis module: uses a DIFF tool to find the differences between the web page content and its original state;
Third judgment module: determines whether the difference is caused by a picture change; if yes, control passes to the first matching module; if no, control passes to the second matching module;
First matching module: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, control passes to the second warning module; if no, control passes to the second termination module;
Second warning module: issues a warning and ends the monitoring of the web page content;
Second termination module: ends the monitoring of the web page content;
Second matching module: matches against a sensitive word dictionary; if a sensitive word is matched, issues a warning.
The second matching module also matches against a Trojan signature library; if a Trojan signature is matched, a warning is issued.
In the third judgment module, a picture recognition program is called to recognize the picture content and to determine whether the difference is caused by a picture change; if yes, control passes to the first matching module; if no, control passes to the second matching module.
Compared with the prior art, the above technical solution of the invention has the following advantages.
In the webpage change monitoring method and system based on similarity calculation of the invention, web page content is saved locally using web crawler technology and fetched again after a set time interval, and a fuzzy hash algorithm is used to compare the newly fetched content with the locally saved content for similarity. Web page content attributes can be customized: for pages whose content will not change, the monitoring steps are simpler and monitoring efficiency is high. For pages whose content may change, a further difference analysis identifies tampering with text or pictures, so that it can be determined promptly and accurately whether the web page content has been tampered with or normally updated, which improves the security of web page content.
Brief description of the drawings
To make the content of the invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which:
Fig. 1 is a flowchart of the webpage change monitoring method based on similarity calculation of the invention;
Fig. 2 is a structural block diagram of the webpage change monitoring system based on similarity calculation of the invention.
Reference numerals in the figures: 1 - initial acquisition module; 2 - judgment module; 3 - real-time acquisition module; 4 - computing module; 5 - web page judgment module; 6 - first judgment module; 61 - first warning module; 7 - second judgment module; 71 - first termination module; 72 - difference analysis module; 8 - third judgment module; 81 - first matching module; 82 - second matching module; 811 - second warning module; 812 - second termination module.
Detailed description of the embodiments
A webpage change monitoring method based on similarity calculation, as shown in Fig. 1, comprises the following steps:
S1: store web page content from the network on a local storage device using a web crawler, and calculate the fuzzy hash value of the web page content. The fuzzy hash value is computed with a fuzzy hash algorithm, for which the ssdeep tool can be called. The fuzzy hash algorithm, also known as context triggered piecewise hashing (CTPH), is mainly used to compare files for similarity. In 2006, Jesse Kornblum proposed CTPH and gave an example algorithm named spamsum. Jason Sherman subsequently developed the ssdeep tool (http://ssdeep.sourceforge.net/). The algorithm can be used for malicious code detection and also for vulnerability mining, among other uses. The main principle of fuzzy hashing is as follows: a weak hash is computed over local content of the file, and the file is split into fragments when a specific condition is met; a strong hash is then used to compute a hash value for each fragment; part of each of these values is taken and concatenated, and together with the fragmentation condition this forms the fuzzy hash result. A string similarity comparison algorithm is then used to determine how similar two fuzzy hash values are, and hence how similar the two files are. For local changes to a file (including modifications, additions, or deletions at multiple places), fuzzy hashing can still find the similarity relationship with the source file, and it is currently one of the better methods for judging similarity.
S2: determine whether the web page content belongs to a first web page type or a second web page type, and mark it accordingly, where the first web page type is a web page whose content will not change and the second web page type is a web page whose content may change. The classification can be done manually, or by using prior-art web page content recognition and classification techniques (such as those recorded in Chinese patent documents 201210299843.7 and 201210376933.1).
S3: after a set time interval, crawl the web page content from the network again and calculate the fuzzy hash value of the web page content at that moment.
The fuzzy hash value in steps S1 and S3 is calculated as follows:
Fragment the file of the web page content using a weak hash algorithm. Specifically:
Read a portion of the file content and compute on it with the weak hash algorithm Adler-32, in rolling-hash fashion, to obtain a 4-byte hash value. A rolling hash works as follows: if the hash value h1 of abcdef has already been computed and the hash value of bcdefg is needed next, it does not have to be recomputed from scratch; only h1 - X(a) + Y(g) is needed, where X and Y are two functions that account for the effect on the hash value of the removed and the added byte, respectively. Such a hash greatly speeds up the fragmentation decision.
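As an illustration only, the rolling update h1 - X(a) + Y(g) can be sketched in Python. The sketch assumes a simplified Adler-32-style checksum that works modulo 2^16 rather than the modulus 65521 of real Adler-32; the names `window_hash` and `roll` are hypothetical and do not come from ssdeep.

```python
def window_hash(data: bytes) -> int:
    """Adler-32-style checksum of one window, computed from scratch.

    Simplified for illustration: arithmetic is modulo 2**16, not 65521.
    """
    size = len(data)
    a = sum(data) & 0xFFFF
    b = sum((size - i) * byte for i, byte in enumerate(data)) & 0xFFFF
    return (b << 16) | a  # a 4-byte (32-bit) value


def roll(h: int, out_byte: int, in_byte: int, size: int) -> int:
    """Slide the window one byte: remove out_byte, append in_byte."""
    a = h & 0xFFFF
    b = h >> 16
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - size * out_byte + a) & 0xFFFF
    return (b << 16) | a


data = b"abcdefg"
h1 = window_hash(data[0:6])              # hash of "abcdef"
h2 = roll(h1, data[0], data[6], size=6)  # hash of "bcdefg" without a rescan
```

Each update costs a constant number of operations regardless of the window length, which is why the fragmentation test can be run at every byte position.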
Set a fragment value n, which controls the fragmentation condition. The value of n is determined according to the file size, file content, and so on. The principle and method for determining it are as follows:
The value of n is always an integer power of 2, so that the remainder of the Adler-32 hash value divided by n is close to uniformly distributed. A fragment boundary is placed only when the remainder equals n-1, which means a boundary occurs in only about 1/n of cases. In other words, each time the window moves over a file, there is a 1/n chance of a fragment boundary. If a run produces too few fragments, n is reduced, which raises the chance of a boundary at each position and increases the number of fragments. If it produces too many fragments, n is increased, which lowers that chance and reduces the number of fragments. n is adjusted by dividing or multiplying by 2 each time, so that the final number of fragments falls between 32 and 64 as far as possible. Because the chance of a boundary is about 1/n, the first n tried on each run of ssdeep is a value close to the file size divided by 64.
When the remainder of the Adler-32 hash value divided by n is exactly n-1, a fragment boundary is placed at the current position; otherwise no boundary is placed, the window rolls forward by one byte, and the Adler-32 hash value is computed and tested again, and so on.
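The fragmentation rule above can be sketched as follows. This is a minimal sketch, not the ssdeep implementation: the window length `WINDOW`, the zero-padded start of the window, the simplified modulo-2^16 checksum, and the function names are all assumptions made for illustration.

```python
from typing import List, Tuple

WINDOW = 7  # rolling-window length in bytes (an assumed value)


def split_points(data: bytes, n: int) -> List[int]:
    """End offsets of fragments: cut wherever rolling hash % n == n - 1."""
    a = b = 0
    cuts = []
    for i, byte in enumerate(data):
        out = data[i - WINDOW] if i >= WINDOW else 0  # byte leaving the window
        a = (a - out + byte) & 0xFFFF
        b = (b - WINDOW * out + a) & 0xFFFF
        if ((b << 16) | a) % n == n - 1:
            cuts.append(i + 1)
    if data and (not cuts or cuts[-1] != len(data)):
        cuts.append(len(data))  # trailing bytes form the last fragment
    return cuts


def initial_n(size: int) -> int:
    """Largest power of two not above size / 64, and at least 1."""
    n = 1
    while n * 2 <= size // 64:
        n *= 2
    return n


def fragment(data: bytes, lo: int = 32, hi: int = 64) -> Tuple[int, List[int]]:
    """Pick n and return (n, cut offsets), aiming for lo..hi fragments."""
    n = initial_n(len(data))
    cuts = split_points(data, n)
    if len(cuts) < lo:
        while len(cuts) < lo and n > 1:  # too few fragments: halve n
            n //= 2
            cuts = split_points(data, n)
    else:
        while len(cuts) > hi:            # too many fragments: double n
            n *= 2
            cuts = split_points(data, n)
    return n, cuts
```

The adjustment only moves n in one direction per call, so the fragment count lands in the 32 to 64 range where the data allows it but is not guaranteed to, which matches the "as far as possible" wording of the text.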
Compute a hash value for each fragment obtained in S101 using a strong hash algorithm. The Fowler-Noll-Vo (FNV) hash algorithm can be used.
Compress the hash values. After a hash value has been computed for each file fragment, the result can optionally be shortened. Specifically, the lowest 6 bits of the hash result are taken and represented by one ASCII character, which serves as the final hash value of that fragment.
Concatenate the hash values. The compressed hash values of all fragments are joined together to give the fuzzy hash value of the file. If the fragment value n differs between files, n should also be included in the fuzzy hash value; specifically, n is simply appended to the end of the hash value as part of it.
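The three steps above (strong hash per fragment, compression to 6 bits, concatenation) can be sketched as follows. The sketch assumes the 32-bit FNV-1 variant, a base64 alphabet for turning 6 bits into one ASCII character, and a ":" separator before n; the text does not fix any of these choices, and `fuzzy_signature` is a hypothetical name.

```python
from typing import Iterable

# 64 printable ASCII characters, one per possible 6-bit value (assumed alphabet).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def fnv1_32(data: bytes) -> int:
    """32-bit FNV-1 hash: multiply by the FNV prime, then XOR in each byte."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in data:
        h = (h * 0x01000193) & 0xFFFFFFFF  # FNV prime, kept to 32 bits
        h ^= byte
    return h


def fuzzy_signature(pieces: Iterable[bytes], n: int) -> str:
    """One character per fragment (lowest 6 bits of its hash), then n appended."""
    chars = "".join(B64[fnv1_32(piece) & 0x3F] for piece in pieces)
    return f"{chars}:{n}"  # the ':' separator is an assumption


signature = fuzzy_signature([b"<html>", b"<body>hello</body>", b"</html>"], 4)
```

Because each fragment contributes exactly one character, a local edit to the page changes only the characters of the fragments it touches, and the rest of the signature stays the same.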
S4: calculate the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1, the similarity ranging from 0 to 100. The similarity in step S4 is calculated as follows: the fuzzy hash values of the web page content are character strings, denoted s1 and s2. The weighted edit distance from s1 to s2 is used as the basis for evaluating their similarity. To obtain the weighted edit distance, first determine the minimum number of operations (insertions, deletions, and modifications) needed to turn s1 into s2, then assign a weight to each kind of operation. The weights of insertion, deletion, and modification are set to 0.2, 0.3, and 0.5, respectively. Finally, the weighted results are summed to give the weighted edit distance.
This distance is divided by the sum of the lengths of s1 and s2 to turn the absolute result into a relative one, which is then mapped to an integer from 0 to 100, where 100 means the two strings are identical and 0 means they are completely dissimilar. The result can be used to judge how similar the two versions of the web page content are.
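A possible reading of this calculation is sketched below. Two points are assumptions rather than statements of the source: the sketch minimizes the total weight directly with dynamic programming, and it maps the distance to 0-100 by comparing it with the cost of deleting all of s1 and inserting all of s2 (a length-weighted normalization), since the text does not spell out the exact mapping. Weights are scaled by 10 so that all arithmetic stays in integers.

```python
INS, DEL, SUB = 2, 3, 5  # weights 0.2, 0.3, 0.5 scaled by 10


def weighted_edit_distance(s1: str, s2: str) -> int:
    """Minimum total weight (in tenths) of edits that turn s1 into s2."""
    prev = [j * INS for j in range(len(s2) + 1)]
    for i, c1 in enumerate(s1, 1):
        cur = [i * DEL]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(
                prev[j] + DEL,                           # delete c1
                cur[j - 1] + INS,                        # insert c2
                prev[j - 1] + (0 if c1 == c2 else SUB),  # keep or modify
            ))
        prev = cur
    return prev[-1]


def similarity(s1: str, s2: str) -> int:
    """Integer score from 0 to 100; 100 only when the strings are identical."""
    worst = DEL * len(s1) + INS * len(s2)  # delete all of s1, insert all of s2
    if worst == 0:
        return 100  # both strings empty
    return 100 * (worst - weighted_edit_distance(s1, s2)) // worst
```

Integer floor division is used deliberately: a score of 100 is then returned only when the distance is exactly zero, which matters because steps S6 and S7 test the similarity for equality with 100.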
S5: determine the web page type of the web page content; if it belongs to the first web page type, go to step S6; if it belongs to the second web page type, go to step S7.
S6: determine whether the similarity is 100; if yes, go to step S61; if no, go to step S62.
S61: end the monitoring of the web page content.
S62: issue a warning and end the monitoring of the web page content.
S7: determine whether the similarity is 100; if yes, end the monitoring of the web page content; if no, go to step S71.
S71: use a DIFF tool to find the differences between the web page content and its original state.
S72: determine whether the difference is caused by a picture change; if yes, go to step S8; if no, go to step S9.
S8: match the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; if no, go to step S82.
S81: issue a warning and end the monitoring of the web page content.
S82: end the monitoring of the web page content.
S9: match against a sensitive word dictionary; if a sensitive word is matched, issue a warning. If the changed part is a character string, it is matched against the preset sensitive word dictionary using regular expressions; if a sensitive word is matched, a warning is issued.
Step S9 also includes matching against a Trojan signature library; if a Trojan signature is matched, a warning is issued. This matching is likewise performed against the preset Trojan signature library using regular expressions.
In step S8, a picture recognition program is called to recognize the picture content, and the picture content is matched against malicious content features to detect whether the picture contains abnormal content; if yes, go to step S81; otherwise go to step S82.
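The branching in steps S5 to S9 can be sketched as follows. This is a sketch under stated assumptions: Python's `difflib` stands in for the DIFF tool, a changed line containing an `<img` tag is treated as a picture change, and the picture check is passed in as a callback because the text does not specify the picture recognition program. The names `check_page`, `changed_lines`, `picture_is_malicious`, and the page type labels "static" and "dynamic" are hypothetical.

```python
import difflib
import re
from typing import Callable, List, Sequence

IMG_TAG = re.compile(r"<img\b", re.IGNORECASE)  # assumed marker of a picture change


def changed_lines(old: str, new: str) -> List[str]:
    """Lines that are new or altered in `new` (stand-in for the DIFF tool, S71)."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def check_page(page_type: str, similarity: int, old: str, new: str,
               sensitive_words: Sequence[str],
               picture_is_malicious: Callable[[List[str]], bool]) -> str:
    """Return "ok" or "warn" following steps S5 to S9."""
    if similarity == 100:
        return "ok"                                     # S61 / S7: unchanged
    if page_type == "static":
        return "warn"                                   # S62: must not change
    changes = changed_lines(old, new)                   # S71
    if any(IMG_TAG.search(line) for line in changes):   # S72
        return "warn" if picture_is_malicious(changes) else "ok"  # S8
    if not sensitive_words:
        return "ok"
    pattern = re.compile("|".join(re.escape(w) for w in sensitive_words))
    hit = any(pattern.search(line) for line in changes)  # S9
    return "warn" if hit else "ok"


old_page = "<p>hello</p>\n<img src='a.png'>"
new_page = "<p>hacked by x</p>\n<img src='a.png'>"
result = check_page("dynamic", 90, old_page, new_page, ["hacked"], lambda c: False)
```

A Trojan signature library could be matched in the same way as the sensitive words, by adding its patterns to the regular expression used in the S9 branch.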
A webpage change monitoring system based on similarity calculation comprises the following modules:
Initial acquisition module 1: stores web page content from the network on a local storage device using a web crawler, and calculates the fuzzy hash value of the web page content. The fuzzy hash value is computed with a fuzzy hash algorithm, for which the ssdeep tool can be called.
Judgment module 2: determines whether the web page content belongs to a first web page type or a second web page type, and marks it accordingly, where the first web page type is a web page whose content will not change and the second web page type is a web page whose content may change. The classification can be done manually, or by using prior-art web page content recognition and classification techniques (such as those recorded in Chinese patent documents 201210299843.7 and 201210376933.1).
Real-time acquisition module 3: after a set time interval, crawls the web page content from the network again and calculates the fuzzy hash value of the web page content at that moment.
The fuzzy hash value in the initial acquisition module 1 and the real-time acquisition module 3 is calculated as follows:
Fragment the file of the web page content using a weak hash algorithm. Specifically:
Read a portion of the file content and compute on it with the weak hash algorithm Adler-32, in rolling-hash fashion, to obtain a 4-byte hash value. A rolling hash works as follows: if the hash value h1 of abcdef has already been computed and the hash value of bcdefg is needed next, it does not have to be recomputed from scratch; only h1 - X(a) + Y(g) is needed, where X and Y are two functions that account for the effect on the hash value of the removed and the added byte, respectively. Such a hash greatly speeds up the fragmentation decision.
Set a fragment value n, which controls the fragmentation condition. The value of n is determined according to the file size, file content, and so on. The principle and method for determining it are as follows:
The value of n is always an integer power of 2, so that the remainder of the Adler-32 hash value divided by n is close to uniformly distributed. A fragment boundary is placed only when the remainder equals n-1, which means a boundary occurs in only about 1/n of cases. In other words, each time the window moves over a file, there is a 1/n chance of a fragment boundary. If a run produces too few fragments, n is reduced, which raises the chance of a boundary at each position and increases the number of fragments. If it produces too many fragments, n is increased, which lowers that chance and reduces the number of fragments. n is adjusted by dividing or multiplying by 2 each time, so that the final number of fragments falls between 32 and 64 as far as possible. Because the chance of a boundary is about 1/n, the first n tried on each run of ssdeep is a value close to the file size divided by 64.
When the remainder of the Adler-32 hash value divided by n is exactly n-1, a fragment boundary is placed at the current position; otherwise no boundary is placed, the window rolls forward by one byte, and the Adler-32 hash value is computed and tested again, and so on.
Compute a hash value for each fragment obtained in S101 using a strong hash algorithm. The Fowler-Noll-Vo (FNV) hash algorithm can be used.
Compress the hash values. After a hash value has been computed for each file fragment, the result can optionally be shortened. Specifically, the lowest 6 bits of the hash result are taken and represented by one ASCII character, which serves as the final hash value of that fragment.
Concatenate the hash values. The compressed hash values of all fragments are joined together to give the fuzzy hash value of the file. If the fragment value n differs between files, n should also be included in the fuzzy hash value; specifically, n is simply appended to the end of the hash value as part of it.
Computing module 4: calculates the similarity between the fuzzy hash value obtained in the real-time acquisition module and the fuzzy hash value obtained in the initial acquisition module, the similarity ranging from 0 to 100. The similarity in the computing module 4 is calculated as follows: the fuzzy hash values of the web page content are character strings, denoted s1 and s2. The weighted edit distance from s1 to s2 is used as the basis for evaluating their similarity. To obtain the weighted edit distance, first determine the minimum number of operations (insertions, deletions, and modifications) needed to turn s1 into s2, then assign a weight to each kind of operation. The weights of insertion, deletion, and modification are set to 0.2, 0.3, and 0.5, respectively. Finally, the weighted results are summed to give the weighted edit distance.
Web page judgment module 5: determines the web page type of the web page content; if it belongs to the first web page type, control passes to the first judgment module 6; if it belongs to the second web page type, control passes to the second judgment module 7.
First judgment module 6: determines whether the similarity is 100; if yes, ends the monitoring of the web page content; if no, control passes to the first warning module 61.
First warning module 61: issues a warning and ends the monitoring of the web page content.
Second judgment module 7: determines whether the similarity is 100; if yes, control passes to the first termination module 71; if no, control passes to the difference analysis module 72.
First termination module 71: ends the monitoring of the web page content.
Difference analysis module 72: uses a DIFF tool to find the differences between the web page content and its original state.
Third judgment module 8: determines whether the difference is caused by a picture change; if yes, control passes to the first matching module 81; if no, control passes to the second matching module 82.
First matching module 81: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if yes, control passes to the second warning module 811; if no, control passes to the second termination module 812.
Second warning module 811: issues a warning and ends the monitoring of the web page content.
Second termination module 812: ends the monitoring of the web page content.
Second matching module 82: matches against a sensitive word dictionary; if a sensitive word is matched, issues a warning.
The second matching module 82 also matches against a Trojan signature library; if a Trojan signature is matched, a warning is issued.
In the third judgment module 8, a picture recognition program is called to recognize the picture content and to determine whether the difference is caused by a picture change; if yes, control passes to the first matching module 81; if no, control passes to the second matching module 82.
In the webpage change monitoring method and system based on similarity calculation of the invention, web page content is saved locally using web crawler technology and fetched again after a set time interval, and a fuzzy hash algorithm is used to compare the newly fetched content with the locally saved content for similarity. Web page content attributes can be customized: for pages whose content will not change, the monitoring steps are simpler and monitoring efficiency is high. For pages whose content may change, a further difference analysis identifies tampering with text or pictures, so that it can be determined promptly and accurately whether the web page content has been tampered with or normally updated, which improves the security of web page content.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Obvious variations or changes derived from this description remain within the scope of protection of the invention.

Claims (4)

1. A webpage change monitoring method based on similarity calculation, characterized by comprising the following steps:
S1: storing web page content from the network on a local storage device by means of a web crawler, and calculating a fuzzy hash value of the web page content;
S2: judging whether the web page content belongs to a first web page type or a second web page type, and marking it accordingly, the first web page type being a web page whose content will not change and the second web page type being a web page whose content may change;
S3: crawling the web page content from the network again after a set time interval, and calculating the fuzzy hash value of the web page content at that moment;
S4: calculating the similarity between the fuzzy hash value obtained in step S3 and the fuzzy hash value obtained in step S1, the similarity taking values in the range 0-100;
S5: judging the web page type to which the web page content belongs; if the web page content belongs to the first web page type, carrying out step S6; if the web page content belongs to the second web page type, carrying out step S7;
S6: judging whether the similarity value is 100; if so, carrying out step S61; if not, carrying out step S62;
S61: ending the monitoring of the web page content;
S62: issuing a warning and ending the monitoring of the web page content;
S7: judging whether the similarity value is 100; if so, ending the monitoring of the web page content; if not, carrying out step S71;
S71: using a DIFF tool to find the differences between the web page content and its original state;
S72: judging whether the difference is caused by a picture change; if so, carrying out step S8; if not, carrying out step S9;
S8: matching the picture content against malicious content features to detect whether the picture contains abnormal content; if so, carrying out step S81; if not, carrying out step S82;
S81: issuing a warning and ending the monitoring of the web page content;
S82: ending the monitoring of the web page content;
S9: matching against a sensitive-word lexicon, and issuing a warning if a sensitive word is matched;
wherein, in step S8, a picture recognition program is called to recognize the picture content, the picture content is matched against malicious content features, and it is detected whether the picture contains abnormal content; if so, step S81 is carried out; otherwise, step S82 is carried out.
2. The webpage change monitoring method based on similarity calculation according to claim 1, characterized in that step S9 further comprises matching against a Trojan feature library and issuing a warning if a Trojan feature is matched.
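The branching in steps S5 to S9 can be summarized as a small decision function. This is only a sketch under stated assumptions: the names `PageType` and `monitor_decision`, the boolean inputs, and the "ok"/"warn" return values are hypothetical, and the picture recognition and lexicon or Trojan matching are assumed to have been done elsewhere and passed in as flags. The claims do not state what happens in S9 when nothing matches; the sketch assumes monitoring simply ends without a warning.

```python
from enum import Enum


class PageType(Enum):
    STATIC = 1   # first web page type: content is not expected to change
    DYNAMIC = 2  # second web page type: content may change


def monitor_decision(page_type, similarity, picture_changed=False,
                     picture_abnormal=False, text_flagged=False):
    """Return "ok" or "warn" for one monitoring pass (steps S5 to S9).

    All parameters other than the similarity score are hypothetical
    inputs standing in for the results of steps S72, S8 and S9.
    """
    if similarity == 100:
        return "ok"                                   # S61, or the "yes" branch of S7
    if page_type is PageType.STATIC:
        return "warn"                                 # S62: a static page changed
    if picture_changed:                               # S72: difference caused by a picture
        return "warn" if picture_abnormal else "ok"   # S8 leads to S81 or S82
    # S9: sensitive word (or Trojan feature, per claim 2) matched in the text.
    # Assumption: no match means monitoring ends without a warning.
    return "warn" if text_flagged else "ok"


print(monitor_decision(PageType.STATIC, 97))                       # warn
print(monitor_decision(PageType.DYNAMIC, 80, picture_changed=True))  # ok
```

Keeping the decision separate from crawling and hashing makes each branch easy to check in isolation.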
3. A webpage change monitoring system based on similarity calculation, characterized by comprising the following modules:
Initial acquisition module: stores web page content from the network on a local storage device by means of a web crawler, and calculates a fuzzy hash value of the web page content;
Judgment module: judges whether the web page content belongs to a first web page type or a second web page type, and marks it accordingly, the first web page type being a web page whose content will not change and the second web page type being a web page whose content may change;
Real-time acquisition module: crawls the web page content from the network again after a set time interval, and calculates the fuzzy hash value of the web page content at that moment;
Calculation module: calculates the similarity between the fuzzy hash value obtained in the real-time acquisition module and the fuzzy hash value obtained in the initial acquisition module, the similarity taking values in the range 0-100;
Web page judgment module: judges the web page type to which the web page content belongs; if the web page content belongs to the first web page type, control passes to the first judgment module; if the web page content belongs to the second web page type, control passes to the second judgment module;
First judgment module: judges whether the similarity value is 100; if so, the monitoring of the web page content ends; if not, control passes to the first warning module;
First warning module: issues a warning and ends the monitoring of the web page content;
Second judgment module: judges whether the similarity value is 100; if so, control passes to the first termination module; if not, control passes to the difference analysis module;
First termination module: ends the monitoring of the web page content;
Difference analysis module: uses a DIFF tool to find the differences between the web page content and its original state, then passes control to the third judgment module;
Third judgment module: judges whether the difference is caused by a picture change; if so, control passes to the first matching module; if not, control passes to the second matching module;
First matching module: matches the picture content against malicious content features to detect whether the picture contains abnormal content; if so, control passes to the second warning module; if not, control passes to the second termination module;
Second warning module: issues a warning and ends the monitoring of the web page content;
Second termination module: ends the monitoring of the web page content;
Second matching module: matches against a sensitive-word lexicon, and issues a warning if a sensitive word is matched;
wherein, in the third judgment module, a picture recognition program is called to recognize the picture content and it is judged whether the difference is caused by a picture change; if so, control passes to the first matching module; if not, control passes to the second matching module.
4. The webpage change monitoring system based on similarity calculation according to claim 3, characterized in that the second matching module further matches against a Trojan feature library and issues a warning if a Trojan feature is matched.
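The difference analysis and the text matching named in the claims can be illustrated with Python's standard `difflib` module. This is a minimal sketch, not the patented implementation: the helper names, the sample lexicon, and the heuristic that a changed line containing an `<img` tag counts as a picture change are all assumptions made for the example. The claims call for a picture recognition program at that point, which is not modeled here.

```python
import difflib

# Hypothetical sample lexicon; a real deployment would load its own word list.
SENSITIVE_WORDS = {"casino", "lottery"}


def added_lines(saved, current):
    """Return the lines present in `current` but not in `saved`."""
    diff = difflib.ndiff(saved.splitlines(), current.splitlines())
    # ndiff prefixes added lines with "+ "; strip that two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]


def is_picture_change(lines):
    """Assumed heuristic: a changed line containing an <img tag is a picture change."""
    return any("<img" in line.lower() for line in lines)


def sensitive_hits(lines):
    """Return the sensitive words found in the changed lines, sorted."""
    text = " ".join(lines).lower()
    return sorted(word for word in SENSITIVE_WORDS if word in text)


# Hypothetical snapshots of one page before and after a text tampering.
saved = "<p>Welcome</p>\n<p>Opening hours: 9-17</p>"
tampered = "<p>Welcome</p>\n<p>Visit our casino now</p>"

changed = added_lines(saved, tampered)
if is_picture_change(changed):
    print("picture change: hand over to picture matching")
elif sensitive_hits(changed):
    print("warning: sensitive words found:", sensitive_hits(changed))
else:
    print("change looks like a normal update")
```

Only the added lines are inspected, since tampering shows up in what the new version of the page contains rather than in what was removed.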
CN201611182671.XA 2016-12-20 2016-12-20 A kind of webpage change monitoring method and system based on similarity calculation Active CN106599242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611182671.XA CN106599242B (en) 2016-12-20 2016-12-20 A kind of webpage change monitoring method and system based on similarity calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611182671.XA CN106599242B (en) 2016-12-20 2016-12-20 A kind of webpage change monitoring method and system based on similarity calculation

Publications (2)

Publication Number Publication Date
CN106599242A CN106599242A (en) 2017-04-26
CN106599242B true CN106599242B (en) 2019-03-26

Family

ID=58600081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611182671.XA Active CN106599242B (en) 2016-12-20 2016-12-20 A kind of webpage change monitoring method and system based on similarity calculation

Country Status (1)

Country Link
CN (1) CN106599242B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301355B (en) * 2017-06-20 2021-07-02 深信服科技股份有限公司 Webpage tampering monitoring method and device
CN107612908B (en) * 2017-09-15 2020-06-05 杭州安恒信息技术股份有限公司 Webpage tampering monitoring method and device
CN108021692B (en) * 2017-12-18 2022-03-11 北京天融信网络安全技术有限公司 Method for monitoring webpage, server and computer readable storage medium
CN108540466A (en) * 2018-03-31 2018-09-14 甘肃万维信息技术有限责任公司 Based on webpage tamper monitoring and alarming system
CN108595583B (en) * 2018-04-18 2022-12-02 平安科技(深圳)有限公司 Dynamic graph page data crawling method, device, terminal and storage medium
CN108809943B (en) * 2018-05-14 2021-05-14 苏州闻道网络科技股份有限公司 Website monitoring method and device
CN109241779A (en) * 2018-08-27 2019-01-18 浙江每日互动网络科技股份有限公司 A method of the detection page is distorted
CN109495471B (en) * 2018-11-15 2021-07-02 东信和平科技股份有限公司 Method, device and equipment for judging WEB attack result and readable storage medium
CN109740094A (en) * 2018-12-27 2019-05-10 上海掌门科技有限公司 Page monitoring method, equipment and computer storage medium
CN110034921B (en) * 2019-04-18 2022-04-15 成都信息工程大学 Webshell detection method based on weighted fuzzy hash
CN110598478A (en) * 2019-09-19 2019-12-20 腾讯科技(深圳)有限公司 Block chain based evidence verification method, device, equipment and storage medium
CN110659439A (en) * 2019-09-23 2020-01-07 杭州迪普科技股份有限公司 Black chain protection method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682098A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting web page content changes
CN102779245A (en) * 2011-05-12 2012-11-14 李朝荣 Webpage abnormality detection method based on image processing technology
CN103279475A (en) * 2013-04-11 2013-09-04 广东电网公司信息中心 Detection method and system for WEB application system content change

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102571791B (en) * 2011-12-31 2015-03-25 奇智软件(北京)有限公司 Method and system for analyzing tampering of Web page contents
CN105678193B (en) * 2016-01-06 2018-08-14 杭州数梦工场科技有限公司 A kind of anti-tamper treating method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779245A (en) * 2011-05-12 2012-11-14 李朝荣 Webpage abnormality detection method based on image processing technology
CN102682098A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting web page content changes
CN103279475A (en) * 2013-04-11 2013-09-04 广东电网公司信息中心 Detection method and system for WEB application system content change

Also Published As

Publication number Publication date
CN106599242A (en) 2017-04-26

Similar Documents

Publication Publication Date Title
CN106599242B (en) A kind of webpage change monitoring method and system based on similarity calculation
US9215246B2 (en) Website scanning device and method
KR101162051B1 (en) Using string comparison malicious code detection and classification system and method
US10474818B1 (en) Methods and devices for detection of malware
CN108985057B (en) Webshell detection method and related equipment
CN103077250B (en) A kind of capturing webpage contents method and device
JP7120350B2 (en) SECURITY INFORMATION ANALYSIS METHOD, SECURITY INFORMATION ANALYSIS SYSTEM AND PROGRAM
US9355250B2 (en) Method and system for rapidly scanning files
JP5254443B2 (en) Surveillance method used for communication system images or multimedia video images
CN102779245A (en) Webpage abnormality detection method based on image processing technology
CN112148305A (en) Application detection method and device, computer equipment and readable storage medium
CN112532624A (en) Black chain detection method and device, electronic equipment and readable storage medium
US9613271B2 (en) Determining severity of a geomagnetic disturbance on a power grid using similarity measures
CN104036190A (en) Method and device for detecting page tampering
CN104036189A (en) Page distortion detecting method and black link database generating method
CN108363711B (en) Method and device for detecting dark chain in webpage
CN113535813A (en) Data mining method and device, electronic equipment and storage medium
CN111460448A (en) Malicious software family detection method and device
CN111104674A (en) Power firmware homologous binary file association method and system
CN113850297B (en) Road data monitoring method and device, electronic equipment and storage medium
JP7140268B2 (en) WARNING DEVICE, CONTROL METHOD AND PROGRAM
CN114722806A (en) Text processing method, device and equipment
JP6749865B2 (en) INFORMATION COLLECTION DEVICE AND INFORMATION COLLECTION METHOD
CN111339453A (en) Navigation page distinguishing method and device
CN111488621A (en) Method and system for detecting falsified webpage, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant