CN108809943B - Website monitoring method and device - Google Patents


Info

Publication number
CN108809943B
Authority
CN
China
Prior art keywords
hash value
website
value corresponding
comparing
picture
Legal status
Active
Application number
CN201810453599.2A
Other languages
Chinese (zh)
Other versions
CN108809943A (en)
Inventor
袁学文
Current Assignee
Suzhou Wendao Network Technology Co ltd
Original Assignee
Suzhou Wendao Network Technology Co ltd
Application filed by Suzhou Wendao Network Technology Co ltd
Priority to CN201810453599.2A
Publication of CN108809943A
Application granted
Publication of CN108809943B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Abstract

The disclosure relates to a website monitoring method and device. The method comprises: acquiring a plurality of pictures corresponding to a website at a plurality of time points; calculating a first hash value corresponding to each of the plurality of pictures; comparing the first hash values corresponding to the pictures; and, if they are not the same, determining that the website has been tampered with. The website monitoring method and device provided by the embodiments of the disclosure can determine whether a website has been tampered with by using the hash values of the pictures corresponding to the website, without setting any additional conditions, so that website monitoring is simple and effective.

Description

Website monitoring method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a website monitoring method and apparatus.
Background
With the development and application of networks, more and more information in people's daily lives is tied to the network. At the same time, the security problems of network systems are becoming increasingly prominent.
Webpage tampering is a common network security problem. After attacking a website, an attacker often modifies an existing webpage and writes malicious code or junk information into it, hijacking the website's traffic. The website therefore needs to be monitored in real time to prevent it from being tampered with.
To this end, the prior art usually works from website addresses or HTML entities, but these approaches have considerable limitations. Detecting tampering by monitoring the website address presupposes that the tampered site has an address similar to the original website, while detecting tampering by monitoring HTML entities is constrained by keyword rules.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a website monitoring method and apparatus.
According to a first aspect of the embodiments of the present disclosure, a website monitoring method is provided, including: acquiring a plurality of pictures corresponding to a website at a plurality of time points; calculating a first hash value corresponding to each of the plurality of pictures; comparing the first hash values corresponding to the pictures; and, if they are not the same, determining that the website has been tampered with.
In one possible implementation, comparing the first hash values corresponding to the pictures includes: determining the number of differing data bits by comparing the data bits of the first hash values corresponding to the pictures; comparing the number with a first threshold; and, if the number is larger than the first threshold, determining that the first hash values corresponding to the pictures are different.
In one possible implementation, calculating the first hash value corresponding to each of the plurality of pictures includes: obtaining a DCT matrix for each picture by computing the DCT (discrete cosine transform) of the picture; calculating the average value of each DCT matrix; comparing each value in a DCT matrix with the average value of that matrix; and setting the corresponding hash bit to 1 if the value is greater than the average value and to 0 if it is less than the average value, thereby obtaining the hash value corresponding to each of the plurality of pictures.
In one possible implementation, the page HTML of the website at the plurality of time points is acquired at the same time as the plurality of pictures corresponding to the website at those time points.
In one possible implementation, if the first hash values corresponding to the pictures are the same: a second hash value corresponding to each page HTML is calculated; the second hash values corresponding to the page HTMLs are compared; and, if they are not the same, it is determined that the website has been tampered with.
In one possible implementation, calculating the second hash value corresponding to each page HTML includes: determining each tag in the page HTML and the number of occurrences of each tag; and calculating the second hash value corresponding to the page HTML from the number of occurrences of each tag.
In one possible implementation, comparing the second hash values corresponding to the page HTMLs includes: comparing the data bits of the second hash values to determine the number of differing data bits; comparing the number with a second threshold; and, if the number is larger than the second threshold, determining that the second hash values corresponding to the page HTMLs are different.
In one possible implementation, if the first hash values corresponding to the pictures are the same: the traffic of the website in the time period corresponding to the plurality of time points is acquired; the traffic is compared with the average traffic of the website over such a time period; and, if the traffic exceeds a multiple of the average traffic, it is determined that the website has been tampered with.
In one possible implementation, if the traffic does not exceed that multiple of the average traffic: a third hash value corresponding to each page HTML is calculated; the third hash values corresponding to the page HTMLs are compared; and, if they are not the same, it is determined that the website has been tampered with.
In one possible implementation, if the second hash values corresponding to the page HTMLs are the same: the traffic of the website in the time period corresponding to the plurality of time points is acquired; the traffic is compared with the average traffic of the website over such a time period; and, if the traffic exceeds a multiple of the average traffic, it is determined that the website has been tampered with.
According to a second aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a website monitoring method, the method comprising: acquiring a plurality of pictures corresponding to a website at a plurality of time points; calculating a first hash value corresponding to each of the plurality of pictures; comparing the first hash values corresponding to the pictures; and, if they are not the same, determining that the website has been tampered with.
According to a third aspect of the embodiments of the present disclosure, there is provided a website monitoring apparatus, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to: acquire a plurality of pictures corresponding to a website at a plurality of time points; calculate a first hash value corresponding to each of the plurality of pictures; compare the first hash values corresponding to the pictures; and, if they are not the same, determine that the website has been tampered with.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a website monitoring apparatus, including: a picture acquisition module, used for acquiring a plurality of pictures corresponding to a website at a plurality of time points; a first hash value calculation module, used for calculating a first hash value corresponding to each of the plurality of pictures; a first hash value comparison module, used for comparing the first hash values corresponding to the pictures; and a first hash value determining module, used for determining that the website has been tampered with when the first hash value comparison module finds that the first hash values corresponding to the pictures are different.
In one possible implementation, the first hash value comparison module further includes: a first quantity determining module, used for determining the number of differing data bits by comparing the data bits of the first hash values corresponding to the pictures; a first quantity comparison module, used for comparing the number with a first threshold; and a first sub-determination module, used for determining that the first hash values corresponding to the pictures are different when the first quantity comparison module finds that the number is larger than the first threshold.
In one possible implementation, the first hash value calculation module includes: a DCT matrix acquisition module, used for obtaining a DCT matrix for each picture by computing the DCT of the picture; an average value calculation module, used for calculating the average value of each DCT matrix; a first DCT comparison module, used for comparing each value in a DCT matrix with the average value of that matrix; and a first hash value sub-calculation module, used for setting a bit to 1 when the value in the DCT matrix is greater than the average value and to 0 when it is less than the average value, thereby calculating the hash value corresponding to each of the plurality of pictures.
In one possible implementation, the apparatus further includes: a page HTML acquisition module, used for acquiring the page HTML of the website at the plurality of time points while the picture acquisition module acquires the plurality of pictures corresponding to the website at those time points.
In one possible implementation, the apparatus further includes a second hash value determination module, which comprises: a second hash value calculation module, used for calculating a second hash value corresponding to each page HTML when the first hash value comparison module determines that the first hash values corresponding to the pictures are the same; a second hash value comparison module, used for comparing the second hash values corresponding to the page HTMLs; and a second hash value determining module, used for determining that the website has been tampered with when the second hash value comparison module finds that the second hash values corresponding to the page HTMLs are different.
In one possible implementation, the second hash value calculation module includes: a tag quantity determining module, used for determining each tag in each page HTML and the number of occurrences of each tag; and a second hash value sub-calculation module, used for calculating the second hash value corresponding to each page HTML from the number of occurrences of each tag.
In one possible implementation, the second hash value comparison module includes: a second quantity determining module, used for comparing the data bits of the second hash values corresponding to the page HTMLs to determine the number of differing data bits; a second quantity comparison module, used for comparing the number with a second threshold; and a second sub-determining module, used for determining that the second hash values corresponding to the page HTMLs are different when the second quantity comparison module finds that the number is larger than the second threshold.
In one possible implementation, the apparatus further includes: a traffic acquisition module, used for acquiring the traffic of the website in the time period corresponding to the plurality of time points when the first hash values corresponding to the pictures are the same; a traffic comparison module, used for comparing the traffic with the average traffic of the website over that time period; and a third hash value determining module, used for determining that the website has been tampered with when the traffic comparison module finds that the traffic exceeds a multiple of the average traffic.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects: whether the website has been tampered with can be determined without setting additional conditions, which helps prevent the website's traffic from being hijacked; the solutions also have a wide range of application and are simple to operate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic diagram illustrating an application scenario for website monitoring, according to an example embodiment.
FIG. 2 is a flow diagram illustrating a website monitoring method according to an example embodiment.
FIG. 3 is a flow diagram illustrating a website monitoring method according to an example embodiment.
FIG. 4 is a flow diagram illustrating a website monitoring method according to an example embodiment.
FIG. 5 is a block diagram illustrating a website monitoring apparatus according to an example embodiment.
FIG. 6 is a block diagram illustrating a website monitoring apparatus according to an example embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure. In order that the present invention may be more clearly understood, terms related to the present invention will be explained below.
A hash algorithm converts an input of arbitrary length into an output of fixed length; that output is the hash value (also called a hash code). Common hash algorithms include SimHash, MinHash, and the like. In the present application, both the hash value corresponding to a picture and the hash value corresponding to a table can be calculated with a hash algorithm; these are conventional techniques in the art and are therefore not described in detail below.
A Uniform Resource Locator (URL) is a character string that describes an information resource; it uniquely identifies and locates that resource. When a user enters a website address or clicks a website link, the website's resource is located through its URL.
An HTML tag is the most basic unit of the HTML language; that is, a page's HTML is composed of HTML tags.
FIG. 1 is a schematic diagram illustrating an application scenario for website monitoring, according to an example embodiment. As shown in fig. 1, after the user enters a website address or clicks a website link in a browser on the electronic device 110, the browser sends a request to the server 120 at the IP address resolved from the URL corresponding to that address or link. The electronic device 110 may include, but is not limited to, any of the following devices having a display unit: personal computers (PCs); mobile devices such as cellular phones, personal digital assistants (PDAs), digital cameras, portable game consoles, MP3 players, portable/personal multimedia players (PMPs), handheld e-book readers, tablet PCs, and portable laptop PCs; Global Positioning System (GPS) navigators; smart TVs; and the like.
The server 120 processes the request, after which the electronic device 110 receives the information returned by the server 120 and displays the page. In an exemplary embodiment, the website monitoring apparatus 130 may monitor the website while the browser displays the page, to determine whether the website has been tampered with. The monitoring performed by the website monitoring apparatus 130 is described in detail below with reference to FIGS. 2 to 6.
FIG. 2 is a flow diagram illustrating a website monitoring method according to an example embodiment. As shown in fig. 2, in step S210, a plurality of pictures corresponding to a website at a plurality of time points are obtained. Specifically, while the display of the electronic device 110 shows a page of the website, a plurality of screenshots of the page may be captured at different time points; this is a common technique in the art and is not described in detail here. In an alternative embodiment, the plurality of time points are taken one from each of a plurality of equal time intervals, for example any time point within each 12-hour interval, giving two time points. Preferably, since websites are generally updated between 8 a.m. and 8 p.m., the plurality of time points include three time points; that is, one screenshot of the page may be captured at a random time within every 8 hours, and the subsequent operations are then performed.
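One way to pick the time points just described is sketched below in Python, assuming equal slices of a day with one random capture time in each (three 8-hour slices by default). The function name `sample_time_points` and the one-second granularity are illustrative choices, not taken from the source.

```python
import random
from datetime import timedelta


def sample_time_points(day_start, intervals=3, rng=random):
    """Return one random time point inside each of `intervals` equal slices of a day.

    Hypothetical helper: intervals=3 gives 8-hour slices (the preferred scheme),
    intervals=2 gives 12-hour slices.
    """
    slice_len = timedelta(hours=24) / intervals
    slice_seconds = int(slice_len.total_seconds())
    points = []
    for i in range(intervals):
        # Whole-second offset in [0, slice_seconds), so each point stays inside its slice.
        offset = timedelta(seconds=rng.randrange(slice_seconds))
        points.append(day_start + i * slice_len + offset)
    return points
```

Passing a seeded `random.Random` instance as `rng` makes the schedule reproducible for testing; in production the module-level generator is sufficient.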
Subsequently, in step S220, a first hash value corresponding to each of the plurality of pictures is calculated. Specifically, a discrete cosine transform (DCT), a common image-processing operation that is not described in detail here, may be applied to each picture acquired in step S210 to obtain a DCT matrix for that picture. The average value of each DCT matrix is then calculated, and finally each value in the DCT matrix is compared with the matrix's average value: a value greater than the average is set to 1 and a value less than the average is set to 0, yielding the hash value corresponding to each picture. Preferably, before the DCT is performed, the picture may be reduced to 32 × 32 and converted to grayscale to reduce the amount of computation; to reduce it further, only the top-left 8 × 8 portion of each picture's DCT matrix may be used to calculate the hash value.
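A minimal pure-Python sketch of this DCT-based hash follows, assuming the picture has already been reduced to a 32 × 32 grayscale matrix (resizing and grayscale conversion would be done with an imaging library and are not shown). It uses an unnormalized DCT-II and, as the text describes, compares each coefficient of the top-left 8 × 8 block with the block's mean. The function names are hypothetical, and the treatment of values exactly equal to the mean is not specified in the source (here they map to 0).

```python
import math
from typing import List

IMG_SIZE = 32   # side length after the picture is reduced (per the text)
HASH_SIZE = 8   # top-left low-frequency block that is kept (per the text)


def dct_1d(values: List[float], keep: int) -> List[float]:
    """Unnormalized DCT-II of `values`, returning only the first `keep` coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k) for i, v in enumerate(values))
        for k in range(keep)
    ]


def dct_hash(gray: List[List[float]]) -> int:
    """64-bit hash of a 32x32 grayscale matrix: a bit is 1 where a DCT value exceeds the mean."""
    if len(gray) != IMG_SIZE or any(len(row) != IMG_SIZE for row in gray):
        raise ValueError("expected a 32x32 grayscale matrix")
    # Separable 2-D DCT: transform every row, keeping 8 coefficients (32 x 8) ...
    row_coeffs = [dct_1d(row, HASH_SIZE) for row in gray]
    # ... then each remaining column, keeping 8 coefficients; col_coeffs[c][r] = F[r][c].
    col_coeffs = [
        dct_1d([row_coeffs[r][c] for r in range(IMG_SIZE)], HASH_SIZE)
        for c in range(HASH_SIZE)
    ]
    block = [[col_coeffs[c][r] for c in range(HASH_SIZE)] for r in range(HASH_SIZE)]
    mean = sum(sum(row) for row in block) / (HASH_SIZE * HASH_SIZE)
    bits = 0
    for row in block:
        for value in row:
            # Values equal to the mean are not covered by the text; they map to 0 here.
            bits = (bits << 1) | (1 if value > mean else 0)
    return bits
```

Only the 8 × 8 low-frequency coefficients are ever computed, which keeps the pure-Python cost small. Identical screenshots give identical hashes, while minor rendering noise tends to flip only a few bits, which is why the comparison step uses a bit-count threshold rather than exact equality.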
After the first hash value of each picture is calculated, the first hash values of the pictures are compared in step S230. Specifically, the number of differing data bits is determined by comparing the corresponding data bits of the hash values calculated in step S220; this number is the Hamming distance between the hash values. The number is then compared with a first threshold: if it is greater than the first threshold, the first hash values of the pictures are determined to be different, and if it is less than or equal to the first threshold, they are determined to be the same. The first threshold is set by a technician based on practical experience; in the present application it may be set to 5, that is, when more than 5 data bits of the hash values differ, the hash values are determined to be different. Because the first threshold is set empirically, the website may have been tampered with even though the number of differing data bits is less than or equal to the threshold. For this situation, the present application provides website monitoring methods according to other exemplary embodiments, which are discussed in detail below with reference to the flowcharts of FIGS. 3 and 4.
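The comparison in step S230 can be sketched in a few lines of Python, assuming the hash values are held as integers. The threshold of 5 is the example value from the text; comparing every pair of pictures is an assumption about how "comparing the first hash value corresponding to each picture" extends to more than two time points, and the function names are hypothetical.

```python
from itertools import combinations

FIRST_THRESHOLD = 5  # example value given in the text


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(a ^ b).count("1")


def hashes_differ(a: int, b: int, threshold: int = FIRST_THRESHOLD) -> bool:
    """Two hashes count as different only when more than `threshold` bits differ."""
    return hamming_distance(a, b) > threshold


def website_tampered(hashes, threshold: int = FIRST_THRESHOLD) -> bool:
    """Flag tampering if any pair of pictures has different first hash values."""
    return any(hashes_differ(a, b, threshold) for a, b in combinations(hashes, 2))
```

With three screenshots this performs the three pairwise comparisons of the preferred embodiment; with two it reduces to a single comparison.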
Next, the website monitoring method shown in fig. 2 is illustrated by an example. A first picture may be obtained by capturing the website page at any time between 0:00 and 8:00, and a second picture by capturing it at any time between 8:00 and 16:00. The first hash values of the first and second pictures are calculated and compared; if they differ, it is determined that the website has been tampered with. In a preferred embodiment, a third picture may also be captured at any time between 16:00 and 24:00 and its first hash value calculated. The three hash values are then compared pairwise, which determines more accurately whether the website has been tampered with. For example, if the website was tampered with at 7:00 and only the pictures captured at 12:00 and 20:00 were used, the two first hash values could be the same even though the website has been tampered with; comparing the first hash values at three time points effectively avoids this situation.
As described above, in one or more embodiments of the present application, whether the website has been tampered with can be effectively determined solely by processing the pictures corresponding to the website, which makes website monitoring simple and effective.
FIG. 3 is a flow diagram illustrating a website monitoring method according to an example embodiment. Since steps S310 to S340 are the same as steps S210 to S240, they are not described in detail here. In this website monitoring method, while the plurality of pictures corresponding to the website at the plurality of time points are acquired in step S310, the page HTML of the website at those time points is acquired in step S350; any method of acquiring page HTML may be used and is not described here. When the comparison at step S340 determines that the first hash values are the same, as shown in fig. 3, the method proceeds to step S360, in which a second hash value corresponding to each page HTML is calculated. Specifically, each tag in each page HTML and the number of occurrences of each tag are determined, and the second hash value corresponding to each page HTML is calculated from these tag counts. This is illustrated by the following example.
In step S350, the page HTML of the website page acquired at a certain time point is as follows:
[Figure: page HTML listing, reproduced only as images in the original publication]
Subsequently, the tags in the page HTML are determined to be html, head, title, meta, body, h1, h2, div, label, and span, and the number of occurrences of each tag is determined: html 2, head 2, title 2, meta 3, body 2, h1 2, h2 2, div 10, label 4, and span 4. That is, Table 1 may be generated from the DOM tree structure of the above HTML content, as follows:
Tag     html  head  title  meta  body  h1  h2  div  label  span
Count   2     2     2      3     2     2   2   10   4      4
TABLE 1
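The tag counts in Table 1 can be gathered with Python's standard `html.parser` module, as sketched below. Because Table 1 lists `html`, `head`, and `body` twice each, this sketch assumes that every opening and every closing tag is counted, while a void or self-closing tag such as `meta` is counted once; that convention is inferred from the table rather than stated in the text, and `TagCounter` and `count_tags` are hypothetical names. The input HTML in any test is likewise hypothetical, since the original listing is only an image.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts tag occurrences by name (tag names arrive lowercased from HTMLParser)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

    def handle_endtag(self, tag):
        self.counts[tag] += 1

    def handle_startendtag(self, tag, attrs):
        # A self-closing tag such as <meta ... /> is counted once, not twice.
        self.counts[tag] += 1


def count_tags(html_text: str) -> dict:
    """Return a {tag: count} table like Table 1 for one page HTML."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return dict(parser.counts)
```

Declarations such as `<!DOCTYPE html>`, comments, and text content are ignored, so only the tag structure feeds the second hash value.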
Table 1 is then processed with a hash algorithm to obtain the second hash value corresponding to Table 1. The second hash values corresponding to the web pages at the other time points may be calculated in the same way.
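The text does not say which hash algorithm is applied to Table 1. The sketch below assumes SimHash, one of the algorithms named earlier, treating each tag name as a feature weighted by its count and using the first 64 bits of an MD5 digest as the per-feature hash; `feature_hash` and `simhash` are hypothetical names, and the MD5 choice is only a convenient stable hash, not a security measure.

```python
import hashlib

BITS = 64


def feature_hash(token: str) -> int:
    """Stable 64-bit hash of one feature (the first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(tag_counts: dict) -> int:
    """Weighted SimHash of a {tag: count} table such as Table 1."""
    weights = [0] * BITS
    for tag, count in tag_counts.items():
        h = feature_hash(tag)
        for i in range(BITS):
            # Each feature votes +count where its hash bit is 1 and -count where it is 0.
            weights[i] += count if (h >> i) & 1 else -count
    result = 0
    for i in range(BITS):
        if weights[i] > 0:
            result |= 1 << i
    return result
```

Similar tag tables tend to produce SimHash values that differ in only a few bits, so the result can be compared against the second threshold by counting differing bits, in the same way as the picture hashes.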
These second hash values may then be compared in step S380 to determine whether they are the same. Specifically, the data bits of the second hash values corresponding to the page HTMLs are compared to determine the number of differing data bits; the number is compared with a second threshold; and, if the number is larger than the second threshold, the second hash values corresponding to the page HTMLs are determined to be different. The second threshold is likewise set by a technician based on practical experience; in this application, the first threshold and the second threshold may be the same or different.
If the comparison shows that the second hash values differ, it is determined in step S390 that the website has been tampered with.
As described above, in one or more embodiments of the present application, the page HTML corresponding to the website is processed in addition to the pictures corresponding to the website, and the two results are combined to determine whether the website has been tampered with, which makes website monitoring simple, effective, and more accurate.
In addition, in another exemplary embodiment, a website monitoring method as shown in fig. 4 may be provided for the case in which the number of differing data bits found in step S330 is less than or equal to the first threshold but the website has nevertheless been tampered with.
FIG. 4 is a flow diagram illustrating a website monitoring method according to an example embodiment. Since steps S410 to S440 are the same as steps S210 to S240, they are not described in detail here. When the comparison at step S430 determines that the first hash values are the same, as shown in fig. 4, the method proceeds to step S450, in which the traffic of the website during the time period corresponding to the plurality of time points is acquired. In an alternative embodiment, the time period may be the longest period spanned by the plurality of time points. In another alternative embodiment, the time points may be chosen at random within equal time intervals, and the time period may be the total length of those intervals.
Subsequently, in step S460, the traffic acquired in step S450 is compared with the website's average traffic over a period of the same length. For example, if the time period is one day, the website's traffic for that day may be compared with its average daily traffic. If the traffic acquired in step S450 exceeds a multiple of the average traffic, preferably 2 times, it is determined in step S470 that the website has been tampered with. The traffic can be acquired by means conventional in the art, which are not described here.
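A minimal sketch of steps S460 and S470, assuming the average traffic is the plain mean of the website's traffic over earlier periods of the same length and that "exceeds" means strictly greater than. The function name and the `history` argument are hypothetical; the default multiple of 2 is the preferred value from the text.

```python
from statistics import mean

TRAFFIC_MULTIPLE = 2.0  # the text's preferred multiple ("preferably 2 times")


def traffic_suggests_tampering(current, history, multiple=TRAFFIC_MULTIPLE):
    """Return True if `current` traffic exceeds `multiple` times the historical average.

    `history` holds traffic for earlier periods of the same length (for example,
    previous days when the period is one day); it must not be empty.
    """
    if not history:
        raise ValueError("history must contain at least one earlier period")
    return current > multiple * mean(history)
```

For a one-day period, `traffic_suggests_tampering(250, [100, 100, 100])` flags a day with more than twice the average daily traffic.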
As described above, in one or more embodiments of the present application, the website's traffic is monitored in addition to processing the pictures corresponding to the website, and the two are combined to determine whether the website has been tampered with, which makes website monitoring simple, effective, and more accurate.
In an alternative embodiment, when step S380 determines that the second hash values are the same, steps corresponding to steps S450 to S470 are performed. In yet another alternative embodiment, steps S350 to S390 may be performed when step S460 determines that the acquired traffic does not exceed the multiple of the average traffic. Both embodiments make website monitoring more accurate and thus help effectively prevent the website from being tampered with.
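The two alternative orderings can be expressed as one decision function. The sketch below is hypothetical: it takes already-computed picture hashes, page-HTML hashes, and traffic figures, assumes a second threshold of 5 (the text only says it may equal or differ from the first), and reuses the page-HTML hashes for the check that the summary calls the third hash value. Since every branch ends in the same "tampered" verdict, both orderings return the same result; the order only decides which check runs first, which may matter when one check is more expensive to obtain.

```python
from itertools import combinations
from statistics import mean


def any_pair_differs(hashes, threshold):
    """True if any two hashes differ in more than `threshold` bits."""
    return any(bin(a ^ b).count("1") > threshold for a, b in combinations(hashes, 2))


def is_tampered(picture_hashes, html_hashes, current_traffic, traffic_history,
                first_threshold=5, second_threshold=5, multiple=2.0,
                traffic_first=False):
    """Combine the picture, page-HTML, and traffic checks of FIGS. 2 to 4."""
    # Steps S230/S240: differing picture hashes settle the question immediately.
    if any_pair_differs(picture_hashes, first_threshold):
        return True

    def html_check():
        # Steps S360 to S390: compare the tag-count hashes of the page HTML.
        return any_pair_differs(html_hashes, second_threshold)

    def traffic_check():
        # Steps S450 to S470: traffic above a multiple of the historical average.
        return current_traffic > multiple * mean(traffic_history)

    # Default order is HTML first, then traffic; traffic_first=True swaps it.
    checks = (traffic_check, html_check) if traffic_first else (html_check, traffic_check)
    return any(check() for check in checks)
```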
To give a clearer understanding of the inventive concept of one or more embodiments of this specification, a website monitoring apparatus according to one or more embodiments is described below with reference to the block diagram of fig. 5. Those of ordinary skill in the art will understand that fig. 5 shows only the components of the website monitoring apparatus 500 related to the present exemplary embodiment, and that the apparatus may also include general-purpose components other than those shown.
Fig. 5 is a block diagram illustrating a website monitoring apparatus 500 according to an example embodiment. As shown in fig. 5, the website monitoring apparatus 500 includes a picture obtaining module 510, a first hash value calculating module 520, a first hash value comparing module 530, and a first hash value determining module 540.
The picture acquiring module 510 acquires a plurality of pictures corresponding to a website at a plurality of time points.
The first hash value calculation module 520 calculates a first hash value corresponding to each of the plurality of pictures.
The first hash value comparison module 530 compares the first hash values corresponding to the pictures.
The first hash value determining module 540 determines that the website has been tampered with when the first hash value comparison module finds that the first hash values corresponding to the pictures are not the same.
Optionally, the first hash value calculation module 520 includes a DCT matrix acquisition module (not shown), an average value calculation module (not shown), a first DCT comparison module (not shown), and a first hash value sub-calculation module (not shown). The DCT matrix acquisition module obtains a DCT matrix for each picture by computing the discrete cosine transform (DCT) of that picture; the average value calculation module calculates the average value of each DCT matrix; the first DCT comparison module compares each value in a DCT matrix with the average value of that matrix; and the first hash value sub-calculation module sets the corresponding bit to 1 when a value in the DCT matrix is greater than the average value and to 0 when it is less than the average value, thereby calculating the first hash value corresponding to each of the plurality of pictures.
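The DCT-based first hash described above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: it assumes each picture has already been converted to grayscale and downscaled to a small matrix (8×8 in the example), uses an unnormalized 2D DCT-II, and keeps every DCT coefficient as the text describes (many perceptual-hash variants keep only the low-frequency block instead). The source does not say how a value exactly equal to the average is handled; the sketch assigns it 0. The names `dct_2d` and `first_hash` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def dct_2d(pixels: Matrix) -> List[List[float]]:
    """Unnormalized 2D DCT-II of a grayscale matrix (rows x cols)."""
    rows, cols = len(pixels), len(pixels[0])
    coeffs = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                for y in range(cols):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                    total += pixels[x][y] * cx * cy
            coeffs[u][v] = total
    return coeffs


def first_hash(pixels: Matrix) -> Tuple[int, ...]:
    """Bit string: 1 where a DCT coefficient exceeds the matrix mean, else 0."""
    flat = [c for row in dct_2d(pixels) for c in row]
    average = sum(flat) / len(flat)
    # Values equal to the average are not covered by the source; treated as 0 here.
    return tuple(1 if c > average else 0 for c in flat)


if __name__ == "__main__":
    flat_gray = [[100.0] * 8 for _ in range(8)]
    # A uniform picture has energy only in the DC coefficient.
    print(first_hash(flat_gray)[:4])  # (1, 0, 0, 0)
```

The nested loops are O(N⁴) and are only meant to show the arithmetic; for real screenshots a library FFT-based DCT on a downscaled image would be the practical choice.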
Optionally, the first hash value comparison module 530 further includes a first number determination module (not shown), a first number comparison module (not shown), and a first sub-determination module (not shown). The first number determination module compares the corresponding data bits of the first hash values of the pictures to determine the number of data bits that differ; the first number comparison module compares this number with a first threshold; and the first sub-determination module determines that the first hash values corresponding to the pictures are different when the first number comparison module finds that the number is greater than the first threshold.
The first threshold is set by a technician based on practical experience; in the present application it may be set to 5, i.e., two hash values are regarded as different when more than 5 of their data bits differ. Because the first threshold is chosen empirically, the website may have been tampered with even though the number of differing data bits is equal to or less than the threshold.
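A sketch of the bit-count comparison, using the first threshold of 5 mentioned above. The source does not say how the pictures are paired; this sketch assumes consecutive pictures (time point i against time point i+1) are compared and that any pair exceeding the threshold counts as a change. The helper names are hypothetical.

```python
from typing import Sequence

FIRST_THRESHOLD = 5  # value suggested in the description; set empirically


def differing_bits(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("hash values must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hashes_differ(a: Sequence[int], b: Sequence[int],
                  threshold: int = FIRST_THRESHOLD) -> bool:
    """Hashes count as different only when MORE than `threshold` bits differ."""
    return differing_bits(a, b) > threshold


def pictures_changed(hashes: Sequence[Sequence[int]],
                     threshold: int = FIRST_THRESHOLD) -> bool:
    """Compare consecutive pictures' first hash values (pairing is an assumption)."""
    return any(hashes_differ(prev, cur, threshold)
               for prev, cur in zip(hashes, hashes[1:]))
```

With this rule, two pictures whose hashes differ in exactly 5 bits are still treated as identical, which is the blind spot the description addresses by also acquiring the page HTML.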
Therefore, optionally, the website monitoring apparatus may further include a page HTML acquisition module (not shown), which acquires the page HTML of the website at the plurality of time points while the picture acquisition module acquires the plurality of pictures corresponding to those time points.
Optionally, the website monitoring apparatus may further include a second hash value calculation module (not shown), a second hash value comparison module (not shown), and a second hash value determination module (not shown). When the first hash value comparison module determines that the first hash values corresponding to the pictures are the same, the second hash value calculation module calculates a second hash value corresponding to each page HTML; the second hash value comparison module compares the second hash values corresponding to the page HTMLs; and the second hash value determination module determines that the website has been tampered with when the second hash value comparison module finds that the second hash values corresponding to the page HTMLs are different.
Optionally, the second hash value calculation module includes a tag count determination module (not shown) and a second hash value sub-calculation module (not shown). The tag count determination module determines each tag in each page HTML and the number of occurrences of each tag; the second hash value sub-calculation module calculates the second hash value corresponding to each page HTML from the number of occurrences of each tag.
Optionally, the second hash value comparison module includes a second number determination module (not shown), a second number comparison module (not shown), and a second sub-determination module (not shown). The second number determination module compares the corresponding data bits of the second hash values of the page HTMLs to determine the number of differing data bits; the second number comparison module compares this number with a second threshold; and the second sub-determination module determines that the second hash values corresponding to the page HTMLs are different when the second number comparison module finds that the number is greater than the second threshold.
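The source states only that the second hash value is computed from the number of occurrences of each tag; it does not name the hash function. The sketch below assumes a SimHash-style fingerprint (a swapped-in technique, not confirmed by the source): each tag name is hashed to 64 bits with MD5, each bit of that digest adds +count or -count to a running weight, and the sign of each weight gives the output bit. Tag counting uses Python's standard `html.parser.HTMLParser`, which lowercases tag names and ignores text. The value of `SECOND_THRESHOLD` and all function names are hypothetical.

```python
import hashlib
from collections import Counter
from html.parser import HTMLParser
from typing import Sequence, Tuple

SECOND_THRESHOLD = 3  # hypothetical; the source leaves the value to the technician


class TagCounter(HTMLParser):
    """Counts start tags; self-closing tags such as <br/> are counted once."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs) -> None:
        self.counts[tag] += 1


def tag_counts(html: str) -> Counter:
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def second_hash(html: str) -> Tuple[int, ...]:
    """64-bit SimHash over tag names, weighted by each tag's count (assumption)."""
    weights = [0] * 64
    for tag, count in tag_counts(html).items():
        digest = int.from_bytes(hashlib.md5(tag.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            weights[bit] += count if (digest >> bit) & 1 else -count
    return tuple(1 if w > 0 else 0 for w in weights)


def html_hashes_differ(a: Sequence[int], b: Sequence[int],
                       threshold: int = SECOND_THRESHOLD) -> bool:
    """Different when more than `threshold` corresponding bits differ."""
    return sum(1 for x, y in zip(a, b) if x != y) > threshold
```

A SimHash keeps small structural edits to a small number of differing bits, which is what makes a bit-count threshold meaningful; a cryptographic hash of the raw HTML would instead flip about half of its bits on any change.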
Optionally, the website monitoring apparatus further includes a traffic acquisition module (not shown), a traffic comparison module (not shown), and a traffic determination module (not shown). When the first hash values corresponding to the pictures are the same, the traffic acquisition module acquires the traffic of the website in the time period corresponding to the plurality of time points; the traffic comparison module compares this traffic with the average traffic of the website over the time period; and the traffic determination module determines that the website has been tampered with when the traffic comparison module finds that the traffic exceeds a specified multiple of the average traffic (twice the average traffic in the claims).
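A minimal sketch of the traffic check. It assumes traffic is measured as request counts taken from timestamped samples, that the average traffic is the mean of earlier windows of the same length, and it uses the multiple of 2 given in the claims; the function names and the `(timestamp, count)` sample format are hypothetical.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple

TRAFFIC_MULTIPLE = 2.0  # "twice the average traffic" per the claims


def window_traffic(samples: Iterable[Tuple[float, int]],
                   start: float, end: float) -> int:
    """Sum the request counts of samples whose timestamp lies in [start, end)."""
    return sum(count for ts, count in samples if start <= ts < end)


def traffic_indicates_tampering(current: float, historical: Sequence[float],
                                multiple: float = TRAFFIC_MULTIPLE) -> bool:
    """True when traffic in the monitored window exceeds `multiple` x the average."""
    if not historical:
        raise ValueError("need at least one historical window to average")
    return current > multiple * mean(historical)
```

Traffic exactly equal to twice the average is not flagged here, matching the "exceeds" wording of the claims.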
Optionally, the website monitoring apparatus further includes a third hash value calculation module (not shown), a third hash value comparison module (not shown), and a third hash value determination module (not shown). When the traffic comparison module finds that the traffic is less than the specified multiple of the average traffic, the third hash value calculation module calculates a third hash value corresponding to each page HTML; the third hash value comparison module compares the third hash values corresponding to the page HTMLs; and the third hash value determination module determines that the website has been tampered with when the third hash value comparison module finds that the third hash values are different.
Optionally, the website monitoring apparatus further includes a sub-traffic acquisition module, a sub-traffic comparison module, and a sub-traffic determination module. When the second hash value comparison module finds that the second hash values corresponding to the page HTMLs are the same, the sub-traffic acquisition module acquires the traffic of the website in the time period corresponding to the plurality of time points; the sub-traffic comparison module compares this traffic with the average traffic of the website over the time period; and the sub-traffic determination module determines that the website has been tampered with when the sub-traffic comparison module finds that the traffic exceeds the specified multiple of the average traffic.
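The optional checks can be chained into one decision. The sketch below follows one combination described above: the picture hash first, then the page HTML hash when the pictures match, then traffic when the HTML hashes also match. The alternative embodiment that checks traffic before a third HTML hash would swap the last two steps. Hashes are taken as precomputed bit tuples; the second threshold of 3, the consecutive pairing, and all names are assumptions, not from the source.

```python
from statistics import mean
from typing import Optional, Sequence

Bits = Sequence[int]


def _any_pair_differs(hashes: Sequence[Bits], threshold: int) -> bool:
    """Compare consecutive hashes; differ when more than `threshold` bits differ."""
    for prev, cur in zip(hashes, hashes[1:]):
        if sum(1 for x, y in zip(prev, cur) if x != y) > threshold:
            return True
    return False


def detect_tampering(picture_hashes: Sequence[Bits], html_hashes: Sequence[Bits],
                     current_traffic: float, historical_traffic: Sequence[float],
                     first_threshold: int = 5, second_threshold: int = 3,
                     multiple: float = 2.0) -> Optional[str]:
    """Return the reason a change was flagged, or None if all checks pass."""
    if _any_pair_differs(picture_hashes, first_threshold):
        return "picture hash changed"
    # Pictures look the same: fall back to the page HTML structure.
    if _any_pair_differs(html_hashes, second_threshold):
        return "page HTML hash changed"
    # HTML also matches: finally look for an abnormal traffic spike.
    if current_traffic > multiple * mean(historical_traffic):
        return "traffic spike"
    return None
```

Returning a reason string rather than a bare boolean makes it easy to see which of the checks fired.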
As described above, in one or more embodiments of the present application, whether the website has been tampered with can be determined effectively by processing only the pictures corresponding to the website, which keeps website monitoring simple and effective. Furthermore, the page HTML of the website can be processed together with the pictures, and combining the two checks makes the monitoring more accurate while keeping it simple. In addition, the traffic of the website can be monitored to judge whether the website has been tampered with, so that the website is monitored from several aspects.
Fig. 6 is a block diagram illustrating a website monitoring apparatus 1900 according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to Fig. 6, the apparatus 1900 includes a processing component 1922, which further includes one or more processors, and memory resources, represented by memory 1932, for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in memory 1932 may include one or more modules, each corresponding to a set of instructions. The processing component 1922 is configured to execute the instructions to perform the method described above.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided that includes instructions, such as the memory 1932 that includes instructions, which are executable by the processing component 1922 of the apparatus 1900 to perform the method described above.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as a punch card or an in-groove protrusion structure having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely-propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, by utilizing state information of computer-readable program instructions to personalize an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), the electronic circuit can execute the computer-readable program instructions to thereby implement various aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (20)

1. A website monitoring method, comprising:
acquiring a plurality of pictures corresponding to a website at a plurality of time points;
calculating a first hash value corresponding to each of the plurality of pictures;
comparing the first hash value corresponding to each picture, including: determining a first number of non-identical data bits by comparing respective data bits of the first hash value corresponding to each picture, respectively; comparing the first quantity to a first threshold; if the first number is larger than a first threshold value, determining that the first hash value corresponding to each picture is different;
and if the first hash values corresponding to the pictures are different, determining that the website has been tampered with.
2. The method of claim 1, wherein calculating the first hash value corresponding to each of the plurality of pictures comprises:
obtaining DCT matrixes respectively corresponding to each picture by calculating DCT transformation of each picture;
calculating the average value of each DCT matrix;
comparing the value in each DCT matrix with the average value corresponding to the DCT matrix;
and setting a bit to 1 if the value in the DCT matrix is greater than the average value and to 0 if the value is less than the average value, thereby calculating the first hash value corresponding to each of the plurality of pictures.
3. The method of claim 1, further comprising:
acquiring page HTML of the website at the plurality of time points while acquiring the plurality of pictures of the website corresponding to the plurality of time points.
4. The method of claim 3, further comprising:
if the first hash value corresponding to each picture is the same, calculating a second hash value corresponding to each page HTML;
comparing the second hash value corresponding to each page HTML;
and if the second hash values corresponding to the page HTMLs are different, determining that the website has been tampered with.
5. The method of claim 4, wherein computing the second hash value corresponding to each page HTML comprises:
determining each tag in each page HTML and the number corresponding to each tag;
and calculating the second hash value corresponding to each page HTML according to the number corresponding to each tag.
6. The method of claim 4, wherein comparing the second hash value corresponding to each page HTML comprises:
comparing the data bits of the second hash value corresponding to each page HTML to determine a second number of different data bits;
comparing the second quantity to a second threshold;
and if the second quantity is greater than a second threshold value, determining that the second hash value corresponding to each page HTML is different.
7. The method of claim 1, further comprising:
if the first hash values corresponding to the pictures are the same, acquiring the traffic of the website in the time period corresponding to the plurality of time points;
comparing the traffic with the average traffic of the website over the time period;
and if the traffic exceeds twice the average traffic, determining that the website has been tampered with.
8. The method of claim 7, further comprising:
if the traffic is less than twice the average traffic, calculating a third hash value corresponding to each page HTML;
comparing the third hash values corresponding to the page HTMLs;
and if the third hash values corresponding to the page HTMLs are different, determining that the website has been tampered with.
9. The method of claim 5, further comprising:
if the second hash values corresponding to the page HTMLs are the same, acquiring the traffic of the website in the time period corresponding to the plurality of time points;
comparing the traffic with the average traffic of the website over the time period;
and if the traffic exceeds twice the average traffic, determining that the website has been tampered with.
10. A non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1 to 9.
11. A website monitoring device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
acquiring a plurality of pictures corresponding to a website at a plurality of time points;
calculating a first hash value corresponding to each of the plurality of pictures;
comparing the first hash value corresponding to each picture, including: determining a first number of non-identical data bits by comparing respective data bits of the first hash value corresponding to each picture, respectively; comparing the first quantity to a first threshold; if the first number is larger than a first threshold value, determining that the first hash value corresponding to each picture is different;
and if the first hash values corresponding to the pictures are different, determining that the website has been tampered with.
12. A website monitoring device, comprising:
a picture acquisition module, configured to acquire a plurality of pictures corresponding to the website at a plurality of time points;
a first hash value calculation module for calculating a first hash value corresponding to each of the plurality of pictures;
a first hash value comparison module, configured to compare first hash values corresponding to each picture, including: determining a first number of non-identical data bits by comparing respective data bits of the first hash value corresponding to each picture, respectively; comparing the first quantity to a first threshold; if the first number is larger than a first threshold value, determining that the first hash value corresponding to each picture is different;
and a first hash value determining module, configured to determine that the website has been tampered with when the first hash values corresponding to the pictures are different.
13. The apparatus of claim 12, wherein the first hash value calculation module comprises:
a DCT matrix acquisition module, configured to obtain a DCT matrix corresponding to each picture by computing the DCT of each picture;
the average value calculation module is used for calculating the average value of each DCT matrix;
the first DCT comparison module is used for comparing the value in each DCT matrix with the average value corresponding to the DCT matrix;
a first hash value sub-calculation module for setting to 1 when the value in the DCT matrix is greater than the average value and setting to 0 when the value in the DCT matrix is less than the average value, thereby calculating a first hash value corresponding to each of the plurality of pictures.
14. The apparatus of claim 12, further comprising: a page HTML acquisition module, configured to acquire the page HTML of the website at the plurality of time points while the picture acquisition module acquires the plurality of pictures of the website corresponding to the plurality of time points.
15. The apparatus of claim 14, further comprising:
the second hash value calculation module is used for calculating a second hash value corresponding to each page HTML when the first hash value comparison module determines that the first hash value corresponding to each picture is the same;
the second hash value comparison module is used for comparing the second hash value corresponding to each page HTML;
and a second hash value determining module, configured to determine that the website has been tampered with when the second hash value comparison module finds that the second hash values corresponding to the page HTMLs are different.
16. The apparatus of claim 15, wherein the second hash value calculation module comprises:
the tag number determining module is used for determining each tag in each page HTML and the number corresponding to each tag;
and the second hash value sub-calculation module is used for calculating the second hash value corresponding to each page HTML according to the number corresponding to each tag.
17. The apparatus of claim 15, wherein the second hash value comparison module comprises:
the second quantity determining module is used for respectively comparing each data bit of the second hash value corresponding to each page HTML to determine the second quantity of different data bits;
a second quantity comparison module for comparing the second quantity to a second threshold;
and a second sub-determining module, configured to determine that the second hash values corresponding to the page HTMLs are different when the second quantity comparison module finds that the second quantity is greater than the second threshold.
18. The apparatus of claim 12, further comprising:
a traffic acquisition module, configured to acquire the traffic of the website in the time period corresponding to the plurality of time points when the first hash values corresponding to the pictures are the same;
a traffic comparison module, configured to compare the traffic with the average traffic of the website over the time period;
and a traffic determining module, configured to determine that the website has been tampered with when the traffic comparison module finds that the traffic exceeds twice the average traffic.
19. The apparatus of claim 18, further comprising:
a third hash value calculation module, configured to calculate a third hash value corresponding to each page HTML when the traffic comparison module finds that the traffic is less than twice the average traffic;
a third hash value comparison module, configured to compare the third hash values corresponding to the page HTMLs;
and a third hash value determining module, configured to determine that the website has been tampered with when the third hash value comparison module finds that the third hash values corresponding to the page HTMLs are different.
20. The apparatus of claim 15, further comprising:
a sub-traffic acquisition module, configured to acquire the traffic of the website in the time period corresponding to the plurality of time points when the second hash value comparison module finds that the second hash values corresponding to the page HTMLs are the same;
a sub-traffic comparison module, configured to compare the traffic with the average traffic of the website over the time period;
and a sub-traffic determining module, configured to determine that the website has been tampered with when the sub-traffic comparison module finds that the traffic exceeds twice the average traffic.
CN201810453599.2A 2018-05-14 2018-05-14 Website monitoring method and device Active CN108809943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810453599.2A CN108809943B (en) 2018-05-14 2018-05-14 Website monitoring method and device

Publications (2)

Publication Number Publication Date
CN108809943A CN108809943A (en) 2018-11-13
CN108809943B true CN108809943B (en) 2021-05-14

Family

ID=64092328

Country Status (1)

Country Link
CN (1) CN108809943B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110572378B (en) * 2019-08-22 2021-11-23 上海易点时空网络有限公司 Method, terminal and server for preventing web hijacking based on mark tracking
CN110572376B (en) * 2019-08-22 2021-11-23 上海易点时空网络有限公司 Method, terminal and server for preventing network hijacking based on mark tracking
CN112528115B (en) * 2019-09-17 2023-04-25 中国移动通信集团安徽有限公司 Website monitoring method and device
CN110795676A (en) * 2019-10-31 2020-02-14 北京知道创宇信息技术股份有限公司 Website monitoring method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102629261A (en) * 2012-03-01 2012-08-08 南京邮电大学 Method for finding landing page from phishing page
CN104142987A (en) * 2014-07-24 2014-11-12 腾讯科技(深圳)有限公司 Page content management method and device and terminal device
CN104199962A (en) * 2014-09-19 2014-12-10 合肥工业大学 Trusted webpage forensics system and trusted webpage forensics method based on three-layer trusted webpage forensic model
CN106599242A (en) * 2016-12-20 2017-04-26 福建六壬网安股份有限公司 Webpage change monitoring method and system based on similarity calculation
CN107204960A (en) * 2016-03-16 2017-09-26 阿里巴巴集团控股有限公司 Web page identification method and device, server
CN107786529A (en) * 2016-08-31 2018-03-09 阿里巴巴集团控股有限公司 The detection method of website, apparatus and system
CN107835191A (en) * 2017-11-29 2018-03-23 中科信息安全共性技术国家工程研究中心有限公司 A kind of method and apparatus for detecting webpage malicious and distorting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant