CN108650250A - Illegal page detection method, system, computer system and readable storage medium


Publication number
CN108650250A
Authority
CN
China
Prior art keywords
page
dom trees
depth
dom
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810390940.4A
Other languages
Chinese (zh)
Other versions
CN108650250B (en
Inventor
李忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qianxin Technology Co Ltd
Original Assignee
Beijing Qianxin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qianxin Technology Co Ltd
Priority to CN201810390940.4A
Publication of CN108650250A
Application granted
Publication of CN108650250B
Current legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides an illegal page detection method, including: obtaining first frame feature information of a current page; obtaining second frame feature information of a predetermined page; comparing the first frame feature information with the second frame feature information to obtain a similarity between the current page and the predetermined page; and determining, based on the similarity, whether the current page is an illegal page. The present disclosure further provides an illegal page detection system, a computer system and a computer-readable storage medium.

Description

Illegal page detection method, system, computer system and readable storage medium
Technical field
The present disclosure relates to an illegal page detection method, a system, a computer system and a readable storage medium.
Background
A webshell is a command execution environment that exists in the form of a web page file and can be used to manage a website server. At present, hackers typically use various means to upload webshell variants to a website server and exploit this management capability to invade the website.
After uploading a webshell, a hacker must connect to it before the invasion can take effect. A hacker attack can therefore be detected by detecting the behavior of connecting to the webshell.
Connecting to a webshell usually involves an HTTP request and an HTTP response, and the request and response generally carry corresponding text features. The related art therefore generally detects hacker attacks based on text features.
However, in the course of realizing the concept of the present disclosure, the inventor found that the related art has at least the following defects: detection based on text features is easily bypassed by a hacker who uses variant HTTP requests and HTTP responses, and the large number of such variants forces staff to maintain a huge rule base.
Summary of the invention
One aspect of the present disclosure provides an illegal page detection method, including: obtaining first frame feature information of a current page; obtaining second frame feature information of a predetermined page; comparing the first frame feature information with the second frame feature information to obtain a similarity between the current page and the predetermined page; and determining, based on the similarity, whether the current page is an illegal page.
Optionally, the first frame feature information of the current page includes the depth of a first DOM tree of the current page; the second frame feature information of the predetermined page includes the depth of a second DOM tree of the predetermined page; and comparing the first frame feature information with the second frame feature information to obtain the similarity between the current page and the predetermined page includes: comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page.
Optionally, the predetermined page includes multiple predetermined pages; the depth of the second DOM tree includes the depths of multiple second DOM trees; each predetermined page corresponds to the depth of one second DOM tree; and comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page includes: determining, among the multiple second DOM trees, a third DOM tree whose type is similar or identical to that of the first DOM tree; and comparing the depth of the first DOM tree with the depth of the third DOM tree to obtain the similarity between the current page and the page corresponding to the third DOM tree.
Optionally, determining, among the multiple second DOM trees, the third DOM tree whose type is similar or identical to that of the first DOM tree includes: extracting, from the first DOM tree, a fourth DOM tree that satisfies a predetermined depth; extracting, from each second DOM tree, a fifth DOM tree that satisfies the predetermined depth, to obtain multiple fifth DOM trees; determining, among the multiple fifth DOM trees, a target DOM tree that is similar or identical to the fourth DOM tree; and determining the second DOM tree corresponding to the target DOM tree as the third DOM tree.
Optionally, the predetermined page includes multiple predetermined pages; the depth of the second DOM tree includes the depths of multiple second DOM trees; each predetermined page corresponds to the depth of one second DOM tree; and comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page includes: comparing the depth of the first DOM tree with the depth of each of the multiple second DOM trees to obtain the similarity between the current page and each of the multiple predetermined pages.
Optionally, determining, based on the similarity, whether the current page is an illegal page includes: determining whether the similarity is greater than a similarity threshold; and, if the similarity is greater than the similarity threshold, determining that the current page is an illegal page.
Another aspect of the present disclosure provides an illegal page detection system, including: a first acquisition module for obtaining first frame feature information of a current page; a second acquisition module for obtaining second frame feature information of a predetermined page; a comparison module for comparing the first frame feature information with the second frame feature information to obtain a similarity between the current page and the predetermined page; and a judgment module for determining, based on the similarity, whether the current page is an illegal page.
Optionally, the first frame feature information of the current page includes the depth of a first DOM tree of the current page; the second frame feature information of the predetermined page includes the depth of a second DOM tree of the predetermined page; and the comparison module is further configured to compare the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page.
Optionally, the predetermined page includes multiple predetermined pages; the depth of the second DOM tree includes the depths of multiple second DOM trees; each predetermined page corresponds to the depth of one second DOM tree; and the comparison module includes: a first determination unit for determining, among the multiple second DOM trees, a third DOM tree whose type is similar or identical to that of the first DOM tree; and a comparison unit for comparing the depth of the first DOM tree with the depth of the third DOM tree to obtain the similarity between the current page and the page corresponding to the third DOM tree.
Optionally, the first determination unit includes: a first extraction subunit for extracting, from the first DOM tree, a fourth DOM tree that satisfies a predetermined depth; a second extraction subunit for extracting, from each second DOM tree, a fifth DOM tree that satisfies the predetermined depth, to obtain multiple fifth DOM trees; a first determination subunit for determining, among the multiple fifth DOM trees, a target DOM tree that is similar or identical to the fourth DOM tree; and a second determination subunit for determining the second DOM tree corresponding to the target DOM tree as the third DOM tree.
Optionally, the predetermined page includes multiple predetermined pages; the depth of the second DOM tree includes the depths of multiple second DOM trees; each predetermined page corresponds to the depth of one second DOM tree; and the comparison module is further configured to compare the depth of the first DOM tree with the depth of each of the multiple second DOM trees to obtain the similarity between the current page and each of the multiple predetermined pages.
Optionally, the judgment module includes: a judging unit for determining whether the similarity is greater than a similarity threshold; and a second determination unit for determining that the current page is an illegal page if the similarity is greater than the similarity threshold.
Another aspect of the present disclosure provides a computer system, including: one or more processors; and a computer-readable storage medium for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the illegal page detection method described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the illegal page detection method described above.
Description of the drawings
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 schematically illustrates an application scenario of the illegal page detection method and system according to an embodiment of the present disclosure;
Fig. 2 schematically illustrates a flow chart of the illegal page detection method according to an embodiment of the present disclosure;
Fig. 3A schematically illustrates a flow chart of obtaining the similarity between the current page and the predetermined page by comparing DOM tree depths according to an embodiment of the present disclosure;
Fig. 3B schematically illustrates a flow chart of determining the third DOM tree according to an embodiment of the present disclosure;
Fig. 3C schematically illustrates a flow chart of determining whether the current page is an illegal page according to an embodiment of the present disclosure;
Fig. 3D schematically illustrates a flow chart of connecting to a webshell according to an embodiment of the present disclosure;
Fig. 3E schematically illustrates a flow chart of the illegal page detection method according to another embodiment of the present disclosure;
Fig. 4 schematically illustrates a block diagram of the illegal page detection system according to an embodiment of the present disclosure;
Fig. 5A schematically illustrates a block diagram of the comparison module according to an embodiment of the present disclosure;
Fig. 5B schematically illustrates a block diagram of the determination unit according to an embodiment of the present disclosure;
Fig. 5C schematically illustrates a block diagram of the judgment module according to an embodiment of the present disclosure; and
Fig. 6 schematically illustrates a block diagram of a computer system suitable for implementing the illegal page detection method according to an embodiment of the present disclosure.
Detailed description
Embodiments of the present disclosure are described below with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, numerous specific details are set forth for ease of explanation, in order to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, that one or more embodiments may also be practiced without these specific details. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The terms "include", "comprise" and the like, as used herein, indicate the presence of the stated features, steps, operations and/or components, but do not exclude the presence or addition of one or more other features, steps, operations or components.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art. It should be noted that the terms used herein should be interpreted as having meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.
Where an expression such as "at least one of A, B and C" is used, it should generally be interpreted in the sense commonly understood by those skilled in the art (for example, "a system having at least one of A, B and C" includes, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C). The same applies to an expression such as "at least one of A, B or C" (for example, "a system having at least one of A, B or C" includes, but is not limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C). Those skilled in the art will also understand that virtually any disjunctive word and/or phrase presenting two or more alternative items, whether in the description, the claims or the drawings, should be understood to contemplate the possibilities of including one of the items, either of the items, or both items. For example, the phrase "A or B" should be understood to include the possibilities of "A", "B", or "A and B".
Some block diagrams and/or flow charts are shown in the drawings. It should be understood that some blocks of the block diagrams and/or flow charts, or combinations thereof, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus, so that the instructions, when executed by the processor, create means for implementing the functions/operations illustrated in the block diagrams and/or flow charts.
Accordingly, the techniques of the present disclosure can be implemented in the form of hardware and/or software (including firmware, microcode and the like). In addition, the techniques of the present disclosure can take the form of a computer program product on a computer-readable medium storing instructions, the computer program product being for use by, or in connection with, an instruction execution system. In the context of the present disclosure, a computer-readable medium can be any medium that can contain, store, communicate, propagate or transmit instructions. For example, a computer-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus, device or propagation medium. Specific examples of computer-readable media include: magnetic storage devices, such as magnetic tape or a hard disk (HDD); optical storage devices, such as an optical disc (CD-ROM); memories, such as random access memory (RAM) or flash memory; and/or wired/wireless communication links.
Embodiments of the present disclosure provide a detection method, including: obtaining first frame feature information of a current page; obtaining second frame feature information of a predetermined page; comparing the first frame feature information with the second frame feature information to obtain a similarity between the current page and the predetermined page; and determining, based on the similarity, whether the current page is an illegal page.
Fig. 1 schematically illustrates an application scenario of the illegal page detection method and system according to an embodiment of the present disclosure. It should be noted that Fig. 1 is only an example of a scenario to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments cannot be used in other devices, systems, environments or scenarios.
A webshell is a command execution environment that exists in the form of a web page file and can be used to manage a website server. At present, hackers typically use various means to upload webshell variants to a website server and exploit this management capability to invade the website. After uploading a webshell, a hacker must connect to it before the invasion can take effect, so a hacker attack can be detected by detecting the behavior of connecting to the webshell.
As shown in Fig. 1, suppose that user 101 wants to connect to the webshell on server 102 in order to manage or invade server 102. In this case, whether a hacker attack is taking place can be determined from the behavior of connecting to the webshell.
At present, the related art detects hacker attacks based on the text features carried in the HTTP request and HTTP response involved in connecting to a webshell. However, detection based on text features is easily bypassed by a hacker who uses variant HTTP requests and HTTP responses, and the large number of such variants forces staff to maintain a huge rule base.
Instead, the behavior of connecting to the webshell can be detected by embodiments of the present disclosure. Specifically, first frame feature information of the current page is obtained; second frame feature information of a predetermined page is obtained; the first frame feature information is compared with the second frame feature information to obtain a similarity between the current page and the predetermined page; and, based on the similarity, it is determined whether the current page is an illegal page.
Fig. 2 schematically illustrates a flow chart of the illegal page detection method according to an embodiment of the present disclosure.
As shown in Fig. 2, the illegal page detection method may include operations S201 to S204, wherein:
In operation S201, first frame feature information of the current page is obtained.
In operation S202, second frame feature information of the predetermined page is obtained.
In operation S203, the first frame feature information is compared with the second frame feature information to obtain the similarity between the current page and the predetermined page.
In operation S204, based on the similarity, it is determined whether the current page is an illegal page.
A webshell is a command execution environment that exists in the form of a web page file. It may also be called a web page backdoor, or web backdoor for short, and can be used to manage a web server. At present, hackers typically use various means to upload webshell variants to a website server and exploit this management capability to invade the website. However, whether the user is an administrator or a hacker, managing or invading a website through a webshell requires successfully connecting to the webshell, and after a successful connection the server hosting the webshell returns an administration page. The administrator or hacker can then manage or invade the website server through that administration page. To determine whether the current connection to the webshell is a hacker intrusion, one can therefore check whether the administration page is an illegal page.
Although the webshells used by hackers are numerous, most of them are variants derived from certain specific webshells, and most of these variants merely change the title of some administration page or add a small amount of functionality. The frame feature information of the administration page of such a variant therefore differs little from that of the administration page of its parent webshell. Consequently, the frame feature information of the administration pages of the parent webshells can be collected, and the similarity between administration pages can be compared to determine whether the current administration page is an illegal page.
In embodiments of the present disclosure, the current page may include the current administration page returned by the server, and the first frame feature information may be used to represent the structural features of the current page. The predetermined page may include illegal pages collected in advance, for example illegal administration pages, and may include one or more pages. The second frame feature information may be used to represent the structural features of the predetermined page.
According to an embodiment of the present disclosure, after the server returns the current page, the first frame feature information of that page can be obtained, along with the second frame feature information of the pre-stored predetermined page. The current page returned by the server may be returned to an external device, or may be shown on a display connected to the server. The predetermined page may be stored in the server or in an external storage device, which is not limited here. In addition, the illegal page detection method of the embodiments of the present disclosure may be applied in the server, or in a separate detection device.
According to an embodiment of the present disclosure, the first frame feature information can be compared with the second frame feature information to obtain the similarity between the current page and the predetermined page. When the predetermined page includes multiple predetermined pages, the first frame feature information of the current page can be compared with the second frame feature information of each predetermined page to obtain the similarity between the current page and each predetermined page. Whether the current page is an illegal page can then be determined based on the obtained similarities.
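The comparison against multiple predetermined pages described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that frame feature information is represented as an ordered list of tag names, it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names and the `0.8` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def sequence_similarity(a, b):
    # Stand-in similarity in [0, 1]; the source does not prescribe this measure.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def detect_illegal_page(current, predetermined, threshold=0.8,
                        similarity=sequence_similarity):
    """Compare the current page with every predetermined (known illegal) page.

    current       -- frame features of the current page (assumed: list of tags)
    predetermined -- dict mapping a page name to its frame features
    threshold     -- hypothetical similarity threshold
    Returns (is_illegal, best_match_name, all_scores).
    """
    scores = {name: similarity(current, feats)
              for name, feats in predetermined.items()}
    best_name = max(scores, key=scores.get, default=None)
    best_score = scores.get(best_name, 0.0)
    # The page is flagged only when the best similarity exceeds the threshold.
    return best_score > threshold, best_name, scores
```

Passing the similarity function as a parameter keeps the decision logic independent of whichever measure is finally chosen.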
By contrast, because connecting to a webshell usually involves an HTTP request and an HTTP response, which generally carry corresponding text features, the related art generally detects hacker attacks based on text features. Specifically, the related art compares the text features carried in the HTTP request and HTTP response with the text features in a pre-stored feature library; a successful match indicates that an unauthorized person, such as a hacker, is connecting to the webshell. However, detection based on text features is easily bypassed by a hacker who uses variant HTTP requests and HTTP responses, and the large number of such variants forces staff to maintain a huge rule base.
With embodiments of the present disclosure, because frame feature information is relatively complex, whether the current page is an illegal page can still be determined accurately from the similarity even if a hacker makes minor adjustments to the frame feature information. Moreover, because the parent webshells of the variants used by hackers are few in type, collecting the frame feature information of the administration pages of the parent webshells as the second frame feature information reduces the maintenance burden on staff.
As an optional embodiment, the first frame feature information of the current page may include the depth of a first DOM tree of the current page; the second frame feature information of the predetermined page may include the depth of a second DOM tree of the predetermined page; and comparing the first frame feature information with the second frame feature information to obtain the similarity between the current page and the predetermined page may include: comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page.
In embodiments of the present disclosure, a DOM tree may refer to the set of HTML tags arranged in order in an HTML page; to a certain extent, a DOM tree describes the features of an HTML page.
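As a rough illustration of "the set of HTML tags arranged in order", the sketch below collects the opening tags of a page in document order using Python's standard `html.parser` module. The class and function names are hypothetical, and treating the ordered list of tag names as the page's DOM representation is an assumption made for this example.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        self.tags.append(tag)


def tag_sequence(html):
    """Return the ordered list of tag names found in an HTML string."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Text content and attributes are deliberately ignored here, which matches the idea that a variant page changing only its title text keeps the same structural features.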
According to an embodiment of the present disclosure, the first frame feature information may include the first DOM tree, which may be the set of page tags arranged in order in the current page; the second frame feature information may include the second DOM tree, which may be the set of page tags arranged in order in the predetermined page. In addition, the first frame feature information may also include the tags of the current page, which need not be arranged in order; the second frame feature information may likewise include the tags of the predetermined page, which also need not be arranged in order.
In embodiments of the present disclosure, since every DOM tree has a certain depth, the first frame feature information may include the depth of the first DOM tree and the second frame feature information may include the depth of the second DOM tree. Comparing the first frame feature information with the second frame feature information may include comparing the depth of the first DOM tree with the depth of the second DOM tree, thereby obtaining the similarity between the current page and the predetermined page.
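The source does not define how the depth of a DOM tree is measured. The sketch below assumes the common definition, the maximum nesting level of elements with the root element at depth 1, and computes it with Python's standard `html.parser`. The names `DepthParser`, `dom_depth` and `VOID_TAGS` are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not raise the nesting level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthParser(HTMLParser):
    """Tracks the current and maximum element nesting level."""

    def __init__(self):
        super().__init__()
        self.current = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # A void element is a leaf one level below the current element.
            self.max_depth = max(self.max_depth, self.current + 1)
            return
        self.current += 1
        self.max_depth = max(self.max_depth, self.current)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current > 0:
            self.current -= 1


def dom_depth(html):
    """Return the maximum nesting depth of an HTML string (root = 1)."""
    parser = DepthParser()
    parser.feed(html)
    parser.close()
    return parser.max_depth
```

This simple counter assumes reasonably well-formed markup; pages that omit optional closing tags (for example `<p>` or `<li>`) would need a real HTML tree builder to report an accurate depth.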
According to an embodiment of the present disclosure, comparing the depth of the first DOM tree with the depth of the second DOM tree may consist of calculating the Levenshtein ratio between the depth of the first DOM tree and the depth of the second DOM tree, so that the Levenshtein ratio represents the similarity between the current page and the predetermined page. The Levenshtein ratio indicates how similar two character strings are: the closer its value is to 1, the more similar the two strings, and a value of 1 means the two strings are equal.
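A Levenshtein ratio can be sketched as below. The source does not give the exact formula or say how a depth is serialized into a string, so this example makes two assumptions: the ratio is normalized as one minus the edit distance divided by the longer length (libraries may normalize differently), and the inputs are any two sequences, such as strings or lists of tags. Both function names are hypothetical.

```python
def levenshtein_distance(a, b):
    """Edit distance between two sequences; insert, delete and substitute cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]; 1.0 means the two sequences are equal."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest
```

With this normalization the value behaves as the text describes: it approaches 1 as the sequences become more alike and equals 1 only when they are identical.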
With embodiments of the present disclosure, the similarity between the current page and the predetermined page can be determined by comparing the depth of the first DOM tree with the depth of the second DOM tree. Because the DOM tree structure is relatively complex, it is difficult for a hacker to bypass detection by altering other features, which improves the detection rate for illegal pages.
As an optional embodiment, the predetermined page may include multiple predetermined pages; the depth of the second DOM tree may include the depths of multiple second DOM trees; each predetermined page may correspond to the depth of one second DOM tree; and comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page may include: determining, among the multiple second DOM trees, a third DOM tree whose type is similar or identical to that of the first DOM tree; and comparing the depth of the first DOM tree with the depth of the third DOM tree to obtain the similarity between the current page and the page corresponding to the third DOM tree.
Fig. 3 A diagrammatically illustrate according to the embodiment of the present disclosure by comparing the depth of dom trees obtain current page in advance Determine the flow chart of the similarity of the page.
As shown in Figure 3A, the depth for comparing the depth and the 2nd dom trees of the first dom trees obtains current page and predetermined page The similarity in face may include operation S301 and operate S302, wherein:
In operation S301, the 3rd dom trees similar or identical with the first dom tree types in multiple 2nd dom trees are determined.
In operation S302, compares the depth of the depth and the 3rd dom trees of the first dom trees, obtain current page and the 3rd dom The similarity of the corresponding page of tree.
In embodiments of the present disclosure, when the predetermined page includes multiple predetermined pages, the current page may be compared only with predetermined pages whose type is similar or identical to that of the current page, in order to reduce the amount of computation.
Specifically, a third DOM tree whose type is similar or identical to that of the first DOM tree can be determined from the multiple second DOM trees, and the depth of the first DOM tree can be compared with the depth of the third DOM tree to obtain the similarity between the current page and the page corresponding to the third DOM tree. Comparing the depth of the first DOM tree with the depth of the third DOM tree may consist of calculating the Levenshtein ratio between the two depths.
Through embodiments of the present disclosure, comparing only the depths of DOM trees of identical or similar type can reduce the amount of computation and improve detection speed.
Fig. 3B schematically shows a flowchart of determining the third DOM tree according to an embodiment of the present disclosure.
As shown in Fig. 3B, determining, among the multiple second DOM trees, the third DOM tree whose type is similar or identical to that of the first DOM tree may include operations S401 to S404, wherein:
In operation S401, a fourth DOM tree that meets a predetermined depth is extracted from the first DOM tree.
In operation S402, a fifth DOM tree that meets the predetermined depth is extracted from each second DOM tree, yielding multiple fifth DOM trees.
In operation S403, a target DOM tree that is similar or identical to the fourth DOM tree is determined among the multiple fifth DOM trees.
In operation S404, the second DOM tree corresponding to the target DOM tree is determined as the third DOM tree.
In embodiments of the present disclosure, the fourth DOM tree may refer to the part of the first DOM tree from the starting position to the predetermined depth, and the fifth DOM tree may refer to the part of a second DOM tree from the starting position to the predetermined depth, where the predetermined depth is less than the depth of each DOM tree.
According to an embodiment of the present disclosure, determining the third DOM tree whose type is similar or identical to that of the first DOM tree may proceed as follows: the fourth DOM tree is compared with each fifth DOM tree; the fifth DOM tree that is similar or identical to the fourth DOM tree is taken as the target DOM tree; and the second DOM tree corresponding to the target DOM tree is determined as the third DOM tree.
Since a DOM tree can refer to the set of tags arranged in order in a page, the fourth DOM tree can refer to the set of tags arranged in order from the starting position to the predetermined depth in the first DOM tree, and the fifth DOM tree can refer to the set of tags arranged in order from the starting position to the predetermined depth in a second DOM tree. Comparing the fourth DOM tree with each fifth DOM tree therefore amounts to comparing the ordered tag set from the starting position to the predetermined depth in the first DOM tree with the corresponding ordered tag set in each second DOM tree.
Through embodiments of the present disclosure, the fifth DOM tree that is similar or identical to the fourth DOM tree is taken as the target DOM tree, and the second DOM tree corresponding to the target DOM tree is determined as the third DOM tree, so that only the depth of the first DOM tree and the depth of the third DOM tree need to be compared. This reduces the computation load of the system and improves its detection speed.
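Operations S401 to S404 can be sketched as a cheap pre-filter. The sketch below rests on several assumptions that are not in the original: each DOM tree is an ordered list of tag names, the "predetermined depth" is read as the number of leading tags to keep, and prefix similarity is measured with `difflib.SequenceMatcher` from the Python standard library as a stand-in (it is not the Levenshtein ratio). The names `prefix`, `select_third_trees`, and `min_prefix_similarity` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def prefix(tags: List[str], predetermined_depth: int) -> List[str]:
    """S401/S402: keep the part from the start up to the predetermined depth."""
    return tags[:predetermined_depth]


def select_third_trees(first_tree: List[str],
                       second_trees: Dict[str, List[str]],
                       predetermined_depth: int,
                       min_prefix_similarity: float = 0.8) -> List[str]:
    """S403/S404: return the names of sample trees whose prefix resembles ours."""
    fourth = prefix(first_tree, predetermined_depth)
    selected = []
    for name, second_tree in second_trees.items():
        fifth = prefix(second_tree, predetermined_depth)
        # Stand-in similarity on the short prefixes only (assumption).
        score = SequenceMatcher(None, fourth, fifth, autojunk=False).ratio()
        if score >= min_prefix_similarity:
            selected.append(name)
    return selected
```

Only the trees returned by `select_third_trees` would then go through the full depth comparison, which is where the saving in computation comes from.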
As an optional embodiment, the predetermined page may include multiple predetermined pages; the depth of the second DOM tree may include the depths of multiple second DOM trees; and each predetermined page corresponds to the depth of one second DOM tree. Comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page may include: comparing the depth of the first DOM tree with each of the depths of the multiple second DOM trees to obtain the similarities between the current page and the multiple predetermined pages.
In embodiments of the present disclosure, when the predetermined page includes multiple predetermined pages, the current page may be compared with every predetermined page in order to improve accuracy.
Specifically, the depth of the first DOM tree can be compared with the depth of each second DOM tree to obtain the similarity between the current page and each predetermined page, so that multiple similarities are obtained. Comparing the depth of the first DOM tree with the depth of each second DOM tree may consist of calculating the Levenshtein ratio between the depth of the first DOM tree and the depth of each second DOM tree, which yields multiple Levenshtein ratios.
Through embodiments of the present disclosure, comparing the first DOM tree with every second DOM tree can prevent omissions and misjudgments and improve the detection rate of illegal pages.
Fig. 3C schematically shows a flowchart of judging whether the current page is an illegal page according to an embodiment of the present disclosure.
As shown in Fig. 3C, judging, based on the similarity, whether the current page is an illegal page may include operation S501 and operation S502, wherein:
In operation S501, it is judged whether the similarity is greater than a similarity threshold.
In operation S502, the current page is determined to be an illegal page if the similarity is greater than the similarity threshold.
In embodiments of the present disclosure, the similarity can be represented by the Levenshtein ratio and the similarity threshold can be 0.9, so that the current page can be determined to be an illegal page when the similarity is greater than 0.9. The threshold of 0.9 set in the embodiments of the present disclosure was obtained by calculation over more than 100,000 random normal pages.
According to an embodiment of the present disclosure, comparing the depth of the first DOM tree with the depth of the third DOM tree yields a first similarity between the current page and the predetermined page. Judging, based on the similarity, whether the current page is an illegal page may then consist of judging whether the first similarity is greater than a first similarity threshold and, if it is, determining that the current page is an illegal page. The first similarity can be a first Levenshtein ratio between the depth of the first DOM tree and the depth of the third DOM tree, and the first similarity threshold can be 0.9, so that the current page can be determined to be an illegal page when the first Levenshtein ratio is greater than 0.9.
According to an embodiment of the present disclosure, comparing the depth of the first DOM tree with the depth of each second DOM tree yields a second similarity between the current page and each of the multiple predetermined pages, that is, multiple second similarities. Judging, based on the similarity, whether the current page is an illegal page may then consist of judging whether any of the multiple second similarities is greater than a second similarity threshold and, if such a similarity exists, determining that the current page is an illegal page. The multiple second similarities can be the multiple second Levenshtein ratios between the depth of the first DOM tree and the depths of the multiple second DOM trees, and the second similarity threshold can be 0.9, so that the current page can be determined to be an illegal page when any of the second Levenshtein ratios is greater than 0.9.
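The decision rule for multiple similarities can be sketched in a few lines. This is an illustrative reading only: the function name `is_illegal_page` is hypothetical, the similarities are assumed to have been computed beforehand, and the comparison is strictly "greater than", as the text states.

```python
from typing import Iterable

SIMILARITY_THRESHOLD = 0.9  # value given in the text


def is_illegal_page(similarities: Iterable[float],
                    threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Return True if at least one similarity strictly exceeds the threshold."""
    return any(s > threshold for s in similarities)
```

With a single similarity (the third-DOM-tree variant), the same function can be called with a one-element list.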
Through embodiments of the present disclosure, when the similarity is greater than the similarity threshold, the current page can be determined to be an illegal page, and an alarm can be raised or a prompt can be shown on a display screen so that administrators can stop the hacker's intrusion in time.
The illegal page detection method provided by the present disclosure is described in detail below, taking a hacker intrusion as an example.
Currently, hackers mainly control a website through a webshell. Webshells can be divided into small webshells ("pony") and large webshells ("big horse"). A small webshell has simple functions and short code and is generally used for uploading files or executing commands; a large webshell is feature-rich and can be used to control and manage the website. During an intrusion, a hacker therefore typically uploads a small webshell first and then uploads a large webshell. The webshell referred to in the embodiments of the present disclosure can be a large webshell.
A hacker must eventually attempt to connect to the uploaded webshell; this step is unavoidable, so detecting the webshell connection behavior is a key point. The process of connecting to a webshell is described below with reference to Fig. 3D.
Fig. 3D schematically shows a flowchart of connecting to a webshell according to an embodiment of the present disclosure.
As shown in Fig. 3D:
In operation S601, the hacker initiates a request to the webshell.
In operation S602, the server returns a login page.
In operation S603, the hacker enters the previously set password.
In operation S604, it is checked whether the password is correct.
In operation S605, the server returns an administration page to the hacker.
In embodiments of the present disclosure, the hacker accesses the webshell that was uploaded to the server in advance by other means, and the server hosting the webshell returns a login page to the hacker. The hacker enters the preset password; if the password is correct, the server returns the administration page of the webshell, and if the password is incorrect, the server continues to return the login page.
According to an embodiment of the present disclosure, a hacker who connects to a webshell needs to access a web page with a management interface (also called an administration page), so webshell connection behavior can be detected from the DOM tree structure features of this administration page. It should be noted that the present disclosure mainly targets large webshells.
Fig. 3E schematically shows a flowchart of an illegal page detection method according to another embodiment of the present disclosure.
As shown in Fig. 3E:
In operation S701, the administration page returned over HTTP is extracted;
In operation S702, the DOM tree of the administration page is extracted;
In operation S703, the Levenshtein ratio between the extracted DOM tree and the DOM trees in the sample database is calculated;
In operation S704, it is judged whether the Levenshtein ratio is greater than 0.9;
In operation S705, if so, the behavior is judged to be a webshell connection;
In operation S706, if not, the behavior is judged not to be a webshell connection.
In embodiments of the present disclosure, webshells commonly used by hackers can be collected and de-duplicated. The DOM trees of the administration pages of these webshells are then extracted and stored in a sample database. After the server returns an administration page, the DOM tree of that page can be extracted, and the Levenshtein ratio between the extracted DOM tree and the DOM trees in the sample database can be calculated. The Levenshtein ratio threshold is preset to 0.9. When the Levenshtein ratio between the extracted DOM tree and a DOM tree in the sample database is greater than 0.9, the webshell corresponding to the administration page is judged to be one of the webshells in the sample database, so the behavior can be judged to be a webshell connection, that is, the administration page is an illegal page. Otherwise, the behavior is judged not to be a webshell connection, that is, the administration page is not an illegal page.
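The comparison against the sample database (operations S703 to S706) can be sketched as follows. The sketch assumes that each DOM tree has already been reduced to an ordered list of tag names, that the sample database is an in-memory dictionary keyed by a sample name, and that the ratio is normalized by the longer sequence; none of these details is specified in the original. The names `levenshtein_ratio` and `detect_webshell_connection` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

THRESHOLD = 0.9  # preset Levenshtein ratio threshold stated in the text


def levenshtein_ratio(a: List[str], b: List[str]) -> float:
    """Assumed normalization: 1 - edit_distance / length of the longer sequence."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - prev[-1] / longest


def detect_webshell_connection(
        page_tags: List[str],
        sample_db: Dict[str, List[str]],
        threshold: float = THRESHOLD) -> Tuple[Optional[str], float]:
    """Return (matching sample name or None, best ratio found)."""
    best_name, best_ratio = None, 0.0
    for name, sample_tags in sample_db.items():
        ratio = levenshtein_ratio(page_tags, sample_tags)  # S703
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_ratio > threshold:        # S704
        return best_name, best_ratio  # S705: webshell connection
    return None, best_ratio           # S706: not a webshell connection
```

Returning the best-matching sample name alongside the ratio is a design choice made here for illustration; it lets an alert say which known webshell the page resembles.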
According to an embodiment of the present disclosure, the DOM tree of the administration page can be extracted by a module written in advance. Specifically, all tag nodes of the administration page can be extracted and arranged in order; the ordered tag nodes can then be compressed to remove repeated tag nodes, for example repeated form tag nodes, and the remaining ordered tag nodes are taken as the DOM tree of the administration page.
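The extraction step can be sketched with `html.parser.HTMLParser` from the Python standard library. This is one possible reading, not the module the disclosure refers to: the sketch records opening tags in document order and assumes that "compression" means collapsing consecutive repetitions of the same tag, which the original does not spell out. The names `TagCollector`, `extract_tag_sequence`, and `compress_tags` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TagCollector(HTMLParser):
    """Collects the names of opening tags in the order they appear."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lower-cased.
        self.tags.append(tag)


def extract_tag_sequence(html: str) -> List[str]:
    """Return all tag nodes of the page, arranged in document order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def compress_tags(tags: List[str]) -> List[str]:
    """Assumed compression: drop a tag if it repeats the one just before it."""
    compressed: List[str] = []
    for tag in tags:
        if not compressed or compressed[-1] != tag:
            compressed.append(tag)
    return compressed
```

For example, `compress_tags(extract_tag_sequence(page_html))` would give the ordered tag list that the comparison steps operate on, where `page_html` is the body of the HTTP response.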
Fig. 4 schematically shows a block diagram of an illegal page detection system according to an embodiment of the present disclosure.
As shown in Fig. 4, the illegal page detection system 400 may include a first acquisition module 410, a second acquisition module 420, a comparison module 430, and a judgment module 440, wherein:
The first acquisition module 410 is configured to obtain first frame feature information of the current page.
The second acquisition module 420 is configured to obtain second frame feature information of the predetermined page.
The comparison module 430 is configured to compare the first frame feature information with the second frame feature information to obtain the similarity between the current page and the predetermined page.
The judgment module 440 is configured to judge, based on the similarity, whether the current page is an illegal page.
Through embodiments of the present disclosure, because frame feature information is relatively complex, whether the current page is an illegal page can still be judged accurately from the similarity even if a hacker makes some adjustments to the frame feature information. Moreover, because the variant webshells used by hackers derive from only a few types of parent webshell, collecting the frame feature information of the administration pages of the parent webshells as the second frame feature information can reduce the maintenance work of staff.
As an optional embodiment, the first frame feature information of the current page includes the depth of the first DOM tree of the current page; the second frame feature information of the predetermined page includes the depth of the second DOM tree of the predetermined page; and the comparison module is further configured to compare the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page.
Through embodiments of the present disclosure, the similarity between the current page and the predetermined page can be determined by comparing the depth of the first DOM tree with the depth of the second DOM tree. Because the DOM tree structure is relatively complex, it is difficult for a hacker to bypass this feature by other means, which can improve the detection rate of illegal pages.
As an optional embodiment, the predetermined page includes multiple predetermined pages; the depth of the second DOM tree includes the depths of multiple second DOM trees; and each predetermined page corresponds to the depth of one second DOM tree. The comparison module includes: a first determination unit configured to determine, among the multiple second DOM trees, a third DOM tree whose type is similar or identical to that of the first DOM tree; and a comparison unit configured to compare the depth of the first DOM tree with the depth of the third DOM tree to obtain the similarity between the current page and the page corresponding to the third DOM tree.
Fig. 5A schematically shows a block diagram of the comparison module according to an embodiment of the present disclosure.
As shown in Fig. 5A, the comparison module 430 may include a first determination unit 431 and a comparison unit 432, wherein:
The first determination unit 431 is configured to determine, among the multiple second DOM trees, a third DOM tree whose type is similar or identical to that of the first DOM tree.
The comparison unit 432 is configured to compare the depth of the first DOM tree with the depth of the third DOM tree to obtain the similarity between the current page and the page corresponding to the third DOM tree.
Through embodiments of the present disclosure, comparing only the depths of DOM trees of identical or similar type can reduce the amount of computation and improve detection speed.
Fig. 5B schematically shows a block diagram of the determination unit according to an embodiment of the present disclosure.
As shown in Fig. 5B, the first determination unit 431 may include a first extraction subunit 4311, a second extraction subunit 4312, a first determination subunit 4313, and a second determination subunit 4314, wherein:
The first extraction subunit 4311 is configured to extract, from the first DOM tree, a fourth DOM tree that meets a predetermined depth.
The second extraction subunit 4312 is configured to extract, from each second DOM tree, a fifth DOM tree that meets the predetermined depth, yielding multiple fifth DOM trees.
The first determination subunit 4313 is configured to determine, among the multiple fifth DOM trees, a target DOM tree that is similar or identical to the fourth DOM tree.
The second determination subunit 4314 is configured to determine the second DOM tree corresponding to the target DOM tree as the third DOM tree.
Through embodiments of the present disclosure, the fifth DOM tree that is similar or identical to the fourth DOM tree is taken as the target DOM tree, and the second DOM tree corresponding to the target DOM tree is determined as the third DOM tree, so that only the depth of the first DOM tree and the depth of the third DOM tree need to be compared. This reduces the computation load of the system and improves its detection speed.
As an optional embodiment, the predetermined page includes multiple predetermined pages; the depth of the second DOM tree includes the depths of multiple second DOM trees; each predetermined page corresponds to the depth of one second DOM tree; and the comparison module is further configured to compare the depth of the first DOM tree with each of the depths of the multiple second DOM trees to obtain the similarities between the current page and the multiple predetermined pages.
Through embodiments of the present disclosure, comparing the first DOM tree with every second DOM tree can prevent omissions and misjudgments and improve the detection rate of illegal pages.
Fig. 5C schematically shows a block diagram of the judgment module according to an embodiment of the present disclosure.
As shown in Fig. 5C, the judgment module 440 may include a judging unit 441 and a second determination unit 442, wherein:
The judging unit 441 is configured to judge whether the similarity is greater than the similarity threshold.
The second determination unit 442 is configured to determine that the current page is an illegal page if the similarity is greater than the similarity threshold.
Through embodiments of the present disclosure, when the similarity is greater than the similarity threshold, the current page can be determined to be an illegal page, and an alarm can be raised or a prompt can be shown on a display screen so that administrators can stop the hacker's intrusion in time.
Any number of the modules, submodules, units, and subunits according to embodiments of the present disclosure, or at least part of the functions of any number of them, may be implemented in a single module. Any one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of them may be implemented at least partly as a hardware circuit, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC); or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging circuits; or may be implemented in any one of software, hardware, and firmware, or in an appropriate combination of any of the three. Alternatively, one or more of the modules, submodules, units, and subunits according to embodiments of the present disclosure may be implemented at least partly as a computer program module which, when run, performs the corresponding function.
For example, any number of the first acquisition module 410, the second acquisition module 420, the comparison module 430, the judgment module 440, the first determination unit 431, the comparison unit 432, the judging unit 441, the second determination unit 442, the first extraction subunit 4311, the second extraction subunit 4312, the first determination subunit 4313, and the second determination subunit 4314 may be combined and implemented in a single module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in a single module. According to an embodiment of the present disclosure, at least one of the first acquisition module 410, the second acquisition module 420, the comparison module 430, the judgment module 440, the first determination unit 431, the comparison unit 432, the judging unit 441, the second determination unit 442, the first extraction subunit 4311, the second extraction subunit 4312, the first determination subunit 4313, and the second determination subunit 4314 may be implemented at least partly as a hardware circuit, such as a field-programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC); or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging circuits; or may be implemented in any one of software, hardware, and firmware, or in an appropriate combination of any of the three. Alternatively, at least one of these modules, units, and subunits may be implemented at least partly as a computer program module which, when run, performs the corresponding function.
Fig. 6 schematically shows a block diagram of a computer system suitable for implementing the illegal page detection method according to an embodiment of the present disclosure. The computer system shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the computer system 600 includes a processor 610 and a computer-readable storage medium 620. The computer system 600 can execute the method according to the embodiments of the present disclosure.
Specifically, the processor 610 may include, for example, a general-purpose microprocessor, an instruction set processor and/or an associated chipset, and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)). The processor 610 may also include on-board memory for caching purposes. The processor 610 may be a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiments of the present disclosure.
The computer-readable storage medium 620 may be, for example, any medium that can contain, store, communicate, propagate, or transport instructions. For example, the readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of readable storage media include: a magnetic storage device such as a magnetic tape or a hard disk (HDD); an optical storage device such as an optical disc (CD-ROM); a memory such as random access memory (RAM) or flash memory; and/or a wired or wireless communication link.
The computer-readable storage medium 620 may include a computer program 621, which may include code or computer-executable instructions that, when executed by the processor 610, cause the processor 610 to perform the method according to the embodiments of the present disclosure or any variation thereof.
The computer program 621 may be configured with computer program code that includes, for example, computer program modules. For example, in an exemplary embodiment, the code in the computer program 621 may include one or more program modules, for example module 621A, module 621B, and so on. It should be noted that the division and number of modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation. When these program modules or combinations are executed by the processor 610, the processor 610 can perform the method according to the embodiments of the present disclosure or any variation thereof.
The present disclosure also provides a computer-readable medium, which may be included in the equipment, device, or system described in the above embodiments, or may exist separately without being incorporated into that equipment, device, or system. The computer-readable medium carries one or more programs which, when executed, implement: obtaining first frame feature information of a current page; obtaining second frame feature information of a predetermined page; comparing the first frame feature information with the second frame feature information to obtain the similarity between the current page and the predetermined page; and judging, based on the similarity, whether the current page is an illegal page.
According to an embodiment of the present disclosure, the computer-readable medium may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, radio frequency signals, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Those skilled in the art will understand that the features described in the various embodiments and/or claims of the present disclosure may be combined in many ways, even if such combinations are not expressly described in the present disclosure. In particular, the features described in the various embodiments and/or claims of the present disclosure may be combined in many ways without departing from the spirit and teaching of the present disclosure. All such combinations fall within the scope of the present disclosure.
Although the present disclosure has been shown and described with reference to certain exemplary embodiments, those skilled in the art should understand that various changes in form and detail may be made to the present disclosure without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. The scope of the present disclosure should therefore not be limited to the above embodiments, but should be determined not only by the appended claims but also by their equivalents.

Claims (14)

1. An illegal page detection method, comprising:
Obtaining first frame feature information of a current page;
Obtaining second frame feature information of a predetermined page;
Comparing the first frame feature information with the second frame feature information to obtain the similarity between the current page and the predetermined page; and
Judging, based on the similarity, whether the current page is an illegal page.
2. The method according to claim 1, wherein:
The first frame feature information of the current page includes the depth of a first DOM tree of the current page;
The second frame feature information of the predetermined page includes the depth of a second DOM tree of the predetermined page;
Said comparing the first frame feature information with the second frame feature information to obtain the similarity between the current page and the predetermined page includes:
Comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page.
3. The method according to claim 2, wherein:
The predetermined page includes multiple predetermined pages;
The depth of the second DOM tree includes the depths of multiple second DOM trees;
Each predetermined page corresponds to the depth of one second DOM tree;
Said comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page includes:
Determining, among the multiple second DOM trees, a third DOM tree whose type is similar or identical to that of the first DOM tree; and
Comparing the depth of the first DOM tree with the depth of the third DOM tree to obtain the similarity between the current page and the page corresponding to the third DOM tree.
4. The method according to claim 3, wherein determining, among the plurality of second DOM trees, the third DOM tree whose type is similar or identical to that of the first DOM tree comprises:
extracting, from the first DOM tree, a fourth DOM tree that satisfies a predetermined depth;
extracting, from each second DOM tree, a fifth DOM tree that satisfies the predetermined depth, to obtain a plurality of fifth DOM trees;
determining, among the plurality of fifth DOM trees, a target DOM tree similar or identical to the fourth DOM tree; and
determining the second DOM tree corresponding to the target DOM tree as the third DOM tree.
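The type-matching steps of claim 4 can be sketched as follows, under several assumptions. Each DOM tree is modelled as a nested tuple `(tag, children)`, which is a hypothetical representation chosen for brevity. "Satisfying a predetermined depth" is read as keeping the top levels of the tree down to that depth, and only the "identical" case is handled, by exact equality of the truncated trees. A "similar" match would need a tolerance measure that the claims do not specify. The function names `truncate` and `find_same_type` are illustrative.

```python
def truncate(tree, depth):
    """Return a copy of `tree` that keeps only its top `depth` levels."""
    tag, children = tree
    if depth <= 1:
        return (tag, ())
    return (tag, tuple(truncate(child, depth - 1) for child in children))


def find_same_type(first_tree, second_trees, predetermined_depth):
    """Return indices of second trees whose truncated skeleton equals
    the truncated skeleton of the first tree."""
    fourth_tree = truncate(first_tree, predetermined_depth)
    return [
        index
        for index, second_tree in enumerate(second_trees)
        if truncate(second_tree, predetermined_depth) == fourth_tree
    ]


# Current page: html > (head, body > div > p)
first = ("html", (("head", ()), ("body", (("div", (("p", ()),)),))))
# Candidate A shares the top three levels but differs below them.
page_a = ("html", (("head", ()), ("body", (("div", (("span", ()), ("a", ()))),))))
# Candidate B has a different skeleton.
page_b = ("html", (("body", (("table", ()),)),))

print(find_same_type(first, [page_a, page_b], 3))
```

Comparing only the truncated skeletons keeps the type check cheap, so that the full depth comparison is performed only against candidates of the same page type.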
5. The method according to claim 2, wherein:
the predetermined page comprises a plurality of predetermined pages;
the depth of the second DOM tree comprises depths of a plurality of second DOM trees;
each predetermined page corresponds to the depth of one second DOM tree; and
comparing the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page comprises:
comparing the depth of the first DOM tree with each of the depths of the plurality of second DOM trees to obtain similarities between the current page and the plurality of predetermined pages.
6. The method according to claim 1, wherein determining, based on the similarity, whether the current page is an illegal page comprises:
determining whether the similarity is greater than a similarity threshold; and
determining that the current page is an illegal page when the similarity is greater than the similarity threshold.
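The comparison against several predetermined pages (claim 5) and the threshold decision (claim 6) can be combined in a short sketch. It assumes that the predetermined pages are known illegal pages, that the similarity is the depth ratio shown in `depth_similarity`, and that the best match is the one tested against the threshold. None of these details is stated in the claims, and the default threshold of 0.9 is an arbitrary illustrative value.

```python
def depth_similarity(depth_a, depth_b):
    """Assumed measure: ratio of the smaller depth to the larger, in [0, 1]."""
    if max(depth_a, depth_b) == 0:
        return 1.0
    return min(depth_a, depth_b) / max(depth_a, depth_b)


def is_illegal(current_depth, predetermined_depths, threshold=0.9):
    """Compare the current depth with every predetermined depth and flag the
    page when the best similarity is strictly greater than the threshold.

    Returns a (flag, best_similarity) pair.
    """
    similarities = [
        depth_similarity(current_depth, depth) for depth in predetermined_depths
    ]
    best = max(similarities, default=0.0)
    return best > threshold, best


print(is_illegal(10, [4, 10, 20]))
print(is_illegal(10, [4, 20]))
```

The strict comparison mirrors the claim wording ("greater than"), so a similarity exactly equal to the threshold does not flag the page.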
7. An illegal page detection system, comprising:
a first acquisition module for obtaining first frame feature information of a current page;
a second acquisition module for obtaining second frame feature information of a predetermined page;
a comparison module for comparing the first frame feature information with the second frame feature information to obtain a similarity between the current page and the predetermined page; and
a judgment module for determining, based on the similarity, whether the current page is an illegal page.
8. The system according to claim 7, wherein:
the first frame feature information of the current page comprises a depth of a first DOM tree of the current page;
the second frame feature information of the predetermined page comprises a depth of a second DOM tree of the predetermined page; and
the comparison module is further configured to compare the depth of the first DOM tree with the depth of the second DOM tree to obtain the similarity between the current page and the predetermined page.
9. The system according to claim 8, wherein:
the predetermined page comprises a plurality of predetermined pages;
the depth of the second DOM tree comprises depths of a plurality of second DOM trees;
each predetermined page corresponds to the depth of one second DOM tree; and
the comparison module comprises:
a first determination unit for determining, among the plurality of second DOM trees, a third DOM tree whose type is similar or identical to that of the first DOM tree; and
a comparing unit for comparing the depth of the first DOM tree with the depth of the third DOM tree to obtain a similarity between the current page and the page corresponding to the third DOM tree.
10. The system according to claim 9, wherein the first determination unit comprises:
a first extraction subunit for extracting, from the first DOM tree, a fourth DOM tree that satisfies a predetermined depth;
a second extraction subunit for extracting, from each second DOM tree, a fifth DOM tree that satisfies the predetermined depth, to obtain a plurality of fifth DOM trees;
a first determination subunit for determining, among the plurality of fifth DOM trees, a target DOM tree similar or identical to the fourth DOM tree; and
a second determination subunit for determining the second DOM tree corresponding to the target DOM tree as the third DOM tree.
11. The system according to claim 8, wherein:
the predetermined page comprises a plurality of predetermined pages;
the depth of the second DOM tree comprises depths of a plurality of second DOM trees;
each predetermined page corresponds to the depth of one second DOM tree; and
the comparison module is further configured to compare the depth of the first DOM tree with each of the depths of the plurality of second DOM trees to obtain similarities between the current page and the plurality of predetermined pages.
12. The system according to claim 7, wherein the judgment module comprises:
a judging unit for determining whether the similarity is greater than a similarity threshold; and
a second determination unit for determining that the current page is an illegal page when the similarity is greater than the similarity threshold.
13. A computer system, comprising:
one or more processors; and
a computer-readable storage medium for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the illegal page detection method according to any one of claims 1 to 6.
14. A computer-readable storage medium having executable instructions stored thereon, wherein the instructions, when executed by a processor, cause the processor to implement the illegal page detection method according to any one of claims 1 to 6.
CN201810390940.4A 2018-04-27 2018-04-27 Illegal page detection method, system, computer system and readable storage medium Active CN108650250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810390940.4A CN108650250B (en) 2018-04-27 2018-04-27 Illegal page detection method, system, computer system and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810390940.4A CN108650250B (en) 2018-04-27 2018-04-27 Illegal page detection method, system, computer system and readable storage medium

Publications (2)

Publication Number Publication Date
CN108650250A true CN108650250A (en) 2018-10-12
CN108650250B CN108650250B (en) 2021-07-23

Family

ID=63748251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810390940.4A Active CN108650250B (en) 2018-04-27 2018-04-27 Illegal page detection method, system, computer system and readable storage medium

Country Status (1)

Country Link
CN (1) CN108650250B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510887A (en) * 2009-03-27 2009-08-19 腾讯科技(深圳)有限公司 Method and device for identifying website
CN102129528A (en) * 2010-01-19 2011-07-20 北京启明星辰信息技术股份有限公司 WEB page tampering identification method and system
CN102316081A (en) * 2010-06-30 2012-01-11 北京启明星辰信息技术股份有限公司 Method and device for identifying similar webpage
JP2013175053A (en) * 2012-02-24 2013-09-05 Hitachi Ltd Xml document retrieval device and program
US20170048273A1 (en) * 2014-08-21 2017-02-16 Salesforce.Com, Inc. Phishing and threat detection and prevention
US20170099319A1 (en) * 2015-09-16 2017-04-06 RiskIQ, Inc. Identifying phishing websites using dom characteristics
CN107204960A (en) * 2016-03-16 2017-09-26 阿里巴巴集团控股有限公司 Web page identification method and device, server
CN107612908A (en) * 2017-09-15 2018-01-19 杭州安恒信息技术有限公司 webpage tamper monitoring method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WENQIAN SHANG: "Sensitive Information Acquisition Based on Machine Learning", 2012 International Conference on Industrial Control and Electronics Engineering *
Feng Qing et al.: "Deep detection system for phishing web pages based on ensemble learning", Computer Systems & Applications (计算机系统应用) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110191124A (en) * 2019-05-29 2019-08-30 哈尔滨安天科技集团股份有限公司 Website discrimination method, device and storage equipment based on web front-end exploitation data
CN110191124B (en) * 2019-05-29 2022-02-22 安天科技集团股份有限公司 Web front-end development data-based website identification method and device and storage equipment
CN111597107A (en) * 2020-04-22 2020-08-28 北京字节跳动网络技术有限公司 Information output method and device and electronic equipment
CN111597107B (en) * 2020-04-22 2023-04-28 北京字节跳动网络技术有限公司 Information output method and device and electronic equipment

Also Published As

Publication number Publication date
CN108650250B (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 100088 Building 3 332, 102, 28 Xinjiekouwai Street, Xicheng District, Beijing
Applicant after: Qianxin Technology Group Co., Ltd.
Address before: 100016 15, 17 floor 1701-26, 3 building, 10 Jiuxianqiao Road, Chaoyang District, Beijing.
Applicant before: BEIJING QI'ANXIN SCIENCE & TECHNOLOGY CO., LTD.
GR01 Patent grant