CN113676374B - Target website clue detection method, device, computer equipment and medium - Google Patents


Info

Publication number
CN113676374B
Authority
CN
China
Prior art keywords
domain name
data packet
information
target website
real
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110932460.8A
Other languages
Chinese (zh)
Other versions
CN113676374A (en)
Inventor
宓晨希
范渊
黄进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Application filed by DBAPPSecurity Co Ltd
Priority to CN202110932460.8A
Publication of CN113676374A
Application granted
Publication of CN113676374B
Active (current legal status)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/10 Mapping addresses of different types
    • H04L61/103 Mapping addresses of different types across network layers, e.g. resolution of network layer into physical layer addresses or address resolution protocol [ARP]
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H04L61/50 Address allocation
    • H04L61/5007 Internet protocol [IP] addresses
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Abstract

The application relates to a target website clue detection method, apparatus, computer device and computer-readable storage medium. Mirror traffic of a metropolitan area network to be detected is screened in multiple stages according to detection domain name features and the characteristics of domain name anti-blocking systems; url link information is extracted from the file bodies of the screened data packets, and the real links behind that url link information are taken as target website clues, so that target website clues are detected automatically and efficiently.

Description

Target website clue detection method, device, computer equipment and medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a method and apparatus for detecting clues of a target website, a computer device, and a computer readable storage medium.
Background
In the related art, target website clues are searched for manually in a big data engine.
However, the information obtained by manually searching a big data engine contains a large number of useless clues, and because the process is entirely manual, target website clues are detected inefficiently. No effective solution to this low detection efficiency has yet been proposed in the related art.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, apparatus, computer device and computer readable storage medium for detecting clues of a target website, so as to solve the problem of low detection efficiency of clues of a target website in the related art.
In a first aspect, an embodiment of the present application provides a target website clue detection method, including the following steps:
obtaining mirror traffic of a metropolitan area network to be detected;
screening, from the mirror traffic, data packets carrying detection domain name features as a first data packet group;
extracting HOST information of all data packets in the first data packet group, and screening, from the first data packet group and based on the HOST information, a second data packet group whose domains have a domain name anti-blocking system deployed;
extracting url link information from the file bodies of all data packets in the second data packet group, and obtaining the real links of that url link information, the real links being the target website clues.
In some embodiments, screening the data packets carrying detection domain name features from the mirror traffic as the first data packet group includes the following step:
screening, from the mirror traffic, data packets whose header information contains keyword information from a preset keyword library and whose file-body url value is an irregular short domain name, as the first data packet group.
In some embodiments, extracting the HOST information of all data packets in the first data packet group and, based on the HOST information, screening from the first data packet group the second data packet group with a domain name anti-blocking system deployed includes the following steps:
judging whether the primary domain name corresponding to the HOST information exists in a preset first domain name library, the first domain name library storing domain names on which a domain name anti-blocking system is deployed;
if the primary domain name corresponding to the HOST information is in the first domain name library, generating a first data packet set based on the data packets corresponding to the HOST information;
if the primary domain name corresponding to the HOST information is not in the first domain name library, judging whether a domain name anti-blocking system is deployed on that primary domain name; if it is, storing the primary domain name in the first domain name library and generating a second data packet set based on the data packets corresponding to the HOST information; and generating the second data packet group based on the first data packet set and the second data packet set.
In some embodiments, extracting the url link information of the file bodies of all data packets in the second data packet group and obtaining the real links of that url link information includes the following steps:
accessing the url link information;
if the url link information does not redirect, determining that the url link information itself is the real link, and acquiring it;
if the url link information redirects, determining that the link at which the redirects finally end is the real link, and acquiring it.
In some embodiments, after the real links of the url link information of the file bodies are obtained, the method further includes:
obtaining the domain name code information of a real link when the real link is accessible;
outputting the real link as a target website when its domain name code information contains a keyword from the preset keyword library;
obtaining the resolved IP of the real link when the real link is not accessible;
obtaining the other domain names bound to the resolved IP, traversing them, and obtaining the domain name code information of each of those domain names that is accessible;
judging whether the domain name code information of the other domain names bound to the resolved IP contains a keyword from the keyword library; if so, outputting the real link as a suspicious target website; if not, outputting the real link as a misjudged target website.
In some embodiments, obtaining the mirror traffic of the metropolitan area network to be detected includes the following steps:
copying the original traffic of the metropolitan area network to be detected through a mirror port configured on a switch, to obtain the mirror traffic;
or, duplicating the original traffic of the metropolitan area network to be detected with an optical splitter, to obtain the mirror traffic.
In some embodiments, after the mirror traffic of the metropolitan area network to be detected is obtained and before the data packets carrying detection domain name features are screened from it, the method further includes:
filtering the mirror traffic, retaining all POST data packets in it, and updating the mirror traffic according to the filtering result.
In a second aspect, this embodiment provides a target website clue detection apparatus, including an acquisition module, a screening module, a first extraction module, a second extraction module and a result module, where:
the acquisition module is configured to acquire the mirror traffic of the metropolitan area network to be detected;
the screening module is configured to screen, from the mirror traffic, the data packets carrying detection domain name features as a first data packet group;
the first extraction module is configured to extract the HOST information of all data packets in the first data packet group and, based on the HOST information, screen from the first data packet group a second data packet group with a domain name anti-blocking system deployed;
the second extraction module is configured to extract the url link information of the file bodies of all data packets in the second data packet group and obtain the real links of that url link information, the real links being the target website clues.
In a third aspect, in this embodiment, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect described above when the computer program is executed.
In a fourth aspect, in this embodiment a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method according to the first aspect described above.
With the target website clue detection method, apparatus, computer device and computer-readable storage medium described above, the mirror traffic of the metropolitan area network to be detected is obtained; data packets carrying detection domain name features are screened from the mirror traffic as a first data packet group; the HOST information of all data packets in the first data packet group is extracted, and a second data packet group with a domain name anti-blocking system deployed is screened from the first data packet group based on that HOST information; and the url link information of the file bodies of all data packets in the second data packet group is extracted and its real links obtained, the real links being the target website clues. Because target website traffic carries detection domain name features and target websites deploy domain name anti-blocking systems, screening the mirror traffic in multiple stages on these two characteristics, extracting the url link information from the file bodies of the surviving data packets, and taking the real links of that information as target website clues allows target website clues to be detected automatically and efficiently.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is an application scenario diagram of a target website clue detection method according to an embodiment of the present application;
FIG. 2 is a first flowchart of a target website clue detection method according to an embodiment of the present application;
FIG. 3 is a second flowchart of a target website clue detection method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a target website clue detection apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described and illustrated below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided herein without inventive effort fall within the scope of protection of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and a person of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that although such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as indicating that this disclosure is insufficient.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the embodiments described herein can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by a person of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar terms herein do not denote a limitation of quantity and may denote the singular or the plural. The terms "comprising," "including," "having," and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this application are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. The term "plurality" as used herein refers to two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The terms "first," "second," "third," and the like, as used herein, merely distinguish similar objects and do not denote a particular ordering of the objects.
FIG. 1 is an application scenario diagram of a target website clue detection method according to an embodiment of the present application. As shown in FIG. 1, the server 101 and the mobile terminal 102 exchange data over a network. The mobile terminal 102 collects the mirror traffic of the metropolitan area network to be detected and transmits it to the server 101. After receiving the mirror traffic, the server 101 screens out the data packets carrying detection domain name features as a first data packet group; extracts the HOST information of all data packets in the first data packet group and, based on that HOST information, screens out a second data packet group with a domain name anti-blocking system deployed; and extracts the url link information of the file bodies of all data packets in the second data packet group and obtains its real links, which are the target website clues. The server 101 may be implemented as an independent server or as a server cluster composed of multiple servers, and the mobile terminal 102 may be any device with a display screen and an input function.
An embodiment of the application provides a target website clue detection method, which can be used to detect target website clues in the field of Internet technology. As shown in FIG. 2, the method includes the following steps:
Step S210, obtaining the mirror traffic of the metropolitan area network to be detected.
The traffic generated when the service systems of the metropolitan area network exchange data is called the original traffic; it contains the original data packets of those service systems and is forwarded normally according to the network's original configuration. Mirror traffic is a copy of the original traffic, and its content is identical to that of the original traffic. Obtaining the mirror traffic of the metropolitan area network to be detected therefore allows traffic containing exactly the same data as the original traffic to be analyzed without affecting the normal operation of the network. Specifically, the mirror traffic can be obtained by configuring a mirror port on a switch to copy the original traffic of the metropolitan area network to be detected, or by duplicating that original traffic with an optical splitter.
Step S212, screening out data packets carrying detection domain name features from the mirror traffic as a first data packet group.
Specifically, because target website traffic carries detection domain name features, the mirror traffic is screened according to these features to obtain the first data packet group. The detection domain name features are obtained in advance by capturing and analyzing the packets of existing websites on which a domain name anti-blocking system is deployed. The header information of a data packet carrying detection domain name features typically contains key features such as "check" and "getest_change".
Step S214, extracting the HOST (server-side) information of all data packets in the first data packet group, and screening out from the first data packet group, based on the HOST information, a second data packet group with a domain name anti-blocking system deployed.
Specifically, a domain name with a domain name anti-blocking system deployed is not blocked by social software such as WeChat and QQ. Whether a domain name has such a system deployed can be judged in two ways. First, the traffic of the domain name can be captured and analyzed to check whether its request packets carry detection domain name features; if they do, the domain name has a domain name anti-blocking system deployed. Second, the captured traffic can be analyzed to check whether the domain name performs multi-layer domain name redirects; if it does, the domain name has a domain name anti-blocking system deployed. The data packets in the first data packet group screened in step S212 may carry target case clues, but they must be screened further according to whether the domain name corresponding to their HOST information has a domain name anti-blocking system deployed. The data packets whose domains have such a system deployed are screened out of the first data packet group based on the HOST information to form the second data packet group, and the data packets in the second data packet group are the data packets carrying target case clues.
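The two deployment criteria above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes the captured request header text and the observed redirect chain of a domain are already available, and the function names and the `min_hops=2` threshold used to interpret "multi-layer" redirects are hypothetical choices.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def hosts_in_chain(redirect_chain: Iterable[str]) -> List[str]:
    """Return the distinct hostnames visited along a redirect chain, in order."""
    hosts: List[str] = []
    for url in redirect_chain:
        host = urlsplit(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def deploys_anti_blocking(request_headers: Iterable[str],
                          redirect_chain: Iterable[str],
                          keywords: Iterable[str],
                          min_hops: int = 2) -> bool:
    """Hypothetical check for a domain name anti-blocking system.

    Criterion 1: a captured request header carries a detection domain name feature.
    Criterion 2: the domain performs multi-layer redirects, taken here (an
    assumption) to mean at least `min_hops` further domains after the first one.
    """
    lowered_keywords = [k.lower() for k in keywords]
    for header_text in request_headers:
        text = header_text.lower()
        if any(k in text for k in lowered_keywords):
            return True
    return len(hosts_in_chain(redirect_chain)) - 1 >= min_hops
```

Either criterion alone is treated as sufficient, mirroring the "or" in the text.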
Step S216, extracting the url link information of the file bodies of all data packets in the second data packet group, and obtaining the real links of that url link information, the real links being the target website clues.
Specifically, step S214 determines the second data packet group carrying target case clues, and step S216 obtains the real links of the url link information in the file bodies of that group; these real links are the target website clues. The real link of a piece of url link information is the link at which its redirects finally end.
In the related art, target website clues are not detected from detection domain name features and the characteristics of domain name anti-blocking systems; instead they are searched for manually in a big data engine, which makes detection inefficient. Through steps S210 to S216, the present application obtains the mirror traffic of the metropolitan area network to be detected; screens out data packets carrying detection domain name features as a first data packet group; extracts the HOST information of all data packets in the first data packet group and, based on that information, screens out a second data packet group with a domain name anti-blocking system deployed; and extracts the url link information of the file bodies of all data packets in the second data packet group and obtains its real links, which are the target website clues. Because target website traffic carries detection domain name features and target websites deploy domain name anti-blocking systems, multi-stage screening on these two characteristics, followed by resolving the real links of the file-body url link information, detects target website clues automatically and efficiently.
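Steps S212 to S216 form a three-stage funnel. The Python sketch below composes the stages under stated assumptions: the `Packet` type and its fields are hypothetical, each stage's decision is injected as a function rather than implemented, and duplicate real links are collapsed, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Packet:
    """Hypothetical, already-parsed view of one mirrored HTTP packet."""
    host: str                 # HOST header value
    header_text: str          # raw header text
    body_url: Optional[str]   # url link information found in the file body


def detect_clues(mirror_traffic: Iterable[Packet],
                 has_detection_feature: Callable[[Packet], bool],
                 host_deploys_anti_blocking: Callable[[str], bool],
                 resolve_real_link: Callable[[str], str]) -> List[str]:
    # S212: packets carrying detection domain name features
    first_group = [p for p in mirror_traffic if has_detection_feature(p)]
    # S214: keep packets whose HOST has an anti-blocking system deployed
    second_group = [p for p in first_group if host_deploys_anti_blocking(p.host)]
    # S216: the real link behind each file-body url is a clue
    clues: List[str] = []
    for packet in second_group:
        if packet.body_url:
            real_link = resolve_real_link(packet.body_url)
            if real_link not in clues:  # de-duplication is an assumption
                clues.append(real_link)
    return clues
```

Injecting the stage predicates keeps the funnel order explicit while leaving each stage's concrete rule to the sub-steps described below.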
As one implementation, step S210 can obtain the mirror traffic of the metropolitan area network to be detected as follows:
copying the original traffic of the metropolitan area network to be detected through a mirror port configured on a switch to obtain the mirror traffic; or duplicating the original traffic with an optical splitter to obtain the mirror traffic.
Other existing traffic replication methods may also be used to copy the original traffic of the metropolitan area network to be detected, so that its mirror traffic is obtained quickly and efficiently.
In one embodiment, after the mirror traffic of the metropolitan area network to be detected is obtained in step S210 and before the data packets carrying detection domain name features are screened out of it in step S212, the target website clue detection method further includes the following step:
Step S211, filtering the mirror traffic, retaining all POST data packets in it, and updating the mirror traffic according to the filtering result.
Specifically, the mirror traffic contains both GET and POST data packets, but GET packets make up a dense share of the traffic, and analyzing all of the mirror traffic for target website clues would consume too many resources and make detection too slow. The POST data packets in the mirror traffic are sufficient for detecting target website clues, so step S211 filters out the GET data packets and keeps the POST data packets as the mirror traffic to be analyzed, which improves the detection efficiency of target website clues.
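Step S211 can be sketched as a filter on the HTTP request line. This minimal Python example assumes each mirrored packet has already been reassembled into a raw HTTP request payload (TCP reassembly is out of scope here); `is_post` and `keep_post_packets` are hypothetical names.

```python
from typing import Iterable, List


def is_post(raw_payload: bytes) -> bool:
    """True if the HTTP request line uses the POST method."""
    request_line = raw_payload.split(b"\r\n", 1)[0]
    return request_line.startswith(b"POST ")


def keep_post_packets(payloads: Iterable[bytes]) -> List[bytes]:
    """S211: drop GET (and other) requests; keep POST requests as the new mirror traffic."""
    return [p for p in payloads if is_post(p)]
```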
Specifically, step S212, screening out the data packets carrying detection domain name features from the mirror traffic as the first data packet group, includes the following step:
Step S2121, screening out, from the mirror traffic, data packets whose header information contains keyword information from a preset keyword library and whose file-body url value is an irregular short domain name, as the first data packet group.
Specifically, the "anti-red" link generation actions and "anti-red" detection actions (actions that keep a link from being flagged by messaging platforms) of existing websites with a domain name anti-blocking system deployed are captured and analyzed in advance. The header content of the captured data packets is extracted to obtain the keywords used when submitting these actions, such as "check" and "get_challenge", and the keywords are stored in a preset keyword library, established in advance, which serves as the basis for subsequently detecting target website clues. Regular short domain names are known, conventional domain names that can be recognized at a glance as not being used by target websites, such as baidu. For convenience, regular short domain names can be stored in a preset second domain name library, and during data packet screening the file-body url value is compared with the domain names stored in that library. To improve the accuracy of subsequent target website clue detection, the keyword library and the second domain name library are updated continuously.
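Step S2121 combines a keyword match on the header with a lookup in the second domain name library. The sketch below is illustrative only: it treats any host not matched by the second domain name library as "irregular" (the text gives no length rule for "short"), assumes file-body url values are absolute URLs, and uses hypothetical function names.

```python
from typing import Iterable, List, Optional, Set, Tuple
from urllib.parse import urlsplit


def is_regular(host: str, regular_domains: Set[str]) -> bool:
    """True if host equals, or is a subdomain of, an entry in the second domain name library."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in regular_domains)


def screen_first_group(packets: Iterable[Tuple[str, Optional[str]]],
                       keyword_library: Iterable[str],
                       regular_domains: Set[str]) -> List[Tuple[str, Optional[str]]]:
    """S2121: keep (header_text, body_url) pairs matching a keyword and an irregular domain."""
    keywords = [k.lower() for k in keyword_library]
    first_group = []
    for header_text, body_url in packets:
        header = header_text.lower()
        if not any(k in header for k in keywords):
            continue  # no keyword from the keyword library in the header
        host = urlsplit(body_url).hostname if body_url else None
        if host and not is_regular(host, regular_domains):
            first_group.append((header_text, body_url))
    return first_group
```

Matching on label boundaries (`"." + d`) avoids treating a lookalike such as `notbaidu.com` as the regular domain `baidu.com`.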
Through step S2121, data packets that may carry target case clues can be screened quickly from the mirror traffic in a first pass according to the detection domain name features, laying the foundation for subsequent target case clue detection.
Specifically, step S214, extracting the HOST information of all data packets in the first data packet group and screening out, based on the HOST information, the second data packet group with a domain name anti-blocking system deployed, includes the following steps:
Step S2141, judging whether the primary domain name corresponding to the HOST information exists in a preset first domain name library, the first domain name library storing domain names on which a domain name anti-blocking system is deployed.
Because the Internet changes continuously, the first domain name library may not contain every website with a domain name anti-blocking system deployed. Step S2141 therefore first checks whether the primary domain name corresponding to the HOST information is already in the preset first domain name library.
Step S2142, if the primary domain name corresponding to the HOST information is in the first domain name library, generating a first data packet set based on the data packets corresponding to the HOST information.
Step S2143, if the primary domain name corresponding to the HOST information is not in the first domain name library, judging whether a domain name anti-blocking system is deployed on that primary domain name; if it is, storing the primary domain name in the first domain name library and generating a second data packet set based on the data packets corresponding to the HOST information; and generating the second data packet group based on the first data packet set and the second data packet set.
Specifically, even if the primary domain name corresponding to the HOST information is not in the first domain name library, a domain name anti-blocking system may still be deployed on it. The method for judging whether a domain name has a domain name anti-blocking system deployed is described above and is not repeated here.
Through steps S2141 to S2143, all data packets whose domains have a domain name anti-blocking system deployed can be screened out, and the first domain name library is extended at the same time.
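Steps S2141 to S2143 amount to a membership test against the first domain name library that also learns new domains. The Python sketch below assumes packets are dicts with a `host` key and derives the primary domain by keeping the last two labels, a deliberate simplification that mishandles suffixes such as `.com.cn` (a production system would consult the Public Suffix List). `screen_second_group` and the injected `deploys_anti_blocking` callback are hypothetical.

```python
from typing import Callable, Dict, List, Set


def primary_domain(host: str) -> str:
    """Naive primary domain: the last two labels of the host name."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def screen_second_group(first_group: List[Dict[str, str]],
                        first_domain_library: Set[str],
                        deploys_anti_blocking: Callable[[str], bool]) -> List[Dict[str, str]]:
    first_set: List[Dict[str, str]] = []   # S2142: domain already in the library
    second_set: List[Dict[str, str]] = []  # S2143: domain newly confirmed
    for packet in first_group:
        domain = primary_domain(packet["host"])
        if domain in first_domain_library:
            first_set.append(packet)
        elif deploys_anti_blocking(domain):
            first_domain_library.add(domain)  # extend the first domain name library
            second_set.append(packet)
        # otherwise the packet is dropped
    # The second data packet group is built from both sets
    return first_set + second_set
```

Because the library is updated in place, later packets from a newly confirmed domain are matched by the cheap lookup instead of being re-checked.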
Specifically, step S216, which extracts the url link information of the file bodies of all data packets in the second data packet group and obtains the real links of that url link information, includes the following steps:
In step S2161, the url link information is accessed.
Specifically, because the url link information is not necessarily a real link, it is verified by accessing it in step S2161.
In step S2162, if accessing the url link information does not trigger a redirect, the url link information is determined to be the real link and is acquired.
In step S2163, if accessing the url link information triggers a redirect, the link finally reached after the redirects is determined to be the real link and is acquired.
Through steps S2161 to S2163, target case clues can be acquired accurately.
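A minimal Python sketch of steps S2161 to S2163, assuming redirects are signalled by HTTP 3xx status codes carrying a `Location` header. The `fetch` callable is hypothetical: in practice it could wrap an HTTP client with automatic redirect following disabled, returning the status code and the `Location` value. Redirects performed by JavaScript or `<meta http-equiv="refresh">` are not covered by this sketch.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical hook: fetch(url) -> (HTTP status code, Location header or None),
# issued WITHOUT letting the HTTP client follow redirects automatically.
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def resolve_real_link(url: str, fetch: Fetch, max_hops: int = 10) -> str:
    """Follow HTTP redirects from `url` and return the final (real) link."""
    current = url
    seen = {current}
    for _ in range(max_hops + 1):
        status, location = fetch(current)              # S2161: access the link
        if status not in REDIRECT_STATUSES or not location:
            return current                             # S2162 / end of chain: real link
        nxt = urljoin(current, location)               # Location may be relative
        if nxt in seen:
            raise ValueError(f"redirect loop detected at {nxt}")
        seen.add(nxt)
        current = nxt                                  # S2163: keep following
    raise ValueError(f"more than {max_hops} redirects starting from {url}")
```

Bounding the number of hops and remembering visited URLs keeps a redirect loop from stalling the probe.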
In one embodiment, as shown in Fig. 3, after the real link of the url link information of the file body is obtained in step S216, the target website clue detection method further includes the following steps:
Step S218: if the real link is accessible, obtain the domain name code information of the real link.
Specifically, the domain name code information is the code information that the web page ultimately presents.
Step S220: if the domain name code information of the real link contains keywords from a preset keyword library, output the real link as a target website.
Specifically, the keyword library can be continuously updated and refined during use. If the domain name code information of the real link contains keywords from the preset keyword library, the real link can be determined to be a target website.
Step S222: if the real link is inaccessible, obtain the resolved IP of the real link.
Specifically, if the real link cannot be accessed, its domain name is resolved to obtain the resolved IP of the real link.
Step S224: obtain the other domain names bound to the resolved IP, traverse them, and obtain the domain name code information of each of those domain names that is accessible.
Specifically, the resolved IP of the real link is often bound to many more domain names than the real link alone. When the real link cannot be accessed, the target case clue is therefore further assessed based on the other domain names bound to the resolved IP.
Step S226: determine whether the domain name code information of the other domain names bound to the resolved IP contains keywords from the keyword library. If so, output the real link as a suspicious target website; if not, output the real link as a misjudged target website.
Specifically, as long as the domain name code information of any other domain name bound to the resolved IP contains a keyword from the keyword library, the real link is output as a suspicious target website. Traffic to that real link can then be monitored closely, and an alarm is raised immediately once an anomaly occurs. If the domain name code information of the other domain names bound to the resolved IP contains no keyword from the keyword library, the real link is output as a misjudged target website.
Through steps S218 to S226, the target case clue is further assessed to confirm whether it is a target website, a suspicious target website, or a misjudged target website. Different control actions can be taken according to each result, so that target websites can be cracked down on effectively.
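Steps S218 to S226 form a small decision procedure. The Python sketch below is illustrative only: `fetch_source`, `resolve_ip` and `domains_on_ip` are hypothetical hooks (for example an HTTP client, a DNS resolver and a reverse-IP lookup source), bound domains are probed at `http://<domain>/` by assumption, and the keyword `"casino"` in the usage is a made-up example. The text does not say how to label an accessible real link without keywords, so the sketch returns `"unclassified"` there; treating an unresolvable link as misjudged is likewise an assumption.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlparse


def classify_real_link(
    real_link: str,
    keywords: Iterable[str],
    fetch_source: Callable[[str], Optional[str]],   # hypothetical: page code, or None if inaccessible
    resolve_ip: Callable[[str], Optional[str]],     # hypothetical: hostname -> resolved IP, or None
    domains_on_ip: Callable[[str], Iterable[str]],  # hypothetical: IP -> domain names bound to it
) -> str:
    kws = [k.lower() for k in keywords]

    def contains_keyword(source: str) -> bool:
        text = source.lower()
        return any(k in text for k in kws)

    source = fetch_source(real_link)                # S218: is the real link accessible?
    if source is not None:
        # S220. "unclassified" is this sketch's own label for an accessible link
        # without keywords, a case the text does not cover.
        return "target" if contains_keyword(source) else "unclassified"

    host = urlparse(real_link).hostname or ""
    ip = resolve_ip(host)                           # S222
    if ip is None:
        return "misjudged"  # assumption: nothing to traverse, so no supporting evidence

    for domain in domains_on_ip(ip):                # S224: traverse the other bound domains
        if domain == host:
            continue
        other = fetch_source(f"http://{domain}/")   # assumption: probe the site root
        if other is not None and contains_keyword(other):
            return "suspicious"                     # S226: keyword found
    return "misjudged"                              # S226: no keyword found
```

Passing the network operations in as callables keeps the decision logic testable offline and lets the keyword library be swapped without touching the procedure.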
This embodiment also provides a target website clue detection device for implementing the above embodiments and preferred implementations; details already described are not repeated. Each module of the target website clue detection device may be implemented in whole or in part by software, hardware, or a combination of the two. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and conceivable.
Fig. 4 is a schematic diagram of a target website clue detection device according to an embodiment of the present invention. As shown in Fig. 4, a target website clue detection device 30 is provided, which includes an obtaining module 31, a screening module 32, a first extraction module 33, and a second extraction module 34, wherein:
the obtaining module 31 is configured to obtain the mirror traffic of the metropolitan area network to be detected;
the screening module 32 is configured to screen, from the mirror traffic, data packets carrying detection domain name features as a first data packet group;
the first extraction module 33 is configured to extract the HOST information of all data packets in the first data packet group and, based on the HOST information, screen out from the first data packet group a second data packet group of websites on which a domain name anti-blocking system is deployed;
the second extraction module 34 is configured to extract the url link information of the file bodies of all data packets in the second data packet group and obtain the real links of that url link information, the real links being target website clues.
The target website clue detection device 30 obtains the mirror traffic of the metropolitan area network to be detected; screens, from the mirror traffic, data packets carrying detection domain name features as a first data packet group; extracts the HOST information of all data packets in the first data packet group and, based on that HOST information, screens out from the first data packet group a second data packet group of websites on which a domain name anti-blocking system is deployed; and extracts the url link information of the file bodies of all data packets in the second data packet group and obtains the real links of that url link information, the real links being target website clues. Because the traffic of target websites carries detection domain name features and such websites deploy a domain name anti-blocking system, the mirror traffic of the metropolitan area network to be detected is screened in multiple layers according to these two characteristics; the url link information of the file bodies of the screened data packets is then extracted, and its real links are used as target website clues. Target website clue detection is thereby performed automatically and efficiently.
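The multi-layer screening summarized above can be expressed as a generic pipeline. In this hedged Python sketch every predicate and extractor is a hypothetical caller-supplied function; it only shows how each layer narrows the traffic, not how any individual layer is implemented.

```python
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")  # any packet representation


def detect_target_website_clues(
    mirror_traffic: Iterable[P],
    has_detection_feature: Callable[[P], bool],  # hypothetical: detection domain name features
    deploys_anti_blocking: Callable[[P], bool],  # hypothetical: HOST-based anti-blocking check
    extract_urls: Callable[[P], Iterable[str]],  # hypothetical: url link information in the file body
    resolve_real_link: Callable[[str], str],     # hypothetical: follow redirects to the real link
) -> List[str]:
    first_group = [p for p in mirror_traffic if has_detection_feature(p)]
    second_group = [p for p in first_group if deploys_anti_blocking(p)]
    clues: List[str] = []
    for packet in second_group:
        for url in extract_urls(packet):
            real = resolve_real_link(url)
            if real not in clues:  # de-duplicate while keeping first-seen order
                clues.append(real)
    return clues
```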
In one embodiment, the screening module 32 is further configured to screen, from the mirror traffic, data packets whose header information contains keyword information from a preset keyword library and whose file-body url value is an irregular short domain name, as the first data packet group.
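One possible reading of this screening rule, sketched in Python. Several details are assumptions not stated in the text: the file body is taken to be form-encoded with a `url` parameter, header keywords are matched case-insensitively against `name: value` pairs, and an "irregular short domain name" is approximated as a host of at most 20 characters whose leftmost label mixes letters and digits. The `Packet` class and the keyword `"domain-check"` used in the usage are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional
from urllib.parse import parse_qs, urlparse


@dataclass
class Packet:
    headers: Dict[str, str]
    body: str  # assumed form-encoded, e.g. "url=http%3A%2F%2Fk3x9.cn%2Fp"


def url_value(body: str) -> Optional[str]:
    """Return the decoded `url` parameter of a form-encoded body, if any."""
    values = parse_qs(body).get("url")
    return values[0] if values else None


def is_irregular_short_domain(url: str, max_len: int = 20) -> bool:
    """Heuristic (assumption): short host whose first label mixes letters and digits."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to populate .hostname
    host = urlparse(url).hostname or ""
    first_label = host.split(".")[0]
    mixed = bool(re.search(r"[a-z]", first_label)) and bool(re.search(r"[0-9]", first_label))
    return 0 < len(host) <= max_len and mixed


def screen_first_group(packets: Iterable[Packet], header_keywords: Iterable[str]) -> List[Packet]:
    keywords = [k.lower() for k in header_keywords]
    group: List[Packet] = []
    for pkt in packets:
        header_text = "\n".join(f"{k}: {v}" for k, v in pkt.headers.items()).lower()
        if not any(k in header_text for k in keywords):
            continue  # header carries no keyword from the keyword library
        url = url_value(pkt.body)
        if url is not None and is_irregular_short_domain(url):
            group.append(pkt)
    return group
```

The short-domain heuristic is deliberately crude; the patent leaves the exact definition of "irregular" open, so a real deployment would tune or replace `is_irregular_short_domain`.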
In one embodiment, the first extraction module 33 is further configured to determine whether the primary domain name corresponding to the HOST information exists in a preset first domain name library, the first domain name library storing domain names on which a domain name anti-blocking system is deployed;
if the primary domain name corresponding to the HOST information is in the first domain name library, generate a first data packet set based on the data packets corresponding to the HOST information;
if the primary domain name corresponding to the HOST information is not in the first domain name library, determine whether a domain name anti-blocking system is deployed on that primary domain name; if it is, store the primary domain name in the first domain name library and generate a second data packet set based on the data packets corresponding to the HOST information; and generate the second data packet group based on the first data packet set and the second data packet set.
In one embodiment, the second extraction module 34 is further configured to access the url link information; if the url link information does not redirect, determine the url link information to be the real link and acquire it; if the url link information redirects, determine the link finally reached to be the real link and acquire it.
In one embodiment, the target website clue detection device 30 further includes a classification module configured to, after the real link of the url link information of the file body is acquired, acquire the domain name code information of the real link if the real link is accessible;
output the real link as a target website if the domain name code information of the real link contains keywords from a preset keyword library;
acquire the resolved IP of the real link if the real link is inaccessible;
acquire the other domain names bound to the resolved IP, traverse them, and acquire the domain name code information of each of those domain names that is accessible;
determine whether the domain name code information of the other domain names bound to the resolved IP contains keywords from the keyword library; if so, output the real link as a suspicious target website; if not, output the real link as a misjudged target website.
In one embodiment, the obtaining module 31 is further configured to copy the original traffic of the metropolitan area network to be detected through a mirror port configured on a switch, to obtain the mirror traffic;
or to duplicate the original traffic of the metropolitan area network to be detected with an optical splitter, to obtain the mirror traffic.
In one embodiment, the target website clue detection device 30 further includes a filtering module configured to, after the mirror traffic of the metropolitan area network to be detected is obtained and before data packets carrying detection domain name features are screened from it, filter the mirror traffic, retain all POST data packets in it, and update the mirror traffic according to the filtering result.
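A hedged Python sketch of this filtering step, assuming each captured item is an already reassembled plaintext HTTP/1.x request payload (TCP reassembly and TLS-encrypted traffic are outside its scope). The request line is parsed as `METHOD SP request-target SP HTTP-version`, the layout HTTP/1.x uses; the function names are hypothetical.

```python
from typing import Iterable, List, Optional


def request_method(payload: bytes) -> Optional[str]:
    """Return the HTTP/1.x method of a request payload, or None if it is not one.

    Assumption: `payload` is a reassembled plaintext HTTP request.
    """
    request_line = payload.split(b"\r\n", 1)[0]
    parts = request_line.split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None
    return parts[0].decode("ascii", errors="replace")


def keep_post_packets(mirror_traffic: Iterable[bytes]) -> List[bytes]:
    """Retain only POST requests; the result becomes the updated mirror traffic."""
    return [p for p in mirror_traffic if request_method(p) == "POST"]
```

Filtering to POST requests early shrinks the volume that the later keyword and domain checks have to inspect.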
The above modules may be functional modules or program modules and may be implemented in software or hardware. Modules implemented in hardware may be embedded in, or independent of, a processor of the computer device; modules implemented in software may be stored in a memory of the computer device so that the processor can call them and execute the corresponding operations.
In one embodiment, a computer device is provided, which may be a server; its internal structure may be as shown in Fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores a preset configuration information set. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements the target website clue detection method described above.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a target website clue detection method. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
Those skilled in the art will appreciate that the structure shown in Fig. 5 is merely a block diagram of part of the structure associated with the present application and does not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining the mirror traffic of the metropolitan area network to be detected;
screening, from the mirror traffic, data packets carrying detection domain name features as a first data packet group;
extracting the HOST information of all data packets in the first data packet group and, based on the HOST information, screening out from the first data packet group a second data packet group of websites on which a domain name anti-blocking system is deployed;
extracting the url link information of the file bodies of all data packets in the second data packet group and obtaining the real links of that url link information, the real links being target website clues.
In one embodiment, the processor when executing the computer program further performs the steps of:
screening, from the mirror traffic, data packets whose header information contains keyword information from a preset keyword library and whose file-body url value is an irregular short domain name, as the first data packet group.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining whether the primary domain name corresponding to the HOST information exists in a preset first domain name library, the first domain name library storing domain names on which a domain name anti-blocking system is deployed;
if the primary domain name corresponding to the HOST information is in the first domain name library, generating a first data packet set based on the data packets corresponding to the HOST information;
if the primary domain name corresponding to the HOST information is not in the first domain name library, determining whether a domain name anti-blocking system is deployed on that primary domain name; if it is, storing the primary domain name in the first domain name library and generating a second data packet set based on the data packets corresponding to the HOST information; and generating the second data packet group based on the first data packet set and the second data packet set.
In one embodiment, the processor when executing the computer program further performs the steps of:
accessing the url link information;
if the url link information does not redirect, determining the url link information to be the real link and acquiring it;
if the url link information redirects, determining the link finally reached to be the real link and acquiring it.
In one embodiment, after obtaining the real link of the url link information of the file body, the processor when executing the computer program further implements the steps of:
acquiring the domain name code information of the real link if the real link is accessible;
outputting the real link as a target website if the domain name code information of the real link contains keywords from a preset keyword library;
acquiring the resolved IP of the real link if the real link is inaccessible;
acquiring the other domain names bound to the resolved IP, traversing them, and acquiring the domain name code information of each of those domain names that is accessible;
determining whether the domain name code information of the other domain names bound to the resolved IP contains keywords from the keyword library; if so, outputting the real link as a suspicious target website; if not, outputting the real link as a misjudged target website.
In one embodiment, the processor when executing the computer program further performs the steps of:
copying the original traffic of the metropolitan area network to be detected through a mirror port configured on a switch, to obtain the mirror traffic;
or duplicating the original traffic of the metropolitan area network to be detected with an optical splitter, to obtain the mirror traffic.
In one embodiment, after obtaining the mirror traffic of the metropolitan area network to be detected and before screening data packets carrying detection domain name features from the mirror traffic, the processor, when executing the computer program, further implements the following step:
filtering the mirror traffic, retaining all POST data packets in it, and updating the mirror traffic according to the filtering result.
With the above storage medium, the mirror traffic of the metropolitan area network to be detected is obtained; data packets carrying detection domain name features are screened from the mirror traffic as a first data packet group; the HOST information of all data packets in the first data packet group is extracted and, based on it, a second data packet group of websites on which a domain name anti-blocking system is deployed is screened out from the first data packet group; and the url link information of the file bodies of all data packets in the second data packet group is extracted and its real links are obtained as target website clues. Because the traffic of target websites carries detection domain name features and such websites deploy a domain name anti-blocking system, the mirror traffic of the metropolitan area network to be detected is screened in multiple layers according to these two characteristics; the url link information of the file bodies of the screened data packets is then extracted, and its real links are used as target website clues. Target website clue detection is thereby performed automatically and efficiently.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without creative effort fall within the protection scope of the present application.
Obviously, the drawings are only some examples or embodiments of the present application, and a person skilled in the art can also apply the present application to other similar situations based on these drawings without creative effort. In addition, it should be appreciated that although such development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as an admission of insufficient disclosure.
The term "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. A person of ordinary skill in the art understands, explicitly and implicitly, that the embodiments described in this application can be combined with other embodiments where no conflict arises.
The above embodiments represent only several implementations of the present application; although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that a person skilled in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A target website clue detection method, characterized by comprising the following steps:
obtaining mirror traffic of a metropolitan area network to be detected;
screening, from the mirror traffic, data packets carrying detection domain name features as a first data packet group;
extracting HOST information of all data packets in the first data packet group, and screening out, based on the HOST information, a second data packet group deployed with a domain name anti-blocking system from the first data packet group;
and extracting url link information of file bodies of all data packets in the second data packet group, and acquiring real links of the url link information of the file bodies, wherein the real links are clues of the target website.
2. The target website clue detection method according to claim 1, wherein the screening, from the mirror traffic, data packets carrying detection domain name features as the first data packet group comprises the following step:
screening, from the mirror traffic, data packets whose header information contains keyword information from a preset keyword library and whose file-body url value is an irregular short domain name, as the first data packet group.
3. The target website clue detection method according to claim 1, wherein the extracting HOST information of all data packets in the first data packet group, and screening out, based on the HOST information, a second data packet group deployed with a domain name anti-blocking system from the first data packet group comprises the following steps:
determining whether the primary domain name corresponding to the HOST information exists in a preset first domain name library, wherein domain names on which the domain name anti-blocking system is deployed are stored in the first domain name library;
if the primary domain name corresponding to the HOST information is in the first domain name library, generating a first data packet set based on the data packets corresponding to the HOST information;
if the primary domain name corresponding to the HOST information is not in the first domain name library, determining whether the domain name anti-blocking system is deployed on the primary domain name corresponding to the HOST information; when the domain name anti-blocking system is deployed on the primary domain name corresponding to the HOST information, storing the primary domain name corresponding to the HOST information into the first domain name library, and generating a second data packet set based on the data packets corresponding to the HOST information; and generating the second data packet group based on the first data packet set and the second data packet set.
4. The target website clue detection method according to claim 1, wherein the extracting url link information of file bodies of all data packets in the second data packet group and acquiring real links of the url link information of the file bodies comprises the following steps:
accessing the url link information;
if the url link information does not redirect, determining the url link information to be the real link and acquiring the real link;
and if the url link information redirects, determining the link finally reached to be the real link and acquiring the real link.
5. The target website clue detection method according to claim 1, wherein after the acquiring real links of the url link information of the file bodies, the method further comprises:
acquiring domain name code information of the real link if the real link is accessible;
outputting the real link as a target website if the domain name code information of the real link contains keywords in a preset keyword library;
acquiring a resolved IP of the real link if the real link is inaccessible;
acquiring other domain names bound to the resolved IP, traversing the other domain names bound to the resolved IP, and acquiring domain name code information of those other domain names bound to the resolved IP that are accessible;
determining whether the domain name code information of the other domain names bound to the resolved IP contains keywords in the keyword library; if so, outputting the real link as a suspicious target website; and if not, outputting the real link as a misjudged target website.
6. The target website clue detection method according to any one of claims 1 to 5, wherein the obtaining mirror traffic of the metropolitan area network to be detected comprises the following steps:
copying original traffic of the metropolitan area network to be detected through a mirror port configured on a switch to obtain the mirror traffic;
or duplicating the original traffic of the metropolitan area network to be detected with an optical splitter to obtain the mirror traffic.
7. The target website clue detection method according to claim 1, wherein after the obtaining mirror traffic of the metropolitan area network to be detected and before the screening data packets carrying detection domain name features from the mirror traffic, the method further comprises:
filtering the mirror traffic, retaining all POST data packets in the mirror traffic, and updating the mirror traffic according to a filtering result.
8. A target website clue detection apparatus, the apparatus comprising an acquisition module, a screening module, a first extraction module, a second extraction module and a result module, wherein:
the acquisition module is configured to acquire mirror traffic of a metropolitan area network to be detected;
the screening module is configured to screen, from the mirror traffic, data packets carrying detection domain name features as a first data packet group;
the first extraction module is configured to extract HOST information of all data packets in the first data packet group, and screen out, based on the HOST information, a second data packet group deployed with a domain name anti-blocking system from the first data packet group;
the second extraction module is configured to extract url link information of file bodies of all data packets in the second data packet group, and acquire real links of the url link information of the file bodies, wherein the real links are the target website clues.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when the computer program is executed by the processor.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202110932460.8A 2021-08-13 2021-08-13 Target website clue detection method, device, computer equipment and medium Active CN113676374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110932460.8A CN113676374B (en) 2021-08-13 2021-08-13 Target website clue detection method, device, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN113676374A CN113676374A (en) 2021-11-19
CN113676374B true CN113676374B (en) 2024-03-22

Family

ID=78542840

Country Status (1)

Country Link
CN (1) CN113676374B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6102406A (en) * 1999-06-07 2000-08-15 Steven A. Miles Internet-based advertising scheme employing scavenger hunt metaphor
CN101727471A (en) * 2008-10-30 2010-06-09 鸿富锦精密工业(深圳)有限公司 Website content retrieval system and method
CN105376217A (en) * 2015-10-15 2016-03-02 中国互联网络信息中心 Method for automatically determining malicious redirecting and malicious nesting offensive websites
CN107092826A (en) * 2017-03-24 2017-08-25 北京国舜科技股份有限公司 Web page contents real-time safety monitoring method
CN108173814A (en) * 2017-12-08 2018-06-15 深信服科技股份有限公司 Detection method for phishing site, terminal device and storage medium
CN109450880A (en) * 2018-10-26 2019-03-08 平安科技(深圳)有限公司 Detection method for phishing site, device and computer equipment based on decision tree
WO2020135233A1 (en) * 2018-12-26 2020-07-02 中兴通讯股份有限公司 Botnet detection method and system, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant