CN112866023B - Network detection method, model training method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112866023B
Authority
CN
China
Prior art keywords
domain name
target
data
detected
domain
Legal status
Active
Application number
CN202110042191.8A
Other languages
Chinese (zh)
Other versions
CN112866023A (en)
Inventor
王晓明
梁彧
田野
傅强
王杰
杨满智
蔡琳
金红
陈晓光
Current Assignee
Eversec Beijing Technology Co Ltd
Original Assignee
Eversec Beijing Technology Co Ltd
Application filed by Eversec Beijing Technology Co Ltd
Priority to CN202110042191.8A
Publication of CN112866023A
Application granted
Publication of CN112866023B
Current legal status: Active

Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
                    • H04L41/14 Network analysis or design
                        • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L61/00 Network arrangements, protocols or services for addressing or naming
                    • H04L61/45 Network directories; Name-to-address mapping
                        • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
                            • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L63/00 Network architectures or network communication protocols for network security
                    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
                        • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L63/00 Network architectures or network communication protocols for network security
                    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
                        • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
                            • H04L63/1416 Event detection, e.g. attack signature detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
                • Y02D30/00 Reducing energy consumption in communication networks
                    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The embodiments of the invention disclose a network detection method, apparatus, device and storage medium. The method comprises: acquiring request domain name data corresponding to target access traffic data of a target server; when the request domain name data is determined to meet a target domain name feature, acquiring a resolution protocol address corresponding to the request domain name data; when the resolution protocol address meets a domain name detection condition, acquiring all domain names to be detected that correspond to the resolution protocol address; performing algorithm identification detection on all the domain names to be detected, and acquiring the number of target algorithm domain names among them; and when the number of target algorithm domain names meets a target network detection condition, acquiring the source protocol addresses of all the domain names to be detected and determining those source protocol addresses as target network addresses. The embodiments of the invention enable fast, accurate and comprehensive detection and identification of the target network.

Description

Network detection method, model training method, device, equipment and storage medium
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to network detection and model training methods, apparatuses, devices and a storage medium.
Background
As computer networks continue to develop, networks grow ever larger and computer and network technologies grow ever more complex. This increases the likelihood of network attacks and network anomalies of all kinds, and attackers use illicit networks to carry out large volumes of malicious activity on virus-infected hosts.
Since its introduction, the Domain Name System (DNS) has been regarded as one of the most important Internet services. Almost all network services rely on DNS to resolve domain names into IP (Internet Protocol) addresses, so DNS is frequently abused by attackers to launch network attacks. Botnets, for example, are illicit networks commonly used by attackers. Most botnets use Domain-Flux technology, whose core is the DGA (Domain Generation Algorithm). A DGA generates domain names from the time, a dictionary and hard-coded constants, and the generated domain names appear random. Fig. 1 is a schematic diagram of a prior-art botnet workflow. As shown in Fig. 1, a DGA is used roughly as follows: first, the attacker runs the algorithm to generate a large number of DGA domain names, randomly selects a small number of them to register, and binds them to a C&C server (Command and Control server). After malware is implanted on a victim machine, the malware runs the DGA to generate domain names and checks whether each one can be connected to. If a domain name cannot be connected to, the next one is tried; if it can, that domain name is used as the domain name of the malware's command-and-control server.
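The DGA workflow described above can be illustrated with a toy generator. This is a minimal sketch under stated assumptions, not any real malware family's algorithm: the seed string, TLD list, label length and the use of SHA-256 are hypothetical choices, and `generate_domains` / `find_cc_domain` are illustrative names. It shows the two properties the text relies on: attacker and bot derive the same candidate list from the date and hard-coded constants, and the bot walks that list until one name is reachable.

```python
import hashlib
from datetime import date

# Hypothetical constants: real DGA families each use their own seeds,
# dictionaries, TLD lists and character mappings.
HARDCODED_SEED = "example-seed"
TLDS = ["com", "net", "org"]
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def generate_domains(day: date, count: int = 10, length: int = 12) -> list[str]:
    """Derive `count` random-looking domains from the date and a fixed seed."""
    domains = []
    for i in range(count):
        material = f"{HARDCODED_SEED}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).digest()  # 32 bytes
        # Map digest bytes onto letters to build the second-level label.
        label = "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])
        tld = TLDS[digest[-1] % len(TLDS)]
        domains.append(f"{label}.{tld}")
    return domains


def find_cc_domain(candidates, is_reachable):
    """Bot side: try each generated domain until one can be connected to."""
    for domain in candidates:
        if is_reachable(domain):
            return domain
    return None  # no registered domain in today's list


if __name__ == "__main__":
    today = date(2024, 1, 1)
    candidates = generate_domains(today)
    registered = {candidates[7]}  # the attacker registers only a few names
    print(find_cc_domain(candidates, lambda d: d in registered))
```

Because both sides run the same deterministic function, the attacker only needs to register one of the day's candidates, while a defender blocking individual names faces a new list every day.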
Because of this process, botnets based on Domain-Flux technology evade detection well, which greatly increases the difficulty of detection and management. When a large number of botnet hosts appear in a network, it is therefore very important to discover the botnets' abnormal behavior as early as possible, raise corresponding alarms, and prevent the botnets from spreading and growing further.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, a device and a storage medium for network detection and model training, so as to achieve fast, accurate and comprehensive detection and identification of a target network.
In a first aspect, an embodiment of the present invention provides a network detection method, including:
acquiring request domain name data corresponding to target access traffic data of a target server;
when the request domain name data is determined to meet a target domain name feature, acquiring a resolution protocol address corresponding to the request domain name data;
when the resolution protocol address is determined to meet a domain name detection condition, acquiring all domain names to be detected that correspond to the resolution protocol address;
performing algorithm identification detection on all the domain names to be detected, and acquiring the number of target algorithm domain names among all the domain names to be detected;
and when the number of target algorithm domain names meets a target network detection condition, acquiring the source protocol addresses of all the domain names to be detected, and determining the source protocol addresses as target network addresses.
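The five steps of the first aspect can be sketched as a single pass over DNS records. This is a hedged illustration, not the patented implementation: `DnsRecord`, `detect_target_network` and the two predicate parameters are hypothetical names, the default thresholds follow the example conditions given in Example one (at least two domain names per resolution protocol address, at least one target algorithm domain name), and the feature test and algorithm identification detector are passed in as callables because this passage does not fix them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DnsRecord:
    src_ip: str       # source protocol address: the client that sent the query
    qname: str        # request domain name
    resolved_ip: str  # resolution protocol address returned by the DNS server


def detect_target_network(
    records: Iterable[DnsRecord],
    matches_target_feature: Callable[[str], bool],
    is_target_algorithm_domain: Callable[[str], bool],
    min_domains_per_ip: int = 2,
    min_algorithm_domains: int = 1,
) -> set[str]:
    records = list(records)
    # Index resolution address -> domain names, and domain name -> source addresses.
    domains_by_ip = defaultdict(set)
    sources_by_domain = defaultdict(set)
    for r in records:
        domains_by_ip[r.resolved_ip].add(r.qname)
        sources_by_domain[r.qname].add(r.src_ip)

    # S110/S120: resolution addresses of request domains that meet the target feature.
    candidate_ips = {r.resolved_ip for r in records if matches_target_feature(r.qname)}

    target_addresses = set()
    for ip in candidate_ips:
        domains = domains_by_ip[ip]  # S130: all domain names to be detected for this address
        if len(domains) < min_domains_per_ip:
            continue  # domain name detection condition not met
        # S140: count target algorithm domain names.
        n_alg = sum(1 for d in domains if is_target_algorithm_domain(d))
        # S150: target network detection condition -> collect source addresses.
        if n_alg >= min_algorithm_domains:
            for d in domains:
                target_addresses |= sources_by_domain[d]
    return target_addresses
```

Passing the two predicates in keeps the control flow of S110 to S150 separate from whichever feature extractor and classifier a deployment chooses.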
In a second aspect, an embodiment of the present invention provides a model training method, including:
acquiring a target algorithm domain name sample, wherein the target algorithm domain name sample comprises a blacklist sample and a whitelist sample, and the blacklist sample comprises multidimensional character-sequence features corresponding to a target algorithm domain name;
and training a domain name recognition model according to the target algorithm domain name sample.
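One way to turn blacklist and whitelist domain names into training data is a fixed-length character-index sequence per domain with a binary label. The multidimensional character-sequence features are not specified at this point, so the vocabulary, padding scheme, `MAX_LEN`, and the names `encode_domain` and `build_samples` below are assumptions for illustration only; a recognition model would consume the resulting `X`, `y`.

```python
import string

# Hypothetical vocabulary: index 0 is padding, UNK_ID is used for any other character.
VOCAB = string.ascii_lowercase + string.digits + "-."
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(VOCAB)}
UNK_ID = len(VOCAB) + 1
MAX_LEN = 64  # hypothetical maximum domain length kept per sample


def encode_domain(domain: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a domain to a fixed-length list of character ids (truncate, then pad with 0)."""
    ids = [CHAR_TO_ID.get(c, UNK_ID) for c in domain.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def build_samples(blacklist, whitelist, max_len: int = MAX_LEN):
    """Label blacklist (target algorithm) domains 1 and whitelist domains 0."""
    X, y = [], []
    for d in blacklist:
        X.append(encode_domain(d, max_len))
        y.append(1)
    for d in whitelist:
        X.append(encode_domain(d, max_len))
        y.append(0)
    return X, y
```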
In a third aspect, an embodiment of the present invention further provides a network detection apparatus, including:
a domain name data acquisition module, configured to acquire request domain name data corresponding to target access traffic data of a target server;
a protocol address acquisition module, configured to acquire a resolution protocol address corresponding to the request domain name data when the request domain name data is determined to meet the target domain name feature;
a to-be-detected domain name acquisition module, configured to acquire all domain names to be detected that correspond to the resolution protocol address when the resolution protocol address is determined to meet the domain name detection condition;
an algorithm identification detection module, configured to perform algorithm identification detection on all the domain names to be detected and acquire the number of target algorithm domain names among all the domain names to be detected;
and a network address determination module, configured to acquire the source protocol addresses of all the domain names to be detected when the number of target algorithm domain names is determined to meet the target network detection condition, and to determine the source protocol addresses as the target network addresses.
In a fourth aspect, an embodiment of the present invention further provides a model training apparatus, including:
a sample acquisition module, configured to acquire a target algorithm domain name sample, wherein the target algorithm domain name sample comprises a blacklist sample and a whitelist sample, and the blacklist sample comprises multidimensional character-sequence features corresponding to a target algorithm domain name;
and a model training module, configured to train a domain name recognition model according to the target algorithm domain name sample.
In a fifth aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the network detection method or the model training method provided by any embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the network detection method or the model training method provided in any embodiment of the present invention.
Based on the characteristics of domain names and protocol addresses in the target network, the embodiments of the invention screen, from the access traffic data generated when target network nodes access the target server, the request domain name data that meets the target domain name feature; acquire the resolution protocol addresses corresponding to that request domain name data; perform algorithm identification detection on the domain names corresponding to resolution protocol addresses that meet the domain name detection condition; and identify the domain names generated by the target algorithm. The target network addresses can thus be traced back from the domain names, achieving fast, accurate and comprehensive detection and identification of the target network.
Drawings
Fig. 1 is a schematic diagram of a prior art botnet workflow.
Fig. 2 is a flowchart of a network detection method according to an embodiment of the present invention.
Fig. 3 is a flowchart of a network detection method according to a second embodiment of the present invention.
Fig. 4 is a schematic flow chart of top-level design of a traffic data ticket according to the second embodiment of the present invention.
Fig. 5 is a schematic flowchart of a target domain name feature extraction according to a second embodiment of the present invention.
Fig. 6 is a flowchart illustrating a Domain-Flux botnet detection method according to a second embodiment of the present invention.
Fig. 7 is a schematic structural diagram of a Domain-Flux botnet detection system according to a second embodiment of the present invention.
Fig. 8 is a service flow chart of a Domain-Flux botnet detection system according to a second embodiment of the present invention.
Fig. 9 is a network topology diagram of a Domain-Flux botnet detection system according to a second embodiment of the present invention.
Fig. 10 is a flowchart of a model training method according to a third embodiment of the present invention.
Fig. 11 is a schematic structural diagram of a domain name recognition model according to a third embodiment of the present invention.
Fig. 12 is a schematic flowchart of a DGA domain name detection method according to a third embodiment of the present invention.
Fig. 13 is a schematic structural diagram of a network detection apparatus according to a fourth embodiment of the present invention.
Fig. 14 is a schematic structural diagram of a model training apparatus according to a fifth embodiment of the present invention.
Fig. 15 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.
It should be further noted that, for the convenience of description, only some but not all of the relevant elements of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Example one
Fig. 2 is a flowchart of a network detection method according to an embodiment of the present invention. This embodiment is applicable to tracing target network addresses back from domain names, based on the characteristics of domain names and protocol addresses in a target network. The method can be executed by the network detection apparatus provided in the embodiments of the present invention; the apparatus can be implemented in software and/or hardware and can generally be integrated into a computer device. As shown in Fig. 2, the method includes the following operations:
S110, obtaining request domain name data corresponding to the target access traffic data of the target server.
The target server may be a server accessed by the device, for example, a DNS server. The target access traffic data may be traffic data generated when a device accesses a target server to request a connection to be established. The request domain name data may be a domain name of a device that accesses the target server and generates target access traffic data.
Accordingly, the target server generates a large amount of traffic data during operation, including the target access traffic data generated when devices access the target server to request connection establishment. Optionally, the traffic data of the target server may be collected and analyzed with a traffic data collection device, for example an NTA (Network Traffic Analysis) device, so that the target access traffic data can be screened out according to the corresponding fields of the traffic data. Further, the target access traffic data generated by a device accessing the target server contains the device's domain name information, so the request domain name data can be acquired from the corresponding field of the target access traffic data.
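For DNS traffic, the request domain name is the QNAME of the message's question section. The sketch below assumes raw DNS message bytes (for example, a UDP payload) are available from the collection device and decodes the first QNAME according to the RFC 1035 wire format. `parse_qname` is a hypothetical helper, not part of any NTA product's API, and it rejects compressed labels instead of following them, which is a simplification.

```python
import struct

DNS_HEADER_LEN = 12  # fixed-size DNS header (RFC 1035, section 4.1.1)


def parse_qname(message: bytes) -> str:
    """Return the QNAME of the first question in a raw DNS message."""
    if len(message) < DNS_HEADER_LEN:
        raise ValueError("message shorter than the 12-byte DNS header")
    (qdcount,) = struct.unpack("!H", message[4:6])
    if qdcount == 0:
        raise ValueError("message has no question section")
    labels = []
    offset = DNS_HEADER_LEN
    while True:
        if offset >= len(message):
            raise ValueError("name runs past end of message")
        length = message[offset]
        if length == 0:  # zero-length root label terminates the name
            break
        if length & 0xC0:  # compression pointer or reserved label type
            raise ValueError("compressed or extended labels not handled in this sketch")
        offset += 1
        label = message[offset:offset + length]
        if len(label) != length:
            raise ValueError("truncated label")
        labels.append(label.decode("ascii", errors="replace").lower())
        offset += length
    return ".".join(labels)


if __name__ == "__main__":
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)  # one-question query
    question = b"\x07Example\x03com\x00" + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    print(parse_qname(header + question))
```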
S120, when the request domain name data is determined to meet the target domain name feature, acquiring the resolution protocol address corresponding to the request domain name data.
The target Domain name feature may be a feature of a Domain name used by a node device in the target network that needs to be detected, for example, a feature of a Domain name used by a node device in a botnet based on Domain-Flux technology. The target domain name feature may be any extracted domain name feature, for example, the target domain name feature may include character composition, arrangement rule feature, and the like of a domain name. The resolution protocol address may be an IP address obtained by the target server performing resolution according to the request domain name data when the device accesses the target server and requests connection.
Correspondingly, domain name features can be extracted as the target domain name feature by analyzing the currently known domain names of node devices in the target network. Request domain name data that meets the target domain name feature therefore has a higher probability of being the domain name of a target network node device, and the resolution protocol address obtained by resolving it is more likely to correspond to a target network node device's domain name. The resolution protocol address obtained when the target server resolves the request domain name data can then be acquired and screened according to its own characteristics, to further determine whether it corresponds to a target network node device's domain name. Optionally, if the request domain name data does not meet the target domain name feature, it may be determined that the request domain name data is not the domain name of a target network node device, and detection may be stopped.
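The target domain name feature is left open here. As one plausible instance of "character composition", the sketch below computes length, digit ratio, vowel ratio and Shannon entropy of the second-level label. The function names and the thresholds in `matches_target_feature` are hypothetical and would have to be derived from known node-device domain names; taking the second-to-last label also ignores multi-part public suffixes such as `co.uk`.

```python
import math
from collections import Counter


def second_level_label(domain: str) -> str:
    """Approximate the registrable label as the second-to-last label."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def label_features(domain: str) -> dict:
    label = second_level_label(domain)
    n = len(label)
    if n == 0:
        return {"length": 0, "digit_ratio": 0.0, "vowel_ratio": 0.0, "entropy": 0.0}
    counts = Counter(label)
    # Shannon entropy in bits per character.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "length": n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / n,
        "entropy": entropy,
    }


def matches_target_feature(domain: str, min_length: int = 10,
                           min_entropy: float = 3.0,
                           max_vowel_ratio: float = 0.25) -> bool:
    """Hypothetical rule: long, high-entropy, vowel-poor labels look generated."""
    f = label_features(domain)
    return (f["length"] >= min_length
            and f["entropy"] >= min_entropy
            and f["vowel_ratio"] <= max_vowel_ratio)
```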
S130, when the resolution protocol address meets the domain name detection condition, obtaining all domain names to be detected that correspond to the resolution protocol address.
The domain name detection condition may be a characteristic of the IP addresses obtained when the target server resolves the domain names of node devices in the target network. It may be any feature of the resolved IP address, for example the character composition and arrangement pattern of the IP address, or the correspondence between the IP address and domain names; this embodiment does not limit its specific content. A domain name to be detected may be any domain name that the target server resolves to a resolution protocol address meeting the domain name detection condition.
For example, in a botnet based on Domain-Flux technology, node device domain names usually map to resolved IP addresses many-to-one, so the domain name detection condition may be that more than one domain name to be detected corresponds to the resolution protocol address. Specifically, the target server may resolve all request domain name data that has initiated access to obtain each corresponding resolution protocol address; when two or more request domain names correspond to the same resolution protocol address, that resolution protocol address meets the domain name detection condition.
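The many-to-one condition can be checked by indexing each resolved address to the distinct domain names that resolve to it. A minimal sketch, assuming (domain, resolved IP) pairs have already been extracted from the traffic; the function names and the `min_domains=2` default (matching the "two or more" example above) are illustrative.

```python
from collections import defaultdict


def domains_by_resolved_ip(pairs):
    """Index resolution protocol address -> set of distinct domain names."""
    index = defaultdict(set)
    for domain, ip in pairs:
        index[ip].add(domain.lower().rstrip("."))
    return index


def ips_meeting_detection_condition(pairs, min_domains: int = 2):
    """Return {ip: sorted domains to be detected} for addresses with enough domains."""
    index = domains_by_resolved_ip(pairs)
    return {ip: sorted(ds) for ip, ds in index.items() if len(ds) >= min_domains}
```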
Correspondingly, the resolution protocol address meeting the domain name detection condition has a higher probability of being obtained by resolving the domain name of the target network node equipment. Further, all the domain names to be detected corresponding to the resolution protocol addresses can be acquired, so that the characteristics of all the domain names to be detected are used as analysis and screening bases, and whether the resolution protocol addresses are the resolution protocol addresses corresponding to the domain names of the target network node devices is further determined by judging whether the domain names to be detected include the domain names of the target network node devices. Optionally, if the resolution protocol address does not satisfy the domain name detection condition, it may be determined that the resolution protocol address is not obtained by resolving the domain name of the target network node device, and the detection may be stopped.
S140, performing algorithm identification detection on all the domain names to be detected, and acquiring the number of target algorithm domain names in all the domain names to be detected.
The algorithm identification detection may be a process of determining whether the domain name to be detected is the target algorithm domain name. The number of target algorithm domain names may be the number of target algorithm domain names in all domain names to be detected. The target algorithm domain name may be a domain name generated from a generation algorithm of the target network node device domain name.
Accordingly, a target network typically uses a specific algorithm to generate the domain names of its node devices; for example, a botnet based on Domain-Flux technology may use a DGA to generate domain names. Domain names generated by the same algorithm share uniform rules and characteristics, so algorithm identification detection can be performed on the domain names to be detected on the basis of those rules and characteristics, screening out the target algorithm domain names, which can then be judged to be domain names of target network node devices. From the algorithm identification detection result for each domain name to be detected, the number of target algorithm domain names among all the domain names to be detected can be counted, and whether the resolution protocol address corresponds to the domain names of target network node devices can be further determined from the number of target algorithm domain names associated with it.
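The algorithm identification detector is not fixed at this point (the third embodiment describes training a domain name recognition model). As a clearly labeled stand-in, the sketch below scores each label with a character-bigram model fitted to benign names and counts labels whose average log-probability falls below a threshold. `BigramScorer`, the add-alpha smoothing and the threshold are assumptions, not the method claimed here.

```python
import math
from collections import Counter


class BigramScorer:
    """Average log-probability of a label's character bigrams under a benign-name model."""

    def __init__(self, benign_labels, alpha: float = 1.0):
        self.alpha = alpha
        self.bigrams = Counter()
        self.unigrams = Counter()
        for label in benign_labels:
            s = f"^{label.lower()}$"  # ^ and $ mark start and end of the label
            for a, b in zip(s, s[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1
        # Observed symbols plus one slot for unseen characters.
        self.vocab = len({c for pair in self.bigrams for c in pair}) + 1

    def score(self, label: str) -> float:
        s = f"^{label.lower()}$"
        total = 0.0
        for a, b in zip(s, s[1:]):
            # Add-alpha smoothed conditional probability P(b | a).
            p = (self.bigrams[(a, b)] + self.alpha) / (self.unigrams[a] + self.alpha * self.vocab)
            total += math.log(p)
        return total / (len(s) - 1)


def count_target_algorithm_domains(domains, scorer: BigramScorer, threshold: float) -> int:
    """Count domains whose second-level label scores below `threshold`."""
    count = 0
    for d in domains:
        labels = d.lower().rstrip(".").split(".")
        label = labels[-2] if len(labels) >= 2 else labels[0]
        if scorer.score(label) < threshold:
            count += 1
    return count
```

In practice the threshold would be calibrated on held-out benign and DGA names; a trained recognition model can replace `BigramScorer` without changing the counting step.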
S150, when the number of target algorithm domain names meets the target network detection condition, acquiring the source protocol addresses of all the domain names to be detected, and determining the source protocol addresses as the target network addresses.
The target network detection condition may be a feature of the number of target algorithm domain names associated with a resolution protocol address that corresponds to the domain names of target network node devices. The source protocol address may be the IP address of the device corresponding to a domain name to be detected. The target network address may be the IP address of a node device in the target network, that is, the botnet.
Optionally, in a botnet based on Domain-Flux technology, the same botnet generally controls its node devices in the same way, for example by deploying the same malware on them, so the domain names of the node devices of the same target network may all correspond to the same resolution protocol address. The target network detection condition may therefore be that the number of target algorithm domain names is at least one: when the domain names to be detected that correspond to a resolution protocol address include at least one target algorithm domain name, all of those domain names to be detected may be determined to be domain names of target network node devices.
Correspondingly, a resolution protocol address whose number of target algorithm domain names meets the target network detection condition may be determined to correspond to the domain names of target network node devices. All the domain names to be detected may then be determined to be domain names of target network node devices, and their source protocol addresses are the IP addresses of the target network node devices. Optionally, if the number of target algorithm domain names does not meet the target network detection condition, it may be determined that the resolution protocol address was not obtained by resolving the domain names of target network node devices and that the domain names to be detected are not domain names of target network node devices, and detection may be stopped.
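Once a resolution protocol address passes both conditions, step S150 reduces to collecting the clients that queried its domain names. A sketch, assuming a query log of (source IP, domain) pairs; `collect_target_network_addresses` is a hypothetical name, and the default of one target algorithm domain name mirrors the "at least one" condition described above.

```python
def collect_target_network_addresses(query_log, domains_to_detect,
                                     n_algorithm_domains: int,
                                     min_algorithm_domains: int = 1) -> set:
    """Return the source IPs of all domains to be detected, if the condition holds.

    query_log: iterable of (src_ip, domain) pairs.
    """
    if n_algorithm_domains < min_algorithm_domains:
        return set()  # target network detection condition not met: stop detection
    wanted = {d.lower().rstrip(".") for d in domains_to_detect}
    return {src for src, domain in query_log if domain.lower().rstrip(".") in wanted}
```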
The embodiment of the invention provides a network detection method based on the characteristics of domain names and protocol addresses in a target network. Request domain name data that meets the target domain name feature is screened from the access traffic data of the target server accessed by target network nodes, the resolution protocol addresses corresponding to that request domain name data are obtained, algorithm identification detection is performed on the domain names corresponding to resolution protocol addresses that meet the domain name detection condition, and domain names generated by the target algorithm are identified. The target network addresses can therefore be traced back from the domain names, enabling fast, accurate and comprehensive detection and identification of the target network.
Example two
Fig. 3 is a flowchart of a network detection method according to a second embodiment of the present invention. The embodiment of the present invention is embodied on the basis of the above-mentioned embodiments, and in the embodiment of the present invention, a specific optional implementation manner for performing algorithm identification detection on all the domain names to be detected is provided.
As shown in fig. 3, the method of the embodiment of the present invention specifically includes:
S210, obtaining request domain name data corresponding to the target access traffic data of the target server.
In an optional embodiment of the present invention, before the request domain name data corresponding to the target access traffic data of the target server is obtained, the method may further include: acquiring an original access traffic data set of the target server; generating original access ticket data from the original access traffic data set; acquiring traffic data association information from the original access ticket data; and screening the target access traffic data out of the original access traffic data set according to the traffic data association information.
The original access traffic data set may be all traffic data generated during the operation of the target server. The original access ticket data may be ticket data in a preset format generated according to the original access traffic data set. The traffic data association information includes a traffic data protocol type, a traffic data request domain name list, and a traffic data basic feature, specifically, the traffic data protocol type may be a protocol type adopted by the device access server, the traffic data request domain name list may be a domain name white list including a common security domain name and a domain name black list including a known target network node device domain name, and the traffic data basic feature may be a traffic data feature read from original access ticket data, having a screening function, and may be predetermined as needed.
Correspondingly, all traffic data generated in the working process of the target server can be captured to obtain original access traffic data, and therefore original access ticket data are generated. The format of the original access ticket data can be designed in advance according to the requirement.
Exemplarily, fig. 4 is a schematic flow diagram of a top-level design of a traffic data ticket provided in an embodiment of the present invention, and as shown in fig. 4, for the traffic data of the DNS server, a top-level design flow of an original access ticket data may include: extracting DNS fields according to the actual flow condition of the DNS; aiming at different attack types, extracting DNS fields required to be applied correspondingly, and sorting all the required fields; designing a DNS ticket, testing whether the DNS ticket can be used for detecting the attack of a botnet to a DNS server in actual existing network data, and optimizing, modifying and adjusting according to a test result; and updating a new call bill, performing analysis and Inspection by comparing with original call bill data of a Deep Packet Inspection (DPI) of the existing system, performing feedback discussion on a part with problems, and finally determining a DNS call bill format. Finally, the DNS call ticket field obtained based on the process can reach 99% output, and basically meets the functions of identifying and monitoring the Domain-Flux botnet, and the specific call ticket format is shown in table 1.
Further, the traffic data association information and the filtering conditions used to filter the target access traffic data out of the original access traffic data set may be predetermined. Optionally, for target access traffic data of a DNS server, the data can be filtered to the DNS protocol based on the traffic data protocol type. Filtering can also be based on the traffic data request domain name list: specifically, the domain name whitelist comprises the top 1 million domain names in the Alexa website traffic ranking plus commonly used domestic domain names, and the domain name blacklist comprises publicly disclosed malicious domain names, so that for domain name data on either list it can be determined directly whether it is a domain name of a target network node device. Filtering may also be based on the basic traffic data features: specifically, only response packets are considered, i.e., the QR field is 1 (the message is a response), the opcode field is 0 (a standard forward query), and the Rcode field is 0 (no error).
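The three filtering conditions above (protocol type, request domain name lists, and the QR/opcode/Rcode header fields) can be sketched as a single pass over ticket records. This is a minimal illustration only: the `DnsTicket` fields are hypothetical stand-ins, since the actual ticket layout of Table 1 is not reproduced, and the lists are matched here on the full domain name.

```python
from dataclasses import dataclass


@dataclass
class DnsTicket:
    # Hypothetical ticket fields; the real Table 1 layout is not reproduced here.
    protocol: str
    qname: str
    qr: int      # 0 = query, 1 = response
    opcode: int  # 0 = standard query
    rcode: int   # 0 = no error


def screen_tickets(tickets, whitelist, blacklist):
    """Split tickets into (known_bad, known_good, remaining) using the filters described above."""
    known_bad, known_good, remaining = [], [], []
    for t in tickets:
        # Protocol-type filter: keep DNS traffic only.
        if t.protocol != "DNS":
            continue
        # Basic-feature filter: responses to standard queries with Rcode 0.
        if not (t.qr == 1 and t.opcode == 0 and t.rcode == 0):
            continue
        name = t.qname.lower().rstrip(".")
        # Request-domain-list filter: list hits are decided immediately.
        if name in blacklist:
            known_bad.append(t)
        elif name in whitelist:
            known_good.append(t)
        else:
            remaining.append(t)
    return known_bad, known_good, remaining
```

Only the `remaining` tickets would need the more expensive detection steps that follow.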
TABLE 1: DNS call ticket format (available only as an image in the original publication)
In this embodiment, the original access traffic data set of the target server is collected, the corresponding original access ticket data is generated, and the target access traffic data is screened out from the ticket data. Because the traffic data is preprocessed in this way, network detection is performed on accurately processed traffic data, which improves the accuracy of the network detection result.
In an optional embodiment of the present invention, after the request domain name data corresponding to the target access traffic data of the target server is obtained, the method may further include: when the number of access failures per unit time for the request domain name data exceeds a normal access threshold, performing algorithm identification detection on the request domain name data to obtain the target algorithm domain names in it; and extracting the domain name features of the target algorithm domain names and determining them as the target domain name features.
The number of access failures per unit time may be the number of times, within the unit time, that devices fail when accessing the target server and requesting to establish a connection. The normal access threshold may be the maximum number of access failures per unit time that can occur when devices using safe domain names access the target server and request to establish a connection. The domain name features may be features shared by target algorithm domain names; they may include any extracted feature, and their specific content is not limited in this embodiment.
Accordingly, because a target network typically generates a large number of target algorithm domain names but registers only a few of them, node devices of the target network access the target server and request connections using many unregistered domain names, which produces a large number of access failures. Request domain name data whose access failures per unit time exceed the normal access threshold is therefore more likely to be a domain name of a target network node device, and performing algorithm identification detection on it determines whether it is a target algorithm domain name. If it is, i.e., it is a domain name of a target network node device, the target domain name features can be extracted from it. Exemplarily, fig. 5 is a schematic flowchart of extracting target domain name features according to an embodiment of the present invention. As shown in fig. 5, the normal access threshold for a Domain-Flux botnet is set to 200; after DGA algorithm identification detection is performed on request domain name data with more than 200 access failures, more than 70% of these domain names are identified as DGA domain names. A third-party platform query step is then added to confirm that these domain names are indeed domain names of target network node devices, after which target domain name feature extraction is performed.
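The failure-count screening can be sketched as counting failed lookups per domain within fixed time windows and flagging domains that exceed the threshold in any window. The threshold of 200 follows the example above; the 60-second window standing in for "unit time" and the `(timestamp, domain, succeeded)` event shape are assumptions for illustration.

```python
from collections import Counter, defaultdict


def high_failure_domains(events, window_seconds=60, threshold=200):
    """Return domains whose failures in any single time window exceed threshold.

    events: iterable of (timestamp_seconds, domain, succeeded) tuples.
    The 60 s window is an assumed "unit time"; 200 follows the example in the text.
    """
    per_window = defaultdict(Counter)
    for ts, domain, succeeded in events:
        if not succeeded:
            # Bucket each failure into its fixed-length time window.
            per_window[int(ts // window_seconds)][domain] += 1
    flagged = set()
    for counts in per_window.values():
        for domain, n in counts.items():
            if n > threshold:
                flagged.add(domain)
    return flagged
```

Domains returned here would then go to DGA identification and, as in fig. 5, a third-party confirmation before feature extraction.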
In an optional embodiment of the invention, the target domain name feature may comprise a regular expression composed of feature characters, digits, a main domain name, and a suffix domain name.
The target domain name features are obtained through the extraction process described above. Exemplarily, fig. 6 is a schematic flow diagram of a Domain-Flux botnet detection method according to an embodiment of the present invention. As shown in fig. 6, based on the target domain name features, the regular expressions for the domain name types to be screened match request domain name data of the forms "m number + domain (main domain name) + suffix (suffix domain name)" and "x number + domain + suffix".
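The two patterns can be written as regular expressions. The exact expressions are not given in the text, so the sketch below assumes that "m number" and "x number" mean a leading label made of the feature character `m` or `x` followed by digits, then one main-domain label, then an alphabetic suffix. Treat these patterns as a hypothetical reading rather than the patent's actual expressions.

```python
import re

# Assumed reading of the two patterns: feature character + digits as the first
# label, then the main domain, then the suffix (e.g. "m123.example.com").
FEATURE_PATTERNS = [
    re.compile(r"^m\d+\.(?P<domain>[a-z0-9-]+)\.(?P<suffix>[a-z]{2,})$"),
    re.compile(r"^x\d+\.(?P<domain>[a-z0-9-]+)\.(?P<suffix>[a-z]{2,})$"),
]


def matches_target_feature(qname):
    """Return True if qname matches either assumed target domain name pattern."""
    name = qname.lower().rstrip(".")
    return any(p.match(name) for p in FEATURE_PATTERNS)
```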
In the above embodiment, a target network uses a large number of unregistered domain names and therefore causes many access failures. Based on this characteristic, request domain name data suspected to be domain names of target network node devices is screened out according to its access failure count, the domain names of target network node devices are confirmed through algorithm identification detection, and features are extracted from them to obtain the target domain name features. This achieves accurate extraction of the target domain name features and improves the accuracy and comprehensiveness with which domain names of target network node devices are identified from those features in the subsequent steps.
S220, when it is determined that the request domain name data meets the target domain name features, obtain the resolution protocol address (resolved IP address) corresponding to the request domain name data.
S230, when it is determined that the resolution protocol address meets the domain name detection condition, obtain all domain names to be detected that correspond to the resolution protocol address.
S240, perform algorithm identification detection on all the domain names to be detected, and obtain the number of target algorithm domain names among them.
In an optional embodiment of the present invention, S240 may specifically include:
S241, extract the main domain name data of all the domain names to be detected, and match it against a domain name list.
The domain name list comprises a domain name white list and a domain name black list. The main domain name data to be detected may be a main domain name of the domain name to be detected.
Specifically, the domain name whitelist may include common safe domain names, optionally the top 1 million domain names in the Alexa website traffic ranking and commonly used domain names in China. The domain name blacklist may include domain names of known target network node devices, optionally publicly disclosed malicious domain names.
Correspondingly, matching the main domain name data to be detected against the domain name list means comparing it with the main domain names of the domains in the list to determine whether it equals the main domain name of any domain in the whitelist or the blacklist. If it equals the main domain name of some domain in the whitelist or the blacklist, it matches that list; if it differs from the main domain names of all domains in both lists, it matches neither list.
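Main-domain comparison requires extracting the main (second-level) label from each name. The sketch below is a simplification under a stated assumption: it uses a tiny hypothetical set of multi-label suffixes, whereas a real deployment would consult the full Public Suffix List. Both list entries and queried names are reduced to main domains before comparison.

```python
# Hypothetical, deliberately small set of multi-label public suffixes; a real
# deployment would consult the full Public Suffix List instead.
MULTI_LABEL_SUFFIXES = {"com.cn", "net.cn", "org.cn", "gov.cn", "co.uk"}


def main_domain(qname):
    """Return the main (second-level) label, e.g. 'example' for 'www.example.com.cn'."""
    parts = [p for p in qname.lower().rstrip(".").split(".") if p]
    if not parts:
        return ""
    if len(parts) >= 3 and ".".join(parts[-2:]) in MULTI_LABEL_SUFFIXES:
        return parts[-3]
    return parts[-2] if len(parts) >= 2 else parts[0]


def build_main_set(domains):
    """Reduce a whitelist or blacklist of full domains to its set of main domains."""
    return {main_domain(d) for d in domains}
```

With the lists precomputed as main-domain sets, each match is a single set lookup.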
S242, determine whether the main domain name data to be detected matches the domain name blacklist; if so, execute S243, otherwise execute S244.
Correspondingly, if the main domain name data to be detected matches the domain name blacklist, it can be determined that the main domain name to be detected is the same as that of a known target network node device, i.e., it was generated by the specific algorithm used by the target network, and the domain name to be detected can be taken as a domain name of a target network node device. If it does not match the blacklist, it cannot yet be determined whether the main domain name was generated by that algorithm, and hence whether the domain name to be detected belongs to a target network node device, so further detection is needed.
And S243, determining the domain name to be detected corresponding to the main domain name data to be detected as the target algorithm domain name.
S244, determine whether the main domain name data to be detected matches the domain name whitelist; if so, execute S245, otherwise execute S246.
Correspondingly, if the main domain name data to be detected matches the domain name whitelist, it can be determined that the main domain name to be detected is the same as that of a safe domain name, i.e., it was not generated by the specific algorithm used by the target network, and the domain name to be detected is not a domain name of a target network node device. If it does not match the whitelist, it cannot yet be determined whether the main domain name was generated by that algorithm, and hence whether the domain name to be detected belongs to a target network node device, so further detection is needed.
S245, determining the domain name to be detected corresponding to the main domain name data to be detected as a non-target algorithm domain name.
A non-target algorithm domain name may be a domain name that is not generated by the specific algorithm used by the target network.
It should be noted that the order of determining whether the main domain name data to be detected matches the domain name blacklist (S242) and whether it matches the domain name whitelist (S244) can be swapped. In that case, the sequence from S242 to S246 may be: S244, determine whether the main domain name data to be detected matches the domain name whitelist; if so, execute S245, otherwise execute S242; S242, determine whether it matches the domain name blacklist; if so, execute S243, otherwise execute S246.
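The list stage (S242 to S245) and its swappable order can be sketched as a small decision function over one main domain. The verdict strings and the `whitelist_first` flag are hypothetical names for this illustration. The two orders give different results only when a main domain appears on both lists.

```python
def list_stage(main, whitelist_mains, blacklist_mains, whitelist_first=False):
    """Sketch of S242-S245 on one main domain.

    Returns 'target' (S243), 'non-target' (S245), or 'next' when neither list
    matches and processing continues with S246.
    """
    checks = [(blacklist_mains, "target"), (whitelist_mains, "non-target")]
    if whitelist_first:
        # Alternative order described in the text: S244 before S242.
        checks.reverse()
    for names, verdict in checks:
        if main in names:
            return verdict
    return "next"
```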
This embodiment first screens the main domain name data to be detected against the known domain name whitelist and blacklist, which greatly reduces the data to be detected, lowers the complexity of the subsequent screening, and improves both the network detection speed and the accuracy of the network detection result.
S246, match the main domain name data to be detected against a preset concatenation word list and determine whether it matches. If so, execute S243; otherwise, continue to determine, according to the multidimensional features of the main domain name data to be detected, whether the corresponding domain name to be detected is a target algorithm domain name, which in an optional embodiment of the present invention means executing S247.
The preset concatenation word list may be a set of words that are commonly concatenated to form the domain names of target network node devices. The multidimensional features may include main domain name features of any extracted dimension.
In an optional embodiment of the present invention, the preset concatenation word list comprises words that are concatenated to form the main domain name data of target algorithm domain names.
Correspondingly, in actual detection, screening the main domain name data to be detected only by the domain name lists and the multidimensional features produced a large number of false negatives (missed reports). The missed main domain names belonged to no blacklist and lacked the main domain name features of target network node devices; instead, they were formed by concatenating several words, either joined directly or connected by a "-" symbol. Therefore, the words used to form the main domain names of target network node devices can be collected and sorted from known domain names of target network node devices to build the preset concatenation word list.
Optionally, for domain names of Domain-Flux botnet node devices, three DGA families whose main domain names are formed by concatenating words were collected, namely the suppobox, pizza, and nymain_num families. By sorting the main domain names of these three families, 7000 words used to build the main domain names were extracted to form the preset concatenation word list.
Further, after screening by the domain name lists and before screening by the multidimensional features, the main domain name data to be detected can be matched against the words in the preset concatenation word list. If the main domain name data is formed by concatenating words from the list, it matches the list, and the corresponding domain name to be detected is determined to be a target algorithm domain name; if not, it does not match the list and can be further screened according to the multidimensional features.
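Checking whether a main domain is "formed by concatenating words from the list" is a word-break problem, which dynamic programming over prefix positions solves. In this sketch, hyphen-separated parts must each split into listed words. The `min_words=2` default, which stops a single dictionary word from counting as a concatenation, is an assumption not stated in the text, and the word list is expected to be lowercase.

```python
def is_concatenation(main, words, min_words=2):
    """Return True if `main` splits entirely into at least min_words listed words.

    Hyphen-separated parts (the "-"-joined case) must each split into words.
    min_words=2 is an assumed default; `words` is a set of lowercase words.
    """
    max_len = max((len(w) for w in words), default=0)
    total = 0
    for part in main.lower().split("-"):
        if not part:
            return False
        n = len(part)
        # best[i] = max number of words covering part[:i], or -1 if impossible
        best = [0] + [-1] * n
        for i in range(1, n + 1):
            for size in range(1, min(max_len, i) + 1):
                prev = best[i - size]
                if prev >= 0 and part[i - size:i] in words:
                    best[i] = max(best[i], prev + 1)
        if best[n] < 0:
            return False
        total += best[n]
    return total >= min_words
```

A practical design note: the 7000-word list fits comfortably in a Python set, so the check costs roughly O(n × longest word) lookups per main domain.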
In this embodiment, before the main domain name data to be detected is screened by multidimensional features, this special type of target algorithm domain name is screened out using the collected and sorted concatenation word list. This avoids the false negatives that occur when network detection relies only on the domain name lists and multidimensional features, and further improves the accuracy and comprehensiveness of the network detection result.
S247, input the main domain name data to be detected into a domain name recognition model, and determine the algorithm identification detection result of the main domain name data according to the model's output.
The domain name recognition model may be a model that recognizes, from the characteristics of a main domain name, whether the main domain name data to be detected is the main domain name of a target algorithm domain name. Its input may be main domain name data, and its output may be a classification result indicating whether that data is the main domain name of a target algorithm domain name. The algorithm identification detection result is derived from this output and indicates whether the domain name to be detected is a target algorithm domain name or a non-target algorithm domain name.
Correspondingly, the domain name recognition model can be obtained in advance through a machine-learning-based model training method, so that it performs multidimensional feature extraction on the input main domain name data, classifies according to the extracted features, and finally outputs whether the main domain name data is the main domain name of a target algorithm domain name.
Further, the main domain name data to be detected is input into the domain name recognition model. If the output indicates that it is the main domain name of a target algorithm domain name, the algorithm identification detection result is that the domain name to be detected is a target algorithm domain name; otherwise, the result is that the domain name to be detected is a non-target algorithm domain name.
S248, obtain the number of target algorithm domain names among all the domain names to be detected.
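S247 and S248 can be sketched as routing only the still-undecided main domains to the model and then counting target verdicts. Both callables are hypothetical placeholders. `stage_verdict` stands for the earlier list and word-list stages and returns `'target'`, `'non-target'`, or `'next'`. `model_score` stands for the trained recognition model's output. The 0.5 decision threshold is an assumption.

```python
from typing import Callable, Iterable


def count_target_domains(
    mains: Iterable[str],
    stage_verdict: Callable[[str], str],
    model_score: Callable[[str], float],
    threshold: float = 0.5,
) -> int:
    """Count target algorithm domains among the main domains (S247-S248 sketch).

    Only main domains left undecided ('next') by the earlier stages reach the
    model; the 0.5 decision threshold on its score is an assumption.
    """
    count = 0
    for main in mains:
        verdict = stage_verdict(main)
        if verdict == "next":
            verdict = "target" if model_score(main) >= threshold else "non-target"
        if verdict == "target":
            count += 1
    return count
```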
This embodiment uses machine learning to further screen the main domain name data to be detected by multidimensional features, so that domain names of target network node devices in all forms are screened and detected, ensuring the accuracy and comprehensiveness of the network detection result.
S250, when it is determined that the number of target algorithm domain names meets the target network detection condition, obtain the source protocol addresses of all the domain names to be detected, and determine these source protocol addresses as the target network addresses.
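Steps S230 to S250 amount to grouping domains by resolved address, counting target algorithm domains per address, and collecting the query sources. The text does not specify the "domain name detection condition" or the "target network detection condition". This sketch therefore assumes two hypothetical thresholds: a minimum number of distinct domains per resolved IP (`min_domains`) and a minimum count of target algorithm domains (`min_target`). `is_target_domain` stands in for the S240 detection.

```python
from collections import defaultdict


def find_target_addresses(records, is_target_domain, min_domains=10, min_target=5):
    """Sketch of S230-S250 over (source_ip, domain, resolved_ip) records.

    Assumptions: the domain name detection condition is that a resolved IP has
    at least min_domains distinct domains, and the target network detection
    condition is that at least min_target of them are target algorithm domains.
    """
    domains_by_ip = defaultdict(set)
    sources_by_domain = defaultdict(set)
    for src, domain, resolved in records:
        domains_by_ip[resolved].add(domain)
        sources_by_domain[domain].add(src)
    target_addresses = set()
    for resolved, domains in domains_by_ip.items():
        if len(domains) < min_domains:
            continue  # S230 condition not met for this resolved IP
        n_target = sum(1 for d in domains if is_target_domain(d))  # S240
        if n_target >= min_target:  # S250 condition
            for d in domains:
                target_addresses |= sources_by_domain[d]
    return target_addresses
```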
Correspondingly, an embodiment of the invention also provides a Domain-Flux botnet detection system, which can implement the network detection method of the embodiments of the invention to detect and identify Domain-Flux botnets. Fig. 7 is a schematic structural diagram of a Domain-Flux botnet detection system according to an embodiment of the present invention. As shown in fig. 7, the system includes a data acquisition layer, a data processing layer, and a presentation layer. The data acquisition layer mainly uses dedicated NTA (network traffic analysis) collection devices to collect, parse, and report the raw traffic of the DNS server. The data processing layer preprocesses the collected raw DNS traffic (data cleaning, filtering, comparison, labeling, and statistics), processes the data using techniques such as real-time stream computing, offline computing, in-memory computing, and machine learning together with the DGA algorithm and blacklist features, and provides computation support for the presentation layer. The presentation layer mainly summarizes and displays the botnet master control IPs, malicious domain names, and controlled IPs identified in the DNS traffic, and issues real-time early warnings of botnet attack events.
Fig. 8 is a service flow diagram of a Domain-Flux botnet detection system according to an embodiment of the present invention. As shown in fig. 8, the process may include: collecting the raw traffic and generating complete DNS access record tickets; relying on knowledge bases accumulated and kept synchronized by the system, such as black/white domain name lists and black/white IP lists; applying feature extraction rules to extract features in six dimensions, namely one IP corresponding to multiple domain names, one domain name corresponding to multiple IPs, domain name composition (feature characters), TTL features, NXDOMAIN/MX features, and other features; and finally, through a word-matching DGA algorithm model and a trained Domain-Flux botnet detection model, monitoring and identifying master control IPs, malicious domain names, and controlled IPs (bots).
Fig. 9 is a network topology diagram of a Domain-Flux botnet detection system according to an embodiment of the present invention. Based on the network detection method provided by the embodiment of the invention, a botnet DGA detection subsystem is built to perform botnet detection on enterprise-side DNS service nodes. Given the current state of the DNS system, DGA detection engine and botnet detection engine devices need to be added and configured, and the devices required by the platform can be added and configured as appropriate for the DNS traffic volume; the specific configuration is shown in Table 2.
The embodiment of the invention provides a network detection method that exploits the characteristics of domain names and protocol addresses in a target network. Request domain name data meeting the target domain name features is screened from the access traffic data of a target server accessed by target network nodes. The resolution protocol address corresponding to that data is then obtained. Algorithm identification detection is performed on the domain names corresponding to resolution protocol addresses that meet the domain name detection condition, identifying domain names generated by the target algorithm, and the target network addresses are traced back from these domain names. This achieves fast, accurate, and comprehensive detection and identification of the target network. Furthermore, the main domain name data is screened comprehensively by the domain name lists, the word concatenation features, and the multidimensional features to perform algorithm identification detection on the domain name data, which ensures the accuracy and comprehensiveness of the network detection result.
TABLE 2: Equipment configuration (available only as an image in the original publication)
EXAMPLE III
Fig. 10 is a flowchart of a model training method provided in the third embodiment of the present invention. This embodiment is applicable to training a machine-learning-based domain name recognition model. The method may be executed by the model training apparatus provided in the embodiments of the present invention; the apparatus may be implemented in software and/or hardware and may generally be integrated in a computer device. As shown in fig. 10, the method includes the following operations:
S310, obtain target algorithm domain name samples.
The target algorithm domain name samples include blacklist samples and whitelist samples, and the blacklist samples include multidimensional character sequence features corresponding to target algorithm domain names.
Specifically, the blacklist samples may include node device domain names generated by known target networks using a target algorithm, and the whitelist samples may include known safe domain names. The multidimensional character sequence features may be any features extracted from the blacklist samples that distinguish them from the whitelist samples.
Optionally, for Domain-Flux botnet detection, on the one hand, the top 1 million domain names in the Alexa website traffic ranking can be collected as white training samples; to better fit the model to domestic domain names, 20,000 commonly used domestic domain names are also added as white samples. On the other hand, rather than simply using all existing DGA domain name data directly as blacklist training samples, the training data of some families is increased based on an analysis of each DGA family's share of newly added domain names each day and its recall rate in algorithm detection, which further improves the accuracy of the model.
S320, train a domain name recognition model on the target algorithm domain name samples.
The domain name recognition model may be a machine-learning-based deep neural network model.
Optionally, an LSTM (Long Short-Term Memory) model may be used. The LSTM model is a special type of RNN (Recurrent Neural Network) that can learn long-term dependencies, for example in text and language. In DGA domain name detection, an LSTM model can learn the patterns of character sequences (domain names), helping to distinguish DGA-generated domain names from other domain names.
The major advantage of deep learning is that it compensates for the incompleteness of features extracted through manual feature engineering. Conventional methods generate a long list of features, such as length, vowel and consonant counts, and n-gram language models, and use them to distinguish DGA-generated from non-DGA-generated domain names; with that approach, security staff must update and build new feature libraries in real time, which is an extremely laborious process. Moreover, once attackers learn the filtering rules, they can easily evade detection by updating their DGA algorithms, whereas the automatic representation learning of deep learning adapts more quickly to changing adversaries. Deep learning also greatly reduces the investment in manpower and resources. Context features (such as NXDomain responses) often require additional expensive infrastructure such as network sensors and third-party reputation systems, whereas an LSTM model identifies DGA domain names from the domain name alone, without any context features, and still shows clear advantages.
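For contrast, the hand-engineered features mentioned above (length, vowels, consonants, character statistics, n-grams) might be computed as in the following sketch. The feature set is illustrative only and is not taken from the patent. It shows the kind of manual feature library that must be maintained by hand when no learned representation is used.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def handcrafted_features(main):
    """Illustrative lexical features a conventional (non-LSTM) DGA classifier might use."""
    s = main.lower()
    n = len(s)
    letters = [c for c in s if c.isalpha()]
    counts = Counter(s)
    # Shannon entropy (bits per character) of the character distribution.
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values()) if n else 0.0
    bigrams = {s[i:i + 2] for i in range(n - 1)}
    return {
        "length": n,
        "vowels": sum(c in VOWELS for c in letters),
        "consonants": sum(c not in VOWELS for c in letters),
        "digit_ratio": sum(c.isdigit() for c in s) / n if n else 0.0,
        "entropy": entropy,
        "distinct_bigrams": len(bigrams),
    }
```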
An LSTM unit contains four layers that interact in a special way. Fig. 11 is a schematic structural diagram of a domain name recognition model provided in an embodiment of the present invention. As shown in fig. 11, the domain name recognition model is based on an LSTM model, and its workflow may include the following. The extracted main domain name is encoded, and the embedding layer converts the encoded sample vector into fixed-size vectors, for example converting the sample vector [[4], [20]] into [[0.25, 0.1], [0.6, -0.2]]. The LSTM layer learns features from the samples. The white samples are mainly common domain names from the Alexa website plus domain names collected in China in connection with phishing websites; the black samples are publicly disclosed DGA domain name data selected from 360netlab over a period of time, with a black-to-white ratio of 1:1; and for DGA families with little data among the black samples, more samples are collected to keep the model robust. The dropout layer randomly drops a certain proportion of neuron connections to prevent the trained network from overfitting. The dense layer maps the learned features to the sample space, and the activation layer converts the output into a binary classification result.
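The first step of this workflow, encoding a main domain into a fixed-length integer sequence that an embedding layer can consume, can be sketched with the standard library alone. The character vocabulary, the reserved padding id 0, the unknown-character id, and `max_len=32` are all assumptions for illustration, not values from the patent.

```python
import string

# Hypothetical character vocabulary; id 0 is reserved for padding.
ALPHABET = string.ascii_lowercase + string.digits + "-_"
CHAR_TO_INDEX = {c: i + 1 for i, c in enumerate(ALPHABET)}
UNKNOWN = len(ALPHABET) + 1


def encode_main_domain(main, max_len=32):
    """Map characters to integer ids, truncate to max_len, and left-pad with 0.

    The resulting sequence is what an embedding layer would take as input;
    the vocabulary and max_len=32 are assumed values.
    """
    ids = [CHAR_TO_INDEX.get(c, UNKNOWN) for c in main.lower()][:max_len]
    return [0] * (max_len - len(ids)) + ids
```

Left-padding keeps the real characters adjacent to the end of the sequence, which is a common choice when the final LSTM state feeds the classifier.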
Further, the model training results can be obtained: the confusion matrix is shown in Table 3 and the model training metrics are shown in Table 4.
Fig. 12 is a schematic flowchart of a DGA domain name detection method according to an embodiment of the present invention. As shown in fig. 12, data cleaning is performed on the original data set of DNS server traffic: records whose requested domain name is empty are filtered out, the main domain name is extracted, and whether the domain name data is DGA domain name data is predicted through domain name list filtering, preset concatenation word list matching, and the LSTM algorithm. Specifically, in the actual data test, the white sample data volume is currently fixed, so only the recall rate on black samples is considered here. The black samples are the full data set updated daily by 360netlab, and data from four days were measured, as shown in Table 5.
TABLE 3
 | Predicted white | Predicted black
Actual white | 190454 | 2356
Actual black | 2795 | 197126
TABLE 4
 | Precision | Recall
White sample | 98.55% | 98.78%
Black sample | 98.82% | 98.60%
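The Table 4 figures can be reproduced from the Table 3 counts by computing two values for each class. Precision is the true positives divided by all predictions of that class. Recall is the true positives divided by all actual members of that class. The helper below is a generic sketch. The `TABLE_3` dictionary simply transcribes the counts above.

```python
def precision_recall(cm):
    """Return {class: (precision, recall)} for cm[actual][predicted] counts."""
    metrics = {}
    for cls in cm:
        tp = cm[cls][cls]
        predicted = sum(cm[actual][cls] for actual in cm)  # column total
        actual = sum(cm[cls].values())                       # row total
        metrics[cls] = (tp / predicted, tp / actual)
    return metrics


# Counts transcribed from Table 3 (rows: actual class, columns: predicted class).
TABLE_3 = {
    "white": {"white": 190454, "black": 2356},
    "black": {"white": 2795, "black": 197126},
}
```

For example, white precision is 190454 / (190454 + 2795) ≈ 98.55%, matching Table 4.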
TABLE 5
Date | Total samples | Recall | New samples actually added that day | Recall on newly added samples
February 20 | 1246776 | 99.32% | 43808 | 93.48%
February 21 | 1247018 | 99.32% | 36877 | 93.42%
February 24 | 1252847 | 99.32% | 41002 | 93.43%
March 17 | 1255261 | 99.33% | 35372 | 93.40%
On the same data set, the algorithm was compared with the currently mainstream DGA domain name detection model algorithms found on GitHub and in several papers. The results on the test set are shown in Table 6, and the results on new data, considering only the recall rate on newly blacklisted samples, are shown in Table 7. Specifically, taking the virus family among the DGA families as an example and using 360netlab's additional daily newly added sample data as a reference, its training data was increased fourfold. As a result, the recall rate on newly added sample data rose by 16% on average, and the recall rate for the virus family averaged 80%, far above its original 13.5%.
TABLE 6
Metric | xgboost | DNN | LSTM | This algorithm
White sample precision | 100% | 98.10% | 98.57% | 98.55%
White sample recall | 0% | 98.81% | 98.74% | 98.78%
Black sample precision | 50.91% | 98.84% | 98.78% | 98.82%
Black sample recall | 100% | 98.15% | 98.62% | 98.60%
TABLE 7 (recall on newly added samples)
Date | Naive Bayes | xgboost | DNN | LSTM | This algorithm
February 20 | 92.85% | 100% | 90.96% | 92.27% | 93.48%
February 21 | 92.43% | 100% | 90.16% | 91.95% | 93.56%
The embodiment of the invention provides a model training method in which a large amount of sample data is collected and sorted to expand the sample data commonly used in the prior art, and a domain name recognition model is trained by machine learning. The resulting domain name recognition model achieves very high precision and recall, further improving the accuracy and comprehensiveness of network detection results.
EXAMPLE IV
Fig. 13 is a schematic structural diagram of a network detection apparatus according to a fourth embodiment of the present invention, and as shown in fig. 13, the apparatus includes: a domain name data acquisition module 410, a protocol address acquisition module 420, a domain name to be detected acquisition module 430, an algorithm identification detection module 440 and a network address determination module 450.
The domain name data acquisition module 410 is configured to acquire the request domain name data corresponding to the target access traffic data of a target server.
The protocol address acquisition module 420 is configured to acquire the resolution protocol address (the IP address to which the domain name resolves) corresponding to the request domain name data when the request domain name data is determined to meet the target domain name feature.
The to-be-detected domain name acquisition module 430 is configured to acquire all domain names to be detected that correspond to the resolution protocol address when the resolution protocol address is determined to meet the domain name detection condition.
The algorithm identification detection module 440 is configured to perform algorithm identification detection on all the domain names to be detected and obtain the number of target algorithm domain names among them.
The network address determining module 450 is configured to, when the number of target algorithm domain names is determined to meet the target network detection condition, acquire the source protocol addresses of all the domain names to be detected and determine those source protocol addresses as target network addresses.
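The five module steps above can be chained as in the following minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the DnsRecord layout, the is_dga callback, and the min_dga_count threshold standing in for the target network detection condition are all hypothetical. Only the rule that a resolution address must map to more than one domain name comes from claim 1.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class DnsRecord:
    src_ip: str       # client (source) address that issued the query
    domain: str       # requested domain name
    resolved_ip: str  # address the domain resolved to ("resolution protocol address")


def detect_target_network(
    records: Iterable[DnsRecord],
    target_feature: re.Pattern,
    is_dga: Callable[[str], bool],
    min_dga_count: int = 2,  # hypothetical target network detection condition
) -> Set[str]:
    records = list(records)

    # Index every resolved address to the set of domains seen resolving to it.
    domains_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        domains_by_ip[r.resolved_ip].add(r.domain)

    # Steps 1-2: requested domains matching the target feature -> their resolved addresses.
    suspect_ips = {r.resolved_ip for r in records if target_feature.fullmatch(r.domain)}

    target_ips: Set[str] = set()
    for ip in suspect_ips:
        candidates = domains_by_ip[ip]
        # Step 3: domain name detection condition from claim 1 (more than one domain).
        if len(candidates) <= 1:
            continue
        # Step 4: count domains judged to be algorithmically generated.
        dga_count = sum(1 for d in candidates if is_dga(d))
        # Step 5: collect the source addresses of every client that queried these domains.
        if dga_count >= min_dga_count:
            target_ips.update(r.src_ip for r in records if r.domain in candidates)
    return target_ips


records = [
    DnsRecord("10.0.0.5", "xkqzvbt1.example-c2.com", "203.0.113.7"),
    DnsRecord("10.0.0.6", "pmwlrqa2.example-c2.com", "203.0.113.7"),
    DnsRecord("10.0.0.9", "www.example.org", "198.51.100.1"),
]
feature = re.compile(r"[a-z]+\d\.example-c2\.com")
hosts = detect_target_network(records, feature, lambda d: d.endswith(".example-c2.com"))
```

With these toy records, both generated-looking domains resolve to the same address, so the two clients that queried them (10.0.0.5 and 10.0.0.6) are returned, while the client that only looked up www.example.org is not.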
In an optional implementation of this embodiment, the apparatus may further include a target domain name feature determination module, configured to: when the number of access failures per unit time of the request domain name data exceeds a normal access threshold, perform algorithm identification detection on the request domain name data to obtain the target algorithm domain names in it; and extract the domain name features of those target algorithm domain names and determine them as the target domain name features.
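One way to realize the per-unit-time failure check is to bucket failed lookups into fixed time windows, as sketched below. The record layout, the grouping by source IP, the 60-second window, and the threshold of 10 are hypothetical choices for illustration; the text only specifies that the number of failures per unit time is compared with a normal access threshold.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical record layout: (timestamp in seconds, source IP, requested domain, lookup failed?)
Query = Tuple[float, str, str, bool]


def domains_to_check(
    queries: Iterable[Query],
    window_s: float = 60.0,      # hypothetical "unit time"
    normal_threshold: int = 10,  # hypothetical normal access threshold
) -> Set[str]:
    queries = list(queries)

    # Count failed lookups per (source IP, time window).
    failures: Counter = Counter()
    for ts, src, _domain, failed in queries:
        if failed:
            failures[(src, int(ts // window_s))] += 1

    # Windows whose failure count exceeds the normal threshold.
    noisy = {key for key, n in failures.items() if n > normal_threshold}

    # Every domain requested by a host inside a noisy window goes on to algorithm identification.
    return {
        domain
        for ts, src, domain, _failed in queries
        if (src, int(ts // window_s)) in noisy
    }
```

Grouping by host and window is only one design choice; grouping by requested domain or by main domain name would follow the same counting pattern.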
In an optional implementation of this embodiment, the target domain name feature includes a regular expression comprising feature characters, digits, a main domain name, and a suffix domain name.
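Claim 3 states that this feature takes the form of a regular expression. The sketch below shows one hypothetical way to derive such an expression from a single detected domain: runs of letters and digits in the main domain name are generalized into character classes with fixed lengths, while the suffix is kept literally. Treating the first label as the main domain name is a simplification; a production system would more likely consult a public suffix list.

```python
import re


def feature_regex(dga_domain: str) -> re.Pattern:
    """Generalize one detected domain into a hypothetical target-feature regex."""
    # Simplification: first label = main domain name, remainder = suffix domain name.
    main, _, suffix = dga_domain.lower().partition(".")
    parts = []
    for run in re.findall(r"[a-z]+|[0-9]+|[^a-z0-9]+", main):
        if "a" <= run[0] <= "z":
            parts.append(f"[a-z]{{{len(run)}}}")  # feature characters: letters of fixed length
        elif "0" <= run[0] <= "9":
            parts.append(f"[0-9]{{{len(run)}}}")  # digits of fixed length
        else:
            parts.append(re.escape(run))  # keep other characters (e.g. '-') literally
    return re.compile("".join(parts) + r"\." + re.escape(suffix))


pattern = feature_regex("xkqzvbt1.com")
```

Here "xkqzvbt1.com" yields the pattern `[a-z]{7}[0-9]{1}\.com`, which matches other domains with the same shape under the same suffix, such as "abcdefg9.com".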
In an optional implementation of this embodiment, the apparatus may further include a target access traffic data screening module, configured to: acquire an original access traffic data set of the target server; generate original access ticket data from the original access traffic data set; acquire traffic data association information from the original access ticket data, where the traffic data association information includes the traffic data protocol type, the traffic data request domain name list, and the traffic data basic features; and screen the target access traffic data out of the original access traffic data set according to the traffic data association information.
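The screening step can be pictured as converting each raw flow into a ticket record and filtering on its association information. In the sketch below, the Flow and Ticket fields, the choice to keep only DNS flows that carry at least one requested domain, and the 512-byte size limit are hypothetical examples of the three kinds of association information named above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Flow:
    """A raw access-traffic flow (hypothetical fields)."""
    flow_id: int
    protocol: str
    domains: List[str]
    byte_count: int


@dataclass
class Ticket:
    """Summary ("ticket") record derived from one flow."""
    flow_id: int
    protocol_type: str              # traffic data protocol type
    requested_domains: List[str]    # traffic data request domain name list
    basic_features: Dict[str, int]  # traffic data basic features


def to_ticket(flow: Flow) -> Ticket:
    return Ticket(flow.flow_id, flow.protocol.upper(), list(flow.domains), {"bytes": flow.byte_count})


def screen_target_flows(
    flows: Sequence[Flow],
    protocols: Sequence[str] = ("DNS",),  # hypothetical protocol filter
    max_bytes: int = 512,                 # hypothetical basic-feature filter
) -> List[Flow]:
    kept = []
    for flow in flows:
        ticket = to_ticket(flow)
        if (
            ticket.protocol_type in protocols
            and ticket.requested_domains
            and ticket.basic_features["bytes"] <= max_bytes
        ):
            kept.append(flow)
    return kept
```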
In an optional implementation of this embodiment, the algorithm identification detection module 440 may be specifically configured to: extract the main domain name data of all the domain names to be detected and match it against a domain name list that includes a domain name whitelist and a domain name blacklist; if the main domain name data matches the blacklist, determine the corresponding domain name to be detected as a target algorithm domain name; if it matches the whitelist, determine the corresponding domain name to be detected as a non-target algorithm domain name; and if it matches neither list, match it against the words in a preset concatenation word list and, if it matches that list, determine the corresponding domain name to be detected as a target algorithm domain name.
In an optional implementation of this embodiment, the algorithm identification detection module 440 may be further configured to: if the main domain name data to be detected does not match the preset concatenation word list, input it into a domain name recognition model and determine its algorithm identification detection result from the output of the model.
In an optional implementation of this embodiment, the preset concatenation word list includes the words that are concatenated to form the main domain name data of target algorithm domain names.
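The matching cascade described above (blacklist, whitelist, concatenation word list, then the model) might look like the following sketch. Interpreting a "match" with the concatenation word list as the main domain name being fully segmentable into at least two listed words is an assumption, as is the callable standing in for the domain name recognition model.

```python
from typing import Callable, Set


def can_segment(label: str, words: Set[str], min_words: int = 2) -> bool:
    """True if `label` splits entirely into at least `min_words` words from `words`."""
    n = len(label)
    # best[i] = largest number of words covering label[:i], or -1 if label[:i] cannot be covered.
    best = [-1] * (n + 1)
    best[0] = 0
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] >= 0 and label[j:i] in words:
                best[i] = max(best[i], best[j] + 1)
    return best[n] >= min_words


def is_target_algorithm_domain(
    main_domain: str,
    blacklist: Set[str],
    whitelist: Set[str],
    concat_words: Set[str],
    model: Callable[[str], bool],  # hypothetical stand-in for the domain name recognition model
) -> bool:
    if main_domain in blacklist:
        return True
    if main_domain in whitelist:
        return False
    if can_segment(main_domain, concat_words):
        return True
    return model(main_domain)
```

Checking the whitelist before the word list matters: a benign name such as "seashell" is also segmentable into listed words, and only the whitelist keeps it from being flagged.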
The device can execute the network detection method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the network detection method.
This embodiment of the invention provides a network detection apparatus that exploits the domain name and protocol address characteristics of the target network. From the access traffic data of the target server accessed by target network nodes, it screens out the request domain name data that meets the target domain name features and obtains the resolution protocol addresses corresponding to that data. It then performs algorithm identification detection on the domain names corresponding to the resolution protocol addresses that meet the domain name detection condition, identifying the domain names generated by the target algorithm. The target network addresses can then be traced back from these domain names, so that the target network is detected and identified quickly, accurately, and comprehensively.
Example five
Fig. 14 is a schematic structural diagram of a model training apparatus according to a fifth embodiment of the present invention, and as shown in fig. 14, the apparatus includes: a sample acquisition module 510 and a model training module 520.
The sample acquisition module 510 is configured to acquire target algorithm domain name samples.
The target algorithm domain name samples comprise blacklist samples and whitelist samples, and the blacklist samples comprise multidimensional character sequence features corresponding to target algorithm domain names.
The model training module 520 is configured to train the domain name recognition model on the target algorithm domain name samples.
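As a stand-in for the domain name recognition model, the sketch below trains a multinomial Naive Bayes classifier (one of the baselines compared in Table 7, not the proposed algorithm) on character bigrams, which serve here as a simple example of character sequence features. The class name, the bigram choice, the toy samples, and the Laplace smoothing parameter are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def char_ngrams(domain: str, n: int = 2) -> List[str]:
    """Character n-grams with start/end markers, e.g. 'ab' -> ['^a', 'ab', 'b$']."""
    padded = f"^{domain.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesDomainModel:
    """Multinomial Naive Bayes over character n-grams with Laplace smoothing."""

    def __init__(self, n: int = 2, alpha: float = 1.0) -> None:
        self.n = n
        self.alpha = alpha
        self.counts: Dict[str, Counter] = {}
        self.totals: Dict[str, int] = {}
        self.log_priors: Dict[str, float] = {}
        self.vocab: Set[str] = set()

    def fit(self, samples: Iterable[Tuple[str, str]]) -> "NaiveBayesDomainModel":
        docs: Counter = Counter()
        for domain, label in samples:  # label is "black" or "white"
            grams = char_ngrams(domain, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
            docs[label] += 1
        total = sum(docs.values())
        self.log_priors = {lab: math.log(c / total) for lab, c in docs.items()}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, domain: str) -> str:
        grams = char_ngrams(domain, self.n)
        v = len(self.vocab)

        def log_score(label: str) -> float:
            counts, total = self.counts[label], self.totals[label]
            return self.log_priors[label] + sum(
                math.log((counts[g] + self.alpha) / (total + self.alpha * v)) for g in grams
            )

        return max(self.log_priors, key=log_score)


white = ["google", "facebook", "wikipedia", "amazon", "youtube"]
black = ["xkqzvbtpw", "qzxjvkwpl", "vbxqzkjw", "pwxzqvkj", "jkqzxwvb"]
model = NaiveBayesDomainModel().fit([(d, "white") for d in white] + [(d, "black") for d in black])
```

A real deployment would train on far larger blacklist and whitelist sets and richer features; the toy lists only show the data flow from labeled samples to a trained classifier.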
The device can execute the model training method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the model training method.
This embodiment of the invention provides a model training apparatus that collects and organizes a large amount of sample data, expands the sample data commonly used in the prior art, and trains a domain name recognition model using machine learning, so that the resulting domain name recognition model achieves high accuracy and recall and further improves the accuracy and comprehensiveness of the network detection results.
Example six
Fig. 15 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. FIG. 15 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 15 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in FIG. 15, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 15, commonly referred to as a "hard drive"). Although not shown in FIG. 15, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 over the bus 18. It should be understood that although not shown in FIG. 15, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 16 executes various functional applications and data processing by running the programs stored in the memory 28, for example implementing the network detection method provided by the embodiments of the present invention: acquiring request domain name data corresponding to target access traffic data of a target server; under the condition that the request domain name data is determined to meet the target domain name characteristics, acquiring a resolution protocol address corresponding to the request domain name data; under the condition that the resolution protocol address meets the domain name detection condition, acquiring all domain names to be detected corresponding to the resolution protocol address; performing algorithm identification detection on all the domain names to be detected, and acquiring the number of target algorithm domain names among them; and under the condition that the number of target algorithm domain names is determined to meet the target network detection condition, acquiring the source protocol addresses of all the domain names to be detected, and determining the source protocol addresses as target network addresses.
Alternatively, the processor 16 implements the model training method provided by the embodiments of the invention, which comprises: acquiring a target algorithm domain name sample, wherein the target algorithm domain name sample comprises a blacklist sample and a whitelist sample, and the blacklist sample comprises multidimensional character sequence features corresponding to a target algorithm domain name; and training a domain name recognition model according to the target algorithm domain name sample.
Example seven
The seventh embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the network detection method provided by the embodiments of the present invention: acquiring request domain name data corresponding to target access traffic data of a target server; under the condition that the request domain name data is determined to meet the target domain name characteristics, acquiring a resolution protocol address corresponding to the request domain name data; under the condition that the resolution protocol address meets the domain name detection condition, acquiring all domain names to be detected corresponding to the resolution protocol address; performing algorithm identification detection on all the domain names to be detected, and acquiring the number of target algorithm domain names among them; and under the condition that the number of target algorithm domain names meets the target network detection condition, acquiring the source protocol addresses of all the domain names to be detected, and determining the source protocol addresses as target network addresses.
Alternatively, when executed by a processor, the computer program implements the model training method provided by the embodiments of the invention, which comprises: acquiring a target algorithm domain name sample, wherein the target algorithm domain name sample comprises a blacklist sample and a whitelist sample, and the blacklist sample comprises multidimensional character sequence features corresponding to a target algorithm domain name; and training a domain name recognition model according to the target algorithm domain name sample.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or computer device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. A method for network detection, comprising:
acquiring request domain name data corresponding to target access flow data of a target server;
under the condition that the request domain name data is determined to meet the target domain name characteristics, acquiring a resolution protocol address corresponding to the request domain name data;
under the condition that the resolution protocol address meets the domain name detection condition, all domain names to be detected corresponding to the resolution protocol address are obtained;
performing algorithm identification detection on all the domain names to be detected, and acquiring the number of target algorithm domain names in all the domain names to be detected;
under the condition that the number of target algorithm domain names is determined to meet a target network detection condition, acquiring source protocol addresses of all the domain names to be detected, and determining the source protocol addresses as target network addresses;
the domain name detection condition is specifically that the number of domain names to be detected corresponding to the resolution protocol address is multiple.
2. The method according to claim 1, further comprising, after the acquiring of the request domain name data corresponding to the target access traffic data of the target server:
under the condition that the number of access failures per unit time of the request domain name data exceeds a normal access threshold, performing algorithm identification detection on the request domain name data to obtain a target algorithm domain name in the request domain name data;
and extracting the domain name characteristics of the target algorithm domain name, and determining the domain name characteristics of the target algorithm domain name as the target domain name characteristics.
3. The method of claim 1, wherein the target domain name feature comprises a regular expression comprising a feature character, a number, a main domain name, and a suffix domain name.
4. The method according to claim 1, before the obtaining the request domain name data corresponding to the target access traffic data of the target server, further comprising:
acquiring an original access traffic data set of the target server;
generating original access ticket data according to the original access traffic data set;
acquiring traffic data association information according to the original access ticket data, the traffic data association information comprising a traffic data protocol type, a traffic data request domain name list and traffic data basic features;
and screening the target access traffic data from the original access traffic data set according to the traffic data association information.
5. The method according to claim 1, wherein the performing algorithm identification detection on all the domain names to be detected comprises:
extracting main domain name data to be detected of all the domain names to be detected, and matching the main domain name data to be detected with a domain name list; the domain name list comprises a domain name white list and a domain name black list;
determining the domain name to be detected corresponding to the main domain name data to be detected as the target algorithm domain name under the condition that the main domain name data to be detected is matched with the domain name blacklist;
determining the domain name to be detected corresponding to the main domain name data to be detected as a non-target algorithm domain name under the condition that the main domain name data to be detected is matched with the domain name white list;
and under the condition that the main domain name data to be detected matches neither the domain name whitelist nor the domain name blacklist, matching the main domain name data to be detected with a preset concatenation word list, and under the condition that the main domain name data to be detected matches the preset concatenation word list, determining the domain name to be detected corresponding to the main domain name data to be detected as the target algorithm domain name.
6. The method according to claim 5, further comprising, after the matching of the main domain name data to be detected with the preset concatenation word list:
under the condition that it is determined that the main domain name data to be detected does not match the preset concatenation word list, inputting the main domain name data to be detected into a domain name recognition model, and determining an algorithm identification detection result of the main domain name data to be detected according to an output result of the domain name recognition model.
7. The method of claim 6, wherein the preset concatenation word list comprises the words that are concatenated to form the main domain name data of the target algorithm domain name.
8. The method of claim 6, further comprising:
acquiring a target algorithm domain name sample, wherein the target algorithm domain name sample comprises a blacklist sample and a white list sample, and the blacklist sample comprises a multidimensional character sequence characteristic corresponding to a target algorithm domain name;
and training a domain name recognition model according to the target algorithm domain name sample.
9. A network detection apparatus, comprising:
a domain name data acquisition module, configured to acquire request domain name data corresponding to target access traffic data of a target server;
a protocol address acquisition module, configured to acquire a resolution protocol address corresponding to the request domain name data under the condition that the request domain name data is determined to meet the target domain name characteristics;
a to-be-detected domain name acquisition module, configured to acquire all domain names to be detected corresponding to the resolution protocol address under the condition that the resolution protocol address is determined to meet the domain name detection condition;
an algorithm identification detection module, configured to perform algorithm identification detection on all the domain names to be detected and acquire the number of target algorithm domain names among all the domain names to be detected;
a network address determining module, configured to acquire source protocol addresses of all the domain names to be detected under the condition that the number of target algorithm domain names is determined to meet the target network detection condition, and determine the source protocol addresses as target network addresses;
the domain name detection condition is specifically that the number of domain names to be detected corresponding to the resolution protocol address is multiple.
10. A computer device, characterized in that the computer device comprises:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the network detection method of any of claims 1-8.
11. A computer storage medium having a computer program stored thereon, the program, when executed by a processor, implementing the network detection method according to any one of claims 1-8.
CN202110042191.8A 2021-01-13 2021-01-13 Network detection method, model training method, device, equipment and storage medium Active CN112866023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110042191.8A CN112866023B (en) 2021-01-13 2021-01-13 Network detection method, model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110042191.8A CN112866023B (en) 2021-01-13 2021-01-13 Network detection method, model training method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112866023A CN112866023A (en) 2021-05-28
CN112866023B true CN112866023B (en) 2023-04-07

Family

ID=76003307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110042191.8A Active CN112866023B (en) 2021-01-13 2021-01-13 Network detection method, model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112866023B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113452810B (en) * 2021-07-08 2023-05-12 恒安嘉新(北京)科技股份公司 Traffic classification method, device, equipment and medium
CN113434792B (en) * 2021-07-20 2023-07-18 北京百度网讯科技有限公司 Training method of network address matching model and network address matching method
CN113553487B (en) * 2021-07-28 2024-04-09 恒安嘉新(北京)科技股份公司 Method and device for detecting website type, electronic equipment and storage medium
CN113608946B (en) * 2021-08-10 2023-09-12 国家计算机网络与信息安全管理中心 Machine behavior recognition method based on feature engineering and representation learning
CN115051845A (en) * 2022-06-08 2022-09-13 北京启明星辰信息安全技术有限公司 Suspicious traffic identification method, device, equipment and storage medium
CN114866342B (en) * 2022-06-30 2023-01-17 广东睿江云计算股份有限公司 Flow characteristic identification method and device, computer equipment and storage medium
CN115550021A (en) * 2022-09-26 2022-12-30 东华理工大学 Method and system for accurately replicating network space in big data environment and storage medium
CN115834190B (en) * 2022-11-22 2024-04-09 中国联合网络通信集团有限公司 Host management and control method, device, equipment and storage medium
CN116723051B (en) * 2023-08-07 2023-10-27 北京安天网络安全技术有限公司 Domain name information generation method, device and medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107666490A * 2017-10-18 2018-02-06 中国联合网络通信集团有限公司 Suspicious domain name detection method and device
CN108270761A * 2017-01-03 2018-07-10 中国移动通信有限公司研究院 Domain name legitimacy detection method and device
CN109391602A * 2017-08-11 2019-02-26 北京金睛云华科技有限公司 Zombie host detection method
CN110138758A * 2019-05-05 2019-08-16 哈尔滨英赛克信息技术有限公司 Typosquatting domain name detection method based on domain name vocabulary

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11374897B2 (en) * 2018-01-15 2022-06-28 Shenzhen Leagsoft Technology Co., Ltd. CandC domain name analysis-based botnet detection method, device, apparatus and medium

Non-Patent Citations (1)

Title
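Fast-Flux botnet domain name detection method based on network traffic; 谷勇浩 (Gu Yonghao) et al.; 信息安全研究 (Journal of Information Security Research); 2020-05-31; Vol. 6, No. 5; Section 3 *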

Also Published As

Publication number Publication date
CN112866023A (en) 2021-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant