CN113794731A - Method, device, equipment and medium for identifying disguised attack based on CDN flow

Info

Publication number: CN113794731A
Application number: CN202111095827.1A
Authority: CN
Inventors: 刘赫德; 祝萍; 莫思敏; 吴昊宇
Original assignee: Industrial and Commercial Bank of China Ltd ICBC; ICBC Technology Co Ltd
Current assignee: Industrial and Commercial Bank of China Ltd ICBC; ICBC Technology Co Ltd
Priority date: 2021-09-17
Filing date: 2021-09-17
Publication date: 2021-12-14
Anticipated expiration: 2041-09-17
Also published as: CN113794731B

Abstract

The disclosure provides a method, an apparatus, a device, and a medium for identifying masquerading attacks based on CDN traffic, and relates to the technical field of information security, the financial field, and the like. The method comprises: collecting CDN traffic data at a network node in real time, grouping the CDN traffic data, and obtaining DNS traffic data from the grouping result; preprocessing the DNS traffic data to obtain feature information, and obtaining a feature value based on the feature information; inputting the feature value into a classification model, which outputs whether a malicious attack exists in the CDN traffic data; and, if the output indicates that the CDN traffic data contains a malicious attack, blocking the CDN traffic. The method extracts features from the CDN traffic and inputs the extracted feature values into a classification model built in advance through machine learning, so that masquerading attack behavior in CDN traffic is identified quickly and malicious attacks are discovered proactively.

Description

Method, device, equipment and medium for identifying disguised attack based on CDN flow

Technical Field

The present disclosure relates to the field of information security technologies, and more particularly, to a method, an apparatus, a device, a medium, and a program product for identifying a spoofing attack based on CDN traffic.

Background

In the current network environment, attackers in penetration-testing or hacking scenarios often use a CDN to hide the real attacking IP. When controlling a target, a hacker rarely uses a real IP directly for remote control; instead, the hacker configures a CNAME record through CDN acceleration to implement domain fronting and thereby evade detection by an intrusion detection system. Because a CDN hides the real IP of a website well, the hacker's attack behavior is highly concealed and very difficult to discover.

Disclosure of Invention

In view of the foregoing, the present disclosure provides a method, apparatus, device, medium, and program product for identifying a masquerading attack based on CDN traffic.

According to a first aspect of the present disclosure, a method for identifying a masquerading attack based on CDN traffic is provided, including: collecting CDN traffic data at a network node in real time, grouping the CDN traffic data, and obtaining DNS traffic data from the grouping result; preprocessing the DNS traffic data to obtain feature information, and obtaining a feature value based on the feature information; inputting the feature value into a classification model, which outputs whether a malicious attack exists in the CDN traffic data; and, if the output indicates that the CDN traffic data contains a malicious attack, blocking the CDN traffic.

According to an embodiment of the present disclosure, preprocessing the DNS traffic data to obtain feature information includes: expanding the domain name information of the CDN traffic data based on open source intelligence and collected historical CDN traffic data; and determining, based on the expanded domain name information, the access behavior features of the DNS traffic data within the CDN traffic data.

According to an embodiment of the present disclosure, the expanded domain name information includes a whitelist mark, domain name record information, CDN attribution, resolved IP, and number of resolutions.

According to an embodiment of the present disclosure, the access behavior features include access frequency, access time, access duration, source IP, destination IP, banner information, and certificate information.

According to an embodiment of the present disclosure, the method for identifying a masquerading attack based on CDN traffic further includes: creating a blacklist library; and, if the output indicates that the CDN traffic data contains a malicious attack, recording the destination IP of the CDN traffic data in the blacklist library after the CDN traffic is blocked.

According to an embodiment of the present disclosure, before the feature value is input into the classification model, the method further includes: matching the destination IP information in the feature information against the IP information in the blacklist library, and blocking the CDN traffic if the destination IP information in the feature information exists in the blacklist library.

According to an embodiment of the present disclosure, collecting CDN traffic data at a network node in real time includes: collecting the CDN traffic at the network node by port mirroring; or collecting the CDN traffic at the network node with an optical splitter.

According to an embodiment of the present disclosure, grouping the CDN traffic data and obtaining DNS traffic data from the grouping result includes: obtaining the DNS traffic data from the CDN traffic according to the application-layer DNS protocol.

A second aspect of the present disclosure provides an apparatus for identifying a masquerading attack based on CDN traffic, including: a traffic obtaining module, configured to collect CDN traffic data at a network node in real time, group the CDN traffic data, and obtain DNS traffic data from the grouping result; a first feature extraction module, configured to preprocess the DNS traffic data to obtain feature information and to obtain a feature value based on the feature information; a traffic determination module, configured to input the feature value into a classification model and output whether a malicious attack exists in the CDN traffic data; and a blocking module, configured to block the CDN traffic if the output indicates that the CDN traffic data contains a malicious attack.

According to an embodiment of the present disclosure, the apparatus further includes a blacklist module, configured to create a blacklist library and, if the output indicates that the CDN traffic data contains a malicious attack, to record the destination IP of the CDN traffic data in the blacklist library after the CDN traffic is blocked.

A third aspect of the present disclosure provides a method of training a classification model, including: clustering historically collected CDN traffic data based on an unsupervised clustering algorithm, where the clustering result includes two classes, normal access and malicious attack; performing feature extraction on the historically collected CDN traffic data to obtain feature values; and using the feature values as the input of the classification model, and training the classification model based on the output of the classification model and the clustering result.

A fourth aspect of the present disclosure provides an apparatus for training a classification model, including:

a traffic label classification module, configured to cluster historically collected CDN traffic data based on an unsupervised clustering algorithm, where the clustering result includes two classes, normal access and malicious attack;

a second feature extraction module, configured to perform feature extraction on the historically collected CDN traffic data to obtain feature values;

and a model training module, configured to use the feature values as the input of the classification model and to train the classification model based on the output of the classification model and the clustering result.

A fifth aspect of the present disclosure provides an electronic device, including: one or more processors; and a memory for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method for identifying a masquerading attack based on CDN traffic described above.

A sixth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method for identifying a masquerading attack based on CDN traffic described above.

A seventh aspect of the present disclosure also provides a computer program product including a computer program that, when executed by a processor, implements the method for identifying a masquerading attack based on CDN traffic described above.

According to the method for identifying a masquerading attack based on CDN traffic provided by the present disclosure, CDN traffic data at a network node is collected in real time, the CDN traffic data is grouped, and DNS traffic data is obtained from the grouping result; the DNS traffic data is preprocessed to obtain feature information, and a feature value is obtained based on the feature information; the feature value is input into a classification model, which outputs whether a malicious attack exists in the CDN traffic data; and, if the output indicates that the CDN traffic data contains a malicious attack, the CDN traffic is blocked. In the embodiments of the disclosure, features are extracted from CDN traffic that hides the real IP, the extracted feature values are input into a classification model built in advance through machine learning, and the classification model is used to quickly identify masquerading attack behavior in the CDN traffic, so that malicious attacks are discovered proactively.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of a method, apparatus, device, medium, and program product for identifying CDN traffic based masquerading attacks in accordance with embodiments of the present disclosure;

FIG. 2 schematically illustrates a flow diagram of a method of identifying a CDN traffic based masquerading attack according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow diagram of another embodiment of a method of identifying a CDN traffic based masquerading attack according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow diagram of another embodiment of a method of identifying a CDN traffic based masquerading attack according to an embodiment of the present disclosure;

FIG. 5 schematically shows a block diagram of an apparatus for identifying a CDN traffic based masquerading attack according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a block diagram of another embodiment of an apparatus for identifying CDN traffic-based masquerading attacks, according to an embodiment of the present disclosure;

FIG. 7 schematically illustrates a flow diagram of a method of training a classification model according to an embodiment of the present disclosure;

FIG. 8 is a block diagram schematically illustrating an apparatus for training a classification model according to an embodiment of the present disclosure;

FIG. 9 schematically illustrates a block diagram of a computer system suitable for implementing a method of identifying CDN traffic based masquerading attacks in accordance with an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

The background section may include technical problems other than those specifically addressed by the present disclosure.

An embodiment of the disclosure provides a method for identifying a masquerading attack based on CDN traffic, which includes: collecting CDN traffic data at a network node in real time, grouping the CDN traffic data, and obtaining DNS traffic data from the grouping result; preprocessing the DNS traffic data to obtain feature information, and obtaining a feature value based on the feature information; inputting the feature value into a classification model, which outputs whether a malicious attack exists in the CDN traffic data; and, if the output indicates that the CDN traffic data contains a malicious attack, blocking the CDN traffic.

It should be noted that the method for identifying a masquerading attack based on CDN traffic provided in the embodiments of the present disclosure may be used in aspects of big data and distributed technology related to microservice data transmission, and may also be used in fields other than big data and distributed technology, such as the financial field.

It should be noted that CDN in the embodiments of the present disclosure stands for Content Delivery Network. A CDN is an intelligent virtual network built on top of the existing network. Relying on edge servers deployed in various locations and on the load balancing, content distribution, scheduling, and other functional modules of a central platform, it lets users obtain the content they need from a nearby node, which reduces network congestion and improves access response speed and hit rate. In one scenario, based on a content delivery network, a CNAME (a type of DNS resolution record that allows multiple names to be mapped to the same computer; for example, the CNAME for www.baidu.com is www.a.shifen.com) is configured to implement domain fronting, which hides the real IP and evades the intrusion detection system, thereby enabling a network attack on a target user.

Fig. 1 schematically shows an application scenario diagram of a method for identifying a CDN traffic-based masquerading attack according to an embodiment of the present disclosure.

As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (for example only) that performs feature extraction on the network traffic sent to the terminal devices 101, 102, and 103 and inputs the result of the feature extraction into a classification model to identify whether a malicious attack exists in the network traffic. The background management server may produce a processing result for the network traffic and feed the processing result (e.g., a webpage, information, or data obtained or generated according to a user request) back to the terminal device.

It should be noted that the method for identifying a masquerading attack based on CDN traffic provided by the embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the apparatus for identifying a masquerading attack based on CDN traffic provided by the embodiments of the present disclosure may generally be disposed in the server 105. The method may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the apparatus may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The method for identifying a masquerading attack based on CDN traffic according to the embodiments of the present disclosure will be described in detail below with reference to fig. 2 to 4, based on the scenario described in fig. 1.

Fig. 2 schematically shows a flowchart of a method for identifying a masquerading attack based on CDN traffic according to an embodiment of the present disclosure.

As shown in fig. 2, the method for identifying a spoofing attack based on CDN traffic of the embodiment includes operations S210 to S240.

In operation S210, CDN traffic data at a network node is collected in real time, the CDN traffic data is grouped, and DNS traffic data is obtained from the grouping result.

In the embodiment of the disclosure, collecting CDN traffic data at the network node in real time includes mirroring the CDN traffic at the network node. Specifically, when an ordinary network user accesses a website served by the CDN, the DNS of the global load balancing system directs the user to a nearby node, which acts like a web server placed close to the user. When the user sends an access request, the CDN traffic generated by the response at the node is captured by port mirroring, or the CDN traffic data is captured with an optical splitter. It should be noted that accelerating traffic through a CDN generates data usage, and the data usage within a given period is tallied when that period ends.

It can be understood that the obtained CDN traffic includes a large amount of everyday network traffic, such as accessed content. To find the real IP hidden in the CDN traffic, the DNS resolution traffic within the CDN traffic must be obtained and analyzed. Accordingly, in the embodiment of the present disclosure, the CDN traffic is grouped according to the application-layer DNS protocol, which divides it into DNS resolution traffic and other everyday network traffic.
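The grouping step can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes packets have already been decoded into dictionaries with hypothetical `transport`, `src_port`, and `dst_port` fields, and it approximates "application-layer DNS protocol" by the well-known DNS port 53, so DNS carried over other transports (for example DNS over HTTPS) would not be matched.

```python
from typing import Any, Dict, List, Tuple

DNS_PORT = 53  # well-known port for DNS over UDP and TCP

Packet = Dict[str, Any]  # hypothetical decoded-packet record


def is_dns(packet: Packet) -> bool:
    """Treat a packet as DNS if either endpoint uses port 53."""
    return packet.get("transport") in ("udp", "tcp") and DNS_PORT in (
        packet.get("src_port"),
        packet.get("dst_port"),
    )


def split_traffic(packets: List[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Group mirrored traffic into DNS resolution traffic and everything else."""
    dns: List[Packet] = []
    other: List[Packet] = []
    for packet in packets:
        (dns if is_dns(packet) else other).append(packet)
    return dns, other


sample = [
    {"transport": "udp", "src_port": 40000, "dst_port": 53},
    {"transport": "tcp", "src_port": 40001, "dst_port": 443},
]
dns_packets, other_packets = split_traffic(sample)
```

The non-DNS group is kept rather than discarded because the later feature extraction step looks up access behavior in the remaining traffic.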

In operation S220, the DNS traffic data is preprocessed to obtain feature information, and a feature value is obtained based on the feature information.

The preprocessing in the embodiment of the disclosure expands the domain name information of the CDN traffic data based on open source intelligence and collected historical CDN traffic data. That is, the domain name information in the current CDN traffic is expanded by using the domain name information in the historically collected CDN traffic data as a reference and combining it with publicly available open source intelligence. Specifically, the related information in the open source intelligence and the historical data can be retrieved by querying for similar fields. The expanded domain name information includes a whitelist mark, domain name record information, CDN attribution, resolved IP, and number of resolutions.
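As a rough sketch of this expansion, the record below carries the five expanded fields named above. The `DomainInfo` type, the `expand_domain` function, and the shapes of the `intel` and `history` lookups are all assumptions made for illustration; the disclosure does not specify data structures or intelligence sources.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set


@dataclass
class DomainInfo:
    domain: str
    whitelisted: bool            # whitelist mark
    records: List[str]           # domain name record information
    cdn_provider: Optional[str]  # CDN attribution, None if unknown
    resolved_ips: List[str]      # resolved IPs seen so far
    resolution_count: int        # number of resolutions observed


def expand_domain(
    domain: str,
    resolved_ip: str,
    whitelist: Set[str],
    intel: Dict[str, Dict[str, Any]],  # hypothetical open source intelligence lookup
    history: Dict[str, List[str]],     # hypothetical past resolutions per domain
) -> DomainInfo:
    """Enrich one observed DNS resolution with intelligence and history."""
    known = intel.get(domain, {})
    past_ips = history.get(domain, [])
    return DomainInfo(
        domain=domain,
        whitelisted=domain in whitelist,
        records=list(known.get("records", [])),
        cdn_provider=known.get("cdn"),
        resolved_ips=sorted(set(past_ips) | {resolved_ip}),
        resolution_count=len(past_ips) + 1,  # history plus the current resolution
    )


info = expand_domain(
    "cdn.example.com",
    "203.0.113.7",
    whitelist={"www.example.org"},
    intel={"cdn.example.com": {"records": ["CNAME edge.example.net"], "cdn": "ExampleCDN"}},
    history={"cdn.example.com": ["203.0.113.5", "203.0.113.5"]},
)
```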

In the embodiment of the present disclosure, based on the expanded domain name information, the access behavior features of the DNS traffic data are determined in the CDN traffic data, that is, in the other network traffic data within the CDN traffic data. The access behavior features include access frequency, access time, access duration, source IP, destination IP, banner information, and certificate information. The feature value is obtained by computing statistics over the access behavior.
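The disclosure does not state which statistics are computed. The sketch below assumes three simple ones (access rate, mean access duration, and number of distinct destination IPs) over hypothetical access records with `duration` and `dst_ip` fields, purely to show how access behavior can become a numeric feature vector.

```python
from statistics import mean
from typing import Any, Dict, List


def feature_vector(accesses: List[Dict[str, Any]], window_seconds: float) -> List[float]:
    """Turn raw access records from one observation window into feature values."""
    if not accesses or window_seconds <= 0:
        return [0.0, 0.0, 0.0]
    frequency = len(accesses) / window_seconds               # accesses per second
    mean_duration = mean(a["duration"] for a in accesses)    # seconds
    distinct_dst = float(len({a["dst_ip"] for a in accesses}))
    return [frequency, mean_duration, distinct_dst]


accesses = [
    {"duration": 1.0, "dst_ip": "203.0.113.5"},
    {"duration": 2.0, "dst_ip": "203.0.113.5"},
    {"duration": 3.0, "dst_ip": "203.0.113.7"},
    {"duration": 2.0, "dst_ip": "203.0.113.7"},
]
features = feature_vector(accesses, window_seconds=60.0)
```

Categorical features such as banner or certificate information would need an encoding step (for example a match flag against known values) before they can join such a vector; that step is left out here.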

In operation S230, the feature value is input into a classification model, and whether malicious attack exists in the CDN traffic data is output.

The classification model in the embodiment of the disclosure is trained with a machine learning method. Inputting the feature values into the classification model classifies the CDN traffic into one of two classes: normal access or malicious attack.
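The disclosure does not name a model family. As one possible instance of a binary classifier, the sketch below assumes a logistic model with hypothetical, already-trained `weights` and `bias`, and a decision threshold of 0.5.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def is_malicious(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Return True if the model assigns the traffic to the malicious class."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score) >= threshold


# Hypothetical parameters chosen only to make the example concrete.
example_weights = [1.0, -1.0]
example_bias = 0.0
```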

In operation S240, if the output indicates that the CDN traffic data contains a malicious attack, the CDN traffic is blocked and the user is prevented from accessing it.

It can be understood that a prompt indicating that the page is inaccessible may also pop up on the user's terminal device, or alarm information may be displayed on the inaccessible page.

It can be understood that, to ensure detection accuracy, the same CDN traffic data may also be checked several times at intervals. If more than half of the detection results from the different time periods are malicious attack, the CDN traffic is determined to contain a malicious attack.
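The "more than half" rule can be written directly. In this small sketch each detection result is assumed to be a boolean, with `True` meaning malicious attack:

```python
from typing import Sequence


def majority_malicious(results: Sequence[bool]) -> bool:
    """Return True when strictly more than half of the results are malicious."""
    return sum(results) * 2 > len(results)
```

With this rule an even split, such as one malicious result out of two, does not count as malicious.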

Fig. 3 schematically illustrates a flowchart of another embodiment of a method of identifying a CDN traffic-based masquerading attack, according to an embodiment of the present disclosure. Embodiments of the present disclosure are made on the basis of the embodiment illustrated in fig. 2.

As shown in fig. 3, the method for identifying a masquerading attack based on CDN traffic of this embodiment further includes operations S250 to S260.

In operation S250, a blacklist library is created.

The blacklist library in the embodiment of the disclosure may be a small database, and the IP information with malicious attack behavior obtained from the open source information and the IP information with malicious attack behavior detected from the historical traffic data may be stored in the blacklist library.

In operation S260, if the output indicates that the CDN traffic data contains a malicious attack, the CDN traffic is blocked and the destination IP of the CDN traffic data is then recorded in the blacklist library, so that CDN traffic with that IP is blocked directly when the IP appears again. Specifically, as shown in operation S270 in fig. 4, the destination IP information in the feature information is matched against the IP information in the blacklist library, and if the destination IP information in the feature information exists in the blacklist library, the CDN traffic is blocked.
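The interplay of operations S250 to S270 can be sketched as a small gate placed in front of the classifier. The `BlacklistGate` class, the in-memory set, and the returned status strings are illustrative assumptions; the disclosure only says the blacklist library may be a small database.

```python
from typing import Callable, Iterable, Sequence, Set


class BlacklistGate:
    """Check the blacklist first; fall back to the classifier and learn from it."""

    def __init__(self, seed_ips: Iterable[str] = ()) -> None:
        # S250: create the blacklist library, optionally seeded with known bad IPs.
        self.blacklist: Set[str] = set(seed_ips)

    def handle(
        self,
        dst_ip: str,
        features: Sequence[float],
        classify: Callable[[Sequence[float]], bool],
    ) -> str:
        # S270: a blacklisted destination IP is blocked without running the model.
        if dst_ip in self.blacklist:
            return "blocked (blacklist)"
        # S260: block on a malicious verdict, then record the destination IP.
        if classify(features):
            self.blacklist.add(dst_ip)
            return "blocked (model)"
        return "allowed"
```

Checking the blacklist before classification means a repeat offender costs only a set lookup rather than a model evaluation.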

Based on the method for identifying the disguise attack based on the CDN flow, the disclosure also provides a device for identifying the disguise attack based on the CDN flow. The apparatus will be described in detail below with reference to fig. 5.

Fig. 5 schematically shows a block diagram of an apparatus for identifying a spoofing attack based on CDN traffic according to an embodiment of the present disclosure.

As shown in fig. 5, the apparatus 300 for identifying a masquerading attack based on CDN traffic of this embodiment includes a traffic obtaining module 301, a first feature extraction module 302, a traffic determination module 303, and a blocking module 304.

A traffic obtaining module 301, configured to collect CDN traffic data at a network node in real time, group the CDN traffic data, and obtain DNS traffic data from the grouping result, and adapted to perform operation S210 described above;

a first feature extraction module 302, configured to preprocess the DNS traffic data to obtain feature information and to obtain a feature value based on the feature information, and adapted to perform operation S220 described above;

a traffic determination module 303, configured to input the feature value into a classification model and output whether a malicious attack exists in the CDN traffic data, and adapted to perform operation S230 described above;

a blocking module 304, configured to block the CDN traffic if the output indicates that the CDN traffic data contains a malicious attack, and adapted to perform operation S240 described above.
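The division into modules 301 to 304 can be pictured as four pluggable callables wired in sequence. The `MasqueradeDetector` name and the callable signatures below are assumptions for illustration only; the disclosure leaves the module interfaces open.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class MasqueradeDetector:
    acquire: Callable[[], List[Dict[str, Any]]]                   # module 301 (S210)
    extract: Callable[[List[Dict[str, Any]]], Sequence[float]]    # module 302 (S220)
    judge: Callable[[Sequence[float]], bool]                      # module 303 (S230)
    block: Callable[[], None]                                     # module 304 (S240)

    def run_once(self) -> bool:
        """Run one pass of the pipeline; return True if traffic was blocked."""
        dns_traffic = self.acquire()
        features = self.extract(dns_traffic)
        malicious = self.judge(features)
        if malicious:
            self.block()
        return malicious
```

Keeping each stage behind a callable mirrors the statement that modules may be combined or split: any stage can be replaced without touching the others.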

As shown in fig. 6, an embodiment according to the present disclosure further includes: a blacklist module 305, configured to create a blacklist library and, if the output indicates that the CDN traffic data contains a malicious attack, to record the destination IP of the CDN traffic data in the blacklist library after the CDN traffic is blocked.

In the prior art, attack behavior and malicious IPs are identified mainly by deploying an intrusion detection system (IDS). Intrusion detection systems identify malicious IPs primarily on the basis of threat intelligence. Whether traffic constitutes an attack is judged by manually inspecting traffic packets, and whether a host is remotely controlled by a Trojan is judged by tracing; once confirmed, the IP is added to the blacklist of the threat intelligence library so that the intrusion detection system can identify it the next time.

The method for identifying a masquerading attack based on CDN traffic collects CDN traffic data at a network node in real time, groups the CDN traffic data, and obtains DNS traffic data from the grouping result; preprocesses the DNS traffic data to obtain feature information and obtains a feature value based on the feature information; inputs the feature value into a classification model, which outputs whether a malicious attack exists in the CDN traffic data; and, if the output indicates that the CDN traffic data contains a malicious attack, blocks the CDN traffic. In the embodiments of the disclosure, features are extracted from CDN traffic that hides the real IP, the extracted feature values are input into a classification model built in advance through machine learning, and the classification model is used to quickly identify masquerading attack behavior in the CDN traffic, so that malicious attacks are discovered proactively.

Compared with the prior art, the method of the present disclosure uses machine learning to identify malicious attacks in CDN traffic; its accuracy can be improved continuously, and the time and labor consumed by manually inspecting traffic packets are avoided.

According to an embodiment of the present disclosure, any number of the traffic obtaining module 301, the first feature extraction module 302, the traffic determination module 303, and the blocking module 304 may be combined into a single module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the traffic obtaining module 301, the first feature extraction module 302, the traffic determination module 303, and the blocking module 304 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable manner of integrating or packaging a circuit, or in any one of software, hardware, and firmware, or in any suitable combination of them. Alternatively, at least one of the traffic obtaining module 301, the first feature extraction module 302, the traffic determination module 303, and the blocking module 304 may be implemented at least partially as a computer program module that performs the corresponding function when executed.

FIG. 7 schematically shows a flow diagram of a method of training a classification model according to an embodiment of the present disclosure.

As shown in fig. 7, the method of training the classification model of this embodiment includes operations S280 to S2100.

In operation S280, historically collected CDN traffic data is clustered based on an unsupervised clustering algorithm, and the clustering result includes two classes: normal access and malicious attack.

In the embodiment of the disclosure, the k-means algorithm, an unsupervised algorithm, may be used to cluster a plurality of historically collected CDN traffic data packets, thereby labeling the historically collected CDN traffic data.
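A minimal k-means sketch with two clusters follows (Lloyd's algorithm, standard library only). The deterministic initialisation, which uses the first point and the point farthest from it, is an assumption made so the example is reproducible; the disclosure does not describe initialisation or how a cluster is mapped to "normal access" or "malicious attack".

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_clusters(points: List[Point], max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm with k=2; returns a cluster index (0 or 1) per point."""
    # Assumed initialisation: first point, and the point farthest from it.
    farthest = max(points, key=lambda p: squared_distance(p, points[0]))
    centroids = [list(points[0]), list(farthest)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            0 if squared_distance(p, centroids[0]) <= squared_distance(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


history = [[0.1, 1.0], [0.2, 1.1], [5.0, 9.0], [5.2, 9.5]]
cluster_labels = kmeans_two_clusters(history)
```

The returned indices are arbitrary; deciding which cluster represents malicious attack still needs a rule outside this sketch, for example inspecting a few samples from each cluster.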

In operation S290, feature extraction is performed on the historically collected CDN traffic data to obtain feature values.

In the embodiment of the disclosure, obtaining the feature value includes preprocessing the DNS traffic data to obtain feature information and obtaining the feature value based on the feature information. The preprocessing expands the domain name information of the CDN traffic data based on open source intelligence and other collected historical CDN traffic data. That is, the domain name information in the current CDN traffic is expanded by using the domain name information in the other historically collected CDN traffic data as a reference and combining it with publicly available open source intelligence. Specifically, the related information in the open source intelligence and the historical data can be retrieved by querying for similar fields. The expanded domain name information includes a whitelist mark, domain name record information, CDN attribution, resolved IP, and number of resolutions. Based on the expanded domain name information, the access behavior features of the DNS traffic data are determined in the CDN traffic data, that is, in the other network traffic data within the CDN traffic data. The access behavior features include access frequency, access time, access duration, source IP, destination IP, banner information, and certificate information. The feature value is obtained by computing statistics over the access behavior.

In operation S2100, the feature values are used as inputs of the classification model, and the classification model is trained based on an output of the classification model and a result of the clustering.

In the embodiment of the present disclosure, the output result of the classification model is compared with the identification result in operation S280, and whether the convergence condition of the loss function is met is determined according to the comparison result, and when the convergence condition of the loss function is met, the classification model training is completed.
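A minimal sketch of this training procedure is given below. The disclosure does not name a model type or a convergence test, so the use of logistic regression with a cross-entropy loss, the stopping rule (change in loss below `tol`), and the names `train_classifier`, `predict`, `lr`, `tol`, and `max_epochs` are assumptions made for illustration. The cluster labels from operation S280 (0 for normal access, 1 for malicious attack) play the role of the reference result.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(
    features: List[Sequence[float]],
    cluster_labels: List[int],
    lr: float = 0.5,
    tol: float = 1e-6,
    max_epochs: int = 10000,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression classifier to the clustering result."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    previous_loss = float("inf")
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(features, cluster_labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Compare the model output with the cluster label (cross-entropy).
            p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
            loss -= y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe)
            error = p - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        loss /= n
        # Assumed convergence condition: the loss has stopped changing.
        if abs(previous_loss - loss) < tol:
            break
        previous_loss = loss
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: List[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (malicious attack) or 0 (normal access)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0
```

Any other classifier and loss function could be substituted; the point of the sketch is only that the output of the model is compared with the clustering result until the loss meets a convergence condition.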

Fig. 8 schematically shows a block diagram of an apparatus for training a classification model according to an embodiment of the present disclosure.

As shown in fig. 8, the apparatus 500 for training a classification model of this embodiment includes a traffic label classification module 501, a second feature extraction module 502, and a model training module 503.

A traffic label classification module 501, configured to cluster historically acquired CDN traffic data based on an unsupervised clustering algorithm, where a result of the clustering includes two types, namely normal access and malicious attack, and is adapted to execute step S280 in the foregoing;

a second feature extraction module 502, configured to perform feature extraction on historically acquired CDN flow data and obtain a feature value, and is adapted to perform step S290 in the foregoing;

a model training module 503, configured to use the feature value as an input of the classification model, train the classification model based on the output of the classification model and the clustering result, and is adapted to perform step S2100 in the foregoing.

According to the method for training the classification model, historically collected CDN flow data are clustered based on an unsupervised clustering algorithm, and the clustering result comprises two types of normal access and malicious attack; carrying out feature extraction on historically acquired CDN flow data and acquiring a feature value; and taking the characteristic value as the input of the classification model, and training the classification model based on the output of the classification model and the clustering result. In the embodiment of the disclosure, a model capable of automatically identifying whether malicious attacks are hidden in CDN flow can be trained through a machine learning method.

According to the embodiment of the present disclosure, any number of the traffic label classification module 501, the second feature extraction module 502, and the model training module 503 may be combined and implemented in one module, or any one of the modules may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the traffic label classification module 501, the second feature extraction module 502, and the model training module 503 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented by any one of three implementations of software, hardware, and firmware, or any suitable combination of any of these. Alternatively, at least one of the traffic label classification module 501, the second feature extraction module 502, and the model training module 503 may be at least partially implemented as a computer program module, which when executed, may perform corresponding functions.

Fig. 9 schematically illustrates a block diagram of an electronic device suitable for implementing a method for identifying a disguised attack based on CDN traffic in accordance with an embodiment of the present disclosure.

As shown in fig. 9, an electronic device 400 according to an embodiment of the present disclosure includes a processor 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. Processor 401 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 401 may also include onboard memory for caching purposes. Processor 401 may include a single processing unit or multiple processing units for performing the different actions of the method flows in accordance with embodiments of the present disclosure.

In the RAM 403, various programs and data necessary for the operation of the electronic apparatus 400 are stored. The processor 401, ROM 402 and RAM 403 are connected to each other by a bus 404. The processor 401 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 402 and/or the RAM 403. Note that the programs may also be stored in one or more memories other than the ROM 402 and RAM 403. The processor 401 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, electronic device 400 may also include an input/output (I/O) interface 405, input/output (I/O) interface 405 also being connected to bus 404. Electronic device 400 may also include one or more of the following components connected to I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include ROM 402 and/or RAM 403 and/or one or more memories other than ROM 402 and RAM 403 described above.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to implement the method provided by the embodiments of the disclosure.

The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 401. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal on a network medium, downloaded and installed through the communication section 409, and/or installed from the removable medium 411. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409 and/or installed from the removable medium 411. The computer program, when executed by the processor 401, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming language includes, but is not limited to, programming languages such as Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or integrated in various ways, even if such combinations or integrations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or integrated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or integrations are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A method for identifying a disguised attack based on CDN traffic is characterized by comprising the following steps:

collecting CDN flow data at a network node in real time, executing grouping processing on the CDN flow data and obtaining DNS flow data from a grouping processing result;

preprocessing the DNS traffic data to obtain characteristic information and obtain a characteristic value based on the characteristic information;

inputting the characteristic value into a classification model, and outputting whether malicious attack exists in the CDN flow data;

and if the output indicates that the CDN flow data has malicious attacks, blocking the CDN flow.

2. The method of claim 1, wherein the preprocessing the DNS traffic data to obtain feature information comprises:

expanding domain name information of CDN flow data based on open source information and collected historical CDN flow data;

and determining the access behavior characteristics of the DNS traffic data in the CDN traffic data based on the expanded domain name information.

3. The method of identifying a disguised attack based on CDN traffic as recited in claim 2, wherein the expanded domain name information includes a white list flag, domain name filing information, CDN attribution, resolution IP, and resolution times.

4. The method of identifying CDN traffic-based masquerading attacks as recited in claim 2, wherein the access behavior characteristics include access frequency, access time, access duration, source IP, destination IP, banner information, and certificate information.

5. The method of identifying a CDN traffic-based masquerading attack as recited in claim 1, further comprising:

creating a blacklist library;

and if the output indicates that the CDN flow data has malicious attacks, recording the target IP of the CDN flow data into the blacklist library after the CDN flow is blocked.

6. The method of identifying CDN traffic-based masquerading attacks according to claim 5, further comprising, prior to said entering said feature values into a classification model:

and matching the target IP information in the characteristic information with the target IP information in the blacklist library, and blocking the CDN flow if the IP information in the characteristic information exists in the blacklist library.

7. The method of claim 1, wherein the collecting CDN traffic data at a network node in real-time comprises:

port mirror image collection is adopted for CDN flow at the network node;

or collecting CDN flow at the network node by adopting an optical splitter.

8. The method of claim 1, wherein the performing packet processing on the CDN traffic data and obtaining DNS traffic data from a result of the packet processing comprises:

and acquiring the DNS flow data from the CDN flow according to a DNS protocol of an application layer.

9. An apparatus for identifying a disguised attack based on CDN traffic, characterized by comprising:

a flow acquisition module, used for acquiring CDN flow data at a network node in real time, executing grouping processing on the CDN flow data and acquiring DNS flow data from a grouping processing result;

the first feature extraction module is used for preprocessing the DNS traffic data to acquire feature information and acquiring a feature value based on the feature information;

the flow judgment module is used for inputting the characteristic value into a classification model and outputting whether malicious attack exists in the CDN flow data;

and the blocking module is used for blocking the CDN flow if the output indicates that the CDN flow data has malicious attacks.

10. The apparatus for identifying a disguised attack based on CDN traffic as recited in claim 9, further comprising:

and the blacklist module is used for creating a blacklist library, and recording a destination IP (Internet protocol) of the CDN traffic data into the blacklist library after the CDN traffic is blocked if the output indicates that the CDN traffic data has malicious attacks.

11. A method of training a classification model, comprising:

clustering historically acquired CDN flow data based on an unsupervised clustering algorithm, wherein the clustering result comprises two types of normal access and malicious attack;

carrying out feature extraction on historically acquired CDN flow data and acquiring a feature value;

and taking the characteristic value as the input of the classification model, and training the classification model based on the output of the classification model and the clustering result.

12. An apparatus for training a classification model, comprising:

13. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.

14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 8.

15. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 8.