CN113901976A - Malicious traffic identification method and device and electronic equipment - Google Patents


Info

Publication number
CN113901976A
Authority
CN
China
Prior art keywords
metadata
flow
traffic
image
electrocardiogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010574518.1A
Other languages
Chinese (zh)
Inventor
宋冰晶
刘军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guancheng Technology Co ltd
Original Assignee
Beijing Guancheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guancheng Technology Co ltd
Priority to CN202010574518.1A
Publication of CN113901976A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Abstract

The invention provides a method and a device for identifying malicious traffic, and electronic equipment. The method comprises the following steps: acquiring target traffic to be processed, and determining attribute parameters of each piece of metadata in the target traffic; generating image elements according to the attribute parameters of the metadata, and arranging the image elements of the target traffic in sequence to generate a single-flow electrocardiogram corresponding to the target traffic; and classifying the single-flow electrocardiogram with an image classification model to determine whether the target traffic is malicious traffic. With the method, the device and the electronic equipment provided by the embodiments of the invention, the target traffic is converted into a single-flow electrocardiogram. Because the single-flow electrocardiogram represents the behavior characteristics of communication activity in the time dimension, abnormal behavior is easier to extract during image classification, so malicious traffic can be determined more accurately and the accuracy of traffic identification is improved. The method can also be executed when no certificate or domain name is available, and is therefore widely applicable.

Description

Malicious traffic identification method and device and electronic equipment
Technical Field
The invention relates to the technical field of traffic identification, in particular to a method and a device for identifying malicious traffic, electronic equipment and a computer readable storage medium.
Background
Malicious traffic refers to computer network traffic that an attacker constructs specifically to attack a particular target; it is typically generated by a malicious program and propagated through the network. Accurately and promptly identifying malicious traffic and taking emergency response measures is an important task in network security. Some research results already exist for identifying encrypted malicious traffic. For example, several supervised machine learning models are used to detect malicious encrypted traffic from the aspects of certificates, handshakes, domain names, and the like, and the prediction scores of the models are combined by weighted average to determine whether the encrypted traffic is malicious.
In the process of implementing the invention, the inventor found that the existing scheme has at least the following problem: when no domain name or certificate is available, some of the above models cannot be applied, and the detection effect cannot be guaranteed.
Disclosure of Invention
In order to solve the existing technical problem, embodiments of the present invention provide a method and an apparatus for identifying malicious traffic, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present invention provides a method for identifying malicious traffic, including:
acquiring target traffic to be processed, and determining attribute parameters of each piece of metadata in the target traffic, wherein the attribute parameters comprise at least two of message length, message type, message sequence and interaction direction;
generating image elements corresponding to the metadata according to the attribute parameters of the metadata, and sequentially arranging a plurality of image elements of the target traffic to generate a single-flow electrocardiogram corresponding to the target traffic;
and classifying the single-flow electrocardiogram according to a preset image classification model to determine whether the target traffic is malicious traffic.
In a second aspect, an embodiment of the present invention further provides an apparatus for identifying malicious traffic, including:
a preprocessing module, configured to acquire target traffic to be processed and determine attribute parameters of each piece of metadata in the target traffic, wherein the attribute parameters comprise at least two of message length, message type, message sequence and interaction direction;
an image generation module, configured to generate image elements corresponding to the metadata according to the attribute parameters of the metadata, and to sequentially arrange a plurality of image elements of the target traffic to generate a single-flow electrocardiogram corresponding to the target traffic;
and a classification module, configured to classify the single-flow electrocardiogram according to a preset image classification model and determine whether the target traffic is malicious traffic.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a bus, a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor, where the transceiver, the memory, and the processor are connected via the bus, and when the computer program is executed by the processor, the steps of the method for identifying malicious traffic described in any one of the above are implemented.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the method for identifying malicious traffic described in any one of the above.
According to the method and the device for identifying malicious traffic, the electronic equipment and the computer-readable storage medium provided by the embodiments of the invention, the attribute parameters of the metadata in the target traffic are extracted, the target traffic is converted into a single-flow electrocardiogram containing a plurality of image elements based on the attribute parameters, and the single-flow electrocardiogram is then classified by an image classification technique, so that whether the target traffic is malicious traffic can be determined. In the embodiment, the target traffic can be converted into a single-flow electrocardiogram based on the attribute parameters of the metadata, so that image construction is realized. Because the single-flow electrocardiogram represents the behavior characteristics of communication activity in the time dimension, abnormal behavior is easier to extract during image classification, so malicious traffic can be determined more accurately and the accuracy of traffic identification is improved. The method can also be executed when no certificate or domain name is available, and is therefore widely applicable.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present invention, the drawings required to be used in the embodiments or the background art of the present invention will be described below.
Fig. 1 is a flowchart illustrating a method for identifying malicious traffic according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a single-flow electrocardiogram in the method for identifying malicious traffic according to the embodiment of the present invention;
fig. 3 is a schematic structural diagram illustrating an apparatus for identifying malicious traffic according to an embodiment of the present invention;
fig. 4 shows a schematic structural diagram of an electronic device for performing a method for identifying malicious traffic according to an embodiment of the present invention.
Detailed Description
In the description of the embodiments of the present invention, it should be apparent to those skilled in the art that the embodiments of the present invention can be embodied as methods, apparatuses, electronic devices, and computer-readable storage media. Thus, embodiments of the invention may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), a combination of hardware and software. Furthermore, in some embodiments, embodiments of the invention may also be embodied in the form of a computer program product in one or more computer-readable storage media having computer program code embodied in the medium.
The computer-readable storage media described above may be any combination of one or more computer-readable storage media. The computer-readable storage medium includes: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium include: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a Flash Memory, an optical fiber, a Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any combination thereof. In embodiments of the invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer program code embodied on the computer readable storage medium may be transmitted using any appropriate medium, including: wireless, wire, fiber optic cable, Radio Frequency (RF), or any suitable combination thereof.
Computer program code for carrying out operations for embodiments of the present invention may be written in assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, integrated circuit configuration data, or in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as C or a similar programming language. The computer program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer, or to an external computer, through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
The method, the device and the electronic equipment are described through the flow chart and/or the block diagram.
It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions. These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner. Thus, the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The embodiments of the present invention will be described below with reference to the drawings.
Fig. 1 shows a flowchart of a method for identifying malicious traffic according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step 101: acquiring target traffic to be processed, and determining attribute parameters of each piece of metadata in the target traffic, wherein the attribute parameters comprise at least two of message length, message type, message sequence and interaction direction.
In the embodiment of the present invention, the target traffic is the network traffic to be checked, that is, the traffic for which it must be determined whether it is malicious; the target traffic may specifically be TLS (Transport Layer Security) encrypted traffic. The target traffic is generated during the interaction between two parties (such as a client and a server), and a corresponding piece of metadata is generated each time either party sends data; that is, the metadata in this embodiment refers to a single piece of data generated during the interaction. Taking TLS encrypted traffic as an example, the interaction includes multiple messages such as Client Hello (client handshake message), Server Hello (server handshake message), Certificate, and Application Data, and each message corresponds to one piece of metadata.
Each piece of metadata has corresponding attribute parameters, which may specifically include message length, message type, message sequence, interaction direction, and the like. The message length is the length of the message corresponding to the metadata. The message type is the type to which that message belongs, such as Client Hello, Server Hello, or Certificate. The message sequence indicates the order of the message within the target traffic; for example, if the target traffic includes a Client Hello, a Server Hello, and a Certificate in that order, their message sequences may be 1, 2, and 3 respectively. The interaction direction is the uplink or downlink direction of the metadata: a message sent from the client to the server is in the uplink direction, and a message sent from the server to the client is in the downlink direction. In this embodiment, if the target traffic is TLS encrypted traffic, the above attribute parameters are recorded in the TLS record layer and can be obtained from it.
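As an illustration of step 101, the following is a minimal sketch of how the four attribute parameters might be read from TLS record headers. It is not the patented implementation: the names `Metadata` and `extract_metadata` are hypothetical, the input is assumed to be a list of already reassembled TLS records in capture order, each paired with its direction, and only a few message types are mapped. Note also that in TLS 1.3 the handshake messages sent after Server Hello are encrypted and appear on the wire as Application Data records, so the message type read this way is only the outer type.

```python
from dataclasses import dataclass
from typing import List, Tuple

# TLS record-layer content types.
CONTENT_TYPES = {20: "ChangeCipherSpec", 21: "Alert",
                 22: "Handshake", 23: "ApplicationData"}
# A few handshake message types (first byte of an unencrypted handshake record).
HANDSHAKE_TYPES = {1: "ClientHello", 2: "ServerHello", 11: "Certificate"}

UPLINK, DOWNLINK = +1, -1  # assumed encoding of the interaction direction


@dataclass
class Metadata:
    length: int      # message length: payload length from the record header
    msg_type: str    # message type
    sequence: int    # message sequence: 1-based position in the flow
    direction: int   # interaction direction: UPLINK or DOWNLINK


def extract_metadata(records: List[Tuple[int, bytes]]) -> List[Metadata]:
    """Turn (direction, raw TLS record) pairs into attribute parameters."""
    result = []
    for sequence, (direction, record) in enumerate(records, start=1):
        if len(record) < 5:
            raise ValueError("a TLS record header is 5 bytes long")
        content_type = record[0]
        # Bytes 1-2 hold the protocol version, bytes 3-4 the payload length.
        length = int.from_bytes(record[3:5], "big")
        msg_type = CONTENT_TYPES.get(content_type, "Unknown")
        if content_type == 22 and len(record) > 5:
            msg_type = HANDSHAKE_TYPES.get(record[5], msg_type)
        result.append(Metadata(length, msg_type, sequence, direction))
    return result
```

Here the message sequence is simply the capture order; a real capture pipeline would also need TCP reassembly, which is outside the scope of this sketch.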
Step 102: generating image elements corresponding to the metadata according to the attribute parameters of the metadata, and sequentially arranging a plurality of image elements of the target traffic to generate a single-flow electrocardiogram corresponding to the target traffic.
In the embodiment of the invention, a corresponding image element is generated from the attribute parameters of each piece of metadata. An image element is essentially an image, so metadata in text form is converted into image form, and the converted image element contains the features corresponding to the attribute parameters. The target traffic may include a plurality of pieces of metadata, that is, the target traffic may correspond to a plurality of image elements, and an image corresponding to the entire target traffic, namely a single-flow electrocardiogram, can then be generated from these image elements. When the attribute parameters include the message sequence, the image elements may be arranged in order of message sequence to generate the corresponding single-flow electrocardiogram.
In the embodiment of the present invention, since different image elements correspond to different attribute parameters, the image elements of different metadata generally differ, and a plurality of different image elements taken together show how the image elements change over the flow. In other words, the image corresponding to the target traffic represents the overall trend of its image elements, much like an electrocardiogram. A traditional electrocardiogram reflects the behavior characteristics of heart activity in the time dimension; similarly, the single-flow electrocardiogram in this embodiment reflects the behavior characteristics of communication activity in the time dimension, from which normal or malicious communication can be judged.
Step 103: classifying the single-flow electrocardiogram according to a preset image classification model to determine whether the target traffic is malicious traffic.
In the embodiment of the present invention, as described above, each image element includes the feature corresponding to the attribute parameter, so that the entire single-flow electrocardiogram can include the attribute parameter of each metadata in the target traffic, and then the single-flow electrocardiogram can be subjected to feature extraction and classification based on an image processing method, so as to determine whether the target traffic corresponding to the single-flow electrocardiogram is malicious traffic. Specifically, in the present embodiment, the single-flow electrocardiogram is input into the pre-trained image classification model for classification, and based on the output result of the image classification model, whether the single-flow electrocardiogram corresponds to malicious traffic can be determined, that is, whether the target traffic is normal traffic or malicious traffic can be determined.
The method for identifying malicious traffic provided by the embodiment of the invention extracts the attribute parameters of the metadata in the target traffic, converts the target traffic into a single-flow electrocardiogram containing a plurality of image elements based on the attribute parameters, and then classifies the single-flow electrocardiogram by an image classification technique, so as to determine whether the target traffic is malicious traffic. In the embodiment, the target traffic can be converted into a single-flow electrocardiogram based on the attribute parameters of the metadata, so that image construction is realized. Because the single-flow electrocardiogram represents the behavior characteristics of communication activity in the time dimension, abnormal behavior is easier to extract during image classification, so malicious traffic can be determined more accurately and the accuracy of traffic identification is improved. The method can also be executed when no certificate or domain name is available, and is therefore widely applicable.
On the basis of the above embodiment, the step 102 "generating an image element corresponding to metadata according to the attribute parameter of the metadata" includes:
step A1: and generating image elements corresponding to the metadata according to the attribute parameters of the metadata and a corresponding preset processing mode.
The preset processing mode may specifically include:
step A11: the size of the image element is determined according to the message length of the metadata.
Step A12: the color of the image element is determined according to the message type of the metadata.
Step A13: and determining the position of the image element according to the message sequence of the metadata.
Step A14: and determining the positive and negative directions of the image elements according to the interaction direction of the metadata.
In the embodiment of the present invention, after the attribute parameters of the metadata are determined, the parameters of the image element (such as color and size) can be determined by the preset processing manner in steps A11-A14 above, so as to generate the corresponding image element. In this embodiment, the size of the image element is determined by the message length of the metadata: the longer the message, the larger the size; the size may be an area, a length, or the like. Optionally, step A11 may specifically be: determining the amplitude of the image element according to the message length of the metadata. That is, the size may specifically be the amplitude of the image element. Because the image elements are arranged in sequence to form the single-flow electrocardiogram, they form a time axis, and the amplitude is the height of an image element in the direction perpendicular to the time axis. Fig. 2 is a schematic structural diagram of a single-flow electrocardiogram drawn as a histogram, in which the left-to-right direction is the time axis and each image element is one bar; the height of a bar is the size of the image element and represents the message length.
In this embodiment, different message types are represented by different colors; the "color" may be a shade of gray between black and white, or a color formed from three primary colors (such as RGB). Optionally, step A12 specifically includes: allocating a color to each message type in advance, such that the discrimination between every two colors is greater than a preset threshold; and, after the message type of the metadata is determined, taking the color allocated to that message type as the color of the image element. In this embodiment, the discrimination between two colors can be determined from the difference between their gray values or RGB values, and the function that calculates it can be implemented, for example, as a Python program. The higher the discrimination, the easier it is to tell different message types apart, so the features in the single-flow electrocardiogram can be extracted more accurately during subsequent image classification and the classification becomes more accurate. For example, Client Hello is represented in red, Server Hello in purple, Certificate in pink, and Application Data in black.
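The discrimination check of step A12 can be sketched as follows. This is only one possible reading: the source does not define the discrimination function, so the Euclidean distance in RGB space is an assumption, as are the concrete RGB values chosen for the four named colors and the helper names `discrimination` and `palette_is_distinct`.

```python
from itertools import combinations
from math import dist
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Assumed RGB values for the colors named in the text.
PALETTE: Dict[str, RGB] = {
    "ClientHello": (255, 0, 0),        # red
    "ServerHello": (128, 0, 128),      # purple
    "Certificate": (255, 192, 203),    # pink
    "ApplicationData": (0, 0, 0),      # black
}


def discrimination(a: RGB, b: RGB) -> float:
    """Assumed metric: Euclidean distance between two RGB colors."""
    return dist(a, b)


def palette_is_distinct(palette: Dict[str, RGB], threshold: float) -> bool:
    """True if every pair of colors differs by more than the threshold."""
    return all(discrimination(a, b) > threshold
               for a, b in combinations(palette.values(), 2))
```

A perceptual color space would arguably give a better measure of how distinguishable two colors are, but for a classifier that consumes raw pixel values the RGB distance is a reasonable first approximation.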
In the embodiment of the present invention, as described above, the message sequence of the metadata indicates the relative order of the metadata, so when a plurality of pieces of metadata are arranged, the position of each image element can be determined from the message sequence. In this embodiment, the "position of the image element" is specifically its position on the time axis. In addition, the interaction direction of the metadata indicates whether the data is uplink or downlink and therefore takes only two possible values, so in this embodiment the positive or negative direction of the image element is determined from the interaction direction; specifically, the "positive and negative directions of the image elements" refer to the side of the time axis on which an image element lies. As shown in fig. 2, the positions of the image elements are determined by the message sequence of the metadata, and from left to right they are: Client Hello, Server Hello, Certificate, …, Application Data, and the like. Merely as an example, in fig. 2 the area above the time axis represents uplink messages and the area below represents downlink messages, so the Client Hello and the Application Data sent by the client lie above the time axis, while the Server Hello, the Certificate, and the Application Data sent by the server lie below it. In summary, the position of each image element can be determined from the attribute parameters of the metadata, and the single-flow electrocardiogram shown in fig. 2 is finally formed.
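Steps A11-A14 taken together can be sketched as a small renderer that draws one bar per piece of metadata. This is a hedged illustration rather than the patented procedure: the image is represented as nested lists of RGB tuples to avoid any library dependency, and the function name `render_ecg`, the linear length scaling, the scaling ceiling `max_length`, the bar width, the white background and the RGB palette are all assumptions not taken from the source.

```python
from typing import List, NamedTuple, Tuple

RGB = Tuple[int, int, int]
WHITE: RGB = (255, 255, 255)
GRAY: RGB = (128, 128, 128)   # fallback for unmapped message types

PALETTE = {"ClientHello": (255, 0, 0), "ServerHello": (128, 0, 128),
           "Certificate": (255, 192, 203), "ApplicationData": (0, 0, 0)}


class Metadata(NamedTuple):
    length: int      # message length
    msg_type: str    # message type
    sequence: int    # message sequence
    direction: int   # +1 uplink (above the axis), -1 downlink (below)


def render_ecg(flow: List[Metadata], half_height: int = 32,
               bar_width: int = 2, max_length: int = 16384) -> List[List[RGB]]:
    """Draw one bar per metadata item; the time axis is the middle of the image."""
    width = bar_width * len(flow)
    image = [[WHITE] * width for _ in range(2 * half_height)]
    # A13: position on the time axis follows the message sequence.
    for slot, item in enumerate(sorted(flow, key=lambda m: m.sequence)):
        # A11: amplitude grows with message length (linear scaling assumed).
        ratio = min(item.length, max_length) / max_length
        amplitude = max(1, round(ratio * half_height))
        # A12: color follows the message type.
        color = PALETTE.get(item.msg_type, GRAY)
        # A14: uplink bars rise above the axis, downlink bars hang below it.
        if item.direction > 0:
            rows = range(half_height - amplitude, half_height)
        else:
            rows = range(half_height, half_height + amplitude)
        for y in rows:
            for x in range(slot * bar_width, (slot + 1) * bar_width):
                image[y][x] = color
    return image
```

With linear scaling, short handshake messages produce very low bars; a logarithmic scale might preserve more detail, but the source does not say which mapping is used.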
In addition, in the embodiment of the present invention, so that the generated single-flow electrocardiogram characterizes the target traffic reasonably accurately, the image element of each piece of metadata must be generated from at least two attribute parameters. In this embodiment, the message length is required, at least one of the message type and the message sequence is required, and the interaction direction is optional. That is, the attribute parameters may include the message length and the message type; or the message length and the message sequence; or the message length, the message type and the message sequence; and, in any of these cases, may further include the interaction direction. In this embodiment, a single-flow electrocardiogram can still be generated even if some attribute parameters are missing, but it may not fully represent the characteristics of the target traffic. Alternatively, the missing attribute parameters may be supplemented from the attribute parameters that are available, that is, default attribute parameters may be determined. For example, since normal traffic follows a pre-agreed protocol, the order of different pieces of metadata can be roughly determined from the message type: a "Server Hello" typically follows a "Client Hello", so once the message type is known, a default message sequence can be determined according to the protocol. Similarly, a "Client Hello" is generally an uplink message sent by the client to the server, so its interaction direction is generally uplink, and a default value can be derived from the message type even if the interaction direction of the metadata cannot be acquired.
It will be understood by those skilled in the art that the default attribute parameters determined above may not match the actual parameters, and that the more attribute parameters are available, the more faithfully the characteristics of the target traffic are represented. Therefore, where conditions allow, the attribute parameters should be as comprehensive as possible, that is, they should include all four items: message length, message type, message sequence and interaction direction.
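The default-filling described above can be sketched as follows. The lookup tables are assumptions based on the usual order of a TLS handshake (for instance, a Certificate is assumed to come from the server, although a client may also send one), and `PartialMetadata` and `fill_defaults` are hypothetical names. Values that were actually observed are never overwritten.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Assumed typical handshake order and directions (+1 uplink, -1 downlink).
HANDSHAKE_ORDER = ["ClientHello", "ServerHello", "Certificate"]
DEFAULT_DIRECTION = {"ClientHello": +1, "ServerHello": -1, "Certificate": -1}


@dataclass(frozen=True)
class PartialMetadata:
    length: int
    msg_type: str
    sequence: Optional[int] = None    # None means "could not be acquired"
    direction: Optional[int] = None


def fill_defaults(items: List[PartialMetadata]) -> List[PartialMetadata]:
    """Derive missing sequence/direction values from the message type."""
    filled = []
    for item in items:
        sequence = item.sequence
        if sequence is None and item.msg_type in HANDSHAKE_ORDER:
            sequence = HANDSHAKE_ORDER.index(item.msg_type) + 1
        direction = item.direction
        if direction is None:
            # Stays None when the type gives no hint (e.g. ApplicationData).
            direction = DEFAULT_DIRECTION.get(item.msg_type)
        filled.append(replace(item, sequence=sequence, direction=direction))
    return filled
```

Message types for which no default can be inferred keep `None`, which leaves the caller free to drop the attribute or the item rather than guess.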
On the basis of the above embodiment, before the step 103 "classify the single-flow electrocardiogram according to the preset image classification model", the method further includes a model training process, where the training process specifically includes:
step B1: and obtaining the sample flow and the classification label of the sample flow, and determining the attribute parameter of each piece of sample metadata in the sample flow.
Step B2: and generating a sample image element corresponding to the sample metadata according to the attribute parameters of the sample metadata, and generating a sample single-flow electrocardiogram corresponding to the sample flow according to the plurality of sample image elements of the sample flow.
In the embodiment of the present invention, the sample traffic is substantially similar to the target traffic, and both the sample traffic and the target traffic are network traffic, and the difference is that the sample traffic has a predetermined label, that is, a classification label, which can indicate whether the sample traffic is malicious traffic. Similarly, the sample traffic also includes a plurality of metadata, that is, sample metadata, and each sample metadata may also have attribute parameters such as message length, message type, message sequence, interaction direction, and the like.
Furthermore, after determining the attribute parameters of the sample metadata, the corresponding image elements, i.e. sample image elements, can be generated based on the same process as the above step 102, and then a corresponding single-flow electrocardiogram, i.e. sample single-flow electrocardiogram, can be generated.
Step B3: and taking the sample single-flow electrocardiogram as input, taking the corresponding classification label as output, training the image classification model to be trained, and generating the required image classification model after training.
In the embodiment of the invention, an image classification model to be trained is preset, a sample single-flow electrocardiogram is taken as input, and a corresponding classification label is taken as output to train the image classification model, so that the trained image classification model is generated; when the target traffic needs to be classified subsequently, as shown in step 103, a single-flow electrocardiogram of the target traffic is input to the image classification model, i.e., whether the target traffic is malicious traffic can be determined based on the output result of the model.
Specifically, the image classification model may be a CNN (Convolutional Neural Network) model, which performs feature extraction and classification through convolution layers, pooling layers, fully connected layers, and the like, and maps the learned features to the classification labels of the samples for classification prediction. The CNN model is a well-established technique and is not described in detail here.
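For readers less familiar with CNNs, the convolution, pooling, and fully connected stages can be sketched as a single forward pass. This is a minimal illustration with one convolution kernel and random, untrained weights; the layer sizes, the 8 × 8 input, and all function names are assumptions, not the network used in this embodiment.

```python
import math
import random


def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """2 x 2 max pooling with stride 2 (a trailing odd row or column is dropped)."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]


def fully_connected(features, weights, biases):
    """One dense layer: a weighted sum per output class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert class scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cnn_forward(image, kernel, weights, biases):
    """Convolution -> ReLU -> pooling -> flatten -> dense -> softmax."""
    pooled = max_pool2x2(relu(conv2d(image, kernel)))
    features = [v for row in pooled for v in row]
    return softmax(fully_connected(features, weights, biases))


rng = random.Random(0)
NUM_CLASSES = 4  # assumed number of classification labels
image = [[rng.random() for _ in range(8)] for _ in range(8)]
kernel = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
# 8x8 input -> 6x6 after the 3x3 convolution -> 3x3 after pooling -> 9 features
weights = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES
probabilities = cnn_forward(image, kernel, weights, biases)
```

A practical model would stack several such layers with many kernels and learn the weights from labelled sample single-flow electrocardiograms.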
Optionally, to make it easier for the image classification model to extract the features of the single-flow electrocardiogram, the single-flow electrocardiogram is constrained to a preset size in this embodiment. Specifically, the step 102 of "generating a single-flow electrocardiogram corresponding to the target traffic by sequentially arranging a plurality of image elements of the target traffic" includes: generating an original image from the plurality of image elements of the target traffic; and normalizing the original image to convert it into a single-flow electrocardiogram of the preset size. In the embodiment of the present invention, the preset size may be a square size, such as 64 × 64 or 128 × 128; constraining the single-flow electrocardiogram to a square size means that the data input into the image classification model are square matrices, which are convenient for the model to compute and process.
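The normalization to a preset square size can be sketched as follows. The embodiment does not specify the resampling method, so nearest-neighbour sampling is assumed here; the function name `resize_nearest` and the 40 × 300 toy image are hypothetical.

```python
def resize_nearest(image, size):
    """Resample a 2-D image (list of rows) to size x size by nearest neighbour."""
    src_h, src_w = len(image), len(image[0])
    return [[image[r * src_h // size][c * src_w // size]
             for c in range(size)]
            for r in range(size)]


# A toy original image: 40 rows by 300 columns of grey levels.
original = [[(r * 300 + c) % 256 for c in range(300)] for r in range(40)]

# Constrain it to the preset size of 64 x 64.
ecg = resize_nearest(original, 64)
```

Flows with different numbers of messages produce original images of different widths; resampling gives every flow the same input shape for the classifier.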
In addition, a traditional model takes traffic in text form as input and can at most determine whether given traffic is malicious; it cannot classify more finely, so it has low interpretability and the cause cannot be inferred from the prediction result. In this embodiment, a plurality of classification labels are set for the image classification model, so that fine-grained classification can be performed according to the single-flow electrocardiogram of the target traffic, and the cause can be traced. Specifically, the classification labels of the sample traffic may include: normal browser traffic, normal application traffic, first malicious family traffic, and second malicious family traffic. In this embodiment, normal browser traffic may be traffic generated during normal access through a browser (such as Chrome), and normal application traffic is traffic generated during normal use of an application (such as Outlook); meanwhile, because multiple malicious families exist, the malicious families are subdivided in this embodiment, for example, Trickbot traffic is classified as first malicious family traffic, Vawtrak traffic is classified as second malicious family traffic, and so on, so that both normal traffic and malicious traffic are subdivided. In addition, the classification labels of the sample traffic may include more types to achieve further refined classification; the principle is the same as that of the above process and is not described in detail here.
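The four example labels, and the way a fine-grained prediction yields both a malicious/normal decision and a traceable cause, can be sketched as below. The enumeration names, the label ordering, and the helper `interpret` are assumptions for illustration.

```python
from enum import Enum


class TrafficLabel(Enum):
    NORMAL_BROWSER = 0        # e.g. normal access through Chrome
    NORMAL_APPLICATION = 1    # e.g. normal use of Outlook
    MALICIOUS_FAMILY_1 = 2    # e.g. Trickbot
    MALICIOUS_FAMILY_2 = 3    # e.g. Vawtrak


MALICIOUS_LABELS = {TrafficLabel.MALICIOUS_FAMILY_1,
                    TrafficLabel.MALICIOUS_FAMILY_2}


def interpret(probabilities):
    """Return (predicted label, is_malicious) from per-class probabilities."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    label = TrafficLabel(best)
    return label, label in MALICIOUS_LABELS


label, is_malicious = interpret([0.05, 0.10, 0.80, 0.05])
```

Here the predicted label names the family, which is the "cause" that a binary classifier cannot provide.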
The method for identifying malicious traffic provided by the embodiment of the invention extracts the attribute parameters of the metadata in the target traffic, converts the target traffic into a single-flow electrocardiogram containing a plurality of image elements based on the attribute parameters, and then classifies the single-flow electrocardiogram based on an image classification technique, so as to determine whether the target traffic is malicious traffic. In this embodiment, the target traffic can be converted into a single-flow electrocardiogram based on the attribute parameters of the metadata, thereby constructing an image; the single-flow electrocardiogram can represent the behavior characteristics of communication activities in the time dimension, so that abnormal behaviors are easy to extract during image classification, malicious traffic can be accurately determined, and the accuracy of traffic identification is improved; moreover, the method can be executed without a certificate, a domain name, and the like, and therefore has strong applicability. The size, color, position, and positive or negative direction of each image element are determined according to the message length, message type, message sequence, and interaction direction of the metadata respectively, so that a single-flow electrocardiogram reflecting the characteristics of the target traffic can be generated, realizing the conversion from text to image and facilitating subsequent accurate image classification.
The above describes in detail the method for identifying malicious traffic provided in the embodiment of the present invention, which may also be implemented by a corresponding apparatus, and the following describes in detail the apparatus for identifying malicious traffic provided in the embodiment of the present invention.
Fig. 3 is a schematic structural diagram illustrating an apparatus for identifying malicious traffic according to an embodiment of the present invention. As shown in fig. 3, the apparatus for identifying malicious traffic includes:
the preprocessing module 31 is configured to acquire a target traffic to be processed, and determine attribute parameters of each piece of metadata in the target traffic, where the attribute parameters include at least two of a message length, a message type, a message sequence, and an interaction direction;
an image generating module 32, configured to generate image elements corresponding to the metadata according to the attribute parameters of the metadata, and sequentially arrange a plurality of image elements according to the target flow to generate a single-flow electrocardiogram corresponding to the target flow;
the classification module 33 is configured to perform classification processing on the single-flow electrocardiogram according to a preset image classification model, and determine whether the target flow is malicious flow.
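The cooperation of the three modules can be sketched as a simple pipeline. The class name `MaliciousTrafficIdentifier`, its field names, and the toy callables plugged in at the end are hypothetical; in particular, the threshold rule used as `classify` is only a placeholder for a trained image classification model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class MaliciousTrafficIdentifier:
    preprocess: Callable[[Any], Sequence[Any]]      # role of preprocessing module 31
    generate_image: Callable[[Sequence[Any]], Any]  # role of image generating module 32
    classify: Callable[[Any], bool]                 # role of classification module 33

    def identify(self, target_traffic: Any) -> bool:
        """Run the three modules in order; True means malicious."""
        attribute_parameters = self.preprocess(target_traffic)
        electrocardiogram = self.generate_image(attribute_parameters)
        return self.classify(electrocardiogram)


# Toy stand-ins, for illustration only (not a real extractor or model).
identifier = MaliciousTrafficIdentifier(
    # (message length, message sequence) for each message of the flow
    preprocess=lambda messages: [(len(m), i) for i, m in enumerate(messages)],
    # a one-row "image" whose pixel values are the message lengths
    generate_image=lambda params: [[length for length, _ in params]],
    # placeholder decision rule
    classify=lambda image: max(image[0]) > 1500,
)

result = identifier.identify([b"a" * 10, b"b" * 2000])
```

Keeping the modules as separate callables mirrors the logical division of the apparatus and lets each stage be replaced independently.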
On the basis of the above embodiment, the generating, by the image generating module 32, the image element corresponding to the metadata according to the attribute parameter of the metadata includes:
generating image elements corresponding to the metadata according to the attribute parameters of the metadata and a corresponding preset processing mode;
the preset processing mode comprises the following steps:
determining a size of an image element according to the message length of the metadata;
determining a color of an image element according to the message type of the metadata;
determining a location of an image element from the sequence of messages of the metadata;
determining the positive and negative directions of the image element according to the interaction direction of the metadata.
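The preset processing mode above can be sketched as a mapping from one piece of metadata to one image element. The field names, the example message types and their colors, the 16384-byte length cap, and the 32-pixel maximum height are assumptions chosen for illustration; the embodiment does not fix these values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Metadata:
    message_length: int    # length of the message in bytes
    message_type: str      # type of the message
    message_sequence: int  # position of the message within the flow
    outbound: bool         # interaction direction (True: client to server)


@dataclass(frozen=True)
class ImageElement:
    position: int               # from the message sequence
    signed_height: int          # size from length, sign from direction
    color: Tuple[int, int, int]  # from the message type


# Hypothetical message types and colors.
TYPE_COLORS = {"handshake": (255, 0, 0), "application_data": (0, 0, 255)}
FALLBACK_COLOR = (128, 128, 128)


def to_image_element(meta, max_length=16384, max_height=32):
    """Map the attribute parameters of one metadata item to an image element."""
    clipped = min(max(meta.message_length, 0), max_length)
    magnitude = max(1, round(clipped / max_length * max_height))
    direction = 1 if meta.outbound else -1
    color = TYPE_COLORS.get(meta.message_type, FALLBACK_COLOR)
    return ImageElement(meta.message_sequence, direction * magnitude, color)


element = to_image_element(Metadata(8192, "application_data", 0, True))
```

Encoding the direction as the sign of the height is what gives the resulting image its electrocardiogram-like shape, with bars above and below a baseline.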
On the basis of the above embodiment, the determining, by the image generating module 32, of the size of the image element according to the message length of the metadata includes: determining a magnitude of the image element according to the message length of the metadata;
the determining, by the image generating module 32, of the color of the image element according to the message type of the metadata includes: allocating a corresponding color to each message type in advance, wherein the discrimination between any two colors is greater than a preset threshold; and after the message type of the metadata is determined, taking the color corresponding to the message type of the metadata as the color of the image element.
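One way to allocate colors whose pairwise discrimination exceeds a preset threshold is a greedy selection from a candidate palette. The embodiment does not define how discrimination is measured; Euclidean distance in RGB space is assumed here, and the function names, palette, and threshold are hypothetical.

```python
import math


def color_distance(a, b):
    """Euclidean distance between two RGB colors (assumed discrimination measure)."""
    return math.dist(a, b)


def allocate_colors(message_types, palette, threshold):
    """Give each message type a color that differs from every other chosen
    color by more than `threshold`."""
    chosen = []
    for candidate in palette:
        if len(chosen) == len(message_types):
            break
        if all(color_distance(candidate, c) > threshold for c in chosen):
            chosen.append(candidate)
    if len(chosen) < len(message_types):
        raise ValueError("palette has too few sufficiently distinct colors")
    return dict(zip(message_types, chosen))


palette = [(255, 0, 0), (250, 0, 0), (0, 255, 0), (0, 0, 255)]
colors = allocate_colors(["type_a", "type_b", "type_c"], palette, threshold=100)
```

The near-duplicate red `(250, 0, 0)` is skipped because it is only 5 units away from `(255, 0, 0)`, so the three message types receive clearly distinguishable colors.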
On the basis of the above embodiment, the generating, by the image generating module 32, of a single-flow electrocardiogram corresponding to the target flow by sequentially arranging a plurality of image elements of the target flow includes:
sequentially arranging a plurality of image elements according to the target flow to generate an original image;
and carrying out normalization processing on the original image, and converting the original image into a single-flow electrocardiogram with a preset size.
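The sequential arrangement of image elements into an original image can be sketched as drawing one vertical bar per message on either side of a horizontal baseline. The one-pixel bar width, the black background, and the function name `render_original_image` are assumptions; the subsequent normalization to a preset size is not shown.

```python
BACKGROUND = (0, 0, 0)


def render_original_image(elements, max_height):
    """Draw one column per image element, in message order.

    elements: list of (signed_height, color); a positive height is drawn
    above the baseline and a negative height below it.
    """
    rows = 2 * max_height + 1
    baseline = max_height
    image = [[BACKGROUND for _ in elements] for _ in range(rows)]
    for x, (signed_height, color) in enumerate(elements):
        step = -1 if signed_height > 0 else 1  # row 0 is the top of the image
        for k in range(1, min(abs(signed_height), max_height) + 1):
            image[baseline + step * k][x] = color
    return image


RED, BLUE = (255, 0, 0), (0, 0, 255)
original = render_original_image([(2, RED), (-1, BLUE)], max_height=2)
```

The width of the original image grows with the number of messages in the flow, which is why a normalization step to a fixed size follows.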
On the basis of the embodiment, the device also comprises a training module;
before the classification module 33 performs classification processing on the single-flow electrocardiogram according to a preset image classification model, the training module is configured to:
obtaining sample flow and a classification label of the sample flow, and determining attribute parameters of each piece of sample metadata in the sample flow;
generating a sample image element corresponding to the sample metadata according to the attribute parameters of the sample metadata, and generating a sample single-flow electrocardiogram corresponding to the sample traffic according to the plurality of sample image elements of the sample traffic;
and taking the sample single-flow electrocardiogram as input, taking the corresponding classification label as output, training the image classification model to be trained, and generating the required image classification model after training.
On the basis of the above embodiment, the classification label includes: normal browser traffic, normal application traffic, first malicious family traffic, and second malicious family traffic.
The device for identifying malicious traffic provided by the embodiment of the invention first extracts the attribute parameters of the metadata in the target traffic, converts the target traffic into a single-flow electrocardiogram containing a plurality of image elements based on the attribute parameters, and then classifies the single-flow electrocardiogram based on an image classification technique, thereby determining whether the target traffic is malicious traffic. In this embodiment, the target traffic can be converted into a single-flow electrocardiogram based on the attribute parameters of the metadata, thereby constructing an image; the single-flow electrocardiogram can represent the behavior characteristics of communication activities in the time dimension, so that abnormal behaviors are easy to extract during image classification, malicious traffic can be accurately determined, and the accuracy of traffic identification is improved; moreover, the device can operate without a certificate, a domain name, and the like, and therefore has strong applicability. The size, color, position, and positive or negative direction of each image element are determined according to the message length, message type, message sequence, and interaction direction of the metadata respectively, so that a single-flow electrocardiogram reflecting the characteristics of the target traffic can be generated, realizing the conversion from text to image and facilitating subsequent accurate image classification.
In addition, an embodiment of the present invention further provides an electronic device, including a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and operable on the processor, where the transceiver, the memory, and the processor are connected via the bus, and when being executed by the processor, the computer program implements each process of the above-mentioned malicious traffic identification method embodiment, and can achieve the same technical effect, and is not described herein again to avoid repetition.
Specifically, referring to fig. 4, an embodiment of the present invention further provides an electronic device, which includes a bus 1110, a processor 1120, a transceiver 1130, a bus interface 1140, a memory 1150, and a user interface 1160.
In an embodiment of the present invention, the electronic device further includes: a computer program stored on the memory 1150 and executable on the processor 1120, the computer program, when executed by the processor 1120, implements the processes of the above-described malicious traffic identification method embodiments.
A transceiver 1130 for receiving and transmitting data under the control of the processor 1120.
In embodiments of the invention in which a bus architecture (represented by bus 1110) is used, bus 1110 may include any number of interconnected buses and bridges, with bus 1110 connecting various circuits including one or more processors, represented by processor 1120, and memory, represented by memory 1150.
Bus 1110 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an Accelerated Graphics Port (AGP), and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include: an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Processor 1120 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor described above includes: general purpose processors, Central Processing Units (CPUs), Network Processors (NPs), Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Complex Programmable Logic Devices (CPLDs), Programmable Logic Arrays (PLAs), Micro Control Units (MCUs) or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention. For example, the processor may be a single-core processor or a multi-core processor, which may be integrated on a single chip or located on multiple different chips.
Processor 1120 may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), a register, and other readable storage media known in the art. The readable storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
The bus 1110 may also connect various other circuits such as peripherals, voltage regulators, or power management circuits, and the bus interface 1140 provides an interface between the bus 1110 and the transceiver 1130, as is well known in the art. Therefore, these are not further described in the embodiments of the present invention.
The transceiver 1130 may be one element or may be multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. For example: the transceiver 1130 receives external data from other devices, and the transceiver 1130 transmits data processed by the processor 1120 to other devices. Depending on the nature of the computer system, a user interface 1160 may also be provided, such as: touch screen, physical keyboard, display, mouse, speaker, microphone, trackball, joystick, stylus.
It is to be appreciated that in embodiments of the invention, the memory 1150 may further include memory located remotely with respect to the processor 1120, and such remotely located memory may be coupled to a server via a network. One or more portions of the above-described network may be an ad hoc network, an intranet, an extranet, a Virtual Private Network (VPN), a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), a Wireless Wide Area Network (WWAN), a Metropolitan Area Network (MAN), the Internet, a Public Switched Telephone Network (PSTN), a Plain Old Telephone Service (POTS) network, a cellular telephone network, a wireless fidelity (Wi-Fi) network, or a combination of two or more of the above. For example, the cellular telephone network and the wireless network may be a Global System for Mobile Communications (GSM) system, a Code Division Multiple Access (CDMA) system, a Worldwide Interoperability for Microwave Access (WiMAX) system, a General Packet Radio Service (GPRS) system, a Wideband Code Division Multiple Access (WCDMA) system, a Long Term Evolution (LTE) system, an LTE Frequency Division Duplex (FDD) system, an LTE Time Division Duplex (TDD) system, a Long Term Evolution-Advanced (LTE-A) system, a Universal Mobile Telecommunications System (UMTS), an enhanced Mobile Broadband (eMBB) system, a massive Machine Type Communication (mMTC) system, an Ultra-Reliable Low-Latency Communication (URLLC) system, or the like.
It is to be understood that the memory 1150 in embodiments of the present invention can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. Wherein the nonvolatile memory includes: Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or Flash Memory.
The volatile memory includes: Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as: Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 1150 of the electronic device described in the embodiments of the invention includes, but is not limited to, the above and any other suitable types of memory.
In an embodiment of the present invention, memory 1150 stores the following elements of operating system 1151 and application programs 1152: an executable module, a data structure, or a subset thereof, or an expanded set thereof.
Specifically, the operating system 1151 includes various system programs such as: a framework layer, a core library layer, a driver layer, etc. for implementing various basic services and processing hardware-based tasks. Applications 1152 include various applications such as: media Player (Media Player), Browser (Browser), for implementing various application services. A program implementing a method of an embodiment of the invention may be included in application program 1152. The application programs 1152 include: applets, objects, components, logic, data structures, and other computer system executable instructions that perform particular tasks or implement particular abstract data types.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when being executed by a processor, the computer program implements each process of the above-mentioned malicious traffic identification method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The computer-readable storage medium includes permanent and non-permanent, removable and non-removable media, and may be a tangible device that retains and stores instructions for use by an instruction execution apparatus. The computer-readable storage medium includes: electronic memory devices, magnetic memory devices, optical memory devices, electromagnetic memory devices, semiconductor memory devices, and any suitable combination of the foregoing. The computer-readable storage medium includes: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), non-volatile random access memory (NVRAM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, memory sticks, mechanically encoded devices (e.g., punched cards or raised structures in a groove having instructions recorded thereon), or any other non-transmission medium that can be used to store information accessible by a computing device. As defined in embodiments of the present invention, the computer-readable storage medium does not include transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses traveling through a fiber optic cable), or electrical signals transmitted through a wire.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to solve the problem to be solved by the embodiment of the invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence, or the part thereof that contributes over the prior art, or all or part of the technical solutions, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (including a personal computer, a server, a data center, or other network devices) to execute all or part of the steps of the methods of the embodiments of the present invention. The storage medium includes the various media capable of storing program code listed above.
The above description is only a specific implementation of the embodiments of the present invention, but the scope of the embodiments of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present invention, and all such changes or substitutions should be covered by the scope of the embodiments of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for identifying malicious traffic, comprising:
acquiring target traffic to be processed, and determining attribute parameters of each piece of metadata in the target traffic, wherein the attribute parameters comprise at least two of message length, message type, message sequence and interaction direction;
generating image elements corresponding to the metadata according to the attribute parameters of the metadata, and sequentially arranging a plurality of image elements of the target flow to generate a single-flow electrocardiogram corresponding to the target flow;
and classifying the single-flow electrocardiogram according to a preset image classification model to determine whether the target flow is malicious flow.
2. The method of claim 1, wherein the generating an image element corresponding to the metadata according to the attribute parameters of the metadata comprises:
generating image elements corresponding to the metadata according to the attribute parameters of the metadata and a corresponding preset processing mode;
the preset processing mode comprises the following steps:
determining a size of an image element according to the message length of the metadata;
determining a color of an image element according to the message type of the metadata;
determining a location of an image element from the sequence of messages of the metadata;
determining the positive and negative directions of the image element according to the interaction direction of the metadata.
3. The method of claim 2,
said determining a size of an image element according to the message length of the metadata comprises: determining a magnitude of an image element according to the message length of the metadata;
said determining a color of an image element according to the message type of the metadata comprises: allocating corresponding colors for each message type in advance, wherein the discrimination between each two colors is greater than a preset threshold; after determining the message type of the metadata, a color corresponding to the message type of the metadata is taken as a color of the image element.
4. The method of claim 1, wherein generating a single-flow electrocardiogram corresponding to the target flow by sequentially arranging the plurality of image elements of the target flow comprises:
sequentially arranging a plurality of image elements according to the target flow to generate an original image;
and carrying out normalization processing on the original image, and converting the original image into a single-flow electrocardiogram with a preset size.
5. The method according to any one of claims 1 to 4, wherein before the classifying the single-flow electrocardiogram according to a preset image classification model, the method further comprises:
obtaining sample flow and a classification label of the sample flow, and determining attribute parameters of each piece of sample metadata in the sample flow;
generating a sample image element corresponding to the sample metadata according to the attribute parameters of the sample metadata, and generating a sample single-flow electrocardiogram corresponding to the sample traffic according to the plurality of sample image elements of the sample traffic;
and taking the sample single-flow electrocardiogram as input, taking the corresponding classification label as output, training the image classification model to be trained, and generating the required image classification model after training.
6. The method of claim 5, wherein the classification tag comprises: normal browser traffic, normal application traffic, first malicious family traffic, and second malicious family traffic.
7. An apparatus for identifying malicious traffic, comprising:
a preprocessing module, configured to acquire target traffic to be processed and determine attribute parameters of each piece of metadata in the target traffic, wherein the attribute parameters comprise at least two of message length, message type, message sequence and interaction direction;
an image generation module, configured to generate image elements corresponding to the metadata according to the attribute parameters of the metadata, and to sequentially arrange a plurality of image elements of the target flow to generate a single-flow electrocardiogram corresponding to the target flow;
and a classification module, configured to classify the single-flow electrocardiogram according to a preset image classification model and determine whether the target flow is malicious flow.
8. The apparatus of claim 7, wherein the image generation module generates the image element corresponding to the metadata according to the attribute parameter of the metadata comprises:
generating image elements corresponding to the metadata according to the attribute parameters of the metadata and a corresponding preset processing mode;
the preset processing mode comprises the following steps:
determining a size of an image element according to the message length of the metadata;
determining a color of an image element according to the message type of the metadata;
determining a location of an image element from the sequence of messages of the metadata;
determining the positive and negative directions of the image element according to the interaction direction of the metadata.
9. An electronic device comprising a bus, a transceiver, a memory, a processor and a computer program stored on the memory and executable on the processor, the transceiver, the memory and the processor being connected via the bus, characterized in that the computer program, when executed by the processor, implements the steps in the method for identification of malicious traffic according to any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for identification of malicious traffic according to any one of claims 1 to 6.
CN202010574518.1A 2020-06-22 2020-06-22 Malicious traffic identification method and device and electronic equipment Pending CN113901976A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010574518.1A CN113901976A (en) 2020-06-22 2020-06-22 Malicious traffic identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010574518.1A CN113901976A (en) 2020-06-22 2020-06-22 Malicious traffic identification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113901976A true CN113901976A (en) 2022-01-07

Family

ID=79186318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010574518.1A Pending CN113901976A (en) 2020-06-22 2020-06-22 Malicious traffic identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113901976A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134276A (en) * 2022-05-12 2022-09-30 亚信科技(成都)有限公司 Ore digging flow detection method and device
CN115134276B (en) * 2022-05-12 2023-12-08 亚信科技(成都)有限公司 Mining flow detection method and device
CN115314240A (en) * 2022-06-22 2022-11-08 国家计算机网络与信息安全管理中心 Data processing method for encryption abnormal flow identification

CN113095210A (en) Method and device for detecting pages of exercise book and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination