CN117097577B - Method, system, electronic equipment and storage medium for classifying encrypted message data streams

Info

Publication number
CN117097577B
Authority
CN
China
Prior art keywords
message
encrypted
message data
classification
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311362322.6A
Other languages
Chinese (zh)
Other versions
CN117097577A (en)
Inventor
马增协
胡宁
韩伟红
贾焰
程运财
梁都成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory
Priority to CN202311362322.6A
Publication of CN117097577A
Application granted
Publication of CN117097577B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0485 Networking architectures for enhanced packet encryption processing, e.g. offloading of IPsec packet processing or efficient security association look-up
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the application provide a method, a system, electronic equipment and a storage medium for classifying encrypted message data streams. The method comprises: obtaining an encrypted message data stream comprising a plurality of consecutive encrypted message data packets; dividing the encrypted message data stream into a plurality of message segments in time order, and dividing each message segment into a plurality of message groups by message direction, so that all encrypted message data packets within a message group share the same message direction; in each message group, determining the height of a message group sub-image from the message lengths of the encrypted message data packets, its width from the number of messages, and its direction from the message direction, and generating a segment image from the plurality of message group sub-images; and extracting image features of the segment images, then inputting the historical classification result for the image features at the previous moment together with the image features at the current moment into a message classification network to classify the message behaviors, thereby obtaining a classification result of the message behaviors.

Description

Method, system, electronic equipment and storage medium for classifying encrypted message data streams
Technical Field
The present disclosure relates to the field of data flow classification technologies, and in particular, to a method, a system, an electronic device, and a storage medium for classifying encrypted message data flows.
Background
Virtual private network (Virtual Private Network, VPN) traffic identification refers to the process of identifying and classifying data streams transmitted through a VPN. In general, VPN traffic identification may be achieved by analyzing header information, protocol type, port number, etc. of the data packets. VPN traffic identification may help network administrators monitor and manage VPN usage to ensure network security and performance. In the process of transmitting the encrypted message through the VPN, preliminary analysis and identification can be carried out according to the message length, the message number and the like of the encrypted message so as to determine the type and the characteristics of the encrypted message.
In the related art, encrypted message data streams are generally transmitted through a secure encrypted tunnel, which protects the entire original Internet Protocol (IP) data packet. As a result, plaintext five-tuple information such as the source address, destination address and port numbers cannot be read directly from the encrypted messages, so different streams or sessions cannot be distinguished from the encrypted messages alone. Existing classification approaches therefore rely only on statistical information of the data stream, such as message length and message count, and ignore both the information transmission path of the messages and the short-term context of the encrypted messages. This makes it difficult to recognize fixed data packet formats within the encrypted traffic, which lowers the efficiency of classifying encrypted message data streams and makes the final classification results inaccurate.
Disclosure of Invention
The main purpose of the embodiments of the application is to provide a method, a system, electronic equipment and a storage medium for classifying encrypted message data streams, so as to address the low efficiency and inaccurate results of existing encrypted message data stream classification.
To achieve the above object, a first aspect of the embodiments of the present application provides a method for classifying an encrypted message data stream, the method comprising: obtaining an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of consecutive encrypted message data packets; dividing the encrypted message data stream into a plurality of message segments in time order, and dividing each message segment into a plurality of message groups according to the message direction of the encrypted message data packets, wherein the encrypted message data packets in each message group share the same message direction; for each message segment, in each message group of that segment, determining the height of the message group sub-image according to the message lengths of the encrypted message data packets contained in the message group, determining the width of the message group sub-image according to the number of messages, and determining the direction of the message group sub-image according to the message direction, and generating a segment image from the plurality of message group sub-images; and extracting image features of each segment image, and sequentially inputting the historical classification result for the image features at the previous moment and the image features at the current moment into a preset message classification network to classify message behaviors, so as to obtain a classification result of the message behaviors of the encrypted message data stream.
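The text fixes which quantity controls each dimension of a message group sub-image (height from message length, width from message count, direction from message direction) but not the drawing convention. The following is a minimal sketch under stated assumptions: the canvas is a binary grid, sending-direction sub-images are drawn upward from a middle line and receiving-direction sub-images downward, and the `GroupSubImage` and `render_segment_image` names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupSubImage:
    direction: str  # "send" or "recv" (hypothetical labels)
    width: int      # number of packets in the message group
    height: int     # height in pixel rows, derived from message lengths


def render_segment_image(groups: List[GroupSubImage], half_height: int) -> List[List[int]]:
    """Lay message group sub-images out left to right in time order.

    Assumed convention: the canvas has 2 * half_height rows; sending-direction
    sub-images are drawn upward from the middle line, receiving-direction
    sub-images downward. Pixels are 1 (filled) or 0 (background).
    """
    total_width = sum(g.width for g in groups)
    image = [[0] * total_width for _ in range(2 * half_height)]
    col = 0
    for g in groups:
        h = min(g.height, half_height)  # clip tall groups to the canvas
        if g.direction == "send":
            rows = range(half_height - h, half_height)
        elif g.direction == "recv":
            rows = range(half_height, half_height + h)
        else:
            raise ValueError(f"unknown direction: {g.direction!r}")
        for r in rows:
            for c in range(col, col + g.width):
                image[r][c] = 1
        col += g.width  # next group starts where this one ends
    return image


if __name__ == "__main__":
    segment = [GroupSubImage("send", 3, 2), GroupSubImage("recv", 2, 1), GroupSubImage("send", 1, 3)]
    for row in render_segment_image(segment, half_height=3):
        print("".join("#" if px else "." for px in row))
```

Placing groups side by side in time order keeps the transmission path (send, receive, send) visible as a shape, which is what lets an image classifier pick it up.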
According to some embodiments of the present application, dividing the encrypted message data stream into a plurality of message segments in time order, and dividing each message segment into a plurality of message groups according to the message direction of the encrypted message data packets, comprises: in each encrypted message data stream, arranging the encrypted message data packets in time order and dividing them according to a preset dividing number to obtain a plurality of message segments; and in each message segment, distinguishing the message direction of each encrypted message data packet and placing encrypted message data packets with the same message direction into the same message group, to obtain a plurality of message groups.
According to some embodiments of the present application, determining the height of the message group sub-image comprises: in each message group, screening the encrypted message data packets according to a preset exclusion proportion; summing the message lengths of the encrypted message data packets that remain after screening and dividing the sum by the number of encrypted message data packets in the message group to obtain an average message length; multiplying the average message length by a preset extraction proportion to obtain a reference length, and comparing the reference length with the message length of each encrypted message data packet in the message group to obtain a comparison result; and determining the height of the message group sub-image according to the comparison result.
According to some embodiments of the present application, determining the height of the message group sub-image according to the comparison result comprises: if the comparison result indicates that no encrypted message data packet in the message group has a message length shorter than the reference length, taking the reference length as the height of the message group sub-image; and if the comparison result indicates that the message group contains an encrypted message data packet whose message length is shorter than the reference length, taking the message length of that encrypted message data packet as the height of the message group sub-image.
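The height rule can be sketched as below. Several details are not fixed by the text and are assumptions here: screening drops the longest `exclude_ratio` fraction of packets (treating them as outliers), the average is taken over the packets kept after screening, and when several packets are shorter than the reference length the shortest one sets the height. The function name and default ratios are hypothetical.

```python
from typing import Sequence


def group_sub_image_height(lengths: Sequence[int],
                           exclude_ratio: float = 0.2,
                           extract_ratio: float = 0.5) -> float:
    """Height of one message group sub-image (hypothetical helper).

    Assumptions: screening drops the longest `exclude_ratio` fraction of
    packets; the average uses the packets kept after screening; when several
    packets are shorter than the reference length, the shortest one is used.
    """
    if not lengths:
        raise ValueError("message group is empty")
    if not 0.0 <= exclude_ratio < 1.0:
        raise ValueError("exclude_ratio must be in [0, 1)")
    ordered = sorted(lengths)
    n_drop = int(len(ordered) * exclude_ratio)      # always < len(ordered)
    kept = ordered[: len(ordered) - n_drop]
    average = sum(kept) / len(kept)                 # average message length
    reference = average * extract_ratio             # reference length
    shorter = [n for n in kept if n < reference]    # comparison result
    return float(min(shorter)) if shorter else reference
```

For a group with lengths `[100, 120, 140, 160, 1500]`, the 1500-byte packet is screened out, the average of the rest is 130, and the reference length 65 becomes the height because no remaining packet is shorter; replacing 100 with 40 makes the height 40.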
According to some embodiments of the present application, the method further comprises: acquiring the plurality of message groups corresponding to each message segment after the encrypted message data packets have been screened; and determining the color of each message group sub-image according to the encrypted message data packets contained in that message group.
According to some embodiments of the present application, the message direction includes a sending direction and a receiving direction; distinguishing the message direction of each encrypted message data packet in each message segment and placing encrypted message data packets with the same message direction into the same message group, to obtain a plurality of message groups, comprises: in each message segment, classifying the message direction of each encrypted message data packet as either the sending direction or the receiving direction; and, following the time order, grouping each run of consecutive sending-direction packets or consecutive receiving-direction packets together, to obtain a plurality of message groups arranged in time order.
According to some embodiments of the application, the extracting the image features of each of the segment images includes: loading a pre-trained convolutional neural network; and sequentially inputting the fragment images into the convolutional neural network according to a time sequence to perform feature extraction, so as to obtain image features corresponding to the fragment images.
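The pre-trained convolutional neural network is not specified. As a toy stand-in only, the sketch below shows what one convolutional layer contributes: each hypothetical kernel is slid over a segment image, passed through ReLU and global-average-pooled into one feature, and segment images are processed in time order to produce one feature vector per moment. A real deployment would load trained weights in a deep-learning framework; the kernels here are hand-picked for illustration.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation, the operation CNN layers actually compute."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def extract_features(image: Matrix, kernels: List[Matrix]) -> List[float]:
    """One feature per kernel: convolution, then ReLU, then global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        activations = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(activations) / len(activations))
    return features


def extract_sequence(segment_images: List[Matrix], kernels: List[Matrix]) -> List[List[float]]:
    """Feed segment images in time order, yielding one feature vector per moment."""
    return [extract_features(img, kernels) for img in segment_images]
```

A vertical-edge kernel such as `[[1, -1], [1, -1]]` responds to the boundary between adjacent message group sub-images, which is the kind of structure the segment images are meant to expose.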
According to some embodiments of the present application, the sequentially inputting the historical classification result of the image feature at the previous moment and the image feature at the current moment into a preset packet classification network to perform packet behavior classification, to obtain a classification result of the packet behavior of the encrypted packet data stream, includes: acquiring a history classification result obtained after the image characteristic at the previous moment is input to the message classification network; and sequentially inputting the image characteristics at each current moment and the corresponding historical classification results into a preset message classification network to classify the message behaviors until the last image characteristic of the encrypted message data stream is input into the message classification network, so as to obtain the classification results of the message behaviors of the encrypted message data stream.
According to some embodiments of the present application, sequentially inputting the image features at each current moment and the corresponding historical classification result into the preset message classification network to classify message behaviors, until the last image feature of the encrypted message data stream has been input into the message classification network, to obtain the classification result of the message behaviors of the encrypted message data stream, comprises: sequentially inputting the image features at each current moment into the message classification network, and determining, through an input gate of the message classification network, first classification information to be retained from the image features; inputting the first classification information and the historical classification result into a forget gate of the message classification network, and determining second classification information to be retained from the first classification information and the historical classification result; weighting the historical classification result and the second classification information to obtain weighted classification information; inputting the weighted classification information into an output gate for screening, to obtain a current classification result of the message behavior of the encrypted message data stream at the current moment; and continuing to input the not-yet-classified image features of the encrypted message data stream into the message classification network until the last image feature has been input, and outputting the classification result of the message behaviors of the encrypted message data stream.
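The input, forget and output gates described above resemble a long short-term memory (LSTM) cell. The sketch below swaps in the standard LSTM equations as an assumption; the text's gate wiring may differ in detail. The carried state `(h, c)` plays the role of the historical classification result, the cell update `f*c + i*g` corresponds to the weighting step, and a hypothetical linear layer `W_out` scores the final state. Weights are random and untrained.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
GATES = ("input", "forget", "output", "candidate")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, x: Vector, b: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


class LSTMCell:
    """Standard LSTM cell; (h, c) plays the role of the historical classification result."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        cols = input_size + hidden_size
        self.hidden_size = hidden_size
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
                  for g in GATES}
        self.b = {g: [0.0] * hidden_size for g in GATES}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate current image features with previous hidden state
        i = [sigmoid(v) for v in affine(self.W["input"], z, self.b["input"])]
        f = [sigmoid(v) for v in affine(self.W["forget"], z, self.b["forget"])]
        o = [sigmoid(v) for v in affine(self.W["output"], z, self.b["output"])]
        g = [math.tanh(v) for v in affine(self.W["candidate"], z, self.b["candidate"])]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # weighted memory
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]            # output gate screens it
        return h_new, c_new


def classify_stream(cell: LSTMCell, features: List[Vector], W_out: Matrix, b_out: Vector) -> int:
    """Run every segment's features through the cell, then score the final state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in features:  # one feature vector per moment, in time order
        h, c = cell.step(x, h, c)
    scores = affine(W_out, h, b_out)
    return max(range(len(scores)), key=scores.__getitem__)  # index of the top class
```

The loop stops only after the last feature vector has been consumed, matching the requirement that classification continue until the last image feature of the stream is input.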
According to some embodiments of the present application, the packet classification network is trained by: acquiring a plurality of encrypted message data streams, and forming a training data set according to the encrypted message data streams; preprocessing each encrypted message data stream in the training data set to obtain a plurality of message fragments and fragment images corresponding to the message fragments; extracting image characteristics of each fragment image, and sequentially inputting a historical classification result of the image characteristics at the previous moment and the image characteristics at the current moment into a preset message classification network to classify message behaviors, so as to obtain a first classification result of the message behaviors of the encrypted message data stream; and calculating a loss value of the first classification result according to a preset loss function, and carrying out parameter adjustment on the message classification network according to the loss value to obtain a trained message classification network.
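The preset loss function is not named. Assuming softmax cross-entropy, which is a common choice for multi-class behavior labels, a single parameter-adjustment step can be sketched as follows. For brevity only a hypothetical final linear layer is updated with plain gradient descent; training the whole network would backpropagate the same gradient through the recurrent cell and the feature extractor.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores: Vector, label: int) -> float:
    return -math.log(softmax(scores)[label])


def sgd_step_output_layer(W: List[Vector], b: Vector, h: Vector, label: int, lr: float = 0.1) -> float:
    """Update a linear output layer (scores = W h + b) in place; return the loss before the update.

    For softmax cross-entropy, dL/dscore_k = p_k - 1[k == label].
    """
    scores = [sum(w * x for w, x in zip(row, h)) + bk for row, bk in zip(W, b)]
    probs = softmax(scores)
    for k in range(len(W)):
        grad = probs[k] - (1.0 if k == label else 0.0)
        W[k] = [w - lr * grad * x for w, x in zip(W[k], h)]
        b[k] -= lr * grad
    return cross_entropy(scores, label)
```

Starting from all-zero weights with two classes, the first call reports a loss of log 2 and the second call reports a smaller loss, showing the parameter adjustment moving toward the labeled behavior.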
To achieve the above object, a second aspect of the embodiments of the present application proposes an encrypted message data stream classification system, the system comprising: an encrypted message data stream acquisition module, configured to acquire an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of consecutive encrypted message data packets; a message group dividing module, configured to divide the encrypted message data stream into a plurality of message segments in time order and to divide each message segment into a plurality of message groups according to the message direction of the encrypted message data packets, wherein the encrypted message data packets in each message group share the same message direction; a segment image generating module, configured to, for each message segment, in each message group of that segment, determine the height of the message group sub-image according to the message lengths of the encrypted message data packets contained in the message group, determine the width of the message group sub-image according to the number of messages, and determine the direction of the message group sub-image according to the message direction, and generate a segment image from the plurality of message group sub-images; and a classification result acquisition module, configured to extract image features of each segment image, and to sequentially input the historical classification result for the image features at the previous moment and the image features at the current moment into a preset message classification network to classify message behaviors, so as to obtain the classification result of the message behaviors of the encrypted message data stream.
To achieve the above object, a third aspect of the embodiments of the present application provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores a computer program, and the processor implements the method for classifying encrypted packet data streams according to any one of the embodiments of the first aspect of the present application when executing the computer program.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the method for classifying an encrypted packet data stream according to any one of the embodiments of the first aspect of the present application.
According to the method, system, electronic equipment and storage medium for classifying encrypted message data streams provided by the embodiments, the encrypted message data stream is divided into message segments in time order, each message segment is divided into message groups, and segment images are generated from the message length, message count and message direction of each message group. The originally complex encrypted message data stream is thereby converted into visual segment images that clearly show the information transmission paths during encryption, which makes the data convenient to classify as images. Features are then extracted from the segment images to obtain image features. Because the historical classification result for the image features at the previous moment is input into the message classification network together with the image features at the current moment, the network can take full account of the contextual relationships among the features of the encrypted message data stream when classifying, and can use that context to recognize fixed data packet formats. This improves the efficiency of classifying encrypted message data streams and the accuracy of the resulting behavior classification.
Drawings
Fig. 1 is a schematic structural diagram of an encrypted message data stream classification system according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for classifying encrypted message data streams according to an embodiment of the present application;
Fig. 3 is a segment image corresponding to a user login behavior provided in an embodiment of the present application;
Fig. 4 is a segment image corresponding to a user logout behavior provided in an embodiment of the present application;
Fig. 5 is a segment image corresponding to a user voice request provided in an embodiment of the present application;
Fig. 6 is a segment image corresponding to a user video request provided in an embodiment of the present application;
Fig. 7 is a flowchart of step S102 in Fig. 2;
Fig. 8 is a flowchart of determining the height of a message group sub-image provided by an embodiment of the present application;
Fig. 9 is a flowchart of step S304 in Fig. 8;
Fig. 10 is another flowchart of a method for classifying encrypted message data streams according to an embodiment of the present application;
Fig. 11 is a flowchart of step S202 in Fig. 8;
Fig. 12 is a flowchart of extracting image features of each segment image provided in an embodiment of the present application;
Fig. 13 is a flowchart of step S104 in Fig. 2;
Fig. 14 is a flowchart of step S802 in Fig. 13;
Fig. 15 is a training flowchart of a message classification network provided in an embodiment of the present application;
Fig. 16 is a further flowchart of a method for classifying encrypted message data streams according to an embodiment of the present application;
Fig. 17 is a schematic diagram of the functional modules of an encrypted message data stream classification system according to an embodiment of the present application;
Fig. 18 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
WireGuard is a modern virtual private network (VPN) protocol with a simple, efficient design, intended to provide secure and reliable network connections. WireGuard uses encryption to protect the privacy and security of communication data while offering low latency and high transmission speed.
However, once communication data has been encrypted with WireGuard or another encryption method, plaintext five-tuple information such as the destination address, source address and port numbers can no longer be read from the encrypted messages, so different streams or sessions cannot be distinguished directly. In the related art, flow features are generally extracted directly from statistical information of the data stream and then classified. This process ignores the short-term contextual association between encrypted message data packets, even though that context often helps to identify fixed data packet formats and thus to recognize encrypted message data streams quickly. As a result, classification efficiency is low and the accuracy of the final classification result of the encrypted message data stream suffers.
Based on this, the embodiments of the application provide a method, a system, an electronic device and a storage medium for classifying encrypted message data streams, which take full account of the transmission direction of encrypted message data packets and of the contextual relationships among the features of the encrypted message data stream, so as to identify fixed data packet formats, improve the efficiency of classifying encrypted message data streams, and improve the accuracy of the resulting message behavior classification.
The encrypted message data stream classification system of the embodiments of the application is described in detail through the following embodiments.
Referring to fig. 1, in some embodiments, the encrypted packet data stream classification system includes a sender 101, a feature extraction network 102, a packet classification network 103, a receiver 104, and a controller 105.
By way of example, the controller 105 may serve as the nerve center and command center of the system. The controller 105 may generate operation control signals according to instruction operation codes and timing signals to control instruction fetching and instruction execution. For example, the controller 105 may generate an operation control signal to obtain encrypted message data streams from the sender 101 and the receiver 104, generate segment images from the encrypted message data streams, then control the feature extraction network 102 to extract features from the segment images, and finally control the packet classification network 103 to classify the extracted features, so as to obtain a classification result of the encrypted message data streams.
The method for classifying the encrypted message data stream in the embodiment of the application can be illustrated by the following embodiment.
In the embodiments of the present application, when related processing is required according to data related to a user identity or a characteristic, such as user information, user behavior data, user history data, user location information, and the like, permission or consent of the user is obtained first. In addition, when the embodiment of the application needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the user is explicitly acquired, necessary user related data for enabling the embodiment of the application to normally operate is acquired.
Fig. 2 is an optional flowchart of a method for classifying encrypted packet data flows according to an embodiment of the present application, where the method in fig. 2 may include steps S101 to S104.
Step S101, obtaining an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of continuous encrypted message data packets.
It can be understood that an encrypted message data stream is a data stream encrypted during network transmission: each encrypted message data stream is a continuous segment of data processed by the same encryption algorithm and includes a plurality of consecutive encrypted message data packets. Generally, original data is converted into encrypted data by an encryption algorithm before network transmission to protect its security. Encrypted message data streams can be transmitted in various ways, one of which is a WireGuard VPN. WireGuard can be used to create secure encrypted tunnels over public networks and relies on modern cryptographic primitives, such as Curve25519 for key exchange, ChaCha20 for symmetric encryption, and Poly1305 for message authentication, to provide strong data protection.
Step S102, dividing the encrypted message data stream into a plurality of message fragments according to the time sequence, and dividing each message fragment into a plurality of message groups according to the message direction of the encrypted message data packet; wherein, the message direction of the encrypted message data packet in each message group is consistent.
In some embodiments, because an encrypted message data stream may include several behaviors, such as user login, voice request, video request, chat, and logout, the stream needs to be divided into a plurality of message segments so that the different behaviors can be identified.
In some embodiments, the transmission time of the encrypted message data stream may be divided to obtain a plurality of message fragments, so that the messages can be reassembled in the correct order. Alternatively, the stream may be divided according to its length; the division may be uniform, or several division ratios may be set as required.
It can be understood that encrypted message data packets with the same message direction most likely belong to the same class, whereas packets with different message directions generally belong to different classes. Each message segment can therefore be divided according to the message direction of each encrypted message data packet, which improves both classification efficiency and the accuracy of the classification results. For example, suppose a message segment contains 6 encrypted message data packets whose directions, in time order, are: packet 1 sending, packet 2 sending, packet 3 sending, packet 4 receiving, packet 5 receiving, and packet 6 sending. Consecutive packets with the same direction can then be grouped into three message groups: packets 1 to 3 (sending), packets 4 and 5 (receiving), and packet 6 (sending). Alternatively, an interception time may be set for each message segment: each time the preset interception time elapses, the packets intercepted so far form one message group, and packets whose direction disagrees with the rest of the group are excluded. For example, if more than 60% of the packets in a message group are in the sending direction, the packets in the receiving direction are excluded from that group; or a message direction may be specified for the group and packets in other directions excluded.
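The direction-based grouping in the example above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation; the direction labels `"send"`/`"recv"` and the function name `group_by_direction` are assumptions introduced here.

```python
from itertools import groupby


def group_by_direction(directions):
    """Split a time-ordered list of packet directions into runs of
    consecutive packets that share the same direction.

    Returns a list of (direction, packet_indices) pairs; indices are 1-based
    to match the packet numbering used in the text.
    """
    groups = []
    index = 1
    for direction, run in groupby(directions):  # groupby merges equal neighbours
        size = len(list(run))
        groups.append((direction, list(range(index, index + size))))
        index += size
    return groups


segment = ["send", "send", "send", "recv", "recv", "send"]
groups = group_by_direction(segment)
# → [('send', [1, 2, 3]), ('recv', [4, 5]), ('send', [6])]
```

`itertools.groupby` only merges adjacent equal items, which is exactly the "consecutive packets in the same direction" rule; a later packet that returns to the sending direction starts a new group.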
Step S103, for each message segment, determining the height of each message-group sub-image according to the message lengths of the encrypted message data packets in the group, its width according to the number of packets, and its direction according to the message direction, and generating a segment image from the sub-images of all message groups in the message segment.
It will be appreciated that the head and tail of a message group may contain abnormal values (outliers). To keep them from distorting the overall average, they may be removed according to a certain percentage: for example, after the encrypted message data packets of each message group are sorted in ascending or descending order of packet length, a fixed percentage of packets (for example, 5%) is removed from each end, which improves the accuracy of the calculated average packet length.
In some embodiments, because the average packet length represents the average length of all encrypted message data packets in the group, and the header bytes generally carry the main information of a packet, the header byte extraction length of each packet may be determined from the average packet length; the header bytes of each packet are then extracted, and the extracted length is used as the height of the corresponding message-group sub-image. In other embodiments, the header bytes are not extracted, and the average packet length is used directly as the height of the message-group sub-image; the embodiments of the present application do not limit this.
In some embodiments, the number of encrypted message data packets in each message group may be used as the width of the message-group sub-image. For example, if a message group contains 5 encrypted message data packets, the width of its sub-image is 5, with the unit of width set according to the actual situation. Because different message groups may contain different numbers of packets, their sub-images may have different widths in the segment image.
It can be understood that the color of a message-group sub-image is generated from the encrypted message data packets remaining in the group after outliers are removed, or only from the header bytes of those packets, with different byte values corresponding to different colors to form a visualized image. Each byte is an encoded value, so the colors of the segment image can be obtained by mapping byte values to image pixel values; for example, the value of a byte can be converted into a corresponding red-green-blue (RGB) color value. Because the header bytes generally reflect the variation pattern or characteristics of the corresponding encrypted message data packet, it suffices to extract the header bytes to color the segment image, which makes the drawn segment image more accurate.
In some embodiments, the message direction of the encrypted message data packets in each message group may be used as the direction of the sub-image. Which message direction counts as positive can be user-defined: for example, the encrypted message data stream may be divided into a sending direction and a receiving direction, with the sending direction defined as positive and the receiving direction as negative, or conversely with the receiving direction positive and the sending direction negative. In some embodiments, the sending direction is defined as positive, so the graphic representation of a message group with a positive direction lies on the positive side of the coordinate axis, and that of a group with a negative direction lies on the negative side.
It will be appreciated that, in the segment image, the graphic representation of each message group may be a column, a rectangle, or the like; this application does not limit it. Because the segment image is generated from the message length, number of messages, bytes, and message direction of each message group, the originally complex encrypted message data stream is converted into a visualized segment image, giving an intuitive image representation.
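The text leaves the exact pixel layout open. One hypothetical way to assemble a segment image from its message-group sub-images is sketched below: each packet contributes one column whose pixels are its first M header-byte values, sending groups are drawn above the time axis and receiving groups below it. The function name, the +1/-1 direction encoding, and the list-of-lists canvas are all assumptions, not the patented layout.

```python
def build_segment_image(groups, background=None):
    """Lay out message-group sub-images left to right in time order.

    groups -- list of (direction, columns); direction is +1 (send) or -1 (recv),
              columns is a list of per-packet pixel lists, all of the group's
              height M. The group's width is the number of packets it holds.
    Returns a 2-D list (rows x columns). The boundary between rows max_h - 1
    and max_h is the time axis: send groups grow upward, recv groups downward.
    """
    max_h = max((len(cols[0]) for _, cols in groups if cols), default=0)
    width = sum(len(cols) for _, cols in groups)
    canvas = [[background] * width for _ in range(2 * max_h)]
    x = 0
    for direction, cols in groups:
        for col in cols:
            for k, value in enumerate(col):
                # k = 0 is the pixel adjacent to the axis.
                y = max_h - 1 - k if direction > 0 else max_h + k
                canvas[y][x] = value
            x += 1
    return canvas


# Two sending packets of height 2, then three receiving packets of height 1.
image = build_segment_image([(+1, [[10, 20], [30, 40]]), (-1, [[50], [60], [70]])])
```

Keeping the axis in the middle of the canvas means the positive/negative direction convention from the text maps directly to "above" and "below" in the image.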
Referring to fig. 3 to 6, which show segment images for different message behaviors: fig. 3 corresponds to user login, fig. 4 to user logout, fig. 5 to a user voice request, and fig. 6 to a user video request. These figures show the segment images of the different behaviors intuitively. In fig. 3 to 6, the horizontal axis represents time, the vertical axis represents the processed message length, and image colors are mapped from the header bytes; to make different colors distinguishable in the figures, they are drawn with different legends, whereas in practical applications the colors themselves can be displayed. As fig. 3 to 6 show, the message groups are arranged in time order and the direction of each message-group sub-image follows the sending or receiving direction, so the variation of different message groups across time periods is shown intuitively.
It can be understood that the method for classifying the encrypted message data stream further includes aggregating multiple segment images of the same type, so that segment images can be classified according to historical classification results. Because segment images of the same type most likely look alike, the classified images can be analyzed for patterns to obtain the image pattern of each class, so that a segment image can be classified directly. For example, the image pattern of segment images corresponding to user video requests can be analyzed, and when subsequent segment images show the same pattern they are classified directly as user video requests, which improves the efficiency of classifying the message behaviors of the encrypted message data stream.
Step S104, extracting the image features of each segment image, and sequentially inputting the historical classification result for the image feature at the previous moment, together with the image feature at the current moment, into a preset message classification network for message behavior classification, to obtain the classification result of the message behaviors of the encrypted message data stream.
In some embodiments, the message behaviors of an encrypted message stream generally follow fixed packet formats, and these formats can be recognized by analyzing them together with the historical classification results, which allows the message behaviors to be classified. Feature extraction can therefore be performed on each segment image by a pre-trained convolutional neural network, and the extracted image features are input into the message classification network together with the historical classification result of the previous step, so that the classification information of earlier segment images is combined with the image features at the current moment and the context is fully taken into account.
According to the method, system, electronic equipment, and storage medium for classifying encrypted message data streams, the encrypted message data stream is divided into message segments in time order, each message segment is divided into message groups, and segment images are generated according to the message length, number of messages, and message direction of each message group. The originally complex encrypted message data stream is thus converted into visualized segment images that clearly show how information is transmitted during encryption, which makes the images easy to classify. Feature extraction is then performed on the segment images to obtain image features. Because the historical classification result for the image feature at the previous moment is input into the message classification network together with the image feature at the current moment, the network takes full account of the contextual relationships among the features of the encrypted message data stream and can recognize fixed packet formats in that context. This improves both the efficiency of classifying encrypted message data streams and the accuracy of the resulting behavior classification.
Referring to fig. 7, in some embodiments, step S102 includes steps S201 to S202:
Step S201, in each encrypted message data stream, arranging the encrypted message data packets in time order and dividing them according to a preset dividing number to obtain a plurality of message segments.
It will be appreciated that each message behavior is generally formed by several consecutive encrypted message data packets. To make it easier to track and check how packets are transmitted and processed, the encrypted message data packets may be arranged consecutively in time order and divided according to a preset dividing number to obtain a plurality of message segments. The preset dividing number is set according to the actual situation and can be adjusted; for example, if it is 10, every 10 encrypted message data packets form one message segment.
In some embodiments, the encrypted message data packets may instead be divided according to a preset dividing time; for example, they may be divided every 1 second to obtain the corresponding message segments. The embodiments of the present application do not limit this.
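Both segmentation options (a fixed number of packets per segment, or a fixed time window) can be sketched as below. The function names and the `(timestamp, payload)` tuple format are assumptions for illustration only.

```python
def split_by_count(packets, n):
    """Split a time-ordered packet list into consecutive segments of n packets.
    The last segment may be shorter if len(packets) is not a multiple of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [packets[i:i + n] for i in range(0, len(packets), n)]


def split_by_time(packets, window):
    """Split time-ordered (timestamp, payload) tuples into segments that each
    cover one `window`-second interval, counted from the first packet.
    Intervals that contain no packets produce no segment."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not packets:
        return []
    start = packets[0][0]
    segments, current, current_slot = [], [], None
    for ts, payload in packets:
        slot = int((ts - start) // window)  # index of the window this packet falls in
        if current and slot != current_slot:
            segments.append(current)
            current = []
        current_slot = slot
        current.append((ts, payload))
    if current:
        segments.append(current)
    return segments
```

With a dividing number of 10, `split_by_count(packets, 10)` yields one segment per 10 packets; with a 1-second dividing time, `split_by_time(packets, 1.0)` yields one segment per non-empty second.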
In step S202, in each message segment, the message direction of each encrypted message data packet is differentiated, and the encrypted message data packets with the same message direction are divided into the same message group, so as to obtain a plurality of message groups.
It can be appreciated that messages of the same type usually have a consistent direction, so consecutive encrypted message data packets in the same direction can be placed in the same message group, which improves classification efficiency. Alternatively, each message segment may be divided according to a preset dividing number into several initial message groups, and the packets to exclude from each initial group are determined from the proportions of the message directions in it. For example, if 60% of the packets in a group have the positive direction and 40% have the negative direction, then, because 60% is greater than 40%, the packets with the negative direction are excluded from that group.
In some embodiments, a message group containing both message directions may be assigned a message direction, and packets whose direction differs from the assigned one are filtered out, so that in the end all encrypted message data packets in each message group share the same message direction.
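A minimal sketch of the majority-direction exclusion and the assigned-direction variant is given below. The `(direction, length)` tuple format, the `"+"`/`"-"` labels, and the tie-breaking rule are assumptions; the text does not say how a 50/50 split is resolved.

```python
from collections import Counter


def enforce_direction(group, assigned=None):
    """Keep only the packets whose direction matches the group's direction.

    group    -- list of (direction, length) tuples for one initial message group
    assigned -- direction to enforce; if None, the majority direction is used
                (Counter.most_common breaks ties by first occurrence)
    """
    if not group:
        return []
    if assigned is None:
        assigned = Counter(direction for direction, _ in group).most_common(1)[0][0]
    return [pkt for pkt in group if pkt[0] == assigned]


# 60% positive, 40% negative: the negative packets are dropped.
group = [("+", 100)] * 6 + [("-", 60)] * 4
kept = enforce_direction(group)
```

Passing `assigned` explicitly corresponds to the variant in which a direction is assigned to the group instead of being inferred from the majority.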
Referring to fig. 8, in some embodiments, determining the height of the message-group sub-images includes steps S301 to S304:
Step S301, in each message group, screening the encrypted message data packets in the message group according to a preset exclusion ratio.
In some embodiments, because WireGuard and other such encryption methods are VPN protocols operating in tunnel-encapsulation mode, removing the encapsulation header leaves the encrypted data of the original traffic. A preset clipping ratio can therefore be set according to the actual situation to clip the header of each encrypted message data packet in the message group.
In some embodiments, a preset exclusion ratio may be set to screen the encrypted message data packets in the message group. For example, with a preset exclusion ratio of 5% and 100 encrypted message data packets in the group, the packets are sorted by length (longest to shortest or shortest to longest), and 5% of the total is removed at each end, that is, 5 packets at the head and 5 at the tail. This removes the maximum and minimum values and keeps abnormal values from affecting the classification result.
Step S302, dividing the sum of the message lengths of the screened encrypted message data packets by the number of encrypted message data packets remaining in the message group to obtain the average message length.
In some embodiments, after the packets in each message group are screened, the lengths of all remaining encrypted message data packets are summed and divided by the number of those packets to obtain the average message length, which is used later to draw the segment image. In some embodiments, the average message length may be used directly as the height of the message-group sub-image.
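Steps S301 and S302 amount to a trimmed mean of the packet lengths. A sketch under the assumption that the same exclusion ratio is applied at both ends and that the number trimmed is rounded down:

```python
import math


def trimmed_average_length(lengths, exclude_ratio=0.05):
    """Average packet length after dropping `exclude_ratio` of the packets at
    each end of the length-sorted list (steps S301 and S302)."""
    if not 0 <= exclude_ratio < 0.5:
        raise ValueError("exclude_ratio must be in [0, 0.5)")
    ordered = sorted(lengths)
    # The tiny epsilon guards against float error: e.g. 100 * 0.29 evaluates
    # to slightly less than 29, and plain truncation would give 28.
    k = math.floor(len(ordered) * exclude_ratio + 1e-9)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("no packets left after trimming")
    return sum(kept) / len(kept)
```

For the 100-packet example in the text, `k` is 5, so the 5 shortest and 5 longest packets are discarded before averaging the remaining 90.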
Step S303, comparing the reference length, obtained by multiplying the average message length by the preset extraction ratio, with the message length of each encrypted message data packet in the message group to obtain a comparison result.
In some embodiments, the preset extraction ratio is an adjustable ratio of the valid bytes of each encrypted message data packet. To strip redundant parts from the drawn segment image and generate it directly from the valid part, the valid bytes of each packet may be extracted using the preset extraction ratio. For example, with a preset extraction ratio of 60%, the average message length is multiplied by 60% to obtain the reference length, which is compared with the length of each encrypted message data packet to determine whether every packet can be extracted in full to the reference length.
Step S304, determining the height of the message-group sub-image according to the comparison result.
In some embodiments, if the comparison result shows that the reference length is not greater than the length of any encrypted message data packet, every packet can be extracted to the reference length. If the comparison result shows that the reference length is greater than the length of some packet, extracting every packet to the reference length would leave that packet's extracted bytes shorter than the reference length, which would affect the drawing of the segment image. In that case, the length of the shortest encrypted message data packet in the group is compared with the reference length, and if the reference length is greater, the shortest packet length is used as the height of the message-group sub-image, so that the same number of bytes is extracted from every packet in the group.
Referring to fig. 9, in some embodiments, step S304 includes steps S401 to S402:
Step S401, if the comparison result indicates that no encrypted message data packet in the message group is shorter than the reference length, taking the reference length as the height of the message-group sub-image.
Step S402, if the comparison result indicates that the message group contains an encrypted message data packet shorter than the reference length, taking the length of the shortest such packet as the height of the message-group sub-image.
In some embodiments, the average packet length may be multiplied by the preset extraction ratio to obtain the header-byte length to extract from each encrypted message data packet in the group, and this length is used as the height of the message-group sub-image.
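In some embodiments, the height of the message-group sub-image is calculated as follows:

$$
M = \begin{cases}
\alpha \bar{L}, & \text{if } \alpha \bar{L} \le L_i \text{ for every encrypted message data packet } i \text{ in the group},\\
\min_i L_i, & \text{otherwise},
\end{cases}
$$

which is equivalent to $M = \min\left(\alpha \bar{L},\ \min_i L_i\right)$, wherein M represents the height of the message-group sub-image, $\bar{L}$ represents the average message length, $L_i$ represents the length of any one encrypted message data packet in the group, and $\alpha$ represents the preset proportion, which may be set to 60% or another value.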
It can be understood that, in the above formula, if the reference length obtained by multiplying the average packet length of the group by the preset extraction ratio is not greater than the length of any encrypted message data packet, header bytes of the reference length can be extracted from every packet in the group, and no packet yields fewer bytes than the reference length. If the reference length is greater than the shortest packet length in the group, the length of the shortest packet is used as the image height, so that the same number of bytes is extracted from every packet in the group. For example, if the average packet length multiplied by the preset ratio equals 7 but the packets in the group have lengths 6 and 4, 7 bytes cannot be extracted from the packet of length 4, so 4 is used as the height of the image.
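The height rule can be written as a one-line minimum. This sketch assumes the trimmed average has already been computed; the text does not say whether a fractional reference length is rounded, so a caller that needs a whole number of bytes would have to round it down itself.

```python
def subimage_height(lengths, avg_length, extract_ratio=0.6):
    """Height M of a message-group sub-image.

    The reference length is avg_length * extract_ratio. If every packet is at
    least that long, M is the reference length; otherwise M is the shortest
    packet length, so every packet can supply the same number of bytes.
    """
    if not lengths:
        raise ValueError("empty message group")
    reference = avg_length * extract_ratio
    return min(reference, min(lengths))
```

For the worked example (reference length 7, packet lengths 6 and 4), `subimage_height([6, 4], avg_length=7, extract_ratio=1.0)` returns 4.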
Referring to fig. 10, in some embodiments, the method for classifying the encrypted message data stream further includes steps S501 to S502:
Step S501, obtaining the plurality of message groups corresponding to each message segment after the encrypted message data packets have been screened.
In some embodiments, the screened message groups are message groups from which outliers have been removed according to a preset exclusion ratio, which may be set to 5%, 10%, or the like. Specifically, the encrypted message data packets may be sorted by message length, and packets are then removed at the head, at the tail, or at both ends according to the preset exclusion ratio. For example, if a message group contains 100 encrypted message data packets and the preset exclusion ratio is 5%, then after sorting, with the shortest packet at the head and the longest at the tail, 5 packets are cut from the head and 5 from the tail. After all message groups in a message segment have been trimmed, the color of each message-group sub-image can be determined from the packets in the group, so that the segment image shows how the packets in the group vary, which facilitates subsequent feature extraction and classification. Alternatively, after screening, the average message length of each group is computed, the header bytes of each packet are extracted to the length given by the average message length multiplied by the preset extraction ratio, and the color of the message-group sub-image is generated from the extracted header bytes.
Step S502, determining the color of the sub-image of the message group according to the encrypted message data packet contained in each message group.
In some embodiments, the image color of each message group may be generated from the extracted header bytes of the corresponding packets. In some embodiments, the header bytes may be mapped into the value range of the RGB components. Specifically, the value of a header byte may be used directly as the value of the RGB components; for example, if a header byte has the value 100, R, G, and B may all be set to 100, producing a gray pixel. In other embodiments, header byte values may be mapped into the RGB range according to certain rules. For example, a header byte value may be divided by 255 to obtain a fraction between 0 and 1, which is then multiplied by 255 to obtain the value of the RGB component, so that the generated colors span the full RGB value range.
Further, the colors generated from the encrypted message data packets of each message group are combined to obtain the colors of that group's part of the segment image, so that the variation pattern of the corresponding message group is displayed clearly and accurately.
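The direct byte-to-gray mapping described above can be sketched as follows. Because a byte already lies in 0 to 255, it can be used as an RGB component as-is. The function names are hypothetical, and the sketch covers only the grayscale example, not any other mapping rule.

```python
def bytes_to_gray_pixels(header: bytes):
    """Map each header byte b to the RGB triple (b, b, b), i.e. a gray pixel."""
    return [(b, b, b) for b in header]  # iterating bytes yields ints 0..255


def packet_column(payload: bytes, height: int):
    """Take the first `height` bytes of one packet and convert them to pixels."""
    if height > len(payload):
        raise ValueError("packet is shorter than the sub-image height")
    return bytes_to_gray_pixels(payload[:height])
```

Combined with the height rule, `packet_column(payload, M)` produces one equally tall pixel column per packet in a message group.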
Referring to fig. 11, in some embodiments, step S202 includes steps S601 to S602:
Step S601, in each message segment, classifying the message direction of each encrypted message data packet as either the sending direction or the receiving direction.
Step S602, grouping, in time order, consecutive encrypted message data packets in the sending direction or consecutive encrypted message data packets in the receiving direction, to obtain a plurality of message groups arranged in time order.
In some embodiments, the message direction includes a transmit direction and a receive direction. The message direction of each encrypted message data packet can be analyzed, and all encrypted message data packets are divided into a transmitting direction or a receiving direction.
For example, suppose the message directions of the encrypted message data packets in a message segment, in time order, are: packet 1, sending; packet 2, sending; packet 3, receiving; packet 4, receiving; packet 5, receiving. The division result is then: packets 1 and 2 form one message group, and packets 3, 4, and 5 form another.
It can be understood that processing consecutive packets with the same message direction in time order allows the final segment image to represent the message transmission trend, which helps classify message behaviors more accurately.
Referring to fig. 12, in some embodiments, extracting image features of each segment image includes steps S701 to S702:
Step S701, loading a pre-trained convolutional neural network.
In some embodiments, a convolutional neural network may be used to extract the image features of each segment image; other feature extraction networks may also be used. The convolutional neural network is trained in advance on a data set consisting of a large number of segment images and has good feature extraction capability.
Step S702, sequentially inputting each segment image into a convolutional neural network according to a time sequence to perform feature extraction, and obtaining image features corresponding to the segment images.
In some embodiments, each segment image can be sequentially input into a convolutional neural network according to a time sequence for feature extraction, and change information and important feature information of the segment images are captured to obtain image features of each segment image, so that the image features can be conveniently classified subsequently, and the classification efficiency is improved.
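To show what a convolutional feature extractor computes, here is a toy, dependency-free sketch: a single convolution layer with ReLU and global average pooling, applied to segment images in time order. The hand-picked kernels are placeholders standing in for the pre-trained network's learned weights; a real system would load a trained model through a deep learning framework.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation with no padding, the operation a CNN layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def extract_features(image, kernels):
    """One feature per kernel: convolution, ReLU, then global average pooling."""
    features = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        activations = [max(0.0, v) for row in fmap for v in row]
        features.append(sum(activations) / len(activations))
    return features


def features_in_time_order(segment_images, kernels):
    """Process segment images in time order, one feature vector per image."""
    return [extract_features(img, kernels) for img in segment_images]
```

Global average pooling gives every segment image a feature vector of the same length (one value per kernel), even though segment images can differ in width.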
Referring to fig. 13, in some embodiments, step S104 includes steps S801 to S802:
Step S801, obtaining the historical classification result produced after the image feature at the previous moment was input into the message classification network.
In some embodiments, the message classification network may be a Long Short-Term Memory (LSTM) network, which processes sequence data and captures the temporal dependencies of the input. Because the historical classification result can strongly influence the classification at the current moment, the historical classification result for the image feature at the previous moment is obtained in order to establish the contextual relationship among image features.
Step S802, sequentially inputting the image characteristics at each current moment and the corresponding historical classification results into a preset message classification network to classify the message behaviors until the last image characteristic of the encrypted message data stream is input into the message classification network, and obtaining the classification results of the message behaviors of the encrypted message data stream.
It will be appreciated that, because the data are encrypted, much of the information in an encrypted message data stream is not directly available. Part of this missing information can be compensated for by using the relationship between the historical classification result and the image feature at the current moment, which improves the accuracy of message behavior classification.
For example, if the encrypted message data stream has 5 message segments, each corresponding to 1 image feature, there are 5 image features. Following their time order, the first image feature (image feature 1) is input into the message classification network to obtain classification result 1; classification result 1 and image feature 2 are input to obtain classification result 2; classification result 2 and image feature 3 give classification result 3; classification result 3 and image feature 4 give classification result 4; and classification result 4 and image feature 5 are input to obtain the classification result of the encrypted message data stream.
In some embodiments, the message classification network takes the image feature at the current moment and the historical classification result from the previous moment as inputs to predict the classification result at the current moment, making full use of the temporal relationship between historical classification results and image features and improving the accuracy of image classification.
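The feedback loop in this example can be sketched independently of the network itself. Here `step` is a hypothetical callable standing in for the message classification network; for the first feature there is no historical result yet, so `initial` (None) is passed.

```python
def classify_stream(features, step, initial=None):
    """Chain classifications: result_t = step(result_{t-1}, feature_t).

    Returns the final result (the classification of the whole stream) and the
    list of per-step results.
    """
    history = []
    previous = initial
    for feature in features:
        previous = step(previous, feature)
        history.append(previous)
    return previous, history


# Toy stand-in for the network: accumulate a running total.
final, steps = classify_stream([1, 2, 3, 4, 5], lambda prev, f: (prev or 0) + f)
```

With five image features, `step` is called five times, and the output of the fifth call plays the role of the stream-level classification result.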
It will be appreciated that the message classification network may include an input gate, a forget gate, and an output gate, described as follows. The input gate determines how much of the input data can enter the memory cell: it weights the importance of the input through an activation function, and its output is multiplied element-wise with the input data, so that important information is selectively written into the memory cell. The forget gate determines which historical memories are discarded: it weights the importance of the previous memory through an activation function, and its output is multiplied element-wise with the previous memory, so that unimportant memories are selectively forgotten. The output gate determines what the memory cell outputs at the current time step: the information in the memory cell is mapped into a suitable range, and the output gate's activation, which weights its importance, is multiplied element-wise with it, so that the memory cell information is selectively output.
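The gate arithmetic described above is the standard LSTM cell update. A dependency-free sketch of one time step follows; the `params` layout (per-gate input weights `W`, recurrent weights `U`, and bias `b`) is an assumption for illustration, and the text does not specify how its "first" and "second classification information" map onto these quantities.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps gate name -> (W, U, b); W multiplies the input x and U the
    previous hidden state h_prev.
    """
    def pre(gate):
        W, U, b = params[gate]
        return [p + q + r for p, q, r in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = [sigmoid(z) for z in pre("input")]    # input gate: how much new info enters
    f = [sigmoid(z) for z in pre("forget")]   # forget gate: how much old memory stays
    o = [sigmoid(z) for z in pre("output")]   # output gate: how much memory is emitted
    g = [math.tanh(z) for z in pre("cell")]   # candidate memory content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # element-wise
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]  # tanh maps memory to (-1, 1)
    return h, c
```

With all weights and biases zero, every gate outputs 0.5 and the candidate is 0, so the new memory is exactly half of the previous memory; this makes the forget gate's element-wise scaling easy to check by hand.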
Referring to fig. 14, in some embodiments, step S802 includes steps S901 to S905:
Step S901, sequentially inputting the image feature at each current moment into the message classification network, and determining, through the input gate of the message classification network, first classification information to be retained from the image feature.
In some embodiments, the message classification network uses the input gate to control how strongly the input at the current moment affects the memory, thereby selectively remembering important image features. The image feature at each current moment is input into the message classification network, and the input gate determines the first classification information to be retained from it. In this way, information relevant to message classification is extracted from the current image feature for subsequent classification.
Step S902, inputting the first classification information and the historical classification result into the forget gate of the message classification network, and determining second classification information to be retained from the first classification information and the historical classification result.
In some embodiments, the forget gate is a key component of a Long Short-Term Memory (LSTM) network and decides which information is kept at the current time step. The first classification information and the historical classification result are input into the forget gate. Using the network's learned parameters and gating mechanism, the forget gate determines the second classification information to be retained, which is then passed on to the next layer or time step to continue the classification task. Because the forget gate combines the first classification information at the current moment with the earlier historical classification result, it can extract more comprehensive classification information and thus support more accurate classification.
Step S903, weighting the historical classification result and the second classification information to obtain weighted classification information.
In some embodiments, the message classification network may determine the output weights through the output gate and weight the historical classification result and the current classification information to obtain more accurate classification information.
It can be understood that the weights may be adjusted automatically by the message classification network according to the relative contributions of the historical classification result and the second classification information, or set by a technician as required. The historical classification result and the second classification information are then weighted with the adjusted weights to obtain more reasonable classification information. The weighted classification information may be understood as weights or labels that distinguish between different types of features.
Step S904, inputting the weighted classification information into the output gate for screening to obtain the current classification result of the message behavior of the encrypted message data stream at the current moment.
It will be appreciated that the output gate is a neural network mechanism for screening and controlling signals. Once the weighted classification information is input, the output gate screens it according to its gating mechanism, so that different types of messages can be distinguished and the current classification result of the message behavior of the encrypted message data stream at the current moment is obtained.
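One possible reading of steps S901 to S904 in terms of the standard LSTM equations follows. This correspondence is an interpretation that the text does not state. Here x_t is the image feature at time t, h_{t-1} and c_{t-1} carry the historical classification result and the memory, \sigma is the sigmoid function, and \odot is element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \qquad
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(S901: input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(S902: forget gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(S903: weighted combination)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \qquad
h_t = o_t \odot \tanh(c_t) && \text{(S904: output gate)}
\end{aligned}
```

Under this reading, the stream-level result after step S905 would be read from the final h_T, for example through a softmax layer. That readout is likewise an assumption.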
Step S905, continuing to input the not-yet-classified image features of the encrypted message data stream into the message classification network until the last image feature of the stream has been input, and outputting the classification result of the message behavior of the encrypted message data stream.
It can be understood that the remaining unclassified image features are input into the message classification network in time order. Each image feature is input together with the historical classification result of the previous moment to obtain a current classification result. That result is then input together with the image feature of the next moment, and so on until the last image feature of the encrypted message data stream has been input. The network then outputs the classification result of the message behavior of the stream. This ensures that every image feature takes part in classification and that the context among image features is taken into account, improving both the efficiency and the accuracy of message behavior classification.
Referring to fig. 15, in some embodiments, the packet classification network is trained by the following steps S1001 to S1004:
step S1001, a plurality of encrypted message data streams are acquired, and a training data set is formed according to the plurality of encrypted message data streams.
In some embodiments, to improve the classification capability of the message classification network, a large number of encrypted message data streams may be acquired to form the training data set. Illustratively, the acquired streams may be divided into a training set, a validation set, and a test set to facilitate subsequent training and parameter tuning of the message classification network.
Step S1002, preprocessing each encrypted message data stream in the training data set to obtain a plurality of message fragments and fragment images corresponding to the message fragments.
In some embodiments, each encrypted message data stream is preprocessed as follows. The stream is divided into a plurality of message segments in time order, and each message segment is divided into a plurality of message groups. Then, for each message segment, the height of each message group sub-image is determined from the message lengths of the encrypted packets in the group, its width from the number of packets in the group, its color from the bytes of those packets, and its direction from their message direction. The segment image corresponding to the message segment is generated from these sub-images. The detailed procedure is described above and is not repeated here.
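The segmentation and grouping step can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. The `Packet` fields, the "send"/"recv" direction labels, and the reading of the "preset dividing number" as a fixed number of packets per segment are all hypothetical. Groups are formed from runs of consecutive packets sharing a direction, as described in claim 6.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class Packet:
    timestamp: float     # capture time
    length: int          # packet length in bytes
    direction: str       # "send" or "recv" (assumed labels)
    header: bytes = b""  # leading header bytes, used later for colour


def split_into_segments(packets: List[Packet], packets_per_segment: int) -> List[List[Packet]]:
    """Sort by time, then cut into consecutive segments of a preset size.

    Treating the 'preset dividing number' as packets per segment is an assumption.
    """
    if packets_per_segment <= 0:
        raise ValueError("packets_per_segment must be positive")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    return [ordered[k:k + packets_per_segment] for k in range(0, len(ordered), packets_per_segment)]


def split_into_groups(segment: List[Packet]) -> List[List[Packet]]:
    """Group runs of consecutive packets that share a direction, keeping time order."""
    return [list(run) for _, run in groupby(segment, key=lambda p: p.direction)]
```

Because `itertools.groupby` only merges adjacent items, a segment that alternates send, recv, send yields three groups, which preserves the time order of the exchange. Each group's sub-image width is then simply `len(group)` and its direction `group[0].direction`.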
Step S1003, extracting image characteristics of each fragment image, and sequentially inputting a historical classification result of the image characteristics at the previous moment and the image characteristics at the current moment into a preset message classification network to classify the message behaviors, so as to obtain a first classification result of the message behaviors of the encrypted message data stream.
It can be appreciated that image feature extraction captures the spatial characteristics of the segment images, which helps distinguish different message behaviors. Inputting the historical classification result together with the current image feature lets the message classification network take the context of each message segment into account, so that fixed packet formats can be recognized from contextual information, which helps the network classify quickly.
Step S1004, calculating a loss value of the first classification result according to a preset loss function, and performing parameter adjustment on the message classification network according to the loss value to obtain a trained message classification network.
In some embodiments, the first classification result may be compared with the correct classification result in the validation set, and the loss value of the first classification result is calculated through a preset loss function. The parameters of the message classification network are adjusted according to the loss value and the network is trained again. This adjustment continues throughout training until the network converges or a preset number of training iterations is reached, at which point training is complete and the trained message classification network is obtained. The loss function may be any general-purpose loss function chosen as required.
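The text leaves the loss function open. Cross-entropy is a common choice for classification, so the following Python sketch assumes it. The function names are illustrative. The gradient of the cross-entropy with respect to the logits, softmax(z) minus the one-hot target, is what a gradient-based optimizer would propagate back into the network during parameter adjustment.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target: int) -> float:
    """Loss for one stream: negative log-probability of its labelled class."""
    return -math.log(softmax(logits)[target])


def batch_loss(batch_logits: List[List[float]], targets: List[int]) -> float:
    """Mean cross-entropy over a batch of labelled streams."""
    if len(batch_logits) != len(targets) or not targets:
        raise ValueError("need one target per entry in a non-empty batch")
    return sum(cross_entropy(z, t) for z, t in zip(batch_logits, targets)) / len(targets)


def grad_logits(logits: List[float], target: int) -> List[float]:
    """Gradient of cross_entropy with respect to the logits: softmax minus one-hot."""
    grad = softmax(logits)
    grad[target] -= 1.0
    return grad
```

A confident correct prediction such as logits [5.0, 0.0] for class 0 gives a loss near 0.007, while the reversed logits give a loss near 5.007. The loss therefore penalizes confident mistakes heavily.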
Referring to fig. 16, the method for classifying encrypted message data streams of the present application is summarized as follows.
Illustratively, an encrypted message data stream is obtained and divided into a plurality of message segments by time, and message groups are then determined according to message direction. In each message group, the number of encrypted packets is used as the width of the message group sub-image. The average length of the encrypted packets in the group is calculated, the main bytes up to that average length are extracted, and the length of the extracted bytes is used as the height of the sub-image. The header bytes of the encrypted packets in the group are extracted and converted into the color of the sub-image, and the message direction is used as the direction of the sub-image. The segment image of the corresponding message segment is then drawn from the plurality of sub-images. Next, features are extracted from the segment images and the extracted image features are input in sequence into the message classification network for classification. Following the time order, each current image feature is input together with the historical classification result to obtain the current classification result. Once the last image feature of the encrypted message data stream and the historical classification result have been input, the classification result of the message behavior of the encrypted message data stream is obtained.
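Claims 3 to 5 refine how a message group sub-image is sized and colored. The Python sketch below is one hedged reading of those rules, and it relies on several assumptions that the text does not settle. The exclusion proportion is taken to drop the longest packets, and the average is taken over the packets kept. When several packets are shorter than the reference length, the shortest one sets the height. The color is a hypothetical RGB value formed from the per-position mean of the first three header bytes. All names and default ratios are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SubImage:
    width: int                   # number of packets in the message group
    height: int                  # number of leading bytes extracted per packet
    color: Tuple[int, int, int]  # hypothetical RGB value derived from header bytes
    direction: str               # direction of the group, e.g. "send" or "recv"


def sub_image_height(lengths: Sequence[int], exclude_ratio: float, extract_ratio: float) -> int:
    """Height rule of claims 3-4 under the assumptions stated in the lead-in."""
    if not lengths:
        raise ValueError("a message group must contain at least one packet")
    ordered = sorted(lengths)
    n_drop = int(len(ordered) * exclude_ratio)          # assumed: drop the longest packets
    kept = ordered[: len(ordered) - n_drop] or ordered  # never screen out every packet
    average = sum(kept) / len(kept)                     # average message length
    reference = int(average * extract_ratio)            # reference length
    shorter = [n for n in kept if n < reference]
    # A packet shorter than the reference cannot supply `reference` bytes,
    # so the shortest such packet sets the height (assumed tie-breaking rule).
    return min(shorter) if shorter else reference


def header_color(headers: Sequence[bytes]) -> Tuple[int, int, int]:
    """Hypothetical colour mapping: per-position mean of the first three header bytes."""
    if not headers:
        return (0, 0, 0)
    padded = [(h + bytes(3))[:3] for h in headers]  # pad short headers with zeros
    r, g, b = (sum(h[k] for h in padded) // len(padded) for k in range(3))
    return (r, g, b)


def build_sub_image(
    lengths: Sequence[int],
    headers: Sequence[bytes],
    direction: str,
    exclude_ratio: float = 0.1,
    extract_ratio: float = 0.8,
) -> SubImage:
    """Assemble one message group sub-image descriptor."""
    return SubImage(
        width=len(lengths),
        height=sub_image_height(lengths, exclude_ratio, extract_ratio),
        color=header_color(headers),
        direction=direction,
    )
```

Under these assumptions the height rule reduces to the smaller of the reference length and the shortest kept packet. This plausibly matches the idea of extracting the same number of leading bytes from every packet in the group.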
Referring to fig. 17, the embodiment of the present application further provides a system for classifying encrypted message data streams, which may implement the method for classifying encrypted message data streams, where the system for classifying encrypted message data streams includes:
an encrypted message data stream acquiring module 1701, configured to acquire an encrypted message data stream, where the encrypted message data stream includes a plurality of encrypted message data packets in succession;
the message group dividing module 1702 is configured to divide the encrypted message data stream into a plurality of message segments according to a time sequence, and divide each message segment into a plurality of message groups according to the message direction of the encrypted message data packets, wherein the message directions of the encrypted message data packets in each message group are consistent;
a segment image generating module 1703, configured to determine, for each message segment, in each message group corresponding to the message segment, a height of a sub-image of the message group according to a message length of an encrypted message packet included in each message group, determine a width of the sub-image of the message group according to a number of messages, determine a direction of the sub-image of the message group according to a direction of the message, and generate a segment image according to a plurality of sub-images of the message group;
the classification result obtaining module 1704 is configured to extract image features of each segment image, and sequentially input a historical classification result of the image feature at a previous time and an image feature at a current time into a preset packet classification network to perform packet behavior classification, so as to obtain a classification result of the packet behavior of the encrypted packet data stream.
The specific implementation of the encrypted message data stream classification system is basically the same as the specific embodiment of the above encrypted message data stream classification method, and will not be described herein. On the premise of meeting the requirements of the embodiment of the application, other functional modules can be further arranged in the encrypted message data stream classification system so as to realize the encrypted message data stream classification method in the embodiment.
The embodiment of the application also provides an electronic device comprising a memory and a processor. The memory stores a computer program, and the processor implements the above encrypted message data stream classification method when executing the computer program. The electronic device may be any intelligent terminal, such as a tablet computer or a vehicle-mounted computer.
Referring to fig. 18, fig. 18 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:
the processor 1801 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), one or more integrated circuits, or the like, and executes the related programs to implement the technical solutions provided by the embodiments of the present application;
the memory 1802 may be implemented as read-only memory (Read Only Memory, ROM), static storage, dynamic storage, random access memory (Random Access Memory, RAM), or the like. The memory 1802 may store an operating system and other application programs. When the technical solutions provided in the embodiments of the present application are implemented in software or firmware, the relevant program code is stored in the memory 1802 and invoked by the processor 1801 to execute the encrypted message data stream classification method of the embodiments of the present application;
an input/output interface 1803 for implementing information input and output;
the communication interface 1804 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
a bus 1805 for transferring information between components of the device (e.g., processor 1801, memory 1802, input/output interfaces 1803, and communication interfaces 1804);
wherein the processor 1801, memory 1802, input/output interface 1803, and communication interface 1804 enable communication connection among each other within the device via bus 1805.
The embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above encrypted message data stream classification method.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described herein are intended to explain the technical solutions of the embodiments of the present application more clearly and do not limit them. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the technical solutions shown in the figures do not limit the embodiments of the present application, which may include more or fewer steps than shown, combine certain steps, or use different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one (item)" and "a number" mean one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of" and similar expressions mean any combination of the listed items, including a single item or multiple items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the above elements is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented as software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied, in essence or in whole or in part, in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing a program, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
Preferred embodiments of the present application have been described above with reference to the accompanying drawings, which do not thereby limit the scope of the claims of the embodiments of the present application. Any modification, equivalent substitution, or improvement made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (13)

1. An encrypted message data stream classification method, which is characterized by comprising the following steps:
obtaining an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of continuous encrypted message data packets;
dividing the encrypted message data stream into a plurality of message fragments according to the time sequence, and dividing each message fragment into a plurality of message groups according to the message direction of the encrypted message data packet; wherein the message directions of the encrypted message data packets in each message group are consistent;
for each message segment, in each message group corresponding to the message segment, determining the height of the message group sub-image according to the message lengths of the encrypted message data packets contained in the message group, determining the width of the message group sub-image according to the number of messages, determining the direction of the message group sub-image according to the message direction, and generating a segment image from the plurality of message group sub-images;
Extracting image characteristics of each fragment image, and sequentially inputting a historical classification result of the image characteristics at the previous moment and the image characteristics at the current moment into a preset message classification network to classify message behaviors, so as to obtain a classification result of the message behaviors of the encrypted message data stream.
2. The method for classifying encrypted message data streams according to claim 1, wherein the dividing the encrypted message data streams into a plurality of message segments according to the time sequence, and dividing each of the message segments into a plurality of message groups according to the message direction of the encrypted message data packets, comprises:
in each encrypted message data stream, arranging the encrypted message data packets according to a time sequence, and dividing the encrypted message data packets according to a preset dividing number to obtain a plurality of message fragments;
and, in each message segment, distinguishing the message direction of each encrypted message data packet, and dividing the encrypted message data packets with the same message direction into the same message group to obtain a plurality of message groups.
3. The method for classifying an encrypted packet data stream according to claim 1, wherein determining the height of the packet sub-image includes:
screening, in each message group, the encrypted message data packets in the message group according to a preset exclusion proportion;
summing the message lengths of the encrypted message data packets remaining after screening and dividing the sum by the number of encrypted message data packets in the message group to obtain an average message length;
comparing a reference length, obtained by multiplying the average message length by a preset extraction proportion, with the message length of each encrypted message data packet in the message group to obtain a comparison result;
and determining the height of the sub-images of the message group according to the comparison result.
4. The method for classifying an encrypted packet data stream according to claim 3, wherein determining the height of the packet sub-image according to the comparison result comprises:
if the comparison result indicates that the encrypted message data packet with the message length shorter than the reference length does not exist in the message group, the reference length is taken as the height of the sub-image of the message group;
and if the comparison result represents that the encrypted message data packet with the message length shorter than the reference length exists in the message group, taking the message length of the corresponding encrypted message data packet as the height of the sub-image of the message group.
5. A method of classifying an encrypted message data stream according to claim 3, wherein the method further comprises:
acquiring a plurality of message groups corresponding to each message segment after screening the encrypted message data packet;
and determining the color of the sub-images of the message group according to the encrypted message data packet contained in each message group.
6. The method for classifying an encrypted message data stream according to claim 2, wherein the message direction includes a sending direction and a receiving direction; and the distinguishing, in each message segment, of the message direction of each encrypted message data packet and the dividing of encrypted message data packets with the same message direction into the same message group to obtain a plurality of message groups include:
in each message segment, classifying the message direction of each encrypted message data packet as the sending direction or the receiving direction;
grouping, in time order, consecutive encrypted message data packets in the sending direction or consecutive encrypted message data packets in the receiving direction to obtain a plurality of message groups arranged in time order.
7. The method for classifying an encrypted message data stream according to claim 1, wherein the extracting image features of each of the segment images includes:
loading a pre-trained convolutional neural network;
and sequentially inputting the fragment images into the convolutional neural network according to a time sequence to perform feature extraction, so as to obtain image features corresponding to the fragment images.
8. The method for classifying encrypted message data streams according to claim 1, wherein the step of sequentially inputting the historical classification result of the image feature at the previous time and the image feature at the current time into a preset message classification network to classify the message behaviors to obtain the classification result of the message behaviors of the encrypted message data streams includes:
acquiring a history classification result obtained after the image characteristic at the previous moment is input to the message classification network;
and sequentially inputting the image characteristics at each current moment and the corresponding historical classification results into a preset message classification network to classify the message behaviors until the last image characteristic of the encrypted message data stream is input into the message classification network, so as to obtain the classification results of the message behaviors of the encrypted message data stream.
9. The method for classifying encrypted message data streams according to claim 8, wherein the sequentially inputting the image feature and the corresponding historical classification result at each current time into a preset message classification network to classify the message behaviors until the last image feature of the encrypted message data stream is input into the message classification network, to obtain the classification result of the message behaviors of the encrypted message data stream, includes:
sequentially inputting the image characteristics at each current moment into the message classification network, and determining first classification information to be reserved from the image characteristics through an input gate of the message classification network;
inputting the first classification information and the historical classification result to a forgetting gate of the message classification network, and determining second classification information to be reserved from the first classification information and the historical classification result;
weighting the historical classification result and the second classification information to obtain weighted classification information;
the weighted classification information is input to an output gate for screening, and a current classification result of the encrypted message data stream message behavior at the current moment is obtained;
And continuing to input the image features of the encrypted message data stream which are not classified into the message classification network until the last image feature of the encrypted message data stream is input into the message classification network, and outputting a classification result of the message behavior of the encrypted message data stream.
10. The method for classifying encrypted message data streams according to claim 9, characterized in that the message classification network is trained by the following steps:
acquiring a plurality of encrypted message data streams, and forming a training data set according to the encrypted message data streams;
preprocessing each encrypted message data stream in the training data set to obtain a plurality of message fragments and fragment images corresponding to the message fragments;
extracting the image features of each fragment image, and sequentially inputting the historical classification result of the image feature at the previous moment and the image feature at the current moment into a preset message classification network to classify message behaviors, so as to obtain a first classification result of the message behaviors of the encrypted message data stream;
and calculating a loss value of the first classification result according to a preset loss function, and carrying out parameter adjustment on the message classification network according to the loss value to obtain a trained message classification network.
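Claim 10 leaves the loss function and the parameter-adjustment rule unspecified. As a minimal sketch under stated assumptions, the Python below uses softmax cross-entropy and plain stochastic gradient descent, and for brevity trains only a hypothetical linear classification head on fixed final states (for example, states produced by a recurrent cell) instead of back-propagating through the whole message classification network. `train_softmax_head`, `logits`, and `predict` are illustrative names, not part of the patent.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (final recurrent state, class label)


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    return -math.log(probs[label])


def logits(w: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    return [sum(wk * xj for wk, xj in zip(row, x)) + bk for row, bk in zip(w, b)]


def train_softmax_head(samples: List[Sample], num_classes: int,
                       lr: float = 0.5, epochs: int = 50):
    dim = len(samples[0][0])
    w = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    epoch_losses: List[float] = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            p = softmax(logits(w, b, x))
            total += cross_entropy(p, y)  # loss value of this classification result
            for k in range(num_classes):
                # dLoss/dlogit_k for softmax + cross-entropy is p_k minus the one-hot label
                g = p[k] - (1.0 if k == y else 0.0)
                b[k] -= lr * g
                for j in range(dim):
                    w[k][j] -= lr * g * x[j]  # parameter adjustment by gradient descent
        epoch_losses.append(total / len(samples))
    return w, b, epoch_losses


def predict(w: List[List[float]], b: List[float], x: Sequence[float]) -> int:
    z = logits(w, b, x)
    return max(range(len(z)), key=lambda k: z[k])


if __name__ == "__main__":
    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([0.9, 0.1], 0), ([0.2, 0.8], 1)]
    w, b, losses = train_softmax_head(data, num_classes=2)
    print(f"first epoch loss {losses[0]:.3f}, last epoch loss {losses[-1]:.3f}")
    print(predict(w, b, [0.8, 0.3]))
```

A full implementation would normally propagate the same loss back through time into the gate weights as well; the head-only version keeps the gradient in closed form so the update rule stays easy to check.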
11. An encrypted message data stream classification system, the system comprising:
the encrypted message data stream acquisition module is used for acquiring an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of continuous encrypted message data packets;
the message group dividing module is used for dividing the encrypted message data stream into a plurality of message fragments according to the time sequence and dividing each message fragment into a plurality of message groups according to the message direction of the encrypted message data packet; wherein the message directions of the encrypted message data packets in each message group are consistent;
a fragment image generating module, configured to, for each message fragment and for each message group corresponding to that message fragment, determine the height of the sub-image of the message group according to the message lengths of the encrypted message data packets contained in the message group, determine the width of the sub-image of the message group according to the number of messages, determine the direction of the sub-image of the message group according to the message direction, and generate a fragment image from the sub-images of the plurality of message groups;
the classification result acquisition module is used for extracting the image features of each fragment image, and sequentially inputting the historical classification result of the image feature at the previous moment and the image feature at the current moment into a preset message classification network to classify the message behaviors, so as to obtain the classification result of the message behaviors of the encrypted message data stream.
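The preprocessing performed by the message group dividing module and the fragment image generating module can be illustrated with a small Python sketch. The claim does not state how fragments are cut, how packet lengths map to pixel height, or how direction is drawn, so the sketch assumes a fixed time window (`window`), runs of consecutive same-direction packets as message groups, a sub-image height of ceil(largest length / `len_scale`) pixels, one column per packet, and outbound groups drawn above a center line with inbound groups below it. `Packet`, `split_fragments`, `split_groups`, `sub_image_shape`, and `fragment_image` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    timestamp: float  # seconds since capture start
    length: int       # packet length in bytes (observable even when the payload is encrypted)
    direction: int    # +1 = client to server, -1 = server to client (assumed encoding)


def split_fragments(packets: List[Packet], window: float) -> List[List[Packet]]:
    """Cut a time-ordered stream into fragments spanning less than `window` seconds (assumed rule)."""
    fragments: List[List[Packet]] = []
    current: List[Packet] = []
    start = 0.0
    for p in packets:
        if current and p.timestamp - start >= window:
            fragments.append(current)
            current = []
        if not current:
            start = p.timestamp  # a new fragment starts at this packet
        current.append(p)
    if current:
        fragments.append(current)
    return fragments


def split_groups(fragment: List[Packet]) -> List[List[Packet]]:
    """Group consecutive packets that share one direction."""
    groups: List[List[Packet]] = []
    for p in fragment:
        if groups and groups[-1][-1].direction == p.direction:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups


def sub_image_shape(group: List[Packet], len_scale: int) -> Tuple[int, int]:
    """Height from the largest packet length (len_scale bytes per pixel), width from packet count."""
    height = max(1, -(-max(p.length for p in group) // len_scale))  # ceiling division
    return height, len(group)


def fragment_image(fragment: List[Packet], len_scale: int = 500) -> List[List[int]]:
    """Place sub-images left to right: outbound above a center line, inbound below it."""
    shapes = [(sub_image_shape(g, len_scale), g[0].direction) for g in split_groups(fragment)]
    half = max(h for (h, _), _ in shapes)
    width = sum(w for (_, w), _ in shapes)
    image = [[0] * width for _ in range(2 * half)]
    col = 0
    for (h, w), direction in shapes:
        rows = range(half - h, half) if direction > 0 else range(half, half + h)
        for r in rows:
            for c in range(col, col + w):
                image[r][c] = 1
        col += w
    return image


if __name__ == "__main__":
    stream = [Packet(0.0, 300, 1), Packet(0.1, 1500, -1), Packet(0.2, 1400, -1),
              Packet(0.3, 100, 1), Packet(2.5, 200, 1)]
    for frag in split_fragments(stream, window=1.0):
        for row in fragment_image(frag):
            print("".join("#" if v else "." for v in row))
        print()
```

The sketch reads only packet lengths, counts, directions, and timestamps, none of which require decrypting the payload.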
12. An electronic device comprising a memory storing a computer program and a processor implementing the method of classifying an encrypted message data stream according to any one of claims 1 to 10 when the computer program is executed by the processor.
13. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the encrypted message data stream classification method of any one of claims 1 to 10.
CN202311362322.6A 2023-10-20 2023-10-20 Method, system, electronic equipment and storage medium for classifying encrypted message data streams Active CN117097577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311362322.6A CN117097577B (en) 2023-10-20 2023-10-20 Method, system, electronic equipment and storage medium for classifying encrypted message data streams

Publications (2)

Publication Number Publication Date
CN117097577A CN117097577A (en) 2023-11-21
CN117097577B true CN117097577B (en) 2024-01-09

Family

ID=88783370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311362322.6A Active CN117097577B (en) 2023-10-20 2023-10-20 Method, system, electronic equipment and storage medium for classifying encrypted message data streams

Country Status (1)

Country Link
CN (1) CN117097577B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905328A (en) * 2017-12-08 2019-06-18 华为技术有限公司 The recognition methods of data flow and device
CN113452810A (en) * 2021-07-08 2021-09-28 恒安嘉新(北京)科技股份公司 Traffic classification method, device, equipment and medium
CN115314240A (en) * 2022-06-22 2022-11-08 国家计算机网络与信息安全管理中心 Data processing method for encryption abnormal flow identification
CN115603980A (en) * 2022-09-30 2023-01-13 山石网科通信技术股份有限公司(Cn) Data packet aggregation method and device and electronic equipment
CN116074087A (en) * 2023-01-17 2023-05-05 哈尔滨工业大学 Encryption traffic classification method based on network traffic context characterization, electronic equipment and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10305928B2 (en) * 2015-05-26 2019-05-28 Cisco Technology, Inc. Detection of malware and malicious applications

Non-Patent Citations (2)

Title
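Application of a GBDT and LR fusion model in encrypted traffic identification; Wang Yao et al.; Computer and Modernization (03); pp. 93-97 *
Contour format feature extraction method based on discrete sequence messages; Li Yang et al.; Journal of Information Engineering University; 19(02); pp. 10-15 *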

Also Published As

Publication number Publication date
CN117097577A (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN107864168B (en) Method and system for classifying network data streams
CN111211980B (en) Transmission link management method, transmission link management device, electronic equipment and storage medium
CN107547300B (en) Network quality detection method and device
CN112949702B (en) Network malicious encryption traffic identification method and system
US20220414264A1 (en) Privacy transformations in data analytics
CN110611640A (en) DNS protocol hidden channel detection method based on random forest
CN112769633B (en) Proxy traffic detection method and device, electronic equipment and readable storage medium
CN107770132A (en) A kind of method and device detected to algorithm generation domain name
CN113765846B (en) Intelligent detection and response method and device for network abnormal behaviors and electronic equipment
CN111586075B (en) Hidden channel detection method based on multi-scale stream analysis technology
CN117097577B (en) Method, system, electronic equipment and storage medium for classifying encrypted message data streams
CN106576072B (en) Information processing unit and information processing method
CN110213292B (en) Data sending method and device and data receiving method and device
CN109327404B (en) P2P prediction method and system based on naive Bayes classification algorithm, server and medium
CN112671662A (en) Data stream acceleration method, electronic device, and storage medium
JP6943105B2 (en) Information processing systems, information processing devices, and programs
CN108989244B (en) Data processing method, data processing device, storage medium and electronic equipment
EP3629577A1 (en) Data transmission method, camera and electronic device
CN112738808B (en) DDoS attack detection method in wireless network, cloud server and mobile terminal
CN112468285B (en) Data processing method and device based on privacy protection and server
CN112615713A (en) Detection method and device of hidden channel, readable storage medium and electronic equipment
CN112671670A (en) VR video service identification method and device, intelligent terminal and storage medium
CN114095364B (en) Network congestion control method and device
EP2854025A2 (en) Information processing apparatus, information processing method, and information processing program
CN115150165B (en) Flow identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant