CN117097577B - Method, system, electronic equipment and storage medium for classifying encrypted message data streams

Info

Publication number: CN117097577B
Application number: CN202311362322.6A
Authority: CN
Inventors: 马增协; 胡宁; 韩伟红; 贾焰; 程运财; 梁都成
Original assignee: Peng Cheng Laboratory
Current assignee: Peng Cheng Laboratory
Priority date: 2023-10-20
Filing date: 2023-10-20
Publication date: 2024-01-09
Anticipated expiration: 2043-10-20
Also published as: CN117097577A

Abstract

Embodiments of the present application provide a method, a system, an electronic device and a storage medium for classifying encrypted message data streams. The method comprises: obtaining an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of consecutive encrypted message packets; dividing the encrypted message data stream into a plurality of message fragments in chronological order, and dividing each message fragment into a plurality of message groups according to message direction, wherein the encrypted message packets within each message group have the same message direction; for each message group, determining the height of a message group sub-image from the message lengths of the encrypted message packets, the width of the sub-image from the message count, and the direction of the sub-image from the message direction, and generating a fragment image from the plurality of message group sub-images; and extracting image features of the fragment images, and inputting the historical classification result of the image feature at the previous moment together with the image feature at the current moment into a message classification network to classify message behavior, thereby obtaining a classification result of the message behavior.

Description

Method, system, electronic device and storage medium for classifying encrypted message data streams

Technical Field

The present application relates to the technical field of data stream classification, and in particular to a method, a system, an electronic device and a storage medium for classifying encrypted message data streams.

Background

Virtual private network (VPN) traffic identification is the process of identifying and classifying data streams transmitted over a VPN. In general, it can be performed by analyzing packet header information, protocol type, port number and the like. VPN traffic identification helps network administrators monitor and manage VPN usage so as to ensure network security and performance. When encrypted messages are transmitted over a VPN, a preliminary analysis can be made from the message lengths and message counts of the encrypted messages to determine their type and characteristics.

In the related art, encrypted message data streams are usually transmitted through a secure encrypted tunnel, so that five-tuple information such as the plaintext destination address, source address and port number cannot be read directly from the encrypted messages. Different flows or sessions therefore cannot be distinguished directly from the encrypted messages, and the entire original Internet Protocol (IP) packet is protected. However, the related art also has a drawback: classification relies only on statistical information about the data stream, such as message length and message count, and ignores both the information transmission path of the messages and the contextual association between encrypted messages within a short period of time. As a result, it is difficult to distinguish fixed packet formats directly from the encrypted messages, which reduces the efficiency of classifying encrypted message data streams and makes the final classification result inaccurate.

Summary of the Invention

The main purpose of the embodiments of the present application is to propose a method, a system, an electronic device and a storage medium for classifying encrypted message data streams, which can solve the problems of low classification efficiency and inaccurate classification results for encrypted message data streams.

To achieve the above purpose, a first aspect of the embodiments of the present application proposes a method for classifying encrypted message data streams. The method includes: obtaining an encrypted message data stream, wherein the encrypted message data stream includes a plurality of consecutive encrypted message packets; dividing the encrypted message data stream into a plurality of message fragments in chronological order, and dividing each message fragment into a plurality of message groups according to the message direction of the encrypted message packets, wherein the encrypted message packets within each message group have the same message direction; for each message fragment, in each message group corresponding to the message fragment, determining the height of a message group sub-image from the message lengths of the encrypted message packets contained in the message group, the width of the message group sub-image from the message count, and the direction of the message group sub-image from the message direction, and generating a fragment image from the plurality of message group sub-images; and extracting image features of each fragment image, and sequentially inputting the historical classification result of the image feature at the previous moment and the image feature at the current moment into a preset message classification network to classify message behavior, thereby obtaining a classification result of the message behavior of the encrypted message data stream.

According to some embodiments of the present application, dividing the encrypted message data stream into a plurality of message fragments in chronological order, and dividing each message fragment into a plurality of message groups according to the message direction of the encrypted message packets, includes: in each encrypted message data stream, arranging the encrypted message packets in chronological order, and dividing the encrypted message packets according to a preset division number to obtain a plurality of message fragments; and within each message fragment, distinguishing the message direction of each encrypted message packet, and placing encrypted message packets with the same message direction into the same message group to obtain a plurality of message groups.
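The fragment-splitting step can be sketched as follows. This is only an illustration under stated assumptions: the `Packet` record, its field names and the helper `split_into_fragments` are hypothetical, and the "preset division number" is read here as the number of fragments, with packets distributed as evenly as possible. The text does not fix that interpretation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical record for one encrypted message packet."""
    timestamp: float  # capture time in seconds
    length: int       # encrypted message length in bytes
    direction: str    # "send" or "recv"


def split_into_fragments(packets: List[Packet], num_fragments: int) -> List[List[Packet]]:
    """Sort packets by time and cut them into contiguous, nearly equal fragments.

    Assumption: the preset division number is the number of fragments.
    Empty fragments are dropped when there are fewer packets than fragments.
    """
    if num_fragments <= 0:
        raise ValueError("num_fragments must be positive")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    base, extra = divmod(len(ordered), num_fragments)
    fragments: List[List[Packet]] = []
    start = 0
    for i in range(num_fragments):
        size = base + (1 if i < extra else 0)  # first `extra` fragments get one more
        if size:
            fragments.append(ordered[start:start + size])
        start += size
    return fragments
```

A split by fixed time window or by unequal ratios would fit the same interface; only the way `size` is chosen would change.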

According to some embodiments of the present application, determining the height of the message group sub-image includes: in each message group, screening the encrypted message packets in the message group according to a preset exclusion ratio; summing the message lengths of the encrypted message packets that remain after screening and dividing the sum by the number of encrypted message packets in the message group to obtain an average message length; multiplying the average message length by a preset extraction ratio to obtain a reference length, and comparing the reference length with the message length of each encrypted message packet in the message group to obtain a comparison result; and determining the height of the message group sub-image according to the comparison result.

According to some embodiments of the present application, determining the height of the message group sub-image according to the comparison result includes: if the comparison result indicates that no encrypted message packet in the message group has a message length shorter than the reference length, using the reference length as the height of the message group sub-image; and if the comparison result indicates that the message group contains an encrypted message packet whose message length is shorter than the reference length, using the message length of that encrypted message packet as the height of the message group sub-image.
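The height rule can be illustrated with a small sketch. Several details are assumptions, because the text leaves them open: the screening step is taken to drop the longest `exclusion_ratio` share of packets, the average is taken over the packets that remain, and when several packets are shorter than the reference length the shortest one is used. The function name and default ratios are hypothetical.

```python
from typing import Sequence


def sub_image_height(lengths: Sequence[int],
                     exclusion_ratio: float = 0.1,
                     extraction_ratio: float = 0.8) -> float:
    """Return the sub-image height for one message group.

    Assumed screening rule: drop the longest `exclusion_ratio` share of packets.
    """
    if not lengths:
        raise ValueError("a message group must contain at least one packet")
    ordered = sorted(lengths)
    dropped = int(len(ordered) * exclusion_ratio)
    kept = ordered[:max(1, len(ordered) - dropped)]

    average = sum(kept) / len(kept)          # average message length after screening
    reference = average * extraction_ratio   # reference length

    shorter = [n for n in kept if n < reference]
    if shorter:
        # Assumption: with several candidates, take the shortest packet.
        return float(min(shorter))
    return reference
```

For example, four packets of 100 bytes with `exclusion_ratio=0` and `extraction_ratio=0.8` give a reference length of 80, and since no packet is shorter than 80 the height is the reference length.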

According to some embodiments of the present application, the method further includes: obtaining, for each message fragment, the plurality of message groups that remain after the encrypted message packets have been screened; and determining the color of the message group sub-image according to the encrypted message packets contained in each message group.

According to some embodiments of the present application, the message direction includes a sending direction and a receiving direction. Distinguishing, within each message fragment, the message direction of each encrypted message packet, and placing encrypted message packets with the same message direction into the same message group to obtain a plurality of message groups, includes: within each message fragment, identifying the message direction of each encrypted message packet as the sending direction or the receiving direction; and grouping, in chronological order, consecutive encrypted message packets in the sending direction or consecutive encrypted message packets in the receiving direction, to obtain a plurality of message groups arranged in chronological order.

According to some embodiments of the present application, extracting the image features of each fragment image includes: loading a pre-trained convolutional neural network; and inputting the fragment images into the convolutional neural network in chronological order for feature extraction to obtain the image feature corresponding to each fragment image.

According to some embodiments of the present application, sequentially inputting the historical classification result of the image feature at the previous moment and the image feature at the current moment into the preset message classification network to classify message behavior, and obtaining the classification result of the message behavior of the encrypted message data stream, includes: obtaining the historical classification result produced after the image feature at the previous moment was input into the message classification network; and sequentially inputting the image feature at each current moment and the corresponding historical classification result into the preset message classification network to classify message behavior, until the last image feature of the encrypted message data stream has been input into the message classification network, thereby obtaining the classification result of the message behavior of the encrypted message data stream.
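The sequential use of the previous result can be sketched as a simple recurrence. The sketch below is not the message classification network itself: `classify_stream` only shows how each step receives the current feature together with the previous result, and `blend_step` is a toy stand-in (an assumption for illustration) that treats each feature as a vector of per-class scores.

```python
from typing import Callable, Iterable, List, Sequence

Scores = List[float]


def classify_stream(features: Iterable[Sequence[float]],
                    step: Callable[[Sequence[float], Scores], Scores],
                    initial: Scores) -> Scores:
    """Feed features in time order; each step also sees the previous result."""
    result = list(initial)
    for feature in features:
        result = step(feature, result)  # history + current feature -> new result
    return result  # result after the last feature of the stream


def blend_step(feature: Sequence[float], previous: Scores, alpha: float = 0.5) -> Scores:
    """Toy stand-in for the network: mix the previous scores with new evidence."""
    return [alpha * p + (1.0 - alpha) * f for p, f in zip(previous, feature)]


if __name__ == "__main__":
    stream = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(classify_stream(stream, blend_step, [0.0, 0.0]))
```

In a real system the `step` callable would wrap the trained network; the loop structure is the only part taken from the description.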

According to some embodiments of the present application, sequentially inputting the image feature at each current moment and the corresponding historical classification result into the preset message classification network to classify message behavior, until the last image feature of the encrypted message data stream has been input into the message classification network, and obtaining the classification result of the message behavior of the encrypted message data stream, includes: sequentially inputting the image feature at each current moment into the message classification network, and determining, through an input gate of the message classification network, first classification information to be retained from the image feature; inputting the first classification information and the historical classification result into a forget gate of the message classification network, and determining, from the first classification information and the historical classification result, second classification information to be retained; weighting the historical classification result and the second classification information to obtain weighted classification information; inputting the weighted classification information into an output gate for screening to obtain a current classification result of the message behavior of the encrypted message data stream at the current moment; and continuing to input the not-yet-classified image features of the encrypted message data stream into the message classification network until the last image feature of the encrypted message data stream has been input, and outputting the classification result of the message behavior of the encrypted message data stream.

According to some embodiments of the present application, the message classification network is trained through the following steps: obtaining a plurality of encrypted message data streams, and forming a training data set from them; preprocessing each encrypted message data stream in the training data set to obtain a plurality of message fragments and the fragment images corresponding to the message fragments; extracting image features of each fragment image, and sequentially inputting the historical classification result of the image feature at the previous moment and the image feature at the current moment into the preset message classification network to classify message behavior, thereby obtaining a first classification result of the message behavior of the encrypted message data stream; and calculating a loss value of the first classification result according to a preset loss function, and adjusting the parameters of the message classification network according to the loss value to obtain a trained message classification network.

To achieve the above purpose, a second aspect of the embodiments of the present application proposes a system for classifying encrypted message data streams. The system includes: an encrypted message data stream acquisition module, configured to obtain an encrypted message data stream, wherein the encrypted message data stream includes a plurality of consecutive encrypted message packets; a message group division module, configured to divide the encrypted message data stream into a plurality of message fragments in chronological order, and to divide each message fragment into a plurality of message groups according to the message direction of the encrypted message packets, wherein the encrypted message packets within each message group have the same message direction; a fragment image generation module, configured to, for each message fragment and in each message group corresponding to the message fragment, determine the height of a message group sub-image from the message lengths of the encrypted message packets contained in the message group, the width of the message group sub-image from the message count, and the direction of the message group sub-image from the message direction, and to generate a fragment image from the plurality of message group sub-images; and a classification result acquisition module, configured to extract image features of each fragment image, and to sequentially input the historical classification result of the image feature at the previous moment and the image feature at the current moment into a preset message classification network to classify message behavior, thereby obtaining a classification result of the message behavior of the encrypted message data stream.

To achieve the above purpose, a third aspect of the embodiments of the present application proposes an electronic device. The electronic device includes a memory and a processor, the memory stores a computer program, and the processor, when executing the computer program, implements the method for classifying encrypted message data streams according to any one of the embodiments of the first aspect of the present application.

To achieve the above purpose, a fourth aspect of the embodiments of the present application proposes a computer-readable storage medium. The storage medium stores a computer program which, when executed by a processor, implements the method for classifying encrypted message data streams according to any one of the embodiments of the first aspect of the present application.

The method, system, electronic device and storage medium for classifying encrypted message data streams proposed in the present application divide the encrypted message data stream into multiple message fragments in chronological order and divide each message fragment into multiple message groups, so that a fragment image can be generated from the message length, message count and message direction of each message group. The originally complex encrypted message data stream is thereby converted into a visual fragment image that also shows the information transmission path of the encrypted exchange more clearly, which makes the image easier to classify. Feature extraction is then performed on the fragment image to obtain image features. The historical classification result of the image feature at the previous moment is input into the message classification network together with the image feature at the current moment, so that the message classification network can fully take into account the contextual association among the features of the encrypted message data stream when classifying the image features. The network can thus use this contextual association to identify fixed packet formats, which improves both the efficiency of classifying encrypted message data streams and the accuracy of the resulting classification of message behavior.

Brief Description of the Drawings

Figure 1 is a schematic structural diagram of the encrypted message data stream classification system provided by an embodiment of the present application;

Figure 2 is a flowchart of the encrypted message data stream classification method provided by an embodiment of the present application;

Figure 3 is a fragment image corresponding to a user login behavior provided by an embodiment of the present application;

Figure 4 is a fragment image corresponding to a user logout behavior provided by an embodiment of the present application;

Figure 5 is a fragment image corresponding to a user's voice request provided by an embodiment of the present application;

Figure 6 is a fragment image corresponding to a user's video request provided by an embodiment of the present application;

Figure 7 is a flowchart of step S102 in Figure 2;

Figure 8 is a flowchart of determining the height of a message group sub-image provided by an embodiment of the present application;

Figure 9 is a flowchart of step S304 in Figure 8;

Figure 10 is another flowchart of the encrypted message data stream classification method provided by an embodiment of the present application;

Figure 11 is a flowchart of step S202 in Figure 8;

Figure 12 is a flowchart of extracting the image features of each fragment image provided by an embodiment of the present application;

Figure 13 is a flowchart of step S104 in Figure 2;

Figure 14 is a flowchart of step S802 in Figure 13;

Figure 15 is a training flowchart of the message classification network provided by an embodiment of the present application;

Figure 16 is yet another flowchart of the encrypted message data stream classification method provided by an embodiment of the present application;

Figure 17 is a schematic diagram of the functional modules of the encrypted message data stream classification system provided by an embodiment of the present application;

Figure 18 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.

It should be noted that although functional modules are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device diagrams, or in an order different from that in the flowcharts. The terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.

WireGuard is a modern virtual private network (VPN) protocol with a simple and efficient design that aims to provide secure and reliable network connections. WireGuard uses encryption to protect the privacy and security of communication data, and offers low latency and high transmission speed.

However, after communication data is encrypted with WireGuard or another encryption scheme, five-tuple information such as the plaintext destination address, source address and port number cannot be read directly from the encrypted messages, so different flows or sessions cannot be distinguished directly. In the related art, traffic features are generally extracted directly from statistical information about the data stream and then classified directly. This process ignores the contextual association between encrypted message packets within a short period of time, even though that association often helps to identify fixed packet formats and to identify encrypted message data streams quickly. As a result, encrypted message packets are classified inefficiently, and the accuracy of the final classification of the message behavior of the encrypted message data stream suffers.

On this basis, the embodiments of the present application provide a method, a system, an electronic device and a storage medium for classifying encrypted message data streams, which fully take into account the transmission direction of the encrypted message packets and the contextual association among the features of the encrypted message data stream, identify fixed packet formats, and improve both the efficiency of classifying encrypted message data streams and the accuracy of the resulting classification of message behavior.

The method, system, electronic device and storage medium for classifying encrypted message data streams provided by the embodiments of the present application are described through the following embodiments. The encrypted message data stream classification system is described first.

Referring to Figure 1, in some embodiments the encrypted message data stream classification system includes a sending end 101, a feature extraction network 102, a message classification network 103, a receiving end 104 and a controller 105.

By way of example, the controller 105 may be the nerve center and command center of the system. The controller 105 can generate operation control signals according to instruction operation codes and timing signals to control the fetching and execution of instructions. For example, the controller 105 can generate operation control signals to obtain the encrypted message data stream from the sending end 101 and the receiving end 104 and generate fragment images from the encrypted message data stream, then control the feature extraction network 102 to extract features from the fragment images, and then control the message classification network 103 to classify the extracted features to obtain the classification result of the encrypted message data stream.

The encrypted message data stream classification method in the embodiments of the present application is described through the following embodiments.

It should be noted that, in each specific implementation of the present application, whenever processing involves data related to user identity or characteristics, such as user information, user behavior data, user history data and user location information, the user's permission or consent is obtained first. In addition, when an embodiment of the present application needs to obtain sensitive personal information of the user, the user's separate permission or separate consent is obtained through a pop-up window, a jump to a confirmation page or a similar mechanism. Only after the user's separate permission or separate consent has been explicitly obtained is the user-related data necessary for the normal operation of the embodiment acquired.

Figure 2 is an optional flowchart of the encrypted message data stream classification method provided by an embodiment of the present application. The method in Figure 2 may include steps S101 to S104.

Step S101: obtain an encrypted message data stream, where the encrypted message data stream includes a plurality of consecutive encrypted message packets.

It can be understood that an encrypted message data stream is a data stream that has been encrypted for network transmission. Each encrypted message data stream is a continuous segment of data processed by the same encryption algorithm and includes a plurality of consecutive encrypted message packets. In general, original data can be converted into encrypted data by an encryption algorithm before network transmission in order to protect the data. Encrypted message data streams can be transmitted in several ways, one of which is a WireGuard VPN. WireGuard can be used to create a secure encrypted tunnel over a public network. It uses modern cryptographic algorithms such as Curve25519, ChaCha20 and Poly1305 to provide strong data protection.

Step S102: Divide the encrypted message data stream into a plurality of message fragments in chronological order, and divide each message fragment into a plurality of message groups according to the message direction of the encrypted message data packets, where the encrypted message data packets within each message group have the same message direction.

In some embodiments, an encrypted message data stream may contain multiple actions, such as user login, a voice request, a video request, chat, and logout. Therefore, to identify different encrypted message data streams, the stream is first divided into a plurality of message fragments, and the resulting message fragments are then classified.

In some embodiments, the encrypted message data stream may be divided according to transmission time to obtain a plurality of message fragments, so that the messages can be reassembled in the correct order. The stream may also be divided according to its length. Generally, the stream may be divided evenly, or according to several division ratios set as needed; the embodiments of the present application do not specifically limit this.

It can be understood that encrypted message data packets with the same message direction most likely belong to the same category, while packets with different message directions generally belong to different categories. Therefore, each message fragment can be divided according to the message direction of each encrypted message data packet, which speeds up classification and improves the accuracy of the classification result. For example, suppose a message fragment contains six encrypted message data packets whose message directions, in chronological order, are: send (packet 1), send (packet 2), send (packet 3), receive (packet 4), receive (packet 5), send (packet 6). Consecutive packets in the same direction form one group, giving three groups: packets 1 to 3, packets 4 and 5, and packet 6. Alternatively, a capture time may be set for each message fragment: each time the preset capture time is reached, the packets captured so far form one message group, and packets whose message direction is inconsistent with the group are excluded. For example, if more than 60% of the packets in a message group are in the sending direction, the packets in the receiving direction are excluded from that group; or a message direction may be specified for the message group and packets in other directions excluded; and so on.
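The six-packet example above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `group_by_direction` and the direction labels `"send"` and `"recv"` are assumptions introduced here, and the packets are assumed to be already sorted by time.

```python
from itertools import groupby


def group_by_direction(packets):
    """Split a fragment into groups of consecutive packets sharing a direction.

    `packets` is a time-ordered list of (packet_id, direction) tuples; the
    labels "send" and "recv" are hypothetical names used only in this sketch.
    """
    return [
        [packet_id for packet_id, _ in run]
        for _, run in groupby(packets, key=lambda p: p[1])
    ]


fragment = [
    (1, "send"), (2, "send"), (3, "send"),
    (4, "recv"), (5, "recv"),
    (6, "send"),
]
print(group_by_direction(fragment))  # [[1, 2, 3], [4, 5], [6]]
```

Because `groupby` only merges adjacent items, packet 6 forms its own group even though it has the same direction as packets 1 to 3, which matches the three groups in the example.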

Step S103: For each message fragment, in each message group corresponding to the message fragment, determine the height of the message group sub-image according to the message lengths of the encrypted message data packets contained in the message group, determine the width of the message group sub-image according to the number of messages, and determine the direction of the message group sub-image according to the message direction; then generate a fragment image from the plurality of message group sub-images.

It can be understood that the head and tail of a message group may contain abnormal values or outliers. To keep them from distorting the overall average, a certain percentage of them can be removed. For example, the encrypted message data packets of each message group are sorted by message length in ascending or descending order, and a certain percentage, such as 5%, is removed at the head (the largest message lengths) and at the tail (the smallest message lengths). This removes the outliers within the message group and improves the accuracy of the calculated average message length.

In some embodiments, the average message length represents the average length of all encrypted message data packets in the message group, and the header bytes generally carry the main information of an encrypted message data packet. The header-byte extraction ratio for each encrypted message data packet can therefore be determined from the average message length, the header bytes of each packet are extracted accordingly, and the length of the extracted header bytes is used as the height of the corresponding message group sub-image. In some embodiments, the header bytes need not be extracted, and the average value is used as the height of the corresponding message group sub-image; the embodiments of the present application do not specifically limit this.

In some embodiments, the number of encrypted message data packets contained in each message group can be used as the width of the message group sub-image. For example, if a message group contains 5 encrypted message data packets, the width of the corresponding message group sub-image is 5, with the unit of the width set according to the actual situation. It can be understood that, because the number of encrypted message data packets may differ from one message group to another, the widths of the image representations of the message groups in the fragment image may also differ.

It can be understood that the color of the message group sub-image is generated from the encrypted message data packets that remain in the message group after outliers are removed, or only from the header bytes of the encrypted message data packets in the message group. Different bytes correspond to different colors, which produces a visual image. It can be understood that the bytes are already encoded, and the color of the fragment image can be formed by mapping the bytes to image pixel values. For example, the byte encoding can be converted into corresponding red, green, and blue (RGB) color values. It can be understood that, because the header bytes generally characterize how the corresponding encrypted message data packet varies, it suffices to extract the header bytes of the encrypted message data packets to draw the colors of the fragment image, which makes the drawn fragment image more precise.

In some embodiments, the message direction of the encrypted message data packets contained in each message group can be used as the orientation of the image. Which direction counts as positive and which as negative can be defined freely. For example, the encrypted message data stream can be divided into a sending direction and a receiving direction, with the sending direction defined as positive and the receiving direction as negative, or with the sending direction defined as negative and the receiving direction as positive, and so on. In some embodiments, the sending direction is defined as positive; the graphical representation of a message group with positive message direction then points in the positive direction of the coordinate axis, and the graphical representation of a message group with negative message direction points in the negative direction of the coordinate axis.

It can be understood that, in the fragment image, the graphical representation of each message group may be a bar, a rectangle, or another shape; the present application does not specifically limit this. It can be understood that generating the fragment image from the message length, the number of messages, the bytes of the encrypted message data packets, and the message direction of each message group converts an otherwise complex encrypted message data stream into a visual fragment image, which gives an intuitive image representation.

Referring to FIGS. 3 to 6, which show fragment images under different message behaviors: FIG. 3 is the fragment image corresponding to a user login, FIG. 4 is the fragment image corresponding to a user logout, FIG. 5 is the fragment image corresponding to a user's voice request, and FIG. 6 is the fragment image corresponding to a user's video request. FIGS. 3 to 6 show directly that different behaviors produce different fragment images. In these figures, the horizontal axis represents time, the vertical axis represents the processed message length, and the colors of the image are obtained by mapping the header bytes. To make the different colors distinguishable, the figures use different legend patterns; in practical applications, the actual colors can be displayed. As FIGS. 3 to 6 show, arranging the message groups in chronological order and determining the orientation of each message group sub-image from the sending and receiving directions gives an intuitive view of how the message groups change over different time periods.

It can be understood that the encrypted message data stream classification method further includes aggregating multiple fragment images of the same type, so that fragment images can be classified based on historical classification results. Because fragment images of the same type are very likely to look alike, already classified images can be analyzed for regularities to obtain the image pattern of each category, and fragment images can then be classified directly. For example, the image pattern of the fragment images corresponding to a user's video request can be analyzed; when a later fragment image shows the same pattern, it is directly classified as a user video request, which improves the efficiency of classifying message behavior in the encrypted message data stream.

Step S104: Extract the image features of each fragment image, and sequentially input the historical classification result of the image features at the previous moment together with the image features at the current moment into a preset message classification network to classify the message behavior, obtaining the classification result of the message behavior of the encrypted message data stream.

In some embodiments, message behavior in an encrypted message data stream generally follows a fixed data packet format, and that format can usually be recognized only in light of historical classification results, which in turn allows the message behavior to be classified. Therefore, a pre-trained convolutional neural network can be used to extract features from each fragment image. The extracted image features are then input into the message classification network together with the historical classification result of the previous message behavior of the encrypted message data stream. This combines the classification information of earlier fragment images with the current image features and takes the context fully into account.
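The data flow of step S104, in which each classification result is fed back as input to the next step, can be sketched as a simple loop. This is only an illustration under stated assumptions: `classify_stream` and `toy_classify` are hypothetical names, the features are plain numbers rather than CNN outputs, and `toy_classify` is a placeholder rule standing in for the message classification network, whose structure this section does not specify.

```python
def classify_stream(features, classify, initial_result=None):
    """Classify time-ordered features, feeding each result into the next step.

    `classify(feature, previous_result)` stands in for the message
    classification network; it is a placeholder, not the patented model.
    """
    results = []
    previous = initial_result
    for feature in features:
        previous = classify(feature, previous)
        results.append(previous)
    return results


def toy_classify(feature, previous):
    # Placeholder rule: label "video" when the feature exceeds 0.5; otherwise
    # keep the previous label, or use "idle" when there is no previous label.
    if feature > 0.5:
        return "video"
    return previous if previous is not None else "idle"


print(classify_stream([0.1, 0.9, 0.3], toy_classify))  # ['idle', 'video', 'video']
```

The third feature alone would not trigger the "video" label, but the history carried over from the second step keeps it, which is the kind of context dependence the paragraph above describes.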

Referring to FIG. 7, in some embodiments, step S102 includes steps S201 to S202:

Step S201: In each encrypted message data stream, arrange the encrypted message data packets in chronological order, and divide them according to a preset division number to obtain a plurality of message fragments.

It can be understood that each message behavior generally consists of several consecutive encrypted message data packets. To make the transmission and processing of messages easier to track and inspect, the encrypted message data packets can be arranged consecutively in chronological order and then divided according to the preset division number to obtain a plurality of message fragments. In some embodiments, the preset division number is set according to the actual situation and can be adjusted. For example, if the preset division number is 10, every 10 encrypted message data packets form one message fragment.

In some embodiments, the encrypted message data packets can be divided according to a preset division time. For example, the encrypted message data packets are divided every 1 second to obtain the corresponding message fragments, and so on; the embodiments of the present application do not specifically limit this.
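The two division strategies described for step S201, by packet count and by time window, can be sketched as follows. The function names, the `(timestamp, payload)` tuple layout, and the choice to measure windows from the first packet's timestamp are assumptions made for this sketch; the packets are assumed to be sorted by time already.

```python
def split_by_count(packets, size):
    """Chunk a time-ordered packet list into fragments of `size` packets each.

    The last fragment may be shorter when the total is not a multiple of `size`.
    """
    return [packets[i:i + size] for i in range(0, len(packets), size)]


def split_by_time(packets, window):
    """Chunk time-ordered (timestamp, payload) packets into `window`-second fragments.

    Windows are measured from the first packet's timestamp (an assumption of
    this sketch); windows that contain no packets produce no fragment.
    """
    if not packets:
        return []
    start = packets[0][0]
    fragments = {}
    for timestamp, payload in packets:
        index = int((timestamp - start) // window)
        fragments.setdefault(index, []).append((timestamp, payload))
    return [fragments[index] for index in sorted(fragments)]


by_count = split_by_count(list(range(25)), 10)
print([len(f) for f in by_count])  # [10, 10, 5]

timed = [(0.0, "a"), (0.4, "b"), (1.2, "c"), (2.9, "d")]
print(split_by_time(timed, 1.0))
```

With a preset division number of 10, 25 packets yield fragments of 10, 10, and 5 packets; with a 1-second window, the four timed packets fall into three fragments.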

Step S202: Within each message fragment, distinguish the message direction of each encrypted message data packet, and place encrypted message data packets with the same message direction in the same message group to obtain a plurality of message groups.

It can be understood that messages of the same type tend to have the same direction. Therefore, consecutive encrypted message data packets in the same direction can be placed in the same message group, which improves classification efficiency. Alternatively, the message fragment can be divided according to a preset division number to obtain a plurality of initial message groups, and the encrypted message data packets to be excluded from each initial message group are determined from the proportion of message directions in that group. For example, if packets with positive message direction make up 60% of a message group and packets with negative message direction make up 40%, then, because 60% is greater than 40%, the packets with negative message direction are excluded from that message group.
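The proportion-based exclusion described above can be sketched as a majority filter. The function name `keep_majority_direction` and the `"+"` and `"-"` direction labels are hypothetical. The source does not say what happens when the two directions are tied; this sketch keeps whichever direction appears first, which is purely an assumption.

```python
from collections import Counter


def keep_majority_direction(group):
    """Drop packets whose direction differs from the group's most common one.

    `group` is a list of (packet_id, direction) tuples. On a tie, the
    direction seen first is kept; that tie rule is an assumption of this sketch.
    """
    if not group:
        return []
    majority, _ = Counter(direction for _, direction in group).most_common(1)[0]
    return [(pid, direction) for pid, direction in group if direction == majority]


# 6 positive-direction and 4 negative-direction packets, interleaved.
initial_group = [(i, "+" if i % 5 < 3 else "-") for i in range(10)]
filtered = keep_majority_direction(initial_group)
print(len(filtered), {direction for _, direction in filtered})
```

Here 60% of the packets are positive, so the four negative-direction packets are excluded and the remaining group has a single direction.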

In some embodiments, a message direction can be specified for a message group that contains both message directions, and the encrypted message data packets whose message direction differs from the specified one are filtered out, so that in the end all encrypted message data packets in each message group have the same message direction.

Referring to FIG. 8, in some embodiments, determining the height of the message group sub-image includes steps S301 to S304:

Step S301: In each message group, filter the encrypted message data packets in the message group according to a preset exclusion ratio.

In some embodiments, because WireGuard (or another encryption method) is a VPN protocol that works in tunnel encapsulation mode, the encapsulation can be removed only after the message header has been stripped; the data that remains is the encrypted form of the original traffic data. Therefore, a preset cropping ratio can be set according to the actual situation to crop the message header of each encrypted message data packet in the message group.

In some embodiments, a preset exclusion ratio can be set to filter the encrypted message data packets in the message group. For example, suppose the preset exclusion ratio is 5% and the message group contains 100 encrypted message data packets. The packets in the group are sorted by message length, from longest to shortest or from shortest to longest, and 5% of the total number of packets in the group is removed at the head and at the tail, that is, 5 packets at the head and 5 packets at the tail. This removes the largest and smallest values and eliminates the possibility of outliers affecting the classification result.
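The 5% trimming example can be sketched as follows. The function name `trim_by_length` and the use of an integer percentage (chosen to avoid floating-point rounding when computing the cut size) are assumptions of this sketch; rounding the cut size down is also an assumption, since the source only gives an example that divides evenly.

```python
def trim_by_length(lengths, percent=5):
    """Sort packet lengths and drop `percent`% of the group from each end.

    The number of packets cut per end is rounded down (an assumption of this
    sketch), so very small groups may be left unchanged.
    """
    ordered = sorted(lengths)
    cut = len(ordered) * percent // 100
    return ordered[cut:len(ordered) - cut]


lengths = list(range(1, 101))      # 100 packets with lengths 1..100
trimmed = trim_by_length(lengths)  # removes the 5 shortest and 5 longest
print(len(trimmed), trimmed[0], trimmed[-1])  # 90 6 95
```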

Step S302: Sum the message lengths of the encrypted message data packets that remain after filtering, and divide the sum by the number of encrypted message data packets in the message group to obtain the average message length.

In some embodiments, after the encrypted message data packets in each message group have been filtered, the message lengths of all remaining packets are summed, and the sum is divided by the number of encrypted message data packets in the corresponding message group to obtain the average message length, which is used later when drawing the fragment image. In some embodiments, the average message length can be used directly as the height of the message group sub-image.

Step S303: Multiply the average message length by a preset extraction ratio to obtain a reference length, and compare the reference length with the message length of each encrypted message data packet in the message group to obtain a comparison result.

In some embodiments, the preset extraction ratio is an adjustable ratio that determines how many effective bytes are extracted from each encrypted message data packet. It can be understood that, so that the drawn fragment image omits redundant parts and is generated directly from the effective part, a preset extraction ratio can be set to extract the effective bytes of the encrypted message data packets. For example, with a preset extraction ratio of 60%, the reference length obtained by multiplying the average message length by the preset extraction ratio is compared with the message length of each encrypted message data packet, to determine whether the full reference length can be extracted from every packet.

Step S304: Determine the height of the message group sub-image according to the comparison result.

In some embodiments, if the comparison result indicates that the reference length is smaller than the message length of every encrypted message data packet, each packet can be extracted according to the reference length. If instead the comparison result indicates that the reference length is greater than the message length of some encrypted message data packet, extracting by the reference length would yield fewer bytes than the reference length from that packet, which would affect the drawing of the fragment image. Therefore, the message length of the shortest encrypted message data packet in the message group can be compared with the reference length; if the reference length is greater, the message length of the shortest packet is used as the height of the message group sub-image, which ensures that the same number of bytes is extracted from every encrypted message data packet in the message group.

Referring to FIG. 9, in some embodiments, step S304 includes steps S401 to S402:

Step S401: If the comparison result indicates that the message group contains no encrypted message data packet whose message length is shorter than the reference length, use the reference length as the height of the message group sub-image.

Step S402: If the comparison result indicates that the message group contains an encrypted message data packet whose message length is shorter than the reference length, use the message length of that encrypted message data packet as the height of the message group sub-image.

In some embodiments, the average message length can be multiplied by the preset extraction ratio to obtain the length of the header bytes of each encrypted message data packet in the message group, and this header-byte length is used as the height of the message group sub-image.

In some embodiments, the height of the message group sub-image is calculated as follows:

$$M = \begin{cases} \alpha \bar{L}, & \alpha \bar{L} \le \min_i L_i \\ \min_i L_i, & \alpha \bar{L} > \min_i L_i \end{cases}$$

where $M$ denotes the height of the message group sub-image, $\bar{L}$ denotes the average message length, $L_i$ denotes the length of any encrypted message data packet in the message group, and $\alpha$ denotes the preset ratio, which can be set to 60% or another value.

It can be understood that, in the above formula, if the reference length obtained by multiplying the average message length of the message group by the preset extraction ratio is smaller than the length of every encrypted message data packet, header bytes of that length can be extracted from all packets in the group, and no packet yields fewer bytes than the reference length. If the reference length is greater than the shortest message length in the group, the length of the shortest encrypted message data packet is used as the height of the image, which ensures that the same number of bytes can be extracted from every packet in the group. For example, if the average message length multiplied by the preset ratio equals 7, but the group contains encrypted message data packets of lengths 6 and 4, then 7 bytes cannot be extracted from the packet of length 4, so 4 is used as the height of the image.
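The height rule of steps S303, S401, and S402 reduces to taking the smaller of the reference length and the shortest packet length. The sketch below assumes integer byte lengths, an integer percentage for the extraction ratio, and rounding the reference length down; none of these rounding details are stated in the source, and the name `sub_image_height` is hypothetical.

```python
def sub_image_height(lengths, percent=60):
    """Return the sub-image height for one message group.

    The reference length is the average length times `percent`%, rounded down
    (the rounding is an assumption of this sketch). If any packet is shorter
    than the reference length, the shortest packet length is used instead.
    """
    reference = sum(lengths) * percent // (len(lengths) * 100)
    return min(reference, min(lengths))


# Average 12.5, reference length 7; the packet of length 4 is shorter, so 4 is used.
print(sub_image_height([4, 6, 20, 20]))    # 4

# Average 120, reference length 72; every packet is longer, so 72 is used.
print(sub_image_height([100, 120, 140]))   # 72
```

The first call mirrors the example in the text: a reference length of 7 cannot be extracted from a packet of length 4, so the height falls back to 4.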

Referring to FIG. 10, in some embodiments, the encrypted message data stream classification method further includes steps S501 to S502:

Step S501: Obtain, for each message fragment, the plurality of message groups whose encrypted message data packets have been filtered.

In some embodiments, the message groups whose encrypted message data packets have been filtered are the message groups from which outliers have been removed according to the preset exclusion ratio. The preset exclusion ratio can be set to 5%, 10%, and so on. Specifically, the encrypted message data packets are sorted by message length, and packets are removed according to the preset exclusion ratio at the head, at the tail, or at both the head and the tail of the sorted sequence. For example, if a message group contains 100 encrypted message data packets and the preset exclusion ratio is 5%, then, after sorting, assuming the head holds the packets with the shortest message length and the tail holds the packets with the longest message length, 5 packets are removed at the head and another 5 at the tail. Once all message groups in the message fragment have been trimmed, the color of each message group sub-image can be determined from the encrypted message data packets the group contains, so that the fragment image shows how the encrypted message data packets in the group vary, which facilitates subsequent feature extraction and classification. Alternatively, after the encrypted message data packets have been filtered, the average message length of each message group is computed, the header bytes of each message are extracted according to the average message length multiplied by the preset extraction ratio, and the color of the message group sub-image is generated from the header bytes extracted from each encrypted message data packet.

Step S502: Determine the color of the message group sub-image according to the encrypted message data packets contained in each message group.

In some embodiments, the image color of a message group can be generated from the header bytes extracted for that group. In some embodiments, the header bytes are mapped into the value range of the RGB components. Specifically, the value of a header byte can be used as the value of the RGB components. For example, if the value of a header byte is 100, the R, G, and B values can all be set to 100, which produces a gray pixel. In some embodiments, the value of a header byte is mapped into the value range of the RGB components according to a certain rule. For example, the value of the header byte can be divided by 255 to obtain a decimal between 0 and 1, and this decimal is then multiplied by 255 to obtain a new value that is used as the value of the RGB components, so that the generated colors are evenly distributed over the RGB color space.
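The direct byte-to-gray mapping described above can be sketched as follows. The function names are hypothetical, and the grid layout in `group_to_pixel_grid` (one column per packet, one row per header byte, so that the grid is `height` rows by packet-count columns) is an assumption consistent with the height and width definitions earlier in this section, not a layout the source states explicitly.

```python
def bytes_to_gray_pixels(header):
    """Map each header byte to an (R, G, B) triple with three equal components."""
    return [(value, value, value) for value in header]


def group_to_pixel_grid(payloads, height):
    """Build a grid of pixels from the first `height` bytes of each packet.

    Layout assumption of this sketch: one column per packet and one row per
    header byte. Every payload must contain at least `height` bytes.
    """
    columns = [bytes_to_gray_pixels(payload[:height]) for payload in payloads]
    return [[column[row] for column in columns] for row in range(height)]


print(bytes_to_gray_pixels(bytes([100, 0, 255])))
# [(100, 100, 100), (0, 0, 0), (255, 255, 255)]

grid = group_to_pixel_grid([b"\x01\x02\x03", b"\x0a\x0b\x0c"], height=2)
print(len(grid), len(grid[0]))  # 2 2
```

A header byte of 100 becomes the gray pixel (100, 100, 100), matching the example in the text.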

Further, the image colors generated from the individual encrypted message data packets of each message group are summed to obtain the color with which the message group is drawn in the fragment image, so that the variation pattern of the image corresponding to the message group is shown clearly and accurately.

Referring to FIG. 11, in some embodiments, step S202 includes steps S601 to S602:

Step S601: Within each message fragment, identify the message direction of each encrypted message data packet as either the sending direction or the receiving direction.

Step S602: In chronological order, group the encrypted message data packets of each consecutive run in the sending direction or each consecutive run in the receiving direction, obtaining a plurality of message groups arranged in chronological order.

In some embodiments, the message direction includes a sending direction and a receiving direction. The message direction of each encrypted message data packet can be analyzed so that every packet is identified as either sending or receiving.

For example, suppose the message directions of the encrypted message data packets in a message fragment, in chronological order, are: packet 1, sending; packet 2, sending; packet 3, receiving; packet 4, receiving; packet 5, receiving. The division result is then: packets 1 and 2 form one message group, and packets 3, 4, and 5 form another message group.

可以理解的是，按照时间顺序对连续且报文方向相同的加密报文数据包进行，最终生成的片段图像可以表示报文传输的趋势，有利于对报文行为进行更加准确的分类。It can be understood that by processing consecutive encrypted message packets with the same message direction in chronological order, the final generated fragment image can represent the trend of message transmission, which is conducive to more accurate classification of message behavior.
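By way of illustration only, the grouping of steps S601 to S602 can be sketched as a run-length grouping over a time-ordered packet list. The function name group_by_direction, the "send"/"recv" labels, and the (packet id, direction) tuple layout are assumptions made for this sketch and do not appear in the embodiments.

```python
from itertools import groupby

SEND, RECV = "send", "recv"  # hypothetical direction labels


def group_by_direction(packets):
    """Split a time-ordered list of (packet_id, direction) tuples into
    message groups of consecutive packets that share one direction."""
    groups = []
    # groupby only merges adjacent items, which is exactly the
    # "consecutive and same direction" rule of step S602.
    for direction, run in groupby(packets, key=lambda p: p[1]):
        groups.append((direction, [packet_id for packet_id, _ in run]))
    return groups


# The five-packet example from the description.
fragment = [(1, SEND), (2, SEND), (3, RECV), (4, RECV), (5, RECV)]
print(group_by_direction(fragment))
# [('send', [1, 2]), ('recv', [3, 4, 5])]
```

Because only adjacent packets are merged, a later packet in the sending direction would open a new message group rather than join the first one, which preserves the chronological order of the groups.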

Referring to Figure 12, in some embodiments, extracting the image features of each fragment image includes steps S701 to S702:

Step S701: load a pre-trained convolutional neural network.

In some embodiments, a convolutional neural network can be used to extract the image features of each fragment image, or another feature extraction network can be used instead. The convolutional neural network is trained in advance on a data set composed of a large number of fragment images and therefore has good feature extraction capability.

Step S702: input the fragment images into the convolutional neural network one by one in chronological order for feature extraction, obtaining the image features corresponding to each fragment image.

In some embodiments, the fragment images can be input into the convolutional neural network one by one in chronological order for feature extraction, which captures the change information and the important feature information of the fragment images and yields the image features of each fragment image. This facilitates the subsequent classification of the image features and improves classification efficiency.
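By way of illustration only, the kind of operation a convolutional feature extractor performs can be sketched in plain Python: one convolution layer with a ReLU activation, followed by global average pooling, producing one feature value per kernel. This is not the pre-trained network of step S701; the kernels below are hand-written rather than learned, and the names conv2d_valid, global_average, and extract_features are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over a 2-D grayscale image (no padding, stride 1)
    and apply ReLU to every response."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(max(acc, 0.0))  # ReLU
        out.append(row)
    return out


def global_average(feature_map):
    """Reduce a feature map to a single number."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def extract_features(image, kernels):
    """Return one pooled feature per kernel for a single fragment image."""
    return [global_average(conv2d_valid(image, k)) for k in kernels]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy 3x3 "fragment image"
kernels = [[[0, 1], [1, 0]], [[1, 1], [1, 1]]]  # hand-written, not learned
print(extract_features(image, kernels))
# [10.0, 20.0]
```

In practice the fragment images would be passed, in chronological order, through a trained network with many learned kernels and several layers; the sketch only shows how an image is reduced to a fixed-length feature vector.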

Referring to Figure 13, in some embodiments, step S104 includes steps S801 to S802:

Step S801: obtain the historical classification result that was produced after the image features at the previous moment were input into the message classification network.

In some embodiments, the message classification network may be a Long Short-Term Memory (LSTM) network, which can process sequence data and thereby capture the temporal dependence of the input data. It can be understood that the historical classification result may have an important influence on the classification at the current moment. Therefore, to establish the contextual connection between image features, the historical classification result of the image features at the previous moment can be obtained.

Step S802: input the image features at each current moment, together with the corresponding historical classification result, into the preset message classification network in turn to classify the message behavior, until the last image feature of the encrypted message data stream has been input into the message classification network, obtaining the classification result of the message behavior of the encrypted message data stream.

It can be understood that, when an encrypted message data stream is processed, much information may not be directly obtainable because the data is encrypted. However, the relationship between the historical classification result and the image features at the current moment can compensate for part of the missing information, thereby improving the accuracy of message behavior classification.

For example, if the encrypted message data stream has 5 message fragments and each message fragment corresponds to 1 image feature, there are 5 image features in total. Following the chronological order of the image features, the first image feature, image feature 1, is input into the message classification network to obtain classification result 1; classification result 1 and image feature 2 are input into the message classification network to obtain classification result 2; classification result 2 and image feature 3 are input to obtain classification result 3; classification result 3 and image feature 4 are input to obtain classification result 4; and classification result 4 and image feature 5 are input to obtain the classification result of the encrypted message data stream.
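By way of illustration only, the chained procedure of the five-feature example can be sketched as a loop that feeds each result back in with the next feature. The function classify_stream and the stand-in toy_step are hypothetical; toy_step merely averages numbers so that the data flow is visible and is not the message classification network.

```python
def classify_stream(features, classify_step, initial_result=None):
    """Feed image features to classify_step in chronological order,
    passing the previous result along with each new feature."""
    result = initial_result
    history = []
    for feature in features:
        # The result of the previous moment is the "historical
        # classification result" for the current moment.
        result = classify_step(feature, result)
        history.append(result)
    # The result after the last feature stands for the whole stream.
    return result, history


def toy_step(feature, previous):
    """Stand-in for the network: blend the new feature with the history."""
    if previous is None:  # first feature has no history
        return feature
    return 0.5 * previous + 0.5 * feature


final, history = classify_stream([1.0, 3.0, 5.0], toy_step)
print(history)  # [1.0, 2.0, 3.5]
print(final)    # 3.5
```

The first feature is processed without any history, matching the example in which image feature 1 is input alone.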

In some embodiments, the message classification network can take the image features and the historical classification result of the previous moment as input to predict the classification result of the image features at the current moment, making full use of the temporal relationship between historical classification results and image features and improving the accuracy of image classification.

It can be understood that the message classification network may include an input gate, a forget gate, and an output gate, which are introduced below. The input gate determines how much information from the input data can enter the memory cell: it controls the importance of the input data through an activation function, and its output is multiplied element-wise with the input data, so that important information is selectively written into the memory cell. The forget gate determines which historical memories need to be forgotten: it controls the importance of the previous memory through an activation function, and its output is multiplied element-wise with the previous memory, so that unimportant memories are selectively forgotten. The output gate determines what information in the memory cell is output at the current time step: it controls the importance of the information in the memory cell through an activation function, the information in the memory cell is mapped to a suitable range, and the output of the output gate is multiplied element-wise with that information, so that the information in the memory cell is selectively output.
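For reference, the three gates are commonly written as follows in the standard LSTM formulation, which may differ in detail from the embodiments. The symbols are assumptions introduced here: x_t is the image feature at moment t, h_{t-1} the previous output, c_{t-1} the previous memory cell state, W, U, and b learned parameters, \sigma the sigmoid activation, and \odot element-wise multiplication.

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory cell update)} \\
h_t &= o_t \odot \tanh\!\left(c_t\right) && \text{(output)}
\end{aligned}
```

In this formulation the input gate scales the candidate memory computed from the input rather than the raw input itself, and the tanh in the last line is the mapping of the memory cell to a bounded range mentioned for the output gate.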

Referring to Figure 14, in some embodiments, step S802 includes steps S901 to S905:

Step S901: input the image features at each current moment into the message classification network in turn, and determine, through the input gate of the message classification network, the first classification information to be retained from the image features.

In some embodiments, the message classification network can use the input gate to control how much the input information at the current moment affects the memory, thereby selectively remembering important image features. By inputting the image features at each current moment into the message classification network and determining through the input gate the first classification information to be retained, information relevant to message classification can be extracted from the image features at the current moment for subsequent classification processing.

Step S902: input the first classification information and the historical classification result into the forget gate of the message classification network, and determine, from the first classification information and the historical classification result, the second classification information that needs to be retained.

In some embodiments, the forget gate is an important component of the Long Short-Term Memory (LSTM) network and is used to decide which information needs to be retained at the current moment. By inputting the first classification information and the historical classification result into the forget gate, the second classification information to be retained can be determined according to the learning capability and gating mechanism of the network. In the forget gate, calculation and decision are carried out on the input first classification information and historical classification result to determine the second classification information to be retained. The second classification information is then passed to the next layer or the next time step so that the classification task can continue. It can be understood that inputting the first classification information and the historical classification result into the forget gate of the message classification network combines the first classification information at the current moment with the previous historical classification result, from which more comprehensive classification information can be extracted for more accurate classification.

Step S903: weight the historical classification result and the second classification information to obtain weighted classification information.

In some embodiments, the message classification network can determine the output weights through the output gate and weight the historical classification result against the current classification information to obtain more accurate classification information.

It can be understood that the message classification network can automatically adjust the weights according to the contributions of the historical classification result and the second classification information, or the weights can be adjusted by a technician as needed. The historical classification result and the second classification information are then weighted according to the adjusted weights, yielding more reasonable classification information. It can be understood that the weighted classification information may refer to weights or labels used to distinguish different types of features.

Step S904: input the weighted classification information into the output gate for screening, obtaining the current classification result of the message behavior of the encrypted message data stream at the current moment.

It can be understood that the output gate is a mechanism in a neural network for screening and controlling input signals. By inputting the weighted classification information into the output gate, different types of messages can be screened. In the output gate, the encrypted message data stream can be screened according to the input weighted classification information and the gating mechanism, thereby obtaining the current classification result of the message behavior of the encrypted message data stream at the current moment.

Step S905: continue to input the unclassified image features of the encrypted message data stream into the message classification network until the last image feature of the encrypted message data stream has been input, and output the classification result of the message behavior of the encrypted message data stream.

It can be understood that the unclassified image features can continue to be input into the message classification network in chronological order. Each image feature is input together with the historical classification result of the previous moment, and the resulting current classification result is input together with the next image feature at the next moment, until the last image feature of the encrypted message data stream has been input and the message classification network outputs the classification result of the message behavior of the encrypted message data stream. This ensures that every image feature is included in the classification process while the contextual connection between image features is taken into account, improving the efficiency and accuracy of the message behavior classification result.
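By way of illustration only, steps S901 to S905 can be sketched with a scalar LSTM cell in the standard formulation. The correspondence between steps and gates given in the comments is an interpretation rather than something the embodiments state, and the names make_params, lstm_step, and run_stream, as well as the parameter layout, are hypothetical. A real network would use vectors and learned parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_params(value=0.0):
    """One input weight (w), recurrent weight (u), and bias (b) for each of
    the input (i), forget (f), and output (o) gates and the candidate (g)."""
    return {gate + kind: value for gate in "ifog" for kind in "wub"}


def lstm_step(x, h_prev, c_prev, p):
    """One time step: x is the current feature, h_prev the previous output,
    c_prev the previous memory cell state."""
    i = sigmoid(p["iw"] * x + p["iu"] * h_prev + p["ib"])    # input gate (cf. S901)
    f = sigmoid(p["fw"] * x + p["fu"] * h_prev + p["fb"])    # forget gate (cf. S902)
    g = math.tanh(p["gw"] * x + p["gu"] * h_prev + p["gb"])  # candidate memory
    c = f * c_prev + i * g                                   # weighted combination (cf. S903)
    o = sigmoid(p["ow"] * x + p["ou"] * h_prev + p["ob"])    # output gate (cf. S904)
    h = o * math.tanh(c)                                     # current output
    return h, c


def run_stream(features, p):
    """Feed every feature in chronological order (cf. S905) and return the
    output produced after the last one."""
    h, c = 0.0, 0.0
    for x in features:
        h, c = lstm_step(x, h, c, p)
    return h


params = make_params()
params["gw"] = 1.0  # arbitrary illustrative value, not a trained weight
print(run_stream([1.0, 0.5, -0.2], params))
```

The output h of one step is what the next step receives as history, which is how the previous classification state accompanies each new image feature.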

Referring to Figure 15, in some embodiments, the message classification network is trained through the following steps S1001 to S1004:

Step S1001: obtain a plurality of encrypted message data streams, and form a training data set from the plurality of encrypted message data streams.

In some embodiments, to improve the classification capability of the message classification network, a large number of encrypted message data streams can be obtained to form a training data set on which the message classification network is trained. For example, the encrypted message data streams can be divided into a training set, a validation set, and a test set to facilitate subsequent training of the message classification network and adjustment of its parameters.
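By way of illustration only, dividing the collected streams into a training set, a validation set, and a test set can be sketched as a seeded shuffle followed by slicing. The 70/15/15 proportions, the fixed seed, and the name split_streams are assumptions; the embodiments do not specify how the division is made.

```python
import random


def split_streams(streams, train_pct=70, val_pct=15, seed=0):
    """Shuffle the streams reproducibly and cut them into train /
    validation / test lists; the test set receives the remainder."""
    items = list(streams)
    random.Random(seed).shuffle(items)  # local generator, global state untouched
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Integers stand in for 20 labelled encrypted message data streams.
train_set, val_set, test_set = split_streams(range(20))
print(len(train_set), len(val_set), len(test_set))
# 14 3 3
```

Integer percentages are used so that the set sizes do not depend on floating-point rounding.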

Step S1002: preprocess each encrypted message data stream in the training data set to obtain a plurality of message fragments and the fragment images corresponding to the message fragments.

In some embodiments, each encrypted message data stream is preprocessed. Specifically, the encrypted message data stream can be divided into a plurality of message fragments in chronological order, and each message fragment can be divided into a plurality of message groups. Then, for each message fragment, the fragment image corresponding to the message fragment is generated by using the message length of the encrypted message data packets contained in each message group as the height of the message group sub-image, the number of encrypted message data packets contained in each message group as the width of the message group sub-image, the bytes of the encrypted message data packets contained in each message group as the color of the message group sub-image, and the message direction of the encrypted message data packets contained in each message group as the orientation of the message group sub-image. Embodiments of the detailed processing have been set out above and are not repeated here.

Step S1003: extract the image features of each fragment image, and input the historical classification result of the image features at the previous moment together with the image features at the current moment into the preset message classification network in turn to classify the message behavior, obtaining a first classification result of the message behavior of the encrypted message data stream.

It can be understood that extracting image features captures the spatial characteristics of the image, which helps to distinguish different message behaviors. Inputting the historical classification result together with the image features at the current moment into the message classification network takes the contextual connection between message fragments into account, so that fixed data packet formats can be identified from the context information, which helps the message classification network to classify quickly.

Step S1004: calculate the loss value of the first classification result according to a preset loss function, and adjust the parameters of the message classification network according to the loss value to obtain a trained message classification network.

In some embodiments, the first classification result can be compared with the correct classification result in the validation set, the loss value of the first classification result can be calculated with the preset loss function, the parameters of the message classification network can be adjusted according to the loss value, and the message classification network can be trained again. During training, the parameters of the message classification network are adjusted continuously until the network converges or a preset number of training iterations is reached, which indicates that training is complete and the trained message classification network is obtained. It can be understood that the loss function is a general loss function and can be chosen as needed.
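By way of illustration only, the loop of step S1004 (compute a loss, adjust parameters, repeat until convergence or a preset number of iterations) can be sketched with a linear softmax classifier trained by gradient descent. Cross-entropy is an assumption, since the embodiments leave the loss function open, and the linear model, learning rate, tolerance, and the name train are likewise hypothetical stand-ins for the message classification network and its training settings.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    return -math.log(probs[label])


def train(samples, n_classes, n_features, lr=0.5, max_epochs=200, tol=1e-3):
    """samples: list of (feature_vector, class_index) pairs.
    Stops when the mean loss of an epoch drops below tol or after max_epochs."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    mean_loss = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for x, y in samples:
            logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
            probs = softmax(logits)
            total += cross_entropy(probs, y)
            # Gradient of cross-entropy w.r.t. logit k is probs[k] - 1[k == y].
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
        mean_loss = total / len(samples)
        if mean_loss < tol:  # treated as convergence in this sketch
            break
    return weights, mean_loss


samples = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]  # two toy feature vectors
weights, final_loss = train(samples, n_classes=2, n_features=2)
print(final_loss < math.log(2))  # True: loss fell below its initial value
```

The two stopping conditions mirror the description: the loop ends either when the loss indicates convergence or when the preset number of training iterations has been reached.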

Referring to Figure 16, in some embodiments, the encrypted message data stream classification method of the present application is described as a whole with reference to Figure 16.

For example, an encrypted message data stream can be obtained and divided by time into a plurality of message fragments, after which the message groups are determined according to message direction. In each message group, the number of encrypted message data packets in the group is used as the width of the message group sub-image; the average length of the encrypted message data packets in the group is calculated, the main bytes of the average length are extracted, and the extracted main bytes are used as the length of the message group sub-image; the header bytes of the encrypted message data packets in the group are then extracted and converted into the color of the message group sub-image; and the message direction is used as the direction of the message group sub-image. The fragment image of the corresponding message fragment is drawn from the plurality of message group sub-images. Further, after features are extracted from the fragment images, the extracted image features are input into the message classification network in turn for classification. Further, when the image features are classified, the historical classification result is input into the message classification network together with the current image feature in chronological order to obtain the current classification result, and after the last image feature of the encrypted message data stream has been input together with its historical classification result, the classification result of the message behavior of the encrypted message data stream is obtained.
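By way of illustration only, the first stages of the overall flow can be sketched as follows: the stream is cut into fragments by time, each fragment is grouped by direction, and each group is reduced to the geometry of its sub-image. Fixed-duration windows, the Packet fields, and the names split_into_fragments and sub_image_specs are assumptions. Color is omitted because this passage does not specify how header bytes are converted to a color, and the average packet length is used directly as the height.

```python
from itertools import groupby
from typing import NamedTuple


class Packet(NamedTuple):  # hypothetical packet record
    timestamp: float  # seconds since the start of the stream
    length: int       # message length in bytes
    direction: str    # "send" or "recv"


def split_into_fragments(packets, window):
    """Assign packets to consecutive time windows of `window` seconds."""
    fragments = {}
    for p in sorted(packets, key=lambda p: p.timestamp):
        fragments.setdefault(int(p.timestamp // window), []).append(p)
    return [fragments[k] for k in sorted(fragments)]


def sub_image_specs(fragment):
    """One spec per message group: width = packet count,
    height = average packet length, direction = group direction."""
    specs = []
    for direction, run in groupby(fragment, key=lambda p: p.direction):
        run = list(run)
        specs.append({
            "width": len(run),
            "height": sum(p.length for p in run) / len(run),
            "direction": direction,
        })
    return specs


stream = [
    Packet(0.1, 100, "send"),
    Packet(0.2, 300, "send"),
    Packet(0.5, 60, "recv"),
    Packet(1.2, 40, "recv"),
]
for fragment in split_into_fragments(stream, window=1.0):
    print(sub_image_specs(fragment))
# [{'width': 2, 'height': 200.0, 'direction': 'send'}, {'width': 1, 'height': 60.0, 'direction': 'recv'}]
# [{'width': 1, 'height': 40.0, 'direction': 'recv'}]
```

Each list of specs describes the sub-images from which one fragment image would be drawn before feature extraction and classification.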

Referring to Figure 17, an embodiment of the present application further provides an encrypted message data stream classification system that can implement the above encrypted message data stream classification method. The encrypted message data stream classification system includes:

an encrypted message data stream acquisition module 1701, configured to obtain an encrypted message data stream, where the encrypted message data stream includes a plurality of consecutive encrypted message data packets;

a message group division module 1702, configured to divide the encrypted message data stream into a plurality of message fragments in chronological order, and to divide each message fragment into a plurality of message groups according to the message direction of the encrypted message data packets, where the encrypted message data packets within each message group have the same message direction;

a fragment image generation module 1703, configured to, for each message fragment and in each message group corresponding to the message fragment, determine the height of the message group sub-image from the message length of the encrypted message data packets contained in the message group, the width of the message group sub-image from the number of messages, and the direction of the message group sub-image from the message direction, and to generate a fragment image from the plurality of message group sub-images;

a classification result acquisition module 1704, configured to extract the image features of each fragment image, and to input the historical classification result of the image features at the previous moment together with the image features at the current moment into a preset message classification network in turn to classify the message behavior, obtaining the classification result of the message behavior of the encrypted message data stream.

The specific implementation of the encrypted message data stream classification system is substantially the same as the specific embodiments of the above encrypted message data stream classification method and is not repeated here. Provided that the requirements of the embodiments of the present application are met, the encrypted message data stream classification system may also be provided with other functional modules to implement the encrypted message data stream classification method of the above embodiments.

An embodiment of the present application further provides an electronic device. The electronic device includes a memory and a processor, the memory stores a computer program, and the processor implements the above encrypted message data stream classification method when executing the computer program. The electronic device may be any smart terminal, such as a tablet computer or a vehicle-mounted computer.

Referring to Figure 18, which illustrates the hardware structure of an electronic device according to another embodiment, the electronic device includes:

a processor 1801, which may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided by the embodiments of the present application;

a memory 1802, which may be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1802 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 1802 and is called by the processor 1801 to execute the encrypted message data stream classification method of the embodiments of the present application;

an input/output interface 1803, configured to implement information input and output;

a communication interface 1804, configured to implement communication between this device and other devices, either in a wired manner (for example, USB or a network cable) or in a wireless manner (for example, a mobile network, Wi-Fi, or Bluetooth);

a bus 1805, which transmits information between the components of the device (for example, the processor 1801, the memory 1802, the input/output interface 1803, and the communication interface 1804);

where the processor 1801, the memory 1802, the input/output interface 1803, and the communication interface 1804 are communicatively connected to one another within the device through the bus 1805.

An embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the above encrypted message data stream classification method.

As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may optionally include memory located remotely from the processor, and such remote memory may be connected to the processor through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The embodiments described herein are intended to illustrate the technical solutions of the embodiments of the present application more clearly and do not constitute a limitation on those technical solutions. Those skilled in the art will appreciate that, as technology evolves and new application scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.

Those skilled in the art will understand that the technical solutions shown in the figures do not limit the embodiments of the present application, which may include more or fewer steps than shown, combine certain steps, or use different steps.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or an appropriate combination thereof.

The terms "first", "second", "third", "fourth", and the like (if present) in the description of the present application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

It should be understood that, in the present application, "at least one (item)" and "several" mean one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of those items, including any combination of single items or plural items. For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.

In the several embodiments provided in this application, it should be understood that the disclosed systems and methods can be implemented in other ways. For example, the system embodiments described above are merely illustrative. The division into the above units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes any medium that can store a program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The preferred embodiments of this application have been described above with reference to the accompanying drawings, and this description does not limit the scope of rights of the embodiments of this application. Any modification, equivalent substitution, or improvement made by those skilled in the art without departing from the scope and essence of the embodiments of this application shall fall within the scope of rights of the embodiments of this application.

Claims

1. An encrypted message data stream classification method, which is characterized by comprising the following steps:

obtaining an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of continuous encrypted message data packets;

dividing the encrypted message data stream into a plurality of message fragments according to the time sequence, and dividing each message fragment into a plurality of message groups according to the message direction of the encrypted message data packet; wherein the message directions of the encrypted message data packets in each message group are consistent;

for each message fragment, in each message group corresponding to the message fragment, determining the height of a message group sub-image according to the message lengths of the encrypted message data packets contained in the message group, determining the width of the message group sub-image according to the number of messages, determining the direction of the message group sub-image according to the message direction, and generating a fragment image according to a plurality of message group sub-images;

extracting image characteristics of each fragment image, and sequentially inputting a historical classification result of the image characteristics at the previous moment and the image characteristics at the current moment into a preset message classification network to classify message behaviors, so as to obtain a classification result of the message behaviors of the encrypted message data stream.

2. The method for classifying encrypted message data streams according to claim 1, wherein the dividing the encrypted message data streams into a plurality of message segments according to the time sequence, and dividing each of the message segments into a plurality of message groups according to the message direction of the encrypted message data packets, comprises:

in each encrypted message data stream, arranging the encrypted message data packets according to a time sequence, and dividing the encrypted message data packets according to a preset dividing number to obtain a plurality of message fragments;

and in each message fragment, distinguishing the message direction of each encrypted message data packet, and dividing the encrypted message data packets with the same message direction into the same message group to obtain a plurality of message groups.
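As a non-limiting illustration of the fragment division recited in claim 2, the following Python sketch orders packets by time and cuts them into fragments. The `Packet` type, the function name `split_into_fragments`, and the reading of the "preset dividing number" as a fixed number of packets per fragment are all assumptions made for illustration; the claims do not fix any of them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    """Hypothetical record for one encrypted packet (illustrative only)."""
    timestamp: float
    length: int
    direction: str  # "send" or "recv"


def split_into_fragments(packets: List[Packet],
                         packets_per_fragment: int) -> List[List[Packet]]:
    """Sort packets by time, then cut them into consecutive fragments.

    Assumption: the preset dividing number is interpreted here as the
    number of packets per fragment. The last fragment may be shorter.
    """
    if packets_per_fragment <= 0:
        raise ValueError("packets_per_fragment must be positive")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    return [
        ordered[i:i + packets_per_fragment]
        for i in range(0, len(ordered), packets_per_fragment)
    ]
```

With three packets and two packets per fragment, the sketch yields one fragment of two packets followed by one fragment of a single packet, both in time order.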

3. The method for classifying an encrypted message data stream according to claim 1, wherein determining the height of the message group sub-image comprises:

screening the encrypted message data packets in each message group according to a preset exclusion proportion;

summing the message lengths of the encrypted message data packets remaining after screening and dividing the sum by the number of encrypted message data packets in the message group to obtain an average message length;

multiplying the average message length by a preset extraction proportion to obtain a reference length, and comparing the reference length with the message length of each encrypted message data packet in the message group to obtain a comparison result;

and determining the height of the sub-images of the message group according to the comparison result.

4. The method for classifying an encrypted message data stream according to claim 3, wherein determining the height of the message group sub-image according to the comparison result comprises:

if the comparison result indicates that the encrypted message data packet with the message length shorter than the reference length does not exist in the message group, the reference length is taken as the height of the sub-image of the message group;

and if the comparison result represents that the encrypted message data packet with the message length shorter than the reference length exists in the message group, taking the message length of the corresponding encrypted message data packet as the height of the sub-image of the message group.
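The height determination of claims 3 and 4 can be sketched as follows. This is an illustration under several assumptions that the claims leave open: screening is taken to mean dropping the longest packets by the exclusion proportion, the average is taken over the packets that remain after screening, and when several packets are shorter than the reference length the shortest one is used. The function name and the default proportions are hypothetical.

```python
from typing import Sequence


def subimage_height(lengths: Sequence[int],
                    exclusion_ratio: float = 0.1,
                    extraction_ratio: float = 0.5) -> float:
    """Return a sub-image height for one message group (illustrative)."""
    if not lengths:
        raise ValueError("message group is empty")
    if not 0.0 <= exclusion_ratio < 1.0:
        raise ValueError("exclusion_ratio must be in [0, 1)")

    # Screening (assumption): drop the longest packets as outliers.
    ordered = sorted(lengths)
    n_drop = int(len(ordered) * exclusion_ratio)
    kept = ordered[:len(ordered) - n_drop]

    # Average message length over the screened packets (assumption).
    average = sum(kept) / len(kept)

    # Reference length = average length * extraction proportion.
    reference = average * extraction_ratio

    # Claim 4: use the reference length unless a shorter packet exists.
    shorter = [x for x in kept if x < reference]
    if shorter:
        return float(min(shorter))  # assumption: pick the shortest one
    return reference
```

For a group of four 100-byte packets with no exclusion and an extraction proportion of 0.5, no packet is shorter than the 50-byte reference, so the reference itself becomes the height.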

5. The method for classifying an encrypted message data stream according to claim 3, wherein the method further comprises:

acquiring a plurality of message groups corresponding to each message segment after screening the encrypted message data packet;

and determining the color of the sub-images of the message group according to the encrypted message data packet contained in each message group.

6. The method for classifying an encrypted message data stream according to claim 2, wherein the message direction includes a sending direction and a receiving direction; the distinguishing the message direction of each encrypted message data packet in each message fragment, and dividing the encrypted message data packets with the same message direction into the same message group, to obtain a plurality of message groups, includes:

classifying the message direction of each encrypted message data packet in each message fragment as the sending direction or the receiving direction;

dividing the encrypted message data packets corresponding to the continuous sending directions or the encrypted message data packets corresponding to the continuous receiving directions according to the time sequence to obtain a plurality of message groups which are arranged according to the time sequence.
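The grouping of claim 6 amounts to collecting runs of consecutive packets that share a direction. A minimal Python sketch, assuming each packet is a `(timestamp, length, direction)` tuple (a representation chosen here for illustration, not taken from the claims), could use `itertools.groupby`:

```python
from itertools import groupby
from typing import List, Tuple

PacketTuple = Tuple[float, int, str]  # (timestamp, length, direction)


def group_by_direction_runs(
    fragment: List[PacketTuple],
) -> List[Tuple[str, List[PacketTuple]]]:
    """Split one fragment into time-ordered runs of same-direction packets."""
    ordered = sorted(fragment, key=lambda p: p[0])
    # groupby starts a new group whenever the direction changes, so a
    # send/send/recv/send sequence produces three groups, not two.
    return [
        (direction, list(run))
        for direction, run in groupby(ordered, key=lambda p: p[2])
    ]
```

Because `groupby` only merges adjacent items, a direction that reappears later in the fragment starts a new message group, which keeps the groups arranged in time order.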

7. The method for classifying an encrypted message data stream according to claim 1, wherein the extracting image features of each of the segment images includes:

loading a pre-trained convolutional neural network;

and sequentially inputting the fragment images into the convolutional neural network according to a time sequence to perform feature extraction, so as to obtain image features corresponding to the fragment images.

8. The method for classifying encrypted message data streams according to claim 1, wherein the step of sequentially inputting the historical classification result of the image feature at the previous time and the image feature at the current time into a preset message classification network to classify the message behaviors to obtain the classification result of the message behaviors of the encrypted message data streams includes:

acquiring a history classification result obtained after the image characteristic at the previous moment is input to the message classification network;

and sequentially inputting the image characteristics at each current moment and the corresponding historical classification results into a preset message classification network to classify the message behaviors until the last image characteristic of the encrypted message data stream is input into the message classification network, so as to obtain the classification results of the message behaviors of the encrypted message data stream.

9. The method for classifying encrypted message data streams according to claim 8, wherein the sequentially inputting the image feature and the corresponding historical classification result at each current time into a preset message classification network to classify the message behaviors until the last image feature of the encrypted message data stream is input into the message classification network, to obtain the classification result of the message behaviors of the encrypted message data stream, includes:

sequentially inputting the image characteristics at each current moment into the message classification network, and determining first classification information to be reserved from the image characteristics through an input gate of the message classification network;

inputting the first classification information and the historical classification result to a forgetting gate of the message classification network, and determining second classification information to be reserved from the first classification information and the historical classification result;

weighting the historical classification result and the second classification information to obtain weighted classification information;

inputting the weighted classification information to an output gate of the message classification network for screening, to obtain a current classification result of the message behavior of the encrypted message data stream at the current moment;

and continuing to input the image features of the encrypted message data stream which are not classified into the message classification network until the last image feature of the encrypted message data stream is input into the message classification network, and outputting a classification result of the message behavior of the encrypted message data stream.
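The input gate, forgetting gate, and output gate recited in claim 9 resemble the gates of a long short-term memory (LSTM) cell. The sketch below implements the textbook LSTM update in plain Python, not the exact gate ordering of the claim; the parameter layout, function names, and the mapping of the "historical classification result" onto the hidden state `h_prev` are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Gate = Tuple[List[List[float]], List[List[float]], List[float]]  # (W, U, b)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(gate: Gate, x: List[float], h: List[float]) -> List[float]:
    """Compute W @ x + U @ h + b row by row."""
    W, U, b = gate
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params: Dict[str, Gate]):
    """One textbook LSTM step: x is the current image feature vector."""
    i = [sigmoid(z) for z in affine(params["input"], x, h_prev)]
    f = [sigmoid(z) for z in affine(params["forget"], x, h_prev)]
    o = [sigmoid(z) for z in affine(params["output"], x, h_prev)]
    g = [math.tanh(z) for z in affine(params["candidate"], x, h_prev)]
    # Weighted mix of retained history and newly admitted information.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # The output gate screens the cell state to give the current result.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_sequence(features, params, hidden_size: int) -> List[float]:
    """Feed features in time order; return the state after the last one."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in features:
        h, c = lstm_step(x, h, c, params)
    return h


def random_params(input_size: int, hidden_size: int, seed: int = 0):
    """Hypothetical untrained parameters, for demonstration only."""
    rng = random.Random(seed)

    def gate() -> Gate:
        W = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
             for _ in range(hidden_size)]
        U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
        return W, U, [0.0] * hidden_size

    return {name: gate() for name in ("input", "forget", "output", "candidate")}
```

In a full classifier the final state would typically pass through a further layer that maps it to behavior classes; that layer is omitted here because the claim does not describe it.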

10. The method for classifying encrypted message data streams according to claim 9, characterized in that the message classification network is trained by the following steps:

acquiring a plurality of encrypted message data streams, and forming a training data set according to the encrypted message data streams;

preprocessing each encrypted message data stream in the training data set to obtain a plurality of message fragments and fragment images corresponding to the message fragments;

extracting image characteristics of each fragment image, and sequentially inputting a historical classification result of the image characteristics at the previous moment and the image characteristics at the current moment into a preset message classification network to classify message behaviors, so as to obtain a first classification result of the message behaviors of the encrypted message data stream;

and calculating a loss value of the first classification result according to a preset loss function, and carrying out parameter adjustment on the message classification network according to the loss value to obtain a trained message classification network.
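Claim 10 leaves the preset loss function unspecified. As one common choice for multi-class classification, and purely as an assumption for illustration, the loss value could be a softmax cross-entropy between the network output and the labeled behavior class:

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], target_index: int) -> float:
    """Negative log-probability assigned to the labeled class."""
    return -math.log(softmax(logits)[target_index])
```

A lower loss value indicates that the network assigns higher probability to the labeled class; parameter adjustment would then follow the gradient of this value, a step not shown here.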

11. An encrypted message data stream classification system, the system comprising:

the encrypted message data stream acquisition module is used for acquiring an encrypted message data stream, wherein the encrypted message data stream comprises a plurality of continuous encrypted message data packets;

the message group dividing module is used for dividing the encrypted message data stream into a plurality of message fragments according to the time sequence and dividing each message fragment into a plurality of message groups according to the message direction of the encrypted message data packet; wherein the message directions of the encrypted message data packets in each message group are consistent;

a fragment image generating module, configured to determine, for each message fragment, in each message group corresponding to the message fragment, a height of a message group sub-image according to the message lengths of the encrypted message data packets included in the message group, determine a width of the message group sub-image according to the number of messages, determine a direction of the message group sub-image according to the message direction, and generate a fragment image according to a plurality of message group sub-images;

the classification result acquisition module is used for extracting the image characteristics of each fragment image, and sequentially inputting the historical classification result of the image characteristics at the previous moment and the image characteristics at the current moment into a preset message classification network to classify the message behaviors, so as to obtain the classification result of the message behaviors of the encrypted message data stream.

12. An electronic device comprising a memory storing a computer program and a processor implementing the method of classifying an encrypted message data stream according to any one of claims 1 to 10 when the computer program is executed by the processor.

13. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the encrypted message data stream classification method of any one of claims 1 to 10.